Title: DutchSemCor: Building a semantically annotated corpus for Dutch
Authors: Vossen, Piek / Görög, Attila / Laan, Fons / van Gompel, Maarten / Izquierdo, Rubén / van den Bosch, Antal
Year: 2011
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies / Lexical Computing Ltd.
Place: Ljubljana / Brighton
In: Kosem, Iztok / Kosem, Karmen (eds.): Electronic lexicography in the 21st Century: New Applications for New Users. Proceedings of eLex2011, Bled, Slovenia, 10-12 November 2011
Pages: 286-296
Languages studied: Dutch
Keywords: database
disambiguation
sense
corpus-based lexicography
URI: http://elex2011.trojina.si/Vsebine/proceedings.html
Last accessed: 10 September 2018
Abstract: State-of-the-art Word Sense Disambiguation (WSD) systems require large sense-tagged corpora along with lexical databases to reach satisfactory results. The number of English-language resources developed for WSD has increased in recent years, while most other languages remain under-resourced. The situation is no different for Dutch. To overcome this data bottleneck, the DutchSemCor project will deliver a Dutch corpus that is sense-tagged with senses from the Cornetto lexical database. Part of this corpus (circa 300K examples) is manually tagged. The remainder is automatically tagged using different WSD systems and validated by human annotators. The project uses existing corpora compiled in other projects; these are extended with Internet examples for word senses that are less frequent and do not appear (sufficiently often) in the corpora. We report on the status of the project and on the evaluations of the WSD systems with the current training data.