Title: Spiralling towards perfection: an incremental approach for mutual lexicon-tagger improvement
Authors: Mörth, Karlheinz/Procházka, Stephan/Siam, Omar/Declerck, Thierry
Year: 2013
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies/Eesti Keele Instituut
Place: Ljubljana/Tallinn
In: Kosem, Iztok/Kallas, Jelena/Gantar, Polona/Krek, Simon/Langemets, Margit/Tuulik, Maria (eds.): Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, 17-19 October 2013, Tallinn, Estonia
Pages: 225-242
Languages studied: Arabic
Keywords: data modelling
information system
internet lexicography/online lexicography
corpus-based lexicography
user contribution
URI: http://eki.ee/elex2013/conf-proceedings/
Last accessed: 17 September 2018
Abstract: Our paper describes an experiment in which four different digital language resources are used to add value to one another incrementally. The resources are a digital dictionary, a morphological analyser, a tagger and a digital corpus. We show how the dictionary is used to improve the tagger, how the tagger is used to annotate a collaboratively produced digital text collection, namely the Egyptian Arabic Wikipedia, thereby improving easily available open data, and finally how the results of the annotation process are, in turn, used to enhance the dictionary. The paper addresses several issues specific to the individual tasks involved in this process: we discuss problems of dealing with data retrieved from the Internet; we give details on lemmatisation, the creation of word-class information and the generation of frequency data from the corpus; and we touch on issues of dictionary creation and aspects of the dictionary-corpus interface. A final topic is the standards for representing statistical information in the digital dictionary.