Title: Automatic generation of the Estonian Collocations Dictionary database
Authors: Kallas, Jelena/Kilgarriff, Adam/Koppel, Kristina/Kudritski, Elgar/Langemets, Margit/Michelfeit, Jan/Tuulik, Maria/Viks, Ülle
Year: 2015
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd.
Place: Ljubljana/Brighton
In: Kosem, Iztok/Jakubíček, Miloš/Kallas, Jelena/Krek, Simon (eds.): Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom
Pages: 1-20
Languages studied: Estonian
Keywords: example
database
monolingual lexicography
collocations/phraseologisms/multi-word items
corpus-based lexicography
learner's lexicography
XML/SGML
URI: https://elex.link/elex2015/conference-proceedings/
Last accessed: 22.10.2018
Abstract: This paper reports on the automatic generation of the Estonian Collocations Dictionary (ECD) database. The database has been compiled by the Institute of the Estonian Language in collaboration with Lexical Computing Ltd. The ECD is a monolingual online scholarly dictionary aimed at learners of Estonian as a foreign or second language at the upper-intermediate and advanced levels. The dictionary contains about 10,000 headwords, including single and multi-word lexical items. Within each headword, the collocates are grouped according to the lexico-grammatical structure of the collocational phrase, and example sentences are provided for the collocations. The ECD database was generated automatically using the Word List, Word Sketch and Good Dictionary Example (GDEX) functions of the corpus query system Sketch Engine (Kilgarriff et al., 2004). The data were automatically extracted in XML format from the 463-million-word Estonian National Corpus and imported into the XML-based EELex dictionary writing system. To make it possible to import the automatically extracted data from Sketch Engine into EELex, the XML structure of the extracted data was matched with the XML structure of the ECD in EELex. The ECD project started in 2014, and the dictionary is scheduled to be published in 2018.
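Note: The matching of the two XML structures described in the abstract can be illustrated with a minimal sketch. It assumes a simplified extraction format and a simplified target entry format; all element and attribute names (`sketch`, `gramrel`, `collocate`, `entry`, `collocation_group`, and so on), the scores, and the sample data are hypothetical and are not taken from Sketch Engine, EELex, or the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction output. The element names, attribute names,
# scores and sample data are illustrative assumptions only.
EXTRACTED = """
<sketch headword="kohv" pos="noun">
  <gramrel name="adj_modifier">
    <collocate lemma="must" score="8.4"/>
    <collocate lemma="kange" score="9.1">
      <example>Ta jõi hommikul kanget kohvi.</example>
    </collocate>
  </gramrel>
  <gramrel name="object_of">
    <collocate lemma="jooma" score="10.2"/>
  </gramrel>
</sketch>
"""


def to_entry(sketch_xml: str) -> ET.Element:
    """Map the hypothetical extraction format onto a hypothetical entry format."""
    source = ET.fromstring(sketch_xml)

    entry = ET.Element("entry")
    headword = ET.SubElement(entry, "headword", pos=source.get("pos", ""))
    headword.text = source.get("headword")

    # One group per lexico-grammatical structure (grammatical relation).
    for rel in source.findall("gramrel"):
        group = ET.SubElement(entry, "collocation_group",
                              structure=rel.get("name", ""))
        # Order collocates by descending association score.
        collocates = sorted(rel.findall("collocate"),
                            key=lambda c: float(c.get("score", "0")),
                            reverse=True)
        for coll in collocates:
            item = ET.SubElement(group, "collocation")
            ET.SubElement(item, "collocate").text = coll.get("lemma")
            # Carry over any example sentences attached to the collocate.
            for ex in coll.findall("example"):
                ET.SubElement(item, "example").text = ex.text
    return entry


entry = to_entry(EXTRACTED)
print(ET.tostring(entry, encoding="unicode"))
```

The sketch keeps one group per grammatical relation, mirroring the grouping of collocates by lexico-grammatical structure mentioned in the abstract; the actual mapping used in the project may differ.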