Title: Mining a parallel corpus for automatic generation of Estonian grammar exercises
Authors: Chalvin, Antoine/Eensoo, Egle/Stuck, François
Year: 2013
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies/Eesti Keele Instituut
Place: Ljubljana/Tallinn
In: Kosem, Iztok/Kallas, Jelena/Gantar, Polona/Krek, Simon/Langemets, Margit/Tuulik, Maria (eds.): Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, 17-19 October 2013, Tallinn, Estonia
Pages: 280-295
Languages studied: Estonian - French
Keywords: educational purposes
frequency
grammar in dictionaries
internet lexicography/online lexicography
corpus-based lexicography
learner's lexicography
bilingual/multilingual lexicography
URI: http://eki.ee/elex2013/conf-proceedings/
Last accessed: 17 September 2018
Abstract: The aim of our research is to develop a system that generates Estonian grammar exercises for French-speaking learners, based on a large lemmatised parallel corpus (http://corpus.estfra.ee) and on the data of the Comprehensive French–Estonian Dictionary (http://www.estfra.ee). We concentrate on exercises on nominal and verbal morphology. Although the corpus is not syntactically tagged, we also explore the possibility of generating some types of syntax exercises. On demand, the system generates exercises consisting of a specified number of Estonian sentences in which the relevant word forms are replaced by their lemmas. The learner must produce the correct form and can then check his or her answers. Each sentence is accompanied by its French translation. In this article, we concentrate on the problems of defining and tuning the sentence selection criteria. Exercises can be generated at three levels of difficulty. Suitable sentences are selected from the corpus according to their length and the "frequency" of the lemmas they contain, i.e. whether those lemmas belong to one of the four headword subsets defined in the dictionary data: basic vocabulary (4,000 words), small dictionary (10,000 words), lower-medium dictionary (15,000 words) and upper-medium dictionary (40,000 words).
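The selection step described in the abstract (filtering sentences by length and by which headword subset their lemmas fall into, then replacing target forms by lemmas) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-level length limits and the mapping from difficulty level to vocabulary subset (`LEVELS`) are hypothetical, as are the `Token` structure and its `is_target` flag marking the form the learner must reconstruct. The lemma sets stand in for the dictionary's headword subsets.

```python
from __future__ import annotations

from dataclasses import dataclass

# The four headword subsets named in the abstract, from most to least basic.
# The lemma sets themselves would come from the dictionary data; here they
# are supplied by the caller.
SUBSETS = ["basic", "small", "lower_medium", "upper_medium"]

# HYPOTHETICAL thresholds: difficulty level -> (maximum sentence length in
# tokens, least basic subset whose lemmas are allowed). Not from the paper.
LEVELS = {
    1: (8, "basic"),
    2: (12, "small"),
    3: (20, "upper_medium"),
}


@dataclass
class Token:
    form: str        # word form as it appears in the corpus sentence
    lemma: str       # lemma supplied by the corpus lemmatisation
    is_target: bool  # HYPOTHETICAL flag: form the learner must reconstruct


def subset_rank(lemma: str, lexicon: dict[str, set[str]]) -> int | None:
    """Index of the most basic subset containing the lemma, or None if absent."""
    for rank, name in enumerate(SUBSETS):
        if lemma in lexicon[name]:
            return rank
    return None


def is_eligible(tokens: list[Token], level: int, lexicon: dict[str, set[str]]) -> bool:
    """Keep a sentence if it is short enough and uses only allowed vocabulary."""
    max_len, max_subset = LEVELS[level]
    if len(tokens) > max_len:
        return False
    limit = SUBSETS.index(max_subset)
    for tok in tokens:
        rank = subset_rank(tok.lemma, lexicon)
        if rank is None or rank > limit:
            return False
    # A sentence is only useful if it contains at least one form to practise.
    return any(tok.is_target for tok in tokens)


def make_gap_item(tokens: list[Token]) -> tuple[str, list[str]]:
    """Replace target forms by their lemmas; return the prompt and the answers."""
    prompt = " ".join(f"({t.lemma})" if t.is_target else t.form for t in tokens)
    answers = [t.form for t in tokens if t.is_target]
    return prompt, answers


if __name__ == "__main__":
    lexicon = {
        "basic": {"mina", "lugema", "raamat"},
        "small": set(),
        "lower_medium": set(),
        "upper_medium": set(),
    }
    # "Ma loen raamatut" ('I am reading a book'); the partitive is the target.
    sentence = [
        Token("Ma", "mina", False),
        Token("loen", "lugema", False),
        Token("raamatut", "raamat", True),
    ]
    if is_eligible(sentence, 1, lexicon):
        print(make_gap_item(sentence))  # ('Ma loen (raamat)', ['raamatut'])
```

Sentence length and vocabulary subset are the two levers the abstract names for controlling difficulty; the specific numbers in `LEVELS` are placeholders that would need tuning against real corpus data.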