Title: Automation of lexicographic work: an opportunity for both lexicographers and crowd-sourcing
Authors: Kosem, Iztok/Gantar, Polona/Krek, Simon
Year: 2013
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies/Eesti Keele Instituut
Place: Ljubljana/Tallinn
In: Kosem, Iztok/Kallas, Jelena/Gantar, Polona/Krek, Simon/Langemets, Margit/Tuulik, Maria (eds.): Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, 17-19 October 2013, Tallinn, Estonia
Pages: 32-48
Languages studied: Slovenian
Keywords: database
data modelling
grammar in dictionaries
corpus-based lexicography
lexicographic editor
URI: http://eki.ee/elex2013/conf-proceedings/
Last visited: 17.09.2018
Abstract: A new approach to lexicographic work, in which the lexicographer is seen more as a validator of the choices made by the computer, was recently envisaged by Rundell and Kilgarriff (2011). In this paper, we describe an experiment using such an approach during the creation of the Slovene Lexical Database (Gantar & Krek, 2011). The corpus data, i.e. grammatical relations, collocations, examples, and grammatical labels, were automatically extracted from the 1.18-billion-word Gigafida corpus of Slovene. An evaluation of the extracted data consisted of making a comparison between a manual entry and a (semi)-automatic entry, and identifying potential improvements in the extraction algorithm and in the presentation of data. An important finding was that the automatic approach was far more effective than the manual approach, without any significant loss of information. Based on our experience, we would propose a slightly revised version of the approach envisaged by Rundell and Kilgarriff in which the validation of data is left to lower-level linguists or crowd-sourcing, whereas high-level tasks such as meaning description remain the domain of lexicographers. Such an approach indeed reduces the scope of lexicographers' work; however, it also makes it possible to deliver content to users more quickly.