Title: Comparable Corpora BootCaT
Authors: Kilgarriff, Adam/P.V.S, Avinesh/Pomikálek, Jan
Year: 2011
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd.
Place: Ljubljana/Brighton
In: Kosem, Iztok/Kosem, Karmen (eds.): Electronic lexicography in the 21st Century: New Applications for New Users. Proceedings of eLex2011, Bled, Slovenia, 10-12 November 2011
Pages: 122-128
Languages studied: various
Keywords: data modelling
corpus-based lexicography
translation
URI: http://elex2011.trojina.si/Vsebine/proceedings.html
Last accessed: 10 September 2018
Abstract: The BootCaT method (Baroni and Bernardini, 2004) has proved a fast, effective and versatile approach to corpus building. The method has been applied to small specialist corpora for finding terminology and translations (as originally envisaged by Baroni and Bernardini), and to large, general corpora, for large numbers of languages. First we review BootCaT, and present some figures for the sizes of corpora that can be built in a few minutes, on various parameter-settings. To date BootCaT has not been applied multilingually. We explore this by building matching corpora for different languages from matching seeds. We consider three ways of obtaining matching seeds: manual translation, automatic translation, and by finding keywords from corresponding Wikipedia articles. In one experiment, we present a bilingual word sketch based on seed-translation by Google Translate. In another, seeds are from Wikipedia, and we evaluate the corpora by seeing, firstly, how many domain terms they deliver, and secondly, by seeing how often the terms in the one language are translation equivalents of the terms in the other.