Title: Automatisierte Rohdatengewinnung für die Lexikographie
Authors: Quasthoff, Uwe
Year: 2010
Type: Article
Journal: Lexicographica. Internationales Jahrbuch für Lexikographie. International annual for lexicography. Revue internationale de lexicographie
Pages: 47-64
Volume: 26
Languages studied: German
Keywords: automatic language processing
frequency
collocation analysis
corpus-based lexicography
Abstract: Large corpora are of increasing interest for lexicography. If a large corpus is to be used for several lexicography projects, its quality is crucial. The corpus pre-processing pipeline used in the corpus project "Deutscher Wortschatz" is discussed in detail. The resulting full-form dictionary also contains statistical information such as word frequencies and word co-occurrences. Present and forthcoming usage scenarios for manual and automatic look-up are presented. Given different corpora for different text genres or time spans, a joint look-up across these corpora will show variations in word usage. From the lexicographer's point of view, the statistical data can provide raw data for several kinds of dictionaries, including thesauri, collocation dictionaries, phraseological dictionaries and, of course, frequency dictionaries.