Title: A speech corpus as a source of lexical information
Authors: Verdonik, Darinka/Sepesy Maučec, Mirjam
Year: 2017
Type: Article
Journal: International Journal of Lexicography
Pages: 143-166
Volume: 30
Issue: 2
Languages studied: Slovenian
Keywords: frequency
corpus-based lexicography
lemmatisation
Abstract: This paper presents an investigation of what is gained in the process of dictionary creation by using a speech reference corpus of one million words in conjunction with a huge written reference corpus. It also analyses how much additional effort this requires. Collecting spoken data takes a great deal of effort, and existing speech corpora are rather insignificant in size compared to written corpora, which represent the main and often only source of lexicographic information. However, it is clear that the use of spoken and written language differs with regard to lexical patterns and collocations, and that written corpora, irrespective of their size and structure, cannot provide sufficient data for the description of lexical features in spoken language use. The results demonstrate that even a small speech corpus of one million words provides additional information about the most common, the most general and the most typical spoken usages.