Title: |
A speech corpus as a source of lexical information |
Authors: | Verdonik, Darinka/Sepesy Maučec, Mirjam |
Year: |
2017 |
Type: |
Article |
Journal: |
International Journal of Lexicography |
Pages: |
143-166 |
Volume: |
30 |
Issue: |
2 |
Languages studied: |
Slovenian |
Keywords: |
frequency
corpus-based lexicography
lemmatisation
|
Abstract: |
This paper presents an investigation of what is gained in the process of dictionary creation by using a speech reference corpus of one million words in conjunction with a huge written reference corpus. It also analyses how much additional effort this requires. Collecting spoken data takes a great deal of effort, and existing speech corpora are rather insignificant in size compared to written corpora, which represent the main and often only source of lexicographic information. However, it is clear that the use of spoken and written language differs with regard to lexical patterns and collocations, and that written corpora, irrespective of their size and structure, cannot provide sufficient data for the description of lexical features in spoken language use. The results demonstrate that even a small speech corpus of one million words provides additional information about the most common, the most general and the most typical spoken usages. |