Title: Automatic Extraction of Lexical Patterns from Corpora
Authors: Renau, Irene/Nazar, Rogelio
Year: 2016
Type: Article
Publisher: Ivane Javakhishvili Tbilisi State University
Place: Tbilisi
In: Margalitadze, Tinatin/Meladze, George (eds.): Proceedings of the 17th EURALEX International Congress: Lexicography and Linguistic Diversity. Tbilisi, Georgia 6 - 10 September 2016
Pages: 823-830
Languages studied: English - Spanish
Keywords: automatic speech processing
collocations/phraseologisms/multi word items
corpus-based lexicography
lexicographic editor
Medium: Online
URI: http://euralex.org/category/publications/euralex-2016/
Last accessed: 22.10.2018
Abstract: We present our first attempt to extract lexical patterns using corpus statistics. A pattern is a structure that combines syntactic and semantic features and is linked to a conventional meaning of a word. This means, for example, that the verb 'to die' does not have intrinsic meanings, but potential meanings which are activated by the context: in 'His mother died when he was five', the meaning of the verb differs from 'His mother is dying to meet you', due to collocational restrictions and syntactic differences. With the automatic analysis of thousands of concordances per verb, we can make a first approach to the problem of detecting these structures in corpora, a very time-consuming task for lexicographers. The average precision is around 50%. The next step to increase precision is adding a dependency parser to the system and making adjustments to the automatic taxonomy we have created for semantic labeling.