Title: Using machine learning for semi-automatic expansion of the Historical Thesaurus of the Oxford English Dictionary
Authors: McCracken, James
Year: 2015
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies / Lexical Computing Ltd.
Place: Ljubljana / Brighton
In: Kosem, Iztok / Jakubíček, Miloš / Kallas, Jelena / Krek, Simon (eds.): Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom
Pages: 211-235
Languages studied: English
Keywords: database
data modelling
historical lexicography
lexicographic process
semantic/sense relations in dictionaries
Medium: Online
URI: https://elex.link/elex2015/conference-proceedings/
Last accessed: 22.10.2018
Abstract: The Historical Thesaurus of the Oxford English Dictionary (HTOED) provides a highly granular taxonomic classification of the contents of the OED. However, HTOED was based largely on the first edition of the OED (plus supplements), and has not been updated to include content added more recently, or changed content emerging from third-edition revisions. This means that 32% of lexical items in the current OED data set are unclassified. We use the existing HTOED classifications as training data to classify this 'missing' content. The classification system works as a two-stage process. Firstly, for a given input sense, a Bayesian classifier identifies the general topic (high-level thesaurus branch) to which the sense belongs; secondly, a battery of similarity measures identifies possible target nodes within this branch. The system looks for consensus or proximity among the outputs of these methods, in order to pinpoint the optimal node(s) to which the sense should be assigned. The system is currently able to classify 25% of input senses to the correct node, and a further 40% of input senses to the right neighbourhood (a parent, child, or sibling of the correct node). A web-based UI facilitates the manual checking, approval, and adjustment of proposed classifications.
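The two-stage process described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the system reported in the paper: the toy training data, the branch and node names, the choice of a multinomial naive Bayes model with add-one smoothing for stage one, and the use of Jaccard and cosine similarity as the "battery" of measures for stage two (the abstract does not name the measures). The sketch checks only for consensus among measures; the proximity check between neighbouring nodes is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (definition, branch, node) triples.
TRAINING = [
    ("a large boat for carrying goods by sea", "travel", "travel/ship"),
    ("a small boat moved with oars", "travel", "travel/boat"),
    ("a vehicle that runs on rails", "travel", "travel/train"),
    ("a baked food made from flour and water", "food", "food/bread"),
    ("a sweet baked food made from flour and sugar", "food", "food/cake"),
    ("a liquid food made by boiling meat or vegetables", "food", "food/soup"),
]


def tokens(text):
    """Lowercase whitespace tokenisation (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayes:
    """Stage 1: multinomial naive Bayes over high-level branches."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for text, branch, _node in examples:
            self.class_counts[branch] += 1
            for word in tokens(text):
                self.word_counts[branch][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for branch, count in self.class_counts.items():
            counts = self.word_counts[branch]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)
            for word in tokens(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = branch, score
        return best


def jaccard(a, b):
    """Overlap of the two token sets divided by their union."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of the two bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def classify(text, model, examples, measures=(jaccard, cosine)):
    """Stage 2: each measure votes for its best node inside the branch."""
    branch = model.predict(text)
    candidates = [(t, node) for t, b, node in examples if b == branch]
    query = tokens(text)
    votes = Counter()
    for measure in measures:
        _, node = max(candidates, key=lambda c: measure(query, tokens(c[0])))
        votes[node] += 1
    node, count = votes.most_common(1)[0]
    # The final flag reports whether all measures agreed on the node.
    return branch, node, count == len(measures)


MODEL = NaiveBayes(TRAINING)
```

Calling `classify("a boat with oars for crossing a river", MODEL, TRAINING)` returns the predicted branch, the node most measures voted for, and a flag showing whether the measures reached consensus. In a review workflow like the one the abstract mentions, a proposal without consensus would be a natural candidate for manual checking.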