Title: Candidate Ranking for Maintenance of an Online Dictionary
Authors: Broad, Claire/Langone, Helen/Brizan, David G.
Year: 2018
Type: Article
Publisher: European Language Resources Association (ELRA)
Place: Miyazaki
In: Calzolari, Nicoletta/Choukri, Khalid/Cieri, Christopher/Declerck, Thierry/Goggi, Sara/Hasida, Koiti/Isahara, Hitoshi/Maegaard, Bente/Mariani, Joseph/Mazo, Hélène/Moreno, Asunción/Odijk, Jan/Piperidis, Stelios/Tokunaga, Takenobu (eds.): Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, 7-12 May 2018
Pages: 839-843
Languages studied: English
Keywords: automatic speech processing
neologisms
orthography/spelling information in dictionaries
Medium: Online
URI: http://www.lrec-conf.org/proceedings/lrec2018/papers.html
Last accessed: 19 November 2018
Abstract: Traditionally, the process whereby a lexicographer identifies a lexical item to add to a dictionary -- a database of lexical items -- has been time-consuming and subjective. In the modern age of online dictionaries, all queries for lexical entries not currently in the database are indistinguishable from a larger list of misspellings, meaning that potential new or trending entries can get lost easily. In this project, we develop a system that uses machine learning techniques to assign these "misspells" a probability of being a novel or missing entry, incorporating signals from orthography, usage by trusted online sources, and dictionary query patterns.