Title: |
Error Correction for Arabic Dictionary Lookup |
Authors: | Rytting, C. Anton/Rodrigues, Paul/Buckwalter, Tim/Zajic, David/Hirsch, Bridget/Carnes, Jeff/Lynn, Nathanael/Wayland, Sarah/Taylor, Chris/White, Jason/Blake III, Charles/Browne, Evelyn/Miller, Corey/Purvis, Tristan |
Year: |
2010 |
Type: |
Article |
Publisher: |
European Language Resources Association (ELRA) |
Place: |
Valletta, Malta |
In: |
Barbu Mititelu, Verginica/Pekar, Viktor/Barbu, Eduard (eds.): Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, 17-23 May 2010 |
Pages: |
263-268 |
Languages studied: |
Arabic |
Keywords: |
usage research
foreign/second language acquisition
learner lexicography
orthography/spelling information in dictionaries
|
URI: |
http://www.lrec-conf.org/proceedings/lrec2010/pdf/440_Paper.pdf |
Last accessed: |
10 September 2018 |
Abstract: |
We describe a new Arabic spelling correction system which is intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, this system does not depend on a corpus of attested student errors but on student- and teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusions, and dialectal confusions are combined to create a weighted finite-state transducer that calculates the likelihood that an input string could correspond to each citation form in a dictionary of Iraqi Arabic. Results are ranked by the estimated likelihood that a citation form could be misheard, mistyped, or mistranscribed for the input given by the user. To evaluate the system, we developed a noisy-channel model trained on students' speech errors and used it to perturb citation forms from a dictionary. We compare our system to a baseline based on Levenshtein distance and find that, when evaluated on single-error queries, our system performs 28% better than the baseline (overall MRR) and is twice as good at returning the correct dictionary form as the top-ranked result. We believe this to be the first spelling correction system designed for a spoken, colloquial dialect of Arabic. |
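The contrast drawn in the abstract, between a plain Levenshtein baseline and a ranking that weights confusable letter pairs, scored by mean reciprocal rank (MRR), can be illustrated with a small sketch. This is not the authors' system: it swaps their weighted finite-state transducer for a simpler weighted edit distance, treats costs as stand-ins for negative log-likelihoods (lower cost means more likely), and uses hypothetical Buckwalter-style toy strings and a hypothetical confusion weight for `h`/`H` (ه/ح). The helper names (`weighted_edit_distance`, `make_confusion_cost`, `rank`, `mean_reciprocal_rank`) are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

CostFn = Callable[[str, str], float]


def weighted_edit_distance(source: str, target: str, sub_cost: CostFn,
                           indel_cost: float = 1.0) -> float:
    """Edit distance with a pluggable substitution cost (dynamic programming)."""
    m, n = len(source), len(target)
    # dp[i][j] = cheapest way to turn source[:i] into target[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, n + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else sub_cost(a, b)
            dp[i][j] = min(dp[i - 1][j] + indel_cost,   # deletion
                           dp[i][j - 1] + indel_cost,   # insertion
                           dp[i - 1][j - 1] + sub)      # match/substitution
    return dp[m][n]


def levenshtein(source: str, target: str) -> float:
    """Baseline: every substitution costs 1."""
    return weighted_edit_distance(source, target, lambda a, b: 1.0)


def make_confusion_cost(confusable: Dict[Tuple[str, str], float],
                        default: float = 1.0) -> CostFn:
    """Cheaper substitutions for pairs rated as confusable (symmetric lookup)."""
    def cost(a: str, b: str) -> float:
        return confusable.get((a, b), confusable.get((b, a), default))
    return cost


def rank(query: str, citation_forms: List[str],
         distance: Callable[[str, str], float]) -> List[str]:
    """Rank citation forms by cost; sorted() is stable, so ties keep list order."""
    return sorted(citation_forms, key=lambda form: distance(query, form))


def mean_reciprocal_rank(rankings: List[List[str]], answers: List[str]) -> float:
    """Average of 1/rank of the correct form (0 if it is missing)."""
    if not answers:
        raise ValueError("need at least one query")
    total = 0.0
    for ranked, answer in zip(rankings, answers):
        if answer in ranked:
            total += 1.0 / (ranked.index(answer) + 1)
    return total / len(answers)


if __name__ == "__main__":
    # Hypothetical toy dictionary; the learner typed "hubb" for "Hubb".
    dictionary = ["dubb", "Hubb", "xubz"]
    confusion = make_confusion_cost({("h", "H"): 0.2})
    weighted = lambda q, f: weighted_edit_distance(q, f, confusion)
    base_rank = rank("hubb", dictionary, levenshtein)
    conf_rank = rank("hubb", dictionary, weighted)
    print(base_rank, mean_reciprocal_rank([base_rank], ["Hubb"]))
    print(conf_rank, mean_reciprocal_rank([conf_rank], ["Hubb"]))
```

In this toy case Levenshtein ties `dubb` and `Hubb` at distance 1, so the intended form lands at rank 2 (reciprocal rank 0.5), while the confusion-weighted cost puts it first (reciprocal rank 1.0). The real system derives its weights from learner and teacher ratings and combines several error modules in a transducer, which this sketch does not attempt to reproduce.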