Title: Multilingual Projections
Author: Bhattacharyya, Pushpak
Year: 2015
Type: Article
Publisher: Springer
Place: Heidelberg/Berlin
In: Gala, Núria/Rapp, Reinhard/Bel-Enguix, Gemma (eds.): Language production, cognition, and the lexicon
Pages: 175-200
Languages studied: Hindi - Indian languages - Marathi
Keywords: paraphrase/definition
disambiguation
corpus-based lexicography
bilingual/multilingual lexicography
Abstract: Languages of the world, though different, share structures and vocabulary. Today's NLP depends crucially on annotation, which is costly: it requires expertise, money, and time. Most languages in the world fall far behind English when it comes to annotated resources. Because annotation is costly, there has been a worldwide effort to leverage multilinguality in the development and use of annotated corpora. The key idea is to project annotation from one language to another and make use of it there. This means that parameters learnt from the annotated corpus of one language are used in the NLP of another language. We illustrate multilingual projection through the case study of word sense disambiguation (WSD), whose goal is to obtain the correct meaning of a word in context. The correct meaning is usually denoted by an appropriate sense id from a sense repository, usually a wordnet. In this paper we show how two languages can help each other in their WSD, even when neither language has any sense-marked corpus. The two specific languages chosen are Hindi and Marathi. The sense repository is the IndoWordnet, a linked structure of wordnets of 19 major Indian languages from the Indo-Aryan, Dravidian, and Sino-Tibetan families. These wordnets were created from the Hindi wordnet by following the expansion approach. The WSD algorithm is reminiscent of expectation maximization: the sense distribution of either language is estimated iteratively through the mediation of the sense distribution of the other language. The resulting WSD accuracy exceeds the state-of-the-art accuracy of all-words, general-purpose unsupervised WSD.
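
Illustration: the abstract describes the algorithm only in outline, so the following is a toy sketch of the general idea rather than the paper's actual method. It assumes two languages whose words map to shared synset ids (as in a linked wordnet), untagged word counts for each language, and an update rule that re-estimates P(sense | word) in one language from the count-weighted sense distributions of the other language's words for the same synset. All names (`bilingual_em`, `update`, the placeholder words and synset ids) are hypothetical.

```python
from collections import defaultdict


def normalize(scores):
    """Scale non-negative scores to sum to 1; fall back to uniform if all are 0."""
    total = sum(scores.values())
    if total == 0:
        return {s: 1.0 / len(scores) for s in scores}
    return {s: v / total for s, v in scores.items()}


def uniform_init(senses):
    """Start with a uniform sense distribution for every word."""
    return {w: {s: 1.0 / len(ss) for s in ss} for w, ss in senses.items()}


def update(senses_a, senses_b, counts_b, dist_b):
    """Re-estimate P(sense | word) for language A from language B.

    Assumed rule (illustrative only): the score of sense s for a word is the
    sum, over B-words v that can express s, of P(s | v) * count(v).
    """
    words_b_by_sense = defaultdict(list)
    for v, ss in senses_b.items():
        for s in ss:
            words_b_by_sense[s].append(v)

    new_dist = {}
    for w, ss in senses_a.items():
        scores = {
            s: sum(dist_b[v][s] * counts_b.get(v, 0) for v in words_b_by_sense[s])
            for s in ss
        }
        new_dist[w] = normalize(scores)
    return new_dist


def bilingual_em(senses_a, counts_a, senses_b, counts_b, iterations=10):
    """Alternate the two updates; neither language needs sense-tagged data."""
    dist_a = uniform_init(senses_a)
    dist_b = uniform_init(senses_b)
    for _ in range(iterations):
        dist_a = update(senses_a, senses_b, counts_b, dist_b)
        dist_b = update(senses_b, senses_a, counts_a, dist_a)
    return dist_a, dist_b


# Placeholder data: one ambiguous word in language A, two unambiguous
# words in language B that share its synset ids.
senses_a = {"a_word": ["SYN_1", "SYN_2"]}
counts_a = {"a_word": 40}
senses_b = {"b_word1": ["SYN_1"], "b_word2": ["SYN_2"]}
counts_b = {"b_word1": 10, "b_word2": 30}

dist_a, dist_b = bilingual_em(senses_a, counts_a, senses_b, counts_b)
print(dist_a["a_word"])
```

In this toy setting the ambiguous word in language A inherits its sense preferences from the relative corpus frequencies of its unambiguous counterparts in language B, which is the kind of mutual help between languages that the abstract refers to.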