Title: Lexical Profiling for Arabic
Authors: Attia, Mohammed/Pecina, Pavel/Tounsi, Lamia/Toral, Antonio/van Genabith, Josef
Year: 2011
Type: Article
Publisher: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd.
Place: Ljubljana/Brighton
In: Kosem, Iztok/Kosem, Karmen (eds.): Electronic lexicography in the 21st Century: New Applications for New Users. Proceedings of eLex2011, Bled, Slovenia, 10-12 November 2011
Pages: 23-33
Languages studied: Arabic
Keywords: database
grammar in dictionaries
corpus-based lexicography
lemmatisation
URI: http://elex2011.trojina.si/Vsebine/proceedings.html
Last accessed: 10 September 2018
Abstract: We provide lexical profiling for Arabic by covering two important linguistic aspects of Arabic lexical information, namely morphological inflectional paradigms and syntactic subcategorization frames, making our database a rich repository of Arabic lexicographic details. First, we provide a complete description of the inflectional behaviour of Arabic lemmas based on statistical distribution. We use a corpus of 1,089,111,204 words, a pre-annotation tool, knowledge-based rules, and machine learning techniques to automatically acquire lexical knowledge about words' morpho-syntactic attributes and inflection possibilities. Second, we automatically extract the Arabic subcategorization frames (or predicate-argument structures) from the Penn Arabic Treebank (ATB) for a large number of Arabic lemmas, including verbs, nouns and adjectives. We compare the results against a manually constructed collection of subcategorization frames designed for an Arabic LFG parser. The comparison results show that we achieve high precision scores for the three word classes. Both morphological and syntactic specifications are combined and connected in a scalable and interoperable lexical database suitable for constructing a morphological analyser, aiding a syntactic parser, or even building an Arabic dictionary. We build a web application, AraComLex (Arabic Computer Lexicon), available at: http://www.cngl.ie/aracomlex, for managing and maintaining the standardized and scalable lexical database.