Title: RIDIRE. Corpus and Tools for the Acquisition of Italian L2
Authors: Panunzi, Alessandro/Cresti, Emanuela/Gregori, Lorenzo
Year: 2014
Type: Article
Publisher: Institute for Specialised Communication and Multilingualism
Place: Bolzano/Bozen
In: Abel, Andrea/Vettori, Chiara/Ralli, Natascia: Proceedings of the 16th EURALEX International Congress: The User in Focus, Bolzano/Bozen, Italy, 15-19 July 2014
Pages: 447-462
Languages studied: Italian
Keywords: data modelling
foreign/second language acquisition
information system
internet lexicography/online lexicography
collocations/phraseologisms/multi-word items
corpus-based lexicography
lexicographic tools/applications
Medium: Online
URI: http://euralex.org/category/publications/euralex-2014/
Last accessed: 22 October 2018
Abstract: This paper introduces the RIDIRE corpus, built with an open-source tool (RIDIRE-CPI) for creating purpose-designed web corpora through a targeted crawling strategy. The RIDIRE-CPI architecture combines existing open-source tools with specifically developed modules, comprising a robust crawler, a user-friendly web interface, several conversion and cleaning tools, an anti-duplicate filter, a language guesser, and a PoS tagger. The RIDIRE corpus is a balanced Italian web corpus (1.5 billion tokens) designed to support the study of Italian as a second language, while also being usable for lexicographic purposes. The targeted crawling was performed through content selection, metadata assignment, and validation procedures. These features allowed the construction of a large corpus with a specific design, covering a variety of language usage domains (News, Business, Administration and Legislation, Literature, Fiction, Design, Cookery, Sport, Tourism, Religion, Fine Arts, Cinema, Music). The RIDIRE query system allows research to be carried out on the whole corpus or on its sub-corpora. Specifically, the available queries include all the functions commonly used in corpus-based lexicography: frequency lists, concordances and patterns, collocations, Sketches, and Sketch Differences.