Title:	Large web corpora for Indian languages
Authors:	Kilgarriff, Adam/Duvuru, Girish
Year:	2011
Type:	Article
Publisher:	Springer
Place of publication:	Heidelberg/Berlin
In:	Singh, Chandan/Singh Lehal, Gurpreet/Sengupta, Jyotsna/Veer Sharma, Dharam/Goyal, Vishal (eds.): Information Systems for Indian Languages: Proceedings of the International Conference, ICISIL 2011, Patiala, India, 9-11 March 2011
Languages studied:	English - Indian languages
Keywords:	frequency; internet lexicography/online lexicography; collocations/phraseologisms/multi-word items; corpus-based lexicography
URI:	https://www.sketchengine.eu/wp-content/uploads/Large_Web_Corpora_2011.pdf
Last accessed:	10 September 2018
Abstract:	For many languages there are no large, general-language corpora available. Until the web, all but the richest institutions could do little but shake their heads in dismay, as corpus-building was long, slow and expensive. But with the advent of the web it can be highly automated and thereby fast and inexpensive. In this demo we describe the 'corpus factory' method we use for collecting large web corpora for Indian and other languages. We have recently collected corpora for Hindi, Telugu, Kannada, Urdu, Gujarati, Tamil, Malayalam and Bengali. We also describe the Sketch Engine, a corpus tool that offers many language analysis functions, and CQL, the advanced query language used by this system.