Title: Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History
Authors: Ferschke, Oliver/Zesch, Torsten/Gurevych, Iryna
Year: 2011
Type: Article
Publisher: Association for Computational Linguistics
Place: Portland, Oregon, USA
In: Kurohashi, Sadao (ed.): Proceedings of System Demonstrations: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), Portland, Oregon, USA, 21 June 2011
Pages: 97-102
Languages studied: English
Keywords: data modelling
internet lexicography/online lexicography
user contribution
URI: http://www.aclweb.org/anthology/P11-4017
Last accessed: 10 September 2018
Abstract: We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. By using a dedicated storage format, our toolkit massively decreases the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface to access the revision data. The language-independent design allows to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia's edit history.