WikiDB: Building Interoperable Wiki-Based Knowledge Resources for Semantic Databases
Alexander Mehler / Alexandra Ernst / Rüdiger Gleim / Ulli Waltinger, Universität Bielefeld
Date: 30 September 2008, 14:40 - 15:20
Venue: Berlin-Brandenburgische Akademie der Wissenschaften, Room 1, Jägerstr. 22/23, D-10117 Berlin
Recently, the need for interoperable releases of Wikipedia that make its content accessible to machine
learning and corpus querying has been pointed out. This research is in line with
efforts to utilize Wikipedia, wiktionaries, wikimanuals and other special wikis as large resources of linguistic
and encyclopedic knowledge in NLP. The present article follows this approach from the perspective
of cognitive interaction technologies. The aim is to enable artificial agents to explore crowdsourced knowledge resources generated by large communities
of web users. Theoretically speaking, this research tackles the grounding problem of cognitive
science by interfacing artificial agents with social ontologies. That is, object, linguistic
and metalinguistic knowledge is exploited in a way that enables virtual agents to identify, label, track
and continue the topic of a dialogue in which they participate as the interlocutor of a human user.
In this way, virtual agents become beneficiaries of crowdsourcing, and their human users in turn benefit
from the agents' increased communicative competence.
To meet this goal, wiki-based knowledge resources have to be preprocessed on three interrelated
levels: (i) on the syntactic level of their elementary building blocks (pages and their
links), (ii) on the semantic level of the content relations between these building blocks, and (iii) on the pragmatic
level of (co-)authorship relations. That is, NLP and related approaches demand highly reliable
knowledge resources, while the preprocessing that is a precondition of this reliability should require little effort from their users.
Thus, fine-grained syntactic, semantic and pragmatic annotations are needed that make relevant
information explicit while filtering out irrelevant information. The present article describes an approach to this
threefold task of preprocessing, annotating and retrieving data from wiki-based knowledge resources.
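To make the three levels concrete, the following Python sketch shows one way a single in-memory structure could hold all of them at once: hyperlinks between pages (syntactic), typed content relations (semantic) and page editors (pragmatic). It is a hypothetical illustration, not the WikiDB representation format; the class `WikiGraph`, its fields and methods, and the relation label `capital_of` are assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WikiGraph:
    """Hypothetical three-level model of a wiki; not the authors' format."""

    # Syntactic level: for each page, the set of pages it links to.
    links: defaultdict = field(default_factory=lambda: defaultdict(set))
    # Semantic level: typed content relations as (subject, relation, object) triples.
    relations: set = field(default_factory=set)
    # Pragmatic level: for each page, the set of users who edited it.
    editors: defaultdict = field(default_factory=lambda: defaultdict(set))

    def add_link(self, source: str, target: str) -> None:
        self.links[source].add(target)

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        self.relations.add((subject, relation, obj))

    def add_edit(self, user: str, page: str) -> None:
        self.editors[page].add(user)

    def describe(self, page: str) -> dict:
        """Collect what the three levels say about one page.

        Uses .get() so that looking up an unknown page does not
        create empty entries in the defaultdicts.
        """
        return {
            "links_to": sorted(self.links.get(page, set())),
            "relations": sorted(r for r in self.relations if r[0] == page),
            "editors": sorted(self.editors.get(page, set())),
        }


if __name__ == "__main__":
    g = WikiGraph()
    g.add_link("Berlin", "Germany")
    g.add_relation("Berlin", "capital_of", "Germany")
    g.add_edit("alice", "Berlin")
    print(g.describe("Berlin"))
```

Keeping the three levels side by side in one object is what lets a query such as `describe` combine link, content and authorship information about the same page.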
It addresses the following subtasks:
- Firstly, the article describes a unified representation format for modeling structure formation on the three semiotic levels.
- Secondly, it provides algorithms for automating the related syntactic, semantic and pragmatic annotations.
- Finally, the article describes a database in conjunction with an application programming interface that allows these annotations to be maintained, explored and further processed.
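As an illustration of what automating a pragmatic annotation might involve, the sketch below derives weighted co-authorship edges from a revision history. It assumes, hypothetically, that revisions are available as `(page, user)` pairs and that two users count as co-authors when both have edited the same page; the function name `coauthorship_edges` is illustrative and not part of the WikiDB API.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_edges(revisions):
    """Count, for each pair of users, the number of pages both have edited.

    revisions: iterable of (page, user) pairs, e.g. parsed from a
    revision-history dump (hypothetical input format).
    Returns a dict mapping (user_a, user_b), with user_a < user_b,
    to the number of shared pages.
    """
    # Collect the distinct editors of each page; repeated edits by the
    # same user on the same page are counted once.
    editors = defaultdict(set)
    for page, user in revisions:
        editors[page].add(user)

    # Every pair of editors of a page becomes (or strengthens) an edge.
    edges = defaultdict(int)
    for users in editors.values():
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    revisions = [
        ("Berlin", "alice"), ("Berlin", "bob"), ("Berlin", "alice"),
        ("Potsdam", "bob"), ("Potsdam", "alice"), ("Potsdam", "carol"),
    ]
    print(coauthorship_edges(revisions))
```

Counting shared pages is only one possible weighting; the same loop could, for instance, weight pairs by their numbers of edits instead.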