WikiDB: Building Interoperable Wiki-Based Knowledge Resources for Semantic Databases

Alexander Mehler / Alexandra Ernst / Rüdiger Gleim / Ulli Waltinger, Universität Bielefeld

Date: 30 September 2008, 14:40 - 15:20

Venue: Berlin-Brandenburgische Akademie der Wissenschaften, Room 1, Jägerstr. 22/23, D-10117 Berlin

Recently, the need has been stated for interoperable releases of Wikipedia that make its content accessible to machine learning and corpus querying. This research is in line with efforts to use Wikipedia, wiktionaries, wikimanuals and other special wikis as large resources of linguistic and encyclopedic knowledge in NLP. The present article follows this approach from the perspective of cognitive interaction technologies. The aim is to enable artificial agents to explore crowdsourced knowledge resources generated by large communities of web users. Theoretically speaking, this research tackles the grounding problem of cognitive science by interfacing artificial agents with social ontologies. That is, object, linguistic and metalinguistic knowledge is exploited in a way that enables virtual agents to identify, label, track and continue the topic of a dialogue in which they participate as the interlocutor of a human user. In this way, virtual agents become beneficiaries of crowdsourcing, and their human users in turn gain from the agents' increased communicative competence.
In order to meet this goal, wiki-based knowledge resources have to be preprocessed on three interrelated levels: (i) on the syntactic level of their elementary building blocks (pages and their links), (ii) on the semantic level of the content relations among these building blocks and (iii) on the pragmatic level of (co-)authorship relations. NLP and related approaches demand highly reliable knowledge resources, and a low preprocessing effort is a precondition of this reliability. Thus, fine-grained syntactic, semantic and pragmatic annotations are required that make relevant information explicit and filter out irrelevant information. The present article describes an approach to this threefold task of preprocessing, annotating and retrieving data from wiki-based knowledge resources. It addresses the following subtasks:

  1. The article describes a unified representation format for modeling structure formation on the three semiotic levels.
  2. It provides algorithms for automating the related syntactic, semantic and pragmatic annotations.
  3. It describes a database, together with an application programming interface, that allows these annotations to be maintained, explored and further processed.
As a result of solving these interrelated tasks, the present article provides a model of a wiki-based semantic database - henceforth called WikiDB - which makes crowdsourced knowledge resources accessible to machine learning and related approaches.
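
To make the three annotation levels more concrete, the following minimal Python sketch keeps page links (syntactic), topic labels (semantic) and authorship (pragmatic) in one in-memory structure and offers two simple queries. It is only an illustration under stated assumptions: the class `WikiAnnotations`, its methods, the helper `build_example` and the sample data are hypothetical and are not the WikiDB representation format or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WikiAnnotations:
    """Hypothetical container for the three annotation levels (not WikiDB's format)."""
    # Syntactic level: page -> set of pages it links to
    links: dict = field(default_factory=lambda: defaultdict(set))
    # Semantic level: page -> set of topic labels
    topics: dict = field(default_factory=lambda: defaultdict(set))
    # Pragmatic level: page -> set of authors who edited it
    authors: dict = field(default_factory=lambda: defaultdict(set))

    def add_link(self, source, target):
        self.links[source].add(target)

    def add_topic(self, page, label):
        self.topics[page].add(label)

    def add_author(self, page, author):
        self.authors[page].add(author)

    def pages_on_topic(self, label):
        """Return all pages annotated with the given topic label, sorted by title."""
        return sorted(p for p, labels in self.topics.items() if label in labels)

    def coauthors(self, author):
        """Return all other authors who share at least one page with `author`."""
        result = set()
        for page_authors in self.authors.values():
            if author in page_authors:
                result |= page_authors
        result.discard(author)
        return result


def build_example():
    """Build a tiny, invented data set for illustration only."""
    db = WikiAnnotations()
    db.add_link("Berlin", "Germany")
    db.add_topic("Berlin", "Geography")
    db.add_topic("Germany", "Geography")
    db.add_author("Berlin", "alice")
    db.add_author("Berlin", "bob")
    db.add_author("Germany", "carol")
    return db
```

For example, `build_example().coauthors("alice")` collects the authors who edited a page together with the invented user `alice`. This is the kind of (co-)authorship query the pragmatic level is meant to support.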