XML Schema driven Database Management of Speech Corpus Metadata

Joachim Gasch, Institut für deutsche Sprache (IDS) Mannheim

Date: 30.09.2008, 11:50 - 12:30

Venue: Berlin-Brandenburgische Akademie der Wissenschaften, Room 1, Jägerstr. 22/23, D-10117 Berlin

Electronic speech corpora must bring together several heterogeneous data formats, such as audio and video data; corpus, event, and speaker documentation; and time-aligned media annotations. The metadata management system has to drive data capture, native XML database storage, dynamic publishing, and information retrieval. This article describes an XML Schema-based standardization approach in which the metadata (documentation and annotation information) of different speech corpora is centrally validated and natively stored in an object-relational XML database.
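The "validate centrally, then store" idea can be illustrated with a minimal sketch. It is not the system described in the talk: a hand-written check of required elements stands in for real XML Schema validation (Python's standard library has no XSD validator), an in-memory SQLite table stands in for the object-relational XML database, and the element names (speaker, event, id, language, date) as well as the functions validate and store_if_valid are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified "schema": required child elements per record type.
# A real deployment would validate against an XSD with a schema-aware parser.
REQUIRED_CHILDREN = {
    "speaker": ["id", "language"],
    "event": ["id", "date"],
}


def validate(xml_text):
    """Return a list of problems; an empty list means the record passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag not in REQUIRED_CHILDREN:
        return [f"unknown record type: {root.tag}"]
    return [
        f"missing element: {name}"
        for name in REQUIRED_CHILDREN[root.tag]
        if root.find(name) is None
    ]


def store_if_valid(conn, xml_text):
    """Store the record only when validation reports no problems."""
    problems = validate(xml_text)
    if not problems:
        conn.execute("INSERT INTO metadata (doc) VALUES (?)", (xml_text,))
    return problems


# Stand-in for the central XML database: one table holding raw XML documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (doc TEXT NOT NULL)")

good = "<speaker><id>S1</id><language>de</language></speaker>"
bad = "<speaker><id>S2</id></speaker>"  # lacks the required <language>

good_problems = store_if_valid(conn, good)
bad_problems = store_if_valid(conn, bad)
stored = conn.execute("SELECT COUNT(*) FROM metadata").fetchone()[0]
```

Only the first record reaches the table; the second is rejected with a message naming the missing element. Placing the check in front of the insert is what keeps metadata from different corpora uniform once it is stored.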