Faculdade de Letras da Universidade do Porto - OCS, 15th INTERNATIONAL ISKO CONFERENCE

PROPOSAL FOR THE INTEGRATION OF THE SEMANTIC STRUCTURE OF WIKIPEDIA CATEGORIES INTO WIKIDATA USING SKOS
Juan Antonio Pastor Sánchez, Tomás Saorín Pérez

Last modified: 2017-12-18

Abstract


Objectives

DBpedia is the most prominent dataset in the Linked Open Data ecosystem. Wikidata is a Wikimedia movement initiative to represent knowledge as structured data that is defined and maintained collaboratively. Both are cross-domain open knowledge graphs.

The two projects may seem similar in scope and use, but their approaches differ. DBpedia aims to align the contents of Wikipedia with an ontology, formalizing the knowledge in Wikipedia using widespread semantic technologies. Wikidata is a platform for the collaborative creation of data that mainly, though not exclusively, corresponds to Wikipedia content. It applies its own data model and acts as a “Data Commons” for all the Wikimedia projects.

There are also significant differences in the way the two projects handle Wikipedia categories. DBpedia represents those categories and their hierarchical relationships using SKOS, a well-established de facto standard for formalizing concept schemes. Currently, Wikidata represents Wikipedia categories only as entities and does not include the hierarchical relationships between them.
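As a minimal sketch of what the SKOS representation on the DBpedia side makes possible, the following Python fragment stores a few category triples and walks the `skos:broader` hierarchy. The category names are illustrative and were chosen for the example, and the helper functions `broader_index` and `ancestors` are hypothetical; only the `dbc:` prefix and the SKOS terms come from the vocabularies themselves.

```python
from collections import defaultdict

# Illustrative triples in prefixed form (dbc: is DBpedia's category prefix).
# The specific categories are assumptions made for this example.
TRIPLES = [
    ("dbc:Optics", "rdf:type", "skos:Concept"),
    ("dbc:Physics", "rdf:type", "skos:Concept"),
    ("dbc:Natural_sciences", "rdf:type", "skos:Concept"),
    ("dbc:Optics", "skos:broader", "dbc:Physics"),
    ("dbc:Physics", "skos:broader", "dbc:Natural_sciences"),
]


def broader_index(triples):
    """Map each category to the set of its direct broader categories."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "skos:broader":
            index[subject].add(obj)
    return index


def ancestors(category, index):
    """Collect all broader categories reachable from `category`.

    The `seen` set guards against cycles, which can occur in
    collaboratively maintained category graphs.
    """
    seen = set()
    stack = [category]
    while stack:
        for parent in index.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


INDEX = broader_index(TRIPLES)
OPTICS_ANCESTORS = ancestors("dbc:Optics", INDEX)
```

This kind of traversal is not possible when categories are stored only as isolated entities, as is currently the case in Wikidata.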

Wikidata is intended to change the way data is used by Wikimedia editors, and it will probably influence how DBpedia data is collected. Improving the semantics of Wikipedia categories in Wikidata will therefore have a deep impact on Knowledge Organization. In this work we propose a methodology for integrating the semantic structure of Wikipedia categories into the Wikidata data model using SKOS.

Methodology

First, we reviewed the literature as well as the technical guides, drafts and documents describing the data models and data curation models of both projects.

In addition, we conducted a comparative analysis of how Wikipedia categories are treated in DBpedia and Wikidata, covering not only the terms themselves but also their relationships and correspondences. Equivalence patterns between the objects or entities of the three sources (Wikipedia, DBpedia and Wikidata) are identified. Other content elements, such as articles, participation records and external links, are also taken into consideration.

Then, keeping in mind the self-describing nature of the Wikidata infrastructure, we suggest a set of entities and properties that allow SKOS to be used to represent semantic relationships.

Finally, we designed and tested a methodology for an automatic process that reuses the representation of semantic relationships available in DBpedia.

Main results

A holistic view of the interconnections between Wikipedia, DBpedia and Wikidata is provided, on which further research may build. The technical issues outlined should also be helpful to other researchers interested in understanding the dynamics and interconnections of these three projects.

Our proposal details the SKOS properties and classes that should be included as Wikidata entities in order to represent the semantic relationships between categories, together with their mappings and cross-references to other SKOS datasets.

Technical advice is given on how to obtain and adapt DBpedia category RDF statements so that they can be incorporated into Wikidata. This methodology, which results in a kind of alignment between DBpedia and Wikidata categories, is built upon matching their different representations. A set of tasks, procedures and tools has been developed for managing not only concepts but also the SKOS labels and semantic relationships existing in DBpedia.
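The alignment step described above might be sketched as follows. This is not the authors' tooling, only an illustration under stated assumptions: the input is a handful of N-Triples lines using `skos:broader`, the lookup table `URI_TO_ITEM` from DBpedia category URIs to Wikidata items is assumed to exist already, and the identifiers `Q_OPTICS_CATEGORY`, `Q_PHYSICS_CATEGORY` and `P_BROADER` are placeholders rather than real Wikidata identifiers.

```python
import re

SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
DBC = "http://dbpedia.org/resource/Category:"

# Matches a simple N-Triples line whose subject, predicate and object are IRIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")


def parse_broader(lines):
    """Yield (narrower_uri, broader_uri) pairs from N-Triples lines."""
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == SKOS_BROADER:
            yield match.group(1), match.group(3)


def align(pairs, uri_to_item, broader_property):
    """Translate DBpedia category pairs into candidate Wikidata statements.

    Pairs with a category missing from the lookup table are returned
    separately so they can be reviewed instead of silently dropped.
    """
    statements, unmatched = [], []
    for narrower, broader in pairs:
        subject = uri_to_item.get(narrower)
        target = uri_to_item.get(broader)
        if subject and target:
            statements.append((subject, broader_property, target))
        else:
            unmatched.append((narrower, broader))
    return statements, unmatched


# Illustrative input; the categories are assumptions made for this example.
SAMPLE = [
    f"<{DBC}Optics> <{SKOS_BROADER}> <{DBC}Physics> .",
    f"<{DBC}Physics> <{SKOS_BROADER}> <{DBC}Natural_sciences> .",
]

# Hypothetical lookup table with placeholder item identifiers.
URI_TO_ITEM = {
    DBC + "Optics": "Q_OPTICS_CATEGORY",
    DBC + "Physics": "Q_PHYSICS_CATEGORY",
}

statements, unmatched = align(parse_broader(SAMPLE), URI_TO_ITEM, "P_BROADER")
```

Keeping the unmatched pairs separate reflects the fact that the two projects represent categories differently, so some DBpedia categories may have no counterpart until the matching is refined.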

The methodology is tested by building a dataset that can be easily integrated into Wikidata, combining online tools with a lightweight framework for processing DBpedia RDF statements.

Conclusions

Adding the semantic relationships of Wikipedia categories to Wikidata is viable, and DBpedia is a valuable tool for doing so. Including semantic relationships in Wikidata improves its quality as a Knowledge Organization resource.

The Wikidata platform offers valuable opportunities for consuming and reusing freshly available structured data, enabling a wide range of applications.

Wikidata makes possible an online platform for crowdsourced knowledge management. Despite its high level of formalization and structure, it could be enriched by adding the semantics of other value vocabularies.

Wikidata categories, whose source is Wikipedia, may be used as an element for aligning thesauri, subject headings, taxonomies, etc. For this reason, applying SKOS to category representation could be the first step for Wikidata to become not only a factual database but also a knowledge organization platform.

Keywords

DBpedia, Semantic Web, SKOS, Wikidata, Wikipedia