Last modified: 2018-06-20
Abstract
These days, knowledge representation and organization are facing turbulent times for various reasons. In the library field, for example, subject indexing is primarily seen as a cost factor and has therefore been pushed into a defensive position (Wiesenmüller, 2017). On the one hand, this is reinforced by the fact that the number of publications grows constantly while the number of homogeneously indexed publications continuously decreases. On the other hand, the defensive position is further strengthened by the fact that library holdings and databases increasingly exist not as encapsulated silos but only as one component in a complex interplay of numerous databases and information sources. This is one of the core reasons why the interoperability of content-descriptive metadata has become more and more important.
Various strategies are being pursued to lead knowledge representation and organization out of this defensive position. One strategy is to increase the value of subject information based on content-specific authority data by mapping different authority files onto each other, i.e. by establishing relationships between the concepts of one vocabulary and those of another. This legitimate and desirable approach has found its most prominent expression in the latest international thesaurus standard, ISO 25964-2. The standard covers equivalence mappings as well as hierarchical and associative mappings. It also deals with various mapping candidates such as classification schemes, taxonomies, subject heading schemes, and name authority lists. Not least because of internationalization efforts, vocabulary reuse, and the demand for integrated search, the interoperability of knowledge organization systems has indeed become very important. Numerous vocabulary mapping projects and initiatives have been launched, and some of them have been running for years.
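The three kinds of mapping relationships named above can be modelled with a small data structure. The following Python sketch simplifies the picture by assuming that a mapping links exactly one source concept to one target concept. It is not the data model defined in ISO 25964-2. The names `MappingType`, `Mapping`, and `invert` are hypothetical, as are the vocabulary and concept identifiers. The sketch illustrates one practical consequence of the relationship types: a hierarchical mapping reverses direction when read from the other vocabulary, while equivalence and associative mappings are symmetric.

```python
from dataclasses import dataclass
from enum import Enum


class MappingType(Enum):
    """Simplified mapping kinds: equivalence, hierarchical (two directions), associative."""
    EQUIVALENCE = "equivalence"
    BROADER = "broader"        # hierarchical: target concept is broader than source
    NARROWER = "narrower"      # hierarchical: target concept is narrower than source
    ASSOCIATIVE = "associative"


@dataclass(frozen=True)
class Mapping:
    """One directed link from a concept in one vocabulary to a concept in another."""
    source_vocab: str
    source_concept: str
    target_vocab: str
    target_concept: str
    kind: MappingType


# Hierarchical mappings flip direction when read from the other vocabulary;
# equivalence and associative mappings stay the same.
_INVERSE = {
    MappingType.EQUIVALENCE: MappingType.EQUIVALENCE,
    MappingType.BROADER: MappingType.NARROWER,
    MappingType.NARROWER: MappingType.BROADER,
    MappingType.ASSOCIATIVE: MappingType.ASSOCIATIVE,
}


def invert(m: Mapping) -> Mapping:
    """Return the same relationship as seen from the target vocabulary."""
    return Mapping(m.target_vocab, m.target_concept,
                   m.source_vocab, m.source_concept, _INVERSE[m.kind])


if __name__ == "__main__":
    # Hypothetical example: concept a:42 in VocabA maps to a broader concept b:7 in VocabB.
    m = Mapping("VocabA", "a:42", "VocabB", "b:7", MappingType.BROADER)
    print(invert(m))
```

Storing only one direction and deriving the other keeps the two vocabularies' views consistent. A real mapping project would also need to handle one-to-many and inexact equivalences, which this sketch leaves out.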
Another strategy pursues the goal of extending the scope of application of authority data, for example by bringing authority data into the Semantic Web. In Wikipedia, for instance, there are almost 400,000 links to the German Integrated Authority File (GND). One concrete example of where these links are used is the person pages of the German Digital Library. These pages are based on the Entity Facts data service provided by the German National Library. The service delivers information extracted from the GND and provides external links to other information systems in which GND IDs are also integrated.
One information environment in which both strategies converge, and which has recently become very prominent, is the knowledge base Wikidata. It was launched by the Wikimedia Foundation in 2012 and can be extended and edited by anyone. It serves as a shared knowledge base providing structured data for the different language versions of Wikipedia. Authority data that are also used in bibliographic databases can be assigned to various types of entities covered by Wikipedia articles, for example persons and corporate bodies. In this way, Wikidata serves as a linking hub. It establishes equivalence relations between a Wikidata item, identified by an abstract identifier, and the corresponding item in an external database, identified by that database's own external identifier.
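The linking-hub idea can be made concrete with a minimal Python sketch under stated assumptions. The hub is modelled as a plain dictionary that maps hypothetical item identifiers to external identifiers, grouped by vocabulary. The names `HUB` and `crosswalk` are illustrative and are not part of any Wikidata API. All identifiers are made-up placeholders rather than real Wikidata, GND, or STW IDs. The sketch shows how equivalences recorded at the hub can be chained into a direct mapping between two external vocabularies.

```python
from itertools import product

# Hypothetical hub records: each hub item (an abstract identifier, as in
# Wikidata) lists the external identifiers attached to it, per vocabulary.
# All identifiers are made-up placeholders, not real Wikidata/GND/STW IDs.
HUB = {
    "item-1": {"GND": ["gnd-A"], "STW": ["stw-X"]},
    "item-2": {"GND": ["gnd-B"]},
    "item-3": {"GND": ["gnd-C"], "STW": ["stw-Y"]},
}


def crosswalk(hub, vocab_a, vocab_b):
    """Derive (vocab_a ID, vocab_b ID) pairs from hub items carrying both.

    Each pair rests on two equivalences (vocab_a ~ hub item ~ vocab_b),
    so the result is only as reliable as both of those links.
    """
    pairs = set()
    for ext_ids in hub.values():
        # An item may carry several IDs per vocabulary; pair every combination.
        for a, b in product(ext_ids.get(vocab_a, []), ext_ids.get(vocab_b, [])):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    print(sorted(crosswalk(HUB, "GND", "STW")))
```

With the sample data, only `item-1` and `item-3` carry identifiers from both vocabularies, so two pairs are derived. The derivation assumes that equivalence carries over transitively through the hub item. Whether that assumption holds for subject concepts as well as for persons and corporate bodies is the question this paper raises.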
As numerous mapping initiatives show, Wikidata has enormous potential as a linking hub for so-called “individual material entities” (see Neubert, 2017: 9), i.e. records representing disambiguated persons as well as unambiguous records of corporate bodies. But what about subject entities in Wikidata whose labels could also be found in subject heading authority files or thesauri?
The paper addresses this question from a methodological and an empirical perspective. It starts with a structural comparison of the key characteristics and guiding principles of construction and design underlying a thesaurus on the one hand and a knowledge base like Wikidata on the other. It then examines the question empirically. For this it uses the STW Thesaurus for Economics, a discipline-specific knowledge organization system with a rather high degree of terminological dynamics. It concludes with a summary of the strengths and weaknesses of mapping discipline-specific subject authority data to Wikidata and discusses possible practical implications.
References:
International Organization for Standardization. 2013. ISO 25964-2: Information and Documentation – Thesauri and Interoperability with other Vocabularies – Part 2: Interoperability with other Vocabularies. Geneva: International Organization for Standardization.
Neubert, Joachim. 2017. "Wikidata as a linking hub for knowledge organization systems? Integrating an authority mapping into Wikidata and learning lessons for KOS mappings". 17th European Networked Knowledge Organization Systems (NKOS) Workshop, 21 September 2017, as part of the TPDL 2017 Conference, Thessaloniki, Greece. http://ceur-ws.org/Vol-1937/paper2.pdf
Wiesenmüller, Heidrun. 2017. "Erschließung in schwierigen Zeiten. Ansichten und Einsichten" [Subject indexing in difficult times: Views and insights]. Keynote, Workshop "Computerunterstützte Inhaltserschließung" [Computer-assisted subject indexing], 8 and 9 May 2017, Stuttgart. https://www.ub.uni-stuttgart.de/wirueberuns/downloads/veranstaltungen/da-workshop/Wiesenmueller_Keynote.pdf