Last modified: 2018-02-07
Abstract
Person identities in the linked open data environment are the result of a process of semantic representation that often requires a complex interplay of data association, reconciliation and interlinking. This paper explores and offers insights on various aspects of this process in the context of applied linked data research in the domain of cultural heritage, moving from the atomic level of individual identifiers to graph structures and dataset interlinking.
Starting at the most fundamental level, when native data is generated, linked data technologies provide an open convention for “naming” entities (essentially any type of “thing”) by means of a unique string of characters called a Uniform Resource Identifier (URI). This method enables humans and machines to read and process a discrete unit of information unambiguously. When it comes to person entities, the URI is the first identity marker of an individual. It wears in its syntax the mark of its underlying schema, ontology or conceptual model, with the range of all the possible statements that a namespace entails. Only semantic-agnostic (opaque) URIs carry no meaning embedded in the sequence of characters they are made of. As Gitelman and Jackson (2013) argue, data is never “raw”, as it is the result of a cultural process of generation, curation and interpretation. In the context of digital semantic representation, the typology of URIs, the choice of coining new ones versus reusing existing ones, and the options available for entity reconciliation all have significant implications for the way person entity data is conceptualized, processed, shared, interlinked, and reused.
Thanks to the openness and extensibility of the RDF data model, which is at the core of the linked data paradigm, people profiles can be semantically enriched through cross-referencing and interlinking using a combination of knowledge bases and name authorities. Semantic web open standards offer the technical platform for semantic stratification through the combination of predicates from multiple vocabularies and ontology extensions. Multiple descriptive layers coming from different data spaces can be associated with a person entity, thus expanding, virtually and almost effortlessly, the view of that individual across many contexts and angles. Key to this enrichment is the co-reference mechanism, whereby information asserted about the same entity can be reconciled using a relationship of equivalence, owl:sameAs being the most common.
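The co-reference mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the projects discussed here: triples are modeled as plain tuples, every example.org URI and literal value is a hypothetical placeholder, and the choice of the lexicographically smallest URI as the canonical identifier is arbitrary. Only the owl:sameAs, foaf:name and schema:birthDate predicate URIs are real.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
SCHEMA_BIRTH_DATE = "http://schema.org/birthDate"

# Hypothetical identifiers: one project URI linked to two external sources.
PROJECT_URI = "http://example.org/project/person-1"
AUTHORITY_URI = "http://example.org/authority/n0001"
KB_URI = "http://example.org/kb/Q0001"

# Each source contributes a different descriptive layer (placeholder values).
triples = [
    (AUTHORITY_URI, FOAF_NAME, "Example Person"),
    (KB_URI, SCHEMA_BIRTH_DATE, "1900-01-01"),
    (PROJECT_URI, OWL_SAME_AS, AUTHORITY_URI),
    (PROJECT_URI, OWL_SAME_AS, KB_URI),
]


def find(parent, uri):
    """Return the representative URI of the co-reference cluster of `uri`."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving
        uri = parent[uri]
    return uri


def merge_coreferent(triples):
    """Group statements under one canonical URI per owl:sameAs cluster."""
    parent = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(parent, s), find(parent, o)
            if root_s != root_o:
                # Keep the lexicographically smallest URI as cluster root.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(parent, s)].add((p, o))
    return dict(merged)


profiles = merge_coreferent(triples)
```

In this sketch the three identifiers collapse into a single profile carrying both the name and the birth date. A production workflow would more likely keep the owl:sameAs links in place, so that each source's URI remains a separate access point, rather than physically merging the statements.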
In general, the practice of identity management, where multiple URIs referring to the same entity are either resolved to eliminate ambiguity or co-referenced in order to provide multiple access points, has interesting implications for those concerned with the representation of gender and marginality (Pattuelli, Hwang and Miller, 2017). To this end, use cases help to illustrate the descriptive capabilities as well as the limitations of popular data sources, from biblio-centric library name authorities (e.g., VIAF and the Library of Congress’ NAF) to domain-specific vocabularies (Getty’s ULAN) to general knowledge bases (DBpedia and Wikidata). The use cases discussed here are drawn from two projects: Linked Jazz[1], an ongoing linked data project based at Pratt Institute focused on the domain of jazz music, and the Drawings of the Florentine Painters, a recently completed project based at the Harvard University Center for Italian Renaissance Studies and conducted in collaboration with the Semantic Lab at Pratt. The latter converted Bernard Berenson’s catalog of Florentine Renaissance drawings into an RDF dataset and developed a linked data-driven online catalog[2].
As linked data is based on a connectionist model, it is almost axiomatic to note that individuals, like other entities populating a networked environment, are shaped through the web of associative relationships that link one to another. In other words, a person is forged in the context of a community. Linked data graph views offer a wide lens with which to analyze the construction of person identities and the social structures they are part of. Multi-faceted and dynamic network graphs representing the community of jazz artists were developed in the context of the Linked Jazz Project.
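As a rough illustration of how a network view can be derived from person-to-person statements, the following Python sketch builds an undirected graph from triples and ranks individuals by their number of connections. It rests on stated assumptions: the `knowsOf` predicate URI and the `ex:` person identifiers are hypothetical placeholders, and degree counting merely stands in for whatever analysis a given project applies.

```python
from collections import defaultdict

# Hypothetical predicate URI, used for illustration only.
KNOWS_OF = "http://example.org/vocab/knowsOf"

# Placeholder person identifiers and relationships.
triples = [
    ("ex:personA", KNOWS_OF, "ex:personB"),
    ("ex:personA", KNOWS_OF, "ex:personC"),
    ("ex:personB", KNOWS_OF, "ex:personC"),
    ("ex:personD", KNOWS_OF, "ex:personA"),
]


def build_network(triples, predicate):
    """Return an undirected adjacency map for statements using `predicate`."""
    neighbors = defaultdict(set)
    for s, p, o in triples:
        if p == predicate and s != o:
            neighbors[s].add(o)
            neighbors[o].add(s)
    return neighbors


def degree_ranking(neighbors):
    """Sort persons by number of connections (descending), then by identifier."""
    pairs = ((person, len(adjacent)) for person, adjacent in neighbors.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


network = build_network(triples, KNOWS_OF)
ranking = degree_ranking(network)
```

Treating the relationship as undirected is a simplification; a directed reading (who mentions whom) would preserve more of the source statements.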
The full potential of linked data is achieved when data is interlinked. Bringing different datasets together can open up a vast field of connections between individuals to ignite discovery, while at the same time enabling new forms of analysis through data aggregation. In a more diversified discovery context, people and their historical and social contexts emerge and can be more fully understood through a composite of perspectives rather than through a single dominant narrative. Case studies of dataset integration via interlinking, recently developed by the author and her research team in the domain of music, will be described. An analysis of the integration process will highlight some of the methodological and technical challenges encountered while illustrating the anticipated benefits derived from the semantic enhancement of person-centered graphs and related knowledge bases.
As we analyze the construction of person identities as linked open data in a progression of steps that goes from uniform resource identifiers for persons to networked social structures and interlinked datasets, the work involved in data wrangling and consolidation will also be highlighted. This work represents a significant methodological dimension of linked data development, one that can be complex and highly labor-intensive and that lacks established best practices. Several examples from the linked data research projects of the Semantic Lab at Pratt will be used to illustrate the tenets of the paper.
REFERENCES
Gitelman, L. and Jackson, V. (2013). Introduction. In L. Gitelman (Ed.), ‘Raw data’ is an oxymoron (pp. 1-14). Cambridge, MA: The MIT Press.
Klic, L., Miller, M., Nelson, J., Pattuelli, M. C. and Provo, A. (2017). The drawings of the Florentine painters: From print catalog to Linked Open Data. The Code4Lib Journal, (38), October 2017.
Pattuelli, M. C., Hwang, K. and Miller, M. (2017). Accidental discovery, intentional inquiry: Leveraging linked data to uncover the women of jazz. DSH: Digital Scholarship in the Humanities, 32(4), 918-924.
[1] https://linkedjazz.org/
[2] http://florentinedrawings.itatti.harvard.edu