Faculdade de Letras da Universidade do Porto - OCS, 15th INTERNATIONAL ISKO CONFERENCE

Marcia Lei Zeng, Sophy Shu-Jiun Chen

Last modified: 2018-02-20


The goal of this research is to define an effective method and framework for the derivative interpretation of online biographical sketches (bios) that will lead to innovative information access. A related objective is to reuse the digital archive and digital library resources that have been established all over the world during the last decade. Specifically, the research aims to present bios as structured data based on the resources available as open data on the Web. When dealing with decentralized data sources, a unified framework of interoperable models and schemas is essential. Meanwhile, machine-understandable languages make the structured bio data accessible to human users while also keeping it processable by machines. The impact of these outcomes will go beyond publishing bios on the Web, because bios embedded with structured data will enable innovative information access across the boundaries of language, geographical location, format, and discipline.

The research project was initiated from a desire to expose information about native Taiwanese artists effectively and to reuse the resources of digital archives and digital libraries. Although new domain-specific portals, digital collections, and online exhibitions have been developed from digital archives, access to them remains traditional and limited because of individually applied models and schemas, as well as isolated tasks and projects. How can we bring new levels of usefulness from these digital resources to the international community across the boundaries of language, geography, format, and discipline? With this question in mind, the project chose to use structured data to share, connect, and reuse digital resources. It also aimed to enable international users to discover and rediscover individuals and their significant roles in culture and history.

The authors examined several important ontologies for “person” that were developed during the last two decades with different focuses, including Friend of a Friend (FOAF),[1] DBpedia Ontology,[2] BIO,[3] Biography Light,[4],[5] Union List of Artist Names (ULAN),[6],[7] CIDOC Conceptual Reference Model (CRM),[8] and Schema.org.[9] These models may be agent-centered or event-centered, and may be restricted to persons or integrated into a high-level schema that requires interactions with other ontological classes. Related models also drew the authors' attention, as they may reflect the need to present and connect creative works (in all possible media, formats, productivity layers, and reuse cases) that are by, about, or focused on a particular person. In addition, the authors explored many openly available online bio collections and websites. After a comprehensive review and comparative study, the authors created an application profile based on Schema.org to support a framework that accommodates the necessary components within an interoperable model that anyone can use. To support interactive encoding and generate the desired source code on the fly, the project developed interactive form templates that serve as the communication channel between a human editor and a computer. Two encoding formats, RDFa and Microdata, are used to create the structured data.[10]
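As an illustration only, the Python sketch below shows the kind of output such a form template might generate. It assumes a handful of Schema.org Person properties (name, alternateName, jobTitle, description, birthDate, deathDate, birthPlace, sameAs) and hypothetical form field names identical to those properties; it is not the project's actual application profile or tool. The same filled-in values are rendered either as Microdata (itemscope, itemtype, itemprop) or as RDFa Lite (vocab, typeof, property).

```python
from html import escape

# Illustrative subset of Schema.org Person properties (an assumption,
# not the project's actual application profile).
TEXT_PROPS = ["name", "alternateName", "jobTitle", "description"]
DATE_PROPS = ["birthDate", "deathDate"]

# Attribute conventions for the two encoding formats.
SYNTAX = {
    "microdata": {
        "root": 'itemscope itemtype="https://schema.org/Person"',
        "place": 'itemscope itemtype="https://schema.org/Place"',
        "prop": "itemprop",
    },
    "rdfa": {
        "root": 'vocab="https://schema.org/" typeof="Person"',
        "place": 'typeof="Place"',
        "prop": "property",
    },
}


def render_person(form: dict, syntax: str = "microdata") -> str:
    """Render filled-in form values as an annotated HTML snippet."""
    s = SYNTAX[syntax]
    p = s["prop"]
    out = [f'<div {s["root"]}>']
    for prop in TEXT_PROPS:
        if form.get(prop):
            out.append(f'  <span {p}="{prop}">{escape(form[prop])}</span>')
    for prop in DATE_PROPS:
        if form.get(prop):
            # ISO 8601 date; <time datetime> carries the machine-readable value.
            d = escape(form[prop])
            out.append(f'  <time {p}="{prop}" datetime="{d}">{d}</time>')
    if form.get("birthPlace"):
        # birthPlace expects a schema:Place, so nest a typed item with a name.
        place = escape(form["birthPlace"])
        out.append(f'  <span {p}="birthPlace" {s["place"]}>'
                   f'<span {p}="name">{place}</span></span>')
    for url in form.get("sameAs", []):
        # Links to the same entity in external datasets.
        out.append(f'  <link {p}="sameAs" href="{escape(url)}">')
    out.append("</div>")
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical values for a fictional person, for illustration only.
    form = {
        "name": "Example Artist",
        "jobTitle": "Painter",
        "birthDate": "1900-01-01",
        "birthPlace": "Example City",
        "sameAs": ["https://example.org/authority/example-artist"],
    }
    print(render_person(form, "microdata"))
    print(render_person(form, "rdfa"))
```

Keeping one set of form values and switching only the attribute conventions mirrors the idea that the editor supplies content once while the encoding is generated on the fly; nesting a Place for birthPlace follows the expected type Schema.org declares for that property.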

In the first stage of the pilot project, the bios of twelve Taiwanese artists, available in an online exhibition[11] presented by the Academia Sinica Center for Digital Cultures (ASCDC), were used as the basis for generating structured data, testing the templates, and establishing useful structured datasets. The results were first presented as webpages. The English bios and structured data values were also submitted manually to Wikipedia and Wikidata, a step that involved a mapping process. After one month, this had led to a dramatic increase in Google Knowledge Graph results for these Taiwanese artists, who had previously been little known to Westerners.
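The mapping used for Wikidata is not specified here. As an illustration only, the sketch below assumes a small crosswalk from Schema.org Person properties to Wikidata property IDs (P569 date of birth, P570 date of death, P19 place of birth, P20 place of death, P106 occupation) and routes name-like fields to Wikidata's label, aliases, and description terms; any field without a mapping is set aside for a human editor.

```python
# Assumed crosswalk for illustration; not the project's documented mapping.
SCHEMA_TO_WIKIDATA = {
    "birthDate": "P569",      # date of birth
    "deathDate": "P570",      # date of death
    "birthPlace": "P19",      # place of birth
    "deathPlace": "P20",      # place of death
    "hasOccupation": "P106",  # occupation
}

# Schema.org fields that correspond to Wikidata item terms, not statements.
SCHEMA_TO_TERM = {
    "name": "label",
    "alternateName": "aliases",
    "description": "description",
}


def crosswalk(record: dict) -> tuple:
    """Split a Schema.org-style record into Wikidata statements, terms,
    and fields that have no mapping and need manual review."""
    statements, terms, unmapped = {}, {}, []
    for prop, value in record.items():
        if prop in SCHEMA_TO_WIKIDATA:
            statements[SCHEMA_TO_WIKIDATA[prop]] = value
        elif prop in SCHEMA_TO_TERM:
            terms[SCHEMA_TO_TERM[prop]] = value
        else:
            unmapped.append(prop)
    return statements, terms, unmapped


if __name__ == "__main__":
    record = {  # hypothetical record
        "name": "Example Artist",
        "birthDate": "1900-01-01",
        "birthPlace": "Example City",
        "award": "Example Prize",
    }
    statements, terms, unmapped = crosswalk(record)
    print(statements)  # {'P569': '1900-01-01', 'P19': 'Example City'}
    print(terms)       # {'label': 'Example Artist'}
    print(unmapped)    # ['award']
```

In Wikidata, place and occupation values are items rather than strings, so a value such as a birthplace would still need to be matched to an existing Wikidata item before submission; this is one reason such a mapping process is likely to involve manual work.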

The second stage of the pilot project extended to other humanities domains and included well-known Taiwanese agents from those domains (such as writers, poets, dancers, choreographers, and musicians). The sources for the bios were gathered from openly available websites related to the agents, special archival collections, and announcements of performances and exhibitions from other countries. Challenges and unanswered questions were also identified in dealing with historical figures, non-English text, and the coding of inverse relationships. The project's framework and the tool embodying the schema were revised and further tested by invited users. More importantly, the efforts have resulted in visualized knowledge graphs as well as new knowledge bases, generated by sophisticated queries against the new bio datasets produced with the templates the authors developed. The datasets have become knowledge bases, and the potential research uses of such datasets appear almost unlimited. When the entities in these datasets (such as a person, a place, an institution, or a work) are mapped to the corresponding data values in DBpedia, Wikidata, GeoNames, and other existing datasets, these knowledge bases can be greatly enriched and also shared with, or reused by, anyone in the world.
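How the datasets were queried is not described here. A real deployment would more likely load the RDFa or Microdata into an RDF store and query it with SPARQL; the standard-library sketch below only illustrates the two ideas involved, using made-up identifiers (the ex: prefix and the example.org URL are hypothetical): matching triple patterns to find connections across bios (here, people who share a birthplace) and following schema:sameAs links from a local entity to identifiers in external datasets such as GeoNames.

```python
from typing import Optional

# Hypothetical bio dataset as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:person1", "schema:name", "Artist A"),
    ("ex:person1", "schema:birthPlace", "ex:place1"),
    ("ex:person2", "schema:name", "Writer B"),
    ("ex:person2", "schema:birthPlace", "ex:place1"),
    ("ex:place1", "schema:name", "City C"),
    ("ex:place1", "schema:sameAs", "https://example.org/geonames/place1"),
]


def match(triples, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def shared_birthplaces(triples):
    """Group people by birthplace, keeping places shared by two or more."""
    groups = {}
    for person, _, place in match(triples, p="schema:birthPlace"):
        groups.setdefault(place, []).append(person)
    return {place: people for place, people in groups.items()
            if len(people) > 1}


def external_ids(triples, entity: str):
    """Follow schema:sameAs links to identifiers in external datasets."""
    return [o for _, _, o in match(triples, s=entity, p="schema:sameAs")]


if __name__ == "__main__":
    print(shared_birthplaces(TRIPLES))  # {'ex:place1': ['ex:person1', 'ex:person2']}
    print(external_ids(TRIPLES, "ex:place1"))
```

Once a local place carries a sameAs link, anything an external dataset records about that identifier can, in principle, be joined to every bio that mentions the place, which is the kind of enrichment the paragraph above anticipates.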

In conclusion, the derivative interpretation of biographical sketches can effectively support innovative information access. The approach of using decentralized resources to generate datasets has special meaning for digital archives and digital libraries at the current stage, because digital resources can be further used without waiting for a large project or depending on a one-time effort that would be difficult to sustain after funding ends. There are no limits or boundaries on who can contribute to the accumulated efforts, while the outcomes can be maximized. The authors understand that wider participation will happen only if the tools (such as the online bio platforms and dataset builders) are easy to use and the roadmaps are clear to follow; an interoperable framework therefore offers great potential for wider participation and for innovative exploration of bio information. From these derivative datasets, search engines will be able to index and connect the structured data, reuse it in forming knowledge graphs, and provide useful and accurate references during search. In addition to search engines, any digital library, portal, or website interested in content related to individuals and cultural heritage can use structured data that has not previously been well exposed or connected.

[1] Brickley, Dan and Libby Miller. (2000-). FOAF Vocabulary Specification 0.99. Namespace Document 14 January 2014. http://xmlns.com/foaf/spec/ (accessed 2017-10-17).

[2] DBpedia Ontology. (2008-). http://wiki.dbpedia.org/services-resources/ontology (accessed 2017-10-17).

[3] Davis, Ian and David Galbraith. (2003-). BIO, A Vocabulary for Biographical Information. http://vocab.org/bio/0.1/ (accessed 2017-10-17).

[4] Ramos, Michele R. (2009). “Biography Light Ontology: An Open Vocabulary for Encoding Biographic Texts.” http://metadata.berkeley.edu/BiographyLightOntology.pdf (accessed 2017-10-17).

[5] “Bringing Lives to Light: Biography in Context. A National Leadership Project for Libraries funded by the Institute of Museum and Library Services. Grant No. LG-06-06-0037-06 Final Report for the Period October 1, 2006 through September 30, 2009.” (2009). University of California, Berkeley. http://metadata.berkeley.edu/Biography_Final_Report.pdf (accessed 2017-10-17).

[6] Cuno, J. (2015). Getty Union List of Artist Names (ULAN) Released as Linked Open Data. http://blogs.getty.edu/iris/getty-union-list-of-artist-names-ulan-linked-open-data/ (accessed 2017-10-17).

[7] Getty Vocabularies: Linked Open Data. Semantic Representation. (2014-). ULAN Specifics. http://vocab.getty.edu/doc/#ULAN_Specifics (accessed 2017-10-17).

[8] CIDOC. (2006-). Conceptual Reference Model (CRM). http://www.cidoc-crm.org/ (accessed 2017-10-17).

[9] Schema.org. (2012-). https://schema.org/ (accessed 2017-10-17).

[10] Structured Data for Bios. (2016-). http://www.metadataetc.org/bios-data-project/ (accessed 2017-10-16).

[11] Starting Out from 23.5°N: Chen Cheng-po. Academia Sinica Digital Center (ASDC), 2014. http://chenchengpo.asdc.sinica.edu.tw/main_en (accessed 2017-10-16).