Last modified: 2018-02-21
Abstract
Objectives
We describe a method to facilitate checking relationships for quality control in a large thesaurus. A large thesaurus includes 10K–50K concepts (descriptors), many represented by multiple terms, and even more relationships between concepts. FAO's AGROVOC, which we use as an example, has ~33K concepts and ~72K conceptual relationships. Checking each relationship manually would require substantial resources. We developed a method that helps select potentially problematic cases, which can then be checked manually.
Our method targets only specific conceptual relationships (properties), such as
respiration <includes> gas exchange
AGROVOC has ~23K occurrences of such specific relationships.
Methods
Our method applies the linguistic analysis of verb valence to the analysis of thesaurus relationships. The valence of a verb refers to the categorical characteristics that determine the thematic roles of the verb and the types of concepts that can fill these roles. In short, our method works as follows: for each relationship occurrence (triple), we map each concept to an entity type. We then examine the patterns of entity type pairings associated with each relationship type. If the entity type pairing in a relationship occurrence does not fit one of the main patterns for its relationship type, the occurrence might be problematic and is a candidate for manual checking. Figure 1 shows a simple example. We use AGROVOC as our test environment, specifically the data set (precise identification to be added).
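The steps outlined above can be illustrated with a minimal sketch. It is not the implementation used in this work. The entity types, the toy triples (apart from the respiration example given earlier), the function name flag_atypical, and the min_share threshold are all assumptions introduced for illustration. In particular, the abstract does not specify how the "main patterns" are determined; a simple relative-frequency cutoff stands in for that step here.

```python
from collections import Counter, defaultdict

# Hypothetical concept-to-entity-type mapping (toy data, not taken from AGROVOC).
ENTITY_TYPE = {
    "respiration": "process",
    "gas exchange": "process",
    "photosynthesis": "process",
    "carbon fixation": "process",
    "digestion": "process",
    "absorption": "process",
    "fermentation": "process",
    "glycolysis": "process",
    "lungs": "organ",
}

# Relationship occurrences as (subject, relationship type, object) triples.
# Only the first triple comes from the text; the rest are invented examples.
TRIPLES = [
    ("respiration", "includes", "gas exchange"),
    ("photosynthesis", "includes", "carbon fixation"),
    ("digestion", "includes", "absorption"),
    ("fermentation", "includes", "glycolysis"),
    ("respiration", "includes", "lungs"),
]


def flag_atypical(triples, entity_type, min_share=0.25):
    """Return triples whose entity type pairing is rare for their relationship type.

    A pairing counts as rare when its share of all occurrences of the
    relationship type is below min_share (an assumed, tunable cutoff).
    """
    # Count entity type pairings separately for each relationship type.
    pair_counts = defaultdict(Counter)
    for subj, rel, obj in triples:
        pair_counts[rel][(entity_type[subj], entity_type[obj])] += 1

    flagged = []
    for subj, rel, obj in triples:
        pair = (entity_type[subj], entity_type[obj])
        total = sum(pair_counts[rel].values())
        if pair_counts[rel][pair] / total < min_share:
            flagged.append((subj, rel, obj))
    return flagged


if __name__ == "__main__":
    for triple in flag_atypical(TRIPLES, ENTITY_TYPE):
        print("check manually:", triple)
```

In this toy data, the pairing (process, process) accounts for four of the five <includes> occurrences, so only the (process, organ) occurrence falls below the cutoff and is selected for manual checking. A flagged occurrence is a candidate for review, not necessarily an error.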