The Lure and Limits of Linked Data: the case of World Historical Gazetteer
Introduction
Spatial humanists have long recognized the enormous integrative potential of using places as common points of reference for heterogeneous information. To realize that potential, collections of named places must be abundant, diverse, collectively assembled, and historically deep. In 2017, the World Historical Gazetteer (WHG) project based at the University of Pittsburgh undertook to build a freely available web platform (https://whgazetteer.org) that would facilitate the collaborative development of such a collection, and to provide multiple ways of accessing its continuously growing results. The WHG platform has experienced steady growth in content, features, and usage since its 2019 launch.
The WHG project's approach to assembling, linking, and publishing diverse place data as a free web resource draws on the technological and social elements of the Linked Data (LD) paradigm, whose characteristics match the requirements of a comprehensive digital historical gazetteer well. These include (i) extensibility, due to its underlying graph-based conceptual model; (ii) multivocality, by accommodating multiple, possibly conflicting statements about the same phenomena; and (iii) integration and (iv) sustainability, both facilitated by an expressive standard interchange format based on RDF.
The WHG union index has grown to well over 2 million sets of attestations for the same (or closely matched) place from multiple sources. A number of datasets are published in the WHG but not yet added to the index, and many more are at an earlier stage of accessioning. The WHG is in fact collectively assembled, and well on its way to being abundant, diverse, and historically deep.
A different kind of gazetteer
The WHG is not so much a gazetteer as it is a collection of gazetteers, generically termed place datasets in the platform. The records from datasets published in the WHG are to a large extent internally linked by their creators in its union index, and accessed via faceted search and an application programming interface (API). Individual datasets are also presented as publications within the system and can be browsed and queried as such. The WHG platform provides features for performing the linking of data and disseminating the results as truly "linked open usable data" as described in Sanderson (2020).
The lure and promise
Knowledge about the past derived from research outputs, archives, and library holdings can be brought together indirectly with linked data methodology by common references to places. In Figure 1, each project (clear circle) has some information pertaining to Tbilisi, concerning perhaps museum holdings or historical events. Each project has within its research output a listing of all the places its work references, including Tbilisi. For each place they have identified one or more identifiers from an "authority" resource such as Getty Thesaurus of Geographic Names (TGN), Bibliothèque Nationale de France (BnF), GeoNames, or Wikidata (green circle). By publishing their place records in the WHG, projects are in effect announcing "we have information about {x} and Tbilisi." A search for "Tbilisi" (or any of the 70 name variants gathered from linked records) will currently return a set of 7 attestations, each from a different source.
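The linking described above can be illustrated with a minimal sketch. It assumes that two records refer to the same place whenever they cite a common authority identifier, and it groups them with a simple union-find. This is an illustration of the idea only, not the WHG's actual implementation; the record ids, names, and `auth:` identifiers are invented placeholders, and `build_union_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical place records from three projects. The "auth:..." values are
# made-up placeholders standing in for authority identifiers (TGN, Wikidata, etc.).
records = [
    {"id": "projA:1", "names": {"Tbilisi"}, "links": {"auth:X1"}},
    {"id": "projB:7", "names": {"Tiflis"}, "links": {"auth:X1", "auth:Y9"}},
    {"id": "projC:3", "names": {"T'bilisi"}, "links": {"auth:Y9"}},
    {"id": "projC:4", "names": {"Batumi"}, "links": {"auth:Z2"}},
]


def build_union_index(records):
    """Group records that share, directly or transitively, an authority identifier."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # authority identifier -> first record id that cited it
    for r in records:
        for link in r["links"]:
            if link in first_seen:
                parent[find(r["id"])] = find(first_seen[link])
            else:
                first_seen[link] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r)
    return list(groups.values())


def search(index, name):
    """Return the record ids of every group in which any record carries the name."""
    wanted = name.casefold()
    return [
        sorted(r["id"] for r in group)
        for group in index
        if any(n.casefold() == wanted for r in group for n in r["names"])
    ]


index = build_union_index(records)
print(search(index, "tiflis"))  # [['projA:1', 'projB:7', 'projC:3']]
```

In this sketch a search on any one variant returns all linked attestations, even though no single project recorded every name. A production index would also need to handle near matches and human review of proposed links, which this sketch omits.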
Multivocality. Linked Data methodologies facilitate the surfacing of suppressed place names and difficult histories by supporting people's discoveries about past places. They can allow genealogists and others to discover common historical connections to places, even if ancestors had different experiences at them and may have called them by different names. A visitor to the WHG who searches for Ayers Rock finds information about Uluru. A search for Tenochtitlan returns Mexico City and Ciudad de México (and vice versa), and a search for Batavia includes links to Jakarta.
Teaching. An index of linked gazetteers is a powerful teaching tool. By exploring how the same name recurs across the globe, students can trace contours of immigration and conquest. The WHG Place Collection and Collection Group features support classroom exercises for creating and annotating collections of thematically linked places.
The limits
Sparse temporal information. Relatively few historical place attestations include timespans indicating a period of existence; publication year of the source is often all that is available. For this reason, it is not possible to get comprehensive results when filtering on a year, timespan, or period.
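The effect of sparse temporal data on filtering can be seen in a small sketch. It assumes each attestation carries an optional `timespan` of (start year, end year). The field name, the sample data, and the `filter_by_year` helper are hypothetical and are not drawn from the WHG data model.

```python
# Hypothetical attestations; only some carry a (start_year, end_year) timespan.
attestations = [
    {"id": "a1", "timespan": (1200, 1400)},
    {"id": "a2", "timespan": None},
    {"id": "a3", "timespan": (1500, 1600)},
    {"id": "a4"},  # no temporal information recorded at all
]


def filter_by_year(attestations, year):
    """Split attestations into those known to cover `year` and those with no dates."""
    matched, undated = [], []
    for a in attestations:
        span = a.get("timespan")
        if span is None:
            # Without a timespan we cannot say whether the place existed in `year`.
            undated.append(a)
        elif span[0] <= year <= span[1]:
            matched.append(a)
    return matched, undated


matched, undated = filter_by_year(attestations, 1300)
print([a["id"] for a in matched])  # ['a1']
print([a["id"] for a in undated])  # ['a2', 'a4']
```

Here half of the attestations fall outside the filter's reach in either direction: excluding them may hide relevant places, while including them may return places that did not exist in the requested year.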
LD is (often) not curated. A stated premise of the original RDF model design was that "anyone can say anything about anything" (W3C 2002). This is a blessing and a curse: it affords essential multivocality, but the quality of an information resource can suffer, and contributors to an LD graph have no control over who says what about their statements.
Disambiguation and conflation. The requirement for one record per place is a burden for many potential collaborators. Places can have multiple names, types, extents/locations and relationships over time. Aggregating these attributes within a single record can be difficult.
Semantics. There is little agreement as to some essential categorizations, e.g. of place type. The WHG allows any term to be added for type, but because mapping distinctive terms to the common vocabulary we offer can be difficult, both the quality of reconciliation results and the filtering of searches by place type are somewhat hampered.
Looking Forward
Historian Jo Guldi recently asked how to take a digital, quantitative approach to history that still maintains the complexity of past human experience and the heterogeneous, ambiguous, and ideologically embedded sources in which it is represented (Guldi 2023). Geographer Ruth Wilson Gilmore argues that struggles for racial justice are always also struggles for place (Gilmore 2022). Linking multiple digital humanities projects together is on its face a worthwhile goal, but there is still work to be done to determine how best and most ethically to do that while honoring the fact that each project has its own unique and organic relationship with a data-sharing community, one that may be vulnerable and may have a history of exploitation (Smith 1999). This is a complex practical and epistemological challenge, one that linked data makes both easier and more complex in various ways, and with which the WHG continues to wrestle.