Contributing to World-Historical Gazetteer: a Preview

The World-Historical Gazetteer project (WHG) will soon begin aggregating and indexing historical gazetteer datasets, and exposing them as Linked Open Data via graphical and programmatic web interfaces, just as Pelagios Commons’ Peripleo project has done for a few years. Like Peripleo, WHG will also index contributions of annotation records that associate historical “items” with place identifiers. Typical items for Peripleo have included coins, coin hoards, and inscriptions of the Classical Mediterranean. The item records WHG will focus on include journey events, regions, and datasets. In fact, annotated items could be anything for which location is relevant, e.g. people and various types of events.

We are almost ready to begin accepting contributions; this post previews the pipeline and formats involved.

Contributions to WHG can include, in some combination: 1) gazetteer data, i.e. place records drawn from historical sources; 2) annotation records that associate a published record about an item with a place identifier; 3) collections of item metadata records referenced in annotations; and 4) a file describing the contributed dataset(s) in “Vocabulary of Interlinked Datasets” (VoID) format.

Over the past several weeks we have collaboratively developed a new Linked Places format (LPF here for short) with Rainer Simon of Pelagios, to be used for contributions of historical place data to both WHG and Pelagios’ Peripleo. The Linked Places format is designed around the JSON-LD syntax of RDF (it is also valid GeoJSON, with temporal extensions, as explained in the GitHub README). The new format makes use of several existing vocabularies and also introduces some terms specific to our shared purposes.
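To give a rough sense of what a record that is both GeoJSON and JSON-LD, with a temporal extension, might look like, here is a minimal Python sketch. The context URL, the `when` member and its `timespans` keys, and the sample place are all illustrative assumptions rather than the published specification; the GitHub README remains the authority on the actual terms.

```python
import json

# Hypothetical JSON-LD context URL; the real one is defined by the LPF README.
ASSUMED_CONTEXT = "https://example.org/linked-places/context.jsonld"


def make_place_feature(uri, title, lon, lat, start_year, end_year):
    """Build one GeoJSON Feature carrying an assumed temporal extension."""
    return {
        "@id": uri,  # JSON-LD identifier for the place record
        "type": "Feature",
        "properties": {"title": title},
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # "when" and "timespans" are assumed names for the temporal extension;
        # GeoJSON allows extra ("foreign") members like this one.
        "when": {"timespans": [{"start": str(start_year), "end": str(end_year)}]},
    }


def make_collection(features):
    """Wrap features in a FeatureCollection with a JSON-LD context."""
    return {
        "@context": ASSUMED_CONTEXT,
        "type": "FeatureCollection",
        "features": list(features),
    }


# An invented place, used only to exercise the functions above.
feature = make_place_feature(
    "https://example.org/places/1", "Example Town", -79.99, 40.44, 1750, 1800
)
collection = make_collection([feature])
print(json.dumps(collection, indent=2))
```

Because the extra members sit alongside the standard `type`, `geometry`, and `properties` keys, ordinary GeoJSON tools should still be able to read the geometry while ignoring what they do not recognize.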

Several expert colleagues contributed valuable input, including Graham Klyne, Richard Light, Lex Berman, Arno Bosse, and Rob Sanderson [1]. We are also in the process of updating the template Peripleo has used for annotation contributions (formerly Open Annotation in RDF Turtle, now its successor, W3C Web Annotation, in JSON-LD). Both formats are discussed in a little more detail below.

Contributing historical place records

There will be two separate workflows for contributions: from larger projects and from smaller ones. The distinction is whether a project has the capability and resources to meet two criteria which are accepted norms for publishing Linked Data: 1) publishing data in some syntax of RDF (in our case the new LPF); and 2) providing a unique URI and associated “landing page” for each resource described.

Case 1: Larger projects

If your project has (or will have) a web presence that provides public pages describing your individual places and/or “items” (routes, regions, etc.), then we ask that you transform and export your data in the standard formats mentioned above: Linked Places, a future annotation format (see Contributing annotations below), and VoID. Upon validation, we will ingest those records, link them with those already in the system, and expose them in a GUI and API. Details of the WHG interfaces are forthcoming.

Case 2: Smaller projects

If your project does not entail creating a web site providing per-record landing pages, then we can accept your data contribution as CSV, mint unique URIs, and provide very basic landing pages for places and other items. The records will also be made available as JSON-LD (bona fide RDF) via our API. We will provide a Python program for converting CSV to LPF, but note that the CSV will have to conform to a template that aligns with LPF (available soon). Conversion from your native format to our CSV template will probably be more manageable than conversion to LPF. In other words, once you submit CSV data that we can parse, a semi-automated conversion and ingest procedure will result in its publication as Linked Open Data.
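The CSV route described above could be sketched roughly as follows. This is not the forthcoming conversion program: the column names (`id`, `title`, `lon`, `lat`, `start`, `end`), the base URI for minted identifiers, the `when` member, and the sample rows are all assumptions made for illustration, since the CSV template has not yet been published.

```python
import csv
import io
import json

# Hypothetical base for minted URIs; the real WHG URI pattern is not yet published.
ASSUMED_BASE_URI = "https://example.org/whg/places/"

# Hypothetical template columns and invented rows, for illustration only.
SAMPLE_CSV = """id,title,lon,lat,start,end
p1,Example Port,-0.20,5.55,1650,1700
p2,Example Mission,-58.38,-34.60,1700,
"""


def row_to_feature(row):
    """Convert one CSV row to a GeoJSON-style Feature with a minted URI."""
    feature = {
        "@id": ASSUMED_BASE_URI + row["id"],
        "type": "Feature",
        "properties": {"title": row["title"]},
        "geometry": {
            "type": "Point",
            "coordinates": [float(row["lon"]), float(row["lat"])],
        },
    }
    # Keep only the temporal bounds that are actually filled in.
    timespan = {key: row[key] for key in ("start", "end") if row.get(key)}
    if timespan:
        feature["when"] = {"timespans": [timespan]}  # assumed member name
    return feature


def csv_to_collection(text):
    """Parse CSV text and return a FeatureCollection-like dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        "type": "FeatureCollection",
        "features": [row_to_feature(row) for row in reader],
    }


collection = csv_to_collection(SAMPLE_CSV)
print(json.dumps(collection, indent=2))
```

A real converter would also need to validate each row against the template and report rows it cannot parse, which is presumably where the "semi-automated" part of the procedure comes in.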

Contributing annotations

WHG will index metadata describing historical “items” annotated with gazetteer record identifiers. These annotation records assert, in effect: “this item is/was associated with this place, in this way;” and optionally, “at this time.”
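As a hedged illustration of such an assertion, the sketch below builds one annotation as a Python dictionary. The `@context` URL and the `type`, `target`, `body`, and `id` keys come from the W3C Web Annotation model; the `relation` and `when` keys, the placement of the place identifier in the body, and all the URIs are assumptions, because the updated contribution template has not been published.

```python
import json

# Context document published by the W3C for Web Annotation JSON-LD.
WEB_ANNOTATION_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_annotation(item_uri, place_uri, relation, when=None):
    """Assert that an item is associated with a place, in a given way."""
    # "relation" is an assumed key, not part of the W3C vocabulary.
    body = {"id": place_uri, "relation": relation}
    if when is not None:
        body["when"] = when  # assumed key for the optional "at this time"
    return {
        "@context": WEB_ANNOTATION_CONTEXT,
        "type": "Annotation",
        "target": item_uri,  # the published record about the item
        "body": body,        # the gazetteer record it is associated with
    }


# Invented URIs: a journey item that had a place as a waypoint.
annotation = make_annotation(
    "https://example.org/items/journey-42",
    "https://example.org/places/1",
    "waypoint",
    when="1785",
)
print(json.dumps(annotation, indent=2))
```

Leaving `when` out yields the shorter form of the assertion, with only the item, the place, and the relation between them.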

The result of such annotations can be seen in the current Peripleo interface, where upon navigating to a given place, you can view metadata (including images) for coins and inscriptions associated with it in e.g. a foundAt or hasLocation relation. Annotations exposed in the WHG web interface will include historical journeys for which the given place was a waypoint, and regions, works, and datasets including or referring to the place.

Annotation contributions will comprise two sets of data: 1) collections of brief item metadata records; and 2) collections of annotation records in W3C Web Annotation format. The contribution template now in use by Pelagios’ Peripleo is being updated to better account for typing of items and relations. Details of that new Linked Places annotation format (LPAF?) will be published soon. Collaborators in that modeling effort are most welcome!

[1] Twitter handles, in order: @gklyne, @RichardOfSussex, @mlex, @kintopp, and @azaroth42

Progress and Next Steps, Jan. 2018

The World-Historical Gazetteer project (WHG) has been under way for six months, and we’d like to let people know what progress we have made and our immediate next steps.

Progress

  • We digitized the index of “Atlas of World History,” a 1999 Dorling Kindersley volume edited by historian Jeremy Black. This has given us approximately 10,000 places to seed our “spine” gazetteer, including cultural places like settlements, states, regions, peoples, and archaeological sites, and natural features like rivers and mountain ranges. Each entry is associated with one or more of the atlas’s 450 maps, each of which has a temporal coverage.
Georeferenced places from Black atlas index (red) and societies from D-Place (green)
  • We have aligned approximately 70% of places in the atlas index with records in GeoNames and/or DBpedia so far, giving us geometry, additional name variants, and Wikipedia abstract text. In the near future we will also align records with the Getty Thesaurus of Geographic Names. A significant proportion of unmatched entries simply aren’t in existing gazetteers — but will be in ours!
  • We have gathered several data sets to augment the spine with additional cultural and natural features that will allow us to contextualize place records in our interfaces in some novel ways:

Societies (peoples) and related language regions (D-Place)
Rivers and lakes (Natural Earth)
Watersheds (World Resources Institute)
Mountain ranges (Natural Earth)
Terrestrial ecoregions (biomes; World Wildlife Fund)
Major ocean currents (NOAA)

  • In September, we held a kickoff meeting in Pittsburgh. Participants included members of our four data partner teams, several experts from the historical Linked Open Data “ecosystem,” and members of the Pitt and Carnegie Mellon digital humanities and library communities. We received a lot of valuable input, much of which we have compiled into a set of 70 “user stories.” These will inform design of our data models, graphical interfaces and API.
  • Our initial mapping of cultural features indicated that, as expected, some regions are under-represented. We are identifying a few significant historical print gazetteers and maps whose digitization can help rectify the problem, and facilitating the work to extract and publish that data. The first such project will help fill a gap for 17th- and 18th-century Latin America: the 1786 “Diccionario geográfico-histórico de las Indias Occidentales ó América,” and its 1812 English translation.
  • We have also worked to position our project in the ecosystem referred to earlier. Over the past year, Technical Director Karl Grossner served as co-coordinator of the Linked Pasts Working Group of Pelagios Commons, which recently published a white paper – now open for comment as a shared document – “From Linking Places to a Linked Pasts Network.” We view WHG as joining Pelagios’ Peripleo system as a place-centered Linked Pasts Network Hub; more about that framing effort in due course.
  • We have begun experimental integration of the Getty Thesaurus of Geographic Names (TGN) into the WHG system. This extraordinary resource (~2.5 million geo-referenced places; ~4 million place names) has, like most existing gazetteers, limited temporal information. We’d like to facilitate temporal annotations to all indexed place records, as we move towards a truly historical gazetteer system.

Next steps

Following a mid-January 2018 system design charrette, database and software development will begin in earnest. Early design plans will be published for comment.

Data development will continue throughout the project, and discussions with several prospective data partners are ongoing. Our goal is to support communities of researchers specializing in particular regions and periods, and to guide them in publishing place data we can incorporate in our “union index.” The initial focus is on Colonial Era Latin America and West Africa, Maritime Southeast Asia, and Early Modern Europe, but we welcome inquiries concerning any region/period combination.