World Historical Gazetteer Version 1.0: A Guide

Welcome to Version 1 of the World Historical Gazetteer (WHG) platform. WHG remains a work-in-progress, so expect refinements, new features and new data in subsequent releases. Notes about planned near-future updates appear highlighted like this.

Geography in Historical Research

Some scholars and students of the human past ask explicitly geospatial questions (“Where did that happen?” “What else happened in that place?”). Others simply want to make maps. Either of these cases requires making a list of historical place names and determining the coordinates of those places in order to make them mappable and amenable to spatial analysis. This is an extremely difficult and time-consuming task. The mission of the World Historical Gazetteer is to make it as easy as possible. WHG is a pilot platform for gathering in one system, over time, contributions of place name data—large and small, and for all regions and historical periods—and to provide services for geolocating additional names.

Teaching

WHG is an excellent resource for teachers and students alike. Teachers can use information in WHG to refine lessons and to make custom maps for lectures or resources. Students can make their own historical datasets and then create maps from them. The services offered by WHG will help make historical mapping easier and more accessible to everyone. In short, developing gazetteer datasets is a hecka good classroom tool.

Reference

A "union index" of place attestations drawn from historical sources will over time increasingly link the disparate research of contributors on the dimension of place. Furthermore, by bringing together all the known references for a place without privileging any particular one, it decenters colonized name making. For instance, the WHG index contains 133 modern and historical name variants for the contemporary city of Beijing drawn from multiple sources, and 96 for the city of Istanbul. Searching for either Al Quds or Jerusalem returns contributed references for either.

Data

Two Place Data Stores: Database and Index

Data from files uploaded to WHG in either the full Linked Place format (LP) or abbreviated LP-TSV format are imported into a PostgreSQL relational database and made available to you for viewing and augmenting via our reconciliation process. You cannot edit them in WHG, but in the case of LP-TSV files, you can replace a dataset with a new file reflecting changes you made locally.

The reconciliation process finds prospective matches to your records in Getty TGN and/or Wikidata. When such an external record is accepted as a “closeMatch,” one or more place_link and place_geom records are added to the database, reflecting that match. This augments your dataset within the WHG database with additional attestations to the same place (links) and with geographical coordinates, but does not alter the uploaded content in any way. If you flag your dataset as public, its records will be publicly accessible via the WHG API.

Separately from the database, WHG maintains a high-speed index that links records for the same place contained within multiple datasets. This conflation within the index means that, for example, a search for Istanbul, Constantinople, or Byzantium will lead you to a Place Portal page listing attestations for any of those names from multiple sources.

NOTE: We are evaluating ways to permit editing in the future.

Place

As of August 2020, World Historical Gazetteer includes:

Core (non-historical) data

Historical data (all are in the database, but indexing is partial and in progress for some, as noted)

Several additional historical datasets are in queue for accessioning at this time. We welcome further contributions, large and small. If you have one in the works or in mind, please let us know via the contact form. We will publicize these additions as they occur via our blog and our Twitter account (@WHGazetteer).

Trace annotations (experimental)

Trace annotations are records annotating web resources about historical events, people, works and objects (“traces”) with identifiers for the places that are relevant to them. The annotations assert a relation between the trace and the place (for instance, they note that a place was the waypoint on a journey or the birthplace of an individual) and they attest a year or timespan during which the trace and the place were connected. Each annotation record joins one trace with any number of places. In connection with one another, one or multiple trace records link places together to create spatially explicit historical narratives.

  • Hernán Cortes and the Conquest of the Aztec Empire
  • The Journey of Xuanzang, a seventh century Buddhist Monk
  • The Lifepath of Gautama Buddha
  • The Empire of Alexander
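A trace annotation of the kind listed above could be modeled roughly as follows. This is only a sketch: the class and field names (TraceAnnotation, PlaceRef, relation, start, end) are assumptions made for illustration and are not the WHG data model, and the URI, place identifiers and years are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlaceRef:
    """One place referenced by a trace (all field names are hypothetical)."""
    place_id: int                 # identifier of the place record
    relation: str                 # e.g. "waypoint" or "birthplace"
    start: Optional[int] = None   # first year of the connection, if known
    end: Optional[int] = None     # last year of the connection, if known


@dataclass
class TraceAnnotation:
    """Joins one trace (a web resource) with any number of places."""
    trace_uri: str
    trace_type: str               # e.g. "event", "person", "work", "object"
    places: List[PlaceRef] = field(default_factory=list)

    def itinerary(self) -> List[int]:
        """Place ids ordered by start year; undated references sort last."""
        ordered = sorted(self.places, key=lambda p: (p.start is None, p.start or 0))
        return [p.place_id for p in ordered]


journey = TraceAnnotation(
    trace_uri="https://example.org/traces/journey-1",  # placeholder URI
    trace_type="event",
    places=[
        PlaceRef(place_id=3, relation="waypoint", start=632),
        PlaceRef(place_id=1, relation="waypoint", start=629),
        PlaceRef(place_id=2, relation="waypoint"),      # undated
    ],
)
print(journey.itinerary())
```

Ordering the place references by year is what turns a single annotation into a simple spatial narrative; several annotations sharing a place_id would link their traces through that place.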

Details for how to use the following services appear in the "Page by Page" section of this Guide and in help screens throughout the application.

Geocoding via reconciliation

The core feature of WHG is its reconciliation services, which allow you to upload place records drawn from your historical sources and find potential matches for them in two authority resources: the Getty Thesaurus of Geographic Names (TGN) and Wikidata. These include, in most cases, geographic coordinates. Potential matches are queued for review, and when you accept a match, your original dataset is augmented with geometry from the authority record as well as the authority identifier. Many Wikidata records also include concordances: identifiers from GeoNames, VIAF, and Library of Congress. Those are also added to the original dataset.

The scripts that we have developed to suggest potential matches do not simply match names; they make use of context provided in uploaded and authority records, including: a) all name variants, b) modern country or study area bounds, c) place type, and d) any provided coordinates.

Downloading augmented data

Having augmented an uploaded dataset with additional geometry and links, its owner can download it for mapping, further development, or any research purpose. Note that the downloaded file is a revised version of the original, and users must manage the different versions themselves.

Sharing (publication)

If you flag your uploaded dataset as "public", it is effectively published as Linked Open Data. Each record is assigned a unique permanent numerical identifier within WHG. It also inherits a unique dataset label identifier upon upload, which can be combined with the unique src_id you had given it, forming another permanent identifier within WHG. Individual records can therefore be accessed via our API with two URI patterns: /api/place/<place_id> and /api/place/<dataset label>/<src_id>. For example, the DK Atlas of World History record for Abydos, Egypt has an assigned place_id of 81010 and can be accessed at http://whgazeteer/api/place/81010/. It can also be accessed via its dataset label and src_id at http://whgazeteer/api/place/black/10031/.
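The two URI patterns can be assembled with a pair of small helpers. This is a minimal sketch: the function names are hypothetical, the host is copied exactly as it appears in the examples in this Guide, and no request is made because the response format is not described here.

```python
def place_uri(base: str, place_id: int) -> str:
    """URI for a record addressed by its WHG place_id."""
    return f"{base.rstrip('/')}/api/place/{place_id}/"


def place_uri_by_source(base: str, dataset_label: str, src_id: str) -> str:
    """URI for a record addressed by dataset label plus the contributor's src_id."""
    return f"{base.rstrip('/')}/api/place/{dataset_label}/{src_id}/"


BASE = "http://whgazeteer"  # host exactly as written in this Guide's examples

print(place_uri(BASE, 81010))
print(place_uri_by_source(BASE, "black", "10031"))
```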

Contributing

All of the above occurs in your private workspace. Once your dataset has as many geometries and links to external authorities as the reconciliation process can discover, you can have it accessioned as a contribution to the WHG union index. Why take this extra important step?

  • Your place attestations now appear in index search results, in many cases linked with other attestations for the same place.
  • Because of this, researchers concerned with a particular place or set of places can learn about other people also interested in the same place(s).
  • If any of your records refer to places that were not previously in the WHG index, they will become new "seed" records. In time, people will spend far less effort geocoding their own place records.
  • The WHG API allows the index to be connected to tools such as Pelagios' Recogito. When it is, you and others will have a far easier time annotating historical texts with authority records (and coordinates) for place references.
  • While simply flagging an uploaded dataset as "public" effectively publishes it, the extra step of accessioning to the index makes it that much more useful in the growing ecosystem of linked historical geodata.

Community

The WHG project belongs to a growing community interested in linking information about historical places and linking historical information from multiple disciplines via place. As such we are active partners in the Pelagios Network.

Domains of Interest

Pleiades, the "community-built gazetteer and graph of ancient places," is a trailblazing project that more than a decade ago began gathering, curating and sharing contributed data, focused on the Mediterranean region. Its success has been instructive in several ways. The project was and is the product of a community—in its case classicists, archaeologists, and historians of the region and period. It has a dozen volunteer editors, numerous reviewers, and continues to grow in depth and breadth.

We at World Historical Gazetteer anticipate our own data aggregation and publishing platform will grow by virtue of similar geographic and temporal "domains of interest." We aim to prioritize projects about the Global South. Our list of early and prospective contributors bears this out—it is spatiotemporally clustered. For example, Werner Stangl's HGIS de las Indias has seeded an early Latin American domain, and several other contributions of colonial and pre-colonial Latin America data are expected soon. Other emerging clusters include: the "Atlantic World," the Ottoman Empire, the Islamic World, Central Eurasia, and China.

Expanding coverage of linked data resources to include under-represented areas like the Global South is an important priority.

What now?

WHG is a start—the platform needs more data and it needs further software design and development in response to community needs. We are hopeful that it can be sustained and improved over the long term.

What is a gazetteer?

In its simplest form, a gazetteer is a list of place names. Typically, digital gazetteers provide some level of description for each listed place, e.g. its type and geographic coordinates. Historical gazetteers include prior names and sometimes a time span or period each name was in use. The Linked Places format (spec; tutorial) used by WHG allows us to record temporally scoped name variants, coordinates, place types, and relations with other places, as well as related descriptions and depictions.

Who uses historical gazetteers?

Historical gazetteers are useful for anyone concerned with the history of a place or series of places, including researchers, teachers, and students. They help connect our present to our past.

What is a place?

The term has multiple related meanings. A few we like: (i) one answer to a where question, (ii) a setting for events and activity, (iii) "...an object resulting from a shared identification of a location. As an object, it may become a part of a network and participate in events" (Purves, Winter & Kuhn 2019). Attributes of places include names, locations, and types, all of which routinely change over time.

What is a place in WHG?

A place record represents one or more attestations of a place found in historical sources or in modern gazetteers and name authorities. Due to our use of Linked Places format, a WHG place record may include any number of names, types, locations (geometry), relations, descriptions, depictions, and links with other records. The entire record can be temporally scoped with a "when" assertion, as can any individual name, type, geometry or relation.
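To make the shape of such a record concrete, here is a sketch written as a Python dictionary. The key names loosely follow the GeoJSON-based layout of Linked Places format, but they are assumptions for illustration; the format spec is the authority, and every value below is a placeholder.

```python
# Illustrative record; key names and all values are assumptions, not spec text.
record = {
    "type": "Feature",
    "properties": {"title": "Example Place"},
    # A "when" at the top level scopes the entire record.
    "when": {"timespans": [{"start": {"in": "1200"}, "end": {"in": "1500"}}]},
    "names": [
        {"toponym": "Example Place",
         "when": {"timespans": [{"start": {"in": "1400"}}]}},
        {"toponym": "Old Example"},  # no "when": scoped only by the record
    ],
    "types": [{"label": "settlement"}],
    "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
    "links": [{"type": "closeMatch", "identifier": "authority:placeholder-id"}],
    "relations": [],
    "descriptions": [],
    "depictions": [],
}


def individually_scoped(rec, part):
    """Return the items of one record part that carry their own 'when'."""
    return [item for item in rec.get(part, []) if "when" in item]


scoped_names = [n["toponym"] for n in individually_scoped(record, "names")]
print(scoped_names)
```

The helper shows the two levels of temporal scoping described above: the record-level "when" applies to everything, while an individual name, type, geometry or relation may narrow it further.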

What is the geographic and temporal scope of WHG?

The geographic scope is global, and the temporal scope is roughly the span of written history.

Are some traces places, and vice versa?

A number of spatial-temporal entities could be modeled as either. For example: a dynasty, or an historical route.

Are there "preferred names" in WHG?

No. A place record can include any number of name and language variants; in fact they are encouraged. But we ask that each record have an assigned "title," which serves principally as a headword in lists.

Does WHG support multiple languages?

The name variants found in WHG are of numerous languages and scripts. Unfortunately, to date most of the contributed names do not arrive tagged with language-script codes. Separately, the internationalization of the WHG site is a high priority for the next phase of work.

How and why does WHG use modern country boundaries?

Modern country boundaries are used in WHG primarily to filter queries and constrain reconciliation results. Country codes are included in place records for this purpose and do not reflect a given place's historical "containment" in, or association with, administrative areas at any given time in the past.

What is the WHG vocabulary of place types?

We have adopted a subset of 160 place type concepts from approximately 900 contained in the Getty Art & Architecture Thesaurus. Our focus in selecting these was on settlements (inhabited places), administrative divisions, sites, and natural features.

Can contributions include urban-scale features?

We are not actively soliciting urban-scale place data, but recognize there is growing interest in systems to manage such information.

Register/Login/User Profile

Registration and login are required to upload datasets, to designate collaborators, to use our reconciliation services to find matches in modern placename authorities (so far, the Getty Thesaurus of Geographic Names (TGN) and Wikidata), and ultimately to contribute your data to the WHG index. Once you are logged in, top-level menu options for "Data" and your user profile appear.

Search
Places
Autocomplete (available for Trace searches) will be added for Place searches soon.

Place search runs against a union index of ~1.8m core records (~3m names) and all contributed datasets. Typing a name then pressing the {Enter} key performs a search, presents a list of results, and maps those which have geometry (not all do). Clicking a result item will highlight it on the map, providing further context. Clicking the name link takes you to the index record's "portal" page where any number of attestation "cards" drawn from our core datasets (grey banner) and multiple contributed datasets (beige banner) are gathered.

Filters are available to constrain place searches: a) GeoNames feature class, b) temporal (not before, not after), and c) spatial (bounds of world regions, modern countries, and user "study areas"). Search results can be further filtered by place type.

Traces

Search for trace data runs against a separate index. At this time there are only a few dozen example trace records. Try typing 'empire' to see a couple of examples. Selecting an auto-suggestion will perform the search and the places referenced in the trace record are mapped. As with places, clicking a result item highlights the place on the map, and clicking its name takes you to its place portal page.

Integration of trace data is at an early experimental stage. We welcome suggestions for how trace data can be better integrated into the WHG interface.

Dataset mapper

Heat maps indicating coverage for some larger contributed datasets are available via a dropdown menu.

The Data dashboard page lists datasets created by a user or for which they are a collaborator, and user "study areas." Additionally, two read-only "core" datasets are listed. Clicking an "add new" or "create new" link starts the process of creating and managing either a dataset or study area.

Datasets
  • Place data

    Registered users can create Place datasets by uploading files in one of two formats: the expressive GeoJSON-LD based Linked Places format ("LP format" for short), or the simpler LP-TSV format. Considerations for making the choice are found in the "Choosing an upload data format" tutorial.

    Help icons for each field in the "Upload dataset" form provide popup instructions for filling it in.

    Uploaded data files are validated for adherence to the relevant format spec, LP or LP-TSV. If there are formatting errors, their details are displayed and after correcting them the file upload can be attempted again.

    Upon successful upload, a Dataset Portal page is displayed.

  • Trace data

    NOTE: Trace data uploads are not yet enabled.

Study Areas

These are user-created named polygon bounds used to constrain the searches used in the reconciliation process. See Reconciliation below.

Collections

Planned, but not yet available. Users would be able to add place and trace records to personal collections, which could be mapped, edited, and optionally shared. This would be useful for teaching scenarios.

This page contains several tabbed sections for managing datasets: Metadata, Browse, Reconciliation, Sharing, and Log/Comments.

Metadata

This tab section provides metadata about the dataset and its most recent uploaded data file source. The title, base URI and description fields can be edited. Updating of the datasets can be initiated (only LP-TSV delimited files at present). Statistics are displayed on the right side of the screen: initial counts of rows, name variants, links, and geometries, as well as counts of link and geometry records added during the reconciliation review process.

Browse

This section combines a sortable, searchable list of the records currently in the dataset, and a map displaying any geometry it includes. Note that links and geometry from authority records matched in the reconciliation review process are reflected here, as dataset augmentation written as new place_link and place_geom records.

Reconciliation

Reconciliation "tasks" are initiated from this tab section and listed for access to review screens, a process outlined below. A summary of the initial results is generated and displayed in a list, with links for reviewing prospective matches.

It is also possible to delete the task, its associated hits, and any match records created in review work. Caution! There is no recovery from these clearing actions!

Contributing a dataset to WHG

After a dataset has been uploaded and, using our reconciliation services, augmented with as many links (matches) to modern authorities as possible (see below), it can be considered for accessioning — that is, contributed to the WHG "union index." At this time all accessioning will be initiated by the WHG project team, and review performed by the dataset owner.

Contributing a dataset to WHG entails reconciling its records to the WHG index. Any record sharing a link with one already in the index will be automatically indexed as a "child" or "sibling" to ours. Records for which there are no prospective matches are indexed as a "parent," in effect a new seed record for a place. Records for which there are prospective matches will be queued for review, just as with the TGN and Wikidata reconciliation.

In that step, each record is compared with the WHG index to see if the referenced place already has one or more attestations from another dataset. If it does, it is marked as a "child" record of the first attestation we received. If it does not, it is considered a new "parent." Accessioning relies on Place records having as many associated "place_link" records as can be obtained.
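The three outcomes described above (automatic "child", new "parent", or queued for review) can be sketched as a small decision function. This is an illustration under stated assumptions, not the WHG accessioning code: the index is a plain list, a shared name stands in for whatever produces "prospective matches," and all identifiers are placeholders.

```python
def classify(record, index):
    """Decide how an incoming record would enter the index.

    record / index entries: dicts with "id", "names" and "links" keys
    (a hypothetical layout chosen for this sketch).
    """
    links = set(record["links"])
    # 1. A shared authority link means automatic indexing under the
    #    first attestation already present.
    for entry in index:
        if links & set(entry["links"]):
            return ("child", entry["id"])
    # 2. No shared link but a shared name: a prospective match, so a
    #    person must review it (name matching is an assumption here).
    names = {n.lower() for n in record["names"]}
    candidates = [e["id"] for e in index
                  if names & {n.lower() for n in e["names"]}]
    if candidates:
        return ("review", candidates)
    # 3. Nothing comparable in the index: the record becomes a new seed.
    return ("parent", None)


index = [{"id": "p1", "names": ["Abydos"], "links": ["tgn:placeholder-1"]}]

print(classify({"id": "r1", "names": ["Abdju"], "links": ["tgn:placeholder-1"]}, index))
print(classify({"id": "r2", "names": ["abydos"], "links": []}, index))
print(classify({"id": "r3", "names": ["Elsewhere"], "links": []}, index))
```

The first case is why a dataset should carry as many place_link records as possible before accessioning: every shared link removes a record from the manual review queue.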

Sharing

Owners of a dataset can name any number of registered WHG users as collaborators, giving them permission to view the dataset and to perform review of prospective matches generated by reconciliation tasks.

Log & Comments

Actions related to datasets are logged and listed here.

On the Reconciliation Review and Place Portal pages, users can create comments specific to a database record. Comments for all places in a dataset are listed here. NOTE: Comments suggest followup action, e.g. correction of errors. We are contemplating how such corrections might be accomplished within the WHG interface.

Reconciliation is the process of identifying matches of your Place records to existing records in online place name authorities. So far, reconciliation to Getty TGN and Wikidata is offered. DBpedia and GeoNames will likely be added in the future. The purpose of reconciliation is to augment a dataset with associated "place_link" records, and optionally, geometry ("place_geom" records) derived from the authority. It is therefore possible to upload a dataset missing geometry for some or all of its records, and use this reconciliation service to make it mappable, at least in part.

NOTE: Making a dataset as rich with links to authorities as possible is a crucial step in making it ultimately a solid contribution to the WHG index.

In each case, the authority data store is queried for matches with your dataset records, one by one. Each query actually consists of multiple "passes," at first including as much context as your records may contain: name plus all variants; place type; one or more modern country or user-defined Study Area as a spatial constraint; coordinate geometry for the feature; and name(s) of "parent" entities. Subsequent passes (two for TGN, one for Wikidata) relax the query if no potential matches (hits) are found. Resulting hits for all records are queued for review by the dataset creator.
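The idea of successive, progressively relaxed passes can be sketched as follows. This is not the WHG query logic: which context is dropped in which pass is an assumption, the authority is a small in-memory list rather than an index, and the identifiers are placeholders.

```python
def reconcile(record, authority, max_passes=3):
    """Return (pass_number, hits), relaxing the query until something matches."""
    variants = {n.lower() for n in record["names"]}
    # Each pass lists the context fields that must agree with the candidate;
    # later passes require fewer of them (the order is an assumption).
    passes = [("type", "country"), ("country",), ()][:max_passes]
    for pass_no, required in enumerate(passes, start=1):
        hits = [
            cand for cand in authority
            if variants & {n.lower() for n in cand["names"]}
            and all(record.get(f) == cand.get(f) for f in required)
        ]
        if hits:
            return pass_no, hits
    return None, []


authority = [  # toy data with placeholder ids
    {"id": "auth-1", "names": ["Abydos"], "type": "settlement", "country": "EG"},
    {"id": "auth-2", "names": ["Abydos"], "type": "settlement", "country": "TR"},
]
record = {"names": ["Abydos", "Abdju"], "type": "site", "country": "EG"}

pass_no, hits = reconcile(record, authority)
print(pass_no, [h["id"] for h in hits])
```

Here the strict first pass finds nothing because the place types differ; the second pass drops the type constraint and the country code still separates the two same-named candidates.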

Getty TGN

WHG maintains a locally indexed copy of about 1.8 million place records retrieved from a TGN dump file in March 2018. Because it is local, the process is considerably faster than for Wikidata. We hope to periodically update this index in the future, or to use the newly announced TGN OpenRefine endpoint if its results are comparable and speed is acceptable.

Almost all TGN records include a point geometry, but have no concordances with other authorities or structured temporal attributes.

Wikidata

The Wikidata reconciliation is performed against its SPARQL endpoint (https://query.wikidata.org/). At approximately 1 second per record it is much slower than that for TGN. Many Wikidata records contain geometry and concordances with other authorities. When you confirm a Wikidata match, we create a "place_link" record not only for the Wikidata ID but for any TGN, GeoNames, VIAF and Library of Congress IDs, if found.
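For readers unfamiliar with SPARQL, the sketch below builds the text of a query that asks Wikidata for an item's coordinates and the concordances mentioned above. It is not the query WHG runs; the exact-label match and the function name are assumptions for illustration. The property IDs are Wikidata's own (P625 coordinate location, P1667 Getty TGN ID, P1566 GeoNames ID, P214 VIAF ID, P244 Library of Congress authority ID). The code only builds the string and sends nothing.

```python
CONCORDANCE_PROPS = {
    "tgn": "P1667",       # Getty Thesaurus of Geographic Names ID
    "geonames": "P1566",  # GeoNames ID
    "viaf": "P214",       # VIAF ID
    "loc": "P244",        # Library of Congress authority ID
}


def build_query(name, lang="en", limit=10):
    """Build SPARQL text for an exact-label lookup (illustrative only)."""
    # Escape backslashes and double quotes so the name is a valid literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    variables = " ".join(f"?{key}" for key in CONCORDANCE_PROPS)
    lines = [
        f"SELECT ?item ?coord {variables} WHERE {{",
        f'  ?item rdfs:label "{escaped}"@{lang} .',
        "  OPTIONAL { ?item wdt:P625 ?coord . }",
    ]
    for key, prop in CONCORDANCE_PROPS.items():
        lines.append(f"  OPTIONAL {{ ?item wdt:{prop} ?{key} . }}")
    lines.append(f"}} LIMIT {int(limit)}")
    return "\n".join(lines)


print(build_query("Abydos"))
```

Each identifier is wrapped in OPTIONAL because, as noted above, only some Wikidata records carry geometry or concordances; a record lacking them is still returned.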

Study Areas

Many toponyms appear repeatedly in multiple locations, referring to different places, often far apart (e.g. Latin America and the Iberian Peninsula). To aid the reconciliation process, users can define a Study Area that will constrain the search for matches to its bounds, by a) entering a series of 2-letter country codes, which will generate a hull shape, or b) by drawing a polygon on a map.

Alternatively, a pre-defined region can be chosen from a separate dropdown menu.

Prospective matches to external authorities are not automatically added to the WHG database during reconciliation; i.e. new "place_geom" and "place_link" records augmenting the dataset are created only by the Reconciliation Review step performed by the dataset creator or specified collaborators.

This page presents dataset records with one or more prospective matches ("hits") on the left of the screen, with a list of those hits on the right. A small map displays geometry for the record with a green marker, and that of all hits with orange markers. Hovering over the globe symbol in a hit item highlights its position on the map.

The objective is to determine, for each dataset record, whether any of the hits are a "closeMatch" to it. By default, "no match" is selected for each hit. The reviewer can optionally change the selection to "closeMatch" for one or more hits. In any case, clicking save records the choice and advances to the next record. Attributes of the dataset record and the hits provide context to assist making the assessment.
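The effect of a review decision on the database can be sketched as below. The table names place_link and place_geom come from this Guide; the column names, the function name, and the shape of a "hit" are assumptions for illustration, and the identifiers are placeholders.

```python
def apply_review(place_id, hit, decision="no match"):
    """Return the (place_link rows, place_geom rows) a decision would create.

    "no match" is the default and creates nothing.
    """
    if decision != "closeMatch":
        return [], []
    # One link for the matched authority record plus one per concordance.
    identifiers = [hit["id"]] + list(hit.get("concordances", []))
    links = [
        {"place_id": place_id, "type": "closeMatch", "identifier": ident}
        for ident in identifiers
    ]
    # Geometry is added only when the authority record actually has some.
    geoms = []
    if hit.get("geom"):
        geoms.append({"place_id": place_id, "geom": hit["geom"], "source": hit["id"]})
    return links, geoms


hit = {
    "id": "wd:placeholder",
    "concordances": ["tgn:placeholder", "gn:placeholder"],
    "geom": {"type": "Point", "coordinates": [0.0, 0.0]},
}

links, geoms = apply_review(81010, hit, decision="closeMatch")
print(len(links), len(geoms))
```

Nothing is written for a hit left at the default, which matches the rule that prospective matches augment a dataset only through the review step.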

Matching

The meaning of closeMatch derives from the Simple Knowledge Organization System (SKOS) vocabulary, a data model commonly used in linked data applications. NOTE: The "related" relation is not yet defined formally, and assertions of it will not yet appear in the interface. For WHG, a Place record refers to a SKOS:Concept, so an assertion of a closeMatch between your record and that of an external authority indicates:

  • "...(the) two concepts are sufficiently similar that they can be used interchangeably in some information retrieval applications"

Note that closeMatch is a super-property of exactMatch; that is, every exactMatch is also a closeMatch. Clear? Oh well. Practically speaking, for WHG asserting closeMatch serves as a linking "glue." Specifically, records that share one or more common authority links will be conflated in our union index only, and returned together in response to queries. For example, records from different sources for Abyssinia and Ethiopia share two links, to a DBpedia record and a TGN record. Therefore, they appear together when searching for either Abyssinia or Ethiopia.
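The "glue" behaviour can be illustrated with a small grouping routine: any two records that share an authority link end up in the same group, and groups merge transitively. This is a sketch using a union-find structure, not the WHG indexing code; the record ids and link identifiers are placeholders.

```python
def conflate(records):
    """Group record ids that share at least one authority link.

    records: dict mapping record id -> set of link identifiers.
    Returns a sorted list of sorted groups.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent pointers to the group's root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # link identifier -> first record that carried it
    for rid, links in records.items():
        for link in links:
            if link in first_seen:
                parent[find(rid)] = find(first_seen[link])
            else:
                first_seen[link] = rid

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(group) for group in groups.values())


records = {
    "abyssinia": {"tgn:placeholder-1", "dbp:placeholder-1"},
    "ethiopia": {"tgn:placeholder-1", "dbp:placeholder-1"},
    "other": {"tgn:placeholder-2"},
}
print(conflate(records))
```

Because grouping is transitive, a record linked only to DBpedia and another linked only to TGN can still be conflated if a third record carries both links.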