World Historical Gazetteer Version 2.1: A Guide

Welcome to the World Historical Gazetteer (WHG) platform. WHG remains a work-in-progress; expect further refinements, new features and new data in Version 3 (beta, early 2024). Notes about planned near-future updates appear highlighted like this.

Geography in Historical Research

Many scholars and students of the human past ask explicitly geospatial questions in their research, for example about distribution patterns over time and connectivity. Others simply want to make maps, to visualize the geography inherent in their source material. Either case requires making lists of historical place names and determining the coordinates of those places, rendering them mappable and amenable to spatial analysis. Finding even estimated locations for historical places is often a difficult and time-consuming task. A central goal of the World Historical Gazetteer project is to make it as easy as possible. WHG is a platform for gathering in one system, over time, contributions of place name data, large and small, for all regions and historical periods. WHG provides services for geolocating place names, linking individual attestations for "closely matched" places, and sharing the results of that work as searchable linked datasets.

Teaching

WHG is an excellent resource for teachers and students alike. Teachers can use information in WHG to refine lessons and to make custom maps for lectures or resources. The services offered by WHG will help make historical mapping easier and more accessible to everyone. In short, developing gazetteer datasets is an excellent classroom tool. The "Place Collection" feature facilitates building and publishing sets of user-annotated place records within WHG, accompanied by an explanatory essay, image, and links to external resources. NOTE: this feature will be significantly enhanced in v3 (2024).

Reference

The WHG "union index" of place attestations drawn from historical sources will over time increasingly link the disparate research of contributors on the dimension of place. Furthermore, by bringing together all the known references for a place without privileging any particular one, it decenters colonized name making. For instance, the WHG index contains 133 modern and historical name variants for the contemporary city of Beijing drawn from multiple sources, and 96 for the city of Istanbul. Searching for either Al Quds or Jerusalem returns contributed references for either.

Data

Two Place Data Stores: Database and Index

Data from files uploaded to WHG in either the full Linked Place format (LP) or abbreviated LP-TSV format are imported into a relational database and made available to the uploader ("owner") for viewing and augmenting via our reconciliation process. Dataset owners can designate other WHG users as co-owners or collaborating "members." Data cannot be edited in WHG, but datasets can be updated.

The reconciliation process finds prospective matches to your records in our index of 3.5 million Wikidata place records. When dataset owner(s) accept an external record as a "close match," one or more place_link and place_geom records are added to the database, reflecting that match. This augments your dataset within the WHG database, with concordance identifiers (links), and geographical coordinates, but does not alter the original uploaded content in any way. When a dataset is flagged as public, its records become publicly accessible via the Search page, a public Browse page, and the WHG API.

Separately from the database, WHG maintains a high-speed index that links records for closely matched places from multiple datasets. Following publication, an accessioning step links records from the new dataset with existing records for a given place (if any). If no matches are found, the new record becomes a very welcome new "seed." The result, for example, is that a search of the index for either Istanbul, Constantinople, or Byzantium will lead users to the same Place Portal page, listing several attestations for that place by any of those names, as supplied by multiple sources.

Place

As of July, 2023, World Historical Gazetteer includes about 141,000 contributed historical place records and about 1.8 million "core" non-historical records.

Public historical datasets (all are in the database; indexing is partial and in progress for some, as noted)

Core (non-historical) data

Numerous additional historical datasets are in the queue for accessioning at this time. We welcome further contributions, large and small. If you have one in the works or in mind, please let us know via the contact form. We will publicize these additions as they occur via our blog and our Twitter account (@WHGazetteer).

Details for how to use the following services appear in the "Page by Page" section of this Guide and in help screens throughout the application.

Linking and geocoding via reconciliation

A core feature of WHG is its reconciliation services, which allow you to upload place records drawn from your historical sources and find potential matches for them in two authority resources: Wikidata and the Getty Thesaurus of Geographic Names (TGN). Records from both include geographic coordinates in almost all cases. Potential matches are queued for review, and when you accept a match, your original dataset is augmented with geometry from the authority record as well as the authority identifier. Many Wikidata records also include concordances: identifiers from GeoNames, VIAF, and Library of Congress among others. Those are also added to the original dataset.

The scripts that we have developed to suggest potential matches do not simply match names; they make use of context provided in both the uploaded and authority records, including: a) all name variants, b) modern country or study area bounds, c) place type, and d) any provided coordinates.

Downloading augmented data

Having augmented an uploaded dataset with additional geometry and links, its owner can download it for mapping or further development or any research purposes. Note that the file they download is a revised version of the original and users must manage the different versions.

Sharing (publication)

Registered users can request a dataset be flagged as "public," and after a brief review by WHG editorial staff, it is effectively published as Linked Open Data. Upon upload, each record is assigned a unique permanent numerical identifier within WHG. It also inherits a unique dataset label identifier upon upload, which can be combined with the unique src_id you had given it, forming another permanent identifier within WHG. The Using the API page explains options for that service.

Contributing

All of the above occurs in your private workspace. Once your dataset has as many geometries and links to external authorities as the reconciliation process can discover, you can take the extra important step of accessioning it as a contribution to the WHG union index. Some reasons to do this include:

  • Your place attestations now appear in index search results, in many cases linked with other attestations for the same place.
  • Because of this, researchers concerned with a particular place or set of places can learn about other people also interested in the same place(s).
  • If any of your records refer to places that were not previously in the WHG index, they will become new "seed" records. In time, people will spend far less effort geocoding their own place records.
  • The WHG API allows the index to be connected to tools such as Pelagios' Recogito. When it is, you and others will have a far easier time annotating historical texts with authority records (and coordinates) for place references.
  • While simply flagging an uploaded dataset as "public" effectively publishes it, the extra step of accessioning to the index makes it that much more useful in the growing ecosystem of linked historical geodata.

Community

The WHG project belongs to a growing community interested in linking information about historical places and linking historical information from multiple disciplines via place. As such we are active partners in the Pelagios Network.

Domains of Interest

Pleiades, the "community-built gazetteer and graph of ancient places," is a trailblazing project that more than a decade ago began gathering, curating and sharing contributed data, focused on the Mediterranean region in antiquity. Its success has been instructive in several ways. The project was and is the product of a community—in its case classicists, archaeologists, and historians of the region and period. It has a dozen volunteer editors, numerous reviewers, and continues to grow in depth and breadth.

We at World Historical Gazetteer anticipate our own data aggregation and publishing platform will grow by virtue of similar geographic and temporal "domains of interest." We aim to prioritize projects about the Global South. Our list of early and prospective contributors bears this out—it is spatiotemporally clustered. For example, Werner Stangl's HGIS de las Indias has seeded an early Latin American domain, and several other contributions of colonial and pre-colonial Latin America data are expected soon. Other emerging clusters include: Dutch History, Central Asia, the "Atlantic World," the Ottoman Empire, the Islamic World, and China.

Expanding coverage of linked data resources to include under-represented areas like the Global South is an important priority.

Register/Login/User Profile

Registration and login are required to upload datasets, designate collaborators, use our reconciliation services to find matches in Wikidata, and ultimately contribute your data to the WHG index. Once logged in, top-level menu options for "Data" and a user profile appear.

Search :: Places

There are two place data stores in WHG, therefore two search options: our "union index," and the WHG database.

Our union index holds records for about 2 million places (having over 5 million names) that have been fully accessioned. That is, to the extent possible, records for the same place are linked, and returned together in a set. Typing a name then pressing the {Enter} key performs a search, presents a list of results, and maps those which have geometry (not all do). Clicking a result item will highlight it on the map, providing further context. Clicking the name link takes you to the index record's "portal" page where any number of attestation "cards" drawn from our core datasets (grey banner) and multiple contributed datasets (beige banner) are gathered.

The database search option queries records in the WHG database from all published datasets, whether they have been fully accessioned (i.e. reconciled against the union index) or not. Datasets are made public on request by their owners, and following a review by WHG editorial staff.

Pre-filters are available to constrain both kinds of place searches: a) broad feature class, b) temporal (earliest, latest years), and c) spatial (bounds of world regions, modern countries, and user "study areas."). Results can be sorted and further filtered by specific place type or modern country bounds.

The Data dashboard page lists any Datasets and Collections created by a user or for which they are a collaborator, as well as any Study Areas they have created. Clicking an "add new" or "create new" link starts the process of creating and managing a Dataset, Study Area, or Collection. The 'create' pages in each case describe how to proceed.

Datasets
Place data

Registered users can create Place datasets by uploading files in one of two formats: the expressive GeoJSON-LD based Linked Places format ("LP format" for short), or the simpler LP-TSV format. Considerations for making the choice are found in the "Choosing an upload data format" tutorial.
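To give a feel for the more expressive of the two formats, here is a minimal sketch of a single place record shaped like a Linked Places feature and wrapped in a GeoJSON FeatureCollection. It is simplified and hypothetical: the example.org identifier and the years are placeholders, the toponyms() helper is ours, and a real upload also carries a JSON-LD context and usually source citations. The Linked Places specification remains the authority on field names and required elements.

```python
import json

# Hypothetical record: the example.org identifier and all years are
# illustrative placeholders, not data taken from WHG.
record = {
    "@id": "http://example.org/places/p0001",
    "type": "Feature",
    "properties": {"title": "Istanbul", "ccodes": ["TR"]},
    "names": [
        {"toponym": "Istanbul"},
        {
            "toponym": "Constantinople",
            # A temporally scoped name variant (placeholder years).
            "when": {"timespans": [{"start": {"in": "0330"},
                                    "end": {"in": "1930"}}]},
        },
    ],
    "types": [{"identifier": "aat:300008347", "label": "inhabited places"}],
    "geometry": {"type": "Point", "coordinates": [28.97, 41.01]},  # lon, lat
    "links": [{"type": "closeMatch", "identifier": "wd:Q406"}],
}


def toponyms(feature):
    """Return every name variant carried by a record, in file order."""
    return [n["toponym"] for n in feature.get("names", [])]


# Wrap the record in a FeatureCollection and serialize it as JSON text.
collection = {"type": "FeatureCollection", "features": [record]}
text = json.dumps(collection, ensure_ascii=False, indent=2)
print(toponyms(record))
```

The same place expressed in LP-TSV would be a single row of delimited values, which is simpler to prepare but cannot attach a separate time span to each name variant in this way.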

Help icons for each field in the "Upload dataset" form provide popup instruction for filling it.

Uploaded data files are validated for adherence to the relevant format spec, LP or LP-TSV. If there are formatting errors, details of the errors are displayed (insofar as possible) and after correcting them the file upload can be attempted again.
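Checking a file locally before upload can save a round trip. The sketch below is not WHG's validator; it only illustrates the kind of checks involved for a tab-delimited file. The REQUIRED column list and the lon/lat range check are assumptions made for this example, so consult the LP-TSV specification for the authoritative rules.

```python
import csv
import io

# Assumed set of required columns; check the LP-TSV spec for the real list.
REQUIRED = ("id", "title", "title_source")


def check_lp_tsv(text):
    """Return a list of human-readable problems found in tab-delimited text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        return ["header: missing required column(s): " + ", ".join(missing)]

    errors = []
    # Data rows start on line 2 of the file (line 1 is the header).
    for n, row in enumerate(reader, start=2):
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                errors.append(f"row {n}: empty required field '{col}'")
        lon = (row.get("lon") or "").strip()
        lat = (row.get("lat") or "").strip()
        if lon or lat:
            try:
                x, y = float(lon), float(lat)
            except ValueError:
                errors.append(f"row {n}: lon/lat must both be numeric")
            else:
                if not (-180 <= x <= 180 and -90 <= y <= 90):
                    errors.append(f"row {n}: coordinates out of range")
    return errors


sample = (
    "id\ttitle\ttitle_source\tlon\tlat\n"
    "1\tCuzco\tAn atlas\t-71.97\t-13.52\n"
    "2\t\tAn atlas\t200\t10\n"
)
for problem in check_lp_tsv(sample):
    print(problem)
```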

Upon successful upload, a Dataset Portal page is displayed.

Collections

Registered users can create two kinds of collections, Place Collections, and Dataset Collections. Significant enhancements are in development for Version 3.

  • Place Collections are thematic sets of place records already in WHG. Each record in a Place Collection can be annotated with a note, temporal information, and user-defined tags. The collection itself can include an accompanying essay, an image, and links to relevant external resources.
  • Dataset Collections are sets of published datasets already in the WHG system. Linking datasets in this way allows users and collaborating groups to assemble a 'Gazetteer of {x}' from multiple discrete sources, and to present them together.

Study Areas

These are user-created, named polygon bounds used to constrain the searches performed in the reconciliation process. See the Reconciliation section of this Guide.

The Dataset portal is private to the dataset owner and designated collaborators. It has several tabbed sections: Metadata, Browse, Linking, Collaborators, and Notes & Log.

Metadata

This section displays user-created and auto-generated metadata for a dataset and its most recent uploaded data file source. Several fields can be edited. Status statistics are displayed on the right side of the screen: initial counts of rows, name variants, links, and geometries, as well as counts of link and geometry records added during the reconciliation review process.

Browse

This section combines a sortable, searchable list of the records currently in the dataset, and a map displaying any geometry it includes. Once a reconciliation task has been run, a column and filter dropdown are added, to manage record-level status. Note that new links and geometry from authority records matched in the reconciliation review process are reflected in each record's info box under the map, as new place_link and/or place_geom records are written with each match.

Linking

Linking is a broad term for what our reconciliation service does. Tasks are initiated from this section and listed for access to review screens, a process outlined in a later section. A summary of the initial results is generated and displayed in a list, with links provided to access the review screen.

It is also possible to delete the task, its associated hits, and any match records created so far in review work. Caution! There is no recovery from these clearing actions!

Contributing a dataset to WHG

After a dataset has been uploaded and, using our reconciliation services, augmented with as many links (matches) to modern authorities as possible (see Reconciliation), it can be considered for accessioning; that is, contributed to the WHG "union index." At this time all accessioning will be initiated by the WHG project team, and review performed by the dataset owner and their designated collaborators, if any.

Accessioning a dataset to WHG entails reconciling its records to the WHG index. Each record is compared with the WHG index to see if the referenced place already has one or more attestations from another dataset. If an incoming record has a concordance "link" in common with one already in the index, it will be automatically indexed as a "child" or "sibling" of the matched record (owners have the option to review these or auto-accept them).

Records for which there are no prospective matches are automatically indexed as a "parent," in effect a new seed record for a place. Records for which there are prospective matches are queued for review, just as with Wikidata reconciliation tasks. Matched records become part of the set for a given place. Unmatched records become parent "seeds."

In this way, accessioning relies on incoming Place records having as many associated "place_link" records as can be obtained.
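The three outcomes described above can be summarized as a small decision function. This is a sketch of the logic as this Guide describes it, not WHG's implementation; the function name, the outcome labels, and the record identifiers are all hypothetical.

```python
def accession_action(incoming_links, index_hits):
    """Decide what happens to one incoming record during accessioning.

    incoming_links: authority identifiers already attached to the record.
    index_hits: prospective matches from the index, each a dict with an
                'id' and a list of 'links'.
    """
    incoming = set(incoming_links)
    # Index records sharing at least one concordance link with the incoming one.
    shared = [h["id"] for h in index_hits if incoming & set(h["links"])]
    if shared:
        return ("auto_link", shared)        # indexed as child or sibling
    if index_hits:
        return ("review", [h["id"] for h in index_hits])  # queued for review
    return ("new_seed", [])                 # becomes a new parent record


# Hypothetical index hits and identifiers.
hits = [
    {"id": "idx-1", "links": ["wd:Q406"]},
    {"id": "idx-2", "links": []},
]
print(accession_action(["wd:Q406"], hits))
print(accession_action([], hits))
print(accession_action([], []))
```

The first branch is why a record rich in place_link records is easier to accession: a shared identifier settles the question without manual review.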

Collaborators

Owners of a dataset can designate any number of registered WHG users as collaborators, in either a "co-owner" or "member" role. Co-owners have complete control over a dataset; members can view the dataset and perform review of prospective matches generated by reconciliation tasks.

Notes & Log

Actions related to datasets are logged and listed here.

User-created notes and comments from the Reconciliation Review and Place Portal pages are listed here, with a download option. NOTE: Comments suggest followup action, e.g. correction of errors. We are contemplating how such corrections might be accomplished within the WHG interface.

Reconciliation is the process of linking your place records to existing records in online place name authorities and, as a last step, to the WHG union index. The reconciliation source offered by WHG at this time is our index of 3.5 million Wikidata place records. The purpose of reconciliation to external sources is to augment a dataset with new concordances ("place_link" records in WHG), and optionally, new geometry ("place_geom" records) derived from the authority.

The primary motivation for many users is finding geographic coordinates for unlocated place names, in order to make their data more fully mappable and amenable to spatial and network analyses. Beyond that, making a dataset as rich with links to external authorities as possible is a crucial step in making it a solid contribution to the WHG index. Once historical place names are geolocated, it is to everyone's benefit if that work is shared!

In each case, the Wikidata index is queried for matches with your records, one by one. In fact, each query consists of multiple "passes." The first looks for any authority identifiers in common. If found, the records can be automatically linked. The next pass includes all the context your records may contain: a primary name plus all variants; one or more place types; bounding modern countries or user-defined Study Area as a spatial constraint; coordinate geometry for the feature. Subsequent passes relax the query if no potential matches (hits) are found. Resulting hits for all records are queued for review by the dataset creator, in batches labeled "pass 0," "pass 1" and "pass 2."
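The idea of successively relaxed passes can be sketched against a tiny in-memory list standing in for the authority index. Everything here is an assumption made for illustration: the field names, the candidate identifiers, which criteria each pass applies, and the mapping of those criteria onto the "pass 0" to "pass 2" labels. WHG's actual queries are not reproduced.

```python
def reconcile(record, authority):
    """Return (pass_label, hit_ids) from the first pass that finds any hits."""
    names = {n.lower() for n in record["names"]}

    def shares_link(c):
        # Strictest test: an authority identifier in common.
        return bool(set(record.get("links", [])) & set(c.get("links", [])))

    def shares_name(c):
        return bool(names & {n.lower() for n in c["names"]})

    def full_context(c):
        # Name plus place type plus spatial constraint (here: country code).
        return (shares_name(c)
                and c["type"] == record["type"]
                and c["country"] in record["countries"])

    passes = (("pass 0", shares_link),
              ("pass 1", full_context),
              ("pass 2", shares_name))   # relaxed: name variants only
    for label, test in passes:
        hits = [c["id"] for c in authority if test(c)]
        if hits:
            return label, hits
    return None, []


# Hypothetical authority candidates.
authority = [
    {"id": "A1", "names": ["Santiago", "Santiago de Chile"],
     "type": "settlement", "country": "CL", "links": []},
    {"id": "A2", "names": ["Santiago de Compostela", "Santiago"],
     "type": "settlement", "country": "ES", "links": []},
]
record = {"names": ["Santiago"], "type": "settlement",
          "countries": ["CL"], "links": []}
print(reconcile(record, authority))
```

With the country constraint the record yields a single hit; if the constraint excluded both candidates, the relaxed name-only pass would return both and leave the choice to the reviewer.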

Wikidata

WHG maintains a locally indexed copy of about 3.6 million Wikidata place records. Wikidata reconciliation tasks process 150-180 records per minute. Almost all Wikidata records contain geometry and concordances with other authorities. When you confirm a Wikidata match, we create a "place_link" record not only for the Wikidata ID but for concordances with several other authorities if found, including Getty TGN, Bibliothèque nationale de France, Pleiades, Wikipedia, GeoNames, VIAF, Deutsche Nationalbibliothek, and Library of Congress.
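As a rough picture of what confirming a match produces, the sketch below expands one accepted Wikidata hit into several link rows, one for the Wikidata ID and one per concordance. The row layout, the function name, the namespace prefixes, and the zero-filled concordance values are assumptions for illustration and do not reflect the actual WHG database schema.

```python
def links_from_match(place_id, wikidata_id, concordances):
    """Build link rows for one confirmed match.

    concordances maps a namespace prefix (e.g. 'tgn', 'gn') to the
    identifier found on the matched Wikidata record.
    """
    rows = [{"place_id": place_id, "type": "closeMatch",
             "identifier": f"wd:{wikidata_id}"}]
    for prefix in sorted(concordances):
        rows.append({"place_id": place_id, "type": "closeMatch",
                     "identifier": f"{prefix}:{concordances[prefix]}"})
    return rows


# Placeholder place id and concordance values, not real identifiers.
rows = links_from_match(1001, "Q406", {"tgn": "0000000", "gn": "0000000"})
for row in rows:
    print(row["identifier"])
```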

We hope to periodically update our Wikidata index in the future.

Study Areas

Many toponyms appear repeatedly in multiple locations, referring to different places, often far apart (e.g. Latin America and the Iberian Peninsula). To aid the reconciliation process, users can define a Study Area that will constrain the search for matches to its bounds, by a) entering a series of 2-letter country codes, which will generate a hull shape, or b) by drawing a polygon on a map. Alternatively, a pre-defined region can be chosen from a separate dropdown menu.
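To illustrate how a hull shape can act as a spatial constraint, the sketch below computes a convex hull around a handful of points and tests whether a candidate location falls inside it. This is an assumption-laden illustration: WHG's hull may be built differently, the input points are invented stand-ins for country outlines, and longitude and latitude are treated as flat x and y, which ignores the antimeridian and map projection.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the other chain's first.
    return lower[:-1] + upper[:-1]


def in_hull(hull, p):
    """True if p lies inside or on a counter-clockwise convex hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


# Invented (lon, lat) points standing in for the outlines of chosen countries.
outline = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
study_area = convex_hull(outline)
print(study_area)
print(in_hull(study_area, (2, 2)), in_hull(study_area, (5, 2)))
```

A candidate match outside the study area would simply be excluded from the hit list, which is how two distant places sharing one name are kept apart.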

Reviewing "hits"

Prospective matches to Wikidata (and discovered authority IDs) are not automatically added to the WHG database during reconciliation; i.e. new "place_geom" and "place_link" records augmenting the dataset are created only by the Reconciliation Review step performed by the dataset creator or specified collaborators.

The Review page presents those dataset records that got one or more prospective matches ("hits") on the left of the screen, one by one, and a list of those hits on the right. A small map displays geometry for your record (if any) with a green marker, and geometries of all hits with orange markers. Hovering over the globe symbol in a hit item highlights its position in the map.

The objective is to determine, for each dataset record, whether any of the hits are a "closeMatch" to it. By default, "no match" is selected for each hit. The reviewer can optionally change the selection to "closeMatch" for one or more hits. In any case, clicking save records the choice and advances to the next record. Attributes of the dataset record and listed hits provide context to assist making the assessment.

What is a Match?

The term closeMatch used by WHG comes from the Simple Knowledge Organization System (SKOS) vocabulary, a data model commonly used in linked data applications. For WHG, a Place is considered a skos:Concept, described by data records in Linked Places format, so an assertion of skos:closeMatch between your record and that of an external authority indicates:

"...(the) two concepts are sufficiently similar that they can be used interchangeably in some information retrieval applications"

Note that closeMatch is a super-property of exactMatch; that is, every exactMatch is also a closeMatch. Clear? Oh well! Practically speaking, for WHG asserting closeMatch serves as a linking "glue." Specifically, records that share one or more common authority links will be conflated (linked) in our "union index" only, and returned together in response to queries. For example, records for Abyssinia and Ethiopia from different sources share two links, to a DBpedia record and a TGN record. Therefore, they appear together when searching for either Abyssinia or Ethiopia. They are not conflated or linked in any way within the WHG database.
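The "glue" effect of shared links can be sketched as a grouping problem: any two records with an authority identifier in common end up in the same set, and sets chain together through further shared identifiers. The sketch uses a small union-find structure; the record keys and the zero-filled identifiers are placeholders, and this is an illustration of the idea rather than the indexing code WHG runs.

```python
from collections import defaultdict


def conflate(records):
    """Group record ids that share at least one authority link.

    records: dict mapping a record id to an iterable of link identifiers.
    Returns a sorted list of sorted groups.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent pointers to the group's root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # link identifier -> first record that carried it
    for rid, links in records.items():
        for link in links:
            if link in first_seen:
                parent[find(rid)] = find(first_seen[link])
            else:
                first_seen[link] = rid

    groups = defaultdict(list)
    for rid in records:
        groups[find(rid)].append(rid)
    return sorted(sorted(g) for g in groups.values())


# Placeholder identifiers; the dataset prefixes 'a', 'b', 'c' are hypothetical.
records = {
    "a:abyssinia": ["dbp:0000000", "tgn:0000000"],
    "b:ethiopia": ["tgn:0000000", "dbp:0000000"],
    "c:beijing": ["wd:Q956"],
}
print(conflate(records))
```

A search that reaches any member of a group can then return the whole group, while the underlying dataset records stay separate and unaltered.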

What now?

WHG v2.1 includes several significant improvements to v1, but development of the platform is still at a relatively early stage. The highest immediate priority is adding significantly more historical data. Apart from adding content, several additional features are planned, and now in development towards a Version 3 to be released in 2024. We are working to ensure that WHG can be sustained and improved over the long term.

What is a gazetteer?

In its simplest form, a gazetteer is a list of place names. Typically, digital gazetteers provide some level of description for each listed place, e.g. its type and geographic coordinates. Historical gazetteers include prior names and some level of temporal information. The Linked Places format (spec; tutorial) used by WHG allows us to record temporally scoped name variants, coordinates, place types, and relations with other places, as well as related descriptions and depictions.

Who uses historical gazetteers?

Historical gazetteers are useful for anyone concerned with the history of a place or group of places, including researchers, teachers, students, history buffs, and genealogists. They help connect our present to our past.

What is a place?

The term has multiple related meanings. A few we like: (i) one answer to a where question, (ii) a setting for events and activity, (iii) "...an object resulting from a shared identification of a location. As an object, it may become a part of a network and participate in events" (Purves, Winter & Kuhn 2019). Attributes of places include names, locations, and types, all of which routinely change over time.

What is a place in WHG?

A place record represents one or more attestations of a place found in historical sources or in modern gazetteers and name authorities. Due to our use of Linked Places format, a WHG place record may include any number of names, types, locations (geometry), relations, descriptions, depictions, and links with other records. The entire record can be temporally scoped with a "when" attribute, as can any individual name, type, geometry or relation.

What is the geographic and temporal scope of WHG?

The geographic scope is global, and the temporal scope is roughly the span of written history.

Are there "preferred names" in WHG?

No. A place record can include any number of name and language variants; in fact they are encouraged. But we ask that each record have an assigned "title," which serves principally as a headword in lists. The title of records in our union index is the title of its "seed" record.

Does WHG support multiple languages?

The name variants found in WHG are of numerous languages and scripts. Unfortunately, to date most of the contributed names do not arrive tagged with language-script codes. Separately, the internationalization of the WHG site is a high priority for a future phase of work.

How and why does WHG use modern country boundaries?

Modern country boundaries are used in WHG primarily to filter queries and spatially constrain reconciliation results. Country codes are included in place records for this purpose and do not reflect a given place's historical "containment" in, or association with, administrative areas at any given time in the past.

What is the WHG vocabulary of place types?

We have adopted a set of 179 place type concepts drawn from the approximately 900 contained in the Getty Art & Architecture Thesaurus. Our focus in selecting these was on settlements (inhabited places), administrative divisions, sites, and natural features. Place types are an important facet for search, filtering, and reconciliation of place records.

Can contributions include urban-scale features?

We are not actively soliciting urban-scale place data (e.g. buildings, streets, monuments, plazas) at this time, but recognize there is growing interest in systems to manage such information.