WHG Walkthrough: Uploading and reconciling data

Follow these steps to upload a sample dataset to WHG and reconcile it against our internal index of the Getty Thesaurus of Geographic Names (TGN).

Register and Upload

  • Register on the site and log in
  • Click menu option 'Data'
  • Click "add new" link
  • Fill in Create Dataset form
    • Title: any string
    • Label: a unique string, 20 or fewer characters
      • Hint: try part of the title + '_' + your initials
    • Description: a brief description of the dataset
    • URI base: leave blank unless the data is published and each record has a distinct URI, e.g. 'http://myorg.org/places/99999'
    • Web page: URL to a project page
    • Public?: Check if it's okay for anyone to view the data
    • Initial file
      • Choose a file from disk: your own, or an example file. Example files are available via a link on the right side of the screen. Click it to download the .zip file to a location you'll remember, then expand it. Select one of the files inside, in either tsv or lpf format.
      • Format: select the one that matches your file, either LP-TSV (delimited text; spec) or Linked Places format (JSON-LD and GeoJSON compatible; spec).
  • Click the "Upload" button.
    • A fairly rigorous validation of the file format is performed, and errors are reported on the right side of the screen. If a formatting problem produces an error you can't interpret, please get in touch via the Contact form and we will help troubleshoot it.
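
The Label hint in the form above can be sketched in Python. Only the 20-character limit and the "part of the title + '_' + your initials" pattern come from the steps above; the function name `suggest_label` and the choice to strip spaces and punctuation are assumptions made for illustration, not WHG behavior. Uniqueness can only be checked by the site itself when the form is submitted.

```python
import re

MAX_LABEL_LENGTH = 20  # limit stated in the Create Dataset form


def suggest_label(title: str, initials: str) -> str:
    """Build a candidate label: part of the title + '_' + initials.

    Hypothetical helper; WHG does not provide this function.
    """
    suffix = "_" + initials.lower()
    if len(suffix) >= MAX_LABEL_LENGTH:
        raise ValueError("initials leave no room for any part of the title")
    # Keep only letters and digits from the title, lowercased (an assumption).
    stem = re.sub(r"[^a-z0-9]+", "", title.lower())
    # Trim the title part so the whole label fits within the limit.
    stem = stem[: MAX_LABEL_LENGTH - len(suffix)]
    return stem + suffix


if __name__ == "__main__":
    print(suggest_label("Owen Lattimore Places", "kg"))
```

If the suggested label is already taken, shortening the title part further or adding a digit before the underscore keeps it within the limit.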

If there are no errors, the file's contents are inserted into the WHG database and you are directed to the dataset's Portal page.

At this stage, you can:

  • browse the contents of the uploaded data on the Browse tab
  • initiate reconciliation tasks to find prospective matches in Wikidata and/or the Getty Thesaurus of Geographic Names (TGN)

Reconciliation

WHG reconciliation services allow dataset owners to augment their data with a) additional geometry for more complete mapping and analysis, and b) links to (matches with) modern name authority resources such as Getty TGN and Wikidata and, via Wikidata concordances, GeoNames, VIAF, and the Library of Congress. Those links are the essential "glue" enabling the semi-automation of accessioning to WHG.

  • Leave default settings in place, with Getty TGN selected
  • Click the "Start" button
  • For each record in the uploaded data file, a search is performed against an indexed copy of most of TGN (~1.8m records). Up to three passes (queries) are made; if the first returns no results, the second is performed, and so on. These are labeled pass1, pass2, and pass3 in the results.
  • Upon completion, a result summary is displayed, with links to review the prospective matches (hits) for each pass.
  • Click on the first 'review' link in the list on the right to begin
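
The pass logic described above (try pass1; only if it returns nothing, try pass2; then pass3) can be sketched as follows. This is a minimal illustration under stated assumptions: the three query functions, the tiny in-memory `INDEX`, and its `example:` identifiers are invented stand-ins. The actual queries WHG runs against its TGN index are not described in this walkthrough.

```python
from typing import Callable, List, Optional, Tuple

# Toy stand-in for the indexed authority records (hypothetical data).
INDEX = [
    {"id": "example:1", "name": "Alexandria", "country": "EG"},
    {"id": "example:2", "name": "Alexandria", "country": "US"},
    {"id": "example:3", "name": "Samarkand", "country": "UZ"},
]


def exact_name_and_country(record: dict) -> List[dict]:
    # Strictest stand-in query: name and country must both match.
    return [p for p in INDEX
            if p["name"] == record["name"] and p["country"] == record.get("country")]


def exact_name(record: dict) -> List[dict]:
    # Looser stand-in query: name only.
    return [p for p in INDEX if p["name"] == record["name"]]


def name_prefix(record: dict) -> List[dict]:
    # Loosest stand-in query: first four letters of the name, case-insensitive.
    prefix = record["name"][:4].lower()
    return [p for p in INDEX if p["name"].lower().startswith(prefix)]


PASSES: List[Tuple[str, Callable[[dict], List[dict]]]] = [
    ("pass1", exact_name_and_country),
    ("pass2", exact_name),
    ("pass3", name_prefix),
]


def reconcile(record: dict, passes=PASSES) -> Tuple[Optional[str], List[dict]]:
    """Run each pass in order and stop at the first one that returns hits."""
    for label, query in passes:
        hits = query(record)
        if hits:
            return label, hits
    return None, []  # no pass produced any hits


if __name__ == "__main__":
    for rec in [{"name": "Alexandria", "country": "EG"}, {"name": "Samarqand"}]:
        label, hits = reconcile(rec)
        print(rec["name"], label, [h["id"] for h in hits])
```

Because later passes run only when earlier ones find nothing, every hit for a record carries a single pass label, which is why the result summary can group review links by pass.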

Reconciliation Review

Once a reconciliation task is complete, dataset owners and team members must review the prospective matches, declaring match/no match for each. This is made easier with our Review screen.

  • The Reconciliation Review screen presents each uploaded record that had any hits, one at a time on the left, with a list of its hits on the right.
  • The default selection for all hits is "no match." If any hit on the right is a 'close match' with your record, click the appropriate radio button. Whether or not you selected a match, click the "Save" button to record your decisions. The screen then advances to the next record and the previous one is removed from the queue.
  • Assertions of matches are saved to the WHG database as 'place_link' records, associated with your dataset's place record.
  • Additionally, if the "accept geometries in matches" box was checked when creating this reconciliation task (default is "yes"), any geometries in the matched authority record (TGN in this case) are saved as new place_geom records associated with your dataset's place record.
  • A help icon links to an explanation of the formal relation, "closeMatch."
  • The "related" choice is available experimentally. Any "related" assertions are recorded, but are not reflected in the interface at this time.
  • Note that if your record has a geometry, it will show up in the map as a green marker, and geometries from all of the hits appear as orange markers. Hovering over the globe symbol for a hit will highlight that record on the map.
  • Note that after the first save, an undo link appears on the left side of the grey banner. This will undo any result of the last save and return that record to the queue.
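
What a single "Save" records, as described in the list above, can be sketched like this. The names place_link, place_geom, and closeMatch come from this walkthrough; the `Hit` class, the dictionary field names, and the `save_review` function are hypothetical and do not reflect the actual WHG schema. The experimental "related" choice and the undo link are left out of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Hit:
    """One prospective match from the authority (hypothetical shape)."""
    authority_id: str
    geometries: List[dict] = field(default_factory=list)


def save_review(place_id: int,
                hits: List[Hit],
                decisions: Dict[str, str],
                accept_geometries: bool = True) -> Tuple[List[dict], List[dict]]:
    """Turn review decisions for one record into link and geometry rows."""
    place_links: List[dict] = []
    place_geoms: List[dict] = []
    for hit in hits:
        # Every hit defaults to "no match" unless the reviewer chose otherwise.
        if decisions.get(hit.authority_id, "no match") != "closeMatch":
            continue
        place_links.append({"place_id": place_id,
                            "relation": "closeMatch",
                            "authority_id": hit.authority_id})
        # Geometries are copied only if the task was created with that option.
        if accept_geometries:
            for geom in hit.geometries:
                place_geoms.append({"place_id": place_id,
                                    "source": hit.authority_id,
                                    "geometry": geom})
    return place_links, place_geoms


if __name__ == "__main__":
    hits = [Hit("example:1", [{"type": "Point", "coordinates": [29.9, 31.2]}]),
            Hit("example:2")]
    links, geoms = save_review(42, hits, {"example:1": "closeMatch"})
    print(len(links), len(geoms))
```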

After reviewing all hits from all passes, affirming any matches discovered, you will have effectively augmented your dataset in the WHG database with new place_link and place_geom records. Those additions will be reflected in the Browse tab map and record details. Also, your dataset is now prepared for accessioning to the WHG index. Note: At this time, accessioning will be performed by WHG staff in consultation with contributors.