Pipeline, architecture, and software
- Django 2.2.4 (Python framework)
- PostgreSQL 10 (relational database)
- Elasticsearch 6.6 (index)
- Nginx, Gunicorn (web server and WSGI application server)
- Celery (task queueing)
- Ubuntu 18.04 (operating system)
- Interfaces to our two linked data stores, a relational database (db) and a high-speed index (idx), include a graphical web application (GUI) and APIs.
- Contributed data in Linked Places or LP-TSV format is uploaded by registered users to the database (-> db) using GUI screens.
- Once uploaded, datasets are managed in a set of GUI screens, where they can be browsed and reconciled against Getty TGN and Wikidata. Reconciliation entails initiating a task and reviewing the prospective matches it returns.
- Confirming matches to TGN and/or Wikidata augments the contributed dataset by adding new place_link and, if desired, place_geom records. NOTE: The original contribution can always be retrieved in its original state, i.e. without the records generated by the reconciliation review step.
- Once an uploaded dataset has been reconciled and as many place_link records as possible have been generated for it, it can be accessioned to the WHG index (idx <- db). At this time that step will be performed by WHG staff, however...
- Accessioning to the WHG index is another reconciliation process, so there are again two steps: initiating the task and reviewing results. In this case only some of the results need review. Incoming records that share a link to an external gazetteer (e.g. tgn, geonames, wikidata) with a record already in our index are added automatically and associated with that match and any other similarly linked "siblings".
- Incoming records that share no links with existing index items are candidates to become new "parent" ("seed") records in the index. They must first be reviewed by the dataset owner, as in the earlier reconciliation step.
- That's it!
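
The accessioning rule described in the last two steps can be sketched roughly as follows. This is an illustrative sketch only, not the WHG implementation: the names IndexParent and accession, the in-memory list standing in for the index, and the made-up identifiers are all assumptions. It also simplifies by attaching a record to the first matching parent it finds.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class IndexParent:
    """Hypothetical stand-in for a "parent" ("seed") record in the index."""
    place_id: str
    links: Set[str]                      # external gazetteer links, e.g. "wd:..."
    children: List[str] = field(default_factory=list)  # linked "siblings"


def accession(
    incoming: List[Tuple[str, Set[str]]],
    index: List[IndexParent],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split incoming records into automatic additions and review candidates.

    A record that shares at least one link with an existing parent is attached
    to that parent automatically. A record that shares none is returned as a
    candidate seed, pending review by the dataset owner.
    """
    auto_added: List[Tuple[str, str]] = []
    needs_review: List[str] = []
    for place_id, links in incoming:
        # First parent whose link set intersects the incoming record's links.
        match = next((p for p in index if p.links & links), None)
        if match is not None:
            match.children.append(place_id)
            auto_added.append((place_id, match.place_id))
        else:
            needs_review.append(place_id)
    return auto_added, needs_review


# Made-up identifiers, for illustration only.
demo_index = [IndexParent("whg:1", {"tgn:example-1", "wd:example-1"})]
demo_incoming = [
    ("ds:10", {"wd:example-1"}),   # shares a link, so it is added automatically
    ("ds:11", {"gn:example-2"}),   # shares no link, so it goes to review
]
demo_added, demo_review = accession(demo_incoming, demo_index)
```

In this sketch, only the records returned in the second list would appear in the review screens; the first list is attached to existing index entries without review.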