Connect your nodegoat environment to Wikidata, BnF, Transkribus, Zotero, and others

CORE Admin

The nodegoat Guides have been extended with a new section on 'Ingestion Processes'. An Ingestion Process allows you to query an external resource and ingest the returned data into your nodegoat environment. Once the data is stored in nodegoat, it can be used for tagging, referencing, filtering, analysis, and visualisation purposes.

You can ingest data in order to gather a set of people or places that you intend to use in your research process. You can also ingest data that enriches your own research data. Any collection of primary or secondary sources that has been published to the web can be ingested as well. This means that you can ingest transcription data from Transkribus, or your complete (or filtered) Zotero library.

The development of the Ingestion Process was part of the project 'Dynamic Data Ingestion (DDI)' (presented in this workshop series) and builds upon the Linked Data Resource feature (initially commissioned by the TIC-project in 2015 and extended in collaboration with ADVN in 2019).

Every nodegoat user is able to make use of these features. In addition to the examples listed below, any endpoint that outputs JSON or XML can be queried. nodegoat data can be exported in CSV and ODT formats, or published via the nodegoat API as JSON and JSON-LD.

Wikidata

The first two guides deal with setting up a data model for places and people, and with ingesting geographical and biographical data from Wikidata: 'Ingest Geographical Data' and 'Ingest Biographical Data'. A number of SPARQL queries are needed to gather the selected data. As writing these queries can be challenging, we have added two commented queries (here and here) that explain the rationale behind them.
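To give an impression of what such a query looks like, here is a minimal sketch in Python that sends an illustrative SPARQL query to the Wikidata endpoint. The query itself (cities with their coordinates) is an assumption made for this example and not the exact query used in the guides.

    import requests

    WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

    # Illustrative query: cities (Q515) and their coordinate locations (P625).
    QUERY = """
    SELECT ?city ?cityLabel ?coord WHERE {
      ?city wdt:P31 wd:Q515 ;
            wdt:P625 ?coord .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 100
    """

    response = requests.get(
        WIKIDATA_SPARQL,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "nodegoat-ingestion-example/0.1"},
    )
    response.raise_for_status()

    for row in response.json()["results"]["bindings"]:
        print(row["cityLabel"]["value"], row["coord"]["value"])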

These first two guides illustrate a common point in working with relational data (e.g. data coming from graph databases or relational databases): you first need to ingest the referenced Objects (in this case universities) before you can make references to these Objects (in this case people who attended the universities). The sketch below illustrates this two-pass pattern.
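This is a minimal sketch of the pattern only, assuming two hypothetical helper functions (ingest_object and link_objects) that stand in for what a configured Ingestion Process does internally; they are not part of any real nodegoat API.

    # Pass 1: create the referenced Objects and remember their identifiers.
    def ingest_universities(rows):
        qid_to_object = {}
        for row in rows:
            object_id = ingest_object(type="University", name=row["name"])
            qid_to_object[row["qid"]] = object_id
        return qid_to_object

    # Pass 2: create the people, resolving references to the Objects
    # that were ingested in pass 1.
    def ingest_people(rows, qid_to_object):
        for row in rows:
            person_id = ingest_object(type="Person", name=row["name"])
            university_id = qid_to_object[row["university_qid"]]
            link_objects(person_id, "attended", university_id)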

A Chronological Visualisation that allows you to explore the distribution in time of the ingested data.

The third guide covers the importance of external identifiers. External identifiers can be added manually, as described in the guide 'Add External Identifiers', or ingested from a resource like Wikidata, as described in the newly added guide 'Ingest External Identifiers'.
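As a hedged sketch of what ingesting an external identifier can look like, the following Python snippet asks Wikidata for the VIAF ID (property P214) of a single item. The item Q1339 (Johann Sebastian Bach) is only an example; which items and identifier properties you query depends on your data model.

    import requests

    # Fetch the VIAF ID (P214) of Q1339 (Johann Sebastian Bach) as an example.
    QUERY = """
    SELECT ?viaf WHERE {
      wd:Q1339 wdt:P214 ?viaf .
    }
    """

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "nodegoat-ingestion-example/0.1"},
    )
    for row in response.json()["results"]["bindings"]:
        print("VIAF:", row["viaf"]["value"])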

Bibliothèque nationale de France

The guide 'Ingest Publication Data' shows how you can make use of the SPARQL endpoint of the Bibliothèque nationale de France (BnF).

The configured Ingestion Process sends a query containing the previously ingested VIAF identifiers to the BnF in order to identify an author and to retrieve a list of that author's publications.
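The following Python sketch shows the general shape of such a lookup against the BnF SPARQL endpoint. The property paths (owl:sameAs to a VIAF URI, dcterms:creator, dcterms:title) and the placeholder VIAF URI are assumptions made for illustration; the guide documents the exact query to use.

    import requests

    BNF_SPARQL = "https://data.bnf.fr/sparql"

    # Assumed property paths; the placeholder VIAF URI must be replaced
    # with an identifier ingested earlier.
    QUERY = """
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX dcterms: <http://purl.org/dc/terms/>

    SELECT ?work ?title WHERE {
      ?author owl:sameAs <http://viaf.org/viaf/123456789> .
      ?work dcterms:creator ?author ;
            dcterms:title ?title .
    }
    LIMIT 50
    """

    response = requests.get(
        BNF_SPARQL,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
    )
    for row in response.json()["results"]["bindings"]:
        print(row["title"]["value"])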

This guide also describes how to run a similar query against the SPARQL endpoint of the Koninklijke Bibliotheek. After these queries have run, you can browse the publication data of a single author in both libraries.

Transkribus

Just as you can query centralised data repositories, you can also query endpoints that publish user-generated data. A good example of this is the Transkribus transcription platform, which publishes generated transcription data via its API. The guide 'Ingest Transcription Data from Transkribus' shows you how to connect your nodegoat environment to the Transkribus API.

Once you have finalised the transcription process in Transkribus, you can configure Ingestion Processes that transfer the documents and recognised texts to your nodegoat environment. This allows you to search the full-text data of the source documents within your nodegoat environment. You can also start text tagging the ingested transcriptions in order to, for example, identify musicians on concert programs or books mentioned in letters.
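As a rough illustration, the Python sketch below logs in to the Transkribus REST API and lists the collections an account has access to. The endpoint paths and field names follow the publicly documented TrpServer interface, but treat them as assumptions and verify them against the current Transkribus documentation.

    import requests

    BASE = "https://transkribus.eu/TrpServer/rest"

    session = requests.Session()

    # Log in with your Transkribus credentials (placeholders below);
    # the server sets a session cookie used by subsequent requests.
    login = session.post(
        f"{BASE}/auth/login",
        data={"user": "you@example.org", "pw": "your-password"},
    )
    login.raise_for_status()

    # List the collections this account can read, as JSON.
    collections = session.get(
        f"{BASE}/collections/list",
        headers={"Accept": "application/json"},
    ).json()

    for collection in collections:
        print(collection["colId"], collection["colName"])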

An example of a transcribed document that has been ingested into nodegoat.

Zotero

If you are using Zotero to store and manage your references, you can ingest your bibliographic data into nodegoat by means of an Ingestion Process. Follow the guide 'Ingest Bibliographic Data from Zotero' to learn how to set this up and how to run updates whenever you have added new references.
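The Python sketch below shows the kind of request involved: it fetches items from a Zotero library via the Zotero Web API (v3) and reads the library version needed for incremental updates. The user ID and API key are placeholders.

    import requests

    USER_ID = "123456"        # your numeric Zotero user ID (placeholder)
    API_KEY = "your-api-key"  # a Zotero API key with read access (placeholder)

    response = requests.get(
        f"https://api.zotero.org/users/{USER_ID}/items",
        params={"format": "json", "limit": 25},
        headers={"Zotero-API-Key": API_KEY},
    )
    response.raise_for_status()

    # Store the library version and pass it back as ?since=<version> on a
    # later run to fetch only new or changed items; this enables updates.
    library_version = response.headers["Last-Modified-Version"]

    for item in response.json():
        print(item["data"].get("title", "(no title)"))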

After you have configured this, you can continue to use Zotero as your reference manager while keeping an up-to-date copy of your bibliographic data in your nodegoat environment. These references can be used throughout your environment to substantiate the claims you make; see the guide 'Add Source References' on how to set this up.

An example of a set of bibliographic data that has been ingested from Zotero into nodegoat.

Consult the documentation to find out how you can configure other Linked Data Resources and Ingestion Processes.
