Ingest External Identifiers

In this guide we will enrich the data that has been ingested in the guide 'Ingest Biographical Data'. We will run a query for every Object of the Type 'Person' in order to fetch the VIAF identifier of each person. The URI stored in each Object will be sent as part of the query to the Wikidata endpoint and the resulting VIAF ID will be stored in a newly created Object Description.

Model  

To store this data we need to update the Object Type 'Person' that was configured in the guide 'Ingest Biographical Data'. Alternatively, you can use the data model created in the guide 'Create your first Object Type'.

Go to Model and go to the tab 'Object Types'. Click the blue 'edit' button at the Object Type 'Person'. Specify an Object Description with the name 'VIAF Identifier'. Leave the value type set to 'String' and check the checkboxes 'Quick Search' and 'Overview'.

Click 'Save Type' to save your Object Type.

Linked Data Resource  

Go to Model and go to 'Linked Data'. Click 'Add Linked Data Resource' and give the resource a name like 'Wikidata VIAF Identifiers'. Enter the request URL 'https://query.wikidata.org/sparql?query=' in the 'URL' input field and enter '&format=json' in the 'URL Options'. Leave the 'URL Headers' empty.

We will use this SPARQL query:

SELECT ?viaf_id
WHERE {
<http://www.wikidata.org/entity/Q355237> wdt:P214 ?viaf_id.
}

The Ingestion Process that we will create needs to be able to adjust the Wikidata URI when it runs this query for every Object of the Type 'Person'. We can facilitate this by indicating the parts that can be modified when the query runs.

We do this by using the [query] and [variable] tags. The [query] tag is used to indicate an adjustable part of the query in which one or multiple [variable] tags exist. [variable] tags are assigned a value by the Ingestion Process every time the query runs.

When we implement this, the query looks like this:

SELECT ?viaf_id
WHERE {
[query=getVIAF]
<[variable=wikidataURI]http://www.wikidata.org/entity/Q355237[/variable]> wdt:P214 ?viaf_id.
[/query]
}
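
To make the role of the tags concrete, the substitution can be sketched in Python. This is only an illustration of the idea and not nodegoat's own implementation: the function name fill_query, the regular expressions, and the rule that the text inside a [variable] tag acts as a default value are all assumptions made for this sketch, and 'Q42' is just an example entity.

```python
import re

# The tagged query from this guide.
TEMPLATE = """SELECT ?viaf_id
WHERE {
[query=getVIAF]
<[variable=wikidataURI]http://www.wikidata.org/entity/Q355237[/variable]> wdt:P214 ?viaf_id.
[/query]
}"""

# Assumed tag syntax: [variable=name]default[/variable] and [query=name]...[/query].
VARIABLE_TAG = re.compile(r"\[variable=(\w+)\](.*?)\[/variable\]", re.DOTALL)
QUERY_TAG = re.compile(r"\[/?query(?:=\w+)?\]\n?")


def fill_query(template: str, values: dict[str, str]) -> str:
    """Replace each [variable] tag with a supplied value, or keep its default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return values.get(name, default)

    filled = VARIABLE_TAG.sub(substitute, template)
    # The [query=...] and [/query] markers only delimit the adjustable part,
    # so they are dropped from the query that is actually sent.
    return QUERY_TAG.sub("", filled)


# One query per Object: here the variable receives another example URI.
print(fill_query(TEMPLATE, {"wikidataURI": "http://www.wikidata.org/entity/Q42"}))
```

Calling fill_query(TEMPLATE, {}) keeps the default URI and therefore yields the untagged query shown earlier.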

Paste the tagged query in the 'Query' input field. Click the green 'test' button to run the query. This action will combine the values in the 'URL' field, the 'URL Options' field and the 'Query' field into a single request.
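
How the three fields might be combined into one request URL can be sketched as follows. This is a minimal sketch under assumptions: the order URL + query + URL Options follows the field values used in this guide, while the percent-encoding step and the function name build_request_url are assumptions rather than documented nodegoat behaviour. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import quote

# Field values as entered in this guide.
URL = "https://query.wikidata.org/sparql?query="
URL_OPTIONS = "&format=json"
QUERY = """SELECT ?viaf_id
WHERE {
<http://www.wikidata.org/entity/Q355237> wdt:P214 ?viaf_id.
}"""


def build_request_url(url: str, query: str, options: str) -> str:
    """Join the 'URL', 'Query' and 'URL Options' values into one request URL."""
    # Percent-encode the query so that spaces, '?', '<', '>' and line breaks
    # are safe to use inside a URL.
    return url + quote(query, safe="") + options


request_url = build_request_url(URL, QUERY, URL_OPTIONS)
print(request_url)
```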

After the query has run, the results are shown in the 'Response' input field. This field allows you to inspect the results and to verify that all specified variables have been returned. By clicking the green 'use' button, the returned data populates the mapping options.

After the 'use' button is clicked, you can select the positions in the returned data that contain the URI and the label. As this query returns neither a URI nor a label, we use the 'viaf_id' value as a placeholder for both. To do this, set the dropdown menus next to 'URI' and 'Label' to: {"results":{"bindings":{"[]":{"viaf_id":{"value":""}}}}}.

To process the 'viaf_id' value we add a custom key/value pair. Enter the name for the key: 'VIAF Identifier'. Change the dropdown menu for this value to: {"results":{"bindings":{"[]":{"viaf_id":{"value":""}}}}}.
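
The selected path points at the 'value' of 'viaf_id' inside every entry of 'results' > 'bindings'. The Python sketch below walks the same path by hand. The sample response is made up for illustration and only follows the general layout of SPARQL JSON results: '000000000' is a placeholder and not a real VIAF ID, and the function name extract_viaf_ids is hypothetical.

```python
import json

# Made-up response in the SPARQL JSON results layout.
# "000000000" is a placeholder, not a real VIAF identifier.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["viaf_id"]},
  "results": {
    "bindings": [
      {"viaf_id": {"type": "literal", "value": "000000000"}}
    ]
  }
}
"""


def extract_viaf_ids(response_text: str) -> list[str]:
    """Follow results -> bindings -> [] -> viaf_id -> value."""
    data = json.loads(response_text)
    bindings = data.get("results", {}).get("bindings", [])
    # An empty 'bindings' list means the entity has no VIAF identifier.
    return [b["viaf_id"]["value"] for b in bindings if "viaf_id" in b]


print(extract_viaf_ids(SAMPLE_RESPONSE))
```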

Click 'Save Linked Data Resource' to store this configuration.

Ingestion Processes  

Ensure that both the Object Type 'Person' and the Ingestion Processes are enabled in your project by going to Management and selecting 'Projects'. Edit your project and enable the System Process 'Ingestion' as well as the Object Type 'Person'.

Go to the Data section of your environment. Go to the tab 'Processes', click 'Ingestion', and click 'Add Ingestion'.

Give the Ingestion Process a name like 'Store Wikidata VIAF Identifiers'. Use the dropdown menu with the label 'Source' to select the Linked Data Resource 'Wikidata VIAF Identifiers'. Use the dropdown menu with the label 'Target' to select the Object Type 'Person'. This process will update the previously ingested Objects, so change the mode to 'Update Existing Objects'.

Disregard the 'Query / Filter External Resource By Value' form section.

Use the form section 'Link External Resource To Object' to establish a link between the returned data and the data already present in your nodegoat environment. Set 'Identify Objects By' to 'Query' and use the first dropdown menu ('Variable in the Linked Data query.') to select the variable 'getVIAF: wikidataURI'. Use the second dropdown menu ('Target Element in Data Model.') to select the 'URI' Object Description. With these settings in place, the Ingestion Process will run a query for every Object of the Type 'Person' and will assign the value of the 'URI' Object Description (e.g. 'http://www.wikidata.org/entity/Q355237') to the variable 'getVIAF: wikidataURI'.  

Use the form section 'Map External Resource To Data Model' to connect the returned variables to elements in the selected Object Type. In this case you connect the variable 'VIAF Identifier' to the Object Description 'VIAF Identifier'.  

Set 'uri' and 'label' to empty values and use the red 'del' button to remove them.

Taken together, these settings mean that every query result is written back to the Object it was requested for: each Object of the Type 'Person' that has a match is updated with the returned VIAF Identifier.
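
Conceptually, the configured process behaves like the loop sketched below. This is a simplified model and not nodegoat's implementation: Objects are represented as plain dictionaries, the lookup function stands in for the Wikidata query, and the names update_persons and fake_lookup, the URI ending in 'Q0' and the ID '000000000' are all hypothetical placeholders.

```python
from typing import Callable, Optional


def update_persons(
    persons: list[dict],
    lookup_viaf: Callable[[str], Optional[str]],
) -> int:
    """Store a VIAF Identifier on every person for which the lookup returns one."""
    updated = 0
    for person in persons:
        uri = person.get("URI")
        if not uri:
            continue  # no URI stored, so there is nothing to query with
        viaf_id = lookup_viaf(uri)
        if viaf_id is None:
            continue  # no VIAF identifier found: leave the Object unchanged
        person["VIAF Identifier"] = viaf_id
        updated += 1
    return updated


def fake_lookup(uri: str) -> Optional[str]:
    """Hypothetical stand-in for the Wikidata query; returns a placeholder ID."""
    known = {"http://www.wikidata.org/entity/Q355237": "000000000"}
    return known.get(uri)


persons = [
    {"Name": "Person A", "URI": "http://www.wikidata.org/entity/Q355237"},
    {"Name": "Person B", "URI": "http://www.wikidata.org/entity/Q0"},
]

updated = update_persons(persons, fake_lookup)
print(updated, "of", len(persons), "Objects updated")
```

In this model the second person is left untouched, which mirrors how Objects without a VIAF identifier in Wikidata remain unchanged.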

Click 'Save Ingestion'.

You now see the newly created Ingestion Process listed in your overview of Ingestion Processes. Click the green 'run' button on the right side of this overview to run this Ingestion Process.

Click 'Run Ingestion' to run the Ingestion Process. The Ingestion Process runs and informs you about the results. If everything went correctly, the Objects of the Type 'Person' have now been updated. Open the Type 'Person' to see the updated Objects. As not all people will have a VIAF identifier stored in Wikidata, some Objects might not have been updated.

Appendix: Transforming Identifiers to URLs or URIs  

While having the Identifiers in place is the most crucial part of this process, it might be helpful to use them to navigate to external resources. To do this, we need to prepend 'https://viaf.org/viaf/' to the stored value. In this case the URL and URI are interchangeable.
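
The transformation itself is a plain string concatenation, as the following Python sketch shows. The function name viaf_url is hypothetical and '000000000' is a placeholder, not a real VIAF ID.

```python
VIAF_BASE_URL = "https://viaf.org/viaf/"


def viaf_url(identifier: str) -> str:
    """Prepend the VIAF base URL to a stored identifier."""
    # Strip stray whitespace so the resulting link stays valid.
    return VIAF_BASE_URL + identifier.strip()


print(viaf_url("000000000"))
```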

Go to Model and go to 'Linked Data'. Click 'Add Linked Data Resource', give the resource a name like 'VIAF URL', and change the protocol to 'Static Link'. Enter the request URL 'https://viaf.org/viaf/' in the 'URL' input field and leave the 'URL Options' empty.

Click 'Save Linked Data Resource' to store this configuration.

Go to Model and go to the tab 'Object Types'. Click the blue 'edit' button at the Object Type 'Person'. Find the Object Description with the name 'VIAF Identifier' and change the value type to 'External'. Use the dropdown menu that appears to select the Linked Data Resource 'VIAF URL'.

Go to the Data section of your environment. Go to the tab 'Objects' and open the Object Type 'Person'. When you inspect the Object of a person that has a VIAF Identifier, you will be able to click the generated URL to go to the VIAF website.