The goal of this tutorial is to learn how we can link records using references, with a technique similar to JSON Reference.
- Step 1: Bootstrap exercise
- Step 2: Add author reference to the record
- Step 3: Create a JSON resolver
- Step 4: Update the entrypoints
- Step 5: Try it
- Bonus
- About references in Invenio
- What did we learn
In tutorial 08, we have learned how to create a new data model, the author record. It would be now very useful to link a record to his author so that, when using the REST APIs to return records, we can immediately return also the details of the author without performing any extra query. For example, given the following record:
{
"id": 1,
"title": "Invenio is awesome",
"contributors": [
{
"name": "Kent, Clark"
}
],
"contributors_count": 1
}
and its author:
{
"id": 1,
"name": "Goodman, Martin"
}
we would like that when we retrieve the record, the details of the author are automatically included in the each record:
{
"author": {
"id": 1,
"name": "Goodman, Martin"
},
"id": 1,
"title": "Invenio is awesome",
"contributors": [
{
"name": "Kent, Clark"
}
],
"contributors_count": 1
}
How do we do it?
- In the record data model, we define a
$ref
attribute that is an URL pointing to the author record. - When indexing the record, Invenio will find the
$ref
record and use the URL to retrieve the referenced author. - The URL is a Flask route that we have to define and implement and it will be called by Invenio. This won't be a real HTTP call (see below for more explanations).
- The indexed record will contain the replaced object instead of the
$ref
field.
Let's implement it.
If you completed the previous tutorial, you can skip this step. If instead you would like to start from a clean state run the following commands:
cd ~/src/training/
./start-from.sh 10-indexing-records
We need to create a reference (in a similar way as we would do using foreign keys in the SQL world) between record and author. Let's add a new $ref
field in the record data model to reference the author.
my-site/my_site/records/jsonschemas/records/record-v1.0.0.json
"type": "object",
"properties": {
+ "author": {
+ "type": "object",
+ "properties": {
+ "$ref": {
+ "type": "string"
+ }
+ },
+ "required": ["$ref"]
+ },
"title": {
"description": "Record title.",
"type": "string"
},
Since we have changed the data model, we need to change the Elasticsearch mappings because you want to search records by author metadata.
my-site/my_site/records/mappings/v7/records/record-v1.0.0.json
+ "author": {
+ "type": "object",
+ "properties": {
+ "id": {
+ "type": "integer"
+ },
+ "name": {
+ "type": "text"
+ }
+ }
+ },
"id": {
"type": "keyword"
},
What will now happen is that when creating a new record, there will be a new $ref
field with the URL to resolve the related author. For example, $ref: https://my-site.com/api/resolver/author/1
.
When indexing, the invenio-indexer
module will find the $ref
field and, instead of calling the URL and performing the HTTP request, it will try to find if there is a Flask route registered with that URL. In that case, it will execute the method of the route itself and replace the $ref
field with the output of the method. This method is a called JSON resolver
.
Let's create a new file:
my-site/my_site/records/jsonresolvers.py
"""Author resolver."""
from __future__ import absolute_import, print_function
import jsonresolver
from invenio_pidstore.resolver import Resolver
from invenio_records.api import Record
# the host corresponds to the config value for the key JSONSCHEMAS_HOST
@jsonresolver.route('/api/resolver/author/<authid>', host='my-site.com')
def record_jsonresolver(authid):
"""Resolve referenced author."""
# Setup a resolver to retrieve an author record given its id
resolver = Resolver(pid_type='authid', object_type="rec", getter=Record.get_record)
_, record = resolver.resolve(str(authid))
# we could manipulate here the record and eventually add/remove fields
del record['$schema']
return record
We need to add the JSON resolver method in the entrypoints.
my-site/setup.py
'authors = my_site.authors.jsonschemas',
],
+ "invenio_records.jsonresolver": [
+ "author = my_site.records.jsonresolvers",
+ ],
'invenio_search.mappings': [
'records = my_site.records.mappings',
We can now try to create an author and then a record with a reference to it. But first, since we have changed schema, mappings and entrypoints, let's re-install the app and re-init DB and Elasticsearch.
pipenv run pip install -e .
./scripts/setup
./scripts/server
Create a new author:
curl -k --header "Content-Type: application/json" \
--request POST \
--data '{"name": "Goodman, Martin"}' \
"https://127.0.0.1:5000/api/authors/?prettyprint=1"
firefox http://127.0.0.1:9200/authors/_search?pretty=true
Now, stop the server.
Create a new record. In the $ref
field, we will put the route URL that we have defined in the JSON resolver method. To use the REST API, we would have to change the loaders, since the author
field is not defined. Let's create the record using the CLI.
cd ~/src/my-site
echo '{"author": { "$ref": "https://my-site.com/api/resolver/author/1" }, "title": "Invenio is awesome", "contributors": [{"name": "Kent, Clark"}], "owner": 1}' | pipenv run invenio records create --pid-minter recid
Let's re-index the newly create record so that the $ref
attribute will be replaced:
pipenv run invenio index reindex --pid-type recid --yes-i-know
pipenv run invenio index run
Now, we can query Elasticsearch and verify that the author metadata are in the record.
./scripts/server
firefox http://127.0.0.1:9200/records/_search?pretty=true
If, instead, we check what's in the database (using the Admin panel) we can see that the record has still the $ref
field:
# create an admin user
pipenv run invenio users create [email protected] --password 123456 --active
pipenv run invenio roles add [email protected] admin
firefox https://127.0.0.1:5000/admin/persistentidentifier/
Did you notice that the contributors_count
field is also showing up in the author records?
firefox http://127.0.0.1:9200/authors/_search?pretty=true
Can you guess why? How can we fix it?
Invenio uses the jsonresolver module to define and resolve references between records. The $ref
URL is generated using an host domain defined by the config variable JSONSCHEMA_HOST
(it is not meant to be an existing URL) to be able to avoid performing real HTTP request when resolving but instead calling a Flask route method implementation.
Given that it is an internal reference, you should not expose this field and URL when returning the schema and the records through APIs. To avoid that, by default invenio-records-rest JSON serializers defines replace_refs=True
as parameter.
Moreover, as you saw, the URL will be hard-coded in each record. If you need to change it, you will have most probably to perform an update to all your records.
To hide references and solve this problem, we recommend you to create your own record class and add the author
field creation in the overridden create
method. In this way, everything you create a record, the $ref
is automatically generated. As reference, here an example:
class MyRecord(Record):
_schema = "myrecords/myrecord-v1.0.0.json"
@classmethod
def create(cls, data, id_=None, **kwargs):
"""Create MyRecord record."""
data["$schema"] = current_jsonschemas.path_to_url(cls._schema)
data["author"] = {
"$ref": "http://mysite.com/resolver/{}".format(data['author_id'])
}
return super(MyRecord, cls).create(data, id_=id_, **kwargs)
- We have understood how Invenio uses JSON references
- We have seen how to create a reference between 2 records
- We have learned how to implement a JSON resolver