This repository has been archived by the owner on Jun 23, 2020. It is now read-only.

Enrichment OpenLibrary, Gutenberg, dbpedia (and maybe dewey labels) #667

Open
1 of 4 tasks
dr0i opened this issue May 12, 2015 · 4 comments

@dr0i
Member

dr0i commented May 12, 2015

With the new way of transforming the data (without using Hadoop, see hbz/lobid#139) we lost our enrichment to the sources named in the title: OpenLibrary, Gutenberg, dbpedia (and maybe dewey labels).

This must now be done in another way.

@dr0i dr0i self-assigned this May 12, 2015
@dr0i dr0i added the ready label May 12, 2015
@fsteeg fsteeg removed the ready label Jan 18, 2016
@dr0i dr0i added the ready label May 8, 2017
@dr0i
Member Author

dr0i commented Apr 17, 2018

With hbz/lobid-resources@26e4a06#diff-070376a28f971f006644814c8b3860ec (enrichment of wikidata geo data) comes the enrich method in ElasticsearchIndexer.java: a lookup in another index is done and the result is merged into the ETL result of the hbz01 (which thus becomes the new lobid-resource). The idea is not to make many lookups for enrichments (three are mentioned in this issue alone) but to have ONE parallel enrichment index: every lobid-resource would make ONE lookup and merge the result. The enrichment index would be updated independently of the indexing of lobid-resources and thus could take all the time it needs to be built. Also, most of the time there will be only a few updates to the enrichment index. So, when doing a full-dump reindexing, there is only one lookup per resource on a preprocessed enrichment index, while the update of that enrichment index a) is independent of lobid-resources and b) is expected to involve few changes, even though it is an aggregated index.
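To illustrate the "one lookup, then merge" idea, here is a minimal sketch in plain Java. It is not the code from ElasticsearchIndexer.java: the enrichment index is simulated with an in-memory map, and the class name `EnrichmentSketch`, the method signature and the field names (`id`, `title`, `spatial`) are all assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class EnrichmentSketch {

    /**
     * Looks up the resource's id ONCE in the (simulated) enrichment index and
     * merges the hit into a copy of the resource. Fields already present in
     * the resource are kept; enrichment fields are only added.
     * The "id" field name is an assumption for this sketch.
     */
    static Map<String, Object> enrich(Map<String, Object> resource,
            Map<String, Map<String, Object>> enrichmentIndex) {
        Map<String, Object> merged = new LinkedHashMap<>(resource);
        Object id = resource.get("id");
        if (id == null) {
            return merged; // nothing to look up
        }
        Map<String, Object> extra = enrichmentIndex.get(id);
        if (extra != null) {
            extra.forEach(merged::putIfAbsent);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for the preprocessed enrichment index, keyed by resource id.
        Map<String, Map<String, Object>> enrichmentIndex = new HashMap<>();
        Map<String, Object> extra = new HashMap<>();
        extra.put("spatial", "hypothetical geo data");
        enrichmentIndex.put("HT001", extra);

        // Stand-in for the ETL result of one hbz01 record.
        Map<String, Object> resource = new LinkedHashMap<>();
        resource.put("id", "HT001");
        resource.put("title", "Example title");

        System.out.println(enrich(resource, enrichmentIndex));
    }
}
```

Because the enrichment index would already be aggregated, each resource costs exactly one lookup no matter how many sources (OpenLibrary, Gutenberg, dbpedia, ...) contributed to it.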

@dr0i
Member Author

dr0i commented Apr 17, 2018

Re last comment, in short: the mentioned enrichment-index would be the entityfacts for hbz01-lobid-resources.

@acka47
Contributor

acka47 commented Apr 18, 2018

the mentioned enrichment-index would be the entityfacts for hbz01-lobid-resources.

Why? I think we just need EntityFacts for lobid-gnd. For NWBib and probably also lobid-resources we will load EntityFacts data on the fly, see hbz/nwbib#427.

@dr0i
Member Author

dr0i commented Apr 19, 2018

Entityfacts just enriches the gnd, but not lobid-resources (e.g. books). With "entityfacts for hbz01-lobid-resources" I don't mean to index entityfacts into lobid-resources but to build "something-like-it" for our catalog entries (lobid-resources).
