Support LCSH and LCNAF vocabularies #594

Open
nichtich opened this issue Nov 26, 2020 · 7 comments
Labels
question Further discussion needed

Comments

@nichtich
Member

Matt Miller gave an excellent presentation at SWIB2020 (recording available soon). LoC has linked more than a million authority records to Wikidata. We should make it possible to edit and extend these mappings with Cocoda.

  • Either create a cron job to download, convert, and import LCSH and LCNAF data into jskos-server
  • Or access individual records in JSON and somehow wrap the search form (there seems to be no public search API?)
@nichtich nichtich added the question Further discussion needed label Nov 26, 2020
@stefandesu
Member

  • Bulk downloads for LCSH and LCNAF are available here: https://id.loc.gov/download/
    • Note that LCC (not mentioned here, but related) is not available for some reason.
    • Note also that LCNAF is extremely large (between 2 GB and 11 GB for the whole dump, depending on the format), so it might make sense to use their API to access it.
  • The public search API is documented here: https://id.loc.gov/techcenter/searching.html
    • The default suggest service returns data in OpenSearch Suggest Format, so the same as our suggest APIs. 👍

We could try to add an LoC provider to cocoda-sdk and use their APIs first, maybe starting with only URIs and labels, to see whether it works or whether we need to import the data ourselves. What do you think?
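To illustrate the "URIs and labels only" starting point, here is a minimal sketch of how an OpenSearch Suggest response could be turned into JSKOS-like concept stubs. The function name, the `en` language tag, and the sample URIs are assumptions made for illustration; this is not code from cocoda-sdk, and the sample is not real LoC output.

```python
from __future__ import annotations

from typing import Any


def parse_opensearch_suggest(response: list[Any]) -> list[dict[str, Any]]:
    """Turn an OpenSearch Suggestions array into minimal JSKOS-like concepts.

    The format is [query, completions, descriptions, urls]. The last two
    arrays are optional, so a missing URL simply yields uri=None.
    """
    completions = response[1] if len(response) > 1 else []
    urls = response[3] if len(response) > 3 else []
    concepts = []
    for index, label in enumerate(completions):
        uri = urls[index] if index < len(urls) else None
        # Assumption: labels are treated as English.
        concepts.append({"uri": uri, "prefLabel": {"en": label}})
    return concepts


# Made-up sample response; the URIs are placeholders, not real LCSH records.
sample = [
    "cat",
    ["Cats", "Cats in art"],
    ["", ""],
    [
        "http://id.loc.gov/authorities/subjects/sh00000001",
        "http://id.loc.gov/authorities/subjects/sh00000002",
    ],
]

for concept in parse_opensearch_suggest(sample):
    print(concept["uri"], concept["prefLabel"]["en"])
```

Since the response carries only labels and URIs, anything beyond that (hierarchy, notes, identifiers) would need a second request per concept.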

@stefandesu
Member

It also seems like their SKOS output does not contain all the information we need (for example, the broader concept seems to be missing), so we'd probably need to parse the MADS/RDF output (which shouldn't be much of an issue). Also, their structure seems to differ from what we're used to: if I understand correctly, you have a scheme (e.g. LCC) whose top-level members are so-called "collections", and those collections have members that are classes (and after that you have the normal hierarchy). However, this doesn't really differ from scheme -> top concepts -> concepts; we just need to parse it a bit differently.

@nichtich
Member Author

nichtich commented Aug 26, 2021

An LoC provider would have the advantage of not requiring a database setup and of always being up to date, so I'd prefer this approach. Individual vocabularies could nevertheless be configured manually (instead of parsing the content of https://id.loc.gov/), so special treatment of vocabularies to get top concepts should be no problem. Information about narrower concepts is indeed (only) included in MADS, so MADS/RDF - JSON (example for http://id.loc.gov/authorities/subjects/sh91005240) looks good.
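A rough sketch of what reading broader and narrower links from such a MADS/RDF JSON record might look like. It assumes the record is delivered as expanded JSON-LD (a list of nodes keyed by full property URIs) and uses the MADS/RDF properties `authoritativeLabel`, `hasBroaderAuthority`, and `hasNarrowerAuthority`. The helper names and the sample graph are hypothetical, and the exact response shape should be checked against a real record.

```python
from __future__ import annotations

from typing import Any

MADS = "http://www.loc.gov/mads/rdf/v1#"


def linked_uris(node: dict[str, Any], prop: str) -> list[str]:
    """Collect the @id values of one MADS/RDF property on a node."""
    return [value["@id"] for value in node.get(MADS + prop, []) if "@id" in value]


def mads_to_concept(graph: list[dict[str, Any]], uri: str) -> dict[str, Any] | None:
    """Build a JSKOS-like concept from the node with the given URI, if present."""
    node = next((n for n in graph if n.get("@id") == uri), None)
    if node is None:
        return None
    labels = node.get(MADS + "authoritativeLabel", [])
    return {
        "uri": uri,
        # Labels without a language tag are filed under "und" (undetermined).
        "prefLabel": {v.get("@language", "und"): v["@value"] for v in labels if "@value" in v},
        "broader": [{"uri": u} for u in linked_uris(node, "hasBroaderAuthority")],
        "narrower": [{"uri": u} for u in linked_uris(node, "hasNarrowerAuthority")],
    }


# Made-up graph with placeholder URIs; not a real LCSH record.
BASE = "http://id.loc.gov/authorities/subjects/"
sample_graph = [
    {
        "@id": BASE + "sh00000002",
        MADS + "authoritativeLabel": [{"@value": "Example heading", "@language": "en"}],
        MADS + "hasBroaderAuthority": [{"@id": BASE + "sh00000001"}],
        MADS + "hasNarrowerAuthority": [{"@id": BASE + "sh00000003"}],
    }
]

concept = mads_to_concept(sample_graph, BASE + "sh00000002")
print(concept)
```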

@stefandesu
Member

With your specific example, I realized that we can't offer a full concept tree for LCSH and LCNAF. First of all, these are not monohierarchical: every concept can belong to more than one collection, as does your example (LCSH Collection - Authorized Headings and LCSH Collection - General Collection). Secondly, those collections have many thousands of members. "Authorized Headings", for example, has 452,612 members, so a hierarchical view doesn't make sense, unfortunately.

However, for now it should be fine to offer search and display concept details. I will also try to implement the hierarchy for LCC because it seems to be monohierarchical. This also means that we might need to parse things differently depending on which scheme we're in. Not optimal, but as long as it's possible to use those schemes in Cocoda, it should be fine for now.
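One way the per-scheme handling could be expressed is a small lookup table plus a check for multiple collection memberships. This is only a sketch: the `SCHEME_CONFIG` table, its keys, and both function names are hypothetical and not taken from cocoda-sdk. The membership sample mirrors the two collections named above.

```python
from __future__ import annotations

# Hypothetical per-scheme settings: only LCC is assumed to get a tree view.
SCHEME_CONFIG = {
    "http://id.loc.gov/authorities/subjects": {"notation": "LCSH", "hierarchy": False},
    "http://id.loc.gov/authorities/names": {"notation": "LCNAF", "hierarchy": False},
    "http://id.loc.gov/authorities/classification": {"notation": "LCC", "hierarchy": True},
}


def supports_hierarchy(scheme_uri: str) -> bool:
    """Unknown schemes fall back to search and detail view only."""
    return SCHEME_CONFIG.get(scheme_uri, {}).get("hierarchy", False)


def is_monohierarchical(memberships: dict[str, list[str]]) -> bool:
    """True if no concept belongs to more than one distinct collection."""
    return all(len(set(collections)) <= 1 for collections in memberships.values())


# Concept URI -> collections it is a member of, as described in this thread.
memberships = {
    "http://id.loc.gov/authorities/subjects/sh91005240": [
        "LCSH Collection - Authorized Headings",
        "LCSH Collection - General Collection",
    ],
}

print(is_monohierarchical(memberships))
print(supports_hierarchy("http://id.loc.gov/authorities/classification"))
```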

@nichtich nichtich added this to the 1.5.0 milestone Sep 3, 2021
@nichtich
Member Author

This issue has been solved in cocoda-sdk but still needs a response from LoC. For full coverage, mappings in Wikidata should be supported by wikidata-jskos.

@stefandesu
Member

Since LCSH and LCNAF are now included in the public release of Cocoda 1.4.7, I'm closing this issue. Support for LCC and saving mappings into Wikidata is still open (see gbv/wikidata-jskos#69).

@stefandesu stefandesu removed this from the 1.5.0 milestone Oct 21, 2021
@nichtich
Member Author

The implementation in wikidata-jskos looks good, but the WD-LCSH and WD-LCNAF mappings are not shown in the Cocoda mapping navigator (although they are searchable in mapping search) and labels are not retrieved. This needs investigation.

@nichtich nichtich reopened this Feb 19, 2025
2 participants