Export of senior theses record should not include pu. fields from dspace #85

Closed
1 task
christinach opened this issue Oct 8, 2024 · 7 comments · Fixed by #86 or #87

@christinach
Member

christinach commented Oct 8, 2024

Expected behavior

Example record from last successfully indexed export

{
    "id": "dsp01vh53wv957",
    "title_t": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_citation_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_sort": "computeranalysisofthetransientresponseofpressuretransducerstoshockinputs",
    "author_sort": "Pbi, W. C.",
    "electronic_access_1display": "{\"http://arks.princeton.edu/ark:/88435/dsp01vh53wv957\":[\"DataSpace\",\"Citation only\"]}",
    "restrictions_note_display": [
      "This thesis can be viewed in person at the <a href=http://mudd.princeton.edu>Mudd Manuscript Library</a>.  \nTo order a copy complete the <a href=\"http://rbsc.princeton.edu/senior-thesis-order-form\" target=\"_blank\">Senior Thesis Request Form</a>.  \nFor more information contact <a href=mailto:[email protected]>[email protected]</a>."
    ],
    "call_number_display": "AC102",
    "call_number_browse_s": "AC102",
    "language_facet": "English",
    "language_name_display": "English",
    "author_display": [
      "Pbi, W. C."
    ],
    "author_s": [
      "Pbi, W. C.",
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "department_display": [
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "location": "Mudd Manuscript Library",
    "location_display": "Mudd Manuscript Library",
    "location_code_s": "mudd$stacks",
    "advanced_location_s": [
      "mudd$stacks",
      "Mudd Manuscript Library"
    ],
    "access_facet": "In the Library",
    "holdings_1display": "{\"thesis\":{\"location\":\"Mudd Manuscript Library\",\"library\":\"Mudd Manuscript Library\",\"location_code\":\"mudd$stacks\",\"call_number\":\"AC102\",\"call_number_browse\":\"AC102\",\"dspace\":true}}",
    "class_year_s": [
      "1966"
    ],
    "pub_date_start_sort": [
      "1966"
    ],
    "pub_date_end_sort": [
      "1966"
    ],
    "format": "Senior thesis"
  },

Actual behavior

The same record using version v1.4.3

{
    "id": "dsp01vh53wv957",
    "title_t": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_citation_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_display": "Computer Analysis of the Transient Response of Pressure Transducers to Shock Inputs",
    "title_sort": "computeranalysisofthetransientresponseofpressuretransducerstoshockinputs",
    "author_sort": "Pbi, W. C.",
    "electronic_access_1display": "{\"http://arks.princeton.edu/ark:/88435/dsp01vh53wv957\":[\"DataSpace\",\"Full text\"]}",
    "pu.embargo.lift": null,
    "pu.embargo.terms": null,
    "pu.mudd.walkin": null,
    "pu.location": [
      "This thesis can be viewed in person at the <a href=http://mudd.princeton.edu>Mudd Manuscript Library</a>.  \nTo order a copy complete the <a href=\"http://rbsc.princeton.edu/senior-thesis-order-form\" target=\"_blank\">Senior Thesis Request Form</a>.  \nFor more information contact <a href=mailto:[email protected]>[email protected]</a>."
    ],
    "dc.rights.accessRights": null,
    "call_number_display": "AC102",
    "call_number_browse_s": "AC102",
    "language_facet": "English",
    "language_name_display": "English",
    "author_display": [
      "Pbi, W. C."
    ],
    "author_s": [
      "Pbi, W. C.",
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "department_display": [
      "Princeton University. Department of Aeronautical Engineering"
    ],
    "access_facet": "Online",
    "electronic_portfolio_s": "{\"thesis\":{\"call_number\":\"AC102\",\"call_number_browse\":\"AC102\",\"dspace\":true}}",
    "class_year_s": [
      "1966"
    ],
    "pub_date_start_sort": [
      "1966"
    ],
    "pub_date_end_sort": [
      "1966"
    ],
    "format": "Senior thesis",
    "restrictions_note_display": [
      "This thesis can be viewed in person at the <a href=http://mudd.princeton.edu>Mudd Manuscript Library</a>.  \nTo order a copy complete the <a href=\"http://rbsc.princeton.edu/senior-thesis-order-form\" target=\"_blank\">Senior Thesis Request Form</a>.  \nFor more information contact <a href=mailto:[email protected]>[email protected]</a>."
    ]
  },
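The difference between the two records can be listed mechanically by comparing their keys. The following is a minimal sketch, assuming each record has been parsed into a Ruby hash; the two hashes here are abbreviated stand-ins for the records above, and `key_diff` is a hypothetical helper, not part of orangetheses.

```ruby
require 'json'

# Abbreviated stand-ins for the two records above; only the key names matter.
expected = JSON.parse(<<~JSON)
  {"id": "dsp01vh53wv957", "location": "Mudd Manuscript Library",
   "holdings_1display": "{}", "access_facet": "In the Library"}
JSON

actual = JSON.parse(<<~JSON)
  {"id": "dsp01vh53wv957", "pu.location": ["..."], "pu.embargo.lift": null,
   "dc.rights.accessRights": null, "electronic_portfolio_s": "{}",
   "access_facet": "Online"}
JSON

# Hypothetical helper: keys that only appear in the new export, and keys
# that were present in the last good export but are now missing.
def key_diff(expected, actual)
  { added: actual.keys - expected.keys, removed: expected.keys - actual.keys }
end

diff = key_diff(expected, actual)
puts "added:   #{diff[:added].join(', ')}"
puts "removed: #{diff[:removed].join(', ')}"
```

Run against the full records, a comparison like this would also surface the location and holdings fields that are missing from the v1.4.3 output, not only the extra pu. fields.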

Steps to replicate

Run the rake task on bibdata-staging

Impact of this bug

The rake task fails to index because the export JSON now contains new, unnecessary fields that are not defined in the Solr schema:

    "msg":"2 Async exceptions during distributed update: \nError from server at http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard1_replica_n2/: null\n\n\n\nrequest: http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard1_replica_n2/\nRemote error message: ERROR: [doc=dsp01vh53wv957] unknown field 'pu.location'\nError from server at http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard2_replica_n8/: null\n\n\n\nrequest: http://lib-solr-staging5d.princeton.edu:8983/solr/catalog-staging1_shard2_replica_n8/\nRemote error message: ERROR: [doc=dsp0141687h67f] unknown field 'pu.location'",
    "code":400}}

Acceptance criteria

  • The export does not include the pu. fields
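One way to meet this criterion is to drop the raw DSpace keys from each document hash before it is serialized. This is only a sketch under stated assumptions: `strip_raw_dspace_fields` and `RAW_DSPACE_PREFIXES` are hypothetical names, and the prefix list is inferred from the extra keys in the v1.4.3 record above (the `pu.` fields plus `dc.rights.accessRights`), not taken from the orangetheses code.

```ruby
# Assumed prefixes of raw DSpace metadata keys that Solr does not know about.
RAW_DSPACE_PREFIXES = ['pu.', 'dc.'].freeze

# Hypothetical helper: return a copy of the document without raw DSpace keys.
# The original hash is left untouched.
def strip_raw_dspace_fields(doc)
  doc.reject { |key, _value| key.start_with?(*RAW_DSPACE_PREFIXES) }
end

doc = {
  'id' => 'dsp01vh53wv957',
  'pu.embargo.lift' => nil,
  'pu.location' => ['This thesis can be viewed in person ...'],
  'dc.rights.accessRights' => nil,
  'format' => 'Senior thesis',
  'restrictions_note_display' => ['This thesis can be viewed in person ...']
}

clean = strip_raw_dspace_fields(doc)
puts clean.keys.inspect
```

Filtering by prefix keeps derived fields such as `restrictions_note_display`, which carry the same text as `pu.location` under a name the index accepts.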

Implementation notes, if any

@jrgriffiniii
Contributor

Locally this seems to be functioning without any errors:

bundle exec rake oai:index_record[oai:dataspace.princeton.edu:88435/dsp01vh53wv957] SOLR="http://localhost:8983/solr/orangetheses-core-development"
[10:53:32] INFO: Adding dsp01vh53wv957

@jrgriffiniii
Contributor

I have just tested this against the bibdata staging environment, and it completed successfully for the theses collection. However, I must defer to others in DACS to confirm that this does indeed fix the indexing errors.

@christinach
Member Author

@jrgriffiniii I posted an update on the PR: there is still a failure. The rake task can successfully export the records. However, because the field pu.location exists in the export, the POST to the Solr index fails.

The last orangetheses ref that works is 4ac8dc2bd04b10db764fc37df3261531c9937061 https://github.com/pulibrary/bibdata/blob/2adad5269031fd31a80d72f7e68bfb226d6f85ce/Gemfile#L50

@jrgriffiniii
Contributor

I am very sorry, but I may need to ask for assistance with this. When I invoke bundle exec rake oai:index_all[com_88435_dsp019c67wm88m] SOLR="http://lib-solr8-prod.princeton.edu:8983/solr/catalog-alma-production", the task succeeds. I had assumed that this sends a POST request to the Solr endpoint.

@jrgriffiniii
Contributor

I was corrected and I am now testing against the following:

bundle exec rake orangetheses:cache_theses

@jrgriffiniii
Contributor

I have tested the following successfully:

RAILS_ENV=staging SOLR="http://lib-solr8d-staging.princeton.edu:8983/solr/catalog-staging" bundle exec rake orangetheses:cache_collection[361]

@christinach
Member Author

bundle exec rake orangetheses:cache_theses will create a JSON file with the desired records from DSpace.
I'm happy to test the changes on bibdata staging. If you wish to try indexing on staging yourself, please follow these steps:

  1. Create a bibdata branch whose Gemfile points at the specific orangetheses branch
  2. Deploy the bibdata branch to the bibdata staging environment
  3. ssh [email protected]
  4. cd /opt/bibdata/current
  5. FILEPATH=/home/deploy/theses.json bundle exec rake orangetheses:cache_theses (this will overwrite the existing theses.json file, which is ok.)
  6. curl 'http://lib-solr8d-staging.princeton.edu:8983/solr/catalog-staging/update?commit=true' --data-binary @/home/deploy/theses.json -H 'Content-type:application/json'
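
Between steps 5 and 6 it can help to check the generated file before posting it, so that an unknown-field error is caught without a round trip to Solr. The sketch below assumes theses.json parses to an array of document hashes and uses a dot in the field name as a heuristic for a raw DSpace key (none of the fields in the last good export contain one); `dotted_fields` is a hypothetical helper.

```ruby
require 'json'

# Hypothetical pre-flight check: map each field name containing a dot to the
# ids of the documents that carry it. An empty result means nothing suspicious.
def dotted_fields(docs)
  docs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |doc, found|
    doc.each_key { |key| found[key] << doc['id'] if key.include?('.') }
  end
end

# Inline sample standing in for the parsed contents of theses.json.
sample = JSON.parse(<<~JSON)
  [
    {"id": "dsp01vh53wv957", "format": "Senior thesis", "pu.location": ["..."]},
    {"id": "dsp0141687h67f", "format": "Senior thesis", "pu.location": ["..."]},
    {"id": "hypothetical-clean-record", "format": "Senior thesis"}
  ]
JSON

dotted_fields(sample).each do |field, ids|
  puts "#{field}: #{ids.length} document(s), e.g. #{ids.first}"
end
```

On the staging server the same function could be fed `JSON.parse(File.read('/home/deploy/theses.json'))` in place of the inline sample.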

christinach added a commit that referenced this issue Oct 11, 2024
 Theses fields are not included in the solr document and the theses.json file fails to index
christinach added a commit that referenced this issue Oct 14, 2024
Theses fields are not included in the solr document and the theses.json file fails to index
christinach added a commit that referenced this issue Oct 14, 2024
Theses fields are not included in the solr document and the theses.json file fails to index