Whitespaces in authorities are retained in Standard Titles #1575

Open
jenniferward opened this issue May 14, 2024 · 7 comments
@jenniferward
Contributor

Whitespaces are (conveniently) stripped from Sources at the start or end of a field (see #1409) but apparently they are retained elsewhere in Muscat. I've noticed it in the Titles. This interferes with the alphabetizing and can also inadvertently lead to duplicates.

Standard titles sorted alphabetically: The ones with whitespaces are at the top.
https://muscat-test.rism.info/admin/standard_titles?clear_filters=true&order=title_asc
image

How the first one looks in Edit mode: https://muscat-test.rism.info/admin/standard_titles/50206615/edit
image

We ended up with 12 Sonatas with a whitespace at the beginning:
https://muscat-test.rism.info/admin/standard_titles/5046265
image

but also the correct 12 Sonatas:
https://muscat-test.rism.info/admin/standard_titles/3911582

The one with the space at the beginning is showing up in Sources (not stripped, even after saving):
https://muscat-test.rism.info/admin/sources/1001056785/edit
image
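
A minimal sketch of the two effects described above, in plain Ruby rather than Muscat's own code. The title list is hypothetical sample data, and the database collation used by Muscat may order things differently from Ruby's byte-wise sort, although the listing linked above shows the same pattern.

```ruby
# Hypothetical sample titles; the leading space mimics the reported data.
titles = ["Ave Maria", " 12 Sonatas", "12 Sonatas", "Zaide"]

# A plain string sort compares character codes, and the space (0x20)
# sorts before every digit and letter, so the padded title comes first.
sorted = titles.sort
puts sorted.inspect

# Both spellings count as distinct when the raw strings are compared...
raw_unique = titles.uniq.size
# ...but they collapse into one entry once surrounding whitespace is stripped.
stripped_unique = titles.map(&:strip).uniq.size
puts "raw: #{raw_unique}, stripped: #{stripped_unique}"
```

The difference between the two counts is the number of duplicates that only exist because of surrounding whitespace.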

@lpugin
Contributor

lpugin commented May 14, 2024

Additionally, we have about 9,000 duplicates...

@jenniferward
Contributor Author

@xhero
Contributor

xhero commented May 15, 2024

It could be. I think they are only stripped in MARC; I imagine this is stuff people copy and paste around?

@jenniferward
Contributor Author

Yes, 'people' have been known to copy things from anywhere, even from Muscat!

jenniferward added this to the 11 milestone May 16, 2024
@xhero
Contributor

xhero commented Jun 10, 2024

I'm looking at this, it seems that the auth files do not strip the input data. I think we need to do it in two steps:

  • Fix all identical duplicates (a small nightmare, but it should be doable)
  • Automatically strip whitespace to avoid new ones
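
The two steps could be sketched roughly as follows. This is plain Ruby under stated assumptions, not the script referenced in the commits: `StandardTitle` here is a hypothetical `Struct`, `normalize_title` and `duplicate_groups` are illustrative names, and keeping the lowest id is an arbitrary choice. The two "12 Sonatas" ids are the ones linked earlier in this issue; the third record is made up.

```ruby
# Hypothetical record shape; Muscat's real model is not shown in this thread.
StandardTitle = Struct.new(:id, :title)

# Step 2: strip leading and trailing whitespace, leaving inner spacing untouched.
def normalize_title(title)
  title.to_s.strip
end

# Step 1: group records whose titles are identical after stripping.
# Returns only the groups with more than one member.
def duplicate_groups(records)
  records.group_by { |r| normalize_title(r.title) }
         .select { |_title, group| group.size > 1 }
end

records = [
  StandardTitle.new(3911582, "12 Sonatas"),
  StandardTitle.new(5046265, " 12 Sonatas"),
  StandardTitle.new(1, "Ave Maria") # hypothetical id
]

groups = duplicate_groups(records)
groups.each do |title, group|
  keep, *merge = group.sort_by(&:id) # assumption: keep the lowest id
  puts "#{title.inspect}: keep #{keep.id}, merge #{merge.map(&:id).join(', ')}"
end
```

A real merge would also have to re-point every Source linked to the merged records, which is not shown here.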

@fjorba
Contributor

fjorba commented Jun 10, 2024

> I'm looking at this, it seems that the auth files do not strip the input data. I think we need to do it in two steps:

If my experience helps with cases like this one, I think that it is better to first avoid new cases and then fix the old errors. Because otherwise, there is always the chance a newer one pops up after the correction and before the fix is applied. But of course, you know your workflow better.

@xhero
Contributor

xhero commented Jun 10, 2024

Normally I would do the same! But in this case I don't want to ship a fix that automatically strips whitespace on save and then have problems when editors re-save old records whose stripped titles collide with new ones (triggering the unique constraints).
In any case, fixing the data and updating the system will happen at upgrade time, when the system is offline, so there is no risk of this kind of problem. It is mostly to say that this problem will not be fixed in 11 :)
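
To illustrate the collision being described, here is a toy model in plain Ruby. `TitleTable` is a hypothetical stand-in for a table with a unique index on the title; the actual constraints in Muscat are not shown in this thread, so treat the details as assumptions.

```ruby
# Toy stand-in for a table with a unique index on title (an assumption;
# the thread only mentions "the unique constraints").
class TitleTable
  class NotUnique < StandardError; end

  def initialize
    @rows = {} # id => title
  end

  # Insert without stripping, the way the legacy data was stored.
  def insert_raw(id, title)
    @rows[id] = title
  end

  # Save with strip-on-save enabled; raise if another row already holds the title.
  def save_stripped(id)
    title = @rows.fetch(id).strip
    clash = @rows.any? { |other_id, t| other_id != id && t == title }
    raise NotUnique, "title #{title.inspect} already exists" if clash

    @rows[id] = title
  end
end

table = TitleTable.new
table.insert_raw(3911582, "12 Sonatas")
table.insert_raw(5046265, " 12 Sonatas")

# Re-saving the old padded record now fails, because its stripped
# title matches the clean record that already exists.
result =
  begin
    table.save_stripped(5046265)
    :saved
  rescue TitleTable::NotUnique => e
    puts "save rejected: #{e.message}"
    :rejected
  end
```

Cleaning the existing duplicates first means no padded record is left to trigger this path once stripping is switched on.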

xhero added a commit that referenced this issue Jun 11, 2024
xhero added a commit that referenced this issue Jun 17, 2024
* develop: (362 commits)
  Add Saudi Arabia to Institution Country Codes list
  Add a paper trail event fot the 856 fix
  Fix #1596 make 370 work repeatable
  Update changelog
  Update translations
  All the landing pages!
  Add paper trail message
  #1575 Initial script
  Fix #1081, migrate 856 y to z
  #1595 no need to print the names
  Fix #1595, ignore punctuation in names
  Fix #1592 add Bolivia to SecLit
  Fix #1593, 690 view
  Fix missing label to owner autocomplete
  Remove space from changelog
  Bump actionpack from 7.0.8.1 to 7.0.8.4
  Update CHANGELOG
  Fix variable name..
  Update CHANGELOG
  Fix invalid next
  ...

# Conflicts:
#	Gemfile
#	Gemfile.lock
#	app/models/holding.rb
#	app/models/institution.rb
#	app/models/person.rb
#	app/models/work.rb
#	db/schema.rb