Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/3374 three letter language code #2188

Open
wants to merge 28 commits into
base: develop
Choose a base branch
from

Conversation

philipkcl
Copy link
Contributor

@philipkcl philipkcl commented Jan 6, 2023

Switch to the three letter code language code list #3374

Please don't delete any sections when completing this PR template; instead enter N/A for checkboxes or sections which are not applicable, unless otherwise stated below

See #

Describe the scope/purpose of the PR here in as much detail as you like

  • no migration needed, system compatible with 2 and 3 letter language code, value of language code will be save as 3 letter code if form saved .
  • orphaned_datasets.py have not been update, because it seem to be outdated and not working
  • DATAOBJ_TO_MAPPING_DEFAULTS added country_code_3

Categorisation

This PR...

  • has scripts to run
  • has migrations to run
  • adds new infrastructure
  • changes the CI pipeline
  • affects the public site
  • affects the editorial area
  • affects the publisher area
  • affects the monitoring

Basic PR Checklist

  • FeatureMap annotations have been added
  • Unit tests have been added/modified
  • Functional tests have been added/modified
  • Code has been run manually in development, and functional tests followed locally
  • No deprecated methods are used
  • No magic strings/numbers - all strings are in constants or messages files
  • ES queries are wrapped in a Query object rather than inlined in the code
  • Where possible our common library functions have been used (e.g. dates manipulated via dates)
  • If needed, migration has been created and tested locally
  • Have you done a recent merge up from develop
  • Cleaned up commented out code, etc
  • Release sheet has been created, and completed as far as is possible https://docs.google.com/spreadsheets/d/1Bqx23J1MwXzjrmAygbqlU3YHxN1Wf7zkkRv14eTVLZQ/edit
  • Documentation updates - if needed - have been identified and prepared for inclusion into main documentation (e.g. added and highlighted/commented as appropriate to this PR)
  • There has been a recent merge up from develop (or other base branch)
  • The docs for this branch have been generated and pushed to the doc site (see docs/README.md for details)

Testing

List the Functional Tests that must be run to confirm this feature

  1. ...
  2. ...

Deployment

What deployment considerations are there? (delete any sections you don't need)

Configuration changes

What configuration changes are included in this PR, and do we need to set specific values for production

Scripts

What scripts need to be run from the PR (e.g. if this is a report generating feature), and when (once, regularly, etc).

Migrations

What migrations need to be run to deploy this

Monitoring

What additional monitoring is required of the application as a result of this feature

New Infrastructure

What new infrastructure does this PR require (e.g. new services that need to run on the back-end).

Continuous Integration

What CI changes are required for this

@philipkcl philipkcl requested review from Steven-Eardley and richard-jones and removed request for Steven-Eardley January 6, 2023 14:51
@philipkcl philipkcl marked this pull request as ready for review January 6, 2023 14:52
Copy link
Contributor

@richard-jones richard-jones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One structural change, and the other thing is that we'll need a migration for this to move all 2 letter codes in the database to 3 letter ones. You can look in portality.migrate to see how we handle migrations (look in one of the more recent ones to get the latest approaches), or I can talk you through it.

from typing import Callable, Any


def create_fn_to_isolang(output_format=None, is_upper=False) -> Callable[[Any], str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this has got its own file, can't this just live in the portality.lib.coerce module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_fn_to_isolang also call by portality.lib.dataobj
I should not let portality.lib.coerce import to portality.lib.dataobj or either way.

you can find a lot of same kind of convert function in portality.lib.coerce or portality.lib.dataobj, some of those is duplicate

  • seamless.to_utf8_unicode
  • seamless.to_unicode_upper
  • to_country_code
  • to_url
  • to_float

in order to solve the import and duplicate problems, extract to new file to contain list of same kind of function is a solution.

but, purpose of this task is not refactoring. that is the reason why this file only have one function.

@richard-jones
Copy link
Contributor

@philipkcl I think we should also identify some areas where language codes are particularly visible, such as search interface, ToC pages, and Article pages and list them under the testing header above to ask the DOAJ team to review before release

@philipkcl
Copy link
Contributor Author

One structural change, and the other thing is that we'll need a migration for this to move all 2 letter codes in the database to 3 letter ones. You can look in portality.migrate to see how we handle migrations (look in one of the more recent ones to get the latest approaches), or I can talk you through it.

I found the system compatible very well for records with 2-letter or 3-letter. which mean the form can show which language of 2-letter code and convert to 3-letter when the form saved.

Should we still need to convert data from 2-letter to 3-letter?

@philipkcl
Copy link
Contributor Author

@philipkcl I think we should also identify some areas where language codes are particularly visible, such as search interface, ToC pages, and Article pages and list them under the testing header above to ask the DOAJ team to review before release

I found language code have been used in some form and some background job.
I can list those form and background job I found later. But, I can't ensure that was all of them.

@richard-jones
Copy link
Contributor

Should we still need to convert data from 2-letter to 3-letter?

I think we should be storing the data as we want it yeah.

There are other ways to get at the data that aren't the form, and I'm not sure whether the fields would be automatically converted or not (it's possible they are!). We could check the API and perhaps the Journal CSV and see whether they come out as 3 letter codes there too.

Even so, the strong temptation is to just migrate the data we hold so we don't need to think about it in the future.

@philipkcl
Copy link
Contributor Author

@richard-jones, I just found there have a special language field in Article
article.bibjson.journal.language

  • this field is special because it use "coerce" : "unicode" instead of "coerce" : "isolang", should I convert it to isolang ?
  • do you know who will use this field ?
  • is that should also use 3-letter language code ?

article structure
https://github.com/DOAJ/doaj/blob/develop/portality/models/article.py#L849

journal structure
https://github.com/DOAJ/doaj/blob/feature/3374_three_letter_language_code/portality/models/v2/shared_structs.py#L20

@philipkcl
Copy link
Contributor Author

added migration

@philipkcl
Copy link
Contributor Author

philipkcl commented Feb 17, 2023

also migrate to 3 letter country code

@philipkcl
Copy link
Contributor Author

this branch is not ready yet, I need to update test case for country code

@philipkcl
Copy link
Contributor Author

ready for review

Copy link
Contributor

@RK206 RK206 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change requested.

@philipkcl Unit tests failing even after testing multiple times. Not sure it is an issue with my environment. Please have a check once

=========================== short test summary info ============================
FAILED doajtest/unit/api_tests/test_api_crud_article.py::TestCrudArticle::test_14_test_via_endpoint
FAILED doajtest/unit/api_tests/test_api_crud_article.py::TestCrudArticle::test_15_no_redirects
FAILED doajtest/unit/api_tests/test_api_crud_article.py::TestCrudArticle::test_16_no_redirects
================= 3 failed, 15 passed, 2213 warnings in 59.35s =================

Tested as much as possible on UI. Looks ok to me

@philipkcl
Copy link
Contributor Author

the unit test fail seem to be created by branch develop (43454ec9), I will try to fix that in this branch.

@philipkcl philipkcl requested a review from RK206 February 27, 2023 12:09
Copy link
Contributor

@richard-jones richard-jones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration is for the Articles, but actually we need to migrate the data in Journals and Applications

@philipkcl
Copy link
Contributor Author

remember to run article_cleanup_sync.py when we release

@philipkcl
Copy link
Contributor Author

@richard-jones , removed article migration

@philipkcl philipkcl requested a review from richard-jones March 13, 2023 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants