Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(validations): add validations yaml and set up repo docs #2

Merged

Conversation

billhimmelsbach
Copy link
Collaborator

@billhimmelsbach billhimmelsbach commented Mar 22, 2024

Sets up the regtech-regex repository with an initial validations file, sets up the Readme, and cleans up the boilerplate files. Gets this repo ready to become public! 🥳

Changes

  • add first version of validations.yaml
    • email
    • lei
    • rssd_id
    • simple_us_phone_number
    • tin
  • cleans up issue/pr templates with only parts relevant to the repo
  • updates the changelog for the initial version
  • added a basic package.json (making this a real NPM package in the future may be helpful
  • updates the CONTRIBUTING.md file to not include browser support information
  • removes unnecessary INSTALL.md file
  • creates the Readme file with examples, documentation, and usage instructions

Notes

Here's the completed open source checklist with ~ next to ones that aren't applicable:

  • Has PII been removed?
    • Use Clouseau for scanning source code.
    • If there are images, visually inspect each image to ensure there is no CFPB-specific information.
  • Have security vulnerabilities been remediated?
  • Are we including any other open source products? If so, is there any conflict with our public domain release?
  • Is our TERMS.md included?
  • Is a CHANGELOG.md present and does it contain structured, consistently formatted recent history?
  • Are instructions for contributing included (CONTRIBUTING.md)?
  • Are installation instructions clearly written in the README and tested on a clean machine?
  • Are all dependencies described in the README, requirements.txt, and/or buildout.cfg?
  • [~] Are the API docs generated?
  • [~] Are there unit tests?
  • [~] If applicable and possible, is it set up in TravisCI?
  • Have multiple people reviewed the code?
  • Is there a screenshot in the README, if applicable?

Todos

post-mvp wishlist

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Passes all existing automated tests
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested

jcadam14
jcadam14 previously approved these changes Mar 25, 2024
Copy link
Contributor

@jcadam14 jcadam14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for our initial cut!

validations.yaml Outdated
Comment on lines 8 to 15
lei:
description: must be 20 characters and only contain A-Z, and 0-9 (no special characters)
examples:
- 123456789TESTBANK123
- 123456789TESTBANK456
- 123456TESTSUBBANK456
link: https://regex101.com/r/5AaT18/6
regex: ^[\dA-Z]{20}$
Copy link
Member

@hkeeler hkeeler Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above examples are not actually valid LEIs.

  • The 5th and 6th characters are supposed to aways be 00.

I think we can also tune up the regex to find other potentially invalid LEIs.

  • The last two digits are a checksum, so they should always be numeric.
  • The first four digits (the LOU code) may be only numeric. All the examples I see have it as numeric, but none of the references I've found are explicit about it. I also can't find a list of all valid LOUs, which you'd think would be something you could download off of GLEIF. 🤷

Anyway, I think we could safely do something like:

^[A-Z0-9]{4}00[A-Z0-9]{16}\d{2}$

That'd at least catch the reserved 00 and the checksum at the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @hkeeler! We're going to need an update of the seed / DB data then too since those are the examples currently in the DB. 👍

Copy link
Collaborator Author

@billhimmelsbach billhimmelsbach Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe that regex above isn't quite right @hkeeler, since it's only supposed to be 20 characters? So maybe it's:

^[A-Z0-9]{4}00[A-Z0-9]{12}\d{2}$

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha wow it took some digging and looking at the ISO format but yeah GLEIF has it at https://www.gleif.org/en/about-lei/iso-17442-the-lei-code-structure also. Thanks Hans!

I wrote cfpb/sbl-project#187.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe that regex above isn't quite right @hkeeler, since it's only supposed to be 20 characters? So maybe it's:

^[A-Z0-9]{4}00[A-Z0-9]{12}\d{2}$

Yes. Yours is correct. Somehow the LEI I was testing with was 24 characters long. I must have done something weird. 🤷

Copy link
Collaborator Author

@billhimmelsbach billhimmelsbach Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not 100% sure if the LOU number that prefaces the LEI will always be just digits? Seems like it is, but I assume they'd hop to alphanumeric if they needed to account for more LOUs that can issue LEIs. I can't find something on the GLIEF website that says digits only? The spec ISO 17442-1:2020 at least doesn't confirm only digits for that part of the LEI?

I don't know. I'm happy to start with assuming each LEI will start with a 4 digit LOU and see what the data team says? (I did get the original alphanumeric 20 char restriction after talking with the data team)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha I was using the same lib. So yeah for the sbl-project json updates we can add the correct LEIs, just the question of, along with the regex check do we want the backend to validate the checksum?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sounding like the may be legit reasons for an LEI to break the rules. There's certainly data in our current dataset that does. I think for this PR, let's keep it simple and just make it the original ^[\dA-Z]{20}$, and we can circle back soon with @nongarak and @Kibrael to find out more and decide if we want to do anything more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, what a ride! 🎢 I might make an issue to revisit this later. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the 2 digit checksum at the end at least? It's actually part of the ISO 17442-1:2020 standard so I'm hoping it's actually something that all the LEI data has? Mind checking @hkeeler?

validations.yaml Outdated
@@ -0,0 +1,36 @@
email:
description: must conform to common email conventions using the W3C method of email validation which is only a subset of full RFC 5322 compliance as per https://html.spec.whatwg.org/multipage/input.html#email-state-(type=email)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to have a

  references:
    - https://html.spec.whatwg.org/multipage/input.html#email-state-(type=email)

...for links to any outside sources.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Should I then pop them out of the description and only put these links in the references section?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@billhimmelsbach
Copy link
Collaborator Author

billhimmelsbach commented Mar 26, 2024

Alright @hkeeler and @jcadam14! I've...

  • moved the validations.yaml to a src/ folder
  • updated LEI examples to include ones that follow all the conventions (123400TESTBANK000289) and one that only follows ISO 17442-1:2020 (TESTBANK123456789012) with a comment pointing to the issue to explore LEIs further
  • if @hkeeler gives me the 👍, used a regex that still has a check for 2 digits rather than just plain alphanumeric ^[A-Z0-9]{18}\d{2}$ and updated the link to reflect that
  • added references array, if there are no references I've left it null, but open to making it null explicitly, an empty array, etc
  • added error_text to make sure that the friendly errors presented to users on frontends can be kept in sync more easily with the regex
  • changed a little of the content for commas and capitalization

@billhimmelsbach
Copy link
Collaborator Author

billhimmelsbach commented Mar 26, 2024

Heyo! I updated the PR description to get everything ready, so I think this is ready for final review @hkeeler @jcadam14. 🥳

Copy link
Member

@hkeeler hkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall look great. Just the one bit about references.

- '9999'
- '1'
link: https://regex101.com/r/l3SyQi/3
references:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize an empty value was a valid null in YAML, though I do see in the spec that it's valid. It does feel weird, though. You normally just don't include a given attribute if it is null, though you do occasionally see explicit NULLs. I also just leard ~ is also null in YAMLese. 🤷

Anyway, my vote would be to just exclude references for a given entry if we don't have any references for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have some kind of linter/checklist someday to check to make sure people don't forget fields, so that was my thought behind the null values.

Happy to just exclude it though, so put up a commit for it.

@billhimmelsbach
Copy link
Collaborator Author

Hurray! 🎉

@billhimmelsbach billhimmelsbach merged commit c4f1aae into main Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants