-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add multi-line JSON reading methods #3
Conversation
15928e1
to
4650f6d
Compare
@@ -1,7 +1,7 @@ | |||
repos: | |||
- repo: https://github.com/psf/black | |||
#this version is synced with the black mentioned in .github/workflows/ci.yml | |||
rev: 23.10.0 | |||
rev: 24.4.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort of an aside, but how do we feel about more ruff cutover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed - I do like moving from 3 or 4 different tools to 1.
Ongoing concerns:
- I still don't love how there's not really an obviously-correct default config - our config sections so far seem large and opinionated. But 🤷 it's fine
- I'm also lightly leery of its stability right now - like they are still in 0.x mode and changing CLI args etc.
- Relatedly, ruff has no format-change guarantees like black does (with the year-based major version).
So I'm cautious, but I'd be down for more migrations.
cumulus_fhir_support/ndjson.py
Outdated
return {} | ||
|
||
|
||
def read_ndjson( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i realize this is for best compatibility with what we've been doing to date, but with the scope change: consider read_multiline_json
here and in the following function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, the whole ndjson / jsonlines thing is weird. Well, I guess just the ndjson side is weird.
I had kept with ndjson for consistency yeah. But I'm open to a better name. But multiline is so corporate, man. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_json_but_like_what_if_there_was_a_ton_of_it_man
but, more seriously, read_delimited_json
? read_json_lines
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK we couldn't come up with better names - read_multiline_json
it is
Specifically: - list_multiline_json_in_dir: scans for files & their resource types - read_multiline_json: returns line by line results from a file - read_multiline_json_from_dir: returns line by line results from a folder This hides all the I/O error handling & odd format challenges (like .jsonl vs .ndjson or handling new lines).
Specifically:
This hides all the I/O error handling & odd format challenges (like .jsonl vs .ndjson or handling new lines).
This effort will require two other PRs: one for the ETL and one for the Library: