Chart review doc gathering status #299
base: main
Conversation
```
# We've found a real world case where missing data prevents note download; for now,
# we are just notifying about it and returning an empty string object.
# If we find we actually need this data, we can change this later.
# But note that if we do, we'll need to handle downloading Binary FHIR objects, in addition to arbitrary URLs.
raise ValueError("No textual mimetype found")

print(f"Found unexpected mime type '{mimetype}', skipping.")
return ""
```
The type in question that caused this error was application/xml - wanted to talk about this change before committing to this approach.
We can talk about this during our meeting today, but I'm guessing this was a Cerner thing they do - they wrap the doc in a Cerner-specific xml format, with an RDF doc base64-encoded inside. Usually they generate the html version of the doc later in the doc lifecycle, like when it becomes a status=final or whatever the status is.

So I'm leery of trying to support application/xml - we already (very optimistically) support application/xhtml+xml. But arbitrary xml is harder to know what to do with (i.e. how to extract the note text).
OK, after talking about this in person, I think your beef was more that unhandled mimetypes were stopping upload-notes in its tracks, rather than you wanting to handle application/xml. That makes sense.

So I might vote for upload-notes catching the ValueError here and printing something sensible there. This method is also used by the ETL and I believe it catches errors and prints the message itself. So that's the established error-processing approach for this anyway.
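Roughly this shape on the upload-notes side (just a sketch - `fetch_note_or_skip`, `get_docref_note`, and the surrounding names are stand-ins, not the actual code):

```python
def fetch_note_or_skip(docref: dict) -> str:
    """Fetch a docref's note text, downgrading unsupported-mimetype errors to a printed warning."""
    try:
        # get_docref_note is assumed to raise ValueError when no textual mimetype is found
        return get_docref_note(docref)
    except ValueError as exc:
        # Same spirit as the ETL: report the problem and keep going,
        # rather than stopping the whole upload-notes run.
        print(f"Skipping docref {docref.get('id')}: {exc}")
        return ""
```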
I'm fine with that - is it cool if I keep the erroring mimetype info in the ValueError?
Sure - I was having vibes that it might be too low level info, but 🤷 give the people what they crave - mimetypes
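For example, the raise could just fold the mimetype into the message (purely illustrative):

```python
raise ValueError(f"No textual mimetype found, got '{mimetype}'")
```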
```
@@ -318,6 +319,13 @@ def print_header(name: str | None = None) -> None:
    if name:
        print(name)


def get_transient_progress()-> (progress.Progress):
```
nit: this might more naturally fit in cli_utils.py (which is where make_progress_bar() lives). I'm trying to roughly group misc functions. (Maybe even convert this into a transient=False kwarg on that method?)

How do you feel about whether a TimeElapsed column fits here? Helpful or noisy?

Also, why the parens on the return type hint? Is that something your IDE or ruff encourages?
Oh, at one point I was passing back a tuple - I'll remove those.
TimeElapsed is a fine addition.
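For reference, a rough sketch of what folding this into cli_utils.make_progress_bar() could look like - the column choices below are a guess, not the current implementation:

```python
from rich import progress


def make_progress_bar(transient: bool = False) -> progress.Progress:
    # No parens needed around the return type hint.
    columns = [
        progress.TextColumn("[progress.description]{task.description}"),
        progress.BarColumn(),
        progress.TimeElapsedColumn(),  # arguably helpful for long docref scans
    ]
    return progress.Progress(*columns, transient=transient)
```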
```
scanned += 1
progress.update(task, description = f"Unmatched IDs: {len(real_docref_ids)}, found: {found}, scanned: {scanned}")
```
minor nit: you'll miss the last progress update here if we early exit because we found all the target docrefs.
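One way to avoid losing that final update is to refresh the description before checking for the early exit (sketch only, reusing the variable names from the diff):

```python
scanned += 1
progress.update(
    task,
    description=f"Unmatched IDs: {len(real_docref_ids)}, found: {found}, scanned: {scanned}",
)
if not real_docref_ids:
    break  # progress already shows the final counts
```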
```
if not real_docref_ids:
    break
scanned += 1
progress.update(task, description = f"Unmatched IDs: {len(real_docref_ids)}, found: {found}, scanned: {scanned}")


def _filter_fake_docrefs(codebook: deid.Codebook, anon_docrefs_csv: str, docrefs: Iterable[dict]) -> Iterator[dict]:
```
These two filter methods were already very similar (and maybe should have been sharing more code between them). With the extra progress reporting, this is triggering my code sense that they should share more of their guts.
The only real difference is how to calculate the id to check for each docref, right? That could be a lambda arg or something, I dunno.
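Something vaguely like this shared shape, for instance (hypothetical - the names and the exact progress wiring are illustrative):

```python
from collections.abc import Callable, Iterable, Iterator


def _filter_docrefs(
    docrefs: Iterable[dict],
    target_ids: set[str],
    get_id: Callable[[dict], str],
) -> Iterator[dict]:
    """Yield docrefs whose ID (as computed by get_id) is one of target_ids, with progress reporting."""
    found = scanned = 0
    with get_transient_progress() as prog:  # from this PR
        task = prog.add_task("Scanning docrefs...")
        for docref in docrefs:
            docref_id = get_id(docref)
            if docref_id in target_ids:
                target_ids.discard(docref_id)
                found += 1
                yield docref
            scanned += 1
            prog.update(
                task,
                description=f"Unmatched IDs: {len(target_ids)}, found: {found}, scanned: {scanned}",
            )
            if not target_ids:
                break
```

The real/fake variants would then each just pass their own get_id - e.g. one callable that computes the anonymized ID via the codebook vs. one that returns the docref's plain ID.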
This PR makes the following changes:
Checklist
- Consider if the documentation (in docs/) needs to be updated