Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Pagination Features for FHIRClient and Bundle Classes #174

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

LanaNYC
Copy link
Contributor

@LanaNYC LanaNYC commented Sep 3, 2024

Overview

This pull request aims to address pagination functionalities as outlined in issue #172. The implementation includes the following key changes:

  • Refactoring FHIRSearch perform() and perform_resources with iterations.
  • Adding __iter__ method to the Bundle class to handle pagination.
  • Moving pagination logic to _utils.py for better modularity and reuse.
  • Adjusted tests to accommodate the new iterable-based approach.

@mikix
Copy link
Contributor

mikix commented Sep 4, 2024

I would greatly appreciate any guidance from the maintainers or community members on resolving the Superfluous entry error. This will allow me to complete the testing phase and ensure the robustness of the new pagination features.

I believe this is because of some bad indenting in fhirclient/models/bundle.py - def elementProperties lost its indenting and is now a toplevel function instead of a class method. Adding back the indenting seems to surface more reasonable errors.

Note that manually editing bundle.py like that is fraught - most model files like that will get overwritten the next time ./generate_models.sh is run. But that's fine! We have options:

  • Copy/paste a copy of bundle.py and put it in ./fhir-parser-resources/ and edit the generation machinery to copy it over (this has downsides of code-drift over time)
  • Make some of these changes in fhir-parser first, then update this repo to consume the changes (forward-port the changes)
  • Make the changes here, where it's easier/faster, then go back and back-port the changes to fhir-parser too (and then probably also update this repo on top of that, removing the custom changes to get fully in sync again)

I'm down for whatever - that last option is probably easiest for just getting something that works. And maybe the solution also depends on whether we think this feature "belongs" in fhir-parser. Just wanted to flag the close relationship these two repos have.

@LanaNYC
Copy link
Contributor Author

LanaNYC commented Sep 9, 2024

@mikix Thank you. I'll have more time this week and hope to finish this PR.

@LanaNYC LanaNYC force-pushed the feat/pagination-Bundle branch from 9a69be0 to 8f4a32b Compare September 20, 2024 16:15
@LanaNYC LanaNYC changed the title WIP: Implement Pagination Features for FHIRClient and Bundle Classes Implement Pagination Features for FHIRClient and Bundle Classes Sep 20, 2024
@LanaNYC LanaNYC force-pushed the feat/pagination-Bundle branch from 8f4a32b to 08a8535 Compare September 20, 2024 16:28
Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you! Comments below

fhirclient/_utils.py Outdated Show resolved Hide resolved
fhirclient/models/bundle.py Outdated Show resolved Hide resolved
fhirclient/models/bundle.py Outdated Show resolved Hide resolved
if not first_bundle:
return iter([])
yield first_bundle

def perform_resources(self, server):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: while we're here, maybe add -> list['Resource']: on this method, which should help differentiate how this is different than the _iter version.

(And probably add -> 'Bundle': to perform() too.)



# Use forward references to avoid circular imports
def perform_iter(self, server) -> Iterator['Resource']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this is written, it will iterate on BundleEntry objects, yeah? But we actually want Bundles.

Should this be re-written as return iter_pages(self.perform(server)) (and change the return hint to Iterator['Bundle']?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you for catching this. Will be fixed with next pushed code.

fhirclient/models/fhirsearch.py Show resolved Hide resolved
Comment on lines 160 to 162
bundle = self.perform(server)
resources = []
if bundle is not None and bundle.entry is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this code could just be return list(self.perform_resources_iter()) yeah? (just to reduce duplicated code paths, and to emphasize that the method is now just a "worse" version of the _iter one)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that change would be a behavior change - but I think an important one? Previously, this did no pagination. But that just means that we'd throw resources on the floor with no indication we had done so or way to work around that.

So I think switching this method to use pagination under the hood is a valuable fix.

Copy link
Contributor Author

@LanaNYC LanaNYC Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you'd prefer to change a behavior of perfom_resources(), my current implementation of perform_iter() depend on perform(). I can refactor perform_iter() to remove this dependency while keeping both perform() and perfom_resources() with DeprecationWarning to maintain backward compatibility. Please, let me know what direction you'd prefer here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it sounds reasonable to have the deprecation warning in both perform & perform_resources. Refactoring could maybe be as simple as moving the guts out to a helper method like _perform_perform that both call (don't name it that 😄)

Comment on lines 178 to 179
if not first_bundle or not first_bundle.entry:
return iter([])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check isn't needed, is it? perform can't return None and the not .entry check is handled by __iter__.

This method could be written as the following, I think?

for bundle in self.perform_iter(server):
  for entry in bundle:
    yield entry.resource

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding these tests, love to see it ❤️

@LanaNYC
Copy link
Contributor Author

LanaNYC commented Sep 25, 2024

@mikix Thank you for detailed feedback. I'll look at everything this weekend.

@LanaNYC
Copy link
Contributor Author

LanaNYC commented Oct 7, 2024

Hi @mikix,

Sorry for the delay. I've got carried away by work and life. I'm ready to finish this PR now.

First of all, thank you for your feedback on the PRs! I really appreciate the thoughtful consideration about the iteration and pagination logic in Bundle.

Second, to make sure we're aligned before I proceed, here's my understanding of your comments and the next steps I should take:
Summary of your feedback:

  1. Bundle should stay as an Iterator vs. Iterable;
  2. Direct Iteration: You're okay with dropping that feature for now, especially since the primary iteration should handle pagination, which is already being covered in FHIRSearch.
  3. It sounds like you are suggesting we drop the changes to fhir-parser for now and focus on keeping the iteration logic for entries within the scope of FHIRSearch or _util.py. This would simplify both PRs and reduce the complexity while still delivering value.

Subtasks:

  1. Remove the direct iteration support in the Bundle class:
  • Revert the iteration logic added to the Bundle class to reduce complexity and unnecessary features.
  1. Move iteration logic to _util.py or FHIRSearch:
  • Ensure that the logic for iterating over entries in Bundle is handled within _util.py or FHIRSearch, making it specific to where it is needed (e.g., handling pagination and entry iteration).
  1. Close the open PR for fhir-parser:
  1. Update and test client-py:
  • After making the changes, update the client-py tests to ensure everything works as expected, especially focusing on the behavior of iter_pages.

Before I start on these changes, I’d like to confirm with you:

  • Does this approach sound good, or would you prefer I keep the iteration logic within the Bundle class in some capacity?
  • Should I proceed with closing the fhir-parser PR and refocusing everything on the client-py side?
  • Are there any other changes or considerations you’d like me to make before moving forward with the adjustments?

Once I hear back from you, I’ll begin working on these changes.

Thank you again for your guidance!

@LanaNYC LanaNYC marked this pull request as draft October 7, 2024 15:55
@mikix
Copy link
Contributor

mikix commented Oct 7, 2024

Welcome back to calmer times, @LanaNYC!

First off -- and I'm not saying you meant this -- but you would be entirely within your rights to be typing "thank you for your feedback on the PRs" a little sarcastically. Sorry for being so nit-picky, but I have a hard time turning that part of my brain off. 😄 And likewise, you're wise to get pre-approval in case I blow up your plan again. But hopefully the back and forth is worth it. 🤞

I think that plan looks good. 👍

  • I think leaving Bundle alone for now is a reasonable boundary (in hindsight).
  • So with this approach the only public entry point to iteration/pagination would be perform_iter and perform_resources_iter (and perform_resources behind the scenes)? That feels reasonable.

@LanaNYC
Copy link
Contributor Author

LanaNYC commented Oct 7, 2024

@mikix Truly love your nit-picking, —please keep it up, especially with my PRs :) . It's the best way to grow and learn. Diving into fhir-parser has been a great learning experience for me, and I have no regrets.

Since you've pre-approved my plan, I’ll move forward with it. I’ll switch this PR from draft to 'Ready for review' once all the tests pass.

@LanaNYC LanaNYC force-pushed the feat/pagination-Bundle branch 3 times, most recently from 9b5ad9b to bebe4a6 Compare October 14, 2024 19:50
@LanaNYC LanaNYC marked this pull request as ready for review October 14, 2024 19:51
@mikix
Copy link
Contributor

mikix commented Oct 22, 2024

Oh I missed that this was marked ready-for-review! I'll review this again?

I also wanted to drop a line that I just switched the Pythons we run in CI (in #182), so you'll want to rebase so that the CI runs the now-required 3.13 tests.

@LanaNYC LanaNYC force-pushed the feat/pagination-Bundle branch from bebe4a6 to 2232f3a Compare October 22, 2024 17:29
@LanaNYC
Copy link
Contributor Author

LanaNYC commented Oct 22, 2024

No problem. Rebase is done. All checks have passed. PR ready for a review. As discussed earlier, please feel free to dive into the details as much as you'd like - your thorough feedback is always appreciated!

Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Got a few comments, but this is in really good shape I think!

fhir-parser Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this submodule is still pointing at your old branch?

Comment on lines 53 to 56

def __iter__(self):
return iter(self.entry or [])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about removing this change too, yeah? It will be overridden the next time we regenerate models and it's not a huge help for folks anyway - they could just do self.entry or [] themselves.

Comment on lines 23 to 24
next_link = _get_next_link(bundle)
if next_link:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you absolutely don't need to use these, but python 3.8 introduced walrus operators, so this could be slightly condensed, if you liked:

if next_link := _get_next_link(bundle):

Beauty & readability is in the eye of the beholder, though.


def _sanitize_next_link(next_link: str) -> str:
"""
Sanitize the `next` link to ensure it is safe to use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked you in this PR before, what the worries were with this - I forget the exact response - but could you add a sentence or two in this docstring about it, for posterity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been testing _sanitize_next_link() more today and noticed a few potential issues. The initial idea behind sanitizing links was to implement a safeguard to protect against URL injection attacks by ensuring the link is safe to use. However, to maintain the library's flexibility, I can't simply rely on blacklisting or whitelisting parameters.

The current approach is just the first step toward safely parsing URLs, but it isn't sufficient for full validation. Users will still need to apply additional validation measures on their end.

On another note, do you think that _sanitize_next_link() is too rigid for an open-source library?
Am I overthinking it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be unnecessary here, especially if it requires some extra client side work to be fully valid.

What are the risks / attack vectors to consider here? The EHR vendor you are connecting to would have to be malicious, yeah? (Like, this URL isn't coming from a human entering data at the hospital - the EHR software directly generates it as part of its search code.)

In which case, the malicious EHR doesn't need to send you to a separate server or mess with URLs. They already have you talking securely to them, and they can leak your credentials or feed you bogus data or whatever it is they want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm assuming that one would be connecting over https and other man-in-the-middle attacks are not in scope?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If man-in-the-middle attacks are in scope - maybe it would be enough to simply validate that the hostnames (netlocs) match between origin_server and the new URL.


for link in bundle.link:
if link.relation == "next":
return link.url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might sanitize the URL here?

If there's not a use case for getting the raw URL, we are just leaving a gap where a caller could introduce a bug by not knowing they need to sanitize it after getting it. I've seen that kind of "external context" in an API lead to bugs, once it's handed over to other devs who don't have that context in their heads.

return sanitized_url


def _execute_pagination_request(sanitized_url: str) -> Optional['Bundle']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the return type hint can just be 'Bundle' - this method will never return None (it may raise an exception... but otherwise will return a Bundle). (and can update the docstring below too)

Comment on lines 113 to 115
except requests.exceptions.HTTPError as e:
# Handle specific HTTP errors as needed, possibly including retry logic
raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: small trick when you don't actually need the exception object is to just do a plain raise which re-raises the current exception.

    except requests.exceptions.HTTPError:
        # In future: handle specific HTTP errors as needed, possibly including retry logic
        raise

Comment on lines 138 to 141
res = server.request_json(self.construct())
bundle = bundle.Bundle(res)
bundle = bundle_module.Bundle(res)
bundle.origin_server = server
return bundle
Copy link
Contributor

@mikix mikix Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not your doing, but this looks like a re-implementation of Bundle.read_from.

You might be able to reduce this whole method (post-deprecation-warning) to:

from .bundle import Bundle
return Bundle.read_from(self.construct(), server)

(since it even checks if server is None for you)


class TestFHIRSearchIter(unittest.TestCase):
def setUp(self):
# self.mock_server = MockServer(tmpdir=os.path.join(os.path.dirname(__file__), '..', 'data', 'examples'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✂️?

Comment on lines 41 to 43
mock_perform.return_value = self.mock_bundle
mock_iter_pages.side_effect = iter([self.mock_bundle])
mock_fetch_next_page.side_effect = [None]
Copy link
Contributor

@mikix mikix Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your tests are fine - but I will say that all of this mocking can end up being brittle. Like here, you are using knowledge of the internals of perform_iter (that it calls perform()) and then mocking out a lot of the detailed utility code - where there might be bugs! But it's generally easy to get lost in the sauce when doing all this mocking. If another dev breaks this test in six month, they might not have the understanding of which micro-targeted line of code you are meant to be testing.

I know that it's a bit philosophical territory about how small to make unit tests vs functional tests vs whatever. But just practically, it can be tough to mock all the little interactions like this.

This is entirely optional for this PR, but if I were writing tests for this, I might:

  • Add a testing dependency on the responses mocking library. Like in pyproject.toml's optional-dependencies section, add responses to the tests list of dependencies.
  • Then in tests like this, you can mock things at the network boundary:
@responses.activate
def test_perform_iter_single_bundle(self):
    responses.add(responses.GET, "https://example.com/bundle-url/", json={bundle content...})

But again, what you have here is fine - just it can be annoying when writing tests to do all the mocking like this, trying to avoid the network. But responses lets you mock further back.

@LanaNYC LanaNYC force-pushed the feat/pagination-Bundle branch from 2232f3a to bd60dc3 Compare October 28, 2024 18:14
@LanaNYC
Copy link
Contributor Author

LanaNYC commented Oct 28, 2024

@mikix The latest version is ready for review. Thank you for steering me towards responses mocking library - It made my life so much easier :)

Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fabulous - I think my only blocking concern is the authentication header comment.

fhir-parser Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - this is harmless, but it does point at a different commit, one from your branch. Since the content is the same, it's not necessary to figure out how to revert this - but maybe see if it's simple to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is ready for review. I've reverted fhir-parser to the original commit, hopefully that should solve it.

Optional[Bundle]: The next page of results as a FHIR Bundle, or None if no "next" link is found.
"""
if next_link := _get_next_link(bundle):
sanitized_next_link = _sanitize_next_link(next_link)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this call could be dropped now that _get_next_link does the sanitizing.

Comment on lines 94 to 97
response = requests.get(sanitized_url)
response.raise_for_status()
next_bundle_data = response.json()
next_bundle = Bundle(next_bundle_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could/should this also use Bundle.read_from? That would set a few more headers like Accept: application/json and would also send authentication headers - which... we probably need here yeah?

It would mean sending down the FHIRServer instance, but that should be available to the callers of this function.

Comment on lines +43 to +44
@responses.activate
def test_perform_iter_single_bundle(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah this looks much cleaner with responses

…to Bundle, move pagination logic to _utils.py, add tests

Simplify url sanitization, adjust testse

Replace redundant code on perform()

Add tests perform-iter and perform_resources_iter via Response

Revert fhir-parser submodule to original commit
@LanaNYC LanaNYC force-pushed the feat/pagination-Bundle branch from bd60dc3 to 3ec3ef3 Compare November 4, 2024 17:49
Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me!

Thank you so much for this work!

@mikix mikix merged commit ff41695 into smart-on-fhir:main Nov 6, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants