-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Pagination Features for FHIRClient and Bundle Classes #174
Conversation
I believe this is because of some bad indenting in Note that manually editing
I'm down for whatever - that last option is probably easiest for just getting something that works. And maybe the solution also depends on whether we think this feature "belongs" in fhir-parser. Just wanted to flag the close relationship these two repos have. |
@mikix Thank you. I'll have more time this week and hope to finish this PR. |
9a69be0
to
8f4a32b
Compare
8f4a32b
to
08a8535
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thank you! Comments below
fhirclient/models/fhirsearch.py
Outdated
if not first_bundle: | ||
return iter([]) | ||
yield first_bundle | ||
|
||
def perform_resources(self, server): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: while we're here, maybe add -> list['Resource']:
on this method, which should help differentiate how this is different than the _iter
version.
(And probably add -> 'Bundle':
to perform()
too.)
fhirclient/models/fhirsearch.py
Outdated
|
||
|
||
# Use forward references to avoid circular imports | ||
def perform_iter(self, server) -> Iterator['Resource']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is written, it will iterate on BundleEntry
objects, yeah? But we actually want Bundles.
Should this be re-written as return iter_pages(self.perform(server))
(and change the return hint to Iterator['Bundle']
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you for catching this. Will be fixed with next pushed code.
fhirclient/models/fhirsearch.py
Outdated
bundle = self.perform(server) | ||
resources = [] | ||
if bundle is not None and bundle.entry is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this code could just be return list(self.perform_resources_iter())
yeah? (just to reduce duplicated code paths, and to emphasize that the method is now just a "worse" version of the _iter one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that change would be a behavior change - but I think an important one? Previously, this did no pagination. But that just means that we'd throw resources on the floor with no indication we had done so or way to work around that.
So I think switching this method to use pagination under the hood is a valuable fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you'd prefer to change a behavior of perfom_resources(), my current implementation of perform_iter() depend on perform(). I can refactor perform_iter() to remove this dependency while keeping both perform() and perfom_resources() with DeprecationWarning to maintain backward compatibility. Please, let me know what direction you'd prefer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it sounds reasonable to have the deprecation warning in both perform & perform_resources. Refactoring could maybe be as simple as moving the guts out to a helper method like _perform_perform
that both call (don't name it that 😄)
fhirclient/models/fhirsearch.py
Outdated
if not first_bundle or not first_bundle.entry: | ||
return iter([]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check isn't needed, is it? perform
can't return None
and the not .entry
check is handled by __iter__
.
This method could be written as the following, I think?
for bundle in self.perform_iter(server):
for entry in bundle:
yield entry.resource
tests/models/bundle_iterator_test.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding these tests, love to see it ❤️
@mikix Thank you for detailed feedback. I'll look at everything this weekend. |
Hi @mikix, Sorry for the delay. I've got carried away by work and life. I'm ready to finish this PR now. First of all, thank you for your feedback on the PRs! I really appreciate the thoughtful consideration about the iteration and pagination logic in Bundle. Second, to make sure we're aligned before I proceed, here's my understanding of your comments and the next steps I should take:
Subtasks:
Before I start on these changes, I’d like to confirm with you:
Once I hear back from you, I’ll begin working on these changes. Thank you again for your guidance! |
Welcome back to calmer times, @LanaNYC! First off -- and I'm not saying you meant this -- but you would be entirely within your rights to be typing "thank you for your feedback on the PRs" a little sarcastically. Sorry for being so nit-picky, but I have a hard time turning that part of my brain off. 😄 And likewise, you're wise to get pre-approval in case I blow up your plan again. But hopefully the back and forth is worth it. 🤞 I think that plan looks good. 👍
|
@mikix Truly love your nit-picking, —please keep it up, especially with my PRs :) . It's the best way to grow and learn. Diving into fhir-parser has been a great learning experience for me, and I have no regrets. Since you've pre-approved my plan, I’ll move forward with it. I’ll switch this PR from draft to 'Ready for review' once all the tests pass. |
9b5ad9b
to
bebe4a6
Compare
Oh I missed that this was marked ready-for-review! I'll review this again? I also wanted to drop a line that I just switched the Pythons we run in CI (in #182), so you'll want to rebase so that the CI runs the now-required 3.13 tests. |
bebe4a6
to
2232f3a
Compare
No problem. Rebase is done. All checks have passed. PR ready for a review. As discussed earlier, please feel free to dive into the details as much as you'd like - your thorough feedback is always appreciated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Got a few comments, but this is in really good shape I think!
fhir-parser
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this submodule is still pointing at your old branch?
fhirclient/models/bundle.py
Outdated
|
||
def __iter__(self): | ||
return iter(self.entry or []) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we talked about removing this change too, yeah? It will be overridden the next time we regenerate models and it's not a huge help for folks anyway - they could just do self.entry or []
themselves.
fhirclient/_utils.py
Outdated
next_link = _get_next_link(bundle) | ||
if next_link: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you absolutely don't need to use these, but python 3.8 introduced walrus operators, so this could be slightly condensed, if you liked:
if next_link := _get_next_link(bundle):
Beauty & readability is in the eye of the beholder, though.
fhirclient/_utils.py
Outdated
|
||
def _sanitize_next_link(next_link: str) -> str: | ||
""" | ||
Sanitize the `next` link to ensure it is safe to use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked you in this PR before, what the worries were with this - I forget the exact response - but could you add a sentence or two in this docstring about it, for posterity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been testing _sanitize_next_link() more today and noticed a few potential issues. The initial idea behind sanitizing links was to implement a safeguard to protect against URL injection attacks by ensuring the link is safe to use. However, to maintain the library's flexibility, I can't simply rely on blacklisting or whitelisting parameters.
The current approach is just the first step toward safely parsing URLs, but it isn't sufficient for full validation. Users will still need to apply additional validation measures on their end.
On another note, do you think that _sanitize_next_link() is too rigid for an open-source library?
Am I overthinking it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be unnecessary here, especially if it requires some extra client side work to be fully valid.
What are the risks / attack vectors to consider here? The EHR vendor you are connecting to would have to be malicious, yeah? (Like, this URL isn't coming from a human entering data at the hospital - the EHR software directly generates it as part of its search code.)
In which case, the malicious EHR doesn't need to send you to a separate server or mess with URLs. They already have you talking securely to them, and they can leak your credentials or feed you bogus data or whatever it is they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm assuming that one would be connecting over https and other man-in-the-middle attacks are not in scope?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If man-in-the-middle attacks are in scope - maybe it would be enough to simply validate that the hostnames (netlocs) match between origin_server and the new URL.
fhirclient/_utils.py
Outdated
|
||
for link in bundle.link: | ||
if link.relation == "next": | ||
return link.url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might sanitize the URL here?
If there's not a use case for getting the raw URL, we are just leaving a gap where a caller could introduce a bug by not knowing they need to sanitize it after getting it. I've seen that kind of "external context" in an API lead to bugs, once it's handed over to other devs who don't have that context in their heads.
fhirclient/_utils.py
Outdated
return sanitized_url | ||
|
||
|
||
def _execute_pagination_request(sanitized_url: str) -> Optional['Bundle']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the return type hint can just be 'Bundle'
- this method will never return None
(it may raise an exception... but otherwise will return a Bundle). (and can update the docstring below too)
fhirclient/_utils.py
Outdated
except requests.exceptions.HTTPError as e: | ||
# Handle specific HTTP errors as needed, possibly including retry logic | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: small trick when you don't actually need the exception object is to just do a plain raise
which re-raises the current exception.
except requests.exceptions.HTTPError:
# In future: handle specific HTTP errors as needed, possibly including retry logic
raise
fhirclient/models/fhirsearch.py
Outdated
res = server.request_json(self.construct()) | ||
bundle = bundle.Bundle(res) | ||
bundle = bundle_module.Bundle(res) | ||
bundle.origin_server = server | ||
return bundle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not your doing, but this looks like a re-implementation of Bundle.read_from.
You might be able to reduce this whole method (post-deprecation-warning) to:
from .bundle import Bundle
return Bundle.read_from(self.construct(), server)
(since it even checks if server is None for you)
|
||
class TestFHIRSearchIter(unittest.TestCase): | ||
def setUp(self): | ||
# self.mock_server = MockServer(tmpdir=os.path.join(os.path.dirname(__file__), '..', 'data', 'examples')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✂️?
mock_perform.return_value = self.mock_bundle | ||
mock_iter_pages.side_effect = iter([self.mock_bundle]) | ||
mock_fetch_next_page.side_effect = [None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your tests are fine - but I will say that all of this mocking can end up being brittle. Like here, you are using knowledge of the internals of perform_iter
(that it calls perform()
) and then mocking out a lot of the detailed utility code - where there might be bugs! But it's generally easy to get lost in the sauce when doing all this mocking. If another dev breaks this test in six month, they might not have the understanding of which micro-targeted line of code you are meant to be testing.
I know that it's a bit philosophical territory about how small to make unit tests vs functional tests vs whatever. But just practically, it can be tough to mock all the little interactions like this.
This is entirely optional for this PR, but if I were writing tests for this, I might:
- Add a testing dependency on the responses mocking library. Like in pyproject.toml's optional-dependencies section, add
responses
to thetests
list of dependencies. - Then in tests like this, you can mock things at the network boundary:
@responses.activate
def test_perform_iter_single_bundle(self):
responses.add(responses.GET, "https://example.com/bundle-url/", json={bundle content...})
But again, what you have here is fine - just it can be annoying when writing tests to do all the mocking like this, trying to avoid the network. But responses
lets you mock further back.
2232f3a
to
bd60dc3
Compare
@mikix The latest version is ready for review. Thank you for steering me towards responses mocking library - It made my life so much easier :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fabulous - I think my only blocking concern is the authentication header comment.
fhir-parser
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - this is harmless, but it does point at a different commit, one from your branch. Since the content is the same, it's not necessary to figure out how to revert this - but maybe see if it's simple to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR is ready for review. I've reverted fhir-parser to the original commit, hopefully that should solve it.
fhirclient/_utils.py
Outdated
Optional[Bundle]: The next page of results as a FHIR Bundle, or None if no "next" link is found. | ||
""" | ||
if next_link := _get_next_link(bundle): | ||
sanitized_next_link = _sanitize_next_link(next_link) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this call could be dropped now that _get_next_link
does the sanitizing.
fhirclient/_utils.py
Outdated
response = requests.get(sanitized_url) | ||
response.raise_for_status() | ||
next_bundle_data = response.json() | ||
next_bundle = Bundle(next_bundle_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could/should this also use Bundle.read_from
? That would set a few more headers like Accept: application/json
and would also send authentication headers - which... we probably need here yeah?
It would mean sending down the FHIRServer
instance, but that should be available to the callers of this function.
@responses.activate | ||
def test_perform_iter_single_bundle(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah this looks much cleaner with responses
…to Bundle, move pagination logic to _utils.py, add tests Simplify url sanitization, adjust testse Replace redundant code on perform() Add tests perform-iter and perform_resources_iter via Response Revert fhir-parser submodule to original commit
bd60dc3
to
3ec3ef3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me!
Thank you so much for this work!
Overview
This pull request aims to address pagination functionalities as outlined in issue #172. The implementation includes the following key changes:
FHIRSearch
perform() and perform_resources with iterations.__iter__
method to theBundle
class to handle pagination._utils.py
for better modularity and reuse.