-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement FHIR Bulk Export client. #1781
Conversation
Adding a basic progress tracking callback mechanism.
Improved exception reporting.
…u.csiro.pathling logger)
Adding support for handling of Too Many Requests (429) status.
…also support a timestamp (which is what we are getting from the test server).
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1781 +/- ##
============================================
- Coverage 84.67% 83.94% -0.74%
Complexity 139 139
============================================
Files 341 384 +43
Lines 7868 8818 +950
Branches 528 592 +64
============================================
+ Hits 6662 7402 +740
- Misses 912 1067 +155
- Partials 294 349 +55 ☔ View full report in Codecov by Sentry. |
️✅ There are no secrets present in this pull request anymore.If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request. |
…n bulkexport client.
Adding unit tests for AuthTokenProvider
- adding support for SMART configuration discovery - removing 'identifier' field from FHRI Reference()
Adding an example of python bulk export from Smart Health IT server with asymmetric keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all of your work on this @piotrszul!
I've gone through it and come up with some feedback for you to take a look at.
bulkclient/pom.xml
Outdated
<artifactId>*</artifactId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove this dependency as the first step to separating this module from the main Pathling code base.
* Constants for authentication. | ||
*/ | ||
@UtilityClass | ||
public class AuthConst { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps these could be moved into SmartClientAuthMethodBase?
Or into the specific subclass where they are more specific, e.g. CLIENT_ASSERTION_TYPE_JWT_BEARER
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reafactored.
* Authentication method for one of the FHIR SMART client authentication profiles. | ||
*/ | ||
|
||
public interface ClientAuthMethod { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this level of interface? Aren't all the authentication methods we support SMART client authentication methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored.
return jwkAlgorithm | ||
.replace("RS", "RSA") | ||
.replace("ES", "ECDSA") | ||
.replace("HC", "HMAC"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a undocumented supported algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should actually be "HS" not "HC" as if HS256.
it's one of the symmetric signing algorithms supported by auth0.
Added for completness as it's supported by 'Algorithm' class but we do not use that since we only expose asymmetric algorithms.
auth_scope="system/*.read", | ||
), | ||
).export() | ||
print(f"TransactionTime: {as_fhir_instant(result.transaction_time)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the following error when running this script. This is probably because I didn't set the client secret environment variable, but perhaps it should handle this more gracefully.
:: loading settings :: url = jar:file:/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/gri306/.ivy2/cache
The jars for the packages stored in: /Users/gri306/.ivy2/jars
au.csiro.pathling#library-runtime added as a dependency
io.delta#delta-core_2.12 added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-56feae64-188a-4257-9fe1-6edb5a66c4cd;1.0
confs: [default]
found au.csiro.pathling#library-runtime;6.6.0-SNAPSHOT in local-m2-cache
found io.delta#delta-core_2.12;2.4.0 in local-m2-cache
found io.delta#delta-storage;2.4.0 in local-m2-cache
found org.antlr#antlr4-runtime;4.9.3 in local-m2-cache
found org.apache.hadoop#hadoop-aws;3.3.4 in local-m2-cache
found com.amazonaws#aws-java-sdk-bundle;1.12.262 in local-m2-cache
found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in local-m2-cache
:: resolution report :: resolve 172ms :: artifacts dl 7ms
:: modules in use:
au.csiro.pathling#library-runtime;6.6.0-SNAPSHOT from local-m2-cache in [default]
com.amazonaws#aws-java-sdk-bundle;1.12.262 from local-m2-cache in [default]
io.delta#delta-core_2.12;2.4.0 from local-m2-cache in [default]
io.delta#delta-storage;2.4.0 from local-m2-cache in [default]
org.antlr#antlr4-runtime;4.9.3 from local-m2-cache in [default]
org.apache.hadoop#hadoop-aws;3.3.4 from local-m2-cache in [default]
org.wildfly.openssl#wildfly-openssl;1.0.7.Final from local-m2-cache in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 7 | 0 | 0 | 0 || 7 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-56feae64-188a-4257-9fe1-6edb5a66c4cd
confs: [default]
0 artifacts copied, 7 already retrieved (0kB/8ms)
24/04/11 11:15:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/04/11 11:15:15 DEBUG PathlingContext: Setting log level for 'au.csiro.pathling' to DEBUG
24/04/11 11:15:16 WARN SimpleFunctionRegistry: The function date_diff replaced a previously registered function.
Exporting from: https://aehrc-cdr.cc/fhir_r4 to /Users/gri306/Code/pathling/bulkclient/target/export-1712798116
24/04/11 11:15:16 DEBUG BulkExportClient: Creating FileStore of: au.csiro.pathling.library.fs.HdfsFileStoreFactory@36abae5e for outputDir: /Users/gri306/Code/pathling/bulkclient/target/export-1712798116
24/04/11 11:15:16 DEBUG BulkExportClient: Creating HttpClient with configuration: HttpClientConfiguration(socketTimeout=60000, maxConnectionsTotal=32, maxConnectionsPerRoute=16, retryEnabled=true, retryCount=2)
24/04/11 11:15:16 DEBUG SMARTTokenCredentialFactory: Retrieving SMART configuration for fhirEndpoint: https://aehrc-cdr.cc/fhir_r4
24/04/11 11:15:16 DEBUG SMARTTokenCredentialFactory: SMART configuration retrieved: SMARTDiscoveryResponse(tokenEndpoint=https://aehrc-cdr.cc/smartsec_r4/oauth/token, grantTypesSupported=null, tokenEndpointAuthMethodsSupported=null, tokenEndpointAuthSigningAlgValuesSupported=null, capabilities=[launch-ehr, launch-standalone, client-public, client-confidential-symmetric, client-confidential-asymmetric, sso-openid-connect, context-ehr-patient, context-standalone-patient, permission-offline, permission-patient])
Traceback (most recent call last):
File "/Users/gri306/Code/pathling/bulkclient/examples/bulk_export_smilecdr.py", line 66, in <module>
).export()
^^^^^^^^
File "/Users/gri306/Code/pathling/lib/python/pathling/bulkexport.py", line 157, in export
return BulkExportResult.from_java(self._jclient.export())
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o103.export.
: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:221)
at au.csiro.pathling.auth.ClientAuthMethod.create(ClientAuthMethod.java:123)
at au.csiro.pathling.auth.SMARTTokenCredentialFactory.createSMARTAuthMethod(SMARTTokenCredentialFactory.java:123)
at au.csiro.pathling.auth.SMARTTokenCredentialFactory.createAuthMethod(SMARTTokenCredentialFactory.java:105)
at au.csiro.pathling.auth.SMARTTokenCredentialFactory.createCredentials(SMARTTokenCredentialFactory.java:91)
at au.csiro.pathling.export.BulkExportClient.createHttpClient(BulkExportClient.java:365)
at au.csiro.pathling.export.BulkExportClient.export(BulkExportClient.java:250)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be resolved by JSR-380 validation of config.
I was having some issues with the making it work (in this project - some dependency issues).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added JRS-380 validation for the client configuration and it's subconfig beans.
bulkclient/src/main/java/au/csiro/pathling/export/ws/BulkExportTemplate.java
Show resolved
Hide resolved
bulkclient/src/main/java/au/csiro/pathling/export/ws/BulkExportRequest.java
Show resolved
Hide resolved
*/ | ||
@Nonnull | ||
@Builder.Default | ||
List<Reference> patient = Collections.emptyList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the full list of parameters that I can use? Is it possible to pass parameters that are not in this class, e.g. includeAssociatedData
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the full list.
Actually whatever is defined in BulkExportClient
but this is then mapped to the BulkExportRequest
.
I think includeAssociatedData
is the only one missing from the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add this to make the list complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented.
executorServiceResource.getExecutorService()); | ||
|
||
final BulkExportResult result = doExport(fileStore, bulkExportTemplate, downloadTemplate); | ||
log.info("Export successful: {}", result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we issue a Bulk Data Delete Request once the download has successfully completed (if the server supports it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec say that we MAY issue it to either cancel the request in progress or to cleanup.
So I guess it depends what we want to consider to be required for the first release of the bulk client.
I this this might be something to be added subsequent releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add this to mitigate the problem of exported data accumulating in the source server, which has been a problem that we have encountered with HAPI JPA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented.
bulkclient/src/test/java/au/csiro/pathling/example/RunBulkExport.java
Outdated
Show resolved
Hide resolved
Minor spelling fixes.
…thentication method classes.
… of an error or successful download.
We ended up shipping this client within a separate repo: https://github.com/aehrc/fhir-bulk-java |
Resolves #1776