Make UnivariateDriftCalculator (and other objects) JSON serializable #394

Open
KGoldsmith11 opened this issue Jun 4, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

KGoldsmith11 commented Jun 4, 2024

Currently, the UnivariateDriftCalculator object supports serialization via pickle. However, this format is not compatible with Apache Spark, which I intend to use for processing on the inference side.

For governance reasons, I need to fit the drift calculator on a different machine from the one where I will perform inference and have access to the analysis chunks, and the inference machine uses Spark. I therefore need to fit the object, serialise it, move it to the inference machine, load it in PySpark, and then calculate the data drift on the inference data. This does not work with pickle, because Spark has no pickle loading methods, but Spark does have JSON loading methods.

JSON would be a good alternative to pickle, as Spark has JSON load methods.
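
To illustrate the kind of round trip I have in mind, here is a minimal sketch. `ToyDriftCalculator`, `to_json`, and `from_json` are hypothetical names made up for this example and are not part of the library; the "drift" here is just a difference of means, standing in for whatever fitted state the real calculator would need to persist.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToyDriftCalculator:
    """Hypothetical stand-in for a fitted calculator (NOT the library's API)."""

    column_names: list
    reference_means: dict = field(default_factory=dict)

    def fit(self, reference_rows):
        # Store one plain-float summary per column so the state is JSON-friendly.
        for col in self.column_names:
            values = [row[col] for row in reference_rows]
            self.reference_means[col] = sum(values) / len(values)
        return self

    def calculate(self, analysis_rows):
        # Toy drift measure: analysis mean minus reference mean, per column.
        drift = {}
        for col in self.column_names:
            values = [row[col] for row in analysis_rows]
            drift[col] = sum(values) / len(values) - self.reference_means[col]
        return drift

    def to_json(self):
        # Hypothetical export method: fitted state as a JSON string.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload):
        # Hypothetical import method: rebuild the object without refitting.
        return cls(**json.loads(payload))


# Fitting machine: fit on reference data and export the state.
reference = [{"age": 30.0}, {"age": 40.0}]
payload = ToyDriftCalculator(["age"]).fit(reference).to_json()

# Inference machine: load the JSON and calculate on the analysis data.
restored = ToyDriftCalculator.from_json(payload)
analysis = [{"age": 45.0}, {"age": 55.0}]
drift = restored.calculate(analysis)
```

The point is only that the fitted state travels as plain JSON, so the inference side needs no pickle support to restore it.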

KGoldsmith11 added the enhancement (New feature or request) label Jun 4, 2024