Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

44 translate cropno crop inference udf to gfmap #48

Merged
merged 38 commits into from
Jun 12, 2024

Conversation

GriffinBabe
Copy link
Contributor

FYI: The branch hv_mvp_inferenceUDF is already merged here

About the folder minimal_wc_presto: I deleted all the duplicates UDFs. I suppose that want to keep the dependency code somewhere, but should we leave this here?

I added the Feature Extractor and Model Inference classes to the module's source, as well as updated a script to perform cropland mapping from coordinates provided in the command lines.

The UDFs are now using all the exporter dependencies packed by @HansVRP and are using GFMAP for fetching the data from the collections, resulting in less (and hopefully for you, more readable) code.

I also took the liberty in deleting an old Feature Extractor UDF (I suppose we don't need it anymore)

HansVRP and others added 30 commits May 6, 2024 15:42
…rely on remote mvp_wc_presto for fast debugging
…eal-classification into kvt_mvp_inferenceUDF

Conflicts:
	minimal_wc_presto/backend_inference_example_openeo.ipynb
	minimal_wc_presto/mvp_wc_presto/world_cereal_inference.py
	minimal_wc_presto/udf_long_worldcereal_inference.py
…ps://github.com/WorldCereal/worldcereal-classification into 44-translate-cropno-crop-inference-udf-to-gfmap

Conflicts:
	minimal_wc_presto/test_cropland_gfmap.py
	scripts/inference/cropland_mapping.py
	src/worldcereal/openeo/feature_extractor.py
	src/worldcereal/openeo/inference.py
@GriffinBabe GriffinBabe requested review from HansVRP and kvantricht May 31, 2024 14:31
@GriffinBabe GriffinBabe self-assigned this May 31, 2024
@GriffinBabe GriffinBabe linked an issue May 31, 2024 that may be closed by this pull request
Copy link
Contributor

@kvantricht kvantricht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GriffinBabe provided quite a few comments for you to consider. Let's discuss wherever needed.

minimal_wc_presto/ONNX_conversion.py Outdated Show resolved Hide resolved
scripts/inference/cropland_mapping.py Outdated Show resolved Hide resolved
scripts/inference/cropland_mapping.py Show resolved Hide resolved
scripts/inference/cropland_mapping.py Show resolved Hide resolved
src/worldcereal/openeo/feature_extractor.py Outdated Show resolved Hide resolved
src/worldcereal/openeo/feature_extractor.py Outdated Show resolved Hide resolved
src/worldcereal/openeo/inference.py Show resolved Hide resolved
src/worldcereal/openeo/inference.py Outdated Show resolved Hide resolved
src/worldcereal/openeo/preprocessing.py Outdated Show resolved Hide resolved
src/worldcereal/openeo/preprocessing.py Outdated Show resolved Hide resolved
@GriffinBabe GriffinBabe requested a review from kvantricht June 3, 2024 14:58
@kvantricht
Copy link
Contributor

@GriffinBabe I pushed a new Presto wheel to Artifactory and added what is supposed to become the new way of using dependency wheel in the UDF in my commit. Once this works, we may even not need the dependency zip in this UDF anymore (to be verified). Good luck testing the new functionality. Let me know if you need anything from my side.

@HansVRP
Copy link
Contributor

HansVRP commented Jun 3, 2024

I am also rehashing over the UDF trying to load the required libraries from within the class initiation. I already managed to pull the catboost predictor definition out of the UDF_long. I will try to do the same for the Presto feature extractor.

If it works we might not to upload mvp_world_cereal inference anywhere

@kvantricht
Copy link
Contributor

I am also rehashing over the UDF trying to load the required libraries from within the class initiation. I already managed to pull the catboost predictor definition out of the UDF_long. I will try to do the same for the Presto feature extractor.

If it works we might not to upload mvp_world_cereal inference anywhere

This may be parallel work with what I’m doing Hans, as I discussed with Darius and migrated all that functionality to presto-WorldCereal library where it belongs. We want to test the new functionality defining an external whl on the top of the UDF.

@HansVRP
Copy link
Contributor

HansVRP commented Jun 4, 2024

I am also rehashing over the UDF trying to load the required libraries from within the class initiation. I already managed to pull the catboost predictor definition out of the UDF_long. I will try to do the same for the Presto feature extractor.
If it works we might not to upload mvp_world_cereal inference anywhere

This may be parallel work with what I’m doing Hans, as I discussed with Darius and migrated all that functionality to presto-WorldCereal library where it belongs. We want to test the new functionality defining an external whl on the top of the UDF.

Did PrestoFeatureExtractor also end up in the whl? It is indeed not required!

@kvantricht
Copy link
Contributor

I am also rehashing over the UDF trying to load the required libraries from within the class initiation. I already managed to pull the catboost predictor definition out of the UDF_long. I will try to do the same for the Presto feature extractor.
If it works we might not to upload mvp_world_cereal inference anywhere

This may be parallel work with what I’m doing Hans, as I discussed with Darius and migrated all that functionality to presto-WorldCereal library where it belongs. We want to test the new functionality defining an external whl on the top of the UDF.

Did PrestoFeatureExtractor also end up in the whl? It is indeed not required!

Presto-related stuff now went here: https://github.com/WorldCereal/presto-worldcereal/blob/openeo_inference/presto/inference.py

@kvantricht kvantricht linked an issue Jun 7, 2024 that may be closed by this pull request
@GriffinBabe
Copy link
Contributor Author

@kvantricht @HansVRP can you review again?

def extract(self, image):
pass

ONNX_DEPS_URL = "https://artifactory.vgt.vito.be/artifactory/auxdata-public/openeo/onnx_dependencies_1.16.3.zip"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this still needed? onnxruntime is included in the worldcereal_deps.zip. And if it's still needed, shouldn't we also put it on s3?

Copy link
Contributor Author

@GriffinBabe GriffinBabe Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's much lighter than all the presto dependency so it's a waste of resources to install everything just for onnxruntime. Also it's a requirement in GFMAP for the ModelInference subclasses, until the pip dependency feature is more stable.

Indeed, could you upload it to the S3 @HansVRP ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that for each single UDF in the chain those dependencies get downloaded/unpacked again? That's also lots of overhead I guess?

Copy link
Contributor Author

@GriffinBabe GriffinBabe Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it's just onnxruntime that get's reinstalled (quite light), compared to the full UDF I don't think there is a large difference in execution time, even less with large spatial extents probably

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cached, they should not be reloaded each time.

scripts/inference/cropland_mapping.py Show resolved Hide resolved
out_format="NetCDF",
job_options={
"driver-memory": "4g",
"executor-memoryOverhead": "12g",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very high. Need to flag this for further profiling. But for now we can leave it like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I have seen it is a potential overkill, 200 km square ran on half the amount of memory for me. @GriffinBabe do you have a job id for a job which needed more memory?

scripts/inference/cropland_mapping.py Show resolved Hide resolved
BASE_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies" # NOQA
DEPENDENCY_NAME = "worldcereal_deps.zip"

GFMAP_BAND_MAPPING = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in fact a bit an overhead, if later on in presto-worldcereal we again remap: https://github.com/WorldCereal/presto-worldcereal/blob/main/presto/inference.py#L41

So we should consider to immediately use the presto naming here and remove the remapping in presto-worldcereal

@GriffinBabe GriffinBabe merged commit 77d5efc into main Jun 12, 2024
1 check passed
@GriffinBabe GriffinBabe deleted the 44-translate-cropno-crop-inference-udf-to-gfmap branch June 12, 2024 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Translate crop/no-crop inference UDF to GFMAP Implement WorldCereal inference pipeline using GFMAP
3 participants