Skip to content

Populating Algorithms: Mogpr from Fusets #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from
Closed

Populating Algorithms: Mogpr from Fusets #80

wants to merge 26 commits into from

Conversation

Pratichhya
Copy link
Contributor

The service is also available at https://marketplace-portal.dataspace.copernicus.eu/catalogue/app-details/17

The modification in the openeo_udp of this repo in comparison to the existing one is:

  • The dependencies are passed within the files and need not pass separately in the job_options

@Pratichhya Pratichhya marked this pull request as draft January 9, 2025 12:53
@Pratichhya
Copy link
Contributor Author

@soxofaan @HansVRP
Before changing it as Ready from Draft, could you please provide me with your suggestion on:

  • Here, the input parameter is only datacube, and there is no definition/example on if there is a restriction to passing this as a parameter instead of spatial and temporal extent as done in other cases.
  • In the benchmark scenario, the process graph includes multiple nodes, but according to documentation(correct me if I am wrong) it should only have a single node that points to the namespace.

"rel": "openeo-process",
"type": "application/json",
"title": "openEO Process Definition",
"href": "https://raw.githubusercontent.com/ESA-APEx/apex_algorithms/refs/heads/mogpr_v1/openeo_udp/fusets_mogpr/fusets_mogpr.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this branch will probably be deleted afterwards, so make sure to update once merged

@soxofaan
Copy link
Contributor

Here, the input parameter is only datacube, and there is no definition/example on if there is a restriction to passing this as a parameter instead of spatial and temporal extent as done in other cases.

I'm not sure I understand what your are asking. Is this about raising the issue of lack of documentation? Or just if it is ok to use a single data cube parameter instead of extent parameters?

@soxofaan
Copy link
Contributor

In the benchmark scenario, the process graph includes multiple nodes, but according to documentation(correct me if I am wrong) it should only have a single node that points to the namespace.

in the docs I just see "typically a single node":

process graph will typically just contain a single node

so it does not say "it should". At first sight, I think it's ok to have more than one nodes in the benchmark process graph

Copy link
Contributor

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of notes

DEPENDENCIES_DIR2 = 'venv_static'

DEPENDENCIES_URL1 = "https://artifactory.vgt.vito.be:443/artifactory/auxdata-public/ai4food/fusets_venv.zip"
DEPENDENCIES_URL2 = "https://artifactory.vgt.vito.be:443/artifactory/auxdata-public/ai4food/fusets.zip"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it is very valuable to define these as constants here. Each value is only used once, so you could just use these values directly in the setup_dependencies() calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the values directly in the setup_dependencies

@Pratichhya
Copy link
Contributor Author

Here, the input parameter is only datacube, and there is no definition/example on if there is a restriction to passing this as a parameter instead of spatial and temporal extent as done in other cases.

I'm not sure I understand what your are asking. Is this about raising the issue of lack of documentation?

Could be said such 👀 but No No not exactly. It is a concern on if there is standard way to do as for spatial_extent and temporal extent in terms of naming

Or just if it is ok to use a single data cube parameter instead of extent parameters?

Yes, this one esp, because I saw no example of doing such

@Pratichhya Pratichhya marked this pull request as ready for review January 14, 2025 14:46
@soxofaan
Copy link
Contributor

It is a concern on if there is standard way to do as for spatial_extent and temporal extent in terms of naming

as mentioned elsewhere in this issue, I'd recommend in most cases to use data for raster cubes, to align with standard openEO process naming. The only standard openEO process that deviates from this, as far as I could think of, is merge_cubes with cube1 and cube2

Or just if it is ok to use a single data cube parameter instead of extent parameters?

Yes, this one esp, because I saw no example of doing such

I think it depends basically, It depends on the usage/applicability of the "algorithm": is the algorithm tied closely to a certain data set (e.g. biopar stuff), then it makes sense to include the load_collection in the UDP. But if it is a generic algorithm (e.g. producing monthly composites of whatever you feed it), then you can only expect a generic data parameter to get the raster data cube.

@Pratichhya Pratichhya marked this pull request as draft January 19, 2025 15:29
@Pratichhya
Copy link
Contributor Author

@@ -0,0 +1,39 @@
# Sentinel-1 and Sentinel-2 data fusion through Multi-output Gaussian process regression (MOGPR)

This service is designed to enable multi-output regression analysis using Gaussian Process Regression (GPR) on geospatial data. It provides a powerful tool for understanding and predicting spatiotemporal phenomena by filling gaps based on other correlated indicators. This service focusses on the fusion of Sentinel-1 and Sentinel-2 data, allowing the user to select one of the predefined data sources.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

focuses or focusses ?

@Pratichhya Pratichhya marked this pull request as ready for review January 29, 2025 10:04
@Pratichhya
Copy link
Contributor Author

Finally ready, I guess

Copy link
Contributor

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more notes :)


:param collection: openEO collection parameter
:param label: String representing the text with which the collection should match
:param callable: Function that is executed when the collection matches the label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit confusing: the value you pass to callable (also called "function" in other places of this script) is not a callable/function but a data cube

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this one, as the developer seems to mention it as a function everywhere. I will confirm and update it accordingly :)

Copy link
Contributor

@HansVRP HansVRP Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    for option in [
        {
            'label': 'grd',
            'function': _load_s1_grd_bands(connection=connection, polygon=polygon, date=date, bands=['VV', 'VH'])
        },

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a matter of syntax, here they pass a callable function which creates a cube.

For me it would be fine to keep things as is since @Pratichhya is not the original developer.
Do you agree @soxofaan?

Copy link
Contributor

@soxofaan soxofaan Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. _load_s1_grd_bands(...) is a function call , right, not a callable?
And it's results in a data cube, so you have a dictionary

{
    "function": <DataCube Object>
}

which looks weird to me.

But as mentioned elsewhere, don't let all my comments be blockers. I'm fine with accepting things as is, and clean up things later.

@HansVRP
Copy link
Contributor

HansVRP commented Feb 6, 2025

@Pratichhya for me it is good to merge, but first rebase to the local main and check that the unit tests are passing

@Pratichhya
Copy link
Contributor Author

@Pratichhya for me it is good to merge, but first rebase to the local main and check that the unit tests are passing

Rebased and the failure is not with the changes in this branch 😄

@Pratichhya
Copy link
Contributor Author

rebased again and no failure
@soxofaan @HansVRP
Can I go ahead and merge this PR?

@HansVRP HansVRP self-requested a review February 14, 2025 13:01
@soxofaan
Copy link
Contributor

I'm afraid there is some more work to be done as with #88 the whole folder structure has been changed, so you would have to move some files around in this PR

@HansVRP
Copy link
Contributor

HansVRP commented Feb 17, 2025

indeed, however, it should be simpler now since we have a folder per service!

@Pratichhya
Copy link
Contributor Author

Pratichhya commented Feb 17, 2025

This looks like a long chain of changes, thus to address the current change of structure, I migrated the changes to this fresh PR: #95 and closing this PR

@Pratichhya Pratichhya closed this Feb 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants