Add default training pipeline #184

Draft
wants to merge 11 commits into
base: main
from


Contributor

@kvantricht kvantricht commented Oct 14, 2024

This PR adds the training pipelines for the default cropland model trained on global Presto embeddings for the WorldCereal reference database.

@kvantricht kvantricht requested a review from jdegerickx October 15, 2024 07:15
for class_nr in range(len(self.settings["classes"]))
]

model = CatBoostClassifier(
Contributor

Should these model settings be configurable?
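
As a minimal sketch of what "configurable" could mean here: the hyperparameters could be read from the settings dictionary and merged over a set of defaults before the classifier is constructed. The `model_params` key, the `build_model_params` helper, and the default values below are all hypothetical and are not taken from this PR; only the parameter names (`iterations`, `depth`, `learning_rate`, `random_seed`) are standard `CatBoostClassifier` arguments.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults for illustration only; NOT the values used in this PR.
DEFAULT_MODEL_PARAMS: Dict[str, Any] = {
    "iterations": 1000,
    "depth": 6,
    "learning_rate": 0.05,
    "random_seed": 42,
}


def build_model_params(
    settings: Dict[str, Any],
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge defaults, settings["model_params"] (assumed key), and overrides.

    Later sources win: defaults < settings < call-time overrides.
    """
    params = dict(DEFAULT_MODEL_PARAMS)  # copy so the defaults stay untouched
    params.update(settings.get("model_params", {}))
    params.update(overrides or {})
    return params


# Example: settings override the depth, the caller overrides the learning rate.
settings = {"classes": ["cropland", "other"], "model_params": {"depth": 8}}
params = build_model_params(settings, overrides={"learning_rate": 0.1})

# The trainer could then pass these on, e.g. CatBoostClassifier(**params).
```

Keeping the merge in a small pure function would let the current hard-coded values remain the defaults while making any deviation from them explicit in the settings.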

Contributor

All this filtering of reference datasets and rule-based filtering of samples does not feel generic.
It's also very much tuned to Phase I extractions.
We should probably get rid of this at some point?
Maybe even have a separate repository where we train the global models?

Contributor Author

@kvantricht kvantricht Oct 15, 2024

At some point, yes, maybe, but this is how the model is currently trained, so we need to be transparent. I agree that these methods need to evolve in the coming months.

Contributor

@jdegerickx jdegerickx left a comment

Main comment is whether we really want to commit all this Phase I-related cleaning of datasets to this repository?
I guess in the end all default models will be trained based on Phase II RDM samples and extractions.
So perhaps training of global models should (for now) be done in a separate repository, where we import functionality from worldcereal-classification?
Just a suggestion...

@kvantricht
Contributor Author

> main comment is whether we really want to commit all this Phase I related cleaning of datasets to this repository? I guess in the end all default models will be trained based on Phase II RDM samples and extractions. So perhaps training of global models should (for now) be done in a separate repository, where we import functionality from worldcereal-classification? Just a suggestion...

I don't feel like setting up yet another repository, especially not right now. My thought was to add what there is now for transparency, but I can also accept not merging it for the time being.

@kvantricht kvantricht marked this pull request as draft October 15, 2024 12:15

2 participants