Accept optional date in process_parquet for more precise subsetting #95

Closed
kvantricht opened this issue Aug 21, 2024 · 2 comments
@kvantricht
Contributor

Currently we take the end_date of the training data and go back one year to subset the training time series:

# df['start_date'] = seasons.get_season_start(df[['lat','lon']])
# For now, in absence of a relevant start_date, we get time difference with respect
# to end_date so we can take 12 months counted back from end_date
df["valid_date_ind"] = (
    (((df["timestamp"] - df["end_date"]).dt.days + 365) / 30).round().astype(int)
)
# Now reassign start_date to the actual subset counted back from end_date
df["start_date"] = df["end_date"] - pd.DateOffset(years=1) + pd.DateOffset(days=1)
df_pivot = df[(df["valid_date_ind"] >= 0) & (df["valid_date_ind"] < 12)].pivot(
    index=index_columns, columns="valid_date_ind", values=feature_columns
)

However, if we train a dedicated CatBoost model on a subset of data for a small AOI, we may benefit from subsetting based on the start_date and end_date requested by the user (which have to span exactly one year). Can we adapt the method to accept an optional argument, e.g. end_date, which, if given, dictates the subsetting of the time series?

We then need to be resilient to training samples from different years, and also drop samples whose time series does not fall entirely within the requested time frame (shifted to the year of the sample).
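
A rough sketch of what this could look like, not the actual process_parquet implementation. It assumes monthly composites timestamped at the start of each month, a requested window that ends at a month end, and a hypothetical `sample_id` column identifying each sample. The function name `subset_one_year` is also made up for illustration. The requested end month is moved into each sample's own year (or the year before, if it would otherwise fall after the sample's end_date), and samples that do not cover all 12 months of that window are dropped.

```python
from typing import Optional

import pandas as pd


def subset_one_year(df: pd.DataFrame, end_date: Optional[str] = None) -> pd.DataFrame:
    """Keep rows inside a 12-month window and index them 0..11.

    Assumes columns `sample_id` (hypothetical), `timestamp` and `end_date`.
    """
    df = df.copy()

    if end_date is None:
        # Default: count back from each sample's own end_date, as is done today.
        window_end = df["end_date"]
    else:
        requested = pd.Timestamp(end_date)

        def month_end(year: pd.Series) -> pd.Series:
            # Last day of the requested month in the given year(s).
            parts = pd.DataFrame({"year": year, "month": requested.month, "day": 1})
            return pd.to_datetime(parts) + pd.offsets.MonthEnd(0)

        # Move the requested end month into each sample's own year, and fall
        # back to the previous year if that is later than the sample's end_date.
        same_year = month_end(df["end_date"].dt.year)
        prev_year = month_end(df["end_date"].dt.year - 1)
        window_end = same_year.where(same_year <= df["end_date"], prev_year)

    # Same month index as in the current code, but relative to window_end.
    days = (df["timestamp"] - window_end).dt.days
    df["valid_date_ind"] = ((days + 365) / 30).round().astype(int)
    df["start_date"] = window_end - pd.DateOffset(years=1) + pd.DateOffset(days=1)
    df = df[(df["valid_date_ind"] >= 0) & (df["valid_date_ind"] < 12)]

    if end_date is not None:
        # Drop samples that do not cover all 12 months of the requested window.
        n_months = df.groupby("sample_id")["valid_date_ind"].transform("nunique")
        df = df[n_months == 12]
    return df
```

The result could then go through the same `.pivot(...)` call as before. With `end_date=None` nothing is dropped, so the existing behaviour is unchanged.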

@kvantricht
Contributor Author

@cbutsko I think we can close this? It's tackled on the worldcereal-classification side?

@cbutsko

cbutsko commented Nov 22, 2024

This will be tackled in this issue

@cbutsko cbutsko closed this as completed Nov 22, 2024