
Script to fetch OpenMeteo Data (NWP Forecast and Historical Data) #93

Open: wants to merge 5 commits into base: main


praj-tarun

Pull Request

Description

This pull request adds functionality to fetch both forecast and historical weather data using the OpenMeteo API. It introduces a new class WeatherDataFetcher to encapsulate the data fetching logic and provides methods to fetch forecast data for specific models and historical data from the OpenMeteo API.

  • Created a new class WeatherDataFetcher to handle weather data fetching.
  • Implemented methods to fetch forecast data (fetch_forecast_data) and historical data (fetch_historical_data).
  • Added a method to process hourly data (process_hourly_data) extracted from the API response.
  • Implemented a method to print location information (print_location_info) extracted from the API response.
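A minimal sketch of how the methods listed above could fit together. It assumes the client exposes `weather_api(url, params=...)` and that each response offers the `openmeteo_requests` accessors (`Hourly()`, `Time()`, `TimeEnd()`, `Interval()`, `Variables(i).ValuesAsNumpy()`); the client is injected so the class can be exercised without network access. `location_info` is a hypothetical variant of `print_location_info` that returns a dict instead of printing. The two URLs are Open-Meteo's public forecast and archive endpoints; the rest is illustrative, not the PR's actual code.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


class WeatherDataFetcher:
    """Sketch only; `client` is assumed to behave like openmeteo_requests.Client."""

    def __init__(self, client: Any) -> None:
        self.client = client

    def fetch_forecast_data(self, params: dict[str, Any]) -> list[Any]:
        # The API returns one response object per requested location
        return list(self.client.weather_api(FORECAST_URL, params=params))

    def fetch_historical_data(self, params: dict[str, Any]) -> list[Any]:
        return list(self.client.weather_api(ARCHIVE_URL, params=params))

    @staticmethod
    def process_hourly_data(response: Any, variables: list[str]) -> dict[str, list]:
        hourly = response.Hourly()
        # Timestamps are Unix seconds from Time() up to (excluding) TimeEnd()
        times = [
            datetime.fromtimestamp(t, tz=timezone.utc)
            for t in range(hourly.Time(), hourly.TimeEnd(), hourly.Interval())
        ]
        data: dict[str, list] = {"time_utc": times}
        # Variables(i) are assumed to come back in the order they were requested
        for i, name in enumerate(variables):
            data[name] = list(hourly.Variables(i).ValuesAsNumpy())
        return data

    @staticmethod
    def location_info(response: Any) -> dict[str, float]:
        # Hypothetical replacement for print_location_info: return, don't print
        return {
            "latitude": response.Latitude(),
            "longitude": response.Longitude(),
            "elevation": response.Elevation(),
            "utc_offset_seconds": response.UtcOffsetSeconds(),
        }
```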

Fixes #90

jacobbieker (Member) left a comment

Hi! Thank you for this work! Overall this is a good start, and thank you for spending the time on this.

For this repo, we ideally want to return a grid of data from OpenMeteo to train/run inference on, not individual points. I know OpenMeteo's data is geared more toward individual points, but we would ideally extract values for a grid of latitude/longitude pairs that covers the globe and return the data as an Xarray object. We want it to work similarly to getting data from WeatherBench 2 (see #86), where the data is on a regular grid across the whole world and is returned as an Xarray object. OpenMeteo includes models that are not global, so the output does not need to be a global lat/lon grid, but it should still be a grid rather than a single point at a time, if that makes sense?
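A minimal sketch of turning a regular lat/lon grid into Open-Meteo requests, assuming the API's comma-separated multi-location syntax (e.g. `latitude=52.52,48.85`). The helper names are hypothetical, and `chunk_size` is an arbitrary knob: a global grid gets large quickly (at 0.25° it is 721 × 1440 = 1,038,240 points), so requests have to be split, and the actual per-request limits should be checked against the API documentation.

```python
from __future__ import annotations

from typing import Iterator


def global_grid(resolution_deg: float) -> tuple[list[float], list[float]]:
    """Regular lat/lon axes: latitude -90..90 inclusive, longitude -180..180 exclusive."""
    n_lat = int(round(180 / resolution_deg)) + 1
    n_lon = int(round(360 / resolution_deg))
    lats = [-90 + i * resolution_deg for i in range(n_lat)]
    lons = [-180 + j * resolution_deg for j in range(n_lon)]
    return lats, lons


def grid_points(lats: list[float], lons: list[float]) -> list[tuple[float, float]]:
    # Row-major order (latitude outer, longitude inner) so the results can be
    # reshaped back to (n_lat, n_lon) afterwards
    return [(lat, lon) for lat in lats for lon in lons]


def chunked_location_params(
    points: list[tuple[float, float]], chunk_size: int
) -> Iterator[dict[str, str]]:
    """Yield request params covering up to `chunk_size` points each."""
    for start in range(0, len(points), chunk_size):
        chunk = points[start : start + chunk_size]
        yield {
            "latitude": ",".join(f"{lat:g}" for lat, _ in chunk),
            "longitude": ",".join(f"{lon:g}" for _, lon in chunk),
        }
```

For a regional model, the same helpers apply with a bounded latitude/longitude range instead of the full globe.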

I would also recommend moving this under the graph_weather/data folder, so it's with the rest of the repo and easier to test. Although it's not necessary for this PR, ideally there would be a single interface for getting weather data from different sources (WeatherBench 2, OpenMeteo, potentially others) that outputs the data in a uniform format, and having this code in that folder helps with that.
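One possible shape for that single interface, sketched under the assumption that each source exposes the same `fetch` method and is looked up by name. `WeatherDataSource`, `register_source`, and `get_source` are hypothetical names, not existing graph_weather APIs; in practice `fetch` would return an xarray Dataset on a lat/lon grid.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Protocol, Sequence, Type, runtime_checkable


@runtime_checkable
class WeatherDataSource(Protocol):
    """Hypothetical common interface for WeatherBench 2, OpenMeteo, etc."""

    def fetch(self, variables: Sequence[str], start: str, end: str) -> Any:
        ...


_SOURCES: Dict[str, Type] = {}


def register_source(name: str) -> Callable[[Type], Type]:
    """Class decorator that makes a source available under `name`."""

    def decorator(cls: Type) -> Type:
        _SOURCES[name] = cls
        return cls

    return decorator


def get_source(name: str, **kwargs: Any) -> WeatherDataSource:
    """Instantiate a registered source, e.g. get_source("openmeteo")."""
    try:
        source_cls = _SOURCES[name]
    except KeyError:
        raise ValueError(f"unknown source {name!r}; known: {sorted(_SOURCES)}") from None
    return source_cls(**kwargs)
```

Each data module would then decorate its class, e.g. `@register_source("openmeteo")`, and callers would not need to know which backend they are using.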

Again, thanks for all the work! Obviously happy to answer more questions and work with you on this PR to get it merged and used!

from retry_requests import retry


class WeatherDataFetcher:
Member

Could this be changed to the following, so that the name better describes the data it fetches?

Suggested change
- class WeatherDataFetcher:
+ class OpenMeteoWeatherDataFetcher:

retry_session = retry(cache_session, retries=5, backoff_factor=0.2)
self.openmeteo = openmeteo_requests.Client(session=retry_session)

def fetch_forecast_data(self, NWP, params):
Member

Suggested change
- def fetch_forecast_data(self, NWP, params):
+ def fetch_forecast_data(self, nwp, params):

We don't want to hard-code the NWP that we are using. Ideally, we also want type hints for the inputs and outputs.
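A sketch of the suggested signature with type hints, assuming the NWP name is forwarded as Open-Meteo's `models` query parameter and that the client behaves like `openmeteo_requests.Client`. Model names such as `"gfs_seamless"` or `"icon_seamless"` are examples to verify against the Open-Meteo documentation.

```python
from __future__ import annotations

from typing import Any, Mapping

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


class OpenMeteoWeatherDataFetcher:
    def __init__(self, client: Any) -> None:
        # `client` is assumed to expose weather_api(url, params=...)
        self.client = client

    def fetch_forecast_data(self, nwp: str, params: Mapping[str, Any]) -> list[Any]:
        """Fetch forecast responses from the NWP model chosen by the caller.

        `nwp` is passed through as the `models` query parameter instead of
        being hard-coded inside the method.
        """
        # Build a new dict so the caller's params are not mutated
        request_params = {**params, "models": nwp}
        return list(self.client.weather_api(FORECAST_URL, params=request_params))
```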

Author

Understood, I'll add type hints as per your suggestion.

hourly = response.Hourly()

# Extract variables
hourly_variables = {
Member

Ideally, the variables that are extracted are not hardcoded, but can be passed in as arguments.

Author

Sure, will do. Could you provide some guidance on which variables should be included in the Xarray Dataset?

Member

Sorry, this reply slipped through. I would default to all available variables and make one of the arguments a list of parameter names. I think there should be a way to get all the available parameters for a model from the API or something?
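A sketch of taking the variables as an argument instead of hard-coding them. `DEFAULT_HOURLY_VARIABLES` and `build_hourly_params` are hypothetical; the default tuple is a short illustrative list, not "all available" variables, since this sketch does not query the API for a model's parameter list. It also returns the ordered names, because the `openmeteo_requests` examples read `Variables(i)` back by index in the requested order.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence

# Hypothetical defaults for illustration only; a real "all variables" list
# would have to come from the Open-Meteo documentation for each model.
DEFAULT_HOURLY_VARIABLES: tuple[str, ...] = (
    "temperature_2m",
    "relative_humidity_2m",
    "surface_pressure",
    "wind_speed_10m",
)


def build_hourly_params(
    base_params: Mapping[str, Any],
    variables: Sequence[str] | None = None,
) -> tuple[dict[str, Any], list[str]]:
    """Return request params with an `hourly` entry, plus the ordered names.

    Index i of the returned list names the response's Variables(i).
    """
    chosen = list(variables) if variables is not None else list(DEFAULT_HOURLY_VARIABLES)
    if not chosen:
        raise ValueError("at least one hourly variable is required")
    return {**base_params, "hourly": ",".join(chosen)}, chosen
```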

hourly_data[variable_name] = variable_values

# Create a DataFrame from the dictionary
hourly_dataframe = pd.DataFrame(data=hourly_data)
Member

For this, we want the data returned as an Xarray Dataset with latitude, longitude, and time_utc coordinates, and each variable stored as a DataArray in the Dataset.
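A minimal sketch of the requested layout, assuming each variable has already been assembled into a NumPy array of shape (time, latitude, longitude). `to_dataset` is a hypothetical helper; the explicit shape check is there because mismatched array shapes are the usual cause of dimension errors when building a Dataset.

```python
from __future__ import annotations

from typing import Mapping, Sequence

import numpy as np
import xarray as xr


def to_dataset(
    times: Sequence,
    latitudes: Sequence[float],
    longitudes: Sequence[float],
    values: Mapping[str, np.ndarray],
) -> xr.Dataset:
    """Build a gridded Dataset; every array in `values` must be (time, lat, lon)."""
    expected = (len(times), len(latitudes), len(longitudes))
    data_vars = {}
    for name, array in values.items():
        array = np.asarray(array)
        if array.shape != expected:
            raise ValueError(f"{name}: shape {array.shape} does not match {expected}")
        data_vars[name] = (("time_utc", "latitude", "longitude"), array)
    return xr.Dataset(
        data_vars=data_vars,
        coords={"time_utc": times, "latitude": latitudes, "longitude": longitudes},
    )
```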

print(f"Timezone difference to GMT+0 {response.UtcOffsetSeconds()} s")


def main():
Member

This would be great in the tests folder as a pytest test, so we can run it automatically on all code changes.
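A sketch of a network-free pytest test, assuming the fetcher takes its client as a constructor argument so `unittest.mock.MagicMock` can stand in for the real Open-Meteo client. The inline class is a stand-in for illustration; in the repo it would be imported from graph_weather/data instead, and the test file name is hypothetical.

```python
# tests/test_openmeteo.py (hypothetical path)
from unittest.mock import MagicMock

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


class OpenMeteoWeatherDataFetcher:
    # Stand-in for the real class, which would be imported in the repo
    def __init__(self, client):
        self.client = client

    def fetch_forecast_data(self, nwp, params):
        return self.client.weather_api(FORECAST_URL, params={**params, "models": nwp})


def test_fetch_forecast_data_passes_model_and_params():
    client = MagicMock()
    client.weather_api.return_value = ["response"]
    fetcher = OpenMeteoWeatherDataFetcher(client)

    result = fetcher.fetch_forecast_data("gfs_seamless", {"latitude": 52.52, "longitude": 13.41})

    assert result == ["response"]
    # Inspect exactly what would have been sent to the API
    url = client.weather_api.call_args.args[0]
    params = client.weather_api.call_args.kwargs["params"]
    assert url == FORECAST_URL
    assert params["models"] == "gfs_seamless"
    assert params["latitude"] == 52.52
```

Mocking keeps the test fast and deterministic on CI; a separate test that hits the live API could be added later if the maintainers want one.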

Author

Hi @jacobbieker,
I'm running into dimension problems when creating an xarray Dataset from the OpenMeteo data. I can successfully fetch data for multiple coordinates, but the dimensions don't line up when I build the Dataset, even though the dimension lengths are the same.

I'm planning to add an NWP (Numerical Weather Prediction) argument so that a particular NWP model can be specified when calling the function. What do you think about this approach?

Member

Hmm, this is a bit hard to debug from this alone, but if you add the latitude and longitude coordinates to each data point, you might then be able to reshape the data into a grid?

As for adding an argument to specify the NWP, that is perfect! We want to be able to access all of OpenMeteo's NWPs through this, so that would be ideal.
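One way to act on the suggestion of tagging each data point with its coordinates, sketched with pandas: add `latitude`/`longitude` columns to each point's hourly DataFrame, concatenate, and let `DataFrame.to_xarray()` expand the (time_utc, latitude, longitude) MultiIndex into a grid, filling any missing combinations with NaN. `points_to_grid` is a hypothetical helper, and it assumes each per-point frame has a `time_utc` column. Open-Meteo's documentation notes that the coordinates in a response are those of the grid cell actually used, which may differ slightly from the requested ones, so tagging with the requested coordinates may keep the axes more regular.

```python
from __future__ import annotations

import pandas as pd
import xarray as xr  # DataFrame.to_xarray() requires xarray to be installed


def points_to_grid(frames: list[tuple[float, float, pd.DataFrame]]) -> xr.Dataset:
    """Combine per-point hourly frames into a (time_utc, latitude, longitude) grid.

    Each entry is (latitude, longitude, frame with a 'time_utc' column).
    """
    # Attach the point's coordinates as columns on every row
    tagged = [df.assign(latitude=lat, longitude=lon) for lat, lon, df in frames]
    combined = pd.concat(tagged, ignore_index=True)
    # The MultiIndex levels become the Dataset's dimensions and coordinates
    return combined.set_index(["time_utc", "latitude", "longitude"]).to_xarray()
```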

Successfully merging this pull request may close these issues:

Add support for using OpenMeteo Open Dataset for training/inference