Can't download October 2022 Predispatch forecast #56

Open
notuntoward opened this issue Nov 15, 2023 · 5 comments

Comments

@notuntoward

notuntoward commented Nov 15, 2023

I'm trying to download the predispatch price forecast data over the time range:

cache_start_time = "2021/01/01 00:00:00"
cache_end_time = "2023/10/31 00:00:00"

This range should be available, according to the output of:

nemseer.get_data_daterange()

However, when I run the download (see the attached Jupyter notebook), I get the following error message:

HTTPError: 404 Client Error: Not Found for url: http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2022/MMSDM_2022_10/MMSDM_Historical_Data_SQLLoader/PREDISP_ALL_DATA/PUBLIC_DVD_PREDISPATCHPRICE_202210010000.zip

It's expecting a file with basename:

PUBLIC_DVD_PREDISPATCHPRICE_202210010000.zip

but the basename that actually exists in that directory is:

PUBLIC_DVD_PREDISPATCHCONSTRAINT1_202210010000.zip

Do you have any suggestions?

mkFeats_ERROR_REPRO_is_really_ipynb.txt

@prakaa
Collaborator

prakaa commented Nov 16, 2023

Hi @notuntoward

nemseer.get_data_daterange() just checks if a particular month has data - i.e. it scrapes something like http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2022/ and then sees that all months (01-12) are available. It doesn't check whether particular data files are available.
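For a one-off check outside NEMSEER, the difference between "the month exists" and "the file exists" can be sketched as below. This is only an illustration: `predisp_all_url` and `url_exists` are hypothetical helpers, not part of NEMSEER, and the URL pattern is simply copied from the 404 error above, so it may not hold for other months or tables. It also assumes the server answers HEAD requests.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Base path taken from the URL in the 404 error message.
BASE = "http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM"


def predisp_all_url(table: str, year: int, month: int) -> str:
    """Build a monthly PREDISP_ALL_DATA URL, mirroring the pattern in the error.

    Hypothetical helper: the pattern is an assumption based on one URL.
    """
    stamp = f"{year}{month:02d}"
    return (
        f"{BASE}/{year}/MMSDM_{year}_{month:02d}/"
        "MMSDM_Historical_Data_SQLLoader/PREDISP_ALL_DATA/"
        f"PUBLIC_DVD_{table}_{stamp}010000.zip"
    )


def url_exists(url: str, timeout: float = 30.0) -> bool:
    """Return True if a HEAD request succeeds, False on an HTTP error (e.g. 404).

    Network-level failures (URLError) are deliberately not caught here.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except HTTPError:
        return False
```

Calling `url_exists(predisp_all_url("PREDISPATCHPRICE", 2022, 10))` would then report whether that specific file is present, which is the check that `get_data_daterange()` does not perform.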

As you say, it looks like Oct 2022 only has PREDISPATCHCONSTRAINT1 available and not PREDISPATCHPRICE. This is an issue at AEMO's end as they haven't made the data available for that month.

There are 3 things you could do:

  1. Skip data for October 2022
  2. Contact AEMO and ask for this data. I have done this and they have provided it (though sometimes this can take a few weeks/months)
  3. Get PREDISPATCHPRICE_D for October 2022, though this only contains the last forecast for each interval
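Option 1 can be done by splitting the requested range around the missing month and downloading each piece separately. A minimal sketch, assuming the gaps fall on calendar-month boundaries; `month_starts` and `ranges_skipping` are hypothetical helpers, and how the resulting ranges map onto NEMSEER's arguments is left to the caller:

```python
from datetime import datetime

# Date format used for cache_start_time / cache_end_time in this issue.
FMT = "%Y/%m/%d %H:%M:%S"


def month_starts(start: datetime, end: datetime):
    """Yield the first day of every calendar month touched by [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield datetime(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def ranges_skipping(start: datetime, end: datetime, skip: set) -> list:
    """Split [start, end] into (range_start, range_end) pairs that avoid `skip`.

    `skip` is a set of (year, month) tuples for months known to be missing.
    Each range ends at the first instant of the skipped month that follows it.
    """
    ranges = []
    current_start = None
    for first_of_month in month_starts(start, end):
        if (first_of_month.year, first_of_month.month) in skip:
            if current_start is not None:
                ranges.append((current_start, first_of_month))
                current_start = None
        elif current_start is None:
            # Clip the first range to the requested start time.
            current_start = max(first_of_month, start)
    if current_start is not None:
        ranges.append((current_start, end))
    return ranges


start = datetime.strptime("2021/01/01 00:00:00", FMT)
end = datetime.strptime("2023/10/31 00:00:00", FMT)
for range_start, range_end in ranges_skipping(start, end, {(2022, 10)}):
    print(range_start.strftime(FMT), "to", range_end.strftime(FMT))
```

For the range in this issue, that yields one range ending at the start of October 2022 and a second starting in November 2022.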

@notuntoward
Author

Thanks very much for the quick answer. Guess I'll skip October for now and email AEMO and get it eventually.

@notuntoward
Author

Coming back to this problem of missing data, does NEMSEER have a way of telling which dates to avoid downloading? I'm hitting dates with bad data again, and it would be better if I knew all of the bad ones ahead of time, so that I could programmatically patch around them, avoiding the crashes.

@prakaa
Collaborator

prakaa commented Feb 15, 2024

Hi @notuntoward,

I didn't build that functionality into NEMSEER because I didn't encounter that issue early on.

That being said, it's something I would be open to implementing.

What I did implement was a stub file (.invalid_aemo_files.txt) that was written whenever a bad zipfile was encountered. The downloader checks the stub file and doesn't download that file again if it's found to be corrupt:

def _invalid_zip_to_file(invalid_files: Path, filename: str) -> None:
    """Ensure that any invalid file is noted in the `invalid_files` text file"""
    with open(invalid_files, "a+") as f:
        f.seek(0)
        existing = [line.strip() for line in f.readlines()]
        if filename in existing:
            pass
        else:
            f.write(f"{filename}\n")
    return None

An ideal in-built solution would probably use a try...except clause for the download (probably somewhere here, or in the Downloader class) and also write to a stub file with a name like .missing_aemo_files.txt. Then the Downloader would check both stubs before downloading files, and issue a warning if there were missing or invalid files. I wouldn't try to build in a "file verifier" (i.e. a scraper that checks files at a URL) - this is easy to do by extending existing code, but it would add latency to downloading files via NEMSEER.
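A rough sketch of that idea, written as standalone code rather than against NEMSEER's actual Downloader: everything here (`MissingFileError`, `download_with_stubs`, the `fetch` callable) is hypothetical, and a real `fetch` would need to raise `MissingFileError` when the server returns a 404.

```python
import warnings
from pathlib import Path
from typing import Callable, Iterable, List, Set


class MissingFileError(Exception):
    """Hypothetical error raised by `fetch` when a file is absent (e.g. HTTP 404)."""


def _read_stub(stub: Path) -> Set[str]:
    """Return the filenames recorded in a stub file (empty set if it doesn't exist)."""
    if not stub.exists():
        return set()
    return {line.strip() for line in stub.read_text().splitlines() if line.strip()}


def _append_to_stub(stub: Path, filename: str) -> None:
    """Record `filename` in the stub file unless it is already listed."""
    if filename not in _read_stub(stub):
        with open(stub, "a") as f:
            f.write(f"{filename}\n")


def download_with_stubs(
    urls: Iterable[str],
    fetch: Callable[[str], None],
    missing_stub: Path,
    invalid_stub: Path,
) -> List[str]:
    """Fetch each URL, skipping and recording files known to be missing or invalid."""
    known_bad = _read_stub(missing_stub) | _read_stub(invalid_stub)
    downloaded = []
    for url in urls:
        filename = url.rsplit("/", 1)[-1]
        if filename in known_bad:
            warnings.warn(f"Skipping {filename}: listed as missing or invalid")
            continue
        try:
            fetch(url)
        except MissingFileError:
            # Remember the gap so later runs skip it instead of failing again.
            _append_to_stub(missing_stub, filename)
            known_bad.add(filename)
            warnings.warn(f"{filename} is not available; noted in {missing_stub.name}")
            continue
        downloaded.append(filename)
    return downloaded
```

Because the stubs are only consulted and updated when a download is attempted, this adds no extra requests on the happy path, unlike a file verifier that scrapes each directory first.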

I'm quite busy finishing up writing my thesis at the moment. If you have some time @notuntoward I'd be happy to review a pull request if you're able to pull one together.

Abi

@notuntoward
Author

Thanks for the suggestions. I know how thesis writing goes...


2 participants