Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass datetime format explicitly when parsing #34

Merged
merged 5 commits into from
May 16, 2024

Conversation

mdavis-xyz
Copy link
Contributor

@mdavis-xyz mdavis-xyz commented Apr 29, 2024

When I get any data, I get warnings from pandas about implicit parsing of datetimes. This makes me think I've done something wrong. But after checking, it seems I haven't.

Calling pd.to_datetime without passing an explicit datetime format string is risky. e.g. maybe it works for one person, but someone else has a different locale or timezone or whatever. (e.g. I'm running code in France, with a French locale.) So pandas may try to parse with YYYY/DD/MM or something crazy, and the user might not notice. (I'm thinking of cases like static tables, or small files with only 1 row.)

AEMO data comes with two timestamp formats. Most are "%Y/%m/%d %H:%M:%S". Some are `"%Y/%m/%d %H:%M:%S.%f" (new 5MS bidding stuff).

I added a test for this. I confirmed that the test fails with the old code, and passes with the new code. But I have not run all tests, to confirm that I didn't break anything (because of #16).
I manually confirmed that the datetime columns are still datetimes.

I also tidied up the exception handling. except Exception will catch things like a KeyboardInterrupt. So you try to halt your code (e.g. click the stop icon in Jupyter) but it doesn't stop. Catching precisely the type of exception that will be thrown is better.

There also were some other unit tests that failed for me. I modified the assertions so the failure error is clearer. (They still fail, because of a hard-coded path in defaults.py that doesn't exist on my machine. Perhaps we should find a way to generalise that?)

@nick-gorman
Copy link
Member

@mdavis-xyz thanks for the pull request!

I'd just been looking at this issue while working on #33

I actually think we should pull out the datetime parsing functionality into a separate module and then import as a function into data_fetch_methods and filters.py, replacing the use of pd.to_datetime in filters.py.

On top of your addition of the "%Y/%m/%d %H:%M:%S.%f" case we also need to deal with the "%Y-%m-%d %H:%M:%S" which is now poping up in some of the 4sec FCAS data.

Does that make sense, and do you have bandwidth to make these changes? would be nice to include them with the changes for #33

The hardcoded path in defaults and can just be changed to any directory your happy to store the test in on your computer.

@mdavis-xyz
Copy link
Contributor Author

I've refactored this into a separate module. But I haven't tested it yet. (I'm currently on a train with bad wifi. I can push commits, but not download big data.)

I note that __init__.py imports data_fetch_methods.py. But that imports with from . import ..., which is effectively importing __init__.py. So we have circular imports. Python can handle that. But when I added my new file, the circular imports got more complicated and broke the library. I think removing one line in __init__.py fixed that. No user should be accessing nemosis.data_fetch_methods directly, right?

@nick-gorman
Copy link
Member

Fixed a couple of errors in the datetime parser and fixed up some tests that were broken because they depend on data availbility that changes with time.

Let me know what you think.

@mdavis-xyz
Copy link
Contributor Author

Yes that looks good, thanks

@nick-gorman nick-gorman merged commit 9ed5346 into UNSW-CEEM:master May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants