Pass datetime format explicitly when parsing #34

mdavis-xyz · 2024-04-29T18:51:39Z

When I get any data, I get warnings from pandas about implicit parsing of datetimes. This makes me think I've done something wrong. But after checking, it seems I haven't.

Calling pd.to_datetime without passing an explicit datetime format string is risky. e.g. maybe it works for one person, but someone else has a different locale or timezone or whatever. (e.g. I'm running code in France, with a French locale.) So pandas may try to parse with YYYY/DD/MM or something crazy, and the user might not notice. (I'm thinking of cases like static tables, or small files with only 1 row.)

AEMO data comes with two timestamp formats. Most are "%Y/%m/%d %H:%M:%S". Some are `"%Y/%m/%d %H:%M:%S.%f" (new 5MS bidding stuff).

I added a test for this. I confirmed that the test fails with the old code, and passes with the new code. But I have not run all tests, to confirm that I didn't break anything (because of #16).
I manually confirmed that the datetime columns are still datetimes.

I also tidied up the exception handling. except Exception will catch things like a KeyboardInterrupt. So you try to halt your code (e.g. click the stop icon in Jupyter) but it doesn't stop. Catching precisely the type of exception that will be thrown is better.

There also were some other unit tests that failed for me. I modified the assertions so the failure error is clearer. (They still fail, because of a hard-coded path in defaults.py that doesn't exist on my machine. Perhaps we should find a way to generalise that?)

nick-gorman · 2024-04-30T03:25:50Z

@mdavis-xyz thanks for the pull request!

I'd just been looking at this issue while working on #33

I actually think we should pull out the datetime parsing functionality into a separate module and then import as a function into data_fetch_methods and filters.py, replacing the use of pd.to_datetime in filters.py.

On top of your addition of the "%Y/%m/%d %H:%M:%S.%f" case we also need to deal with the "%Y-%m-%d %H:%M:%S" which is now poping up in some of the 4sec FCAS data.

Does that make sense, and do you have bandwidth to make these changes? would be nice to include them with the changes for #33

The hardcoded path in defaults and can just be changed to any directory your happy to store the test in on your computer.

mdavis-xyz · 2024-05-05T19:31:28Z

I've refactored this into a separate module. But I haven't tested it yet. (I'm currently on a train with bad wifi. I can push commits, but not download big data.)

I note that __init__.py imports data_fetch_methods.py. But that imports with from . import ..., which is effectively importing __init__.py. So we have circular imports. Python can handle that. But when I added my new file, the circular imports got more complicated and broke the library. I think removing one line in __init__.py fixed that. No user should be accessing nemosis.data_fetch_methods directly, right?

nick-gorman · 2024-05-06T07:01:03Z

Fixed a couple of errors in the datetime parser and fixed up some tests that were broken because they depend on data availbility that changes with time.

Let me know what you think.

mdavis-xyz · 2024-05-07T11:46:31Z

Yes that looks good, thanks

Pass datetime format explicitly when parsing

18da58b

refactor date parsing into separate module

e6aaeda

nick-gorman added 2 commits May 6, 2024 16:58

fix infer column data types and tests

7330d2d

fix tests

6e4799c

Merge branch 'master' into pandas_warning

43b8631

nick-gorman merged commit 9ed5346 into UNSW-CEEM:master May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pass datetime format explicitly when parsing #34

Pass datetime format explicitly when parsing #34

mdavis-xyz commented Apr 29, 2024 •

edited

Loading

nick-gorman commented Apr 30, 2024

mdavis-xyz commented May 5, 2024

nick-gorman commented May 6, 2024

mdavis-xyz commented May 7, 2024

Pass datetime format explicitly when parsing #34

Pass datetime format explicitly when parsing #34

Conversation

mdavis-xyz commented Apr 29, 2024 • edited Loading

nick-gorman commented Apr 30, 2024

mdavis-xyz commented May 5, 2024

nick-gorman commented May 6, 2024

mdavis-xyz commented May 7, 2024

mdavis-xyz commented Apr 29, 2024 •

edited

Loading