TypeError: can't compare offset-naive and offset-aware datetimes #679

Closed
cysabi opened this issue May 14, 2020 · 10 comments · Fixed by SimplicityGuy/dateparser#2

Comments

@cysabi

cysabi commented May 14, 2020

This issue is not isolated to a single call: once it happens, it also affects later calls.

It seems to start occurring after you use search_dates on relative and absolute dates in the same string.

Just use search_dates on this string:

May 2020
June 2020
2023
January UTC
June 5 am utc
June 23th 5 pm EST
May 31, 8am UTC
@cysabi
Author

cysabi commented May 14, 2020

>>> from dateparser.search import search_dates
>>> search_dates("""May 2020
... June 2020
... 2023
... January UTC
... June 5 am utc
... June 23th 5 pm EST
... May 31, 8am UTC""")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/search/__init__.py", line 49, in search_dates
    result = _search_with_detection.search_dates(
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/conf.py", line 84, in wrapper
    return f(*args, **kwargs)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/search/search.py", line 229, in search_dates
    return {'Language': language_shortname, 'Dates': self.search.search_parse(language_shortname, text,
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/search/search.py", line 161, in search_parse
    parsed, substrings = self.parse_found_objects(parser=parser, to_parse=translated,
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/search/search.py", line 125, in parse_found_objects
    parsed_item = self.parse_item(parser, item, translated[i], parsed, need_relative_base)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/search/search.py", line 111, in parse_item
    parsed_item = parser.get_date_data(item)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/date.py", line 417, in get_date_data
    parsed_date = _DateLocaleParser.parse(
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/date.py", line 196, in parse
    return instance._parse()
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/date.py", line 200, in _parse
    date_obj = self._parsers[parser_name]()
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/date.py", line 222, in _try_parser
    date_obj, period = date_parser.parse(
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/conf.py", line 84, in wrapper
    return f(*args, **kwargs)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/date_parser.py", line 26, in parse
    date_obj, period = parse(date_string, settings=settings)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/parser.py", line 72, in parse
    raise exceptions.pop(-1)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/parser.py", line 66, in parse
    res = parser(datestring, settings)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/parser.py", line 443, in parse
    dateobj = po._correct_for_time_frame(dateobj)
  File "/home/leptospira/lohi/.venv/lib/python3.8/site-packages/dateparser/parser.py", line 395, in _correct_for_time_frame
    if self.now < dateobj:
TypeError: can't compare offset-naive and offset-aware datetimes
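The last line of the traceback is standard Python behavior rather than anything specific to dateparser: ordering comparisons between a naive datetime (no tzinfo) and an aware one raise TypeError. A minimal standard-library reproduction, using arbitrary example dates:

```python
from datetime import datetime, timezone

# Naive: no tzinfo, like a reference "now" taken without a timezone.
naive_now = datetime(2020, 5, 14, 12, 0)
# Aware: carries a UTC offset, like a date parsed from "May 31, 8am UTC".
aware_date = datetime(2020, 5, 31, 8, 0, tzinfo=timezone.utc)

try:
    # Same shape as `if self.now < dateobj:` in the traceback.
    naive_now < aware_date
except TypeError as exc:
    print(exc)  # can't compare offset-naive and offset-aware datetimes

# Equality does not raise: a naive and an aware datetime just compare unequal.
print(naive_now == aware_date)  # False
```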

@noviluni
Collaborator

noviluni commented Jul 1, 2020

Related to: #491

@noviluni
Collaborator

noviluni commented Jul 1, 2020

I've been investigating this. It is a weird bug, because the following works:

from dateparser.search import search_dates

dates = ['May 2020', 'June 2020', '2023', 'January UTC', 'June 5 am utc', 'June 23th 5 pm EST', 'May 31, 8am UTC']

for date in dates:
    print(search_dates(date))

But when the dates are joined, it doesn't work:

print(search_dates('\n'.join(dates)))

This is caused by self.now in parser.py.

I found a workaround (#736), but the core problem seems to be self.now. It also causes some errors related to multithreading: #441

I will look into it when I have a moment.
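The observation that each date parses on its own while the joined string fails is consistent with a reference time being kept across the dates of one call. The sketch below models that failure mode with a hypothetical StatefulParser class; it is not dateparser's implementation, and the rule for when "now" is captured is an assumption made for illustration.

```python
from datetime import datetime, timezone


class StatefulParser:
    """Hypothetical model, not dateparser's code: a reference time ("now")
    is computed on first use and then kept on the instance."""

    def __init__(self):
        self.now = None

    def is_future(self, dateobj):
        if self.now is None:
            # Capture "now" with the same awareness as the first date seen
            # (datetime.now(None) returns a naive local time).
            self.now = datetime.now(dateobj.tzinfo)
        # Same shape of comparison as `if self.now < dateobj:` in parser.py.
        return self.now < dateobj


naive = datetime(2023, 1, 1)
aware = datetime(2020, 5, 31, 8, 0, tzinfo=timezone.utc)

# A fresh parser per date works, like calling search_dates on each line.
for d in (naive, aware):
    StatefulParser().is_future(d)

# One shared parser fails on the second date, like joining the lines.
shared = StatefulParser()
shared.is_future(naive)
try:
    shared.is_future(aware)
except TypeError as exc:
    print("shared parser:", exc)
```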

@jhawgs

jhawgs commented Oct 10, 2020

I am working on a project that utilizes this search functionality, and I get the same error. Do you know whether there is a fix right around the corner? Otherwise, I’m happy to work on one. I just don’t want to submit an unnecessary pull if there’s gonna be a fix soon.

@noviluni
Collaborator

Hi @jhawgs, sorry, I haven't had time to spend on this. Please go ahead!

I created this PR: #736. You can continue from it or submit a new/different PR. The only requirement is to add tests that check nothing is broken and that coverage stays the same after merging the PR.

Thanks!

@jhawgs

jhawgs commented Oct 15, 2020

I've been working on the bug, but I don't think I am familiar enough with the code to put together a comprehensive fix. I did, however, find that the error occurs in a function that only seems to be useful when the PREFER_DATES_FROM setting is set to something other than the default. Because I am not using that functionality, I am simply bypassing the function call by commenting out the following line in parser.py:

dateobj = po._correct_for_time_frame(dateobj)

This may or may not be an option for anybody else having the problem, and I certainly do not recommend it as a long-term fix.

Also, I found some weird behavior that I don't think is expected, but I might be wrong. When running LeptoFlare's example, the parser seems to carry state over from one date to the next. This portion of the example:

>>> search_dates("""May 2020 ... June 2020 ... 2023 ... January UTC""")

gets the following dates:

2020-05-14 00:00:00
2020-06-14 00:00:00
2023-01-14 00:00:00
2023-01-14 00:00:00

Specifically, it assumes that "January UTC" is in 2023 instead of reverting to the current year and taking it to mean January of either 2020 or 2021. This might be expected behavior when PREFER_DATES_FROM is left at its default, "current_period", but the code that should adjust the date for the other settings, "past" or "future", appears to allow only 2022 or 2023. This behavior comes from the same function call that I commented out, which suggests that anybody looking for a comprehensive fix for either issue should start at:
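To make that reasoning concrete, here is a hypothetical, simplified resolve_year helper that applies a PREFER_DATES_FROM-style correction to a date with no year. It is not dateparser's _correct_for_time_frame; it only shows why the result depends on the reference date. The example assumes the reference was carried over from the previous match (2023-01-14), which is what the output above suggests.

```python
from datetime import datetime


def resolve_year(month, day, now, prefer="current_period"):
    """Hypothetical, simplified PREFER_DATES_FROM-style correction for a date
    without a year (not dateparser's code). Ignores February 29 for brevity."""
    candidate = datetime(now.year, month, day)
    if prefer == "past" and candidate > now:
        return candidate.replace(year=now.year - 1)
    if prefer == "future" and candidate < now:
        return candidate.replace(year=now.year + 1)
    return candidate  # "current_period": keep the reference year


real_now = datetime(2020, 10, 14)    # an actual "today" (arbitrary example)
leaked_base = datetime(2023, 1, 14)  # assumed leftover reference from "2023"

for prefer in ("current_period", "past", "future"):
    print(prefer,
          resolve_year(1, 14, real_now, prefer).year,
          resolve_year(1, 14, leaked_base, prefer).year)
```

With the real date, "January" lands in 2020 or 2021 depending on the preference; with the carried-over reference, every mode yields 2023.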

dateobj = po._correct_for_time_frame(dateobj) in parser.py.

@SimplicityGuy
Contributor

SimplicityGuy commented Feb 11, 2021

Here's another string that throws the same error: [02.10.21]

Granted, this is ambiguous as to whether it's February 10, 2021 or October 2, 2021; however, the same "can't compare offset-naive and offset-aware datetimes" error is thrown.
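For reference, the two readings of that string can be reproduced with the standard library. The log below shows search_dates picking the month-first reading (2021-02-10); dateparser's DATE_ORDER setting is the way to choose between them, although the ambiguity itself is not what raises the TypeError.

```python
from datetime import datetime

text = "02.10.21"
month_first = datetime.strptime(text, "%m.%d.%y")  # February 10, 2021
day_first = datetime.strptime(text, "%d.%m.%y")    # October 2, 2021
print(month_first.date(), day_first.date())  # 2021-02-10 2021-10-02
```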

EDIT:
This is so strange! If I try that string ([02.10.21]) on its own, search_dates works fine. However, if I try it after the error has occurred once, I get the same error. See log:

>>> import dateparser
>>> import dateparser.search
>>> dateparser.search.search_dates("[02.10.21]")
[('02.10.21', datetime.datetime(2021, 2, 10, 0, 0))]

>>> dateparser.search.search_dates("""May 2020
... June 2020
... 2023
... January UTC
... June 5 am utc
... June 23th 5 pm EST
... May 31, 8am UTC""")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/__init__.py", line 49, in search_dates
    result = _search_with_detection.search_dates(
  File "/usr/local/lib/python3.9/site-packages/dateparser/conf.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 221, in search_dates
    return {'Language': language_shortname, 'Dates': self.search.search_parse(language_shortname, text,
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 153, in search_parse
    parsed, substrings = self.parse_found_objects(parser=parser, to_parse=translated,
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 117, in parse_found_objects
    parsed_item, is_relative = self.parse_item(parser, item, translated[i], parsed, need_relative_base)
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 104, in parse_item
    parsed_item = parser.get_date_data(item)
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 421, in get_date_data
    parsed_date = _DateLocaleParser.parse(
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 178, in parse
    return instance._parse()
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 182, in _parse
    date_data = self._parsers[parser_name]()
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 201, in _try_absolute_parser
    return self._try_parser(parse_method=_parse_absolute)
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 212, in _try_parser
    date_obj, period = date_parser.parse(
  File "/usr/local/lib/python3.9/site-packages/dateparser/conf.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dateparser/date_parser.py", line 20, in parse
    date_obj, period = parse_method(date_string, settings=settings)
  File "/usr/local/lib/python3.9/site-packages/dateparser/parser.py", line 66, in _parse_absolute
    return _parser.parse(datestring, settings)
  File "/usr/local/lib/python3.9/site-packages/dateparser/parser.py", line 507, in parse
    dateobj = po._correct_for_time_frame(dateobj)
  File "/usr/local/lib/python3.9/site-packages/dateparser/parser.py", line 452, in _correct_for_time_frame
    if self.now < dateobj:
TypeError: can't compare offset-naive and offset-aware datetimes

can't compare offset-naive and offset-aware datetimes

>>> dateparser.search.search_dates("[02.10.21]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/__init__.py", line 49, in search_dates
    result = _search_with_detection.search_dates(
  File "/usr/local/lib/python3.9/site-packages/dateparser/conf.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 221, in search_dates
    return {'Language': language_shortname, 'Dates': self.search.search_parse(language_shortname, text,
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 153, in search_parse
    parsed, substrings = self.parse_found_objects(parser=parser, to_parse=translated,
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 117, in parse_found_objects
    parsed_item, is_relative = self.parse_item(parser, item, translated[i], parsed, need_relative_base)
  File "/usr/local/lib/python3.9/site-packages/dateparser/search/search.py", line 95, in parse_item
    pre_parsed_item = parser.get_date_data(item)
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 421, in get_date_data
    parsed_date = _DateLocaleParser.parse(
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 178, in parse
    return instance._parse()
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 182, in _parse
    date_data = self._parsers[parser_name]()
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 201, in _try_absolute_parser
    return self._try_parser(parse_method=_parse_absolute)
  File "/usr/local/lib/python3.9/site-packages/dateparser/date.py", line 212, in _try_parser
    date_obj, period = date_parser.parse(
  File "/usr/local/lib/python3.9/site-packages/dateparser/conf.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dateparser/date_parser.py", line 20, in parse
    date_obj, period = parse_method(date_string, settings=settings)
  File "/usr/local/lib/python3.9/site-packages/dateparser/parser.py", line 66, in _parse_absolute
    return _parser.parse(datestring, settings)
  File "/usr/local/lib/python3.9/site-packages/dateparser/parser.py", line 507, in parse
    dateobj = po._correct_for_time_frame(dateobj)
  File "/usr/local/lib/python3.9/site-packages/dateparser/parser.py", line 467, in _correct_for_time_frame
    if self.now < dateobj:
TypeError: can't compare offset-naive and offset-aware datetimes

can't compare offset-naive and offset-aware datetimes
>>>

@SimplicityGuy
Contributor

The issue here is not one of multithreading or of self.now as such. The issue is that self.now gets set to a timezone-unaware (i.e. naive) time. I'm investigating a fix here.
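Since the root cause is a naive self.now meeting an aware parsed date, one way to "handle offset awareness" is to bring both values to the same kind before comparing them. The sketch below uses a hypothetical align_awareness helper; it is not the code merged in PR #881, and treating a naive time as wall-clock time in the other value's timezone is an assumption a real fix would need to justify.

```python
from datetime import datetime, timezone


def is_aware(d):
    # Per the datetime docs: aware means tzinfo is set and utcoffset() is not None.
    return d.tzinfo is not None and d.utcoffset() is not None


def align_awareness(now, dateobj):
    """Hypothetical helper, not the code from PR #881: return `now`
    adjusted so it can be ordered against `dateobj`."""
    if is_aware(dateobj) and not is_aware(now):
        # Assumption: the naive reference is wall-clock time in dateobj's timezone.
        return now.replace(tzinfo=dateobj.tzinfo)
    if not is_aware(dateobj) and is_aware(now):
        # Assumption: drop the offset and keep the wall-clock time.
        return now.replace(tzinfo=None)
    return now


now = datetime(2020, 5, 14, 12, 0)                          # naive, like self.now
dateobj = datetime(2020, 5, 31, 8, 0, tzinfo=timezone.utc)  # aware parsed date
print(align_awareness(now, dateobj) < dateobj)  # True
```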

@SimplicityGuy
Contributor

Fixed in PR #881. Please take a look.

noviluni pushed a commit that referenced this issue Feb 18, 2021
…ess issue (#881)

* Reducing cyclomatic complexity of complex code (#1)

* fix: scoping where the language is retrieved
* chore: reduce cyclomatic complexity and indentation on complicated code

* fix: addresses #679 by handling offset awareness.

* test: adding test cases for dates with time and timezone.

* test: additional tests to cover the cases where RELATIVE_BASE has timezone info.

* fix: changing a condition that can't currently be hit to an assert so that future maintainers can address this if the condition becomes possible.

* fix: fix the message for the assert to provide better guidance.
@noviluni
Collaborator

PR #881 has been merged, so we can close this issue now :)

thomasbird pushed a commit to LeapBeyond/scrubadub that referenced this issue Mar 8, 2021
5 participants