Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing #894 and correcting tests #901

Closed
wants to merge 28 commits into from
Closed

Fixing #894 and correcting tests #901

wants to merge 28 commits into from

Conversation

gavishpoddar
Copy link
Contributor

@gavishpoddar gavishpoddar commented Mar 30, 2021

Hi, @Gallaecio ,

I have fixed #894 and submitted PR #900 but it failed because the tests didn't supported am so tests failed.

In this PR the tests and #894 is fixed.

It seems removing this line resolves the issue #894 : search_dates() wrong result for 12:xx am
The test files were corrected to recognize am previously the am was ignored because of issue #894

After merging the test file  issue #894 is resolved
@codecov
Copy link

codecov bot commented Mar 30, 2021

Codecov Report

Merging #901 (ff1e856) into master (803d445) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #901   +/-   ##
=======================================
  Coverage   98.26%   98.26%           
=======================================
  Files         231      231           
  Lines        2597     2599    +2     
=======================================
+ Hits         2552     2554    +2     
  Misses         45       45           
Impacted Files Coverage Δ
dateparser/search/search.py 99.35% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 803d445...ff1e856. Read the comment docs.

Copy link
Collaborator

@noviluni noviluni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some doubts I have. I'm not also sure if we can just modify the parse_item and the parse_found_objects signature directly...

dateparser/search/search.py Outdated Show resolved Hide resolved
dateparser/search/search.py Outdated Show resolved Hide resolved
dateparser/search/search.py Outdated Show resolved Hide resolved
dateparser/search/search.py Outdated Show resolved Hide resolved
@gavishpoddar
Copy link
Contributor Author

Put together

The German language date parsing needs some correction but this PR fixes AM PM for other languages including English.

tests/test_search.py Outdated Show resolved Hide resolved
@gavishpoddar
Copy link
Contributor Author

Hi, @noviluni I have made the changes, and thanks for the review

Comment on lines 93 to 94
if language == "de":
item = item.replace('am', '')
Copy link
Collaborator

@noviluni noviluni Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if language == "de":
item = item.replace('am', '')
if language == "de":
# Replace "am" because "am 8" means "on the 8th" but "8 am" actually
# means "8 am"
item = item.replace('am', '')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm... Maybe we can use regex (re.sub) to remove it only when preceding a number? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @noviluni, I have made some can you please check and recommend

tests/test_search.py Outdated Show resolved Hide resolved
Comment on lines +470 to +473
('June 5 am utc', datetime.datetime(2023, 6, datetime.datetime.utcnow().day, 5, 0, tzinfo=pytz.utc)),
('June 23th 5 pm EST', datetime.datetime(2023, 6, 23, 17, 0, tzinfo=pytz.timezone("EST"))),
('May 31', datetime.datetime(2023, 5, 31, 0, 0)),
('8am UTC', datetime.datetime(2023, 8, 31, 0, 0, tzinfo=pytz.utc))]),
('8am UTC', datetime.datetime(2023, 5, 31, 8, 0, tzinfo=pytz.utc))]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the parsing of 8am UTC is a change for the better, but I’m not sure about June 5 am utc. The previous interpretation seems OK to me, “June 5th in the morning UTC”. In fact, I think it may even be better, because I suspect leaving a unit missing in the middle (i.e. having month and hour, but missing day) may be less likely than the other interpretation (month and day, am meaning sometime in the morning).

I checked and that test string was added as part of a performance improvement, #881, and I imagine it does not come from an actual case found in the internet, which makes it hard to argue either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right it “June 5th in the morning UTC”. is a better interpretation. I am fixing that but I think it's beyond the scope of #894.

So should I open a new issue or maybe directly create a new pull request or fix it in the current PR but that's beyond not directly related to issue #894.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess you are right that fixing it is out of the scope of this change and should be a separate issue. It was just bad luck that your change broke it, or rather it was just dumb luck that it was working in the first place.

However, what I would do here is move that scenario into a separate test, with the previous value, and mark the test as an expected failure (xfail), to make it clear that the expected outcome is still the old one, only that Dateparser does not support it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great after this PR is merged I will create an issue and create a supporting PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t forget the xfail part, though, I do think that needs to be addressed in this pull request, instead of changing the test expectations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on it now and by the time you create separate test, with the previous value, and mark the test as an expected failure (xfail) it should be done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the time you create separate test

I was hoping for you to do that, as it needs to be done as part of this pull request (which is the one that causes the test to fail).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok on it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it is broken throughout search_dates and parse function as with other cases like

June 5th pm EST ---> [('June 5th pm EST', datetime.datetime(2021, 6, 24, 17, 0, tzinfo=<StaticTzInfo 'EST'>))]

or

June 5th pm EST ---> ('June 9 pm', datetime.datetime(2021, 6, 24, 21, 0))]

Which is incorrect. I am working on the patch but it will require some time and significant change in other parts of the library and many tests will be broken because they are also incorrect. Or the current interpretation is correct.

@gavishpoddar
Copy link
Contributor Author

Thanks, for your support and suggestions these improvements are now made on PR #945.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

search_dates() wrong result for 12:xx am
3 participants