Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve URL boundary regarding commas (Old name: Wrong URL Pattern being Parsed) #261

Open
rahulv017 opened this issue Aug 18, 2022 · 4 comments · May be fixed by #278
Open

Improve URL boundary regarding commas (Old name: Wrong URL Pattern being Parsed) #261

rahulv017 opened this issue Aug 18, 2022 · 4 comments · May be fixed by #278
Labels
bug Something isn't working priority: 1 (high) time est: 10 hours We estimate this issue will take ≈10 hours to complete

Comments

@rahulv017
Copy link

Input : 'www.safenet.com/abc,False,False'
Output: urls:['www.safenet.com/abc,False,False']
Expected Result : urls: ['www.safenet.com/abc']

@fhightower
Copy link
Owner

Thanks for reporting! I'll get this fixed. I think this will be resolved as part of #254.

@fhightower fhightower added bug Something isn't working time est: 1 hour We estimate this issue will take ≈1 hour to complete priority: 1 (high) labels Aug 18, 2022
@fhightower
Copy link
Owner

fhightower commented Aug 19, 2022

Thanks again for raising this issue. Just want to post an update on this work:

I'll have to give this more thought because this technically (per the RFC) is not a bug. Commas are allowed to be part of a URL path, so parsing www.safenet.com/abc,False,False as an entire URL is correct.

That being said, this library should absolutely handle this case better b/c commas are not frequently used in URLs and are frequently used as delimiters in plain text. At the moment, I am inclined to do something similar to what I've done for email addresses where we have complete_email_addresses to capture email addresses per the spec and simple_email_addresses to capture email addresses as most people expect. I will likely take the same approach with URLs, but need to give this some thought.

In the mean time, if you have a CSV file with a column containing indicators, it's probably best to only send data from that column into this library (if possible) or to replace commas w/ spaces.

@fhightower fhightower changed the title Wrong URL Pattern being Parsed Improve URL boundary regarding commas (Old name: Wrong URL Pattern being Parsed) Aug 20, 2022
@fhightower
Copy link
Owner

I've renamed this issue to more generally reflect the root problem we need to solve.

@fhightower fhightower added time est: 10 hours We estimate this issue will take ≈10 hours to complete and removed time est: 1 hour We estimate this issue will take ≈1 hour to complete labels Dec 6, 2022
@fhightower fhightower pinned this issue Dec 12, 2022
@fhightower
Copy link
Owner

fhightower commented Dec 12, 2022

@rahulv017 - sorry for the delay on this, but I will focus on this ticket over the next week or so. I still need to give some thought to what the best approach is, but I think there is likely something we can do to improve this use-case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority: 1 (high) time est: 10 hours We estimate this issue will take ≈10 hours to complete
Projects
None yet
2 participants