
Improve URL boundary regarding quotation marks and parentheses #130

Closed
fhightower opened this issue May 20, 2021 · 4 comments · Fixed by #254
Labels: bug (Something isn't working), priority: 1 (high), time est: 1 hour (We estimate this issue will take ≈1 hour to complete)

Comments

@fhightower
Owner

The following tests were failing and I'd like to get them passing:

from ioc_finder import find_iocs

s = "DownloadString('https://example[.]com/rdp.ps1');g $I"
iocs = find_iocs(s)
assert iocs['urls'] == ['https://example.com/rdp.ps1']

s = 'DownloadString("https://example[.]com/rdp.ps1");g $I'
iocs = find_iocs(s)
assert iocs['urls'] == ['https://example.com/rdp.ps1']
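One way to read these tests is that the quotation mark closing a string literal should also close the URL. The following is a minimal sketch of that rule using a plain regular expression; it is not how ioc-finder's grammar is implemented, the names refang, find_urls and URL_CANDIDATE are hypothetical, and it assumes "[.]" is the only defanging that needs to be undone.

```python
import re

# Hypothetical illustration only; this is NOT ioc-finder's grammar.
# A URL candidate runs from the scheme up to whitespace or a quotation mark,
# so the quote that closes a string literal also ends the URL.
URL_CANDIDATE = re.compile(r"https?://[^\s'\"]+")


def refang(text: str) -> str:
    """Undo the "[.]" defanging (assumed to be the only kind present)."""
    return text.replace("[.]", ".")


def find_urls(text: str) -> list[str]:
    """Return every URL candidate found in the refanged text."""
    return URL_CANDIDATE.findall(refang(text))


print(find_urls("DownloadString('https://example[.]com/rdp.ps1');g $I"))
print(find_urls('DownloadString("https://example[.]com/rdp.ps1");g $I'))
```

Both calls return ['https://example.com/rdp.ps1'], matching the expected values in the tests.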
@fhightower fhightower added this to To do in Grammar Improvements via automation May 20, 2021
@fhightower fhightower added the bug Something isn't working label May 20, 2021
@fhightower fhightower self-assigned this Feb 11, 2022
@fhightower
Owner Author

fhightower commented Apr 14, 2022

Another example:

find_iocs('Foo https://citizenlab.ca/about/), bar')

This finds https://citizenlab.ca/about/), as a URL, when https://citizenlab.ca/about/ is expected.
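A common heuristic for this case, similar in spirit to the extended-autolink rules in the GitHub Flavored Markdown spec, is to strip trailing punctuation and to drop a closing parenthesis only when it has no matching opening parenthesis inside the URL, so Wikipedia-style URLs keep their balanced parentheses. This is a hypothetical sketch of that heuristic (trim_url, find_urls, URL_CANDIDATE and TRAILING_PUNCTUATION are made-up names), not ioc-finder's code.

```python
import re

# Hypothetical sketch; not ioc-finder's actual logic.
URL_CANDIDATE = re.compile(r"https?://[^\s'\"]+")
TRAILING_PUNCTUATION = ".,;:!?"


def trim_url(candidate: str) -> str:
    """Strip trailing punctuation and unbalanced closing parentheses."""
    while candidate:
        last = candidate[-1]
        if last in TRAILING_PUNCTUATION:
            candidate = candidate[:-1]
        elif last == ")" and candidate.count(")") > candidate.count("("):
            # A ")" with no matching "(" most likely closes surrounding prose.
            candidate = candidate[:-1]
        else:
            break
    return candidate


def find_urls(text: str) -> list[str]:
    """Find URL candidates and trim each one at its likely boundary."""
    return [trim_url(match) for match in URL_CANDIDATE.findall(text)]


print(find_urls("Foo https://citizenlab.ca/about/), bar"))
print(find_urls("See https://en.wikipedia.org/wiki/Python_(programming_language)."))
```

The first call returns ['https://citizenlab.ca/about/'] and the second keeps the balanced parentheses, returning ['https://en.wikipedia.org/wiki/Python_(programming_language)']. The cost of this heuristic is that a URL that genuinely ends in a period or comma loses that character; the sketch accepts that trade-off.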

@fhightower fhightower mentioned this issue Apr 19, 2022
@FANGOD

FANGOD commented May 19, 2022

I have a similar problem.

Given this line from a file:
url,https://groups.google.com/g/vfc9gs,Malicious Google Groups discussion
I got:
urls=["https://groups.google.com/g/vfc9gs,Malicious"]
when I expected https://groups.google.com/g/vfc9gs.
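As a caller-side workaround, when the input is known to be CSV, each row could be split into fields before looking for URLs, so the field separator can never end up inside a URL. This sketch assumes CSV input; urls_from_csv_line is a hypothetical helper, and the simple regex stands in for a real extractor (with ioc-finder installed, calling find_iocs on each field would be the analogous approach).

```python
import csv
import io
import re

# Hypothetical workaround, assuming the caller knows the input is CSV.
# The regex is a stand-in for a real IOC extractor.
URL_CANDIDATE = re.compile(r"https?://\S+")


def urls_from_csv_line(line: str) -> list[str]:
    """Split a CSV row into fields first, then look for URLs in each field."""
    fields = next(csv.reader(io.StringIO(line)), [])
    urls = []
    for field in fields:
        urls.extend(URL_CANDIDATE.findall(field))
    return urls


row = "url,https://groups.google.com/g/vfc9gs,Malicious Google Groups discussion"
print(urls_from_csv_line(row))
```

Because csv.reader has already separated the three fields, the comma before "Malicious" is consumed as a delimiter and the call returns ['https://groups.google.com/g/vfc9gs'].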

@fhightower
Owner Author

Thanks for reporting this, @FANGOD. I hope to have this fixed soon.

@fhightower fhightower added the time est: 1 hour and priority: 1 (high) labels Jul 7, 2022
@fhightower fhightower changed the title Improve URL boundary Improve URL boundary regarding quotation marks and parentheses Aug 20, 2022
@fhightower
Owner Author

This issue is really tracking two problems, and I want to split them up: one is relatively easy to solve, and the other requires some thought.

I've renamed this issue to focus on problems with the URL boundary regarding single quotation marks and parentheses. For example, given Foo 'https://citizenlab.ca/about/'), bar, we expect to parse https://citizenlab.ca/about/ as the URL (not the originally parsed https://citizenlab.ca/about/')).

The other URL-boundary challenge, raised in the issue you reported, @FANGOD, concerns commas. This is tricky because, as noted here, commas are technically a valid part of a URL path. I'm going to track updates to the way commas are handled in #261.
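To see why commas are hard, compare the path characters RFC 3986 allows: a path segment is made of pchar, and pchar includes the sub-delims, which contain the comma. This sketch encodes the RFC's path-abempty rule as a regular expression (is_valid_path, PCHAR and PATH_ABEMPTY are hypothetical names) and shows that /g/vfc9gs,Malicious is a syntactically valid path. The same rule also permits ', ( and ), so trimming quotes and parentheses is a heuristic based on context rather than something URL syntax requires.

```python
import re

# Path grammar from RFC 3986 (path-abempty):
#   path-abempty = *( "/" segment )
#   segment      = *pchar
#   pchar        = unreserved / pct-encoded / sub-delims / ":" / "@"
# sub-delims include "," (and also "'", "(" and ")").
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def is_valid_path(path: str) -> bool:
    """Return True if the whole string matches RFC 3986 path-abempty."""
    return PATH_ABEMPTY.fullmatch(path) is not None


print(is_valid_path("/g/vfc9gs"))            # True
print(is_valid_path("/g/vfc9gs,Malicious"))  # True: the comma is allowed
print(is_valid_path("/about/')"))            # True: so are "'" and ")"
print(is_valid_path("/about/ bar"))          # False: a space is not
```

Since "/" is not a pchar, each segment boundary is unambiguous, so the nested quantifier does not cause excessive backtracking on non-matching input.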
