
urls with spaces unescaped #58

Open
ghost opened this issue Jul 7, 2016 · 1 comment


ghost commented Jul 7, 2016

With a badly configured redirect, it's possible to arrive at a URL with unescaped spaces in the name, e.g.:

uk,nhs,wales)/sites3/docopen.cfm?637545f2-1143-e756-5c8403609089cb40&id=18400&orgid=268 20090729144455 http://www.wales.nhs.uk/sites3/docopen.cfm?orgid=268&ID=18400&637545F2-1143-E756-5C8403609089CB40 text/html 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ C:\Documents and Settings\All Users\Documents\My Pictures\Sample Pictures\CLHG building.JPG - 349 21033214 EA-TNA0709.www.nhs.uk-20090729135204-01475.arc.gz

Clearly this has gone completely wrong and the underlying record is unusable, but the fault in this record also prevents parsing of the CDX file, because the unescaped spaces split the redirect URL into several space-delimited fields. In this situation it might be better for the CDX generation code to check for and escape spaces in the redirect URL, while emitting a warning that the record is broken.
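
To illustrate the kind of check I mean, here is a minimal sketch in Python. It is not the project's actual code: the function name `escape_cdx_field` and the logger name are hypothetical, and it assumes the CDX writer can pass each field through a helper before joining the fields with spaces.

```python
import logging
import re
from urllib.parse import quote

# Hypothetical logger name, for illustration only.
log = logging.getLogger("cdx")

_WHITESPACE = re.compile(r"\s")


def escape_cdx_field(value: str) -> str:
    """Percent-encode whitespace so the value stays one space-delimited field.

    Values without whitespace are returned unchanged. Otherwise a warning is
    logged, since a redirect target containing raw spaces usually means the
    underlying record is broken.
    """
    if not _WHITESPACE.search(value):
        return value
    log.warning("CDX field contains unescaped whitespace: %r", value)
    # Encode only the whitespace characters; leave everything else as-is.
    return _WHITESPACE.sub(lambda m: quote(m.group(), safe=""), value)


# Usage with the redirect target from the record above.
redirect = r"C:\Documents and Settings\All Users\Documents\My Pictures\Sample Pictures\CLHG building.JPG"
print(escape_cdx_field(redirect))
```

Escaping only the whitespace keeps the rest of the (already broken) value recognisable, so the line still parses while the warning flags the record for inspection.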


ghost commented Jul 7, 2016

Just realised that this is a particular case of issue #37.
