-
-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Addressable gem to handle URIs #174
Conversation
@@ -3,6 +3,7 @@ source "https://rubygems.org" | |||
gemspec | |||
|
|||
gem "json", :platforms => :mri_18 | |||
gem "addressable", '~> 2.3' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add this to the gemspec instead, so that the dependency is picked up when users gem install json-schema
.
Confirmed that this fixes #100 (or, at least, I can run all tests from a dir I'm not sure if the For escaping URIs, addressable happily parses them with spaces, but require 'addressable/uri'
require 'open-uri'
unescaped = Addressable::URI.parse('http://foo.com/spaces in/the path#/not/fragment')
escaped = Addressable::URI.parse(Addressable::URI.escape('http://foo.com/spaces in/the path#/not/fragment'))
open(unescaped) #=> URI::InvalidURIError
open(escaped) #=> OpenURI::HTTPError: 404 Not Found |
For the For the chomp thing I'll check and remove if its superfluous now. |
Loading local filesIn the last commit I tried to forbid using local file system references for security reasons. As @pd already said, this would break backwards compatibility (a lot of the reference tests actually rely on loading files from the disk). I'm not sure how to interpret section 5.27. id from draft-03 in the context of a root schema.
mean for e.g. this schema, what is the id/URI for the following schema: {
"$schema" => "http://json-schema.org/draft-03/schema#",
"type" => "object",
"properties" => {
"b" => {
"required" => true
}
}
} Currently, the best I can think of to not lose backwards compatibility, would be an option that allows loading relative references from disk. This option unfortunately needs to default to true then. |
I don't think this means we need to hold off on addressable until a 3.x release; it definitely solves problems today, means we could drop the URI::File code, etc. We just shouldn't break filesystem lookups (yet!). The issue with absolute paths already exists on master: If you force pushed this branch without 463f0e3, I think it's pretty much good to go. A few days back I left a comment in #152 with my ideas for how to deal with preventing arbitrary filesystem lookups, accessing remote URIs without meaning to, etc etc. I haven't had a chance to spike a version of that yet, but I think we can get it into 2.x, then swap to the more restrictive default in 3.x. |
@@ -7,8 +7,8 @@ def self.validate(current_schema, data, fragments, processor, validator, options | |||
if data.is_a?(String) | |||
error_message = "The property '#{build_fragment(fragments)}' must be a valid URI" | |||
begin | |||
URI.parse(URI.escape(data)) | |||
rescue URI::InvalidURIError | |||
Addressable::URI.parse(Addressable::URI.escape(data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you already explained this in the PR description (I didn't quite follow everything) but do we still need to escape the uri here now we're using addressable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the addressable readme, I suspect we need to parse urls from strings, and then normalise them (probably immediately before calling open on them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first comment: We "need" the escape call here, because only 'parse' won't raise the expected expception for line 278 in draft4-tests. I'd also suspect it to not be actually necessary.
Addressable::URI.parse(data)
has the following output
(byebug) Addressable::URI.parse(data)
Addressable::URI::InvalidURIError Exception: Cannot assemble URI string with ambiguous path: '::boo'
(byebug) Addressable::URI.parse(data).class
Addressable::URI
(byebug) Addressable::URI.parse(data).to_s
Addressable::URI::InvalidURIError Exception: Cannot assemble URI string with ambiguous path: '::boo'
nil
You see that there is an exception somewhere, at least the text, but its not thrown unless e.g. to_s
is called. I should probably file an issue on this as I'd expect an exception to be thrown by parse right away.
I'll go for to_s
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the second: It depends I guess. File uris are a problem in general so far. open does not like the file:// scheme nor escaped paths. I agree for the parts that are actual URIs but we don't now that before e.g. nested refs have been resolved against their parent scheme's URI.
So I wouldn't do much about this until we decided how to handle files.
uri = URI.parse("file://#{Dir.pwd}/#{data}") | ||
end | ||
uri = Addressable::URI.parse(data) | ||
# Check for absolute path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressable has a convert_path method that converts relative paths to file URIs :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't help us much, because open can't handle them. However, when we have a custom loader, I think its good when the scheme is properly set.
I wonder why there is no FileURI neither in the URI nor in Addressable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh what? That was half of my reasoning for using addressable. My bad. Then perhaps the URI::File class should be kept after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I side-stepped this in #175 by avoiding the open(foo).read
pattern for URIs with nil
or file
schemes. We shouldn't be using that on files anyhow, since it will leave dangling file descriptors open. eg:
$ pry
[1] pry(main)> $PROCESS_ID
=> 55886
[2] pry(main)> open('/etc/passwd').read
=> [lol snip]
# From another terminal, or after ctrl-z, whatever:
$ lsof -p 55886
ruby 55886 pd 8r REG 1,3 5253 515836 /private/etc/passwd
Definitely absurd that open(Addressable::URI.convert_path('/etc/passwd'))
fails yet open(Addressable::URI.parse('/etc/passwd'))
works fine, though =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also think we should keep URI::File.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What worries me, is that if I google other implementations of URI::File, I can't find any. In fact, only this gem comes up, which makes me wonder why nobody else is doing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a feature reques from a year ago and one comment mentions the URI::File solution.
I also wonder why this topic seems to have few/no attention, but I think URI::File is a better solution than writing a custom wrapper around open (i.e. its the better wrapper).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree; URI::File is the wrong direction to go in. We only use it so that we can keep using open-uri, but we shouldn't be using open-uri to open local files -- there are many more obvious means of doing that (eg, File.read, Pathname#read, etc etc).
We can assign schema URIs as file:///tmp/ohai.json
, but when we read that file, it should be using File.read(uri.path)
or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So going back to my original question (if we're not using file uris) should we use convert path here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should merge this; my Loader PR has additional changes to more aggressively detect files vs remote URIs, so I think it can be taken care of there.
@@ -0,0 +1,3 @@ | |||
{ | |||
"name" : "john" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No tests actually hit this file yet, unless I'm overlooking something.
I updated the PR to work along with #177 |
If we can simplify the code, that would be great. Maybe try removing any workarounds and then see if anyone can test this branch on windows?= |
I can test this on a windows box some time this weekend. |
Aforementioned windows box apparently runs ubuntu now so uhhh, scratch that plan. I have no further access to a windows machine to try this out on. =\ |
Any more concerns here? |
rescue URI::InvalidURIError | ||
error_message = "The property '#{build_fragment(fragments)}' must be a valid URI" | ||
# TODO | ||
# Addressable only throws an exception on to_s for invalid URI strings, although it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to fix this todo before merging? If not could the comment link to a github issue in addressable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any upstream reports of this. I'm not sure exactly what to report if I were to open an issue. It's partly an artifact of our test data:
Addressable::URI.parse('http:')
#=> raises Addressable::URI::InvalidURIError
Addressable::URI.parse('::boo')
#=> returns a URI
Addressable::URI.parse('::boo').to_s
#=> raises Addressable::URI::InvalidURIError: Cannot assemble URI string with ambiguous path: '::boo'
addressable has a lot of tests that assert it can raise on .parse()
alone, and indeed in most cases it does. I can't look up a URI RFC atm, but is ::boo
somehow valid? Is it ambiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using addressable 2.3.6 I get the following results:
a = Addressable::URI.parse('http:')
# raises Addressable::URI::InvalidURIError: Absolute URI missing hierarchical segment: 'http:'
# a == nil
b = Addressable::URI.parse('::boo')
# raises nothing but returns Addressable::URI::InvalidURIError
# hence, b holds a Addressable::URI::InvalidURIError
I'd expect the second statement to also raise an exception but in no case to return one.
I'll go open an issue in addressable (sporkmonger/addressable#177).
So for now I think we should leave this as is.
I'd like to see this merged, once all concerns from the comments are sorted out. I think that if we can 1. Get rid of some hacky fixes we have for unusual urls (because of limitations in URI), and 2. Get better url parsing, then this is the way to go |
@RST-J -- you'll need to rebase to deal with merge conflicts. Just got ruby+git+etc booted on a windows VM and the test suite is green, including from dirs with spaces in the path. Really wish we had some way to run tests on windows that wasn't so insane =) |
To ensure that invalid uris of the form "::http" throw exceptions rather than failing silently. This aught to make uri parsing more robust See sporkmonger/addressable#177 and voxpupuli#174
To ensure that invalid uris of the form "::http" throw exceptions rather than failing silently. This aught to make uri parsing more robust See sporkmonger/addressable#177 and voxpupuli#174
This would cover #173
It adds support for Asian URLs.
When I changed the validation for the URI format to Addressable, I thought that the value should not be escaped (it was introduced to establish compatibility with Asian URIs by eca92af). The value either is parsable as URI as is or not and in the latter case no valid URI.
What do you think?