-
-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed uuidtools #344
base: master
Are you sure you want to change the base?
Removed uuidtools #344
Conversation
638b6fb
to
2abbc19
Compare
2abbc19
to
109ecb1
Compare
@RST-J Do you have any thoughts about this? It will allow us to remove another dependency (although this one is included inside json-schema itself) |
@@ -54,6 +55,11 @@ def self.file_uri(uri) | |||
Addressable::URI.convert_path(parsed_uri.path) | |||
end | |||
|
|||
def self.fake_uri(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to name this hash_uri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it should rather have hash in the name than fake. But actually I would not call it hash_uri
because the method itself does not hash URIs but is used to hash a schema. So maybe hash_schema
would be a better name. But now it actually does not care about whether the input is a schema or not, it takes a string and returns a URI created from the hash of that string. So maybe uri_from_sha1
oder uri_from_sha1_hash
or something like that would be more precise. But also more verbose. I'm personally getting more and more a fan of being as precise as possible with naming methods as it eases reasoning about code (we likely have some worse candidates here). What do you think?
Removing the dependency is a good idea I think, we didn't use anything else from that and your replacement fits the purpose.
As I dont't know why and there is no reason given, I'm not sure whether this performance improvement is worthwhile or not. What do you know and think about that? |
I'm actually starting to have doubts about this now. Perhaps we should be using String.hash, which would be an order of magnitude faster. So far as I'm aware String.hash should realistically not have collisions (although I'm struggling to find good evidence of that) Also, it might be better to generate not just a hex string but a uri with a fake scheme that is immediately identifiable as an internally calculated value (something like "ref://af562bbc" or "internal://539daf54dd"?) |
Incidentally, @mjc did suggest using String.hash in a previous pull request (any credit for the idea should go to him!) |
I've now changed my mind entirely - I think we should simply use String.hash.to_s(16) (with an appropriate scheme to make it look like a URI). String.hash in MRI ruby uses siphash-2-4, which should be good enough for our needs and much faster than SHA1 |
I think, using a custom scheme is a good idea. It prevents accidental collisions, no matter how improbable they are. |
Right now, we cache schemas, and the key we use to cache them is either the URL we got them from, or if there's no URL, we generate a UUID from the json text of the schema. We're basically using a v5 UUID as a hashing function. What's more, json-schema includes it's own UUID library for doing this.
I've swapped that out, for generating a SHA1 hash instead. It should be fast, although in my own benchmarks, the speed is comparable to the current implementation (real world speed may depend on the size of the schemas being cached). At the very least, I believe it's clearer to use SHA1 (rather than UUID) and it also means we don't need an external ruby file copied into the project.