
Releases: ekzhu/datasketch

Fix bug in storage

21 Jul 07:19
Pre-release

Fix a bug with UnorderedStorage.get_many (#56)

Fix bug in LSH Forest for Weighted MinHash

21 Nov 18:35
  • Fix issue #35
  • Test cases for checking consistency of hash value length in LSH.

Optional redis storage requirement.

08 Nov 15:43
Pre-release

Redis storage layer for MinHash LSH

08 Sep 21:25
Pre-release
  • Introduced a Redis storage layer for MinHash LSH. Thanks to @ae-foster
  • Added __hash__ method for Lean MinHash.

LSH Ensemble

31 Mar 17:06
  • Added a slightly simplified version of LSH Ensemble that supports containment search with MinHash data sketches.
  • Added an introduction to containment.
  • Updated the documentation.
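
To make the difference between containment and Jaccard similarity concrete, here is a minimal sketch using exact Python sets. It does not use datasketch or MinHash at all; the helper names `jaccard` and `containment` are hypothetical and exist only for illustration. It shows why containment is the more useful measure when a small query is compared against a much larger set.

```python
def jaccard(a, b):
    """Exact Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def containment(query, candidate):
    """Exact containment of the query in the candidate: |Q ∩ X| / |Q|."""
    q, x = set(query), set(candidate)
    if not q:
        return 0.0
    return len(q & x) / len(q)


query = {"a", "b", "c"}
large = set("abcdefghijklmnopqrst")  # 20 distinct letters, includes all of query

print(jaccard(query, large))      # 3 / 20 = 0.15
print(containment(query, large))  # 1.0
```

The query is fully contained in the larger set, yet its Jaccard similarity is low because the union is dominated by the larger set. A Jaccard threshold would miss this pair, while a containment threshold would find it.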

Consistent MinHash hash values across Python versions

26 Mar 17:32

MinHash now uses Numpy's random number generator instead of Python's built-in random. This makes MinHash generate consistent hash values across different Python versions.

A side effect is that MinHash objects created before version 1.1.3 will not work correctly with those created afterwards: jaccard, merge, and union across the two will give wrong results.
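
The incompatibility follows from how MinHash works: two signatures are comparable only if they were built with the same permutation parameters. The toy sketch below is not datasketch's implementation. It uses Python's `random.Random` with two different seeds as a stand-in for two different random number generators, and the names `toy_minhash` and `estimate_jaccard` are hypothetical.

```python
import hashlib
import random

_PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus for the toy permutations


def _base_hash(token):
    # Stable 32-bit hash of a string, independent of Python's per-process hash seed.
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:4], "little")


def toy_minhash(tokens, seed, num_perm=64):
    # The permutation parameters (a, b) are fully determined by the generator and seed.
    rng = random.Random(seed)
    params = [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_perm)]
    hashes = [_base_hash(t) for t in tokens]
    return [min((a * h + b) % _PRIME for h in hashes) for a, b in params]


def estimate_jaccard(sig1, sig2):
    # Fraction of signature slots on which the two signatures agree.
    matches = sum(1 for x, y in zip(sig1, sig2) if x == y)
    return matches / len(sig1)


words = ["minhash", "lsh", "forest", "ensemble", "sketch"]

# Same parameters on both sides: identical sets give an estimate of exactly 1.0.
same = estimate_jaccard(toy_minhash(words, seed=1), toy_minhash(words, seed=1))

# Different parameters: the estimate for the very same set is no longer meaningful.
mixed = estimate_jaccard(toy_minhash(words, seed=1), toy_minhash(words, seed=2))

print(same)
print(mixed)
```

Changing the generator changes the parameters in the same way that changing the seed does here, which is why signatures from before and after the switch cannot be mixed.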

Introduce Lean MinHash and better documentation

15 Mar 05:18
  • LeanMinHash is a subclass of MinHash. It uses less memory and allows faster (de)serialization. See documentation for details.
  • Removed serialize, deserialize, and bytesize methods from MinHash. These are supported in LeanMinHash instead.
  • MinHash objects serialized before this version will not deserialize properly. To migrate, see here.
  • The documentation now has its own website!
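
As a rough illustration of why a lean representation can be smaller and faster to (de)serialize, the sketch below packs only a seed and the hash values into a flat byte buffer with the standard `struct` module. The byte layout (8-byte seed, 4-byte count, 4 bytes per hash value) and the function names are assumptions made for this example; they are not taken from the LeanMinHash source, so consult the documentation for the actual format and API.

```python
import struct

_HEADER = "<qi"  # assumed layout: 8-byte signed seed, then 4-byte signed count


def serialize(seed, hashvalues):
    # Pack the header followed by one unsigned 32-bit integer per hash value.
    fmt = _HEADER + "%dI" % len(hashvalues)
    return struct.pack(fmt, seed, len(hashvalues), *hashvalues)


def deserialize(buf):
    # Read the header first to learn how many hash values follow it.
    seed, count = struct.unpack_from(_HEADER, buf, 0)
    values = struct.unpack_from("<%dI" % count, buf, struct.calcsize(_HEADER))
    return seed, list(values)


buf = serialize(1, [17, 42, 99])
print(len(buf))          # 8 + 4 + 3 * 4 = 24
print(deserialize(buf))  # (1, [17, 42, 99])
```

Because nothing but the seed and the hash values is stored, the buffer size grows linearly with the number of hash values and reading it back is a single unpack.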

First stable release

12 Feb 19:41

After nearly two years of on-and-off work on this project, the API is now stable and the features of the MinHash-related sketches are complete.

I will continue to add more data sketches and indexes.

MinHash LSH Forest

07 Jan 09:03
Pre-release
  • MinHash LSH Forest implementation and benchmark using synthetic data
  • Improve existing MinHash LSH benchmark using synthetic data for more tunable data distributions
  • Improve MinHash and LSH performance

Windows compatibility

10 Jul 01:30
Pre-release
  • Fixed Issue #4 - int overflow error on Windows platform
  • Use Python's built-in random number generator for better MinHash accuracy