
Improved performance for MinHash and MinHashLSH

@ekzhu released this 15 Dec 20:57
  • Performance improvement for MinHash's update method.
  • Bulk updates to a MinHash are 4.5X faster with the update_batch method, which adds many values in a single call. [See API doc](http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
  • Further performance gains from generating MinHashes in bulk with MinHash.bulk or MinHash.generator. See API doc and pull request.
  • Optional compression for the MinHash LSH index: the bucket keys produced by MinHashLSH._H can be hashed, which reduces the memory/storage space used by the index. See pull request.
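A minimal pure-Python sketch of why one batch update can replace many single-item updates. TinyMinHash and its methods are hypothetical stand-ins, not datasketch's classes; the sketch assumes the common ((a * x + b) mod p) permutation scheme over a 32-bit SHA-1 base hash. Because every signature slot is a running minimum, hashing all items first and then taking one minimum per permutation gives exactly the same signature as calling update item by item. The 4.5X figure above comes from the library's own implementation (for example, amortizing per-call overhead across the batch); this sketch only demonstrates that the two paths agree.

```python
import hashlib
import random

MERSENNE_PRIME = (1 << 61) - 1
MAX_HASH = (1 << 32) - 1


class TinyMinHash:
    """Hypothetical, simplified MinHash used only to illustrate batch updates."""

    def __init__(self, num_perm=64, seed=1):
        rng = random.Random(seed)
        # One (a, b) pair per permutation: h_i(x) = ((a * x + b) mod p) truncated to 32 bits
        self.perms = [
            (rng.randint(1, MERSENNE_PRIME - 1), rng.randint(0, MERSENNE_PRIME - 1))
            for _ in range(num_perm)
        ]
        self.hashvalues = [MAX_HASH] * num_perm

    @staticmethod
    def _base_hash(data: bytes) -> int:
        # 32-bit base hash taken from the first 4 bytes of a SHA-1 digest
        return int.from_bytes(hashlib.sha1(data).digest()[:4], "little")

    def _permute(self, hv, a, b):
        return ((a * hv + b) % MERSENNE_PRIME) & MAX_HASH

    def update(self, data: bytes):
        # Single item: one base hash, then one min per permutation
        hv = self._base_hash(data)
        self.hashvalues = [
            min(cur, self._permute(hv, a, b))
            for cur, (a, b) in zip(self.hashvalues, self.perms)
        ]

    def update_batch(self, items):
        # Many items: hash them all first, then take one min per permutation
        hvs = [self._base_hash(d) for d in items]
        if not hvs:
            return
        self.hashvalues = [
            min(cur, min(self._permute(hv, a, b) for hv in hvs))
            for cur, (a, b) in zip(self.hashvalues, self.perms)
        ]

    def jaccard(self, other):
        # Fraction of matching slots estimates Jaccard similarity (same seed assumed)
        same = sum(x == y for x, y in zip(self.hashvalues, other.hashvalues))
        return same / len(self.hashvalues)


words = [w.encode("utf8") for w in "the quick brown fox jumps over the lazy dog".split()]
one_by_one = TinyMinHash()
for w in words:
    one_by_one.update(w)
batched = TinyMinHash()
batched.update_batch(words)
print(one_by_one.hashvalues == batched.hashvalues)
```

Both objects share the same seed, so they use identical permutations and their signatures can be compared slot by slot.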
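The idea behind compressing the LSH index can be sketched with a toy banded index. TinyLSH, sha1_4bytes and the hashfunc parameter below are hypothetical illustration names, not datasketch's internals, and signatures are assumed to be lists of 64-bit integers. Without compression, each bucket key is the raw packed band (8 × rows bytes); passing a hash function replaces it with a short digest. The trade-off is that two different bands can occasionally hash to the same key and produce extra candidates; actual memory savings depend on the storage backend.

```python
import hashlib
import struct
from collections import defaultdict


def sha1_4bytes(data: bytes) -> bytes:
    """Example key-compression function: keep the first 4 bytes of a SHA-1 digest."""
    return hashlib.sha1(data).digest()[:4]


class TinyLSH:
    """Hypothetical banded LSH index over integer MinHash signatures (illustration only)."""

    def __init__(self, num_bands, rows, hashfunc=None):
        self.num_bands = num_bands
        self.rows = rows
        self.hashfunc = hashfunc  # None -> use the raw band bytes as the bucket key
        self.tables = [defaultdict(set) for _ in range(num_bands)]

    def _key(self, band):
        # Raw key: each 64-bit value packed big-endian, i.e. 8 * rows bytes
        raw = struct.pack(f">{len(band)}Q", *band)
        # Optional compression: replace the raw bytes with a short digest
        return raw if self.hashfunc is None else self.hashfunc(raw)

    def _bands(self, sig):
        for i in range(self.num_bands):
            yield i, sig[i * self.rows:(i + 1) * self.rows]

    def insert(self, label, sig):
        for i, band in self._bands(sig):
            self.tables[i][self._key(band)].add(label)

    def query(self, sig):
        # A candidate is anything that shares at least one full band with sig
        found = set()
        for i, band in self._bands(sig):
            found |= self.tables[i].get(self._key(band), set())
        return found


sig_a = list(range(8))
sig_b = [0, 1, 2, 3, 100, 101, 102, 103]  # shares only the first band with sig_a
for name, hf in (("raw", None), ("hashed", sha1_4bytes)):
    lsh = TinyLSH(num_bands=2, rows=4, hashfunc=hf)
    lsh.insert("a", sig_a)
    print(name, len(lsh._key(sig_a[:4])), lsh.query(sig_b))
```

With 4 rows per band, the raw key is 32 bytes while the hashed key is 4 bytes, and both indexes still return "a" as a candidate for sig_b.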

Thank you @Sinusoidal36!