This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Releases: microsoft/hyperspace

Hyperspace v0.4.0

29 Jan 19:03
Pre-release

We are excited to announce the release of Hyperspace 0.4.0!

Notable new features / improvements:

  • Delta Lake support: Hyperspace v0.4.0 supports creating indexes on Delta Lake tables. Please refer to the user guide for more information.
  • Support for Databricks: #303 fixes a known issue that occurred when Hyperspace was run on Databricks. Hyperspace v0.4.0 can now run on Databricks Runtime 5.5 LTS and 6.4!
  • Globbing patterns for indexes: Globbing patterns can be used to specify the subset of source data on which an index is created and maintained. Please refer to the user guide for usage.
  • Enhanced index statistics: A new API (hyperspace.index(indexName)) is introduced to get detailed index statistics such as the number of index files, index size in bytes, source update info, etc. Please check #286 for the sample output.
  • Hybrid Scan improvements: Hyperspace 0.4.0 brings several improvements to Hybrid Scan, such as a better mechanism to enable/disable the feature, rank algorithm improvements (#164), quick index refresh (#238), etc.
  • Pluggable source provider: This release introduces an (evolving) pluggable source provider API set so that different source formats can be plugged in. This enabled the Delta Lake source to be plugged in, and there is an ongoing PR to support Iceberg tables (#320).
  • This release also includes various bug fixes / performance improvements. Please check here for the complete list of commits that went into the v0.4.0 release.
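
To illustrate the globbing idea from the list above, here is a minimal sketch of how a glob pattern narrows a set of source files to the subset an index would cover. It is not Hyperspace code: the `select_source_files` helper and the file paths are hypothetical, and Python's `fnmatch` is used only for illustration (its `*` also matches `/`, which may differ from the glob semantics Hyperspace relies on; the user guide is the authority on that).

```python
from fnmatch import fnmatchcase


def select_source_files(paths, pattern):
    """Return the paths that match a shell-style glob pattern, in input order."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical source files partitioned by year.
source_files = [
    "/data/sales/2020/part-0.parquet",
    "/data/sales/2021/part-0.parquet",
    "/data/sales/2021/part-1.parquet",
]

# Only the files under the 2021 directory are selected.
selected = select_source_files(source_files, "/data/sales/2021/*")
print(selected)
```

In this sketch, an index defined with the pattern would only need to track changes to the selected files rather than the whole dataset.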

Breaking changes:

#268 fixes an issue where the signature calculation could produce a different result depending on the order of the input files. As a result, indexes generated with v0.3.0 are not compatible with v0.4.0 and need to be reconstructed.
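
The following sketch shows, in general terms, why a signature computed over a file listing can depend on file order and how sorting the entries first removes that dependency. It is an illustration under assumptions, not the change made in #268: the function names, the entry format, and the use of SHA-256 are all hypothetical.

```python
import hashlib


def naive_signature(file_entries):
    """Hash the entries in the order given, so the result depends on that order."""
    digest = hashlib.sha256()
    for entry in file_entries:
        digest.update(entry.encode("utf-8"))
        digest.update(b"\n")  # separator so adjacent entries cannot run together
    return digest.hexdigest()


def stable_signature(file_entries):
    """Sort the entries before hashing, so the listing order no longer matters."""
    return naive_signature(sorted(file_entries))


# Hypothetical entries in the form "name|size|modification time".
listing_a = ["a.parquet|100|1611946980", "b.parquet|200|1611946981"]
listing_b = list(reversed(listing_a))

print(naive_signature(listing_a) == naive_signature(listing_b))
print(stable_signature(listing_a) == stable_signature(listing_b))
```

A change of this kind alters the signature stored with every existing index, which is consistent with indexes from the earlier version needing to be rebuilt.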

Thank you for trying it out and we look forward to your feedback!

Credits

Andrei Ionescu, Andrew Fogarty, Apoorve Dave, Eunjin Song, Gurleen Singh, Justin Breese, kaustubhkhare, Pouria Pirzadeh, Rahul Potharaju, Tarun Rajput, Terry Kim, Veysi Ertekin, Yash Datta

Hyperspace v0.3.0

18 Nov 00:15
Pre-release

We are excited to announce the release of Hyperspace 0.3.0!

Notable new features / improvements:

  • Mutable dataset support: Hyperspace v0.3.0 supports mutable datasets, where users can append or delete source data.
    • Hybrid Scan: Prior to v0.3.0, any change in the original dataset content required a full refresh to make the index usable again, which could be a costly operation. With Hybrid Scan, the existing index can be used together with newly appended and/or deleted source files, without an explicit refresh operation. Please check out the doc on Hybrid Scan for more detail.
    • Incremental refresh: v0.3.0 introduces an "incremental" mode to refresh indexes. In this mode, index files are created only for the newly appended source files; deleted source files are handled by removing their data from the existing index files. Please check out the doc on Incremental Refresh for more detail.
  • Optimize index: The number of index files can increase due to incremental refreshes, degrading performance. The new optimizeIndex API optimizes existing indexes by merging index files to create an optimal number of files. Please check out the doc on Optimize Index for more detail.
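
Both Hybrid Scan and incremental refresh depend on knowing which source files were appended or deleted since the index was built. A minimal sketch of that bookkeeping, assuming files are identified by path alone, might look like the following. The `diff_source_files` helper and the paths are hypothetical and are not part of the Hyperspace API; how Hyperspace actually identifies files is described in its docs.

```python
def diff_source_files(indexed_files, current_files):
    """Return (appended, deleted) file lists relative to what the index covers."""
    indexed = set(indexed_files)
    current = set(current_files)
    appended = sorted(current - indexed)  # present now, not covered by the index
    deleted = sorted(indexed - current)   # covered by the index, no longer present
    return appended, deleted


# Hypothetical listings: what the index was built on vs. what exists now.
indexed_files = ["/data/part-0.parquet", "/data/part-1.parquet"]
current_files = ["/data/part-1.parquet", "/data/part-2.parquet"]

appended, deleted = diff_source_files(indexed_files, current_files)
print(appended)
print(deleted)
```

In this framing, an incremental refresh would index only the appended files and drop rows that came from the deleted ones, while a full refresh would rebuild everything.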

Breaking changes:

In order to support features like Hybrid Scan, incremental refresh, etc., the index metadata required unavoidable changes. Thus, the indexes generated with v0.2.0 are not compatible with v0.3.0 and need to be reconstructed.

Thank you for trying it out and we look forward to your feedback!

Credits
Andrew Fogarty, Apoorve Dave, Eunjin Song, Justin Breese, Pouria Pirzadeh, Rahul Potharaju, Tarun Rajput, Terry Kim, Veysi Ertekin, Yash Datta

Hyperspace v0.2.0

05 Aug 21:56
Pre-release

We are excited to announce the release of Hyperspace 0.2.0!

Notable new features / improvements:

  • Python APIs (#36): You can now manage/utilize Hyperspace indexes using Python APIs. Please check out the user guide to get started.
  • Support case-insensitive index column names (#78): The case sensitivity config in Spark (spark.sql.caseSensitive) is now respected when index column names are matched to find candidate indexes.
  • New signature provider (#77): The signature provider has been updated to compute the signature of an arbitrary logical plan (in addition to the existing file-based signature) to be more flexible (e.g., for an upcoming feature where you can create an index on any DataFrame).
  • FilterIndex rule improvement (#73): The FilterIndex rule has been updated to match a logical plan where a Project node is not present (e.g., a SELECT * scenario).
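
As a rough illustration of the case-sensitivity behavior described in the list above, the sketch below resolves a requested column name against an index's columns under a case-sensitivity flag. The `resolve_index_column` helper and the column names are hypothetical; in Hyperspace the flag would come from spark.sql.caseSensitive (which defaults to false in Spark), and the real matching logic is not shown here.

```python
def resolve_index_column(requested, index_columns, case_sensitive):
    """Return the index column matching `requested`, or None if there is none."""
    for column in index_columns:
        if case_sensitive:
            matched = column == requested
        else:
            matched = column.lower() == requested.lower()
        if matched:
            return column
    return None


# Hypothetical index columns.
index_columns = ["Query", "Clicks"]

# Case-insensitive matching finds the column despite the different casing.
print(resolve_index_column("clicks", index_columns, case_sensitive=False))
# Case-sensitive matching does not.
print(resolve_index_column("clicks", index_columns, case_sensitive=True))
```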

Breaking changes:

In order to better support compatibility across Scala/Spark versions going forward, the team has decided to stop serializing logical plans with KryoSerializer and instead store the minimum info needed to reconstruct the original relation (#99). Thus, the indexes generated with v0.1.0 are not compatible with v0.2.0 and need to be reconstructed.

Thank you for trying it out and we look forward to your feedback!

Credits
Andrew Fogarty, Apoorve Dave, Eunjin Song, Justin Breese, Pouria Pirzadeh, Rahul Potharaju, Tarun Rajput, Terry Kim, Veysi Ertekin, Yash Datta

Hyperspace v0.1.0

22 Jun 03:25
Pre-release

We are excited to announce the release of Hyperspace 0.1.0, the first preview release!

Please check out the user guide to get started.

Thank you for trying it out and we look forward to your feedback!

Credits
Apoorve Dave, Justin Breese, Pouria Pirzadeh, Rahul Potharaju, Tarun Rajput, Terry Kim