Data size is very useful for cost-based optimizers in compute engines like Spark.
But currently Lance has to open a LanceFileReader to get the data size of each fragment, so using this SDK issues many I/O requests for a large Lance dataset.
Is there another way to get the data size (an approximate size would also be fine), similar to the count_rows function?
Spark needs the Statistics from this interface.
The Statistics interface exposes three methods: sizeInBytes and numRows are useful for simple optimizer rules, while columnStats is useful for CBO rules but carries many per-column statistics. I'm not sure the storage statistics can provide all of them.
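A minimal sketch of what this could look like on the Lance side: if the manifest already records per-fragment row counts and physical file sizes, sizeInBytes and numRows can be answered by summing metadata, with no per-fragment reader opened. The FragmentMeta record and DatasetStatistics class below are hypothetical names; the shape mirrors Spark's org.apache.spark.sql.connector.read.Statistics but does not use the real Spark API.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical per-fragment metadata; assumes the manifest already stores
// the row count and on-disk size, so no LanceFileReader is needed.
record FragmentMeta(long numRows, long fileSizeBytes) {}

// Mirrors the shape of Spark's Statistics (sizeInBytes / numRows);
// columnStats is omitted since it needs real per-column statistics.
class DatasetStatistics {
    private final List<FragmentMeta> fragments;

    DatasetStatistics(List<FragmentMeta> fragments) {
        this.fragments = fragments;
    }

    OptionalLong numRows() {
        return OptionalLong.of(
            fragments.stream().mapToLong(FragmentMeta::numRows).sum());
    }

    OptionalLong sizeInBytes() {
        // Approximate: sum of fragment file sizes from metadata,
        // avoiding one I/O request per fragment.
        return OptionalLong.of(
            fragments.stream().mapToLong(FragmentMeta::fileSizeBytes).sum());
    }
}
```

This would give the optimizer an approximate but cheap sizeInBytes, which is usually good enough for join-strategy decisions.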