The SOMASparseNdArray.read with result_order="row-major" is unexpectedly slow -- it is roughly 2X slower than calling read() (without sort), and then using PyArrow's sort_by method to perform the sort.
I would naively expect the TileDB-SOMA implementation to be faster as it is multi-threaded (Arrow is a single-threaded sort), or at worst they would be similar in speed.
Example, running on an EC2 instance in the same region as the S3 bucket:
the first two (12 and 13) are an unsorted read, followed by an Arrow Table sort - approx 2:40
the latter two (14 and 15) are read(result_order='row-major') - approx 5:00
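The two approaches being compared can be sketched roughly as below. This is a minimal sketch, not the exact code that produced the timings above: the function names and the `timed` helper are hypothetical, the class is assumed to be `tiledbsoma.SparseNDArray`, and the array is assumed to be 2-D so that row-major order corresponds to sorting on `soma_dim_0` and then `soma_dim_1`.

```python
import time

# Assumption: a 2-D sparse array, so row-major order is equivalent to
# sorting by the first dimension and then by the second.
SORT_KEYS = [("soma_dim_0", "ascending"), ("soma_dim_1", "ascending")]


def read_then_arrow_sort(uri):
    """Unsorted read, then a sort using PyArrow's Table.sort_by."""
    import tiledbsoma  # imported lazily; only needed when actually reading

    with tiledbsoma.SparseNDArray.open(uri) as arr:
        table = arr.read().tables().concat()
    return table.sort_by(SORT_KEYS)


def read_row_major(uri):
    """Read with the sort delegated to TileDB-SOMA via result_order."""
    import tiledbsoma  # imported lazily; only needed when actually reading

    with tiledbsoma.SparseNDArray.open(uri) as arr:
        return arr.read(result_order="row-major").tables().concat()


def timed(fn, *args):
    """Hypothetical helper: return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

With a real array URI, `timed(read_then_arrow_sort, uri)` and `timed(read_row_major, uri)` would give the two wall-clock times to compare.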
Versions (please complete the following information):
tiledbsoma.__version__ 1.11.4
TileDB-Py version 0.29.0
TileDB core version (tiledb) 2.23.0
TileDB core version (libtiledbsoma) 2.23.0
python version 3.11.9.final.0
OS version Linux 6.8.0-1009-aws
johnkerl
changed the title
[Bug][Python] sparse array read with result_order is slow
[Bug][Python] Sparse array read with result_order is slow
Jul 2, 2024