A collaboration between scverse, Lamin, and anyone interested in contributing!
This repository contains benchmarking scripts & utilities for scRNA-seq data loaders and lets you collaboratively contribute new benchmarking results.
Setup:
git clone https://github.com/laminlabs/arrayloader-benchmarks
cd arrayloader-benchmarks
uv pip install -e ".[scdataset,annbatch]" # select the extras for the tools you'd like to install
lamin connect laminlabs/arrayloader-benchmarks # to contribute results to the hosted lamindb instance; or call `lamin init` to create a new lamindb instance

Typical calls of the main benchmarking script are:
cd scripts
python run_loading_benchmark_on_collection.py annbatch # run annbatch on collection Tahoe100M_tiny, n_datasets = 1
python run_loading_benchmark_on_collection.py MappedCollection # run MappedCollection
python run_loading_benchmark_on_collection.py scDataset # run scDataset
python run_loading_benchmark_on_collection.py annbatch --n_datasets -1 # run against all datasets in the collection
python run_loading_benchmark_on_collection.py annbatch --collection Tahoe100M --n_datasets -1 # run against the full 100M cells
python run_loading_benchmark_on_collection.py annbatch --collection Tahoe100M --n_datasets 1 # run against the first dataset, 2M cells
python run_loading_benchmark_on_collection.py annbatch --collection Tahoe100M --n_datasets 5 # run against the first 5 datasets, 10M cells

You can choose between different benchmarking dataset collections.
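As a rough illustration of what a loading benchmark measures, here is a minimal sketch that times iteration over batches and reports throughput. This is not the script's actual implementation; the function name and the plain-iterable stand-in for a data loader are assumptions for illustration only:

```python
import time

def benchmark_loader(loader, n_batches=100):
    # Iterate over batches and measure wall-clock throughput.
    start = time.perf_counter()
    n_samples = 0
    for i, batch in enumerate(loader):
        n_samples += len(batch)
        if i + 1 >= n_batches:
            break
    elapsed = time.perf_counter() - start
    return n_samples / elapsed  # samples per second

# Stand-in "loader": a generator yielding batches of 128 dummy samples
dummy = ([0] * 128 for _ in range(100))
print(f"{benchmark_loader(dummy):.0f} samples/sec")
```

The real script additionally varies the loader backend (annbatch, MappedCollection, scDataset) and the collection size via the CLI flags shown above.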
When running the script, parameters and results are automatically tracked in a parquet file, along with source code, run environment, and input and output datasets.
Note: A previous version of this repo contained the benchmarking scripts accompanying the 2024 blog post: lamin.ai/blog/arrayloader-benchmarks.