fastaccess

Efficient random access to subsequences in FASTA files using byte-level seeking.

Installation

pip install fastaccess

From source (includes C++ backend for better performance):

pip install -e .

The C++ backend requires a C++17 compiler and CMake 3.15+. If either is unavailable, the package falls back to the pure-Python implementation.

Quick Start

from fastaccess import FastaStore

fa = FastaStore("genome.fa")  # Builds index, caches for next time
seq = fa.fetch("chr1", 1000, 2000)  # 1-based inclusive coordinates
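
Conceptually, building an index means scanning the file once and recording where each record's bases begin and how many there are. The following is a minimal sketch of that idea using only the standard library. It is not fastaccess's implementation: `build_index` and the `(offset, length)` layout are assumptions made for illustration.

```python
import io


def build_index(handle):
    """Scan a binary FASTA stream and return {name: (offset, length)}.

    offset is the byte position of the record's first base; length is the
    number of bases, with line terminators excluded.
    """
    index = {}
    name = None
    pos = 0  # byte position of the start of the current line
    for line in handle:
        if line.startswith(b">"):
            # The sequence name is the first whitespace-delimited header token.
            parts = line[1:].split()
            name = parts[0].decode("ascii") if parts else ""
            index[name] = (pos + len(line), 0)
        elif name is not None:
            offset, length = index[name]
            index[name] = (offset, length + len(line.rstrip(b"\r\n")))
        pos += len(line)
    return index


# Toy two-record FASTA held in memory.
toy = io.BytesIO(b">chrT test\nACGT\nAC\n>chrU\nGG\n")
index = build_index(toy)
print(index)  # {'chrT': (11, 6), 'chrU': (25, 2)}
```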

API

`FastaStore(path, use_cache=True, cache_dir=None)`

path: Path to FASTA file (plain or gzip-compressed .fa.gz)
use_cache: Save/load index from .fidx cache file
cache_dir: Custom directory for cache file (useful for read-only FASTA directories)

Methods

| Method | Description |
| --- | --- |
| `fetch(name, start, stop, reverse_complement=False)` | Fetch subsequence (1-based inclusive) |
| `fetch_many(queries)` | Batch fetch list of `(name, start, stop)` tuples |
| `list_sequences()` | Get all sequence names |
| `get_length(name)` | Get sequence length |
| `get_description(name)` | Get FASTA header description |
| `get_info(name)` | Get dict with `name`, `description`, `length` |
| `rebuild_index()` | Force rebuild index and update cache |
| `is_cached()` | Check if loaded from cache |
| `cache_exists()` | Check if cache file exists |
| `get_cache_path()` | Get cache file path |
| `delete_cache()` | Delete cache file |

Errors

KeyError: Sequence name not found
ValueError: Invalid coordinates (start < 1, stop < start, or stop > length)
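
The coordinate rules above can be illustrated with a small standalone helper. This is a sketch, not fastaccess's internal code: `to_slice` is a hypothetical name. It shows how 1-based inclusive coordinates map onto Python's 0-based half-open slices and which inputs the documented `ValueError` conditions reject.

```python
def to_slice(start: int, stop: int, length: int) -> slice:
    """Convert 1-based inclusive (start, stop) to a 0-based half-open slice.

    Hypothetical helper that mirrors the documented ValueError conditions.
    """
    if start < 1:
        raise ValueError(f"start must be >= 1, got {start}")
    if stop < start:
        raise ValueError(f"stop ({stop}) must be >= start ({start})")
    if stop > length:
        raise ValueError(f"stop ({stop}) exceeds sequence length ({length})")
    return slice(start - 1, stop)


seq = "ACGTACGTAC"  # toy 10 bp sequence
print(seq[to_slice(1, 4, len(seq))])    # ACGT
print(seq[to_slice(10, 10, len(seq))])  # C
```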

Features

Random access: Uses seek() to fetch only required bytes
Index caching: 7-40x faster reloading via .fidx cache files
Gzip support: Reads .fa.gz files directly
1-based inclusive coordinates: Standard bioinformatics convention
Format support: Wrapped/unwrapped sequences, Unix/Windows line endings
Uppercase output: All sequences returned uppercase
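
To make the random-access idea concrete, here is a minimal sketch of seek-based fetching from a line-wrapped FASTA record. It assumes an faidx-style index entry (byte offset of the first base, bases per line, bytes per line including the line terminator). The actual layout of the `.fidx` index is not described in this README, and `fetch_by_seek` is a hypothetical name.

```python
import os
import tempfile


def fetch_by_seek(path, offset, line_bases, line_bytes, start, stop):
    """Read bases start..stop (1-based inclusive) of one record via seek().

    offset     -- byte position of the record's first base
    line_bases -- bases per full line
    line_bytes -- bytes per full line, including the line terminator
    """
    first = start - 1  # 0-based index of the first wanted base
    last = stop - 1    # 0-based index of the last wanted base
    # Byte position = offset + full lines skipped * line_bytes + column.
    begin = offset + (first // line_bases) * line_bytes + first % line_bases
    end = offset + (last // line_bases) * line_bytes + last % line_bases + 1
    with open(path, "rb") as fh:
        fh.seek(begin)
        raw = fh.read(end - begin)
    # Drop line terminators that fall inside the byte range, then uppercase.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii").upper()


# Toy FASTA: 10 bases wrapped at 4 per line, Unix line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
    tmp.write(b">chrT test\nacgt\nACGT\nac\n")
    fasta_path = tmp.name

header_len = len(b">chrT test\n")  # the first base starts right after the header
result = fetch_by_seek(fasta_path, header_len, 4, 5, 3, 9)
print(result)  # GTACGTA
os.remove(fasta_path)
```

Only the bytes between `begin` and `end` are read, so the cost depends on the size of the requested region rather than on the size of the file.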

Performance

C++ Backend

| Operation | Python | C++ | Speedup |
| --- | --- | --- | --- |
| Index build (10 MB) | 70 ms | 5 ms | 13x |
| Reverse complement (8 KB) | 0.21 ms | 0.015 ms | 14x |
| Small fetch (100 bp) | 0.017 ms | 0.017 ms | 1x |
| Large fetch (100 KB) | 0.36 ms | 0.35 ms | 1x |

Check if C++ backend is active:

from fastaccess import using_cpp_backend
print(using_cpp_backend())  # True if available

Index Caching

Human genome (3 GB):
  First load:  ~2 seconds (builds index)
  With cache:  0.05 seconds (40x faster)

Cache is automatically invalidated when the FASTA file changes.
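
One common way to detect that a FASTA file has changed is to store its size and modification time alongside the cached index and compare them on load. The sketch below illustrates that idea only. How fastaccess actually validates its `.fidx` cache is not specified here, and `file_signature` and `cache_is_valid` are hypothetical names.

```python
import os
import tempfile


def file_signature(path):
    """Return (size, mtime_ns) for a file; needs only a stat call."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def cache_is_valid(fasta_path, cached_signature):
    """A cached index is reusable only if the FASTA signature is unchanged."""
    return file_signature(fasta_path) == cached_signature


with tempfile.TemporaryDirectory() as tmpdir:
    fasta = os.path.join(tmpdir, "toy.fa")
    with open(fasta, "wb") as fh:
        fh.write(b">chrT\nACGT\n")
    saved = file_signature(fasta)  # would be stored next to the index
    unchanged = cache_is_valid(fasta, saved)

    with open(fasta, "ab") as fh:  # appending changes the file size
        fh.write(b">chrU\nGGCC\n")
    changed = not cache_is_valid(fasta, saved)

print(unchanged, changed)  # True True
```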

Example

from fastaccess import FastaStore

fa = FastaStore("hg38.fa")

# Get sequence info
print(fa.list_sequences())  # ["chr1", "chr2", ...]
print(fa.get_length("chr1"))  # 248956422

# Fetch regions
seq = fa.fetch("chr1", 1000, 2000)
rc = fa.fetch("chr1", 1000, 2000, reverse_complement=True)

# Batch fetch
regions = [("chr1", 1, 100), ("chr2", 500, 600)]
sequences = fa.fetch_many(regions)
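
For reference, reverse-complementing a DNA string in pure Python can be done with a translation table. This is an illustrative sketch rather than the library's implementation: `reverse_complement` here is a standalone hypothetical function, and IUPAC ambiguity codes other than `N` are left out.

```python
# Map each base to its complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACGTN"))  # NACGTT
```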

Requirements

Python 3.8+
No runtime dependencies (pure Python fallback always works)

C++ backend (optional):

C++17 compiler
CMake 3.15+

Limitations

ASCII sequences only (DNA/RNA)
Gzip files require full decompression (no random access within compressed data)
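
Since random access is not possible within gzip-compressed data, workloads that query the same `.fa.gz` many times may be better served by decompressing it once to a plain file. A minimal sketch using only the standard library follows. `decompress_fasta` is a hypothetical helper, not part of fastaccess.

```python
import gzip
import os
import shutil
import tempfile


def decompress_fasta(gz_path, out_path):
    """Stream-decompress a .fa.gz to a plain FASTA file in chunks."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


with tempfile.TemporaryDirectory() as tmpdir:
    # Create a toy compressed FASTA to work on.
    gz_path = os.path.join(tmpdir, "toy.fa.gz")
    with gzip.open(gz_path, "wb") as fh:
        fh.write(b">chrT\nACGT\n")

    plain_path = decompress_fasta(gz_path, os.path.join(tmpdir, "toy.fa"))
    with open(plain_path, "rb") as fh:
        content = fh.read()

print(content)  # b'>chrT\nACGT\n'
```

The resulting plain file can then be opened with `FastaStore` as in the Quick Start.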

License

MIT
