AI-Hypercomputer/JetStream


Warning

Notice of archival: To streamline open-source TPU inference efforts, we have migrated JetStream's core functionality to the new tpu-inference repository, and we will archive JetStream on February 1st, 2026. Archival does not mean deletion: you will still be able to fork and clone JetStream; the repository simply becomes read-only. To get JetStream features and much more, please check out tpu.vllm.ai.

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices.

About

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

JetStream Engine Implementation

Currently, two reference engine implementations are available -- one for JAX models and another for PyTorch models.

JAX

PyTorch

Documentation

JetStream Standalone Local Setup

Getting Started

Setup

make install-deps
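
For a fresh checkout, the full sequence might look like the sketch below; the clone URL is inferred from the repository name, and the virtual environment is an assumption rather than a stated requirement.

# Clone the repository (URL inferred from the repository name) and enter it
git clone https://github.com/AI-Hypercomputer/JetStream.git
cd JetStream

# Optional: isolate dependencies in a virtual environment (assumed, not required by the docs)
python -m venv .venv && source .venv/bin/activate

# Install dependencies via the provided Makefile target
make install-deps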

Run local server & Testing

Use the following commands to start a mock server locally and test it:

# Start a server
python -m jetstream.core.implementations.mock.server

# Test local mock server
python -m jetstream.tools.requester

# Load test local mock server
python -m jetstream.tools.load_tester
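
The server and the client tools are separate processes, so a typical session keeps the mock server running in one terminal and drives it from another; for example:

# Terminal 1: start the mock server and leave it running in the foreground
python -m jetstream.core.implementations.mock.server

# Terminal 2: send a single test request, then run sustained load against the same server
python -m jetstream.tools.requester
python -m jetstream.tools.load_tester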

Test core modules

# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator

# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server

# Test JetStream lora adapter tensorstore
python -m unittest -v jetstream.tests.core.lora.test_adapter_tensorstore

# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine

# Test mock JetStream token utils
python -m unittest -v jetstream.tests.engine.test_token_utils
python -m unittest -v jetstream.tests.engine.test_utils
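
To run the whole suite in one pass, standard unittest discovery should also work; the directory layout and test_*.py naming are assumed from the module paths above.

# Discover and run every test module under jetstream/tests (layout assumed from the paths above)
python -m unittest discover -s jetstream/tests -t . -p "test_*.py" -v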
