Welcome to the code repository of Autocomp. Check out our introductory 📝 blog post!
Update (9/22/2025): Added code/documentation for setting up the CUDA/KernelBench backend, plus code for RVV optimization. Check out 📝 blog post 2 for more details.
Update (11/3/2025): Added code/documentation for setting up the Trainium backend. Check out 📝 blog post 3 for more details.
Update (11/18/2025): Added documentation for adding a new backend (ADDING_A_BACKEND.md), added the examples directory for example optimization traces, and published 📝 blog post 4 about how we optimized conv1d on Trainium.
📚 Paper: Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators
✏️ Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, and Yakun Sophia Shao (UC Berkeley)
Autocomp is an LLM-driven code optimizer for tensor accelerators. It is designed to be portable and easy to use across a variety of hardware backends, and has already demonstrated strong performance on Gemmini, AWS Trainium, RVV, and CUDA.
Autocomp decomposes the optimization problem into a beam search in which each iteration is divided into a planning phase and an implementation phase. It combines the user's domain knowledge with a variety of search techniques to explore the optimization space and iteratively improve the code. For more details, see our paper.
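For orientation, the loop below is a minimal sketch of this search under some assumptions: the function names (`propose_plans`, `implement_plan`, `evaluate`), scoring by latency, and the default parameter values are all illustrative and do not mirror the actual implementation in `autocomp/search/search.py`.

```python
# Minimal sketch of an Autocomp-style beam search. Names, types, and defaults are
# illustrative; the real implementation lives in autocomp/search/search.py.
from typing import Callable, List, Tuple

def beam_search(
    start_code: str,
    propose_plans: Callable[[str, int], List[str]],        # planning phase (LLM)
    implement_plan: Callable[[str, str, int], List[str]],  # implementation phase (LLM)
    evaluate: Callable[[str], Tuple[bool, float]],         # backend: (is_correct, latency)
    beam_size: int = 4,
    num_plans: int = 2,
    num_impls: int = 2,
    iterations: int = 10,
) -> str:
    beam = [start_code]
    for _ in range(iterations):
        scored: List[Tuple[float, str]] = []
        for code in beam:
            # Planning phase: ask the LLM for several candidate optimization plans.
            for plan in propose_plans(code, num_plans):
                # Implementation phase: ask the LLM to rewrite the code per plan.
                for candidate in implement_plan(code, plan, num_impls):
                    is_correct, latency = evaluate(candidate)
                    if is_correct:  # keep only functionally correct code
                        scored.append((latency, candidate))
        if scored:
            # Keep the best beam_size candidates for the next iteration.
            scored.sort(key=lambda s: s[0])
            beam = [code for _, code in scored[:beam_size]]
    return beam[0]
```

In the actual system, the planning and implementation calls are handled by the agents in `autocomp/search/llm_agent.py`, and evaluation is handled by the backends in `autocomp/backend/`.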
Currently supported backends:
- Gemmini (gemmini_setup.md)
- Trainium (trn_setup.md)
- CUDA via KernelBench (kb_setup.md)
Partially supported backends:
- RISC-V Vector (RVV) on the Canaan Kendryte K230. See the k230 branch for code. As the implementation is very hacky, we do not currently recommend using this backend.
For instructions on adding a new backend, see ADDING_A_BACKEND.md.
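As a very rough sketch of what a new backend might look like, assuming the base class in `autocomp/backend/hardware_backend.py` exposes an evaluate-style hook (the class and method names below are assumptions for illustration, not the repo's actual interface; follow ADDING_A_BACKEND.md for the real one):

```python
# Hypothetical sketch only: the base class name, method name, and return type are
# assumptions; see ADDING_A_BACKEND.md and autocomp/backend/hardware_backend.py
# for the actual interface.
from autocomp.backend.hardware_backend import HardwareBackend

class MyAcceleratorBackend(HardwareBackend):
    """Evaluates LLM-generated kernels on a new accelerator or simulator."""

    def evaluate(self, code: str):
        # 1. Splice `code` into the matching test harness from tests/.
        # 2. Compile and run it on the target simulator or hardware.
        # 3. Return functional correctness plus a performance metric
        #    (cycles, latency, ...) so the search can rank candidates.
        raise NotImplementedError
```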
Depending on the specific models you want to use, you will need to define the appropriate environment variables (e.g., `OPENAI_API_KEY`) or create the file `autocomp/common/openai_key.py` (or `anthropic_key.py`, `gemini_key.py`, `together_key.py`). The file should define the variable `key` as follows:

```python
key = "YOUR_OPENAI_API_KEY"
```

`autocomp/search/search.py` is the entry point for running Autocomp optimization. Various parameters, such as the backend, models used, beam size, number of plans, number of code implementations, and dropout, can be configured here.
Notable parameters (an illustrative configuration sketch follows this list):
- `backend`: The hardware backend to use. Currently supported backends are `gemmini`, `trn`, and `cuda`.
- `models`: The list of models to use, for example `o3-mini` or `gpt-4o`. A variety of endpoints (OpenAI, Anthropic, Gemini, Together) are supported, but routing is somewhat hacky; see `autocomp/common/llm_utils.py`.
- `simulator`: The evaluation method to use.
  - For Gemmini, `spike` (only optimizes instruction counts, not cycle counts) or `firesim`.
  - For Trainium, `trn`.
  - For CUDA, `kernelbench`.
- `iterations`: The number of iterations to run.
- `search_strategy`: The search strategy to use. Currently only `beam` is supported.
- `prob_type`: The problem type to use.
  - For Gemmini, `gemm`, `conv`, or `admm-multifunction`.
  - For Trainium, `trn-tutorial` or `trn-advanced`.
  - For CUDA, `kb-level1`, `kb-level2`, `kb-level3`, or `kb-level4`.
- `prob_id`: The problem ID to use.
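To make these knobs concrete, here is a hypothetical configuration in the spirit of what gets edited inside `autocomp/search/search.py`; the dictionary structure, the `beam_size`/`num_plans`/`num_impls` names, and the numeric values are illustrative, not the script's actual code.

```python
# Illustrative parameter settings only; edit the real values directly in
# autocomp/search/search.py (the structure there may differ).
config = {
    "backend": "gemmini",             # gemmini, trn, or cuda
    "models": ["o3-mini", "gpt-4o"],  # LLMs used for planning and implementation
    "simulator": "spike",             # Gemmini: spike/firesim; Trainium: trn; CUDA: kernelbench
    "iterations": 10,                 # number of search iterations (illustrative)
    "search_strategy": "beam",        # currently the only supported strategy
    "prob_type": "gemm",              # e.g., gemm for a Gemmini GEMM benchmark
    "prob_id": 1,                     # which benchmark from sols/ and tests/ to optimize
    "beam_size": 4,                   # beam width (illustrative)
    "num_plans": 2,                   # plans generated per candidate (illustrative)
    "num_impls": 2,                   # implementations generated per plan (illustrative)
}
```

Once configured, the search is launched by running `autocomp/search/search.py`, the entry point noted above.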
Repository structure:
- `autocomp/` - Core Autocomp code.
  - `search/` - Core search and optimization infrastructure.
    - `search.py` - Main search algorithm implementation. Implements the beam search described in the paper. Change search parameters within this file.
    - `llm_agent.py` - LLM agents for planning and code optimization. Implements the two prompt phases described in the paper. The optimization menu is defined within this file.
    - `llm_ensemble.py` - Wrapper around LLM agents that enables calls to be split between multiple agents.
    - `prob.py` - Wrapper for tests (parsed from the `tests/` directory) that edits the test file and appends LLM-generated code in order to test it.
    - `code_repo.py` - Abstraction for managing code candidates generated during optimization.
  - `backend/` - Hardware evaluation utilities for different backends.
    - `hardware_backend.py` - Base class for hardware backends.
    - `gemmini_eval.py` - Hardware evaluation utilities for Gemmini. Paths to Chipyard/FireSim/Gemmini must be configured here.
    - `trn_eval.py` - Hardware evaluation utilities for Trainium.
    - `kb_eval.py` - Hardware evaluation utilities for KernelBench. The path to KernelBench must be configured here.
  - `common/` - Shared utilities and helper functions.
    - `llm_utils.py` - LLM interaction utilities. Works with OpenAI, Anthropic, Gemini, and Together. Implements parallel calls for OpenAI and Together.
    - `my_logging.py` - Custom logging functionality.
    - `utils.py` - General utility functions.
- `prompts/` - Contains various prompts imported by `autocomp/search/llm_agent.py`.
  - `gemmini/` - Prompts and examples used for Gemmini code optimization.
    - `isa_prompt_conv.py` - Accelerator ISA section of the prompt, used for GEMM and convolution.
    - `isa_prompt_admm.py` - Accelerator ISA section of the prompt, used for TinyMPC.
    - `gemmini_rules.py` - Rules section of the prompt (helps constrain output and encourage functional correctness).
    - `plan_prompt.py` - Planning phase prompt (note that the implementation prompt is entirely contained within `autocomp/search/llm_agent.py` above).
    - `tiling_example.py` - Tiling optimization example.
    - `if_example.py` - Conditional optimization example (from convolution).
    - `if_example_matmul.py` - Conditional optimization example (from GEMM).
  - `trn/` - Prompts and examples used for NKI (Trainium) optimization.
    - `nki_isa_generator.py` - Generates the ISA string for the NKI ISA. If optimizing a new workload, configure the set of instructions to use here.
- `sols/` - Contains baseline code for the benchmarks in the paper.
  - `exo/` - Exo unoptimized and optimized baseline code for the GEMM benchmarks in the paper. `sol{id}_exo_baseline.c` is the unoptimized code and is used by `autocomp/search/search.py` as the starting code for optimization.
  - `gemm/` - Additional GEMM benchmarks used for schedule reuse. No hand-optimized code available.
  - `exo-conv/` - Exo unoptimized and optimized baseline code for the convolution benchmarks in the paper.
  - `admm-multifunction/` - TinyMPC unoptimized and optimized baseline code. Only problem IDs 1 and 2 are used in the paper. Run with FP32 4x4 Gemmini.
  - `trn-tutorial/` - NKI (Trainium) unoptimized and optimized baseline code for the tutorial benchmarks in the paper.
  - `trn-advanced/` - NKI (Trainium) unoptimized and optimized baseline code for the advanced benchmarks in the paper.
- `tests/` - Contains test cases corresponding to `sols/` above.
- `examples/` - Contains examples of code optimized by Autocomp. Note that the generated code is specific to the input/output shapes used and may not be correct for other shapes.
To cite Autocomp:

```bibtex
@misc{hong2025autocomp,
      title={Autocomp: LLM-Driven Code Optimization for Tensor Accelerators},
      author={Charles Hong and Sahil Bhatia and Alvin Cheung and Yakun Sophia Shao},
      year={2025},
      eprint={2505.18574},
      archivePrefix={arXiv},
      primaryClass={cs.PL},
      url={https://arxiv.org/abs/2505.18574},
}
```

