Welcome to the code repository of Autocomp. Check out our introductory 📝 blog post!
Update (9/22/2025): Added code/documentation for setting up the CUDA/KernelBench backend, plus code for RVV optimization. Check out 📝 blog post 2 for more details.
Update (11/3/2025): Added code/documentation for setting up the Trainium backend. Check out 📝 blog post 3 for more details.
Update (11/18/2025): Added documentation for adding a new backend (ADDING_A_BACKEND.md), added the examples directory for example optimization traces, and published 📝 blog post 4 about how we optimized conv1d on Trainium.
📚 Paper: Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators
✏️ Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, and Yakun Sophia Shao (UC Berkeley)
Autocomp is an LLM-driven code optimizer for tensor accelerators. It is designed to be portable and easy to use across a variety of hardware backends, and has already demonstrated strong performance on Gemmini, AWS Trainium, RVV, and CUDA.
Autocomp decomposes the optimization problem into a beam search in which each iteration is divided into a planning phase and an implementation phase. It combines the user's domain knowledge with a variety of search techniques to explore the optimization space and iteratively improve the code. For more details, see our paper.
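For orientation, the loop below is a minimal sketch of this search under some assumptions: the function names (`propose_plans`, `implement_plan`, `evaluate`), scoring by latency, and the default parameter values are all illustrative and do not mirror the actual implementation in `autocomp/search/search.py`.

```python
# Minimal sketch of an Autocomp-style beam search. Names, types, and defaults are
# illustrative; the real implementation lives in autocomp/search/search.py.
from typing import Callable, List, Tuple

def beam_search(
    start_code: str,
    propose_plans: Callable[[str, int], List[str]],        # planning phase (LLM)
    implement_plan: Callable[[str, str, int], List[str]],  # implementation phase (LLM)
    evaluate: Callable[[str], Tuple[bool, float]],         # backend: (is_correct, latency)
    beam_size: int = 4,
    num_plans: int = 2,
    num_impls: int = 2,
    iterations: int = 10,
) -> str:
    beam = [start_code]
    for _ in range(iterations):
        scored: List[Tuple[float, str]] = []
        for code in beam:
            # Planning phase: ask the LLM for several candidate optimization plans.
            for plan in propose_plans(code, num_plans):
                # Implementation phase: ask the LLM to rewrite the code per plan.
                for candidate in implement_plan(code, plan, num_impls):
                    is_correct, latency = evaluate(candidate)
                    if is_correct:  # keep only functionally correct code
                        scored.append((latency, candidate))
        if scored:
            # Keep the best beam_size candidates for the next iteration.
            scored.sort(key=lambda s: s[0])
            beam = [code for _, code in scored[:beam_size]]
    return beam[0]
```

In the actual system, the planning and implementation calls are handled by the agents in `autocomp/search/llm_agent.py`, and evaluation is handled by the backends in `autocomp/backend/`.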
Currently supported backends:
- Gemmini (gemmini_setup.md)
- Trainium (trn_setup.md)
- CUDA via KernelBench (kb_setup.md)
Partially supported backends:
- RISC-V Vector (RVV) on the Canaan Kendryte K230. See the k230 branch for code. As the implementation is very hacky, we do not currently recommend using this backend.
For instructions on adding a new backend, see ADDING_A_BACKEND.md.
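As a very rough sketch of what a new backend might look like, assuming the base class in `autocomp/backend/hardware_backend.py` exposes an evaluate-style hook (the class and method names below are assumptions for illustration, not the repo's actual interface; follow ADDING_A_BACKEND.md for the real one):

```python
# Hypothetical sketch only: the base class name, method name, and return type are
# assumptions; see ADDING_A_BACKEND.md and autocomp/backend/hardware_backend.py
# for the actual interface.
from autocomp.backend.hardware_backend import HardwareBackend

class MyAcceleratorBackend(HardwareBackend):
    """Evaluates LLM-generated kernels on a new accelerator or simulator."""

    def evaluate(self, code: str):
        # 1. Splice `code` into the matching test harness from tests/.
        # 2. Compile and run it on the target simulator or hardware.
        # 3. Return functional correctness plus a performance metric
        #    (cycles, latency, ...) so the search can rank candidates.
        raise NotImplementedError
```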
Depending on the specific models you want to use, you will need to define the appropriate environment variables (e.g., `OPENAI_API_KEY`) or create the file `autocomp/common/openai_key.py` (or `anthropic_key.py`, `gemini_key.py`, `together_key.py`). The file should define the variable `key` as follows:

```python
key = "YOUR_OPENAI_API_KEY"
```

`autocomp/search/search.py` is the entry point for running Autocomp optimization. Various parameters, such as the backend, models used, beam size, number of plans, number of code implementations, and dropout, can be configured here.
Notable parameters (an illustrative configuration sketch follows this list):
- `backend`: The hardware backend to use. Currently supported backends are `gemmini`, `trn`, and `cuda`.
- `models`: The list of models to use, for example `o3-mini` or `gpt-4o`. A variety of endpoints (OpenAI, Anthropic, Gemini, Together) are supported, but routing is somewhat hacky; see `autocomp/common/llm_utils.py`.
- `simulator`: The evaluation method to use.
  - For Gemmini, `spike` (only optimizes instruction counts, not cycle counts) or `firesim`.
  - For Trainium, `trn`.
  - For CUDA, `kernelbench`.
- `iterations`: The number of iterations to run.
- `search_strategy`: The search strategy to use. Currently only `beam` is supported.
- `prob_type`: The problem type to use.
  - For Gemmini, `gemm`, `conv`, or `admm-multifunction`.
  - For Trainium, `trn-tutorial` or `trn-advanced`.
  - For CUDA, `kb-level1`, `kb-level2`, `kb-level3`, or `kb-level4`.
- `prob_id`: The problem ID to use.
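To make these knobs concrete, here is a hypothetical configuration in the spirit of what gets edited inside `autocomp/search/search.py`; the dictionary structure, the `beam_size`/`num_plans`/`num_impls` names, and the numeric values are illustrative, not the script's actual code.

```python
# Illustrative parameter settings only; edit the real values directly in
# autocomp/search/search.py (the structure there may differ).
config = {
    "backend": "gemmini",             # gemmini, trn, or cuda
    "models": ["o3-mini", "gpt-4o"],  # LLMs used for planning and implementation
    "simulator": "spike",             # Gemmini: spike/firesim; Trainium: trn; CUDA: kernelbench
    "iterations": 10,                 # number of search iterations (illustrative)
    "search_strategy": "beam",        # currently the only supported strategy
    "prob_type": "gemm",              # e.g., gemm for a Gemmini GEMM benchmark
    "prob_id": 1,                     # which benchmark from sols/ and tests/ to optimize
    "beam_size": 4,                   # beam width (illustrative)
    "num_plans": 2,                   # plans generated per candidate (illustrative)
    "num_impls": 2,                   # implementations generated per plan (illustrative)
}
```

Once configured, the search is launched by running `autocomp/search/search.py`, the entry point noted above.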
Repository structure:
- `autocomp/` - Core Autocomp code.
  - `search/` - Core search and optimization infrastructure.
    - `search.py` - Main search algorithm implementation. Implements the beam search described in the paper. Change search parameters within this file.
    - `llm_agent.py` - LLM agents for planning and code optimization. Implements the two prompt phases described in the paper. The optimization menu is defined within this file.
    - `llm_ensemble.py` - Wrapper around LLM agents that enables calls to be split between multiple agents.
    - `prob.py` - Wrapper for tests (parsed from the `tests/` directory) that edits the test file and appends LLM-generated code in order to test it.
    - `code_repo.py` - Abstraction for managing code candidates generated during optimization.
  - `backend/` - Hardware evaluation utilities for different backends.
    - `hardware_backend.py` - Base class for hardware backends.
    - `gemmini_eval.py` - Hardware evaluation utilities for Gemmini. Paths to Chipyard/FireSim/Gemmini must be configured here.
    - `trn_eval.py` - Hardware evaluation utilities for Trainium.
    - `kb_eval.py` - Hardware evaluation utilities for KernelBench. The path to KernelBench must be configured here.
  - `common/` - Shared utilities and helper functions.
    - `llm_utils.py` - LLM interaction utilities. Works with OpenAI, Anthropic, Gemini, and Together. Implements parallel calls for OpenAI and Together.
    - `my_logging.py` - Custom logging functionality.
    - `utils.py` - General utility functions.
- `prompts/` - Contains various prompts imported by `autocomp/search/llm_agent.py`.
  - `gemmini/` - Prompts and examples used for Gemmini code optimization.
    - `isa_prompt_conv.py` - Accelerator ISA section of the prompt, used for GEMM and convolution.
    - `isa_prompt_admm.py` - Accelerator ISA section of the prompt, used for TinyMPC.
    - `gemmini_rules.py` - Rules section of the prompt (helps constrain output and encourage functional correctness).
    - `plan_prompt.py` - Planning phase prompt (note that the implementation prompt is entirely contained within `autocomp/search/llm_agent.py` above).
    - `tiling_example.py` - Tiling optimization example.
    - `if_example.py` - Conditional optimization example (from convolution).
    - `if_example_matmul.py` - Conditional optimization example (from GEMM).
  - `trn/` - Prompts and examples used for NKI (Trainium) optimization.
    - `nki_isa_generator.py` - Generates the ISA string for the NKI ISA. If optimizing a new workload, configure the set of instructions to use here.
- `sols/` - Contains baseline code for the benchmarks in the paper.
  - `exo/` - Exo unoptimized and optimized baseline code for the GEMM benchmarks in the paper. `sol{id}_exo_baseline.c` is the unoptimized code and is used by `autocomp/search/search.py` as the starting code for optimization.
  - `gemm/` - Additional GEMM benchmarks used for schedule reuse. No hand-optimized code available.
  - `exo-conv/` - Exo unoptimized and optimized baseline code for the convolution benchmarks in the paper.
  - `admm-multifunction/` - TinyMPC unoptimized and optimized baseline code. Only problem IDs 1 and 2 are used in the paper. Run with FP32 4x4 Gemmini.
  - `trn-tutorial/` - NKI (Trainium) unoptimized and optimized baseline code for the tutorial benchmarks in the paper.
  - `trn-advanced/` - NKI (Trainium) unoptimized and optimized baseline code for the advanced benchmarks in the paper.
- `tests/` - Contains test cases corresponding to `sols/` above.
- `examples/` - Contains examples of code optimized by Autocomp. Note that the generated code is specific to the input/output shapes used and may not be correct for other shapes.
To cite Autocomp:

```bibtex
@misc{hong2025autocomp,
      title={Autocomp: LLM-Driven Code Optimization for Tensor Accelerators},
      author={Charles Hong and Sahil Bhatia and Alvin Cheung and Yakun Sophia Shao},
      year={2025},
      eprint={2505.18574},
      archivePrefix={arXiv},
      primaryClass={cs.PL},
      url={https://arxiv.org/abs/2505.18574},
}
```

