⚠️ Early Development: This project is currently in its early development phase and not yet accepting external architecture submissions. Star or watch the repository to be notified when we open for contributions.
View Live Leaderboard & Results
AI-HEXAGON is an objective benchmarking framework designed to evaluate neural network architectures independently of natural language processing tasks. By isolating architectural capabilities from training techniques and datasets, it enables meaningful and efficient comparisons between different neural network designs.
Traditional neural network benchmarking often conflates architectural performance with training techniques and dataset biases. This makes it challenging to:
- Isolate true architectural capabilities
- Iterate quickly on design changes
- Compare models fairly
AI-HEXAGON addresses these challenges with:
- Pure Architecture Focus: Tests evaluate only the architecture, removing confounding factors such as tokenization and dataset-specific optimizations
- Rapid Iteration: Quick testing of architectural changes without large-scale training
- Flexible Testing: Support for both standard benchmarking and custom test suites
Key features:
- Pure Architecture Evaluation: Tests fundamental capabilities independently
- Controlled Environment: Fixed parameter budget and raw numerical inputs
- Clear Metrics: Six independently measured fundamental capabilities
- Transparent Implementation: Clean, framework-agnostic code
- Automated Testing: GitHub Actions for fair, manipulation-proof evaluation
- Live Results: Real-time benchmarking results at ai-hexagon.dev
Each architecture is evaluated on six fundamental capabilities:
| Metric | Description |
|---|---|
| Memory Capacity | Store and recall information from training data |
| State Management | Maintain and manipulate internal hidden states |
| Pattern Recognition | Recognize and extrapolate sequences |
| Position Processing | Handle positional information within sequences |
| Long-Range Dependency | Manage dependencies over long sequences |
| Length Generalization | Process sequences longer than training examples |
```
ai-hexagon/
├── ai_hexagon/
│   └── modules/        # Common neural network modules
└── results/            # Model implementations and results
    ├── suite.json      # Default test suite configuration
    └── transformer/
        ├── model.py    # Transformer implementation
        └── modules/    # Custom modules (if needed)
```
The default suite enforces a 4 MB parameter memory budget for fair comparisons, so the allowed parameter count depends on numerical precision:
| Precision | Parameter Limit |
|---|---|
| Complex64 | 0.5M params |
| Float32 | 1M params |
| Float16 | 2M params |
| Int8 | 4M params |
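A minimal sketch of how such a budget can be checked, assuming parameters live in an ordinary Flax/JAX pytree. The `param_bytes` helper is purely illustrative and not part of AI-HEXAGON, and the exact way "4 MB" is counted is up to the suite:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

def param_bytes(params):
    # Sum the byte size of every array in the parameter pytree.
    return sum(p.size * p.dtype.itemsize for p in jax.tree_util.tree_leaves(params))

# Example: a single float32 Dense layer checked against a 4 MB budget.
model = nn.Dense(features=256)
params = model.init(jax.random.PRNGKey(0), jnp.zeros((1, 512)))
assert param_bytes(params) <= 4 * 1024 * 1024  # budget definition assumed, enforced by the suite
```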
We will welcome contributions once the project opens for external input. To contribute:
- Fork: Create your own fork of the project
- Install: Run `poetry install` (optionally with `--with dev,cuda12`) to get the `ai-hex` command
- Implement: Add your model in `results/your_model_name/`
- Document: Include comprehensive docstrings and references
- Submit: Create a pull request following our guidelines
- Wait: CI will automatically evaluate your model and update the leaderboard
Use `ai-hex tests list` to see available tests, `ai-hex tests show test_name` to view a test's schema, and `ai-hex suite run ./path/to/model.py` to run your model against the suite.
We chose JAX and Flax for their:
- Functional Design: Clear architecture definitions with immutable state
- Custom Operations: Comprehensive support through `jax.numpy`
- Reproducibility: First-class random number handling (see the sketch below)
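As a small illustration of the reproducibility point (general JAX usage, not framework-specific code): JAX threads explicit PRNG keys through every random operation, so the same seed always produces the same values.

```python
import jax

key = jax.random.PRNGKey(0)          # explicit seed, no hidden global state
key, subkey = jax.random.split(key)  # derive independent keys instead of mutating a global RNG
noise = jax.random.normal(subkey, (4, 4))
# Re-running with PRNGKey(0) reproduces exactly the same `noise`.
```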
We mandate `einops` for complex tensor operations to enhance readability. Compare:
```python
# Traditional approach - hard to understand the transformation
x = x.reshape(batch, x.shape[1], x.shape[-2] * 2, x.shape[-1] // 2)
x = x.transpose(0, 2, 1, 3)

# Using einops - crystal clear intent
x = rearrange(x, 'b t (h d) c -> b (h t) (d c)')
```
For example, a model implementation with its docstring and reference looks like this:

```python
import flax.linen as nn
from einops import rearrange


class Transformer(nn.Module):
    """
    Transformer Decoder Stack architecture from 'Attention Is All You Need'.

    Reference: https://arxiv.org/abs/1706.03762
    """

    hidden_dim: int = 256
    num_layers: int = 4
    num_heads: int = 4

    @nn.compact
    def __call__(self, x):
        # Architecture implementation
        return x
```
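A minimal sketch of how such a module might be exercised, assuming the harness feeds a batched float array of shape (batch, sequence, hidden_dim); the exact input format is defined by each test, so treat the shapes below as placeholders:

```python
import jax
import jax.numpy as jnp

# Uses the Transformer skeleton defined above.
model = Transformer(hidden_dim=256, num_layers=4, num_heads=4)
x = jnp.zeros((1, 128, 256))                   # assumed (batch, sequence, hidden_dim) dummy input
params = model.init(jax.random.PRNGKey(0), x)  # explicit PRNG key for reproducible initialization
y = model.apply(params, x)
print(y.shape)                                 # (1, 128, 256) for the identity skeleton above
```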
Test suites use a JSON configuration format:
```json
{
  "name": "General 1M",
  "description": "General architecture performance evaluation",
  "metrics": [
    {
      "name": "Memory Capacity",
      "description": "Information storage and recall capability",
      "tests": [
        {
          "weight": 1.0,
          "test": {
            "name": "hash_map",
            "seed": 0,
            "key_length": 8,
            "value_length": 64,
            "num_pairs_range": [32, 65536],
            "vocab_size": 1024
          }
        }
      ]
    }
  ]
}
```
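To make the `hash_map` parameters above concrete, here is a purely illustrative sketch of a key-value recall task of that shape. It is not the framework's actual test implementation, and the helper name is hypothetical:

```python
import jax

def sample_recall_pairs(rng, num_pairs=32, key_length=8, value_length=64, vocab_size=1024):
    # Draw random key/value token sequences; the model must recall a value given its key.
    rng_keys, rng_values = jax.random.split(rng)
    keys = jax.random.randint(rng_keys, (num_pairs, key_length), 0, vocab_size)
    values = jax.random.randint(rng_values, (num_pairs, value_length), 0, vocab_size)
    return keys, values

keys, values = sample_recall_pairs(jax.random.PRNGKey(0))
print(keys.shape, values.shape)  # (32, 8) (32, 64)
```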
Results are automatically generated via GitHub Actions to ensure fairness. The leaderboard is updated in real time at ai-hexagon.dev.
If you find AI-HEXAGON helpful, consider buying me a coffee!
This project is licensed under the MIT License - see the LICENSE file for details.
If you use AI-HEXAGON in your research, please cite it as:
```bibtex
@software{ai_hexagon_2024,
  author    = {Jirka Klimes},
  title     = {AI-HEXAGON: Neural Architecture Benchmarking Framework},
  month     = feb,
  year      = 2024,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.14060642},
  url       = {https://doi.org/10.5281/zenodo.14060642}
}
```