
Commit b7b4b1d

Authored by Luodian, kcz358, pbcong, claude, and cursoragent
[Feat] LMMS-Eval 0.4 (#721)
* Update task utils and logger
* [Main Update] Doc to messages feature support and Split simple and chat mode (#692)
* Update deps
* Restructured
* Delete models
* Remove deprecated models
* Set up auto doc to messages and chat models
* Lint
* Allow force simple mode
* Add auto doc to messages for audio and video; fix lint; init server structure; restructure to server folder; clean base and providers; add clean method for models; fix loggers save result; fix dummy server error; suppress llava warnings; sample evaluator on llava in the wild; update mmmu doc to messages; update version
* Add judge server implementation with various providers and evaluation protocols; add AsyncAzureOpenAIProvider implementation and update provider factory; refactor sample saving in EvaluationTracker to use cleaned data and improve logging; add llm_as_judge_eval metric to multiple tasks and integrate llm_judge API for evaluation
* Refactor MathVerseEvaluator to utilize llm_judge server for response generation and evaluation, enhancing API integration and error handling. Update MMBench_Evaluator to streamline API key handling based on environment variables.
* Refactor EvaluationTracker to directly modify sample data for improved clarity and efficiency. Update MathVerseEvaluator to streamline answer scoring by eliminating unnecessary extraction steps and enhance evaluation prompts. Remove deprecated metrics from configuration files.
* Refactor MathVistaEvaluator to integrate llm_judge server for enhanced response generation and evaluation. Streamline API configuration and error handling by removing direct API key management and utilizing a custom server configuration for requests.
* Update MathVista task configurations to replace 'gpt_eval_score' with 'llm_as_judge_eval' across multiple YAML files and adjust the result processing function accordingly. This change aligns with the integration of the llm_judge server for enhanced evaluation metrics.
* Add new OlympiadBench task configurations for mathematics and physics evaluation. Introduce 'olympiadbench_OE_MM_maths_en_COMP.yaml' and 'olympiadbench_OE_MM_physics_en_COMP.yaml' files, while removing outdated English and Chinese test configurations. Update evaluation metrics to utilize 'llm_as_judge_eval' for consistency across tasks.
* Add reasoning model utility functions and integrate into Qwen2_5_VL model. Introduced `parse_reasoning_model_answer` to clean model responses and updated answer processing in the Qwen2_5_VL class to utilize this new function, enhancing response clarity and logging.
* Update OlympiadBench task configuration to change 'doc_to_target' from 'answer' to 'final_answer' for improved clarity in response generation.
* Refactor olympiadbench_process_results to enhance response clarity. Updated the return format to include question, response, and ground truth for improved evaluation context. Simplified judge result determination logic.
* Update olympiadbench_OE_MM_physics_en_COMP.yaml to change 'doc_to_target' from 'answer' to 'final_answer' for improved clarity in response generation.
* Update olympiadbench_OE_MM_physics_en_COMP.yaml to change 'doc_to_target' from 'answer' to 'final_answer' for consistency with recent configuration updates and improved clarity in response generation.
* Add launcher and sglang launcher for local llm as judge
* Lint
* add new tasks MMVU and Visual Web Bench (#727): add mmvu task; fix linting videomathqa; fix mmvu to use llm judge; add visualwebbench task
* Add Qwen2_5 chat to support doc_to_messages
* Refactor documentation and codebase to standardize naming conventions from 'lm_eval' to 'lmms_eval'. Update task configurations and evaluation metrics accordingly for consistency across the project.
* Update model guide and task configurations to replace 'max_gen_toks' with 'max_new_tokens' for consistency across YAML files and documentation. This change aligns with recent updates in the generation parameters for improved clarity in model behavior.
* Refactor evaluation logic to ensure distributed execution only occurs when multiple processes are active. Update metrics handling in OpenAI Math task to correctly track exact matches and coverage based on results.
* Fix text auto messages
* Update docs
* Add vllm chat models
* Add openai compatible
* Add sglang runtime
* Fix errors
* Fix sglang error
* Add Claude Code Action workflow configuration
* Refactor VLLM model initialization and update generation parameters across tasks. Change model version to a more generic name and adjust sampling settings to enable sampling and increase max new tokens for better performance.
* Update max_new_tokens in Huggingface model and enhance metrics handling in OpenAI math task. Remove breakpoint in VLLM model initialization.
* Allow logging task input
* Add development guidelines document outlining core rules, coding best practices, and error resolution strategies for the codebase.
* fix repr and group
* Add call tools for async openai with mcp client
* Add examples
* Support multi-node eval
* Fix grouping func
* Feature/inference throughput logging (#747): add inference throughput logging to chat models. Implements TPOT (Time Per Output Token) and inference speed metrics (see the worked sketch after this commit message):
  - TPOT = (e2e_latency - TTFT) / (num_output_tokens - 1)
  - Inference Speed = 1 / TPOT tokens/second
  Modified chat models:
  - openai_compatible.py: API call timing with token counting
  - vllm.py: batch-level timing with per-request metrics
  - sglang.py: timing with meta_info extraction
  - huggingface.py: batch processing with token calculation
  - llava_hf.py: single-request timing with error handling
  - qwen2_5_vl.py: batch timing implementation
  Features: precise timing around model.generate() calls, TTFT estimation when not available from the model, comprehensive logging with formatted metrics, batch processing support, and error handling for robustness.
* Add throughput metrics documentation and update logging in chat models
* Add gen metric utils
* Revise qwen logging
* Revise llava_hf logging
* Revise hf model logging
* Revise sglang logging
* Support vllm logging
* Add open logging
* Refactor evaluation process to utilize llm_judge API: updated internal evaluation scripts for D170, DC100, and DC200 tasks to replace GPT evaluation with llm_judge evaluation; introduced custom prompts for binary evaluation based on model responses and ground truth; modified YAML configuration files to reflect changes in the evaluation metrics and aggregation methods; enhanced error handling and logging for evaluation failures. This change aims to improve the accuracy and reliability of model evaluations across different tasks.
* Dev/olympiad bench (#762):
  - Refactor vLLM model files and add OlympiadBench evaluation utilities: cleaned up imports and removed unused variables in `vllm.py`; updated threading configuration in `simple/vllm.py` to use environment variables; introduced new utility functions for processing OlympiadBench documents and results in `utils.py`, `zh_utils.py`, and `en_utils.py`; added evaluation logic for OlympiadBench tasks in `olympiadbench_evals.py`; created multiple YAML configuration files for various OlympiadBench tasks, including math and physics in both English and Chinese; implemented aggregation functions for results processing in the OlympiadBench context.
  - Implement OlympiadBench evaluation utilities and refactor math verification: introduced new utility functions for processing OlympiadBench documents and results in `en_utils.py` and `zh_utils.py`; added a custom timeout decorator in `math_verify_utils.py` to replace the previous signal-based timeout; removed outdated files from the `olympiadbench_official` directory to streamline the codebase; enhanced evaluation logic in `olympiadbench_evals.py` and added aggregation functions for results processing.
  - Update mathvision utility imports and modify YAML configurations for OlympiadBench: added error handling for importing evaluation utilities in `utils.py` to improve robustness; changed `doc_to_target` from "answer" to "final_answer" in both `olympiadbench_all_boxed.yaml` and `olympiadbench_boxed.yaml` to ensure consistency in output naming.
* Update YAML configurations for AIME tasks: changed the `do_sample` parameter to `true` in `aime24_figures_agg64.yaml` to enable sampling during generation; added new configuration file `aime25_nofigures_agg64.yaml` for a new task, including detailed metrics and filtering options for evaluation. These updates enhance the flexibility and functionality of the AIME evaluation tasks.
* Refactor internal evaluation scripts for consistency and readability: removed unnecessary blank lines in `d170_cn_utils.py`, `d170_en_utils.py`, `dc100_en_utils.py`, and `dc200_cn_utils.py`; streamlined the `evaluate_binary` API call formatting for better readability. These changes enhance the maintainability of the evaluation scripts across different tasks.
* Update documentation for `lmms_eval` to enhance clarity and usability: revised the command-line interface section in `commands.md` and updated links to the main README; enhanced `current_tasks.md` with clearer instructions for listing supported tasks and their question counts; added comprehensive model examples in `model_guide.md` for image, video, and audio models, including implementation details and key notes; expanded `README.md` to provide an overview of the framework's capabilities and updated the table of contents; included new audio model examples in `run_examples.md`; introduced an audio task example in `task_guide.md` to guide users in configuring audio tasks effectively.
* Introduce LMMS-Eval v0.4: major update with unified message interface, multi-node distributed evaluation, and enhanced judge API
* Add agg8 task and fix data path
* Fix warning
* Remove bug report documentation from the codebase, consolidating information on identified bugs and fixes for improved clarity and maintainability.
* Add announcement for the release of `lmms-eval` v0.4.0 in README.md
* Enhance documentation for LMMS-Eval v0.4 with detailed installation instructions, system requirements, and troubleshooting tips.
* Remove outdated system requirements and installation instructions from LMMS-Eval v0.4 documentation to streamline content and improve clarity.
* Fix datetime format string in olympiadbench submission file naming
* Fix video frame handling in protocol with range() for consistent iteration
* Convert vLLM environment variables to integers for proper type handling
* Fix force_simple model selection to check model availability
* Fix format issue and add avg@8 for aime
* Allow vllm for tp
* fix parsing logic
* Fix OpenAI payload max tokens parameter to use max_new_tokens
* Update OpenAI payload handling to include support for model version o4 and remove max_tokens parameter
* Refactor model version handling across evaluation tasks by removing hardcoded GPT model names and replacing them with environment variable support for dynamic model versioning. Update server configuration to utilize the unified judge API for improved response handling.
* batch update misused calls for eval model
* Update evaluation tasks to use environment variables for GPT model versioning, replacing hardcoded values with dynamic configuration. Remove unused YAML loading logic in multilingual LLAVA benchmark utilities.
* Enhance VLLM configuration to support distributed execution for multiple processes. Update multilingual LLAVA benchmark YAML files to include dataset names and remove deprecated config entries.
* Remove reviewer guideline and co-authored-by mention from contribution instructions in claude.md
* Add development guidelines document outlining core rules, coding best practices, and error resolution strategies for the codebase.
* Refactor score parsing logic in multiple utility files to strip whitespace from the score string before processing.
* Update .gitignore to include new workspace directory and modify utility files to enhance response handling by replacing Request object usage with direct server method calls for text generation across multiple evaluation tasks.
* Refactor evaluation tasks to utilize the unified judge API by replacing direct server method calls with Request object usage. Update server configuration in multiple utility files to enhance response handling and streamline evaluation processes.
* Refactor generation parameter handling in Llava_OneVision model to streamline configuration. Remove redundant default settings and ensure proper handling of sampling parameters based on the do_sample flag. Update multiple YAML task files to increase max_new_tokens and comment out temperature settings for clarity. Introduce new YAML configuration for MMMU validation reasoning task.
* Enhance score processing logic in utility functions to improve error handling and validation. Implement robust regex patterns for score extraction, ensuring all components are accounted for and scores are clamped within valid ranges. Add logging for better traceability of errors and fallback mechanisms for invalid inputs in the mia_bench evaluation process.
* Fix launch error when num proc = 1
* Refactor VLLM model parameter handling to simplify distributed execution logic. Remove redundant checks for tensor parallelism and streamline generation parameter settings by eliminating unused temperature and top_p configurations.
* Refactor VLLM message handling to prioritize image URLs before text content. Remove unused distributed executor backend parameter for cleaner execution logic.
* feat(vllm): Set default max_new_tokens to 4096, temperature to 0, and top_p to 0.95
* docs: Update lmms-eval-0.4 documentation with images and installation instructions
* docs: Update lmms-eval-0.4 documentation to include backward compatibility check
* refactor: Simplify server config instantiation in utils files
* docs: Update supported tasks count in README
* Update docs
* Fix mathverse bugs
* docs: Update images in lmms-eval-0.4.md
* docs: Remove API Benefits and Upcoming Benchmarks sections
* docs: Update image URL for Unified Message Interface
* docs: Fix typos in the LMMS-Eval v0.4 performance comparison table: corrected "27.8/16.40" to "27.8/26.40" and "16.78/13.82" to "16.78/15.82".
* fix(docs): Correct typo in LMMS-Eval v0.4 performance comparison table
* refactor(docs): Refactor LMMS-Eval v0.4 performance table for clarity
* Update docs

Co-authored-by: kcz358 <[email protected]>
Co-authored-by: Cong <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: Cursor Agent <[email protected]>
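The throughput-logging entry above (#747) defines TPOT and inference speed with two simple formulas. A minimal worked sketch of that arithmetic follows; the function name and return format are illustrative only and do not correspond to the actual gen metric utilities added in this commit.

```python
def throughput_metrics(e2e_latency: float, ttft: float, num_output_tokens: int) -> dict:
    """Compute TPOT and inference speed following the formulas in the commit message."""
    if num_output_tokens <= 1:
        # With at most one output token there is no inter-token interval to measure.
        return {"tpot": float("nan"), "tokens_per_second": float("nan")}
    tpot = (e2e_latency - ttft) / (num_output_tokens - 1)  # seconds per output token
    return {"tpot": tpot, "tokens_per_second": 1.0 / tpot}


# Example: a 4.2 s end-to-end call with a 0.6 s time-to-first-token and 181 output
# tokens gives TPOT = (4.2 - 0.6) / 180 = 0.02 s, i.e. 50 tokens/second.
print(throughput_metrics(4.2, 0.6, 181))
```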
1 parent 7fd8553 commit b7b4b1d

File tree: 278 files changed (+9033 lines, -5025 lines)


.github/workflows/claude.yml

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+- name: Claude Code Action Official
+  uses: anthropics/claude-code-action@beta

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -47,4 +47,6 @@ scripts/
 .venv
 outputs/
 span.log
-uv.lock
+uv.lock
+workspace/*
+.claude/*

CLAUDE.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
# Development Guidelines

This document contains critical information about working with this codebase. Follow these guidelines precisely.

## Core Development Rules

1. Package Management
   - ONLY use uv, NEVER pip
   - Installation: `uv add package`
   - Running tools: `uv run tool`
   - Upgrading: `uv add --dev package --upgrade-package package`
   - FORBIDDEN: `uv pip install`, `@latest` syntax

2. Code Quality
   - Type hints required for all code
   - Public APIs must have docstrings
   - Functions must be focused and small
   - Follow existing patterns exactly
   - Line length: 88 chars maximum

3. Testing Requirements
   - Framework: `uv run pytest`
   - Async testing: use anyio, not asyncio
   - Coverage: test edge cases and errors
   - New features require tests
   - Bug fixes require regression tests

4. Code Style
   - PEP 8 naming (snake_case for functions/variables)
   - Class names in PascalCase
   - Constants in UPPER_SNAKE_CASE
   - Document with docstrings
   - Use f-strings for formatting

- For commits fixing bugs or adding features based on user reports add:
  ```bash
  git commit --trailer "Reported-by:<name>"
  ```
  Where `<name>` is the name of the user.

- For commits related to a Github issue, add
  ```bash
  git commit --trailer "Github-Issue:#<number>"
  ```

- NEVER ever mention a `co-authored-by` or similar aspects. In particular, never mention the tool used to create the commit message or PR.
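As a concrete illustration of rules 2 and 3 above, a fully typed helper with a docstring plus an anyio-based regression test might look like the sketch below. The helper and test are hypothetical examples, not code from this repository; the test assumes anyio (and its pytest plugin) is installed, as the testing rules require.

```python
import anyio
import pytest


def clamp_score(raw: float, low: float = 0.0, high: float = 10.0) -> float:
    """Clamp a judge score into the inclusive range [low, high]."""
    return max(low, min(high, raw))


@pytest.mark.anyio
async def test_clamp_score_handles_out_of_range_values() -> None:
    # Edge cases: below the range, above the range, and exactly on a bound.
    await anyio.sleep(0)  # trivial await so the async path is exercised
    assert clamp_score(-3.0) == 0.0
    assert clamp_score(42.0) == 10.0
    assert clamp_score(10.0) == 10.0
```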
## Development Philosophy

- **Simplicity**: Write simple, straightforward code
- **Readability**: Make code easy to understand
- **Performance**: Consider performance without sacrificing readability
- **Maintainability**: Write code that's easy to update
- **Testability**: Ensure code is testable
- **Reusability**: Create reusable components and functions
- **Less Code = Less Debt**: Minimize code footprint

## Coding Best Practices

- **Early Returns**: Use to avoid nested conditions
- **Descriptive Names**: Use clear variable/function names (prefix handlers with "handle")
- **Constants Over Functions**: Use constants where possible
- **DRY Code**: Don't repeat yourself
- **Functional Style**: Prefer functional, immutable approaches when not verbose
- **Minimal Changes**: Only modify code related to the task at hand
- **Function Ordering**: Define composing functions before their components
- **TODO Comments**: Mark issues in existing code with "TODO:" prefix
- **Simplicity**: Prioritize simplicity and readability over clever solutions
- **Build Iteratively**: Start with minimal functionality and verify it works before adding complexity
- **Run Tests**: Test your code frequently with realistic inputs and validate outputs
- **Build Test Environments**: Create testing environments for components that are difficult to validate directly
- **Functional Code**: Use functional and stateless approaches where they improve clarity
- **Clean Logic**: Keep core logic clean and push implementation details to the edges
- **File Organisation**: Balance file organization with simplicity - use an appropriate number of files for the project scale
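For example, the early-return, descriptive-name, and constants guidance above might look like this in practice (a hypothetical handler, not project code):

```python
MAX_SCORE = 10  # named constant instead of a magic number scattered through the code


def handle_score_line(line: str) -> int:
    """Parse a judge score line, using early returns instead of nested conditionals."""
    stripped = line.strip()
    if not stripped:
        return 0
    if not stripped.isdigit():
        return 0
    return min(int(stripped), MAX_SCORE)


print(handle_score_line(" 7 "), handle_score_line("oops"), handle_score_line("99"))  # 7 0 10
```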
## Core Components

- `__main__.py`: Main entry point
- `api`: API for the project
- `tasks`: Tasks for the project
- `models`: Models for the project
- `loggers`: Loggers for the project
- `utils`: Utility functions for the project
- `tests`: Tests for the project
- `configs`: Configs for the project
- `data`: Data for the project

Launch Command:

```bash
python -m lmms_eval --model qwen2_5_vl --model_args pretrained=Qwen/Qwen2.5-VL-3B-Instruct,max_pixels=12845056,attn_implementation=sdpa --tasks mmmu,mme,mmlu_flan_n_shot_generative --batch_size 128 --limit 8 --device cuda:0
```
## Pull Requests

- Create a detailed message of what changed. Focus on the high level description of the problem it tries to solve, and how it is solved. Don't go into the specifics of the code unless it adds clarity.

- NEVER ever mention a `co-authored-by` or similar aspects. In particular, never mention the tool used to create the commit message or PR.
## Python Tools

## Code Formatting

1. Ruff
   - Format: `uv run ruff format .`
   - Check: `uv run ruff check .`
   - Fix: `uv run ruff check . --fix`
   - Critical issues:
     - Line length (88 chars)
     - Import sorting (I001)
     - Unused imports
   - Line wrapping:
     - Strings: use parentheses
     - Function calls: multi-line with proper indent
     - Imports: split into multiple lines

2. Type Checking
   - Tool: `uv run pyright`
   - Requirements:
     - Explicit None checks for Optional
     - Type narrowing for strings
   - Version warnings can be ignored if checks pass

3. Pre-commit
   - Config: `.pre-commit-config.yaml`
   - Runs: on git commit
   - Tools: Prettier (YAML/JSON), Ruff (Python)
   - Ruff updates:
     - Check PyPI versions
     - Update config rev
     - Commit config first
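To make the pyright requirements concrete, the sketch below shows an explicit None check for an Optional value and narrowing of a union before string methods are used. The function is a hypothetical example, not project code.

```python
from typing import Optional, Union


def normalize_answer(raw: Optional[Union[str, int]]) -> str:
    """Return a lower-cased answer string, handling None and integer input."""
    if raw is None:
        # Explicit None check: makes the fallback visible and satisfies pyright.
        return ""
    if isinstance(raw, int):
        # Type narrowing: after this branch, the checker knows `raw` is a str.
        return str(raw)
    return raw.strip().lower()


assert normalize_answer(None) == ""
assert normalize_answer(42) == "42"
assert normalize_answer("  YES ") == "yes"
```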
## Error Resolution

1. CI Failures
   - Fix order:
     1. Formatting
     2. Type errors
     3. Linting
   - Type errors:
     - Get full line context
     - Check Optional types
     - Add type narrowing
     - Verify function signatures

2. Common Issues
   - Line length:
     - Break strings with parentheses
     - Multi-line function calls
     - Split imports
   - Types:
     - Add None checks
     - Narrow string types
     - Match existing patterns

3. Best Practices
   - Check git status before commits
   - Run formatters before type checks
   - Keep changes minimal
   - Follow existing patterns
   - Document public APIs
   - Test thoroughly
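The three line-length fixes listed under Common Issues can be hard to picture; a small sketch with hypothetical names:

```python
# Imports: split into multiple lines rather than one long statement.
from collections import (
    OrderedDict,
    defaultdict,
)

# Strings: break with parentheses; adjacent string literals are concatenated.
ERROR_MESSAGE = (
    "The judge server returned an unexpected payload; "
    "falling back to a score of 0 for this sample."
)

# Function calls: spread arguments over multiple lines with proper indentation.
counts = defaultdict(
    int,
    {"mmmu": 2, "mme": 1},
)
print(ERROR_MESSAGE, OrderedDict(sorted(counts.items())))
```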

README.md

Lines changed: 4 additions & 2 deletions
@@ -14,13 +14,15 @@

 🏠 [LMMs-Lab Homepage](https://www.lmms-lab.com/) | 🤗 [Huggingface Datasets](https://huggingface.co/lmms-lab) | <a href="https://emoji.gg/emoji/1684-discord-thread"><img src="https://cdn3.emoji.gg/emojis/1684-discord-thread.png" width="14px" height="14px" alt="Discord_Thread"></a> [discord/lmms-eval](https://discord.gg/zdkwKUqrPy)

-📖 [Supported Tasks (90+)](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md) | 🌟 [Supported Models (30+)](https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/main/lmms_eval/models) | 📚 [Documentation](docs/README.md)
+📖 [Supported Tasks (100+)](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md) | 🌟 [Supported Models (30+)](https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/main/lmms_eval/models) | 📚 [Documentation](docs/README.md)

 ---

 ## Announcement

-We warmly welcome contributions from the open-source community!
+- [2025-07] 🚀🚀 We have released `lmms-eval-0.4`. Please refer to the [release notes](https://github.com/EvolvingLMMs-Lab/lmms-eval/releases/tag/v0.4.0) for more details. This is a major update with new features and improvements; users who wish to stay on `lmms-eval-0.3` should use the `stable/v0d3` branch.
+
+- [2025-04] 🚀🚀 Introducing Aero-1-Audio — a compact yet mighty audio model. We now officially support evaluation for Aero-1-Audio, including batched evaluation. Feel free to try it out!

 - [2025-07] 🎉🎉 We welcome the new task [PhyX](https://phyx-bench.github.io/), the first large-scale benchmark designed to assess models' capacity for physics-grounded reasoning in visual scenarios.
 - [2025-06] 🎉🎉 We welcome the new task [VideoMathQA](https://mbzuai-oryx.github.io/VideoMathQA), designed to evaluate mathematical reasoning in real-world educational videos.

bug_report.md

Lines changed: 0 additions & 131 deletions
This file was deleted.

docs/README.md

Lines changed: 23 additions & 5 deletions
@@ -1,12 +1,30 @@
 # LMMs Eval Documentation

-Welcome to the docs for `lmms-eval`!
+Welcome to the documentation for `lmms-eval` - a unified evaluation framework for Large Multimodal Models!
+
+This framework enables consistent and reproducible evaluation of multimodal models across various tasks and modalities including images, videos, and audio.
+
+## Overview
+
+`lmms-eval` provides:
+- Standardized evaluation protocols for multimodal models
+- Support for image, video, and audio tasks
+- Easy integration of new models and tasks
+- Reproducible benchmarking with shareable configurations

 Majority of this documentation is adapted from [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness/)

 ## Table of Contents

-* To learn about the command line flags, see the [commands](commands.md)
-* To learn how to add a new moddel, see the [Model Guide](model_guide.md).
-* For a crash course on adding new tasks to the library, see our [Task Guide](task_guide.md).
-* If you need to upload your datasets into correct HF format with viewer supported, please refer to [tools](https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/pufanyi/hf_dataset_docs/tools)
+* **[Commands Guide](commands.md)** - Learn about command line flags and options
+* **[Model Guide](model_guide.md)** - How to add and integrate new models
+* **[Task Guide](task_guide.md)** - Create custom evaluation tasks
+* **[Current Tasks](current_tasks.md)** - List of all supported evaluation tasks
+* **[Run Examples](run_examples.md)** - Example commands for running evaluations
+* **[Version 0.3 Features](lmms-eval-0.3.md)** - Audio evaluation and new features
+* **[Throughput Metrics](throughput_metrics.md)** - Understanding performance metrics
+
+## Additional Resources
+
+* For dataset formatting tools, see [lmms-eval tools](https://github.com/EvolvingLMMs-Lab/lmms-eval/tree/main/tools)
+* For the latest updates, visit our [GitHub repository](https://github.com/EvolvingLMMs-Lab/lmms-eval)
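As a closing illustration of the Commands and Run Examples guides referenced above, an evaluation run can be scripted so its configuration stays shareable. This is only a sketch built around the CLI invocation documented in CLAUDE.md; the model, tasks, and limit values are copied from that example rather than recommended settings.

```python
import subprocess

# Flag values copied from the launch command in CLAUDE.md; adjust for your own runs.
settings = {
    "--model": "qwen2_5_vl",
    "--model_args": "pretrained=Qwen/Qwen2.5-VL-3B-Instruct,max_pixels=12845056,attn_implementation=sdpa",
    "--tasks": "mmmu,mme",
    "--batch_size": "128",
    "--limit": "8",
    "--device": "cuda:0",
}

cmd = ["python", "-m", "lmms_eval"]
for flag, value in settings.items():
    cmd += [flag, value]

# check=True raises CalledProcessError if the CLI exits with a non-zero status.
subprocess.run(cmd, check=True)
```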
