# Akasha LaLaMReT

Akasha LaLaMReT is an open benchmarking project designed to evaluate the mathematical, physical, programming, and language & text reasoning capabilities of large language models (LLMs). The goal is to assess the depth of logical reasoning, problem-solving, and analytical skill that LLMs demonstrate across these diverse domains.
## Table of Contents

- Features
- Structure of the Test
- Test Dataset
- Evaluation Methodology
- Contributing
- License
- Acknowledgments
- Stay Updated
## Features

- Mathematical Reasoning: Tests covering algebra, calculus, discrete math, and more.
- Physics Problem-Solving: Challenges on conceptual and numerical physics problems.
- Programming & Algorithmic Thinking: Code debugging, algorithm implementation, and complexity analysis.
- Language & Text Reasoning: Evaluates tasks such as text manipulation, natural language understanding, and logical reasoning in language-based challenges.
## Structure of the Test

### Mathematical Reasoning

- Algebra & Linear Equations
- Geometry & Trigonometry
- Calculus: Derivatives, Integrals, Limits
- Probability & Statistics
- Discrete Mathematics: Graph Theory, Logic
- Number Theory & Combinatorics
### Physics Problem-Solving

- Classical Mechanics: Newtonian Physics
- Electromagnetism
- Thermodynamics & Statistical Mechanics
- Quantum Mechanics
- Relativity: Special & General
- Fluid Dynamics
### Programming & Algorithmic Thinking

- Code Debugging & Interpretation
- Algorithm Design: Sorting, Searching, Recursion
- Data Structures: Linked Lists, Trees, Graphs
- Complexity Analysis: Big-O, Optimization
- Code Completion Tasks
- Parallel & Distributed Computing
### Language & Text Reasoning

- Text Analysis & Manipulation: Character counting, substring search, regular expression matching
- Natural Language Understanding: Interpreting semantics, summarizing content, paraphrasing
- Logical Reasoning in Text: Solving language-based puzzles, analyzing sentence structures, deducing contextual meaning
- Pattern Recognition: Identifying trends, classifying content, extracting key information
## Test Dataset

### Math

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
*(example problems to be added)* | | | |
### Physics

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
*(example problems to be added)* | | | |
### Programming

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
Fibonacci Modulo | Algorithm Design (Recursion & Modular Arithmetic) | Compute the n-th Fibonacci number modulo p | fₙ mod p |
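For orientation, here is a minimal Python sketch of one way this task could be solved; the iterative approach, the helper name `fib_mod`, and the convention F(0) = 0 are illustrative choices, not requirements of the benchmark.

```python
def fib_mod(n: int, p: int) -> int:
    """Return the n-th Fibonacci number modulo p (with F(0) = 0, F(1) = 1).

    Reducing modulo p at every step keeps intermediate values bounded,
    so the loop runs in O(n) time with O(1) extra space.
    """
    if p <= 0:
        raise ValueError("modulus p must be positive")
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b % p, (a + b) % p
    return a % p


if __name__ == "__main__":
    print(fib_mod(10, 1000))  # F(10) = 55, so this prints 55
    print(fib_mod(50, 13))    # F(50) mod 13
```

For very large n, a fast-doubling or matrix-exponentiation variant would bring this down to O(log n), which is the kind of improvement the Efficiency & Optimization criterion below could reward.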
### Language & Text

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
Count Character | Text Analysis | How many times does the letter "r" appear in "strawbarry"? | 3 |
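As a sanity check, the expected output in this row can be verified with a few lines of Python; only the string and the expected count come from the table, and the helper name is illustrative.

```python
def count_char(text: str, char: str) -> int:
    """Count case-sensitive occurrences of a single character in a string."""
    return text.count(char)


if __name__ == "__main__":
    # The table's example: the letter "r" appears 3 times in "strawbarry".
    assert count_char("strawbarry", "r") == 3
    print(count_char("strawbarry", "r"))  # prints 3
```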
## Evaluation Methodology

Each test case is scored based on the following criteria:
- Accuracy: Is the final answer correct?
- Understanding: Does the solution demonstrate a sound grasp of the underlying concepts (mathematical, physical, programming, or language-based)?
- Efficiency & Optimization: Is the approach efficient and, particularly for algorithmic challenges, reasonably close to optimal?
- Explainability: Can the model provide clear and logical reasoning for its answer?
Scoring System: (To be developed further)
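Purely as an illustration of how the four criteria above could be combined while the scoring system is still open, one possible shape is a per-criterion score on a 0–1 scale aggregated by a weighted sum. The dataclass, the scales, and the equal weights below are hypothetical placeholders, not the project's defined metric.

```python
from dataclasses import dataclass


@dataclass
class TestCaseScore:
    """Per-criterion scores for one test case, each on a 0.0-1.0 scale (hypothetical)."""
    accuracy: float        # is the final answer correct?
    understanding: float   # grasp of the underlying concepts
    efficiency: float      # efficiency / optimization of the approach
    explainability: float  # clarity of the reasoning given


# Placeholder weights; a real scoring system might weight accuracy
# more heavily or treat it as a hard gate.
WEIGHTS = {
    "accuracy": 0.25,
    "understanding": 0.25,
    "efficiency": 0.25,
    "explainability": 0.25,
}


def aggregate(score: TestCaseScore) -> float:
    """Combine the four criteria into a single score between 0.0 and 1.0."""
    return (WEIGHTS["accuracy"] * score.accuracy
            + WEIGHTS["understanding"] * score.understanding
            + WEIGHTS["efficiency"] * score.efficiency
            + WEIGHTS["explainability"] * score.explainability)


if __name__ == "__main__":
    # A correct, well-explained answer with a suboptimal approach.
    print(aggregate(TestCaseScore(1.0, 0.9, 0.5, 1.0)))  # ~0.85
```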
## Contributing

We welcome contributions! If you have new test cases or improvements, please follow these steps:

1. Fork the repository.
2. Add your test cases to the dataset (one possible entry format is sketched below).
3. Submit a pull request with a detailed description of your changes.
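The on-disk format of the dataset is not specified in this README; purely as an illustration, a contributed test case could carry the same fields as the tables above. The field names below are hypothetical, and the example values are copied from the Language & Text table.

```python
# Hypothetical test-case entry mirroring the columns of the dataset tables above;
# the repository's actual dataset schema may differ.
new_test_case = {
    "name": "Count Character",
    "category": "Language & Text",
    "problem_type": "Text Analysis",
    "question": 'How many times does the letter "r" appear in "strawbarry"?',
    "expected_output": "3",
}
```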
## License

This project is open-source and released under the MIT License.
## Acknowledgments

Special thanks to all contributors who have helped expand the dataset and refine the evaluation metrics.
## Stay Updated

Follow the project for the latest benchmarks and findings on LLM reasoning abilities.