# Akasha LaLaMReT

Akasha LaLaMReT is an open benchmarking project designed to evaluate the mathematical, physical, programming, and language & text reasoning capabilities of large language models (LLMs). The goal is to assess the depth of logical reasoning, problem-solving, and analytical skill that LLMs demonstrate across these diverse domains.
## Table of Contents

- Features
- Structure of the Test
- Test Dataset
- Evaluation Methodology
- Contributing
- License
- Acknowledgments
- Stay Updated
## Features

- Mathematical Reasoning: Tests covering algebra, calculus, discrete math, and more.
- Physics Problem-Solving: Challenges on conceptual and numerical physics problems.
- Programming & Algorithmic Thinking: Code debugging, algorithm implementation, and complexity analysis.
- Language & Text Reasoning: Evaluates tasks such as text manipulation, natural language understanding, and logical reasoning in language-based challenges.
## Structure of the Test

### Mathematical Reasoning

- Algebra & Linear Equations
- Geometry & Trigonometry
- Calculus: Derivatives, Integrals, Limits
- Probability & Statistics
- Discrete Mathematics: Graph Theory, Logic
- Number Theory & Combinatorics
### Physics Problem-Solving

- Classical Mechanics: Newtonian Physics
- Electromagnetism
- Thermodynamics & Statistical Mechanics
- Quantum Mechanics
- Relativity: Special & General
- Fluid Dynamics
### Programming & Algorithmic Thinking

- Code Debugging & Interpretation
- Algorithm Design: Sorting, Searching, Recursion
- Data Structures: Linked Lists, Trees, Graphs
- Complexity Analysis: Big-O, Optimization
- Code Completion Tasks
- Parallel & Distributed Computing
### Language & Text Reasoning

- Text Analysis & Manipulation: Character counting, substring search, regular expression matching
- Natural Language Understanding: Interpreting semantics, summarizing content, paraphrasing
- Logical Reasoning in Text: Solving language-based puzzles, analyzing sentence structures, deducing contextual meaning
- Pattern Recognition: Identifying trends, classifying content, extracting key information
## Test Dataset

### Math

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
*(example problems to be added)* | | | |
### Physics

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
*(example problems to be added)* | | | |
### Programming

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
Fibonacci Modulo | Algorithm Design (Recursion & Modular Arithmetic) | Compute the n-th Fibonacci number modulo p | fₙ mod p |
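For orientation, here is a minimal Python sketch of one way this task could be solved; the iterative approach, the helper name `fib_mod`, and the convention F(0) = 0 are illustrative choices, not requirements of the benchmark.

```python
def fib_mod(n: int, p: int) -> int:
    """Return the n-th Fibonacci number modulo p (with F(0) = 0, F(1) = 1).

    Reducing modulo p at every step keeps intermediate values bounded,
    so the loop runs in O(n) time with O(1) extra space.
    """
    if p <= 0:
        raise ValueError("modulus p must be positive")
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b % p, (a + b) % p
    return a % p


if __name__ == "__main__":
    print(fib_mod(10, 1000))  # F(10) = 55, so this prints 55
    print(fib_mod(50, 13))    # F(50) mod 13
```

For very large n, a fast-doubling or matrix-exponentiation variant would bring this down to O(log n), which is the kind of improvement the Efficiency & Optimization criterion below could reward.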
### Language & Text

Name | Problem Type | Example Question | Expected Output |
---|---|---|---|
Count Character | Text Analysis | How many times does the letter "r" appear in "strawbarry"? | 3 |
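As a sanity check, the expected output in this row can be verified with a few lines of Python; only the string and the expected count come from the table, and the helper name is illustrative.

```python
def count_char(text: str, char: str) -> int:
    """Count case-sensitive occurrences of a single character in a string."""
    return text.count(char)


if __name__ == "__main__":
    # The table's example: the letter "r" appears 3 times in "strawbarry".
    assert count_char("strawbarry", "r") == 3
    print(count_char("strawbarry", "r"))  # prints 3
```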
## Evaluation Methodology

Each test case is scored based on the following criteria:
- Accuracy: Is the final answer correct?
- Understanding: Does the solution demonstrate a sound grasp of the underlying concepts (mathematical, physical, programming, or language-based)?
- Efficiency & Optimization: Is the approach efficient and, particularly for algorithmic challenges, reasonably close to optimal?
- Explainability: Can the model provide clear and logical reasoning for its answer?
Scoring System: (To be developed further)
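Purely as an illustration of how the four criteria above could be combined while the scoring system is still open, one possible shape is a per-criterion score on a 0–1 scale aggregated by a weighted sum. The dataclass, the scales, and the equal weights below are hypothetical placeholders, not the project's defined metric.

```python
from dataclasses import dataclass


@dataclass
class TestCaseScore:
    """Per-criterion scores for one test case, each on a 0.0-1.0 scale (hypothetical)."""
    accuracy: float        # is the final answer correct?
    understanding: float   # grasp of the underlying concepts
    efficiency: float      # efficiency / optimization of the approach
    explainability: float  # clarity of the reasoning given


# Placeholder weights; a real scoring system might weight accuracy
# more heavily or treat it as a hard gate.
WEIGHTS = {
    "accuracy": 0.25,
    "understanding": 0.25,
    "efficiency": 0.25,
    "explainability": 0.25,
}


def aggregate(score: TestCaseScore) -> float:
    """Combine the four criteria into a single score between 0.0 and 1.0."""
    return (WEIGHTS["accuracy"] * score.accuracy
            + WEIGHTS["understanding"] * score.understanding
            + WEIGHTS["efficiency"] * score.efficiency
            + WEIGHTS["explainability"] * score.explainability)


if __name__ == "__main__":
    # A correct, well-explained answer with a suboptimal approach.
    print(aggregate(TestCaseScore(1.0, 0.9, 0.5, 1.0)))  # ~0.85
```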
## Contributing

We welcome contributions! If you have new test cases or improvements, please follow these steps:

1. Fork the repository.
2. Add your test cases to the dataset (one possible entry format is sketched below).
3. Submit a pull request with a detailed description of your changes.
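The on-disk format of the dataset is not specified in this README; purely as an illustration, a contributed test case could carry the same fields as the tables above. The field names below are hypothetical, and the example values are copied from the Language & Text table.

```python
# Hypothetical test-case entry mirroring the columns of the dataset tables above;
# the repository's actual dataset schema may differ.
new_test_case = {
    "name": "Count Character",
    "category": "Language & Text",
    "problem_type": "Text Analysis",
    "question": 'How many times does the letter "r" appear in "strawbarry"?',
    "expected_output": "3",
}
```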
## License

This project is open-source and released under the MIT License.
## Acknowledgments

Special thanks to all contributors who have helped expand the dataset and refine the evaluation metrics.
## Stay Updated

Follow the project for the latest benchmarks and findings on LLM reasoning abilities.