OpenSumeru/Akasha-LaLaMReT

Akasha LaLaMReT - Large Language Model Reasoning Test

License: MIT

Akasha LaLaMReT is an open benchmarking project that evaluates how well large language models (LLMs) reason about mathematics, physics, programming, and language and text. It aims to assess the depth of logical reasoning, problem-solving, and analytical skill that LLMs demonstrate across these domains.


Table of Contents

  1. Features
  2. Structure of the Test
  3. Test Dataset
  4. Evaluation Methodology
  5. Contributing
  6. License
  7. Acknowledgments
  8. Stay Updated

🚀 Features

  • Mathematical Reasoning: Tests covering algebra, calculus, discrete mathematics, probability, and number theory.
  • Physics Problem-Solving: Conceptual and numerical physics problems.
  • Programming & Algorithmic Thinking: Code debugging, algorithm implementation, and complexity analysis.
  • Language & Text Reasoning: Text manipulation, natural language understanding, and logical reasoning in language-based challenges.

🏗️ Structure of the Test

1. Mathematics

  • Algebra & Linear Equations
  • Geometry & Trigonometry
  • Calculus: Derivatives, Integrals, Limits
  • Probability & Statistics
  • Discrete Mathematics: Graph Theory, Logic
  • Number Theory & Combinatorics

2. Physics

  • Classical Mechanics: Newtonian Physics
  • Electromagnetism
  • Thermodynamics & Statistical Mechanics
  • Quantum Mechanics
  • Relativity: Special & General
  • Fluid Dynamics

3. Programming & Algorithmic Thinking

  • Code Debugging & Interpretation
  • Algorithm Design: Sorting, Searching, Recursion
  • Data Structures: Linked Lists, Trees, Graphs
  • Complexity Analysis: Big-O, Optimization
  • Code Completion Tasks
  • Parallel & Distributed Computing

4. Language & Text Reasoning

  • Text Analysis & Manipulation: Character counting, substring search, regular expression matching
  • Natural Language Understanding: Interpreting semantics, summarizing content, paraphrasing
  • Logical Reasoning in Text: Solving language-based puzzles, analyzing sentence structures, deducing contextual meaning
  • Pattern Recognition: Identifying trends, classifying content, extracting key information

📊 Test Dataset

Math Dataset

| Name | Problem Type | Example Question | Expected Output |
|------|--------------|------------------|-----------------|
| Math |              |                  |                 |

Physics Dataset

| Name    | Problem Type | Example Question | Expected Output |
|---------|--------------|------------------|-----------------|
| Physics |              |                  |                 |

Programming Dataset

| Name | Problem Type | Example Question | Expected Output |
|------|--------------|------------------|-----------------|
| Fibonacci Modulo | Algorithm Design (Recursion & Modular Arithmetic) | Compute the n-th Fibonacci number modulo p | fₙ mod p |
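
As an illustration of what a reference solution for the Fibonacci Modulo case might look like, here is a minimal Python sketch using the fast-doubling identities. It is not taken from the repository: the function name `fib_mod`, the indexing convention F(0) = 0, F(1) = 1, and the choice of fast doubling over plain recursion are all assumptions made for this example.

```python
def fib_mod(n: int, p: int) -> int:
    """Return F(n) mod p, assuming F(0) = 0 and F(1) = 1 (hypothetical convention)."""
    if n < 0 or p <= 0:
        raise ValueError("n must be non-negative and p must be positive")

    def pair(k: int) -> tuple[int, int]:
        # Returns (F(k) mod p, F(k + 1) mod p) via fast doubling.
        if k == 0:
            return (0, 1 % p)
        a, b = pair(k // 2)               # a = F(m), b = F(m + 1), m = k // 2
        c = (a * ((2 * b - a) % p)) % p   # F(2m)     = F(m) * (2 F(m+1) - F(m))
        d = (a * a + b * b) % p           # F(2m + 1) = F(m)^2 + F(m+1)^2
        if k % 2 == 0:
            return (c, d)
        return (d, (c + d) % p)

    return pair(n)[0]


print(fib_mod(10, 1000))  # 55
print(fib_mod(10, 7))     # 6
```

The recursion depth grows with log2(n), so large values of n stay well within Python's default recursion limit, and every intermediate value is reduced modulo p.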

Language & Text Reasoning Dataset

| Name | Problem Type | Example Question | Expected Output |
|------|--------------|------------------|-----------------|
| Count Character | Text Analysis | How many times does the letter "r" appear in "strawbarry"? | 3 |
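
The expected output for the Count Character case can be checked mechanically. The sketch below is only an illustration; the helper name `count_char` and its `case_sensitive` flag are hypothetical and not part of the project.

```python
def count_char(text: str, ch: str, case_sensitive: bool = True) -> int:
    """Count occurrences of the single character `ch` in `text`."""
    if len(ch) != 1:
        raise ValueError("ch must be exactly one character")
    if not case_sensitive:
        # Fold both sides so that "R" and "r" are treated as the same letter.
        text, ch = text.lower(), ch.lower()
    return text.count(ch)


print(count_char("strawbarry", "r"))  # 3
```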

🎯 Evaluation Methodology

Each test case is scored based on the following criteria:

  1. Accuracy: Is the final answer correct?
  2. Understanding: Does the solution demonstrate a sound grasp of the underlying concepts (mathematical, physical, programming, or language-based)?
  3. Efficiency & Optimization: Is the approach optimal or efficient, particularly for algorithmic challenges?
  4. Explainability: Can the model provide clear and logical reasoning for its answer?

Scoring System: (To be developed further)
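
Because the scoring system is still open, the following Python sketch shows only one possible way to record the four criteria per test case. Everything in it is an assumption for illustration: the `CaseScore` record, the 0.0 to 1.0 range, and the exact-match `accuracy` helper are hypothetical, and no weighting or aggregation across criteria is implied.

```python
from dataclasses import dataclass

# The four criteria listed above; the short field names are hypothetical.
CRITERIA = ("accuracy", "understanding", "efficiency", "explainability")


@dataclass
class CaseScore:
    """Per-criterion scores for one test case, each assumed to lie in [0, 1]."""
    name: str
    accuracy: float
    understanding: float
    efficiency: float
    explainability: float

    def __post_init__(self) -> None:
        for criterion in CRITERIA:
            value = getattr(self, criterion)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{criterion} must be between 0 and 1, got {value}")


def normalize(answer: str) -> str:
    # Collapse whitespace and ignore case before comparing answers.
    return " ".join(answer.split()).casefold()


def accuracy(model_output: str, expected: str) -> float:
    """Exact-match accuracy after normalization: 1.0 if equal, else 0.0."""
    return 1.0 if normalize(model_output) == normalize(expected) else 0.0


# The last three values are placeholder reviewer judgments, not real results.
score = CaseScore("Count Character", accuracy(" 3 ", "3"), 1.0, 1.0, 1.0)
print(score.accuracy)  # 1.0
```

Exact matching suits only short final answers; the understanding, efficiency, and explainability criteria would still need a human or model-based grader.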


📌 Contributing

We welcome contributions! If you have new test cases or improvements, please follow these steps:

  1. Fork the repository.
  2. Add your test cases to the dataset.
  3. Submit a pull request with a detailed description of your changes.

📜 License

This project is open-source and released under the MIT License.


📢 Acknowledgments

Special thanks to all contributors who have helped expand the dataset and refine the evaluation metrics.


🌟 Stay Updated

Follow the project for the latest benchmarks and findings on LLM reasoning abilities.
