A curated collection of awesome works around reasoning models such as OpenAI o1 and DeepSeek-R1!
- OpenAI has released a deep-research capability.
- OpenAI has launched o3-mini and o3-mini-high, which are tuned for science, math, and coding. Both models are available in the ChatGPT app, Poe, and elsewhere.
- NVIDIA NIM now supports the DeepSeek-R1 model.
- Qwen has launched Qwen2.5-Max, a powerful multi-modal MoE model, now available on the Bailian platform.
- CodeGPT, a VS Code coding-assistant extension, now supports R1.
- DeepSeek-R1 - The official DeepSeek-R1 repository.
- Qwen-QwQ - The official Qwen2.5 repository, including QwQ.
- s1 from Stanford - From Fei-Fei Li's team: a distillation and test-time-compute implementation that can match the performance of o1 and R1.
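The test-time-compute idea behind s1 is "budget forcing": suppress the model's end-of-thinking delimiter until a minimum reasoning budget is spent (by appending a token such as "Wait"), and cut it off past a maximum. A minimal sketch with a stubbed decoder (`step_fn` and the `</think>` marker are illustrative stand-ins, not the real inference API):

```python
def budget_force(step_fn, min_steps, max_steps):
    """Sketch of s1-style budget forcing with a stubbed decoder.

    step_fn() returns the next chunk of reasoning text, or the
    end-of-thinking marker "</think>" when the model wants to stop.
    """
    trace, steps = [], 0
    while steps < max_steps:
        chunk = step_fn()
        steps += 1
        if chunk == "</think>":
            if steps >= min_steps:
                break  # budget satisfied: let the model stop thinking
            trace.append("Wait")  # stopped too early: nudge it to keep going
        else:
            trace.append(chunk)
    return " ".join(trace)
```

In the real system the "Wait" append happens at the token level during decoding; this sketch only illustrates the control flow.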
- DeepSeek-R1 Tech Report
- DeepSeek-V3 Tech Report
- Qwen QwQ technical blog - QwQ: Reflect Deeply on the Boundaries of the Unknown
- OpenAI o1 announcement - Learning to Reason with Large Language Models
- Qwen-Math-PRM Tech Report (MCTS/PRM)
- Qwen2.5 Tech Report
- DeepSeek-Math Tech Report (GRPO)
- Kimi K1.5 Tech Report
- Qwen-Math-PRM - The Lessons of Developing Process Reward Models in Mathematical Reasoning
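A process reward model (PRM), as studied in the Qwen-Math-PRM work, scores each intermediate reasoning step rather than only the final answer. To rank whole solutions, step scores are typically aggregated; the minimum and the product are common choices in the PRM literature. A minimal sketch (function name and weights are illustrative):

```python
import math

def aggregate_prm(step_scores, method="min"):
    """Combine per-step process-reward scores into one solution score.

    "min" treats a chain as only as good as its weakest step;
    "prod" multiplies step scores together.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    if method == "min":
        return min(step_scores)
    if method == "prod":
        return math.prod(step_scores)
    raise ValueError(f"unknown method: {method}")

# Rank two hypothetical solutions by their weakest step:
sol_a = [0.9, 0.8, 0.95]
sol_b = [0.99, 0.3, 0.99]
print(aggregate_prm(sol_a) > aggregate_prm(sol_b))  # True: 0.8 > 0.3
```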
- Large Language Models for Mathematical Reasoning: Progresses and Challenges (EACL 2024)
- Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024)
- At Which Training Stage Does Code Data Help LLM Reasoning? (ICLR 2024)
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought [code]
- LlamaV-o1 - Rethinking Step-by-step Visual Reasoning in LLMs
- rStar-Math - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
- MathScale - Scaling Instruction Tuning for Mathematical Reasoning
- LLMs Can Plan Only If We Tell Them - Introduces AoT+, a new CoT-style prompting method.
- SFT Memorizes, RL Generalizes - A DeepMind study on the differing effects of SFT and RL on generalization.
- Frontier AI systems have surpassed the self-replicating red line - A paper from Fudan University arguing that LLMs have crossed the self-replication red line.
- LIMO - Less is More for Reasoning: trains a model on only 817 samples that surpasses o1-level models.
- Underthinking of reasoning models - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs.
- Competitive Programming with Large Reasoning Models - An OpenAI report on applying large reasoning models to competitive programming.
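GRPO, introduced in the DeepSeek-Math report listed above, drops PPO's learned critic: it samples a group of responses per prompt and normalizes each response's reward by the group's mean and standard deviation. A minimal sketch of that advantage computation (the function name and epsilon are illustrative):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO (DeepSeekMath):
    normalize each sampled response's reward by the group mean
    and standard deviation, so no value network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for a group of 4 responses sampled from the same prompt:
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get positive advantage, wrong ones negative
```

These advantages then weight the policy-gradient update in place of critic-based estimates.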
DeepSeek series:
| Model ID | ModelScope | Hugging Face |
|---|---|---|
| DeepSeek-R1 | Model Link | Model Link |
| DeepSeek-V3 | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-32B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-14B | Model Link | Model Link |
| DeepSeek-R1-Distill-Llama-8B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-7B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-1.5B | Model Link | Model Link |
| DeepSeek-R1-GGUF | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-32B-GGUF | Model Link | Model Link |
| DeepSeek-R1-Distill-Llama-8B-GGUF | Model Link | Model Link |
Qwen series:
| Model ID | ModelScope | Hugging Face |
|---|---|---|
| QwQ-32B-Preview | Model Link | Model Link |
| QVQ-72B-Preview | Model Link | Model Link |
| QwQ-32B-Preview-GGUF | Model Link | Model Link |
| QVQ-72B-Preview-bnb-4bit | Model Link | Model Link |
Others:
| Model ID | ModelScope | Hugging Face |
|---|---|---|
| Qwen2-VL-2B-GRPO-8k | - | Model Link |
- Open R1: https://github.com/huggingface/open-r1
  - Hugging Face's official repository for reproducing the DeepSeek-R1 training infrastructure.
- TinyZero: https://github.com/Jiayi-Pan/TinyZero
  - A clean, minimal, accessible reproduction of DeepSeek R1-Zero.
- SimpleRL-Reason: https://github.com/hkust-nlp/simpleRL-reason
  - Reproduces DeepSeek-R1 training using OpenRLHF.
- Ragen: https://github.com/ZihanWang314/RAGEN
  - A general-purpose reasoning-agent training framework that also reproduces DeepSeek-R1.
- TRL: https://github.com/huggingface/trl
  - Hugging Face's official training library, with open-source implementations of GRPO and other RL algorithms.
- OpenRLHF: https://github.com/OpenRLHF/OpenRLHF
  - An RLHF training framework supporting multiple RL algorithms, including REINFORCE++.
- R1-V: https://github.com/Deep-Agent/R1-V
  - A multi-modal R1 reproduction.
- Logic-RL: https://github.com/Unakar/Logic-RL
- Align-Anything: https://github.com/PKU-Alignment/align-anything
  - Training all-modality models with feedback.
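Several of the frameworks above (Open R1, Logic-RL, TinyZero) train with R1-style rule-based rewards rather than a learned reward model: a format reward for wrapping reasoning in `<think>...</think>` and the result in `<answer>...</answer>`, plus an accuracy reward when the extracted answer matches the reference. A minimal sketch (function name, tag layout, and equal weighting are illustrative; real recipes differ in details):

```python
import re

def r1_style_reward(completion, gold_answer):
    """Rule-based reward sketch: format reward + accuracy reward."""
    fmt = re.fullmatch(r"<think>.*?</think>\s*<answer>(.*?)</answer>",
                       completion.strip(), flags=re.DOTALL)
    format_reward = 1.0 if fmt else 0.0
    accuracy_reward = 0.0
    if fmt and fmt.group(1).strip() == gold_answer:
        accuracy_reward = 1.0
    return format_reward + accuracy_reward

print(r1_style_reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 2.0
```

Because the reward is computed by string rules alone, it is cheap and hard to game with reward-model exploits, which is part of why the R1 recipe scales.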
- Dolphin-R1 (HuggingFace | ModelScope) - An 800k-sample dataset for training DeepSeek-R1-Distill models.
- R1-Distill-SFT (HuggingFace | ModelScope)
- NuminaMath-TIR - A math dataset where tool-integrated reasoning (TIR) plays a crucial role.
- NuminaMath-CoT - Approximately 860k math problems, each with a Chain-of-Thought (CoT) formatted solution.
- BAAI-TACO - TACO, a code-generation benchmark with 26,443 problems.
- OpenThoughts-114k - An open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles.
- Bespoke-Stratos-17k - A reasoning dataset of questions, reasoning traces, and answers.
- Clevr_CoGenT_TrainA_R1 - A multi-modal dataset for training multi-modal R1 models.
- clevr_cogen_a_train - An R1-distilled visual reasoning dataset.
- S1K - The dataset used to train the s1 model.
- MATH-500 - A 500-problem subset of the MATH benchmark, created by OpenAI for the "Let's Verify Step by Step" paper.
- AIME-2024 - Problems from the 2024 American Invitational Mathematics Examination (AIME).
- AIME-VALIDATION - All 90 problems come from AIME 2022, AIME 2023, and AIME 2024.
- MATH-LEVEL-4 - A subset of level-4 problems from the MATH benchmark.
- MATH-LEVEL-5 - A subset of level-5 problems from the MATH benchmark.
- aimo-validation-amc - All 83 samples come from AMC12 2022 and AMC12 2023.
- GPQA-Diamond - Diamond subset from GPQA benchmark.
- Codeforces-Python-Submissions - A dataset of Python submissions from Codeforces.
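The math benchmarks above (MATH-500, AIME, AMC) are typically scored by final-answer accuracy: extract the model's final answer and compare it to the reference. A minimal sketch of that scoring loop (real harnesses add math-aware equivalence checking, e.g. treating 0.5 and 1/2 as equal; the plain normalization here is a simplification):

```python
def exact_match_accuracy(predictions, references, normalize=str.strip):
    """Accuracy by normalized exact match of final answers."""
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["42", " 7 ", "x=3"]
golds = ["42", "7", "x=2"]
print(exact_match_accuracy(preds, golds))  # two of three match
```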