A curated collection of awesome works around reasoning models such as OpenAI o1 and DeepSeek-R1!
- OpenAI has released a deep-research capability.
- OpenAI has launched o3-mini and o3-mini-high, which are tuned for science, math, and coding. Both models are available in the ChatGPT app, Poe, and elsewhere.
- NVIDIA NIM now supports the DeepSeek-R1 model.
- Qwen has launched Qwen2.5-Max, a powerful multi-modal MoE model, now available on the Bailian platform.
- CodeGPT, a VS Code coding-assistant extension, now supports R1.
- DeepSeek-R1 - The official DeepSeek-R1 repository.
- Qwen-QwQ - The official Qwen2.5 repository, including QwQ.
- s1 from Stanford - From Fei-Fei Li's team: a distillation and test-time-compute implementation that can match the performance of o1 and R1.
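The test-time-compute idea behind s1 is "budget forcing": suppress the model's end-of-thinking delimiter until a minimum reasoning budget is spent (by appending a token such as "Wait"), and cut it off past a maximum. A minimal sketch with a stubbed decoder (`step_fn` and the `</think>` marker are illustrative stand-ins, not the real inference API):

```python
def budget_force(step_fn, min_steps, max_steps):
    """Sketch of s1-style budget forcing with a stubbed decoder.

    step_fn() returns the next chunk of reasoning text, or the
    end-of-thinking marker "</think>" when the model wants to stop.
    """
    trace, steps = [], 0
    while steps < max_steps:
        chunk = step_fn()
        steps += 1
        if chunk == "</think>":
            if steps >= min_steps:
                break  # budget satisfied: let the model stop thinking
            trace.append("Wait")  # stopped too early: nudge it to keep going
        else:
            trace.append(chunk)
    return " ".join(trace)
```

In the real system the "Wait" append happens at the token level during decoding; this sketch only illustrates the control flow.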
- DeepSeek-R1 Tech Report
- DeepSeek-V3 Tech Report
- Qwen QwQ technical blog - QwQ: Reflect Deeply on the Boundaries of the Unknown
- OpenAI o1 announcement - Learning to Reason with Large Language Models
- Qwen-Math-PRM Tech Report (MCTS/PRM)
- Qwen2.5 Tech Report
- DeepSeek-Math Tech Report (GRPO)
- Kimi K1.5 Tech Report
- Qwen-Math-PRM - The Lessons of Developing Process Reward Models in Mathematical Reasoning
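A process reward model (PRM), as studied in the Qwen-Math-PRM work, scores each intermediate reasoning step rather than only the final answer. To rank whole solutions, step scores are typically aggregated; the minimum and the product are common choices in the PRM literature. A minimal sketch (function name and weights are illustrative):

```python
import math

def aggregate_prm(step_scores, method="min"):
    """Combine per-step process-reward scores into one solution score.

    "min" treats a chain as only as good as its weakest step;
    "prod" multiplies step scores together.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    if method == "min":
        return min(step_scores)
    if method == "prod":
        return math.prod(step_scores)
    raise ValueError(f"unknown method: {method}")

# Rank two hypothetical solutions by their weakest step:
sol_a = [0.9, 0.8, 0.95]
sol_b = [0.99, 0.3, 0.99]
print(aggregate_prm(sol_a) > aggregate_prm(sol_b))  # True: 0.8 > 0.3
```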
- Large Language Models for Mathematical Reasoning: Progresses and Challenges (EACL 2024)
- Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024)
- At Which Training Stage Does Code Data Help LLM Reasoning? (ICLR 2024)
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought [code]
- LlamaV-o1 - Rethinking Step-by-step Visual Reasoning in LLMs
- rStar-Math - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
- MathScale - Scaling Instruction Tuning for Mathematical Reasoning
- LLMs Can Plan Only If We Tell Them - Introduces AoT+, a new CoT-style prompting method.
- SFT Memorizes, RL Generalizes - A DeepMind study on the differing effects of SFT and RL on generalization.
- Frontier AI systems have surpassed the self-replicating red line - A paper from Fudan University arguing that LLMs have crossed the self-replication red line.
- LIMO - Less is More for Reasoning: trains a model on only 817 samples that surpasses o1-level models.
- Underthinking of reasoning models - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs.
- Competitive Programming with Large Reasoning Models - An OpenAI report on applying large reasoning models to competitive programming.
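GRPO, introduced in the DeepSeek-Math report listed above, drops PPO's learned critic: it samples a group of responses per prompt and normalizes each response's reward by the group's mean and standard deviation. A minimal sketch of that advantage computation (the function name and epsilon are illustrative):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO (DeepSeekMath):
    normalize each sampled response's reward by the group mean
    and standard deviation, so no value network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for a group of 4 responses sampled from the same prompt:
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get positive advantage, wrong ones negative
```

These advantages then weight the policy-gradient update in place of critic-based estimates.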
DeepSeek series:
| Model ID | ModelScope | Hugging Face |
|---|---|---|
| DeepSeek-R1 | Model Link | Model Link |
| DeepSeek-V3 | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-32B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-14B | Model Link | Model Link |
| DeepSeek-R1-Distill-Llama-8B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-7B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-1.5B | Model Link | Model Link |
| DeepSeek-R1-GGUF | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-32B-GGUF | Model Link | Model Link |
| DeepSeek-R1-Distill-Llama-8B-GGUF | Model Link | Model Link |
Qwen series:
| Model ID | ModelScope | Hugging Face |
|---|---|---|
| QwQ-32B-Preview | Model Link | Model Link |
| QVQ-72B-Preview | Model Link | Model Link |
| QwQ-32B-Preview-GGUF | Model Link | Model Link |
| QVQ-72B-Preview-bnb-4bit | Model Link | Model Link |
Others:
| Model ID | ModelScope | Hugging Face |
|---|---|---|
| Qwen2-VL-2B-GRPO-8k | - | Model Link |
- Open R1: https://github.com/huggingface/open-r1
  - Hugging Face's official repository for reproducing the DeepSeek-R1 training infrastructure.
- TinyZero: https://github.com/Jiayi-Pan/TinyZero
  - A clean, minimal, accessible reproduction of DeepSeek R1-Zero.
- SimpleRL-Reason: https://github.com/hkust-nlp/simpleRL-reason
  - Reproduces DeepSeek-R1 training using OpenRLHF.
- Ragen: https://github.com/ZihanWang314/RAGEN
  - A general-purpose reasoning-agent training framework that also reproduces DeepSeek-R1.
- TRL: https://github.com/huggingface/trl
  - Hugging Face's official training library, with open-source implementations of GRPO and other RL algorithms.
- OpenRLHF: https://github.com/OpenRLHF/OpenRLHF
  - An RLHF training framework supporting multiple RL algorithms, including REINFORCE++.
- R1-V: https://github.com/Deep-Agent/R1-V
  - A multi-modal R1 reproduction.
- Logic-RL: https://github.com/Unakar/Logic-RL
- Align-Anything: https://github.com/PKU-Alignment/align-anything
  - Training all-modality models with feedback.
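Several of the frameworks above (Open R1, Logic-RL, TinyZero) train with R1-style rule-based rewards rather than a learned reward model: a format reward for wrapping reasoning in `<think>...</think>` and the result in `<answer>...</answer>`, plus an accuracy reward when the extracted answer matches the reference. A minimal sketch (function name, tag layout, and equal weighting are illustrative; real recipes differ in details):

```python
import re

def r1_style_reward(completion, gold_answer):
    """Rule-based reward sketch: format reward + accuracy reward."""
    fmt = re.fullmatch(r"<think>.*?</think>\s*<answer>(.*?)</answer>",
                       completion.strip(), flags=re.DOTALL)
    format_reward = 1.0 if fmt else 0.0
    accuracy_reward = 0.0
    if fmt and fmt.group(1).strip() == gold_answer:
        accuracy_reward = 1.0
    return format_reward + accuracy_reward

print(r1_style_reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 2.0
```

Because the reward is computed by string rules alone, it is cheap and hard to game with reward-model exploits, which is part of why the R1 recipe scales.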
- Dolphin-R1 (HuggingFace | ModelScope) - An 800k-sample dataset for training DeepSeek-R1-Distill models.
- R1-Distill-SFT (HuggingFace | ModelScope)
- NuminaMath-TIR - A math dataset where tool-integrated reasoning (TIR) plays a crucial role.
- NuminaMath-CoT - Approximately 860k math problems, each with a Chain-of-Thought (CoT) formatted solution.
- BAAI-TACO - TACO, a code-generation benchmark with 26,443 problems.
- OpenThoughts-114k - An open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles.
- Bespoke-Stratos-17k - A reasoning dataset of questions, reasoning traces, and answers.
- Clevr_CoGenT_TrainA_R1 - A multi-modal dataset for training multi-modal R1 models.
- clevr_cogen_a_train - An R1-distilled visual reasoning dataset.
- S1K - The dataset used to train the s1 model.
- MATH-500 - A 500-problem subset of the MATH benchmark, created by OpenAI for the "Let's Verify Step by Step" paper.
- AIME-2024 - Problems from the 2024 American Invitational Mathematics Examination (AIME).
- AIME-VALIDATION - All 90 problems come from AIME 2022, AIME 2023, and AIME 2024.
- MATH-LEVEL-4 - A subset of level-4 problems from the MATH benchmark.
- MATH-LEVEL-5 - A subset of level-5 problems from the MATH benchmark.
- aimo-validation-amc - All 83 samples come from AMC12 2022 and AMC12 2023.
- GPQA-Diamond - Diamond subset from GPQA benchmark.
- Codeforces-Python-Submissions - A dataset of Python submissions from Codeforces.
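The math benchmarks above (MATH-500, AIME, AMC) are typically scored by final-answer accuracy: extract the model's final answer and compare it to the reference. A minimal sketch of that scoring loop (real harnesses add math-aware equivalence checking, e.g. treating 0.5 and 1/2 as equal; the plain normalization here is a simplification):

```python
def exact_match_accuracy(predictions, references, normalize=str.strip):
    """Accuracy by normalized exact match of final answers."""
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["42", " 7 ", "x=3"]
golds = ["42", "7", "x=2"]
print(exact_match_accuracy(preds, golds))  # two of three match
```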