modelscope/awesome-deep-reasoning

Awesome-deep-reasoning

A curated collection of works built around reasoning models such as O1 and R1. You can also find the collection here.

Table of Contents

News

  • OpenAI has released a deep research capability.
  • OpenAI has launched its latest o3 models, o3-mini and o3-mini-high, which target science, math, and coding. Both models are available in the ChatGPT app, Poe, and other platforms.
  • NVIDIA NIM now supports the DeepSeek-R1 model.
  • Qwen has launched Qwen2.5-Max, a powerful multi-modal MoE model, available on the Bailian platform.
  • CodeGPT, a VS Code copilot extension, now supports R1.

Highlights

DeepSeek repos:

DeepSeek-R1 - The official DeepSeek-R1 repository.

Qwen repos:

Qwen-QwQ - The official Qwen2.5 repository, which includes QwQ.

S1 from Stanford - From Fei-Fei Li's team, a distillation and test-time compute implementation that is reported to match the performance of O1 and R1.

Papers

Models

DeepSeek series:

| Model ID | ModelScope | Hugging Face |
| --- | --- | --- |
| DeepSeek R1 | Model Link | Model Link |
| DeepSeek V3 | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-32B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-14B | Model Link | Model Link |
| DeepSeek-R1-Distill-Llama-8B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-7B | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-1.5B | Model Link | Model Link |
| DeepSeek-R1-GGUF | Model Link | Model Link |
| DeepSeek-R1-Distill-Qwen-32B-GGUF | Model Link | Model Link |
| DeepSeek-R1-Distill-Llama-8B-GGUF | Model Link | Model Link |

Qwen series:

| Model ID | ModelScope | Hugging Face |
| --- | --- | --- |
| QwQ-32B-Preview | Model Link | Model Link |
| QVQ-72B-Preview | Model Link | Model Link |
| QwQ-32B-Preview-GGUF | Model Link | Model Link |
| QVQ-72B-Preview-bnb-4bit | Model Link | Model Link |

Others:

| Model ID | ModelScope | Hugging Face |
| --- | --- | --- |
| Qwen2-VL-2B-GRPO-8k | - | Model Link |

Infra

Datasets

  • Dolphin-R1 (HuggingFace | ModelScope) - An 800k-sample dataset similar in composition to the one used to train the DeepSeek-R1 Distill models.
  • R1-Distill-SFT (HuggingFace | ModelScope)
  • NuminaMath-TIR - A math dataset for tool-integrated reasoning (TIR), an approach that played a crucial role in the AI Mathematical Olympiad (AIMO) competition.
  • NuminaMath-CoT - Approximately 860k math problems, where each solution is formatted in a Chain of Thought (CoT) manner.
  • BAAI-TACO - A code generation benchmark with 26,443 problems.
  • OpenThoughts-114k - An open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles.
  • Bespoke-Stratos-17k - A reasoning dataset of questions, reasoning traces, and answers.
  • Clevr_CoGenT_TrainA_R1 - A multi-modal dataset for training a multi-modal R1 model.
  • clevr_cogen_a_train - An R1-distilled visual reasoning dataset.
  • S1k - A dataset for training the S1 model.

Evaluation

  • MATH-500 - A 500-problem subset of the MATH benchmark, selected by OpenAI in their "Let's Verify Step by Step" paper.
  • AIME-2024 - Problems from the American Invitational Mathematics Examination (AIME) 2024.
  • AIME-VALIDATION - All 90 problems come from AIME 2022, AIME 2023, and AIME 2024.
  • MATH-LEVEL-4 - A subset of level 4 problems from the MATH benchmark.
  • MATH-LEVEL-5 - A subset of level 5 problems from the MATH benchmark.
  • aimo-validation-amc - All 83 samples come from AMC12 2022 and AMC12 2023.
  • GPQA-Diamond - The Diamond subset of the GPQA benchmark.
  • Codeforces-Python-Submissions - A dataset of Python submissions from Codeforces.

Tools