Papers are all you need.
Paper | Year | Affiliation | Notes |
---|---|---|---|
Scaling Laws for Neural Language Models | 2020 | OpenAI | ./papers/00001-scaling-laws.pdf |
Let's Verify Step by Step | 2023 | OpenAI | ./papers/00028-Verify-Step-by-Step.pdf |
Efficient Training of Language Models to Fill in the Middle | 2022 | OpenAI | ./papers/00035-fim.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
DeepSeek-V3 Technical Report | 2024 | DeepSeek-AI | ./papers/00042-DeepSeek-V3.pdf |
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | 2025 | DeepSeek-AI | ./papers/00045-DeepSeek_R1.pdf |
Inference-Time Scaling for Generalist Reward Modeling | 2025 | DeepSeek-AI | ./papers/00056-DeepSeek-GRM.pdf |
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | 2024 | DeepSeek-AI | ./papers/00057-GRPO.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
The Llama 3 Herd of Models | 2024 | Meta | ./papers/00006-Llama3.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
Reducing Activation Recomputation in Large Transformer Models | 2022 | NVIDIA | ./papers/00012-selective-activation-recomputation.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
AlphaMath Almost Zero: Process Supervision without Process | 2024 | Alibaba Group | ./papers/00007-AlphaMath.pdf |
Qwen2.5-Coder Technical Report | 2024 | Alibaba Group | ./papers/00033-Qwen2.5-Coder.pdf |
Qwen3 Technical Report | 2025 | Alibaba Group | ./papers/00069-Qwen3_Technical_Report.pdf |
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | 2025 | Alibaba Group | ./papers/00073-Qwen3-Embedding.pdf |
WebSailor: Navigating Super-human Reasoning for Web Agent | 2025 | Alibaba Group | ./papers/00082-WebSailor.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
Kimi K2: Open Agentic Intelligence | 2025 | KimiTeam | ./papers/00083-Kimi-K2.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
YaRN: Efficient Context Window Extension of Large Language Models | 2023 | EleutherAI | ./papers/00034-YaRN.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | 2024 | Tsinghua University | ./papers/00003-MiniCPM.pdf |
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | 2024 | Tsinghua University | ./papers/00005-ChatGLM.pdf |
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | 2024 | Tsinghua University | ./papers/00038-LongWriter.pdf |
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | 2024 | Tsinghua University | ./papers/00043-ReST-MCTS.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | 2024 | Peking University | ./papers/00040-Math-Shepherd.pdf |
ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding | 2024 | Peking University | ./papers/00049-ChartMoE.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | 2025 | Carnegie Mellon University | ./papers/00053-MRT.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
Understanding R1-Zero-Like Training: A Critical Perspective | 2025 | National University of Singapore | ./papers/00048-understand-r1-zero.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
LIMO: Less is More for Reasoning | 2025 | Shanghai Jiao Tong University | ./papers/00051-LIMO.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
A Comprehensive Survey on Long Context Language Modeling | 2025 | Nanjing University | ./papers/00058-LCLM.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems | 2024 | Renmin University of China | ./papers/00037-o1-like.pdf |
Search-o1: Agentic Search-Enhanced Large Reasoning Models | 2025 | Renmin University of China | ./papers/00059-Search-o1.pdf |
WebThinker: Empowering Large Reasoning Models with Deep Research Capability | 2025 | Renmin University of China | ./papers/00074-WebThinker.pdf |
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | 2025 | Renmin University of China | ./papers/00062-R1-Searcher.pdf |
Paper | Year | Affiliation | Notes |
---|---|---|---|
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | 2025 | University of Illinois Urbana-Champaign | ./papers/00060-DeepRetrieval.pdf |
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | 2025 | University of Illinois Urbana-Champaign | ./papers/00061-Search-R1.pdf |
- Overleaf: https://www.overleaf.com/