A curated list of resources from UC Berkeley's LLM Agents MOOCs. Every resource listed here comes directly from course materials, slides, or supplemental readings.
AI agents are autonomous systems that can reason, plan, and act to accomplish goals. This list focuses on agents built on large language models (LLMs) and their applications.
Course: Large Language Model Agents
Instructors: Dawn Song (UC Berkeley), Xinyun Chen (Google DeepMind)
- YouTube Playlist: Watch Lectures
- Topics: Foundations of LLMs, Reasoning, Planning, Tool use, Agent infrastructure, Code generation, Robotics, Web automation
- Guest Speakers: Denny Zhou, Shunyu Yao, Chi Wang, Jerry Liu, Burak Gokturk, Omar Khattab, Graham Neubig, Nicolas Chapados, Yuandong Tian, Jim Fan, Percy Liang, Ben Mann
Course: Advanced Large Language Model Agents
Instructors: Dawn Song (UC Berkeley), Xinyun Chen (Google DeepMind), Kaiyu Yang (Meta FAIR)
- YouTube Playlist: Watch Lectures
- Topics: Inference-time techniques, Post-training methods, Search and planning, Code generation & verification, Mathematics & theorem proving
- Guest Speakers: Jason Weston, Yu Su, Hanna Hajishirzi, Charles Sutton, Ruslan Salakhutdinov, Caiming Xiong, Thomas Hubert, Sean Welleck, Swarat Chaudhuri
Course: Agentic AI
Instructors: Dawn Song (UC Berkeley), Xinyun Chen (Meta)
- YouTube Playlist: Watch Lectures
- Topics: LLM foundations, Reasoning, Planning, Agentic frameworks, Code generation, Robotics, Web automation, Scientific discovery
- Guest Speakers: Yann Dubois, Yangqing Jia, Jiantao Jiao, Weizhu Chen, Noam Brown, Sida Wang, James Zou, Clay Bavor, Oriol Vinyals, Peter Stone
All papers listed below are from course supplemental readings.
- Large Language Models as Optimizers - Using LLMs as optimizers
- Large Language Models Cannot Self-Correct Reasoning Yet - Limitations of self-correction
- Teaching Large Language Models to Self-Debug - Self-debugging techniques
- Chain-of-Thought Reasoning Without Prompting - Emergent reasoning
- Premise Order Matters in Reasoning with Large Language Models - Reasoning order effects
- Chain-of-Thought Empowers Transformers to Solve Inherently Serial Problems - Sequential reasoning
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model - DPO method
- Iterative Reasoning Preference Optimization - Iterative reasoning alignment
- Chain-of-Verification Reduces Hallucination in Large Language Models - Verification chains
- Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback - Preference learning
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization - Implicit reasoning
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models - Long-term memory
- Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents - World models for planning
- Tree Search for Language Model Agents - Tree search methods
- ReAct: Synergizing Reasoning and Acting in Language Models - Reasoning + Acting framework
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation - Multi-agent framework
- StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows - State-driven workflows
- Compound AI Systems & DSPy Framework - Composable AI systems
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering - Software engineering agents
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents - Generalist coding agents
- Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities - Security vulnerability detection
- From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code - Vulnerability detection
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents - Web interaction agents
- Mind2Web: Towards a Generalist Agent for the Web - Generalist web agents
- WebArena: A Realistic Web Environment for Building Autonomous Agents - Web agent benchmark
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks - Visual web agents
- OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments - OS-level agents
- AGUVIS: Unified Pure Vision Agents For Autonomous GUI Interaction - Vision-based GUI agents
- WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? - Knowledge work agents
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks - Compositional planning
- TapeAgents: a Holistic Framework for Agent Development and Optimization - Agent optimization
- AlphaProof: when reinforcement learning meets formal mathematics - IMO problem solving
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models - Theorem proving
- Autoformalization with Large Language Models - Autoformalization
- Autoformalizing Euclidean Geometry - Geometry formalization
- Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs - Proof guidance
- miniCTX: Neural Theorem Proving with Long-Contexts - Long-context theorem proving
- Lean-STaR: Learning to Interleave Thinking and Proving - Thinking and proving
- ImProver: Agent-Based Automated Proof Optimization - Proof optimization
- An In-Context Learning Agent for Formal Theorem-Proving - In-context theorem proving
- Symbolic Regression with a Learned Concept Library - Symbolic regression
- Project GR00T: A Blueprint for Generalist Robotics - Generalist robotics
- Voyager: An Open-Ended Embodied Agent with Large Language Models - Minecraft agent
- Eureka: Human-Level Reward Design via Coding Large Language Models - Reward design
- DrEureka: Language Model Guided Sim-To-Real Transfer - Sim-to-real transfer
- Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning - Racing agents
- SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL - Whole-body RL
- The Virtual Lab of AI agents designs new SARS-CoV-2 nanobodies - Scientific discovery agents
- Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents - Research paper agents
- OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs - Literature synthesis
- Privtrans: Automatically Partitioning Programs for Privilege Separation - Privilege separation
- DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks - Prompt injection detection
- AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases - Agent security
- Progent: Programmable Privilege Control for LLM Agents - Privilege control
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models - Trustworthiness evaluation
- Representation Engineering: A Top-Down Approach to AI Transparency - AI transparency
- Extracting Training Data from Large Language Models - Data extraction
- The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks - Memorization testing
- Survey on Evaluation of LLM-based Agents - Comprehensive evaluation survey
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations - Statistical evaluation
- τ2-Bench: Evaluating Conversational Agents in a Dual-Control Environment - Conversational agent evaluation
- Introducing SWE-bench Verified - Verified code evaluation
- BrowseComp: a benchmark for browsing agents - Web browsing benchmark
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping - Planning with transformers
- Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces - Fast and slow reasoning
- Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets - Global optimization
- SurCo: Learning Linear Surrogates For Combinatorial Nonlinear Optimization Problems - Combinatorial optimization
- Multi-Agent AI - Noam Brown (OpenAI) lecture
- Multi-Agent Systems in the Era of LLMs - Oriol Vinyals (Google DeepMind) lecture
All frameworks and tools listed below were mentioned in the UC Berkeley courses.
- AutoGen - Multi-agent conversation framework (Microsoft) - Featured in Fall 2024 Lecture 3
- DSPy - Programming—not prompting—language models - Featured in Fall 2024 Lecture 5
- LlamaIndex - Data framework for LLM applications - Featured in Fall 2024 Lecture 3 (Multimodal Knowledge Assistant)
- SWE-agent - Agent for automated software engineering - Paper in Fall 2024 Lecture 6
- OpenHands - Open platform for AI software developers - Paper in Fall 2024 Lecture 6
- Lean - Theorem prover and programming language - Used in Spring 2025 mathematics lectures
- LeanDojo - Theorem proving with retrieval-augmented language models - Paper in Spring 2025 Lecture 9
- WorkArena - Benchmark for knowledge work agents - Paper in Fall 2024 Lecture 7
- TapeAgents - Holistic framework for agent development and optimization - Paper in Fall 2024 Lecture 7
All benchmarks listed below were mentioned in the courses.
- SWE-bench - Software engineering benchmark - Mentioned in Fall 2024 Lecture 6
- SWE-bench Verified - Verified code evaluation - Mentioned in Spring 2025
- WebArena - Realistic web environment - Paper in Spring 2025 Lecture 6
- VisualWebArena - Visual web tasks - Mentioned in Spring 2025 Lecture 6
- Mind2Web - Generalist web agent benchmark - Paper in Spring 2025 Lecture 6
- BrowseComp - Browsing agent benchmark - Mentioned in Spring 2025
- OSWORLD - Open-ended OS tasks - Paper in Spring 2025 Lecture 7
- AGUVIS - GUI interaction tasks - Paper in Spring 2025 Lecture 7
- τ2-Bench - Dual-control environment evaluation - Mentioned in Spring 2025
- WorkArena - Knowledge work tasks - Paper in Fall 2024 Lecture 7
- LLM Agents Discord - UC Berkeley LLM Agents community (Official course community)
Contributions are welcome. This list focuses exclusively on resources from UC Berkeley's LLM Agents MOOCs, so any resource you add must meet at least one of these criteria:
- It appears in course slides or lectures
- It appears in course supplemental readings
- It is explicitly mentioned by instructors or guest speakers
To the extent possible under law, the contributors have waived all copyright and related or neighboring rights to this work.
This list is curated exclusively from the excellent UC Berkeley LLM Agents MOOC courses:
- Fall 2024: LLM Agents MOOC
- Spring 2025: Advanced LLM Agents MOOC
- Fall 2025: Agentic AI MOOC
All credit goes to the course instructors, guest speakers, and the UC Berkeley team for creating these comprehensive educational resources.
Maintained by: arvind
Last Updated: January 2026
If you find this list helpful, please consider giving it a ⭐️
Made with ❤️ by the AI Agents community