
Final Project: Your LLM Research Capstone

Accept this assignment: GitHub Classroom Link

Due: March 9, 2026 at 11:59 PM EST

Click the link above to create your private repository for this assignment. Complete your work in Google Colab, then push your notebook, writeup, and presentation to the repository before the deadline.


Overview

The final project is your opportunity to synthesize everything you've learned throughout this course into an ambitious, cutting-edge project that showcases your mastery of large language models and conversational AI. This is a capstone experience where you'll tackle a challenging problem that would be nearly impossible in a traditional course—but is achievable with the help of modern GenAI tools like Claude, ChatGPT, and GitHub Copilot.

Unlike the structured problem sets, this project is open-ended and research-oriented. You have the creative freedom to explore novel applications, replicate and extend published research, build innovative systems, or conduct rigorous evaluations of LLM capabilities. The goal is to produce work that is intellectually sophisticated, technically ambitious, and potentially publishable or deployable in the real world.

This is your chance to:

  • Explore a topic you're genuinely passionate about
  • Push the boundaries of what you thought possible
  • Create something meaningful that extends beyond the classroom
  • Build a portfolio piece that demonstrates your AI/ML capabilities
  • Potentially contribute to ongoing research in the field

You will work in teams of 2-3 students over 4 weeks (Weeks 7-10), leveraging your combined skills and interests to tackle problems that would be challenging for any individual. The collaborative nature of this project mirrors real-world research and industry practices.

Learning Objectives

By completing this project, you will:

  1. Synthesize course concepts: Integrate knowledge from all assignments—from rule-based systems (ELIZA) to modern transformers (GPT), from embeddings to fine-tuning.
  2. Design and execute research: Formulate research questions, design methodologies, implement solutions, and critically evaluate results.
  3. Master LLM tools and frameworks: Work extensively with Hugging Face, OpenAI/Anthropic APIs, PyTorch, and other modern ML frameworks.
  4. Leverage GenAI for ambitious projects: Use AI coding assistants to implement complex systems that would traditionally take months.
  5. Communicate technical work: Present findings clearly through code, visualizations, presentations, and written reports.
  6. Think critically about AI: Evaluate limitations, biases, ethical implications, and societal impacts of your work.
  7. Collaborate effectively: Work in a team to divide tasks, integrate components, and produce cohesive results.

Project Scope

Your project should represent approximately 4 weeks of focused work for a team of 2-3 students (Weeks 7-10). This translates to roughly 15-20 hours per person per week, though the actual time investment may vary based on your background and the project's complexity.

Scale and Ambition

With GenAI assistance, you can tackle projects that would have been impossible just a few years ago. However, be realistic about scope:

Appropriate scale:

  • Fine-tuning a model on a custom dataset for a specific task
  • Building a multi-component system (e.g., RAG with custom embeddings and retrieval)
  • Replicating a recent paper and extending it with new experiments
  • Creating a novel evaluation framework for LLM capabilities
  • Developing an agent-based system with multiple tools and reasoning components

Too small:

  • Simple prompt engineering exercises
  • Basic API wrapper applications
  • Straightforward classification tasks using existing models
  • Tutorials or literature reviews without implementation

Too ambitious (probably):

  • Training a foundation model from scratch
  • Building a production-scale application with full frontend/backend
  • Projects requiring weeks of data collection
  • Multiple independent research questions that should be separate projects

Finding the Right Balance

A good rule of thumb: your project should involve at least 3-4 substantial technical components (e.g., custom data processing + model fine-tuning + evaluation framework + analysis), where each component requires thoughtful design and implementation.

Project Types

Your project should fall into one or more of the following categories:

1. Novel Applications of LLMs

Design and implement a new application that leverages LLMs in creative ways. Focus on solving a real problem or creating something genuinely useful.

Examples:

  • Domain-specific assistants (medical diagnosis helper, legal research tool, educational tutor)
  • Creative tools (story co-writer, game master, music lyric generator)
  • Accessibility applications (sign language translation, simplified text generation)
  • Professional tools (code documentation generator, literature review assistant)

2. Research Replication and Extension

Reproduce results from a recent research paper, then extend the work with new experiments, datasets, or analyses.

Examples:

  • Replicate a fine-tuning approach from a paper, then test on new domains
  • Reproduce an evaluation study, then extend with additional models or metrics
  • Implement a published architecture, then ablate components to understand contributions

3. New Model Architectures or Training Approaches

Explore modifications to existing architectures or novel training procedures. This is advanced but achievable with GenAI help.

Examples:

  • Modified attention mechanisms for specific tasks
  • Novel fine-tuning approaches (e.g., parameter-efficient methods; a minimal setup is sketched after this list)
  • Hybrid architectures combining different model types
  • Custom loss functions or training objectives
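
To make the parameter-efficient option concrete, here is a minimal sketch of a LoRA setup using the Hugging Face transformers and peft libraries. The base model (gpt2), target modules, and hyperparameters are illustrative placeholders rather than recommendations:

    # Minimal LoRA setup: wrap a small pretrained model so that only low-rank
    # adapter weights are trained. All names and values here are placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    lora_config = LoraConfig(
        r=8,                        # rank of the low-rank update matrices
        lora_alpha=16,              # scaling factor for the adapter updates
        lora_dropout=0.05,
        target_modules=["c_attn"],  # GPT-2's fused attention projection layer
        fan_in_fan_out=True,        # required because GPT-2 uses Conv1D projections
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of the base weights

From here you would plug the wrapped model into a standard training loop or the transformers Trainer, exactly as you would for full fine-tuning.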

4. Analysis and Evaluation of LLMs

Conduct rigorous empirical studies of LLM capabilities, limitations, or behaviors.

Examples:

  • Systematic evaluation of reasoning abilities across multiple models
  • Bias analysis in specific domains or demographic groups
  • Consistency and reliability testing under various conditions
  • Comparative studies of different prompting strategies or model families (a small comparison harness is sketched after this list)
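
As a concrete starting point for the prompting-strategy comparison, here is a small harness built on the transformers pipeline API. The model (gpt2), the question, and the two prompt templates are placeholders; a real study would score outputs against references and aggregate over many items and models:

    # Compare two prompting strategies on one small open model.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    question = "If a train leaves at 3pm and travels for 2 hours, when does it arrive?"
    strategies = {
        "direct": question,
        "chain_of_thought": question + " Let's think step by step.",
    }

    for name, prompt in strategies.items():
        output = generator(prompt, max_new_tokens=40, do_sample=False)[0]["generated_text"]
        print(f"--- {name} ---\n{output[len(prompt):].strip()}\n")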

5. Multimodal and Agent-Based Systems

Build systems that integrate multiple modalities (text, image, audio) or implement autonomous agents with reasoning and tool use.

Examples:

  • Vision-language systems for image description or visual question answering
  • Agents that can plan, use tools, and complete complex tasks (a toy tool-use loop is sketched after this list)
  • Multi-agent systems with specialized roles and communication
  • Cross-modal retrieval or generation systems
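
The agent ideas above boil down to a loop that plans, calls tools, and incorporates results. The toy skeleton below shows that control flow only; the plan function is a stub that a real project would replace with an LLM call, and the tool names are made up for illustration:

    # Toy tool-using agent loop: plan -> call tool -> record result -> repeat.
    import math

    TOOLS = {
        "calculator": lambda expr: str(eval(expr, {"__builtins__": {}}, vars(math))),
        "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
    }

    def plan(goal, history):
        """Stub planner: in a real agent, an LLM would choose the next step."""
        if not history:
            return ("calculator", "2 + 2")
        return ("finish", history[-1][2])

    def run_agent(goal, max_steps=5):
        history = []
        for _ in range(max_steps):
            tool, arg = plan(goal, history)
            if tool == "finish":
                return arg
            result = TOOLS[tool](arg)
            history.append((tool, arg, result))
        return None

    print(run_agent("What is 2 + 2?"))  # -> 4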

6. Safety, Alignment, and Ethics

Investigate critical issues around AI safety, bias, fairness, or societal impact.

Examples:

  • Jailbreaking detection and mitigation strategies
  • Bias measurement and debiasing techniques
  • Factuality and hallucination analysis
  • Privacy and security considerations in LLM applications

Project Ideas

Here are 22 diverse project ideas to inspire you. These span different difficulty levels and interest areas. Feel free to use these as starting points, combine elements, or create something entirely new.

RAG and Retrieval Systems

  1. Domain-Specific RAG System: Build a retrieval-augmented generation system for a specialized domain (e.g., medical literature, legal documents, historical archives). Implement custom embeddings and retrieval strategies, and evaluate against baselines. A minimal retrieval skeleton is sketched after this list.

  2. Conversational Search Engine: Create a multi-turn conversational search system that maintains context, asks clarifying questions, and provides source citations. Compare different retrieval and reranking strategies.

  3. Hybrid Retrieval Architecture: Combine dense embeddings, sparse retrieval (BM25), and knowledge graphs for enhanced retrieval. Systematically evaluate each component's contribution.
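
For the RAG ideas above, the retrieval half of the system can start from a skeleton like the one below, which uses sentence-transformers for embeddings and FAISS for similarity search. The corpus, model name, and query are placeholders; a full project would add chunking, a generation step that conditions an LLM on the retrieved passages, and proper evaluation:

    # Minimal dense-retrieval skeleton for a RAG prototype.
    import faiss
    from sentence_transformers import SentenceTransformer

    corpus = [
        "Aspirin is commonly used to reduce fever and relieve mild pain.",
        "The Treaty of Westphalia was signed in 1648.",
        "Transformers use self-attention to model token interactions.",
    ]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_embeddings = encoder.encode(corpus, convert_to_numpy=True)
    faiss.normalize_L2(doc_embeddings)              # cosine similarity via inner product

    index = faiss.IndexFlatIP(doc_embeddings.shape[1])
    index.add(doc_embeddings)

    query = "What do transformers use to relate tokens?"
    query_embedding = encoder.encode([query], convert_to_numpy=True)
    faiss.normalize_L2(query_embedding)

    scores, ids = index.search(query_embedding, 2)  # top-2 passages
    retrieved = [corpus[i] for i in ids[0]]
    print(retrieved)                                # context to feed the generator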

Fine-Tuning and Adaptation

  1. Low-Resource Language Adaptation: Fine-tune an LLM for a low-resource language or dialect using limited data. Explore techniques like cross-lingual transfer, data augmentation, and parameter-efficient fine-tuning.

  2. Scientific Writing Assistant: Fine-tune a model to help researchers write better papers by learning from high-quality publications in a specific field. Implement style transfer and citation generation.

  3. Personalized Conversation Models: Develop methods to adapt dialogue models to individual user preferences and communication styles while preserving helpfulness and safety.

Dialogue and Interaction

  1. Multi-Turn Reasoning Dialogues: Build a conversational system that can engage in extended reasoning tasks (math problems, logic puzzles, strategic planning) while explaining its thought process.

  2. Empathetic Support Chatbot: Create a psychologically informed chatbot for emotional support or mental health check-ins. Evaluate empathy, safety, and effectiveness using human evaluation and automated metrics.

  3. Socratic Tutoring System: Implement an educational assistant that uses Socratic questioning to guide students through problem-solving rather than providing direct answers.

Code and Technical Applications

  1. Code Documentation Generator: Build a system that generates comprehensive documentation for code repositories, including docstrings, README files, and tutorials. Test on diverse codebases and programming languages.

  2. Bug Detection and Repair: Develop an LLM-based system for finding and fixing bugs in code. Create a benchmark and compare against existing tools.

  3. Technical Interview Practice System: Create an interactive system that conducts realistic technical interviews, provides feedback, and adapts difficulty based on performance.

Creative and Cultural Applications

  1. Interactive Fiction Engine: Build a system for collaborative storytelling where the model maintains narrative coherence, character consistency, and player agency across long interactions.

  2. Cross-Cultural Communication Assistant: Develop a tool that helps bridge cultural communication gaps by explaining context, suggesting phrasing, and highlighting potential misunderstandings.

  3. Historical Dialogue Simulation: Create historically accurate conversational agents representing figures from different time periods, trained on period-appropriate texts and evaluated for authenticity.

Cognitive Science and Modeling

  1. LLMs as Cognitive Models: Systematically evaluate whether LLMs exhibit human-like cognitive biases, heuristics, and reasoning patterns. Compare model behavior to psychological literature.

  2. Language Acquisition Simulation: Investigate how LLMs "learn" linguistic structures by analyzing learning trajectories, comparing to child language acquisition data.

  3. Theory of Mind in LLMs: Design experiments to test whether models exhibit theory of mind capabilities—understanding beliefs, intentions, and mental states of others.

Interpretability and Analysis

  1. Mechanistic Interpretability Study: Use techniques like attention visualization, activation probing, and causal interventions to understand how models perform specific tasks (e.g., factual recall, arithmetic, syntax). A starting point for extracting attention maps is sketched after this list.

  2. Failure Mode Taxonomy: Systematically categorize and analyze LLM failures across multiple models and tasks. Develop predictors for when failures occur and potential mitigations.
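
For attention-based analyses like the ones above, a common first step is to pull per-layer attention maps from a small pretrained model; the sketch below does exactly that. The model and sentence are placeholders, and probing or causal interventions would build on top of this:

    # Extract per-layer attention maps from a small pretrained model.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len)
    print(len(outputs.attentions), outputs.attentions[0].shape)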

Advanced Reasoning and Agents

  1. Multi-Step Planning Agent: Build an agent that can decompose complex goals, create plans, execute actions with tools, and adapt based on feedback. Test on challenging benchmarks.

  2. Scientific Hypothesis Generator: Create a system that reads research papers, identifies gaps, and proposes testable hypotheses. Evaluate novelty and plausibility with domain experts.

Requirements

Your final project must include four key deliverables:

1. Project Proposal (Due Week 8)

Submit a 1-2 page proposal that includes:

  • Title and team members: Who is working on this?
  • Motivation: What problem are you solving? Why does it matter?
  • Research questions: What specific questions will you investigate?
  • Approach: What methods, models, and datasets will you use?
  • Expected outcomes: What do you hope to achieve or demonstrate?
  • Timeline: How will you allocate work over the project period?
  • Division of labor: How will team members split responsibilities?

The proposal helps you clarify your ideas and allows us to provide early feedback to ensure your project is appropriately scoped.

2. Code Implementation (Due Week 10)

Submit a well-documented Jupyter notebook that:

  • Runs completely in Google Colaboratory without errors
  • Includes clear markdown explanations of your approach, decisions, and findings
  • Contains all code necessary to reproduce your results
  • Automatically downloads/loads any required datasets, models, or resources
  • Follows best practices: modular code, informative comments, reproducible results
  • Includes visualizations, tables, and figures to illustrate key findings
  • Provides clear instructions for running experiments or using your system

Code quality matters: Your notebook should be something you'd be proud to share publicly or include in a portfolio.

3. Presentation (Due Week 10)

Create a 10-12 minute video presentation and present it in class:

  • Video submission: Upload to YouTube (unlisted is fine) and include the link in your writeup
  • In-class component: Be prepared to present your video and lead a 5-10 minute discussion/Q&A
  • Content: Cover motivation, approach, key results, insights, and limitations
  • Visuals: Use slides, demos, visualizations—make it engaging and clear
  • Team participation: All team members should contribute to the presentation

The presentation should be accessible to your classmates—assume they're smart but may not know the specific technical details of your domain.

4. Written Writeup (Due Week 10)

Submit a 2-5 page writeup (not including references or appendices) structured as:

Introduction (0.5-1 page)

  • Motivation and background
  • Research questions or goals
  • Why this problem is interesting/important

Approach (1-2 pages)

  • Methods, models, datasets used
  • Technical details of your implementation
  • Design decisions and justifications

Results (1-1.5 pages)

  • Key findings, with figures and tables
  • Quantitative metrics and qualitative observations
  • Comparison to baselines or related work

Discussion (0.5-1 page)

  • Interpretation of results
  • Limitations and potential improvements
  • Broader implications
  • Future directions

References

  • Cite relevant papers, tools, datasets
  • Follow a consistent citation format (APA, IEEE, etc.)

Write clearly and concisely. Think of this as a mini research paper suitable for a workshop or course anthology.

Timeline and Milestones

Week 7: Team Formation and Brainstorming

  • Form teams of 2-3 students
  • Brainstorm project ideas
  • Review recent papers and existing systems for inspiration
  • Discuss with instructor or peers to refine ideas
  • Deliverable: Team formed, preliminary idea identified

Week 8: Proposal, Feedback, and Implementation Starts

  • Develop detailed project proposal (1-2 pages)
  • Identify datasets, models, and tools needed
  • Create timeline and divide initial responsibilities
  • Submit proposal and receive feedback
  • Begin implementation: setup environment, load data, test baseline approaches
  • Deliverable: Proposal submitted, initial codebase, preliminary experiments

Week 9: Core Development and Iteration

  • Implement main technical components
  • Run experiments and collect results
  • Iterate based on findings—adjust approaches as needed
  • Begin drafting writeup and presentation materials
  • Deliverable: Major progress on implementation, preliminary results

Week 10: Final Push and Presentations

  • Finalize code and ensure reproducibility
  • Complete all experiments and analysis
  • Finish writeup and presentation video
  • Present to class and engage in discussions
  • Deliverable: All final materials submitted, presentation delivered

Grading Rubric

Your project will be evaluated holistically, with the final grade broken down as:

Proposal (10%)

  • Clarity: Is the project clearly defined?
  • Feasibility: Is the scope appropriate for the timeline?
  • Significance: Does it address an interesting problem?

Implementation (40%)

  • Technical quality: Is the code well-written, documented, and reproducible?
  • Sophistication: Does it demonstrate mastery of course concepts?
  • Functionality: Does it work as intended?
  • Reproducibility: Can results be replicated in Colab?

Results and Analysis (20%)

  • Rigor: Are experiments well-designed and thorough?
  • Insights: Do results provide meaningful insights?
  • Evaluation: Are appropriate metrics and baselines used?
  • Critical thinking: Are limitations and interpretations discussed thoughtfully?

Presentation (15%)

  • Clarity: Is the presentation clear and well-organized?
  • Engagement: Is it interesting and accessible?
  • Completeness: Does it cover all key aspects?
  • Participation: Did all team members contribute?

Writeup (10%)

  • Writing quality: Is it clear, concise, and well-structured?
  • Completeness: Does it cover all required sections?
  • Depth: Does it provide sufficient detail and analysis?
  • References: Are sources properly cited?

Teamwork and Collaboration (5%)

  • Distribution of work: Did all team members contribute meaningfully?
  • Integration: Are components well-integrated into a cohesive whole?
  • Process: Did the team work effectively together?

Note: Exceptional projects that go above and beyond expectations may receive bonus points. Projects that demonstrate novel insights, publishable quality, or significant real-world applicability will be highlighted.

Resources and Support

Course Materials

Review all assignments and lectures—they contain techniques, tools, and insights directly applicable to your project:

  • Assignment 1 (ELIZA): Rule-based systems, pattern matching, conversation management
  • Assignment 2 (SPAM): Classification, evaluation metrics, text preprocessing
  • Assignment 3 (Wikipedia): Embeddings (LDA, BERT, GPT-2, Llama), clustering, dimensionality reduction (PCA, UMAP)
  • Assignment 4 (Customer Service): Dialogue systems, context management, response generation
  • Assignment 5 (GPT): Model architectures, training from scratch, transformer mechanics

Key Tools and Frameworks

  • Hugging Face Transformers: Pre-trained models, tokenizers, fine-tuning pipelines
  • Hugging Face Datasets: Access to thousands of datasets
  • OpenAI API / Anthropic API: Access to GPT-4, Claude, and other commercial models
  • LangChain: Building LLM applications, chains, and agents
  • FAISS / Pinecone: Vector databases for retrieval systems
  • PyTorch / TensorFlow: Deep learning frameworks
  • Weights & Biases / TensorBoard: Experiment tracking and visualization
  • Ollama: Running open-source models locally

Finding Datasets

Good starting points include the Hugging Face Hub, Kaggle, and the datasets released alongside the papers you build on.

Recent Research

Stay current with cutting-edge work by skimming recent preprints (e.g., on arXiv) and the related work cited in papers from your project area.

Getting Help

  • Discord: Use our course Discord for questions, discussions, and collaboration
  • Office Hours: Schedule time to discuss project ideas, troubleshoot issues, or get feedback
  • Peers: Discuss ideas with other teams—collaboration across teams is encouraged
  • GenAI Tools: Use ChatGPT, Claude, GitHub Copilot extensively—document what you use

Tips for Success

Start Early

Don't underestimate the time required. Even with GenAI assistance, projects take longer than expected. Starting early gives you time to iterate, handle unexpected challenges, and produce polished results.

Embrace Failure and Iteration

Not everything will work on the first try. That's normal in research. Build in time to try multiple approaches, debug issues, and refine your methods based on what you learn.

Use GenAI Aggressively

This is where you can be truly ambitious. Use tools like Claude, ChatGPT, and GitHub Copilot to:

  • Implement complex architectures you haven't built before
  • Write data processing pipelines quickly
  • Debug tricky errors
  • Generate boilerplate code
  • Understand unfamiliar libraries or papers
  • Brainstorm ideas and approaches

Remember: You're responsible for understanding and validating what AI generates. Always test, verify, and critically evaluate AI-generated code.

Scope Appropriately

It's better to do one thing really well than three things poorly. If you find your project is too ambitious, narrow the focus rather than delivering incomplete work.

Communicate Clearly

Technical sophistication is important, but so is clear communication. Make sure your code, writeup, and presentation are understandable to intelligent non-experts.

Document Everything

Keep track of experiments, decisions, and results as you go. This makes writing the final report much easier and helps you understand what worked and why.

Leverage Your Team's Strengths

Different team members may excel at different aspects (coding, experimentation, writing, visualization). Divide labor strategically but ensure everyone understands the full project.

Make It Yours

Choose a project you're genuinely excited about. Passion and curiosity will sustain you through challenges and lead to better results.

Example Projects from Prior Years

Note: As this is a new course, we don't yet have prior student projects to showcase. However, here are examples of the caliber of work we hope to see:

  • Cognitive bias evaluation framework: Systematic testing of whether LLMs exhibit human-like cognitive biases, with comparisons across model families
  • Domain-adapted medical QA system: Fine-tuned model for answering medical questions with retrieval augmentation, evaluated against human experts
  • Interactive mathematical reasoning tutor: Socratic dialogue system that guides students through proofs and problem-solving
  • Multimodal scientific figure understanding: System that can interpret graphs, diagrams, and figures from research papers and answer questions about them
  • Debate agent with evidence retrieval: Autonomous system that can argue positions on controversial topics by retrieving and citing evidence

Your project can match or exceed these examples. With modern tools and GenAI assistance, sophisticated projects are within reach.

Submission Guidelines

GitHub Classroom Submission

This assignment is submitted via GitHub Classroom. Follow these steps:

  1. Accept the assignment: Click the assignment link provided in Canvas or by your instructor

  2. Clone your repository:

    git clone https://github.com/ContextLab/final-project-llm-course-YOUR_USERNAME.git
  3. Complete your work:

    • Work in Google Colab, Jupyter, or your preferred environment
    • Save your notebook, writeup, and presentation materials to the repository
  4. Commit and push your changes:

    git add .
    git commit -m "Complete final project"
    git push
  5. Verify submission: Check that your latest commit appears in your GitHub repository before the deadline

Deadlines

  • Project Proposal: Week 8 (submit via GitHub Classroom)
  • Final Submission (Code, Writeup, Presentation): March 9, 2026 at 11:59 PM EST

Technical Requirements

Google Colaboratory Compatibility

Your project must run in Google Colab. This means:

  • Use publicly accessible datasets (or include download code)
  • Include all installation commands in the notebook (!pip install ...); a sample setup cell is shown after this list
  • Don't rely on local files or proprietary data
  • Test in a fresh Colab session before submission
  • Consider compute constraints (RAM, GPU availability)
  • Use !nvidia-smi to check GPU allocation if needed
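
For example, a first cell along these lines installs dependencies and confirms whether a GPU was allocated (the package list is just an illustration; adjust it to your project):

    # Example first Colab cell: install dependencies and check for a GPU.
    !pip install -q transformers datasets

    import torch
    print("GPU available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))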

Model and Data Accessibility

  • Use open-source models (e.g., from Hugging Face) or provide clear instructions for obtaining any required API keys
  • For large models, use quantized versions or API access
  • Include code to automatically download datasets (see the snippet after this list)
  • Document any manual setup steps clearly
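
For instance, loading a public dataset through the datasets library downloads and caches it automatically, so graders can re-run your notebook without any manual steps (the dataset name here is only an example):

    # Public datasets download and cache themselves on first use.
    from datasets import load_dataset

    dataset = load_dataset("imdb", split="train[:1000]")
    print(dataset[0]["text"][:200])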

Reproducibility

  • Set random seeds for reproducible results (a helper sketch follows this list)
  • Include version numbers for key libraries
  • Save and load model checkpoints appropriately
  • Document any hyperparameters used
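
A small helper along these lines covers the seed and version points; extend the library list to whatever your project actually uses:

    # Seed the common RNGs and record key library versions.
    import random
    import numpy as np
    import torch
    import transformers

    def set_seed(seed: int = 42):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)

    set_seed(42)
    print("torch", torch.__version__, "| transformers", transformers.__version__)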

Documentation

  • Use markdown cells liberally to explain your approach
  • Include docstrings for functions
  • Comment complex code sections
  • Provide a "Quick Start" section at the top of your notebook
  • Include a table of contents for longer notebooks

Final Thoughts

This final project is your chance to create something remarkable. You've learned about the full spectrum of language models—from simple pattern matching to sophisticated transformers. You've implemented these systems, evaluated them, and thought critically about their capabilities and limitations.

Now it's time to apply that knowledge to a problem you care about. With GenAI tools at your disposal, you can tackle projects that would have been impossible for a student team just a few years ago. Take advantage of this unique moment in history.

The best projects will:

  • Demonstrate technical sophistication and mastery
  • Provide novel insights or create genuinely useful systems
  • Show critical thinking about AI capabilities and limitations
  • Be executed with care, rigor, and attention to detail
  • Communicate results clearly and engagingly

We're excited to see what you create. Good luck, and remember: ambitious goals combined with thoughtful execution lead to extraordinary results.


Questions? Use the course Discord or attend office hours. We're here to help you succeed.

Ready to start? Form your team, start brainstorming, and prepare to build something amazing.
