AI Evaluations Cookbooks

A collection of practical notebooks demonstrating systematic approaches to building, evaluating, and improving AI applications. These cookbooks provide hands-on guidance for developing more effective AI systems through data-driven methodologies.

Overview

Building effective AI applications requires more than just connecting to the latest LLM API. This repository provides structured approaches to developing systems that are reliable, efficient, and continuously improving. Each notebook in this collection focuses on a specific technique and walks through a methodical process for:

  1. Establishing evaluation frameworks - Creating robust metrics to measure performance
  2. Systematic improvement - Using data-driven approaches to enhance capabilities
  3. Performance visualization - Tracking improvements and identifying bottlenecks

Notebooks

A step-by-step guide to building and improving a Retrieval-Augmented Generation (RAG) application. This notebook covers:

  • Implementing effective retrieval strategies
  • Evaluating RAG performance with meaningful metrics (see the sketch after this list)
  • Systematically improving retrieval and generation quality
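To make "meaningful metrics" concrete, here is a minimal recall@k sketch of the kind of retrieval evaluation the notebook builds up; the model name, the two-document corpus, and the relevance labels are illustrative assumptions rather than the notebook's actual data:

```python
# Minimal sketch: scoring a retriever with recall@k.
# The embedding model, corpus, queries, and relevant_ids are placeholders --
# swap in your own retriever and labelled evaluation set.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = ["LangWatch traces and evaluates LLM calls.",
          "RAG retrieves supporting documents before generation."]
queries = ["What does RAG do?"]
relevant_ids = [[1]]  # for each query, indices of the documents that should be retrieved

corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode(queries, normalize_embeddings=True)

def recall_at_k(query_emb, corpus_emb, relevant_ids, k=1):
    scores = query_emb @ corpus_emb.T           # cosine similarity (embeddings are normalized)
    top_k = np.argsort(-scores, axis=1)[:, :k]  # best k document indices per query
    hits = [len(set(top) & set(rel)) / len(rel) for top, rel in zip(top_k, relevant_ids)]
    return float(np.mean(hits))

print(f"recall@1: {recall_at_k(query_emb, corpus_emb, relevant_ids, k=1):.2f}")
```

The same harness extends naturally to other ranking metrics such as MRR or NDCG once a labelled query set is in place.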

Learn how to fine-tune embedding models to significantly improve retrieval performance. This notebook covers:

  • Fine-tuning open-source embedding models using triplet loss (see the sketch after this list)
  • Evaluating and visualizing performance improvements
  • Applying techniques from industry case studies (like Ramp's transaction categorization)
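As a rough illustration of the fine-tuning step, here is a minimal triplet-loss sketch using the sentence-transformers training API; the two transaction-style triplets are made-up placeholders (loosely echoing the Ramp-style categorization setting), and real fine-tuning needs many mined (anchor, positive, negative) examples:

```python
# Minimal sketch: fine-tuning an open-source embedding model with triplet loss.
# Each InputExample holds (anchor, positive, negative); the data below is invented.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

triplets = [
    InputExample(texts=["coffee shop purchase", "cafe latte 4.50", "monthly rent payment"]),
    InputExample(texts=["airline ticket to SFO", "flight booking fee", "grocery store run"]),
]

train_dataloader = DataLoader(triplets, shuffle=True, batch_size=2)
train_loss = losses.TripletLoss(model=model)  # pulls anchor/positive together, pushes negatives away

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("finetuned-embedding-model")
```

Re-running the same retrieval metrics on the saved model is what makes the improvement measurable rather than anecdotal.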

Explore how to enhance retrieval performance by implementing metadata filtering in RAG applications. This notebook covers:

  • Implementing both semantic search and metadata-filtered search approaches
  • Evaluating and comparing approaches using industry-standard metrics (see the sketch after this list)
  • Drawing data-driven insights to optimize your own retrieval systems
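A minimal sketch of the two approaches side by side, kept library-light for clarity; the documents, metadata fields, and query below are illustrative assumptions:

```python
# Minimal sketch: plain semantic search vs. metadata-filtered semantic search.
# Documents, metadata fields, and the query are invented placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    {"text": "Refund policy for enterprise plans", "category": "billing",  "year": 2024},
    {"text": "How to rotate API keys",             "category": "security", "year": 2023},
    {"text": "Billing cycle and invoicing dates",  "category": "billing",  "year": 2023},
]
doc_emb = model.encode([d["text"] for d in docs], normalize_embeddings=True)

def search(query, top_k=2, filters=None):
    # Metadata filtering: keep only documents whose fields match, then rank the rest semantically.
    idx = [i for i, d in enumerate(docs)
           if not filters or all(d.get(key) == value for key, value in filters.items())]
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb[idx] @ q
    ranked = [idx[i] for i in np.argsort(-scores)[:top_k]]
    return [docs[i]["text"] for i in ranked]

print(search("invoice questions"))                                   # pure semantic search
print(search("invoice questions", filters={"category": "billing"}))  # metadata-filtered search
```

Comparing the two variants on the same labelled query set makes the benefit (or cost) of filtering measurable.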

Learn how to measure and improve tool calling capabilities in AI assistants using precision and recall metrics. This notebook covers:

  • Creating a framework for evaluating tool selection decisions (see the sketch after this list)
  • Analyzing per-tool performance to identify specific improvement areas
  • Systematically enhancing multi-tool coordination for complex tasks
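A minimal sketch of the precision/recall framing for tool selection; the tool names and the single evaluation case are hypothetical:

```python
# Minimal sketch: scoring tool selection with precision and recall.
# Expected = tools the assistant should have called; predicted = tools it actually called.
def tool_selection_scores(expected: set[str], predicted: set[str]) -> dict[str, float]:
    true_positives = len(expected & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0  # fraction of calls that were needed
    recall = true_positives / len(expected) if expected else 1.0       # fraction of needed calls that were made
    return {"precision": precision, "recall": recall}

# One hypothetical case: the assistant misses one required tool and adds an unnecessary one.
expected = {"search_flights", "get_weather"}
predicted = {"search_flights", "convert_currency"}

print(tool_selection_scores(expected, predicted))
# {'precision': 0.5, 'recall': 0.5}
```

Aggregating these scores per tool, rather than only overall, is what surfaces which specific tools the assistant over- or under-uses.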

Getting Started

  1. Clone this repository
  2. Install the required dependencies: pip install -r requirements.txt
  3. Open the notebooks in Jupyter or your preferred notebook environment
  4. Follow along with the step-by-step instructions

Contributing

Contributions are welcome! If you have ideas for new notebooks or improvements to existing ones, please open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
