How to teach new things to your AI

A hands-on workshop exploring how to work with text embeddings for search and retrieval, using modern Python tools and libraries.

Companion to the talk "How to teach new things to your AI".

Overview

This workshop teaches the fundamentals of working with text embeddings in a practical Jupyter notebook that walks participants through:

  • Text extraction from PDFs
  • Semantic text chunking
  • Creating and working with embeddings
  • Vector similarity search
  • Reranking search results
  • Building a simple RAG (Retrieval Augmented Generation) system
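To make the vector similarity search step concrete, here is a minimal sketch in plain Python. It is not taken from the workshop notebook: the three-dimensional vectors are hand-made stand-ins for real embeddings, and the helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy corpus: (text, vector). Real embeddings come from a model and
# typically have hundreds of dimensions; these values are made up.
corpus = [
    ("DuckDB is an in-process database.", [0.9, 0.1, 0.0]),
    ("Embeddings map text to vectors.", [0.1, 0.9, 0.2]),
    ("Rerankers reorder search results.", [0.0, 0.3, 0.9]),
]

query_vec = [0.2, 0.8, 0.1]  # made-up embedding of a query about embeddings
results = top_k(query_vec, corpus, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A reranking step would then take these top candidates and reorder them with a more expensive model, and a RAG system would pass the best ones to a language model as context.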

Prerequisites

  • Python 3.12
  • Basic familiarity with Python and Jupyter notebooks
  • Understanding of basic NLP concepts
  • A text editor (VS Code recommended)

Setup

  1. Install Python 3.12, for example with a Python version manager.

  2. Clone this repository and navigate to the project directory:

     git clone [repository-url]
     cd [repository-name]

  3. Create and activate a virtual environment:

     uv venv
     source .venv/bin/activate  # On Unix/macOS
     # or
     .venv\Scripts\activate  # On Windows

  4. Install dependencies:

     uv pip install -r requirements.txt

Getting Started

  1. Launch Jupyter Notebook:

     jupyter notebook

  2. Open embeddings.ipynb and follow along with the tutorial.

What You'll Learn

  • How to extract and process text from PDF documents
  • Techniques for semantic text chunking
  • Creating and working with text embeddings
  • Implementing vector similarity search using DuckDB
  • Using rerankers to improve search results
  • Building a simple question-answering system
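As a rough illustration of the chunking step, the sketch below splits text into overlapping windows of sentences using only the standard library. This is a simple fixed-size stand-in, not the semantic chunking technique the notebook covers, and the function name `chunk_sentences` and its parameters are hypothetical.

```python
import re


def chunk_sentences(text, max_sentences=3, overlap=1):
    """Split text into chunks of up to max_sentences sentences,
    repeating `overlap` sentences between consecutive chunks."""
    if overlap >= max_sentences:
        raise ValueError("overlap must be smaller than max_sentences")

    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    step = max_sentences - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + max_sentences]))
        # Stop once a window has reached the end of the text
        if start + max_sentences >= len(sentences):
            break
    return chunks


sample = "One. Two. Three. Four. Five."
chunks = chunk_sentences(sample, max_sentences=3, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence that depends on the one before it is less likely to be separated from it at a chunk boundary.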

Additional Resources