How to teach new things to my AI. A small workshop using Jupyter Notebooks, DuckDB and Python.

jackbravo/embeddings-workshop

How to teach new things to your AI

A hands-on workshop exploring how to work with text embeddings for search and retrieval, using modern Python tools and libraries.

Companion to the talk "How to teach new things to your AI".

Overview

This workshop teaches the fundamentals of working with text embeddings through a practical Jupyter notebook that guides participants through:

  • Text extraction from PDFs
  • Semantic text chunking
  • Creating and working with embeddings
  • Vector similarity search
  • Reranking search results
  • Building a simple RAG (Retrieval Augmented Generation) system
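The first few steps in the list above (chunking, embedding, similarity search) can be sketched with the standard library alone. This is a toy illustration, not the notebook's code: `chunk_text`, `toy_embed`, `cosine` and `search` are hypothetical names, the chunker groups whole sentences rather than doing true semantic chunking, and the word-count "embedding" stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Group whole sentences into chunks of at most max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text: str) -> Counter:
    """Assumption: a sparse word-count vector replaces a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In a RAG setup, the chunks returned by `search` would be pasted into the prompt as context for the language model; a reranker would re-score those same candidates with a stronger model before that step.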

Prerequisites

  • Python 3.12
  • Basic familiarity with Python and Jupyter notebooks
  • Understanding of basic NLP concepts
  • A text editor (VS Code recommended)

Setup

  1. Install Python 3.12 using a version manager like:

  2. Clone this repository and navigate to the project directory:

git clone [repository-url]
cd [repository-name]

  3. Create and activate a virtual environment:

uv venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows

  4. Install dependencies:

uv pip install -r requirements.txt

Getting Started

  1. Launch Jupyter Notebook:

jupyter notebook

  2. Open embeddings.ipynb and follow along with the tutorial.

What You'll Learn

  • How to extract and process text from PDF documents
  • Techniques for semantic text chunking
  • Creating and working with text embeddings
  • Implementing vector similarity search using DuckDB
  • Using rerankers to improve search results
  • Building a simple question-answering system

Additional Resources
