🎯 Vector Embeddings

📌 Overview

This project integrates ChromaDB, Google Gemini AI, and the YouTube Transcript API to generate, store, and manage vector embeddings. It covers text processing, CRUD operations on stored embeddings, semantic search, and AI-generated keynotes from video transcripts.

🚀 Features

ChromaDB integration for vector storage
Google Gemini AI for text embeddings and summarization
YouTube Transcript API to fetch video transcripts
CRUD operations (Create, Read, Update, Delete) on embeddings
Persistent and ephemeral database modes using chromadb
File handling for storing transcripts and notes
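To make the semantic search feature concrete, here is a minimal sketch of ranking documents by vector similarity. It uses cosine similarity, which is only one common measure; the distance metric ChromaDB applies in this project may differ. The function names are hypothetical and not part of the project.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In practice the vector store does this ranking for you; the sketch only illustrates why texts with similar embeddings are returned first.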

🛠️ Setup & Installation

1️⃣ Install Dependencies

Ensure you have Python installed, then run:

pip install -r requirements.txt

2️⃣ Set Up Environment Variables

Create a .env file with:

GOOGLE_API_KEY=your_gemini_api_key
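As a rough illustration of how the key can be read at startup, the sketch below parses a .env file with the standard library only. The project may load the file differently (for example with python-dotenv); load_env_file and require_api_key are hypothetical helpers, not functions from this repository.

```python
import os

def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a file into os.environ and return them as a dict."""
    loaded = {}
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            # Variables already set in the shell take precedence over the file.
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded

def require_api_key(name="GOOGLE_API_KEY"):
    """Return the key from the environment or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Failing early with a clear message is usually easier to debug than an authentication error from the first API call.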

3️⃣ Run the Project

python main.py

📝 Key Functionalities

🔹 ChromaDB Client

Defined in utils.py, the get_client() function initializes a ChromaDB client:

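import chromadb

def get_client(client_type=None, path=None):
    # Treat a missing client_type as "not persistent" instead of calling .lower() on None.
    if (client_type or '').lower() != 'persistent' and path is None:
        return chromadb.EphemeralClient()
    return chromadb.PersistentClient(path=path)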

Persistent Mode: Saves embeddings for future retrieval.
Ephemeral Mode: Stores data only for the current session.
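The condition inside get_client() can be read as a small decision table. The sketch below mirrors that condition in a hypothetical helper, resolve_client_mode, which is not part of utils.py and assumes a missing client_type counts as "not persistent".

```python
def resolve_client_mode(client_type=None, path=None):
    """Return 'ephemeral' or 'persistent' for the given arguments."""
    requested = (client_type or "").lower()
    # Ephemeral only when persistence was not requested and no path was given.
    if requested != "persistent" and path is None:
        return "ephemeral"
    return "persistent"

# No arguments: in-memory client for the current session.
print(resolve_client_mode())
# Explicit persistent mode with a storage path (the path is a placeholder).
print(resolve_client_mode("persistent", "./chroma_store"))
# A path on its own also selects persistent mode.
print(resolve_client_mode(path="./chroma_store"))
```

Note that under this condition, supplying a path is enough to get a persistent client, even when client_type is omitted.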

🔹 Creating & Managing Collections

Collections store vector embeddings and text data.

def get_or_create_collection(client, name='my_collection', embedding_function=None, data_loader=None):
    return client.get_or_create_collection(
        name=name,
        embedding_function=embedding_function,
        data_loader=data_loader
    )
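Once a collection holds documents, it can be queried by meaning rather than by exact match. The following is a minimal sketch, assuming the collection object exposes ChromaDB's query() method with its default result fields; semantic_search is a hypothetical helper name.

```python
def semantic_search(collection, query_text, n_results=3):
    """Return (id, document, distance) tuples for the closest matches."""
    results = collection.query(query_texts=[query_text], n_results=n_results)
    # query() returns one result list per query text; a single query was sent,
    # so only the first list of each field is needed.
    ids = results["ids"][0]
    documents = results["documents"][0]
    distances = results["distances"][0]
    return list(zip(ids, documents, distances))
```

A call such as semantic_search(collection, "main ideas about embeddings") would then return the stored notes closest to that phrase, with smaller distances indicating closer matches.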

🔹 Generating AI-Based Keynotes

Extracts keynotes from a transcript using Gemini AI:

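# Ask Gemini for keynotes; prompt and transcript are plain strings.
response = genai_model.generate_content(prompt + transcript, stream=False)
# Save the generated notes to disk.
with open(notes_path, "w", encoding="utf-8") as file:
    file.write(response.text)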
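The same steps can be wrapped in a reusable function. This is a sketch under assumptions: write_keynotes is a hypothetical helper, and model can be any object with a generate_content() method whose result has a text attribute, as in the snippet above.

```python
from pathlib import Path

def write_keynotes(model, transcript, notes_path, prompt="Extract keynotes: "):
    """Generate keynotes for a transcript and save them to notes_path."""
    if not transcript.strip():
        raise ValueError("transcript is empty; nothing to summarize")
    response = model.generate_content(prompt + transcript, stream=False)
    notes = response.text
    target = Path(notes_path)
    # Create the parent folder so a first run does not fail on a missing directory.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(notes, encoding="utf-8")
    return notes
```

Passing the model in as an argument keeps the function easy to test with a stand-in object and avoids a hard dependency on one client configuration.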

📌 Usage Guide

1️⃣ Fetch and Store Video Transcripts

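from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'en-US'])
formatted_transcript = TextFormatter().format_transcript(transcript)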
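To cover the "store" half of this step, the sketch below writes a transcript to a text file. It assumes the fetched transcript is a list of dictionaries with a "text" key, which is the shape get_transcript() returns in the library versions that provide it; transcript_to_text and save_transcript are hypothetical helpers.

```python
def transcript_to_text(entries):
    """Join the 'text' field of each transcript entry into one string."""
    parts = (entry["text"].strip() for entry in entries)
    # Drop entries that are empty after trimming whitespace.
    return " ".join(part for part in parts if part)

def save_transcript(entries, path):
    """Write the joined transcript text to path and return it."""
    text = transcript_to_text(entries)
    with open(path, "w", encoding="utf-8") as transcript_file:
        transcript_file.write(text)
    return text
```

Saving the plain text once means later steps, such as keynote generation, can reread the file instead of fetching the transcript again.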

2️⃣ Generate and Store Keynotes

# Pass the formatted text; the raw transcript is a list of entries, not a string.
response = genai_model.generate_content("Extract keynotes: " + formatted_transcript)
collection.upsert(ids=[video_id], documents=[response.text])

3️⃣ Perform CRUD Operations

Insert data: collection.upsert()
Retrieve data: collection.get()
Update existing data: collection.update()
Delete data: collection.delete()
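The four calls above can be chained into one round trip. This is a minimal sketch, assuming collection is a ChromaDB collection (or any object with the same upsert, get, update, and delete methods); crud_roundtrip is a hypothetical helper used only for illustration.

```python
def crud_roundtrip(collection, doc_id, original, revised):
    """Insert, read, update, and delete one document.

    Returns the documents read back after the insert, the update, and the delete.
    """
    # Create: upsert adds the record, or overwrites it if the ID already exists.
    collection.upsert(ids=[doc_id], documents=[original])
    after_insert = collection.get(ids=[doc_id])["documents"]

    # Update: change the stored text for an ID that is already present.
    collection.update(ids=[doc_id], documents=[revised])
    after_update = collection.get(ids=[doc_id])["documents"]

    # Delete: remove the record; a later get() should return no documents.
    collection.delete(ids=[doc_id])
    after_delete = collection.get(ids=[doc_id])["documents"]

    return after_insert, after_update, after_delete
```

The practical difference to keep in mind is that upsert() creates the record when the ID is new, whereas update() is meant for IDs that already exist.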

🔗 Additional Information

ChromaDB Docs: https://github.com/chroma-core/chroma
Google Gemini AI: https://ai.google.dev/
YouTube Transcript API: https://pypi.org/project/youtube-transcript-api/

📜 License

This project is licensed under the GNU General Public License. See the LICENSE file for details.

📬 Contact

For any inquiries or contributions, please feel free to reach out.

GitHub Profile: kivanc57
Email: [email protected]
