This project integrates ChromaDB, Google Gemini AI, and the YouTube Transcript API to handle vector embeddings, text processing, and CRUD operations. It supports semantic search, text embeddings, and AI-generated keynotes from video transcripts.
- ChromaDB integration for vector storage
- Google Gemini AI for text embeddings and summarization
- YouTube Transcript API to fetch video transcripts
- CRUD operations (Create, Read, Update, Delete) on embeddings
- Persistent and ephemeral database modes using chromadb
- File handling for storing transcripts and notes
Ensure you have Python installed, then run:
pip install -r requirements.txt
Create a .env file with:
GOOGLE_API_KEY=your_gemini_api_key
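How the key is read at startup is not shown in this README. The following is a minimal sketch, assuming the project reads GOOGLE_API_KEY from the environment. Both load_env_file and get_api_key are hypothetical stdlib-only helpers; the project itself may use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def get_api_key():
    """Return the Gemini API key or fail early with a readable message."""
    load_env_file()
    api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return api_key
```

Failing early here gives a clearer error than letting the first Gemini request fail with an authentication problem.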
Run the application:
python main.py
Defined in utils.py, the get_client() function initializes a ChromaDB client:
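import chromadb

def get_client(client_type=None, path=None):
    # Treat a missing client_type as non-persistent instead of calling .lower() on None.
    if (client_type or '').lower() != 'persistent' and path is None:
        return chromadb.EphemeralClient()
    return chromadb.PersistentClient(path=path)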
- Persistent Mode: Saves embeddings for future retrieval.
- Ephemeral Mode: Stores data only for the current session.
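The choice between the two modes can be separated from client construction so it is easy to check in isolation. This is a sketch under stated assumptions, not the project's code: resolve_client_mode and make_client are hypothetical names. It also differs from get_client in one respect: it rejects persistent mode without a path instead of passing path=None through.

```python
def resolve_client_mode(client_type=None, path=None):
    """Return 'persistent' or 'ephemeral' for the given arguments (hypothetical helper)."""
    wants_persistent = (client_type or "").strip().lower() == "persistent"
    if wants_persistent and path is None:
        raise ValueError("persistent mode needs a path to store data in")
    # As in get_client, supplying a path alone also selects persistent storage.
    return "persistent" if wants_persistent or path is not None else "ephemeral"


def make_client(client_type=None, path=None):
    """Build the matching ChromaDB client; chromadb is imported lazily."""
    import chromadb  # third-party: pip install chromadb

    if resolve_client_mode(client_type, path) == "persistent":
        return chromadb.PersistentClient(path=path)
    return chromadb.EphemeralClient()
```

For example, make_client("persistent", "./chroma_db") would keep data on disk between runs, while make_client() would return an in-memory client; the directory name is only an illustration.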
Collections store vector embeddings and text data.
def get_or_create_collection(client, name='my_collection', embedding_function=None, data_loader=None):
    return client.get_or_create_collection(
        name=name,
        embedding_function=embedding_function,
        data_loader=data_loader
    )
Extracts keynotes from a transcript using Gemini AI:
response = genai_model.generate_content(prompt + transcript, stream=False)
with open(notes_path, "w") as file:
    file.write(response.text)
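The snippet above can be wrapped in a small function so the model and output path are passed in explicitly. This is a minimal sketch: write_keynotes and DEFAULT_PROMPT are hypothetical names, the prompt text is a placeholder, and genai_model is assumed to be any object with the generate_content method used above.

```python
from pathlib import Path

DEFAULT_PROMPT = "Extract the keynotes from this transcript:\n\n"  # hypothetical prompt text


def write_keynotes(genai_model, transcript, notes_path, prompt=DEFAULT_PROMPT):
    """Ask the model for keynotes and save them to notes_path (hypothetical helper)."""
    if not transcript.strip():
        raise ValueError("transcript is empty; nothing to summarize")
    response = genai_model.generate_content(prompt + transcript, stream=False)
    notes = response.text
    notes_file = Path(notes_path)
    # Create the parent folder so the write does not fail on a fresh checkout.
    notes_file.parent.mkdir(parents=True, exist_ok=True)
    # Write UTF-8 explicitly; the platform default encoding may not cover the model output.
    notes_file.write_text(notes, encoding="utf-8")
    return notes
```

Passing the model as an argument also makes the function easy to exercise with a stand-in object that returns canned text.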
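Fetches a video transcript and formats it as plain text:
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'en-US'])
formatted_transcript = TextFormatter().format_transcript(transcript)
Generates keynotes from the formatted transcript and stores them in the collection:
# Pass the formatted string; the raw transcript is a list of segments and cannot be concatenated to the prompt.
response = genai_model.generate_content("Extract keynotes: " + formatted_transcript)
collection.upsert(ids=[video_id], documents=[response.text])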
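The calls above take a video ID (the video_id variable) rather than a full URL. If users are expected to paste links, a helper along these lines could normalize the input first. It is a sketch only: extract_video_id is a hypothetical name, and it handles just the common watch, youtu.be, embed, and shorts URL shapes.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id):
    """Return the video ID from a YouTube URL, or the input itself if it has no URL scheme."""
    parsed = urlparse(url_or_id)
    if not parsed.scheme:
        return url_or_id  # no scheme, so treat the input as a bare ID
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0]
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v", [])
            if ids:
                return ids[0]
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0]
    raise ValueError(f"could not find a video ID in {url_or_id!r}")
```

The returned ID can then be used both for the transcript request and as the record ID in the collection, as the snippet above does.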
- Insert data: collection.upsert()
- Retrieve data: collection.get()
- Update existing data: collection.update()
- Delete data: collection.delete()
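The four operations listed above can be exercised together in one round trip. This is a minimal sketch: crud_roundtrip is a hypothetical helper, and collection is assumed to be a ChromaDB collection (or any object exposing the same upsert, get, update, and delete methods with ids and documents keyword arguments).

```python
def crud_roundtrip(collection, doc_id, text, revised_text):
    """Run upsert, get, update, and delete against a ChromaDB-style collection."""
    # Create (or overwrite) the record.
    collection.upsert(ids=[doc_id], documents=[text])
    created = collection.get(ids=[doc_id])["documents"][0]

    # Replace the stored document for the existing ID.
    collection.update(ids=[doc_id], documents=[revised_text])
    updated = collection.get(ids=[doc_id])["documents"][0]

    # Delete the record, then confirm the ID is no longer returned.
    collection.delete(ids=[doc_id])
    remaining = collection.get(ids=[doc_id])["ids"]
    return created, updated, remaining
```

With a collection obtained from get_or_create_collection, the function should return the original text, the revised text, and an empty list of IDs.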
- ChromaDB Docs: https://github.com/chroma-core/chroma
- Google Gemini AI: https://ai.google.dev/
- YouTube Transcript API: https://pypi.org/project/youtube-transcript-api/
This project is licensed under the GNU General Public License; see the LICENSE file for details.
For any inquiries or contributions, please feel free to reach out.
- GitHub Profile: kivanc57
- Email: [email protected]