adding embedding system #59

Merged
merged 6 commits into from
Jul 12, 2023
zimventures

Tackles #45

The embedding application is responsible for converting text into embeddings. The only provider that is currently implemented is OpenAI, but the hooks are in place to support others.
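For illustration only, the provider hooks could look roughly like the sketch below. Every name in it (`BaseEmbeddingProvider`, `register_provider`, `get_provider`, `FakeProvider`) is hypothetical and not taken from this PR, and the fake provider stands in for a real OpenAI call so the snippet runs without network access.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseEmbeddingProvider(ABC):
    """Hypothetical interface that each embedding provider would implement."""

    @abstractmethod
    def embed(self, text: str, model: str) -> List[float]:
        """Return the embedding vector for `text`."""


# Hypothetical registry mapping a service name to its provider class.
PROVIDERS: Dict[str, Type[BaseEmbeddingProvider]] = {}


def register_provider(name: str):
    """Class decorator that registers a provider under `name`."""
    def decorator(cls):
        PROVIDERS[name] = cls
        return cls
    return decorator


@register_provider("fake")
class FakeProvider(BaseEmbeddingProvider):
    """Stand-in provider so the sketch runs offline."""

    def embed(self, text: str, model: str) -> List[float]:
        # Not a real embedding: one float per character, for demonstration only.
        return [float(ord(ch)) for ch in text]


def get_provider(name: str) -> BaseEmbeddingProvider:
    """Look up and instantiate the provider registered under `name`."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown embedding service: {name}") from None
```

With a layout like this, supporting another service would mean adding one decorated class, without touching the callers.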

The three views that were added (list, delete, and create) all return a JsonResponse object and do not render any HTML. The intention is that these views will be called from other applications.


@alex-nork alex-nork left a comment

Looks good. One thing I'm wondering is how we get from the query in a scan's rules to the embeddings we use to do VSS on Redis.

chirps/chirps/urls.py (resolved)
chirps/embedding/providers/openai.py (resolved)
@zimventures

> Looks good. One thing I'm wondering is how we get from the query in a scan's rules to the embeddings we use to do VSS on Redis.

Good question:

The idea is that targets which need to generate embeddings (Redis & Pinecone) will convert the incoming text to embeddings by calling the new embedding_create route directly. Usually, when you call a view directly, you do it from another view. Because we're doing it from a task, things get a little... weird. While we do have the user object, constructing an HttpRequest object isn't straightforward (you need the user's session and other state that is generally set by the middleware).

To avoid needing all of that, I'm going to refactor the logic in embedding.views:create() out into a library function that can be called directly by both the task and the view.

Example:

from embedding.lib import create_embedding

# Generate embedding from within the provider
embedding_data = create_embedding(text, model, service, user)
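A rough sketch of how that split might look, assuming the argument order shown above. The function bodies, the `PROVIDERS` table, `create_view`, and `scan_task` are hypothetical stand-ins rather than the code in this PR; the view is reduced to a plain function returning a JSON string so the example runs without Django.

```python
import json
from typing import Callable, Dict, List


def _fake_embed(text: str, model: str) -> List[float]:
    # Placeholder for a real provider call: one float per word.
    return [float(len(word)) for word in text.split()]


# Hypothetical table of embedding services.
PROVIDERS: Dict[str, Callable[[str, str], List[float]]] = {"fake": _fake_embed}


def create_embedding(text: str, model: str, service: str, user: str) -> dict:
    """Shared logic: needs no HttpRequest, session, or middleware."""
    if service not in PROVIDERS:
        raise ValueError(f"Unsupported embedding service: {service}")
    vector = PROVIDERS[service](text, model)
    return {"text": text, "model": model, "service": service,
            "user": user, "vector": vector}


def create_view(post_data: dict, user: str) -> str:
    """Stand-in for the view: unpacks the request data, returns JSON."""
    try:
        data = create_embedding(
            post_data["text"], post_data["model"], post_data["service"], user
        )
    except (KeyError, ValueError) as exc:
        return json.dumps({"status": "error", "message": str(exc)})
    return json.dumps({"status": "success", "embedding": data})


def scan_task(query: str, user: str) -> List[float]:
    """Stand-in for the task: calls the library function directly."""
    return create_embedding(query, "fake-model", "fake", user)["vector"]
```

The view keeps only request handling and response formatting, so the task never has to fake a request to reach the same logic.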

@zimventures zimventures merged commit 2b6d7e9 into main Jul 12, 2023
2 checks passed
@zimventures zimventures deleted the zim/45 branch August 14, 2023 13:08