Azure Cognitive Search - Generate Embeddings

DISCLAIMER: This app is not for production use. It is strictly a sample showing how customers can use OpenAI models to generate embeddings as inputs for machine learning models or search applications.

This repository contains code samples in Python and JavaScript that use the OpenAI API to generate query and document embeddings. The embeddings are created with the OpenAI text-embedding-ada-002 model, and the resulting embeddings are saved in a JSON file for each input. The goal of this project is to provide a simple and efficient method for generating embeddings, which can be used for a variety of natural language processing tasks, such as semantic search, clustering, and classification.
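As an illustration of the "one JSON file per input" idea, here is a minimal sketch in Python. It is not the repository's actual code: the function name save_embeddings, the record fields text and embedding, the docVector_N.json file naming, and the fake_embed stand-in are all assumptions made for this example. In real use, the embed callable would wrap a call to the embeddings API.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def save_embeddings(
    texts: Iterable[str],
    embed: Callable[[str], List[float]],
    out_dir: Path,
    prefix: str = "docVector",  # hypothetical file-name prefix
) -> List[Path]:
    """Embed each text and write one JSON file per input."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i, text in enumerate(texts):
        # Hypothetical record layout: the source text plus its vector.
        record = {"text": text, "embedding": embed(text)}
        path = out_dir / f"{prefix}_{i}.json"
        path.write_text(json.dumps(record), encoding="utf-8")
        written.append(path)
    return written


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embeddings call, so the sketch runs offline."""
    return [float(len(text)), float(text.count(" "))]
```

Calling save_embeddings(["hello world"], fake_embed, Path("output")) would write output/docVector_0.json. Passing the embedding function in as a parameter keeps the file-writing logic independent of which service produces the vectors.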

Features

Generate embeddings from your documents after they have been pre-processed and enriched
One-click export for docVectors and queryVectors

Getting Started

Prerequisites

Azure OpenAI or OpenAI service
Text data in JSON format
For the JavaScript sample, you will need Node (I'm using version 16.x)
For the Python sample, you will need Python (I'm using version 3.9 in a Jupyter notebook)

Once you've met all the prerequisites, follow these steps:

1. Clone this repository to your local machine using git clone https://github.com/farzad528/embeddings-generator.git
2. Change into the newly created directory using cd [your directory here]
3. For JavaScript, install the required dependencies by running npm install or yarn install.

3.1 For JavaScript, run the Node.js file using the node name-of-file.js command

3.2 For Python, open the Jupyter notebook and run its cells, for example after launching it with the jupyter notebook name-of-file.ipynb command (a .ipynb file cannot be run directly with the python command)

4. Configure variables

4.1 Create a .env file in the folder root with the following info:

 OPENAI_SERVICE_NAME=your-azure-openai-service-name
 DEPLOYMENT_NAME=text-embedding-ada-002
 OPENAI_API_VERSION=2022-12-01
 OPENAI_API_KEY=your-api-key
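
To show how these four variables could fit together, here is a hedged Python sketch that parses .env-style text and assembles an Azure OpenAI embeddings request. The helper names parse_env and build_embedding_request are hypothetical, and the samples in this repository may read the variables differently. The URL shape and the api-key header follow the Azure OpenAI REST convention for deployment-based endpoints.

```python
import json
from typing import Dict, Tuple


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_embedding_request(
    env: Dict[str, str], text: str
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for one embeddings call (nothing is sent)."""
    url = (
        f"https://{env['OPENAI_SERVICE_NAME']}.openai.azure.com"
        f"/openai/deployments/{env['DEPLOYMENT_NAME']}/embeddings"
        f"?api-version={env['OPENAI_API_VERSION']}"
    )
    headers = {
        "Content-Type": "application/json",
        "api-key": env["OPENAI_API_KEY"],
    }
    body = json.dumps({"input": text}).encode("utf-8")
    return url, headers, body
```

The returned triple could then be sent with urllib.request.Request(url, data=body, headers=headers), with the vector read from the data[0].embedding field of the JSON response. The sketch stops before the network call so that it can be run without credentials.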

Once you generate the embeddings, navigate to the output folder, where you will find the queryVector and docVector JSON files that you can use for vector retrieval.
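
For readers who want to try vector retrieval locally before loading the vectors into a search service, the sketch below ranks document vectors against a query vector by cosine similarity. It is only an illustration: the function names and the in-memory dictionary of vectors are assumptions, and the actual structure of the queryVector and docVector files is not specified here, so loading them is left out.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vector: Sequence[float],
    doc_vectors: Dict[str, Sequence[float]],  # hypothetical: doc id -> vector
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, best match first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

This brute-force scan is fine for a handful of documents. A search service would typically use an approximate nearest-neighbor index instead.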

Conclusion

I hope you find this repository useful for generating embeddings for your search application. If you have any questions or suggestions, please feel free to open an issue and I'll be happy to help.

Contributing

If you'd like to contribute to this repository, please feel free to do so! There are many ways to contribute, such as fixing bugs, improving the documentation, or adding new features. To get started, simply fork this repository and make your changes. When you're ready, submit a pull request and I'll take a look.

Learn More

To learn more about OpenAI, take a look at the following resources:

Embeddings - learn about Embeddings.
Azure OpenAI - read the official Azure OpenAI documentation
Azure Cognitive Search - read the official Azure Cognitive Search documentation
