# OpenAI Embeddings Example

This example demonstrates how to generate and use text embeddings with the OpenAI API through the Terraform provider for OpenAI.

## What are embeddings?

Embeddings are vector representations of text that capture its semantic meaning. They are useful for:

- Semantic search
- Similarity comparison between texts
- Clustering and classification
- Recommendation systems
- Other natural language processing applications
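
Most of these applications reduce to comparing two vectors. A minimal sketch of that comparison using cosine similarity, written in Python and run outside Terraform on exported vectors; the three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` is an illustrative helper, not part of the provider:

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (values are invented).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related texts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Scores close to 1 indicate vectors pointing in nearly the same direction, which is how semantically similar texts are expected to appear.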

## Prerequisites

1. Terraform installed
2. An OpenAI API key
3. The OpenAI provider installed in `~/.terraform.d/plugins/`

## Configuration

1. Make sure you have the OpenAI provider correctly installed. The `darwin_arm64` directory below targets macOS on Apple Silicon; replace it with the directory for your OS and architecture (for example, `linux_amd64`):
   ```bash
   mkdir -p ~/.terraform.d/plugins/registry.terraform.io/fjcorp/openai/1.0.0/darwin_arm64
   cp ~/path/to/binary/terraform-provider-openai ~/.terraform.d/plugins/registry.terraform.io/fjcorp/openai/1.0.0/darwin_arm64/
   ```

2. Configure the necessary environment variables:
   ```bash
   export OPENAI_API_KEY="your-api-key"
   # If you belong to an organization:
   export OPENAI_ORGANIZATION_ID="your-organization-id"
   ```

## Usage

This example includes:

1. **Basic Embedding**: Generates an embedding for a single text
2. **Base64 Format Embedding**: Requests the embedding in the alternative `base64` encoding format
3. **Multiple Embeddings**: Generates embeddings for multiple texts in a single request
4. **Embeddings with Custom Dimensions**: Uses a newer model with a specific number of dimensions

To run the example:

```bash
terraform init
terraform apply
```

## Understanding the code

The `main.tf` file demonstrates:

- How to configure the OpenAI provider
- How to use the embeddings module for different use cases
- How to work with different parameters (model, format, dimensions)
- How to handle multiple texts in a single request

## Important notes

- The generated embeddings can be large, so they are not shown directly in the Terraform output
- The `text-embedding-ada-002` model has a limit of 8192 input tokens
- The number of embeddings you can request in a single call is limited, and the limit depends on the model
- For newer models like `text-embedding-3-small`, you can specify the number of dimensions of the resulting vector

## API and Provider Limitations

**Important**: The OpenAI API does not currently provide a way to list or retrieve existing embeddings. As a result, this provider only supports creating embeddings as a resource (`openai_embedding`) and does not include a data source for retrieving previously created embeddings.

### Import Limitations

When importing existing embeddings, you'll face the following limitations:

1. **Partial Resource State**: Only basic metadata is imported (ID, created date, etc.), but the actual embedding vectors are not available
2. **No Retrieval API**: The OpenAI API has no endpoint to retrieve previously created embeddings, so the import process cannot fetch the original vector data
3. **Resource Replacement**: After import, applying the configuration will replace the imported resource with a newly created one

### Import Workaround

This module handles imports by:

1. Using simulated embeddings rather than the actual vectors (which can't be retrieved)
2. Providing a fault-tolerant structure that works with both new and imported resources
3. Accepting that imports are primarily for tracking existing resources, not for retrieving the actual embedding vectors

To import an existing embedding resource:

```bash
terraform import module.my_embedding.openai_chat_completion.embedding_simulation chatcmpl-XXXXXXXXXXXXXXXXXXXX
```

After import, a subsequent `terraform apply` will replace the imported resource with a newly created one, since the original embedding vectors cannot be retrieved from the API.

The provider's implementation supports all the official OpenAI API parameters for embeddings:

- `input`: Required - The text to embed (string or array of strings)
- `model`: Required - ID of the model to use (e.g., "text-embedding-ada-002")
- `dimensions`: Optional - The number of dimensions for the embeddings (only for text-embedding-3 and later models)
- `encoding_format`: Optional - Format for the embeddings, either "float" (default) or "base64"
- `user`: Optional - A unique identifier representing your end-user
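
When `encoding_format` is "base64", the vector arrives as an encoded string rather than a list of numbers. The Python sketch below shows one way a consuming application might turn it back into floats. It assumes the payload is a sequence of packed little-endian 32-bit floats; that layout is not stated in this README, so verify it against the API documentation before relying on it. `decode_base64_embedding` is a hypothetical helper name.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of packed little-endian float32 values.

    Assumption: 4 bytes per dimension, little-endian IEEE 754 floats.
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack("<%df" % count, raw))


# Round trip on made-up values that float32 represents exactly.
sample = [0.5, -0.25, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *sample)).decode("ascii")
decoded = decode_base64_embedding(encoded)
print(decoded)
```

The base64 form is more compact on the wire than a JSON array of floats, which is the usual reason to request it.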

Unlike other OpenAI resources, embeddings cannot be retrieved after creation, so store the results as needed in your application.

## Example of use in real applications

The generated embeddings can be exported and used in:

- Vector databases like Pinecone, Milvus, or Weaviate
- Semantic search systems
- Sentiment analysis and text classification
- Content similarity or duplication detection
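
As a rough illustration of the semantic search case, the Python sketch below ranks a small in-memory index of exported vectors against a query vector. The document IDs and vector values are invented, `top_k` is a hypothetical helper, and a real system would more likely delegate this step to one of the vector databases listed above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional vectors standing in for stored embeddings.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-reference": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]

results = top_k(query, index, k=2)
print([doc_id for doc_id, _ in results])  # ['refund-policy', 'shipping-times']
```

The same scores can drive duplication detection by flagging any pair whose similarity exceeds a threshold chosen for the data set.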