GitHub - aws-samples/fine-tune-embedding-models-on-sagemaker: This repository contains samples for fine-tuning embedding models using Amazon SageMaker. Embedding models are useful for tasks such as semantic similarity, text clustering, and information retrieval. Fine-tuning these models on your specific domain data can greatly improve their performance.

Fine-Tuning Embedding Models on Amazon SageMaker

This repository contains samples for fine-tuning embedding models using Amazon SageMaker.
Embedding models are useful for tasks such as semantic similarity, text clustering, and information retrieval.
Fine-tuning an embedding model on data that is representative of the target domain or task lets the model learn the semantics, jargon, and contextual relationships that matter for that domain.
Domain-specific embeddings can significantly improve the quality of vector representations, leading to more accurate retrieval of relevant context from the vector database. This, in turn, helps a retrieval-augmented generation (RAG) system generate more accurate and relevant responses.
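
To make the retrieval step concrete, here is a minimal sketch of how a vector database might rank stored chunks against a query embedding using cosine similarity. The chunk texts, the three-dimensional vectors, and the helper names (`cosine_similarity`, `retrieve_top_k`) are hypothetical and chosen only for illustration; a real system would obtain the vectors from the embedding model and would typically use an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_vector, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query.

    indexed_chunks is a list of (text, vector) tuples.
    """
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy index: in practice these vectors come from the embedding model.
index = [
    ("Pricing details for model inference", [0.9, 0.1, 0.0]),
    ("How to request a quota increase", [0.1, 0.9, 0.1]),
    ("List of supported regions", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of a pricing-related question.
query_vector = [0.8, 0.2, 0.1]
top_chunks = retrieve_top_k(query_vector, index, k=2)
print(top_chunks)
```

The ranking depends entirely on where the embedding model places the query and the chunks in vector space, which is why fine-tuning on domain data can change which context reaches the generator.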

Contents

sentence-transformer/multiple-negatives-ranking-loss/: This directory contains a Jupyter notebook demonstrating how to fine-tune a sentence-transformer embedding model using the Multiple Negatives Ranking Loss function. This loss is recommended when your training data contains only positive pairs, that is, pairs of similar texts such as paraphrases, duplicate questions, (query, response) pairs, or (source_language, target_language) pairs.
We use the Multiple Negatives Ranking Loss function because our training data is the Bedrock FAQ, which consists of pairs of questions and answers.
The code in this directory is used in the AWS blog post "Improve RAG accuracy with finetuned embedding models on SageMaker".
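
As a rough illustration of why positive pairs alone are enough, the sketch below computes a Multiple Negatives Ranking Loss in plain Python: for each question, its own answer is treated as the correct class and every other answer in the batch serves as a negative. This is not the notebook's code. The toy two-dimensional embeddings are invented, and the cosine similarity with a scale of 20 mirrors the defaults of `losses.MultipleNegativesRankingLoss` in the sentence-transformers library, which is what one would normally use in practice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over a batch of (anchor, positive) pairs.

    positives[i] is the correct match for anchors[i]; all other
    positives[j] in the batch act as in-batch negatives.
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must have the same length")
    per_example = []
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every candidate in the batch.
        logits = [scale * cosine(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Cross-entropy with the matching positive (index i) as the target.
        per_example.append(log_sum_exp - logits[i])
    return sum(per_example) / len(per_example)


# Invented toy embeddings for two question/answer pairs.
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned_answers = [[0.9, 0.1], [0.1, 0.9]]   # each answer is close to its question
swapped_answers = [[0.1, 0.9], [0.9, 0.1]]   # each answer is close to the wrong question

aligned_loss = multiple_negatives_ranking_loss(questions, aligned_answers)
swapped_loss = multiple_negatives_ranking_loss(questions, swapped_answers)
print(f"aligned: {aligned_loss:.6f}, swapped: {swapped_loss:.6f}")
```

The loss is small when each question is closest to its own answer and large when it is closer to another pair's answer, so minimizing it pulls matching pairs together and pushes non-matching ones apart. Larger batches supply more in-batch negatives per example.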

Security

We welcome contributions from the community! If you have an example or sample for fine-tuning embedding models on SageMaker, please feel free to submit a pull request. Your contribution will help others in their journey of fine-tuning embedding models.

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

