feat(embeddings): add Matryoshka Embeddings node#5797
hztBUAA wants to merge 1 commit into FlowiseAI:main
…tion Add a new Matryoshka Embeddings node that wraps any existing embeddings model and truncates output vectors to a specified number of dimensions. This enables efficient storage of embeddings from models trained with the Matryoshka loss function, where the most significant information is concentrated in the first dimensions of the vector. Closes FlowiseAI#4361
Summary of Changes
Hello @hztBUAA, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed.
This pull request introduces a Matryoshka Embeddings node that truncates embedding vectors to a configurable number of dimensions, which can reduce storage and improve performance when working with Matryoshka-trained models. The node wraps any existing embeddings model, so it can be added to existing workflows without changing the underlying model.
Code Review
This pull request introduces a new Matryoshka Embeddings node, which wraps existing embedding models to truncate output vectors. The implementation is clean, well-structured, and includes a comprehensive set of unit tests for both the core logic and the node itself. I have two suggestions for improvement: one to enhance the robustness of the core MatryoshkaEmbeddings class by adding constructor validation, and another to make the input parsing in the node implementation stricter to ensure only integer dimensions are accepted.
const dimensions = parseInt(dimensionsStr, 10)
if (isNaN(dimensions) || dimensions <= 0) {
    throw new Error('Dimensions must be a positive integer')
}
The current implementation uses parseInt to convert the dimensions string to a number. parseInt stops at the first character that is not part of an integer, so non-integer input is accepted silently: for example, parseInt('128.9', 10) returns 128. It's better to enforce that the user provides an integer. Using Number() together with Number.isInteger() validates the whole string and rejects floating-point values, which matches the requirement for an integer dimension count.
Suggested change
- const dimensions = parseInt(dimensionsStr, 10)
- if (isNaN(dimensions) || dimensions <= 0) {
-     throw new Error('Dimensions must be a positive integer')
- }
+ const dimensions = Number(dimensionsStr)
+ if (!Number.isInteger(dimensions) || dimensions <= 0) {
+     throw new Error('Dimensions must be a positive integer')
+ }
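To make the difference between the two checks concrete, here is a minimal sketch that runs both against a few inputs. The helper names (parseDimensionsLoose, parseDimensionsStrict, accepts) are hypothetical and not part of this PR; only parseInt, Number and Number.isInteger are standard JavaScript.

```typescript
// Hypothetical helper mirroring the current check in the node (parseInt based).
function parseDimensionsLoose(input: string): number {
    const dimensions = parseInt(input, 10)
    if (isNaN(dimensions) || dimensions <= 0) {
        throw new Error('Dimensions must be a positive integer')
    }
    return dimensions
}

// Hypothetical helper mirroring the suggested check (Number + Number.isInteger).
function parseDimensionsStrict(input: string): number {
    const dimensions = Number(input)
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
        throw new Error('Dimensions must be a positive integer')
    }
    return dimensions
}

// Returns true when the parser accepts the input, false when it throws.
function accepts(parse: (s: string) => number, input: string): boolean {
    try {
        parse(input)
        return true
    } catch {
        return false
    }
}

// '128.9' and '128abc' pass the loose check (both become 128) but fail the strict one.
// '' fails both: parseInt('') is NaN, and Number('') is 0, which is not positive.
const samples = ['128', '128.9', '128abc', '', '-5']
for (const input of samples) {
    console.log(JSON.stringify(input), {
        loose: accepts(parseDimensionsLoose, input),
        strict: accepts(parseDimensionsStrict, input)
    })
}
```

One side effect worth knowing: Number() also accepts forms such as '1e2' (which is the integer 100), so the strict check is about rejecting partial parses, not about restricting the notation.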
constructor(params: MatryoshkaEmbeddingsParams) {
    super(params)
    this.embeddings = params.embeddings
    this.dimensions = params.dimensions
}
The constructor for MatryoshkaEmbeddings should validate the dimensions parameter to ensure it's a positive integer. While the Flowise node that uses this class performs validation, the class itself is not robust against direct instantiation with invalid values (e.g., zero, negative numbers, or non-integers). This could lead to unexpected behavior, like vector.slice(0, -5) if a negative dimension is passed. Adding validation here makes the component more self-contained and safer to use in other contexts. You'll also need to add unit tests for these new validation checks in matryoshkaEmbeddings.test.ts.
constructor(params: MatryoshkaEmbeddingsParams) {
    super(params)
    if (!Number.isInteger(params.dimensions) || params.dimensions <= 0) {
        throw new Error('Dimensions must be a positive integer.')
    }
    this.embeddings = params.embeddings
    this.dimensions = params.dimensions
}
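As a rough illustration of why the unvalidated constructor is risky, the sketch below shows what Array.prototype.slice does with the kinds of values the review mentions. The sample vector and the assertValidDimensions helper are made up for this example and are not code from the PR.

```typescript
// An 8-dimensional sample vector (made-up values).
const vector: number[] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

// A negative end index counts from the end of the array, so this does not fail:
// it silently keeps the first 8 - 5 = 3 values.
const negativeResult = vector.slice(0, -5)

// Zero yields an empty vector, again without any error.
const zeroResult = vector.slice(0, 0)

// A fractional end index is truncated toward zero, so 2.7 behaves like 2.
const fractionalResult = vector.slice(0, 2.7)

console.log(negativeResult.length, zeroResult.length, fractionalResult.length)

// Hypothetical standalone version of the check suggested for the constructor.
function assertValidDimensions(dimensions: number): void {
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
        throw new Error('Dimensions must be a positive integer.')
    }
}

// With the check in place, each of the three values above is rejected up front.
for (const candidate of [-5, 0, 2.7]) {
    try {
        assertValidDimensions(candidate)
    } catch (err) {
        console.log(`rejected ${candidate}: ${(err as Error).message}`)
    }
}
```

None of the three slice calls throws, which is why the problem would otherwise surface only later, for example as a dimension mismatch in a vector store.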
Thanks for the review and feedback. I am following up on this PR now and will either push the requested changes or reply point-by-point shortly.
Quick follow-up: I am reviewing the feedback and will update this PR shortly.
Summary
How It Works
The node takes two inputs: the embeddings model to wrap and the number of dimensions to keep.
After the underlying model generates full-dimensional vectors, the wrapper truncates them by keeping only the first N dimensions via vector.slice(0, dimensions).
New Files
packages/components/src/matryoshkaEmbeddings.ts - Core MatryoshkaEmbeddings wrapper class extending LangChain's Embeddings
packages/components/nodes/embeddings/MatryoshkaEmbedding/MatryoshkaEmbedding.ts - Flowise node implementation
packages/components/nodes/embeddings/MatryoshkaEmbedding/matryoshka.svg - Node icon
Test Plan
MatryoshkaEmbeddings class (vector truncation, edge cases, delegation)
init() method (validation, integration)
Closes #4361
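For readers who want a picture of the wrap-and-truncate behavior described under How It Works, here is a minimal sketch. It is not the code in this PR: EmbeddingsLike is a local stand-in for the two async methods of LangChain's Embeddings interface, and TruncatingEmbeddings, FakeEmbeddings and truncateVector are hypothetical names used only for illustration.

```typescript
// Local stand-in for the embedDocuments/embedQuery pair; an assumption made
// so the example runs without installing LangChain.
interface EmbeddingsLike {
    embedDocuments(texts: string[]): Promise<number[][]>
    embedQuery(text: string): Promise<number[]>
}

// Keep only the first `dimensions` values. If the vector is already shorter,
// slice returns it unchanged in length.
function truncateVector(vector: number[], dimensions: number): number[] {
    return vector.slice(0, dimensions)
}

// Hypothetical wrapper: delegates to the inner model, then truncates.
class TruncatingEmbeddings implements EmbeddingsLike {
    private readonly inner: EmbeddingsLike
    private readonly dimensions: number

    constructor(inner: EmbeddingsLike, dimensions: number) {
        if (!Number.isInteger(dimensions) || dimensions <= 0) {
            throw new Error('Dimensions must be a positive integer.')
        }
        this.inner = inner
        this.dimensions = dimensions
    }

    async embedDocuments(texts: string[]): Promise<number[][]> {
        const vectors = await this.inner.embedDocuments(texts)
        return vectors.map((v) => truncateVector(v, this.dimensions))
    }

    async embedQuery(text: string): Promise<number[]> {
        return truncateVector(await this.inner.embedQuery(text), this.dimensions)
    }
}

// Fake 4-dimensional model used only to exercise the wrapper.
class FakeEmbeddings implements EmbeddingsLike {
    async embedDocuments(texts: string[]): Promise<number[][]> {
        return texts.map((t) => [t.length, 1, 2, 3])
    }
    async embedQuery(text: string): Promise<number[]> {
        return [text.length, 1, 2, 3]
    }
}

const wrapped = new TruncatingEmbeddings(new FakeEmbeddings(), 2)
wrapped.embedQuery('hello').then((v) => console.log(v)) // logs [ 5, 1 ]
```

The sketch only covers delegation and truncation; anything beyond that (batching, error handling in the inner model, any post-processing of the truncated vectors) is out of scope here and not implied by the PR description.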