
feat(embeddings): add Matryoshka Embeddings node #5797

Open

hztBUAA wants to merge 1 commit into FlowiseAI:main from hztBUAA:feat/matryoshka-embeddings


hztBUAA commented Feb 20, 2026

Summary

  • Adds a new Matryoshka Embeddings node that wraps any existing embeddings model and truncates output vectors to a user-specified number of dimensions
  • Enables efficient storage and retrieval with embedding models trained using the Matryoshka loss function, where the most significant information is concentrated in the first dimensions of the vector
  • Implemented as a composable wrapper node: users connect any Embeddings node as input and specify the target dimension count, with no changes needed to existing embedding or vector store nodes

How It Works

The node takes two inputs:

  1. Embeddings - Any existing embeddings node (OpenAI, Cohere, HuggingFace, Ollama, etc.)
  2. Dimensions - The target number of dimensions to truncate to

After the underlying model generates full-dimensional vectors, the wrapper truncates them by keeping only the first N dimensions via vector.slice(0, dimensions).
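The truncation step can be pictured as a thin delegating wrapper. The following is a minimal, dependency-free sketch, not the PR's code: it assumes a hypothetical `EmbeddingsLike` interface exposing `embedQuery` and `embedDocuments` (the method names LangChain's `Embeddings` base class uses), and the class name `TruncatingEmbeddings` and the fake model are illustrative only.

```typescript
// Hypothetical minimal interface; mirrors the embedQuery/embedDocuments
// method names of LangChain's Embeddings base class.
interface EmbeddingsLike {
    embedQuery(text: string): Promise<number[]>
    embedDocuments(texts: string[]): Promise<number[][]>
}

// Sketch of a Matryoshka-style wrapper: delegate to the inner model,
// then keep only the first `dimensions` components of each vector.
class TruncatingEmbeddings implements EmbeddingsLike {
    constructor(private readonly inner: EmbeddingsLike, private readonly dimensions: number) {}

    private truncate(vector: number[]): number[] {
        return vector.slice(0, this.dimensions)
    }

    async embedQuery(text: string): Promise<number[]> {
        return this.truncate(await this.inner.embedQuery(text))
    }

    async embedDocuments(texts: string[]): Promise<number[][]> {
        const vectors = await this.inner.embedDocuments(texts)
        return vectors.map((v) => this.truncate(v))
    }
}

// Hypothetical stand-in model returning fixed 8-dimensional vectors.
const fakeModel: EmbeddingsLike = {
    embedQuery: async () => [1, 2, 3, 4, 5, 6, 7, 8],
    embedDocuments: async (texts) => texts.map((_, i) => [i, i, i, i, i, i, i, i])
}

// Keep the first 4 of 8 dimensions.
const wrapped = new TruncatingEmbeddings(fakeModel, 4)
```

Here `await wrapped.embedQuery('hello')` resolves to `[1, 2, 3, 4]`. Two properties of plain slicing are worth keeping in mind: if the requested dimension count exceeds the model's output length, `slice` returns the full vector unchanged; and a truncated vector is generally no longer unit-length, which does not affect cosine similarity (it is scale-invariant) but may matter for a vector store that scores with a raw dot product. The PR description does not say whether it re-normalizes after truncation.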

New Files

  • packages/components/src/matryoshkaEmbeddings.ts - Core MatryoshkaEmbeddings wrapper class extending LangChain's Embeddings
  • packages/components/nodes/embeddings/MatryoshkaEmbedding/MatryoshkaEmbedding.ts - Flowise node implementation
  • packages/components/nodes/embeddings/MatryoshkaEmbedding/matryoshka.svg - Node icon

Test Plan

  • Unit tests for MatryoshkaEmbeddings class (vector truncation, edge cases, delegation)
  • Unit tests for the node's init() method (validation, integration)
  • All 16 tests passing:

    PASS nodes/embeddings/MatryoshkaEmbedding/MatryoshkaEmbedding.test.ts
    PASS src/matryoshkaEmbeddings.test.ts
    Test Suites: 2 passed, 2 total
    Tests:       16 passed, 16 total

Closes #4361

…tion

Add a new Matryoshka Embeddings node that wraps any existing embeddings model
and truncates output vectors to a specified number of dimensions. This enables
efficient storage of embeddings from models trained with the Matryoshka loss
function, where the most significant information is concentrated in the first
dimensions of the vector.

Closes FlowiseAI#4361
gemini-code-assist (Contributor)

Summary of Changes

Hello @hztBUAA, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request adds a Matryoshka Embeddings node that truncates embedding vectors to a user-chosen number of dimensions. For models trained with the Matryoshka loss, keeping only the leading dimensions reduces storage and similarity-search cost while retaining most of the embedding's information. Because the node wraps any existing embeddings node, users can adopt it without changing their other embedding or vector store nodes.

Highlights

  • New Matryoshka Embeddings Node: A new 'Matryoshka Embeddings' node has been introduced, designed to wrap any existing embeddings model and truncate output vectors to a user-specified number of dimensions.
  • Efficient Storage and Retrieval: This feature enables more efficient storage and retrieval with embedding models trained using the Matryoshka loss function, where critical information is concentrated in the initial dimensions of the vector.
  • Composable Wrapper Implementation: The node is implemented as a composable wrapper, allowing users to connect any existing Embeddings node as input and specify the target dimension count without requiring changes to other embedding or vector store nodes.
Changelog
  • packages/components/nodes/embeddings/MatryoshkaEmbedding/MatryoshkaEmbedding.test.ts
    • Added comprehensive unit tests for the Matryoshka Embedding Flowise node, covering metadata, input validation, and truncation logic.
  • packages/components/nodes/embeddings/MatryoshkaEmbedding/MatryoshkaEmbedding.ts
    • Implemented the Flowise node for Matryoshka Embeddings, defining its structure, inputs (embeddings and dimensions), and initialization logic with validation.
  • packages/components/nodes/embeddings/MatryoshkaEmbedding/matryoshka.svg
    • Added a custom SVG icon for the new Matryoshka Embeddings node.
  • packages/components/src/matryoshkaEmbeddings.test.ts
    • Added unit tests for the core MatryoshkaEmbeddings class, verifying vector truncation for both single queries and document lists, and ensuring proper delegation to the underlying embeddings.
  • packages/components/src/matryoshkaEmbeddings.ts
    • Implemented the core MatryoshkaEmbeddings wrapper class, extending LangChain's Embeddings and providing the logic to truncate embedding vectors.
Activity
  • Unit tests for both the MatryoshkaEmbeddings class and the Flowise node's init() method were added and passed successfully, with all 16 tests completing without errors.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new Matryoshka Embeddings node, which wraps existing embedding models to truncate output vectors. The implementation is clean, well-structured, and includes a comprehensive set of unit tests for both the core logic and the node itself. I have two suggestions for improvement: one to enhance the robustness of the core MatryoshkaEmbeddings class by adding constructor validation, and another to make the input parsing in the node implementation stricter to ensure only integer dimensions are accepted.

Comment on lines +55 to +58
    const dimensions = parseInt(dimensionsStr, 10)
    if (isNaN(dimensions) || dimensions <= 0) {
        throw new Error('Dimensions must be a positive integer')
    }
Contributor (priority: medium)

The current implementation uses parseInt to convert the dimensions string to a number, which silently accepts malformed input: parseInt('128.9') returns 128, and so does parseInt('128abc'). It's better to require an integer explicitly. Number() converts the whole string (returning NaN if any part of it is not numeric), and Number.isInteger() then rejects fractional values, which matches the requirement for an integer dimension count.

Suggested change:

    - const dimensions = parseInt(dimensionsStr, 10)
    - if (isNaN(dimensions) || dimensions <= 0) {
    -     throw new Error('Dimensions must be a positive integer')
    - }
    + const dimensions = Number(dimensionsStr)
    + if (!Number.isInteger(dimensions) || dimensions <= 0) {
    +     throw new Error('Dimensions must be a positive integer')
    + }
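The difference between the two parsing strategies is easiest to see side by side. This is a small sketch; `parseDimensionsLoose`, `parseDimensionsStrict`, and `accepts` are hypothetical helper names for illustration, not functions from the PR.

```typescript
// Original approach: parseInt stops at the first non-digit character,
// so '128.9' and '128abc' both parse as 128.
function parseDimensionsLoose(input: string): number {
    const dimensions = parseInt(input, 10)
    if (isNaN(dimensions) || dimensions <= 0) {
        throw new Error('Dimensions must be a positive integer')
    }
    return dimensions
}

// Suggested approach: Number() converts the whole string (NaN on trailing
// characters), and Number.isInteger rejects fractional values and NaN.
function parseDimensionsStrict(input: string): number {
    const dimensions = Number(input)
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
        throw new Error('Dimensions must be a positive integer')
    }
    return dimensions
}

// Returns true if the parser accepts the input, false if it throws.
function accepts(parse: (s: string) => number, input: string): boolean {
    try {
        parse(input)
        return true
    } catch {
        return false
    }
}

for (const input of ['256', '128.9', '128abc', '0', '']) {
    console.log(JSON.stringify(input), accepts(parseDimensionsLoose, input), accepts(parseDimensionsStrict, input))
}
```

Both versions accept '256' and reject '0' and the empty string (Number('') is 0, which the `<= 0` check catches), but only the loose version accepts '128.9' and '128abc'. One caveat: Number() also accepts other numeric notations such as '1e2' (100) and '0x40' (64), so the strict version is stricter about fractions and trailing characters, not about notation.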

Comment on lines +31 to +35
    constructor(params: MatryoshkaEmbeddingsParams) {
        super(params)
        this.embeddings = params.embeddings
        this.dimensions = params.dimensions
    }
Contributor (priority: medium)

The MatryoshkaEmbeddings constructor should validate the dimensions parameter to ensure it is a positive integer. Although the Flowise node that uses this class performs its own validation, the class itself is not protected against direct instantiation with invalid values such as zero, negative numbers, or non-integers. These fail silently: a dimension of 0 yields empty vectors, and a negative value such as -5 makes vector.slice(0, -5) drop the last five elements instead of raising an error. Validating in the constructor makes the component self-contained and safer to reuse in other contexts. You'll also need to add unit tests for the new validation checks in matryoshkaEmbeddings.test.ts.

    constructor(params: MatryoshkaEmbeddingsParams) {
        super(params)
        if (!Number.isInteger(params.dimensions) || params.dimensions <= 0) {
            throw new Error('Dimensions must be a positive integer.')
        }
        this.embeddings = params.embeddings
        this.dimensions = params.dimensions
    }
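A dependency-free sketch of the suggested constructor check, together with the kind of edge cases the requested unit tests might cover. `ValidatedTruncator` is a hypothetical stand-in for `MatryoshkaEmbeddings`: it omits the LangChain base class and the wrapped embeddings so that it runs on its own.

```typescript
// Hypothetical stand-in for MatryoshkaEmbeddings, reduced to the
// validation and truncation logic under discussion.
class ValidatedTruncator {
    readonly dimensions: number

    constructor(dimensions: number) {
        // Reject 0, negatives, fractions, NaN and Infinity up front, so
        // slice() never receives a value that would silently misbehave.
        if (!Number.isInteger(dimensions) || dimensions <= 0) {
            throw new Error('Dimensions must be a positive integer.')
        }
        this.dimensions = dimensions
    }

    truncate(vector: number[]): number[] {
        return vector.slice(0, this.dimensions)
    }
}

// Without the check, a negative count counts from the end of the array:
const unvalidated = [1, 2, 3, 4, 5, 6, 7, 8].slice(0, -5) // [1, 2, 3]

// Returns true if constructing with `dimensions` throws.
function throwsFor(dimensions: number): boolean {
    try {
        new ValidatedTruncator(dimensions)
        return false
    } catch {
        return true
    }
}

const rejected = [0, -5, 1.5, NaN, Infinity].map(throwsFor) // all true
const kept = new ValidatedTruncator(3).truncate([1, 2, 3, 4, 5]) // [1, 2, 3]
```

In the real test file these checks would presumably be written as Jest `expect(...).toThrow(...)` assertions, matching the existing suites listed in the test plan.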

hztBUAA (Author) commented Feb 25, 2026

Thanks for the review and feedback. I am following up on this PR now and will either push the requested changes or reply point-by-point shortly.

hztBUAA (Author) commented Feb 25, 2026

Quick follow-up: I am reviewing the feedback and will update this PR shortly.



Development

Successfully merging this pull request may close these issues.

[FEATURE] Allow using Matryoshka Embeddings
