Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails to execute a python script using FastEmbed #315

Open
NimaAra opened this issue Nov 23, 2024 · 10 comments
Open

Fails to execute a python script using FastEmbed #315

NimaAra opened this issue Nov 23, 2024 · 10 comments

Comments

@NimaAra
Copy link

NimaAra commented Nov 23, 2024

I am trying to execute the following module using python 3.12.4 on .NET 8:

from fastembed import TextEmbedding

def generate_embeddings(documents: list[str]) -> list:
    embedding_model = TextEmbedding() #1
    embeddings_list = list(embedding_model.embed(documents)) #2
    
    return [
        [1.23, 4.56, 7.89],
        [9.87, 6.54, 3.21],
    ]

I also have the following integration:

public sealed class FastEmbedIntegration : IDisposable
{
	private readonly PyObject _module;

	private readonly ILogger logger;

	internal FastEmbedIntegration(IPythonEnvironment env)
	{
		this.logger = env.Logger;
		using (GIL.Acquire())
		{
			logger.LogInformation("Importing module {ModuleName}", "fast_embed");
			_module = Import.ImportModule("fast_embed");
		}
	}

	public void Dispose() => _module.Dispose();

	public IReadOnlyList<IReadOnlyList<double>> Generate(IReadOnlyList<string> items)
	{
		using (GIL.Acquire())
		{
			using PyObject __underlyingPythonFunc = this._module.GetAttr("generate_embeddings");
			using PyObject itemsArg = PyObject.From(items);
			using PyObject __result_pyObject = __underlyingPythonFunc.Call(itemsArg);
			return __result_pyObject.As<IReadOnlyList<IReadOnlyList<double>>>();
		}
	}
}

Which I call using:

using FastEmbedIntegration embeds = new(pythonEnv);
List<string> entries = new();
entries.Add("hello there");
IReadOnlyList<IReadOnlyList<double>> vectors = embeds.Generate(entries);

Which throws:

image

If I comment the lines #1 and #2 then I correctly receive the arrays.

@tonybaloney
Copy link
Owner

Did you try with the source generator as well?

@NimaAra
Copy link
Author

NimaAra commented Nov 24, 2024

Yeah, In Visual Studio it works just fine both using SourceGen or if I take the SourceGen code and set it up explicitly. The same code does not work on LINQPad. I cannot think of why the same generated code works in VS but not LINQPad, FYI the following works fine in LINQPad using your test_basic example; I am not sure why the python using FastEmbed does not.

public sealed class ExampleDirectIntegration : IDisposable
{
	private readonly PyObject module;

	private readonly ILogger<IPythonEnvironment> logger;

	internal ExampleDirectIntegration(IPythonEnvironment env)
	{
		this.logger = env.Logger;
		using (GIL.Acquire())
		{
			logger.LogInformation("Importing module {ModuleName}", "test_basic");
			module = Import.ImportModule("test_basic");
		}
	}

	public void Dispose()
	{
		logger.LogInformation("Disposing module");
		module.Dispose();
	}

	public double TestIntFloat(long a, double b)
	{
		using (GIL.Acquire())
		{
			logger.LogInformation("Invoking Python function: {FunctionName}", "test_int_float");
			using var __underlyingPythonFunc = this.module.GetAttr("test_int_float");
			using PyObject a_pyObject = PyObject.From(a);
			using PyObject b_pyObject = PyObject.From(b);
			using var __result_pyObject = __underlyingPythonFunc.Call(a_pyObject, b_pyObject);
			return __result_pyObject.As<double>();
		}
	}
}

Great project btw, very much appreciated.

@tonybaloney
Copy link
Owner

I'll spin up a project and see what's happening. I've never used LINQPad before though!

Please comment with your thoughts on this issue too--
#314 I'm looking for feedback on what isn't obvious when you're starting out.

@tonybaloney
Copy link
Owner

Just a side note. The return type of your function could be more specific:

def generate_embeddings(documents: list[str]) -> list:

I would specify:

def generate_embeddings(documents: list[str]) -> list[list[float]]:

If they are lists. If they are sequences but not necessarily lists, you can use the typing.Sequence. See ref

@tonybaloney
Copy link
Owner

I added your code to the Simple Demo project and it worked as expected

image

16c3eff

I can't see anything obvious?

When you get the exception, look at the InnerException because it contains the Python stack trace as a property, that will give you more info

@NimaAra
Copy link
Author

NimaAra commented Nov 24, 2024

Yeah as I mentioned, this works in Visual Studio but not in LINQPad. I am getting this exception:

image

This is the integration I am using:

public sealed class FastEmbedIntegration : IDisposable
{
	private readonly PyObject module;

	private readonly ILogger logger;

	internal FastEmbedIntegration(ILogger<IPythonEnvironment> logger)
	{
		this.logger = logger;
		using (GIL.Acquire())
		{
			logger.LogInformation("Importing module {ModuleName}", "fast_embed");
			module = Import.ImportModule("fast_embed");
		}
	}

	public void Dispose()
	{
		logger.LogInformation("Disposing module");
		module.Dispose();
	}


	public IReadOnlyList<IReadOnlyList<double>> GenerateEmbeddings(IReadOnlyList<string> items)
	{
		using (GIL.Acquire())
		{
			logger.LogInformation("Invoking Python function: {FunctionName}", "generate_embeddings");
			using PyObject __underlyingPythonFunc = this.module.GetAttr("generate_embeddings");
			using PyObject a_pyObject = PyObject.From(items);			
			using PyObject __result_pyObject = __underlyingPythonFunc.Call(a_pyObject);
			return __result_pyObject.As<IReadOnlyList<IReadOnlyList<double>>>();
		}
	}
}

And this is the python:

from fastembed import TextEmbedding

embedding_model = TextEmbedding()

def generate_embeddings(documents: list[str]) -> list[list[float]]:
    embeddings_list = list(embedding_model.embed(documents))    
    return embeddings_list

   
documents = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

embeddings = generate_embeddings(documents)
print(embeddings)

@NimaAra
Copy link
Author

NimaAra commented Nov 24, 2024

Also a side question if I may. Are classes supported? e.g.

from fastembed import TextEmbedding

class EmbeddingService:
    """
    Service class for generating embeddings using fastembed.
    Ensures the embedding model is initialized only once.
    """
    def __init__(self):
        self.embedding_model = TextEmbedding()

    def generate_embeddings(self, documents: list[str]) -> list[list[float]]:
        """
        Generate embeddings for a list of documents.

        Args:
            documents (list[str]): List of strings to embed.

        Returns:
            list[list[float]]: A list of embeddings, where each embedding is a list of floats.
        """
        embeddings_list = list(self.embedding_model.embed(documents))
        return embeddings_list

service = EmbeddingService()
documents = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

embeddings = service.generate_embeddings(documents)
print(embeddings)

@tonybaloney
Copy link
Owner

Calling a method on a class instance? Or casting a Python class back to a CLR type?

@NimaAra
Copy link
Author

NimaAra commented Nov 24, 2024

Whichever I can get working I guess. Running CodeGen on the above class based python resulted in the following integration:

public static class FastEmbedClassExtensions
{
    private static IFastEmbedClass? instance;

    private static ReadOnlySpan<byte> HotReloadHash => "fffa817c9a4729f2c6a7536b2b15f1d4"u8;

    public static IFastEmbedClass FastEmbedClass(this IPythonEnvironment env)
    {
        if (instance is null)
        {
            instance = new FastEmbedClassInternal(env.Logger);
        }
        Debug.Assert(!env.IsDisposed());
        return instance;
    }

    public static void UpdateApplication(Type[]? updatedTypes) {
        instance?.ReloadModule();
    }

    private class FastEmbedClassInternal : IFastEmbedClass
    {
        private PyObject module;

        private readonly ILogger<IPythonEnvironment> logger;
        

        internal FastEmbedClassInternal(ILogger<IPythonEnvironment> logger)
        {
            this.logger = logger;
            using (GIL.Acquire())
            {
                logger.LogDebug("Importing module {ModuleName}", "fast_embed_class");
                module = Import.ImportModule("fast_embed_class");
                
            }
        }

        void IReloadableModuleImport.ReloadModule() {
            logger.LogDebug("Reloading module {ModuleName}", "fast_embed_class");
            using (GIL.Acquire())
            {
                Import.ReloadModule(ref module);
                // Dispose old functions
                
                
            }
        }

        public void Dispose()
        {
            logger.LogDebug("Disposing module {ModuleName}", "fast_embed_class");
            
            module.Dispose();
        }

        
    }
}

@tonybaloney
Copy link
Owner

we don't do any codegen for classes (yet) so what you'd need to do is:

  1. Load the module
  2. Get the EmbeddingService type from the module
  3. Call it (run __init__)
  4. Run GetAttr("method") to get the methods on the instance and call those

It would look roughly like this:

public IReadOnlyList<IReadOnlyList<double>> GenerateEmbeddings(IReadOnlyList<string> items)
	{
		using (GIL.Acquire())
		{
			logger.LogInformation("Invoking Python function: {FunctionName}", "EmbeddingService.generate_embeddings");
                        // Get Class
			using PyObject __underlyingPythonType = this.module.GetAttr("EmbeddingService");
                        // Run __init__ (provide args if required. Call takes param[] PyObject)
			using PyObject __instance= __underlyingPythonType.Call();
                        // Cast the param
			using PyObject a_pyObject = PyObject.From(items);
                        // Get the method
                        using PyObject __underlyingPythonMethod = __instance.GetAttr("generate_embeddings");
			using PyObject __result_pyObject = __underlyingPythonMethod.Call(a_pyObject);
			return __result_pyObject.As<IReadOnlyList<IReadOnlyList<double>>>();
		}
	}

For improved performance, I would store the __underlyingPythonType as a property of the class (similar to how the code gen does it for functions)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants