Async Stream Generator? #604
I am using the stream generator mainly because of the `input_embeddings` implementation when starting a stream. If there is a way to recreate something like that with an asynchronous dynamic generator, that would be helpful too. This is what I have for the embedding in the streaming generator, and it works fine:

```python
self.embedding_layer = next(m for m in model.modules if isinstance(m, ExLlamaV2Embedding))
self.emb = self.prepare_embedding_tensor(e_str)

def prepare_embedding_tensor(self, s: str) -> torch.Tensor:
    s_tokens = self.tokenizer.encode(
        s,
        add_bos = True,
        encode_special_tokens = True
    )
    print("s_tokens Shape:", s_tokens.shape)
    embedding_tensor = self.embedding_layer.forward(hidden_states = s_tokens)
    print("Embedding Tensor Shape:", embedding_tensor.shape)
    return embedding_tensor

self.generator.begin_stream_ex(a_ctx, sampler_settings, input_embeddings = self.emb)
```

If something similar is possible with the async dynamic generator, please help.
The regular streaming generator doesn't support batching, let alone the continuous batching you'd want here. It would be much simpler to add indexed embeddings to the dynamic generator, which I suppose I do want to get to. The main challenge is that the indices would need to persist and be unique over the lifetime of the model to not cause errors with caching: e.g., if a context starts with an indexed embedding and that index is later reused for a different embedding, cached pages keyed on that index could match the wrong content. So I need to cook up some kind of system for creating indices and managing the lifetime of the corresponding embeddings.
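For illustration only, a minimal sketch of what such a system might look like; the class and method names here are hypothetical and not part of exllamav2:

```python
import itertools
import torch

class EmbeddingRegistry:
    # Hypothetical sketch: hands out process-unique indices for input
    # embeddings so that cached pages keyed on those indices can never
    # collide, and tracks the tensors so they can be freed explicitly.

    def __init__(self, first_index: int = 1_000_000_000):
        # Start far above any real vocab ID so the indices can double as
        # placeholder token IDs without clashing with the tokenizer.
        self._next_index = itertools.count(first_index)
        self._embeddings: dict[int, torch.Tensor] = {}

    def register(self, embedding: torch.Tensor) -> int:
        # Allocate a fresh index; indices are never reused, so a cached
        # page referencing an old index can never match a new embedding.
        index = next(self._next_index)
        self._embeddings[index] = embedding
        return index

    def get(self, index: int) -> torch.Tensor:
        return self._embeddings[index]

    def release(self, index: int) -> None:
        # Drop the tensor once no live job or cached page still needs it.
        self._embeddings.pop(index, None)
```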
Thank you for your response! I'm using the dynamic generator for now, so I can add the embeddings to it when the time comes. I wish you success with this implementation, and once again, thank you!
Using Stream Generator Like Async Dynamic Generator for Concurrent Text Generation
Description:
Hello,
I have been working with the stream generator, and I am wondering if there is a way to use it in a manner similar to the async dynamic generator to generate two texts simultaneously with the same model.
I've attempted the following approach with the example code; however, the generation process keeps waiting for the first text to complete before starting the next one:
My Question:
Any help or advice would be greatly appreciated!
Thank you for your time and assistance!
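For reference, concurrent generation with one model is exactly what the dynamic generator is built for. Below is a minimal sketch following the pattern in exllamav2's async inference example; the class names, arguments, and the `"text"` result field are taken from that example as I understand it, so treat this as an assumption rather than a verified snippet:

```python
import asyncio
from exllamav2.generator import (
    ExLlamaV2DynamicGeneratorAsync,
    ExLlamaV2DynamicJobAsync,
)

async def run_job(generator, tokenizer, prompt: str) -> str:
    # Each job is scheduled independently; the generator batches all
    # active jobs together under the hood (continuous batching).
    job = ExLlamaV2DynamicJobAsync(
        generator,
        input_ids = tokenizer.encode(prompt, add_bos = True),
        max_new_tokens = 200,
    )
    text = ""
    async for result in job:
        # Streamed results arrive as dicts; "text" holds the new chunk.
        text += result.get("text", "")
    return text

async def main(model, cache, tokenizer):
    generator = ExLlamaV2DynamicGeneratorAsync(
        model = model,
        cache = cache,
        tokenizer = tokenizer,
    )
    # Both prompts generate simultaneously instead of sequentially.
    texts = await asyncio.gather(
        run_job(generator, tokenizer, "Once upon a time,"),
        run_job(generator, tokenizer, "The quick brown fox"),
    )
    await generator.close()
    return texts
```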