[Feature] Use local tokenizer #37


Closed
gespispace opened this issue Apr 20, 2024 · 0 comments · Fixed by #38
Labels
enhancement New feature or request

Comments

@gespispace (Member)

FireCoder currently uses a tokenizer to determine the maximum prompt length for autocomplete. To do this, it sends the text to the llama.cpp tokenizer endpoint. This round trip takes time and is unavailable when the user is working against the cloud. It is important to provide as much context as possible, but the current method has several problems.

Firstly, to use the llama.cpp tokenizer, the user must download the server and a model, which is inconvenient for users who want to work with the cloud.
Secondly, preparing a prompt can take more than 2 seconds.
Finally, FireCoder needs a complex algorithm to select the maximum suitable prompt length while making as few requests to llama.cpp as possible.
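To make the last point concrete, length selection can be sketched as a binary search over the prompt text, where each probe costs one tokenizer call; with the current remote setup every probe is an HTTP round trip to llama.cpp's tokenizer endpoint. The names below (`fitPrompt`, `TokenCounter`) are illustrative, not FireCoder's actual API:

```typescript
// Hypothetical sketch: find the longest suffix of `text` whose token count
// fits within `maxTokens`, using binary search so that each probe costs
// exactly one tokenizer call. Minimizing probes matters because, with a
// remote tokenizer, every call to `countTokens` is an HTTP round trip.
type TokenCounter = (text: string) => number;

function fitPrompt(text: string, maxTokens: number, countTokens: TokenCounter): string {
  if (countTokens(text) <= maxTokens) return text; // whole prompt fits
  let lo = 0;
  let hi = text.length; // text.slice(hi) === "" always fits
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (countTokens(text.slice(mid)) <= maxTokens) {
      hi = mid; // fits: try to keep more context
    } else {
      lo = mid + 1; // too long: cut more from the front
    }
  }
  return text.slice(lo); // keep the most recent context (the suffix)
}

// Toy counter for illustration: one token per whitespace-separated word.
const words: TokenCounter = (t) => t.split(/\s+/).filter(Boolean).length;
```

With a fast in-process tokenizer, the probe cost drops to microseconds and this minimization pressure largely disappears.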

The solution is to use a local tokenizer that can be called directly from the extension. There are two possible options:

  1. Use tokenizers; however, it works poorly with its Node.js bindings, so further investigation is needed.
  2. Use transformers.js, which should work well but still needs to be tested.
@gespispace gespispace added the enhancement New feature or request label Apr 20, 2024
@gespispace gespispace self-assigned this Apr 20, 2024
@gespispace gespispace linked a pull request Apr 22, 2024 that will close this issue