Summarize the extracted Content #33
Comments
This is something I have in mind, but I'm not convinced yet that we should start using LLMs in gitingest. That being said, if you want to start a PoC of this, I'm indeed very interested to see how it turns out.
Sure, thanks!
I agree that summarisation should be done properly with other tools, which will likely require a full RAG/vector-store setup due to the size of the context required. Unless you use Google's 1M+ context window, it's not a simple task to just pull everything in here.
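Short of a full RAG setup, one interim approach the thread gestures at is map-reduce summarization: pack extracted files into chunks that each fit a model's context window, summarize each, then summarize the summaries. The sketch below is purely illustrative: `estimate_tokens`, `chunk_digest`, and the 4-characters-per-token heuristic are assumptions, not part of gitingest.

```python
# Hypothetical sketch: pack an extracted repo digest into context-sized chunks
# for map-reduce summarization. Not gitingest code; names are illustrative.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (a common heuristic)."""
    return max(1, len(text) // 4)

def chunk_digest(files: dict[str, str], max_tokens: int = 6000) -> list[str]:
    """Greedily pack file blocks into chunks under the token budget.

    A single file larger than the budget still becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for path, content in files.items():
        block = f"=== {path} ===\n{content}\n"
        cost = estimate_tokens(block)
        if used + cost > max_tokens and current:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(block)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks

# Toy example: two files split into two chunks under a small budget.
files = {"main.py": "print('hello')\n" * 200, "README.md": "# Demo\n" * 100}
chunks = chunk_digest(files, max_tokens=500)
print(len(chunks))
```

Each chunk would then be summarized independently and the per-chunk summaries merged in a final pass, which sidesteps the context-window limit at the cost of extra API calls.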
You both have a valid point here: in the long term, LLMs and vectorisation techniques will be mandatory to achieve the best summary possible. So let me rephrase what I said earlier: The idea behind this is:
In the meantime, feel free to either:
I totally agree with you. A few suggestions from my end:
Very good point. Making it optional is a good approach to this transition, with the option to use API models as well.
Hey, I am glad this post is really engaging! I am currently working on a PoC for this. As suggested by @argishh, letting users provide their own API key is very feasible: the target audience is developers, so they likely know how to get an API key, and alternatively we can write instructions for doing so. However, large repositories can exhaust 22K tokens or even more, which is why Gemini, at around 1M tokens per minute, is preferable to Groq or other LLM providers. We can also add other optimizations, like reduction or a bucket algorithm; let's see how it goes. Right now I am engaged with ICPC, but I have a prototype ready for small repositories, so feel free to check it out and create issues. Once I am done with the ICPC regionals on 5th January, I will implement the ideas I have shared here for large repositories. Meanwhile, check this out: https://github.com/Sarahkhan20/GitZen (it works for small repositories for now).
@Sarahkhan20 nice work! I've gone through the code, and so far it looks great. I'll try it out next. I'm also interested in knowing how you're planning to optimize it for larger repos. Do not hesitate to reach out if you require any help to ideate or implement.
I agree with @cyclotruc here. Adding an LLM summarisation feature would be huge, and it will probably need a lot of testing and time. Before we move on to adding such features, we should make sure that the current version of gitingest is nice and robust, and that we employ the best coding practices moving forward. Being a Machine Learning Engineer myself, I cannot help but want to work on such features!
Definitely, @joydeep049. That's why @Sarahkhan20 started working on a separate PoC first. It'll take a while before the PoC is functional for larger repos, which would need summarization the most, so there's no rush as of now.
Sometimes, to understand what exactly a codebase does, we need to summarize the whole codebase, so let's add a feature for summarizing it. Shall I work on this feature, using a Meta or Gemini model?