To use this tool you need two things: an API endpoint and a Qdrant vector DB instance.
The API endpoint must be an LLM served over HTTP.
You can experiment with this tool using a fully open-source setup based on Ollama. It makes it easy to serve a model locally and works on all major operating systems. It also automatically tries to use your GPU for faster performance.
After installing Ollama by following the instructions on its website, follow these steps:
-
Start Ollama
OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=localhost:8000 ollama serve
NOTE: If you use a different host, make sure to pass it as an argument when using this tool (e.g., --host=localhost and --port=11434).
-
Pull the models:
First, pull the Granite Code Model. The Granite 8b base serves as the base model for this project.
OLLAMA_HOST=localhost:8000 ollama pull granite-code:8b
Then pull the Mistral model:
OLLAMA_HOST=localhost:8000 ollama pull mistral:latest
-
Import the customized settings for the Granite model
These settings make the Granite model more conservative.
cd modelfiles/granite-code-jbang-8b
OLLAMA_HOST=localhost:8000 ollama create granite-code-jbang:8b -f ./Modelfile
Alternatively, if you have enough memory (32 GB or more), you can try the 20b one:
OLLAMA_HOST=localhost:8000 ollama pull granite-code:20b
cd modelfiles/granite-code-jbang-20b
OLLAMA_HOST=localhost:8000 ollama create granite-code-jbang:20b -f ./Modelfile
Then, when using the application, pass the appropriate model name (e.g., --model-name=granite-code-jbang:20b).
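To confirm that the models pulled and created above are visible to the server, a check along these lines may help. It is only a sketch: it assumes curl is installed and that the server listens on localhost:8000 as in the commands above, and OLLAMA_URL, has_model and check_models are names made up for this example. It relies on Ollama's /api/tags endpoint, which lists the local models.

```shell
# Sketch only: OLLAMA_URL, has_model and check_models are hypothetical names.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:8000}"

# Succeeds when the quoted model name appears in the JSON returned by /api/tags.
has_model() {
  printf '%s' "$1" | grep -qF "\"$2\""
}

# Prints one "ok" or "missing" line per model name given as an argument.
check_models() {
  tags="$(curl -fsS "$OLLAMA_URL/api/tags")" || {
    echo "Ollama is not reachable at $OLLAMA_URL" >&2
    return 1
  }
  for model in "$@"; do
    if has_model "$tags" "$model"; then
      echo "ok: $model"
    else
      echo "missing: $model"
    fi
  done
}

# Example call (requires a running server):
# check_models granite-code:8b mistral:latest granite-code-jbang:8b
```

Matching on the quoted name keeps the check independent of JSON tooling, at the cost of being a plain text search rather than a real parse.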
The Qdrant database is needed to load and persist embeddings.
NOTE: If you are only using the data command then it is not needed.
podman run -d --rm --name qdrant -p 6334:6334 -p 6333:6333 qdrant/qdrant:v1.7.4-unprivileged
This tool works as a standalone application or as a JBang plugin.
NOTE: this requires Camel 4.8.1-SNAPSHOT or greater locally.
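Whichever mode is used, every command except data assumes that the Qdrant container started earlier is reachable. A minimal readiness check could look like the sketch below. It assumes curl is available and that the REST port 6333 is published on localhost, as in the podman command; QDRANT_URL, qdrant_ok and wait_for_qdrant are hypothetical names chosen for this example.

```shell
# Sketch only: QDRANT_URL, qdrant_ok and wait_for_qdrant are hypothetical names.
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"

# Succeeds when the text looks like a successful Qdrant REST response.
qdrant_ok() {
  printf '%s' "$1" | grep -q '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Polls the collections endpoint once per second, up to the given number of tries.
wait_for_qdrant() {
  tries="${1:-10}"
  while [ "$tries" -gt 0 ]; do
    if body="$(curl -fsS "$QDRANT_URL/collections" 2>/dev/null)" && qdrant_ok "$body"; then
      echo "Qdrant is up at $QDRANT_URL"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "Qdrant did not answer at $QDRANT_URL" >&2
  return 1
}

# Example call (requires the container to be running):
# wait_for_qdrant 30
```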
-
Build
mvn install
-
Add to Camel JBang Plugins
jbang -Dcamel.jbang.version=4.8.1-SNAPSHOT camel@apache/camel plugin add --gav org.apache.camel.jbang.ai:camel-jbang-plugin-explain:1.0.0-SNAPSHOT --description "Explain things using AI" explain
Build the package as standalone
mvn -Pstandalone package
NOTE: If you are using the JBang plugin, replace java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar with jbang -Dcamel.jbang.version=4.8.1-SNAPSHOT camel@apache/camel explain in all of the following commands.
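The substitution described in the note above can be wrapped in a small shell function so that the command prefix is chosen in one place. This is a sketch, not part of the project: EXPLAIN_MODE, EXPLAIN_JAR, CAMEL_JBANG_VERSION, explain_prefix and explain are hypothetical names, and the jar path and version are copied from the commands in this document.

```shell
# Sketch only: all variable and function names here are hypothetical.
EXPLAIN_JAR="${EXPLAIN_JAR:-target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar}"
CAMEL_JBANG_VERSION="${CAMEL_JBANG_VERSION:-4.8.1-SNAPSHOT}"

# Prints the command prefix for the selected mode (standalone by default).
explain_prefix() {
  if [ "${EXPLAIN_MODE:-standalone}" = "jbang" ]; then
    echo "jbang -Dcamel.jbang.version=$CAMEL_JBANG_VERSION camel@apache/camel explain"
  else
    echo "java -jar $EXPLAIN_JAR"
  fi
}

# Runs the tool with the given arguments.
# The prefix is deliberately left unquoted so it splits into command and arguments.
explain() {
  $(explain_prefix) "$@"
}

# Example calls:
# explain --help
# EXPLAIN_MODE=jbang explain --help
```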
Show all available commands:
java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar --help
First, make sure you have loaded data into the DB. You need to do this any time you recreate the vector DB:
java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar load
Then, ask questions:
java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar whatis --model-name=granite-code:8b --system-prompt="You are a coding assistant specialized in Apache Camel" "How can I enable manual commits for the Kafka component?"
java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar whatis --model-name=granite-code-jbang:8b --system-prompt="You are a coding assistant specialized in Apache Camel" "Is load balance enabled by default in the MongoDB component?"
java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar whatis --model-name=granite-code:8b --system-prompt="You are a coding assistant specialized in Apache Camel" "Is the client ID required for JMS 2.0 for the JMS component?"
You can generate LLM training datasets from the catalog information.
JSON and Parquet files are generated in the dataset directory.
Generate training data using the component information:
java -jar target/camel-jbang-plugin-explain-4.8.0-jar-with-dependencies.jar data generate --model-name mistral:latest --data-type components
Generate training data using the dataformat information:
java -jar target/camel-jbang-plugin-explain-4.8.0-jar-with-dependencies.jar data generate --model-name mistral:latest --data-type dataformat
NOTE: A GPU is needed for this, otherwise it takes a very long time to generate the dataset (several days instead of about a day).
In addition to dataformat and components, you can also generate datasets for: language, beans and eips.
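To cover every data type in one pass, the generate command can be repeated in a loop. The sketch below only prints the commands so they can be reviewed first. It assumes the --data-type values are spelled exactly as listed above, which the source confirms only for components and dataformat; EXPLAIN_JAR, DATA_TYPES and print_generate_commands are hypothetical names.

```shell
# Sketch only: EXPLAIN_JAR, DATA_TYPES and print_generate_commands are hypothetical names.
EXPLAIN_JAR="${EXPLAIN_JAR:-target/camel-jbang-plugin-explain-4.8.0-jar-with-dependencies.jar}"
# Assumption: the --data-type values are spelled exactly as in the text above.
DATA_TYPES="components dataformat language beans eips"

# Prints one "data generate" command per data type without running it.
print_generate_commands() {
  for data_type in $DATA_TYPES; do
    echo "java -jar $EXPLAIN_JAR data generate --model-name mistral:latest --data-type $data_type"
  done
}

print_generate_commands
```

Once the printed commands look right, piping the output of print_generate_commands to sh runs them one after another.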
To upload the components' dataset:
huggingface-cli upload --repo-type dataset my-org/camel-components .
To upload the data formats dataset:
huggingface-cli upload --repo-type dataset my-org/camel-dataformats .
Before you prepare your dataset, you need to install two tools: asciidoc and pandoc. This step also assumes you have the Camel source code on your system.
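To see whether the two tools are already present before installing anything, a quick PATH check is enough. This is a small sketch; missing_tools is a hypothetical helper name.

```shell
# Sketch only: missing_tools is a hypothetical helper name.
# Prints the name of every given tool that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools asciidoc pandoc)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on one line.
  echo "Still to install:" $missing
else
  echo "asciidoc and pandoc are already installed"
fi
```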
.Linux installation
sudo dnf install -y asciidoc pandoc
.macOS installation
brew install asciidoc pandoc
Then, convert the documentation from Camel:
scripts/prepare-docs-for-dataset.sh /path/to/your/camel/code/base
Dump the data:
java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar data dump --data-type component-documentation --source-path
To generate the taxonomy locally, follow these steps.
Download the taxonomy from https://github.com/megacamelus/taxonomy
Download the documentation repo from https://github.com/megacamelus/camel-upstream-info/tree/main. Then update the data using:
make fetch-docs fetch-components
Then run the following command to regenerate the taxonomy:
java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar generate taxonomy --author orpiske \
--document-repo https://github.com/megacamelus/camel-upstream-info \
--document-commit e83af34070dcb575c96329ae1d5a9620ff8b4899 \
--document-path $HOME/code/other/camel-assistant-taxonomy/camel-upstream-info/camel-components \
--taxonomy-path $HOME/code/python/instruct-lab/taxonomy/knowledge/technical_manual/apache/camel/features/components
Note:
- taxonomy-path: the path to the taxonomy used to train with InstructLab
- document-path: the path to the documents referenced in the taxonomy. InstructLab does not need them, but this application uses them to regenerate the QnA.
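Before launching the command above, its inputs can be sanity-checked with a few lines of shell. This is a sketch under stated assumptions: is_full_sha and check_taxonomy_inputs are hypothetical names, and requiring a full 40-character commit hash simply mirrors the example, since the text does not say whether abbreviated hashes are accepted.

```shell
# Sketch only: is_full_sha and check_taxonomy_inputs are hypothetical names.

# Succeeds when the argument is a 40-character lowercase hexadecimal string.
is_full_sha() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Usage: check_taxonomy_inputs <document-commit> <document-path> <taxonomy-path>
# Reports every problem found and returns non-zero if there was at least one.
check_taxonomy_inputs() {
  status=0
  is_full_sha "$1" || { echo "not a full commit hash: $1" >&2; status=1; }
  [ -d "$2" ] || { echo "document-path not found: $2" >&2; status=1; }
  [ -d "$3" ] || { echo "taxonomy-path not found: $3" >&2; status=1; }
  return "$status"
}

# Example call, using the values from the command above:
# check_taxonomy_inputs e83af34070dcb575c96329ae1d5a9620ff8b4899 \
#   "$HOME/code/other/camel-assistant-taxonomy/camel-upstream-info/camel-components" \
#   "$HOME/code/python/instruct-lab/taxonomy/knowledge/technical_manual/apache/camel/features/components"
```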
After that, you can run InstructLab training steps.