Getting started

Setting up the pre-requisites

In order to use this tool you need two things: an API endpoint and a Qdrant vector DB instance.

LLM API Endpoint

In order to use this tool you need a LLM being served in a HTTP endpoint.

You can experiment with this tool in the open-source way using Ollama. It makes it easy to serve a model locally and works on all major operating systems. It also automatically tries to use your GPU for faster performance.

Once Ollama is installed following the instructions on their website, follow these steps:

Start Ollama
```
OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=localhost:8000 ollama serve
```
NOTE: If you use a different host make sure to pass it as an argument when using this tool (i.e., --host=localhost and --port=11434).
Pull the models:

First, pull the Granite Code Model. The Granite 8b base serves as the base model for this project.
```
OLLAMA_HOST=localhost:8000 ollama pull granite-code:8b
```
Then pull the Mistral model:
```
OLLAMA_HOST=localhost:8000 ollama pull mistral:latest
```
Import the customized settings for the Granite model

These settings make the Granite model more conservative.
```
cd modelfiles/granite-code-jbang-8b
OLLAMA_HOST=localhost:8000 ollama create granite-code-jbang:8b -f ./Modelfile
```
Alternatively, if you have enough memory (32Gb or more) you can try the 20b one:
```
ollama pull granite-code:20b
cd modelfiles/granite-code-jbang-20b
OLLAMA_HOST=localhost:8000 ollama create granite-code-jbang:20b -f ./Modelfile
```
Then, when using the application, pass the appropriate model name (i.e., --model-name=granite-code-jbang:20b).

Vector DB: Qdrant

The Qdrant database is needed to load and persist embeddings.

NOTE: If you are only using the data command then it is not needed.

podman run -d --rm --name qdrant -p 6334:6334 -p 6333:6333 qdrant/qdrant:v1.7.4-unprivileged

Running

This tool works as a standalone application or as a JBang plugin.

Running as Camel JBang Plugin

Install the plugin

NOTE: this requires Camel 4.8.1-SNAPSHOT or greater locally.

Build and install

Build
```
mvn install
```

Add to Camel JBang Plugins

jbang -Dcamel.jbang.version=4.8.1-SNAPSHOT camel@apache/camel plugin add --gav org.apache.camel.jbang.ai:camel-jbang-plugin-explain:1.0.0-SNAPSHOT --description "Explain things using AI"  explain

Running as Standalone Application

Build the package as standalone

mvn -Pstandalone package

Usage Examples

NOTE: If you are using the JBang plugin, replace all in the following commands java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar by jbang -Dcamel.jbang.version=4.8.1-SNAPSHOT camel@apache/camel explain.

Show all available commands:

java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar --help

Asking questions

First, make sure you have loaded data into the DB. You need to do this anytime you recreate the Vector DB

java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar load

Then, ask questions

java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar whatis --model-name=granite-code:8b --system-prompt="You are a coding assistant specialized in Apache Camel" "How can I enable manual commits for the Kafka component?"

java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar whatis --model-name=granite-code-jbang:8b --system-prompt="You are a coding assistant specialized in Apache Camel" "Is load balance enabled by default in the MongoDB component?"

java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar whatis --model-name=granite-code:8b --system-prompt="You are a coding assistant specialized in Apache Camel" "Is the client ID required for JMS 2.0 for the JMS component?"

Generate a training dataset

You can generate LLM training datasets from the catalog information.

JSON and Parquet files are generated in the dataset directory.

Components

Generate training data using the component information:

java -jar target/camel-jbang-plugin-explain-4.8.0-jar-with-dependencies.jar data generate --model-name mistral:latest --data-type components

Data formats

Generate training data using the dataformat information:

java -jar target/camel-jbang-plugin-explain-4.8.0-jar-with-dependencies.jar data generate --model-name mistral:latest --data-type dataformat

NOTE: A GPU is needed for this, otherwise it takes a very long time to generate the dataset (several days instead of about a day)

Others

In addition to dataformat and components, you can also generate datasets for: language, beans and eips.

Upload

To upload the components' dataset:

huggingface-cli upload --repo-type dataset my-org/camel-components .

To upload the data formats dataset:

huggingface-cli upload --repo-type dataset my-org/camel-dataformats .

Generate the documentation dump for training dataset

Before you prepare your dataset, you need to install 2 tools: asciidoc and pandoc. It also assumes you have the Camel source code on your system.

.Linux installation

sudo dnf install -y asciidoc pandoc

.macOS installation

brew install asciidoc pandoc

Then, convert the documentation from Camel:

scripts/prepare-docs-for-dataset.sh /path/to/your/camel/code/base

Dump the data:

java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar data dump --data-type component-documentation --source-path

Generate the taxonomy for InstructLab

To generate the taxonomy locally, follow these steps.

Download the taxonomy from https://github.com/megacamelus/taxonomy

Download the documentation repo from https://github.com/megacamelus/camel-upstream-info/tree/main. Then update the data using:

make fetch-docs fetch-components

Then, then run the following command to regenerate the taxonomy:

java -jar target/camel-jbang-plugin-explain-4.7.0-jar-with-dependencies.jar generate taxonomy --author orpiske \
   --document-repo https://github.com/megacamelus/camel-upstream-info \
   --document-commit e83af34070dcb575c96329ae1d5a9620ff8b4899 \
   --document-path $HOME/code/other/camel-assistant-taxonomy/camel-upstream-info/camel-components
   --taxonomy-path $HOME/code/python/instruct-lab/taxonomy/knowledge/technical_manual/apache/camel/features/components

Note:

taxonomy-path: the path to the taxonomy used to train with InstructLab
document-path: the path for the documents referenced in the taxonomy. InstructLab does not need those, but this application needs it to use to regenerate the QnA.

After that, you can run InstructLab training steps.

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
.github/workflows		.github/workflows
modelfiles		modelfiles
scripts		scripts
src/main		src/main
.gitignore		.gitignore
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Getting started

Setting up the pre-requisites

LLM API Endpoint

Vector DB: Qdrant

Running

Running as Camel JBang Plugin

Install the plugin

Build and install

Running as Standalone Application

Usage Examples

Asking questions

Generate a training dataset

Components

Data formats

Others

Upload

Generate the documentation dump for training dataset

Generate the taxonomy for InstructLab

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

Uh oh!

Uh oh!

megacamelus/camel-jbang-plugin-explain

Folders and files

Latest commit

History

Repository files navigation

Getting started

Setting up the pre-requisites

LLM API Endpoint

Vector DB: Qdrant

Running

Running as Camel JBang Plugin

Install the plugin

Build and install

Running as Standalone Application

Usage Examples

Asking questions

Generate a training dataset

Components

Data formats

Others

Upload

Generate the documentation dump for training dataset

Generate the taxonomy for InstructLab

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages