⚠️ Warning: Do not make changes in the main branch. Instead, create a separate branch for each change you want to make, make the changes, test it, and then merge it back into main.
- How to Add a New NVIDIA Endpoint to the Model Dropdown in the UI
- How to Modify the Embedding Model
- How to Modify Vector Database Clearing Behavior
- How to Modify the Agent's Recursion Limit
- How to Modify Tavily Search Settings
- How to See Higher Resolution Error Messaging in the Monitor Tab
Who is this guide for?
- People who know some Python
- People who want to adapt or improve the application
- People who want to explore the code to see how agents work
What are the guide limitations?
- It isn't comprehensive and doesn't go into full detail
- It assumes you can figure things out once you are pointed to the correct section
- There may be small errors in the details, but you should be able to work past them
What else do I need to know?
- You will need to check errors and do some debugging if you alter the code. It's best to use an LLM to help you interpret errors and figure out what to do.
- The first place to find errors is in the Output widget in the Desktop App (bottom left corner)
- Click Output and select Chat from the dropdown.
How to Add a New NVIDIA Endpoint to the Model Dropdown in the UI

You can add more NVIDIA endpoints to the dropdown API menus in the Models tab of the Gradio interface (`converse.py`).
You can add your model by:
- Providing a new variable, e.g. `NEMO`, assigned to the relevant endpoint string
- (NVIDIA only) Adding that endpoint to the conditional logic that prepends the internal endpoint bits
- Updating the list of models pulled into the dropdown
Adding a model this way will make it available to all of the different pipeline components.
File: `code/chatui/pages/converse.py`
- Search: `Model identifiers with prefix`
- Search: `Modify model identifiers`
- Search: `build_page() > model_list`
- Go to build.nvidia.com and find a large language model, e.g. Llama 3.1 Nemotron Ultra
- Copy the provider-model path, e.g. `nvidia/llama-3_1-nemotron-ultra-253b-v1`
- Find the `Model identifiers with prefix` section in `code/chatui/pages/converse.py`
- Add your model to the section by defining a new variable:

  ```python
  NEMO = "nvidia/llama-3_1-nemotron-ultra-253b-v1"
  ```

- Find the list `model_list` in the `build_page()` function
- Add your model to the list:

  ```python
  model_list = [LLAMA, MISTRAL, NEMO]
  ```
If you're using `INTERNAL_API`, make sure the model identifier gets the proper prefix. Find the `# Modify model identifiers` section and update the endpoint logic:

```python
if INTERNAL_API != '':
    NEMO = f'{INTERNAL_API}/nvidia/llama-3_1-nemotron-ultra-253b-v1'
```

- The guidance above is meant to give you an idea of how to change things by adding a single model. You may want to add many models, and if so, the current code and this guidance should be modified.
- For example, there are different Llama, Mistral, and NVIDIA model endpoints on build.nvidia.com. If you want to add more than one model from a given provider, the naming convention used above would need to change.
- In addition, the internal endpoint logic is relevant to NVIDIA's internal endpoints, not necessarily to any other setup. If you aren't at NVIDIA, it isn't currently set up to support you.
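Putting the pieces together, the relevant region of `converse.py` might end up looking roughly like the sketch below. This is an orientation aid, not the file's exact contents: the `LLAMA` and `MISTRAL` values are illustrative placeholders, and surrounding code (including where `INTERNAL_API` is defined) is omitted.

```python
# Model identifiers with prefix (sketch -- existing entries are placeholders)
LLAMA = "meta/llama-3.1-70b-instruct"                # hypothetical existing entry
MISTRAL = "mistralai/mixtral-8x22b-instruct-v0.1"    # hypothetical existing entry
NEMO = "nvidia/llama-3_1-nemotron-ultra-253b-v1"     # your new model

# Modify model identifiers (NVIDIA-internal endpoints only)
if INTERNAL_API != '':
    NEMO = f'{INTERNAL_API}/nvidia/llama-3_1-nemotron-ultra-253b-v1'

def build_page():
    ...
    # The Models tab dropdown is populated from this list
    model_list = [LLAMA, MISTRAL, NEMO]
    ...
```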
How to Modify the Embedding Model

You can modify the embedding model used for document processing in the vector database by editing the configuration in `code/chatui/utils/database.py`.
File: `code/chatui/utils/database.py`
- Search: `Default model for public embedding`
- Search: `Set the chunk size and overlap`
- Select an embedding model that is compatible with your needs
- Common choices include:
- OpenAI's text-embedding-ada-002
- Hugging Face's sentence-transformers
- Cohere's embedding models
- Or any other embedding model that provides vector representations
- Find the `Default model for public embedding` section in `code/chatui/utils/database.py`
- Update the `EMBEDDINGS_MODEL` variable with your chosen model:

  ```python
  # Default model for public embedding
  EMBEDDINGS_MODEL = 'your-embedding-model-name'
  ```
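Before wiring a new model into the app, it can be worth sanity-checking it in isolation. Here is a minimal sketch assuming a sentence-transformers model and LangChain's community wrapper; the embedding class `database.py` actually uses may differ:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

# Example candidate -- substitute whichever model you chose above
EMBEDDINGS_MODEL = 'sentence-transformers/all-MiniLM-L6-v2'

embeddings = HuggingFaceEmbeddings(model_name=EMBEDDINGS_MODEL)
vector = embeddings.embed_query("A short test sentence.")
print(len(vector))  # the new model's embedding dimensionality
```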
You can modify how documents are split and processed by adjusting the chunk size and overlap parameters:
```python
# Set the chunk size and overlap for the text splitter
DEFAULT_CHUNK_SIZE = 250    # Adjust this value to change chunk size
DEFAULT_CHUNK_OVERLAP = 0   # Adjust this value to change overlap
```

- The embedding model must be compatible with the vector store implementation
- Changing the embedding model will require re-embedding all documents in your vector store
- The chunk size and overlap settings affect how documents are processed and retrieved
- Make sure to update any API keys or authentication required for the new embedding model
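For context on where these settings typically take effect, here is a minimal sketch of a split-and-embed flow, assuming the LangChain text splitter and the Chroma store this guide refers to elsewhere (the `rag-chroma` collection); `database.py` may organize this differently:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

DEFAULT_CHUNK_SIZE = 250
DEFAULT_CHUNK_OVERLAP = 0

docs = [Document(page_content="Some example text to index. " * 50)]

splitter = RecursiveCharacterTextSplitter(
    chunk_size=DEFAULT_CHUNK_SIZE,
    chunk_overlap=DEFAULT_CHUNK_OVERLAP,
)
chunks = splitter.split_documents(docs)

# Every chunk passes through the embedding model here, which is why
# changing EMBEDDINGS_MODEL means re-embedding all of your documents.
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2'),
    collection_name="rag-chroma",
    persist_directory="/project/data",
)
```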
How to Modify Vector Database Clearing Behavior

You can modify how the vector database is cleared by adjusting the `delete_all` parameter in the `_clear()` function in `code/chatui/utils/database.py`.
File: `code/chatui/utils/database.py`
- Search: `Clear the Chroma collection`
- Search: `delete_all: bool = True`
The vector database clearing has two behaviors:
- Basic clearing: Only clears the current Chroma collection
- Full clearing (default): Clears both the collection and all associated files/directories
- Find the `_clear()` function in `code/chatui/utils/database.py`
- Change the default value of `delete_all` to `False` to preserve previous searches:

  ```python
  def _clear(
      persist_directory: str = "/project/data",
      collection_name: str = "rag-chroma",
      delete_all: bool = False  # Changed from True to False
  ):
  ```
- Setting `delete_all` to `False` will preserve files in the persist directory
- Hidden files (starting with '.') are always preserved regardless of this setting
- The current collection will still be cleared even with `delete_all = False`
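For orientation, here is a minimal sketch of what a `_clear()` with the behavior described above could look like, assuming the LangChain Chroma wrapper; the actual implementation in `database.py` may differ:

```python
import os
import shutil

from langchain_community.vectorstores import Chroma

def _clear(
    persist_directory: str = "/project/data",
    collection_name: str = "rag-chroma",
    delete_all: bool = True,
):
    # The current Chroma collection is always cleared
    vectorstore = Chroma(
        collection_name=collection_name,
        persist_directory=persist_directory,
    )
    vectorstore.delete_collection()

    # Optionally remove everything else in the persist directory,
    # always skipping hidden files (names starting with '.')
    if delete_all and os.path.isdir(persist_directory):
        for entry in os.listdir(persist_directory):
            if entry.startswith('.'):
                continue
            path = os.path.join(persist_directory, entry)
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
```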
How to Modify the Agent's Recursion Limit

You can modify how many times the agent can recursively process a query by adjusting the recursion limit in `code/chatui/pages/converse.py`.
File: `code/chatui/pages/converse.py`
- Search: `Set recursion limit`
- Search: `DEFAULT_RECURSION_LIMIT`
The recursion limit controls how many times the agent can:
- Re-route questions between different components
- Re-try generating answers when previous attempts fail
- Iterate through the document retrieval and grading process
You can change the recursion limit in two ways.
Regardless of how you set it, the app must be restarted for the change to take effect.
1. Environment Variable (Recommended):

   - If the app is not running:

     ```bash
     # In a terminal attached to your container
     export RECURSION_LIMIT=20  # Set to your desired value
     # Then start the app
     ```

   - If the app is already running:

     ```bash
     # In a terminal attached to your container
     # First stop the app
     # Then set the environment variable
     export RECURSION_LIMIT=20
     # Then restart the app
     ```

2. Direct Code Modification:

   - Find the recursion limit configuration in `converse.py`:

     ```python
     DEFAULT_RECURSION_LIMIT = 10
     RECURSION_LIMIT = int(os.getenv("RECURSION_LIMIT", DEFAULT_RECURSION_LIMIT))
     ```

   - Change the `DEFAULT_RECURSION_LIMIT` value to your desired number
   - This change will take effect after restarting the app
- Setting the limit too high may cause the agent to get stuck in loops
- Setting the limit too low may prevent the agent from fully processing complex queries
- The default value of 10 is a good balance for most use cases
- You can monitor recursion depth in the Output widget of the Desktop App
- Environment variable changes require app restart to take effect
- Code modifications require app restart and code recompilation to take effect
- Environment variables must be set in a terminal attached to the running container
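For context, a recursion limit like this is typically passed to the agent graph as part of its per-run configuration. Below is a minimal, self-contained sketch assuming a LangGraph-style graph (which the routing/grading behavior described above suggests); the real graph and invocation in this project live elsewhere and may differ:

```python
import os
from typing import TypedDict

from langgraph.graph import StateGraph

class State(TypedDict):
    count: int

def step(state: State) -> State:
    return {"count": state["count"] + 1}

builder = StateGraph(State)
builder.add_node("step", step)
builder.set_entry_point("step")
builder.add_edge("step", "step")  # deliberate loop, to demonstrate the limit
graph = builder.compile()

DEFAULT_RECURSION_LIMIT = 10
RECURSION_LIMIT = int(os.getenv("RECURSION_LIMIT", DEFAULT_RECURSION_LIMIT))

# The limit is supplied per run via the config dict; exceeding it raises
# a GraphRecursionError instead of looping forever.
try:
    graph.invoke({"count": 0}, {"recursion_limit": RECURSION_LIMIT})
except Exception as exc:
    print(f"Stopped after hitting the recursion limit: {type(exc).__name__}")
```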
How to Modify Tavily Search Settings

You can modify how many search results Tavily returns by adjusting the `TAVILY_K` parameter in `code/chatui/utils/graph.py`.
File: `code/chatui/utils/graph.py`
- Search: `Tavily related parameters`
- Search: `DEFAULT_TAVILY_K`
The `TAVILY_K` parameter controls:
- How many search results are returned from Tavily
- The amount of web content available for the agent to process
- The breadth of information considered when answering questions
You can change the number of search results in two ways.
Regardless of how you set it, the app will need to be restarted.
1. Environment Variable (Recommended):

   - If the app is not running:

     ```bash
     # In a terminal attached to your container
     export TAVILY_K=5  # Set to your desired value
     # Then start the app
     ```

   - If the app is already running:

     ```bash
     # In a terminal attached to your container
     # First stop the app
     # Then set the environment variable
     export TAVILY_K=5
     # Then restart the app
     ```

2. Direct Code Modification:

   - Find the Tavily configuration in `graph.py`:

     ```python
     DEFAULT_TAVILY_K = 3
     TAVILY_K = int(os.getenv("TAVILY_K", DEFAULT_TAVILY_K))
     ```

   - Change the `DEFAULT_TAVILY_K` value to your desired number
   - This change will take effect after restarting the app
- Setting the value too high may increase response time and API costs
- Setting the value too low may limit the information available to the agent
- The default value of 3 is a good balance for most use cases
- Environment variable changes require app restart to take effect
- Code modifications require app restart and code recompilation to take effect
- Environment variables must be set in a terminal attached to the running container
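For context, a `TAVILY_K`-style value is what gets handed to the Tavily client as the number of results per query. A minimal sketch, assuming LangChain's Tavily retriever and a `TAVILY_API_KEY` in the environment; the actual wiring in `graph.py` may differ:

```python
import os

from langchain_community.retrievers import TavilySearchAPIRetriever

DEFAULT_TAVILY_K = 3
TAVILY_K = int(os.getenv("TAVILY_K", DEFAULT_TAVILY_K))

# k caps how many web results come back per query
retriever = TavilySearchAPIRetriever(k=TAVILY_K)
docs = retriever.invoke("What is agentic RAG?")
print(len(docs))  # at most TAVILY_K documents
```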
How to See Higher Resolution Error Messaging in the Monitor Tab

By default, the Monitor tab only shows stdout messages. You can modify the logging system to capture error messages (stderr), or even see all output, by editing `code/chatui/utils/logger.py` and `code/chatui/pages/converse.py`.
To make error capture optional, modify the `Logger` class in `logger.py`:
```python
import sys

class Logger:
    def __init__(self, filename, stream_type='stdout', capture_errors=False):
        self.stream_type = stream_type
        self.capture_errors = capture_errors
        if stream_type == 'stdout':
            self.terminal = sys.stdout
        else:  # stderr
            self.terminal = sys.stderr
        self.log = open(filename, "a")  # Changed to append mode

    def write(self, message):
        self.terminal.write(message)
        # Only log stderr if capture_errors is True
        if self.stream_type == 'stdout' or (self.stream_type == 'stderr' and self.capture_errors):
            if self.stream_type == 'stderr':
                message = f"[ERROR] {message}"
            self.log.write(message)

    def flush(self):
        # Anything replacing sys.stdout/sys.stderr should implement flush()
        self.terminal.flush()
        self.log.flush()
```

Then in `converse.py`, you can enable error capture when needed:
```python
sys.stdout = logger.Logger("/project/code/output.log", 'stdout')
sys.stderr = logger.Logger("/project/code/output.log", 'stderr', capture_errors=True)  # Set to True to capture errors
```

If you want to see literally everything (including debug messages and internal processing), you can modify the `Logger` class to capture all output without filtering:
```python
import sys

class Logger:
    def __init__(self, filename, stream_type='stdout', capture_all=False):
        self.stream_type = stream_type
        self.capture_all = capture_all
        if stream_type == 'stdout':
            self.terminal = sys.stdout
        else:  # stderr
            self.terminal = sys.stderr
        self.log = open(filename, "a")

    def write(self, message):
        self.terminal.write(message)
        # Log everything if capture_all is True
        if self.capture_all or self.stream_type == 'stdout':
            if self.stream_type == 'stderr':
                message = f"[ERROR] {message}"
            self.log.write(message)

    def flush(self):
        self.terminal.flush()
        self.log.flush()
```

Then in `converse.py`:
```python
sys.stdout = logger.Logger("/project/code/output.log", 'stdout', capture_all=True)
sys.stderr = logger.Logger("/project/code/output.log", 'stderr', capture_all=True)
```

- Capturing errors or all output may result in a lot of noise in the Monitor tab
- Some error messages might be expected and not indicate actual problems
- Consider using these options temporarily for debugging rather than permanently
- The log file will grow larger when capturing more output
- You may want to add additional filtering based on message content; see the sketch below
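One way to add that content-based filtering is a small deny-list inside `write()`, as sketched below for the capture-all variant. The patterns shown are hypothetical placeholders; tune them to the noise you actually see:

```python
# Hypothetical noise patterns -- replace with strings you actually want to drop
NOISY_PATTERNS = ("DeprecationWarning", "Retrying request")

def write(self, message):
    self.terminal.write(message)
    if self.capture_all or self.stream_type == 'stdout':
        if self.stream_type == 'stderr':
            # Skip stderr lines that match a known-noisy pattern
            if any(pattern in message for pattern in NOISY_PATTERNS):
                return
            message = f"[ERROR] {message}"
        self.log.write(message)
```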