Add LLM inference series to community examples #315

Open · wants to merge 1 commit into base: main
6 changes: 5 additions & 1 deletion community/README.md
@@ -86,4 +86,8 @@ Community examples are sample code and deployments for RAG pipelines that are no

* [Chat with LLM Llama 3.1 Nemotron Nano 4B](./chat-llama-nemotron/)

This is a React-based conversational UI designed for interacting with a powerful local LLM. It incorporates RAG to enhance contextual understanding and is backed by an NVIDIA Dynamo inference server running the NVIDIA Llama-3.1-Nemotron-Nano-4B-v1.1 model. The setup enables low-latency, cloud-free AI assistant capabilities, with live document search and reasoning, all deployable on local or edge infrastructure.

* [LLM Inference Series: Performance, Optimization & Deployment with LLMs](./llm-inference-series/)

This repository supports a video and notebook series exploring how to run, optimize, and serve large language models (LLMs), with a focus on latency, throughput, user experience (UX), and NVIDIA GPU acceleration.
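The two headline metrics the series focuses on, latency and throughput, can be sketched with a minimal benchmark harness. This is an illustrative example only, not code from the series: `generate` is a hypothetical stub standing in for a real LLM call, and the timing pattern mirrors what a notebook like the series' batch benchmark might record.

```python
import time

def generate(prompt, n_tokens=32):
    """Hypothetical stand-in for an LLM generation call.

    A real benchmark would invoke an inference server here;
    sleep simulates per-token decode work.
    """
    time.sleep(0.001 * n_tokens)
    return ["tok"] * n_tokens

def benchmark(prompt, n_tokens=32, runs=5):
    """Average end-to-end latency and derived token throughput."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt, n_tokens)
        latencies.append(time.perf_counter() - start)
    avg_latency = sum(latencies) / len(latencies)
    throughput = n_tokens / avg_latency  # tokens per second
    return avg_latency, throughput

latency, tps = benchmark("Hello, world")
print(f"avg latency: {latency:.4f}s, throughput: {tps:.1f} tok/s")
```

Batching raises throughput (more tokens processed per unit time) while typically increasing per-request latency; sweeping batch size and recording both numbers, as in a CSV like `batch_benchmark.csv`, makes that trade-off visible.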
35 changes: 35 additions & 0 deletions community/llm-inference-series/.gitignore
@@ -0,0 +1,35 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv/

# Jupyter
.ipynb_checkpoints/

# Data files
*.csv
*.json
*.pkl
*.h5
*.hdf5

# OS
.DS_Store
Thumbs.db

# IDE
.vscode/
.idea/
*.swp
*.swo

# Logs
*.log

# Project
01_inference_101/batch_benchmark.csv