🌟RESTful API | 🔥Features | 💻Examples | 📖Notebooks
NeuralChat is a powerful and flexible open framework that empowers you to effortlessly create LLM-centric AI applications, including chatbots and copilots.
- Support a range of hardware such as Intel Xeon Scalable processors, Intel Gaudi AI processors, Intel® Data Center GPU Max Series, and NVIDIA GPUs
- Leverage the leading AI frameworks (e.g., PyTorch) and popular domain libraries (e.g., Hugging Face, LangChain) with their extensions
- Support model customization through parameter-efficient fine-tuning, quantization, and sparsity. Released Intel NeuralChat-7B LLM, which ranked #1 on the Hugging Face Open LLM Leaderboard in Nov'23
- Provide a rich set of plugins that can augment AI applications through retrieval-augmented generation (RAG) (e.g., fastRAG), content moderation, query caching, and more
- Integrate with popular serving frameworks (e.g., vLLM, TGI, Triton). Support OpenAI-compatible APIs to simplify the creation or migration of AI applications
NeuralChat is under active development. APIs are subject to change.
NeuralChat is part of Intel Extension for Transformers, so install Intel Extension for Transformers first by following the installation guide. After that, install the additional dependencies for NeuralChat for your device:
```shell
# For CPU device
pip install -r requirements_cpu.txt

# For HPU device
pip install -r requirements_hpu.txt

# For XPU device
pip install -r requirements_xpu.txt

# For CUDA device
pip install -r requirements.txt
```
NeuralChat provides OpenAI-compatible RESTful APIs for LLM inference, so you can use NeuralChat as a drop-in replacement for OpenAI APIs. The NeuralChat service is also accessible through the OpenAI client library, `curl` commands, and the `requests` library. See neuralchat_api.md.
NeuralChat launches a chatbot service using `Intel/neural-chat-7b-v3-1` by default. You can customize the chatbot service by configuring the YAML file, as sketched below.
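For reference, here is a minimal sketch of what such a YAML file might contain. The field names follow the parameters referenced elsewhere in this document; the values are illustrative assumptions, not verified defaults:

```yaml
# Hypothetical neuralchat.yaml sketch; field names follow the parameters
# mentioned in this document, values are illustrative assumptions.
host: 0.0.0.0
port: 80
model_name_or_path: "Intel/neural-chat-7b-v3-1"
device: "auto"
tasks_list: ['textchat']
```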
You can start the NeuralChat server either using the shell command or Python code.
Using Shell Command:
```shell
neuralchat_server start --config_file ./server/config/neuralchat.yaml
```
Using Python Code:
```python
from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

server_executor = NeuralChatServerExecutor()
server_executor(config_file="./server/config/neuralchat.yaml", log_file="./neuralchat.log")
```
Once the service is running, it exposes an OpenAI-compatible endpoint at `/v1/chat/completions`. You can use any of the ways below to access the endpoint.
```python
from openai import Client

# Replace 'your_api_key' with your actual API key; a local NeuralChat
# service typically does not validate it.
api_key = 'your_api_key'
# base_url should point at the API root; the client appends /chat/completions
backend_url = 'http://127.0.0.1:80/v1'
client = Client(api_key=api_key, base_url=backend_url)
response = client.chat.completions.create(
    model="Intel/neural-chat-7b-v3-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."},
    ],
)
print(response)
```
```shell
curl http://127.0.0.1:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Intel/neural-chat-7b-v3-1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}
    ]
  }'
```
```python
import requests

url = 'http://127.0.0.1:80/v1/chat/completions'
data = {
    "model": "Intel/neural-chat-7b-v3-1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}
    ]
}
# requests serializes the dict and sets the Content-Type header automatically
response = requests.post(url, json=data)
print(response.json())
```
Intel Extension for Transformers provides a comprehensive suite of LangChain-based extension APIs, including advanced retrievers, embedding models, and vector stores. These enhancements are carefully crafted to expand the capabilities of the original LangChain API, ultimately boosting overall performance. This extension is specifically tailored to enhance the functionality and performance of RAG.
We introduce enhanced vector store operations, enabling users to adjust and fine-tune their settings even after the chatbot has been initialized, offering a more adaptable and user-friendly experience. For LangChain users, integrating and utilizing optimized vector stores is straightforward: simply replace the original Chroma API in LangChain.
```python
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import VectorStoreRetriever
from intel_extension_for_transformers.langchain.vectorstores import Chroma

retriever = VectorStoreRetriever(vectorstore=Chroma(...))
retrievalQA = RetrievalQA.from_llm(llm=HuggingFacePipeline(...), retriever=retriever)
```
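As a concrete illustration, the snippet below builds the store from in-memory documents. It assumes the drop-in `Chroma` mirrors the original LangChain constructors such as `from_documents`, and the embedding model name is an arbitrary choice for the example:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from intel_extension_for_transformers.langchain.vectorstores import Chroma

# Illustrative documents and embedding model; assumes the drop-in Chroma
# keeps LangChain's from_documents factory method.
docs = [Document(page_content="Intel Xeon Scalable processors are server CPUs.")]
embedding = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
vectorstore = Chroma.from_documents(documents=docs, embedding=embedding)
```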
We provide optimized retrievers such as `VectorStoreRetriever` and `ChildParentRetriever` to efficiently handle vectorstore operations, ensuring optimal retrieval performance.
```python
from intel_extension_for_transformers.langchain.retrievers import ChildParentRetriever
from langchain.vectorstores import Chroma

retriever = ChildParentRetriever(
    vectorstore=Chroma(documents=child_documents),
    parentstore=Chroma(documents=parent_documents),
    search_type=xxx,
    search_kwargs={...},
)
docs = retriever.get_relevant_documents("Intel")
```
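The idea behind child-parent retrieval is to match queries against small child chunks for precision while returning the larger parent chunks for context. Below is a minimal sketch of producing the two granularities with standard LangChain splitters; how child chunks are linked back to their parents (e.g., via shared IDs in metadata) is assumed to be handled by the library's own document processing, so treat this only as an outline:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

# Split source text at two granularities: large parent chunks for context,
# small child chunks for precise vector matching. The file name is illustrative.
raw_documents = [Document(page_content=open("intel_xeon.txt").read())]
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1600, chunk_overlap=0)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=0)

parent_documents = parent_splitter.split_documents(raw_documents)
child_documents = child_splitter.split_documents(parent_documents)
```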
Please refer to this documentation for more details.
Users have the flexibility to customize the NeuralChat service by making modifications in the YAML configuration file. Detailed instructions can be found in the documentation.
NeuralChat boasts support for various generative Transformer models available in HuggingFace Transformers. The following is a curated list of models validated for both inference and fine-tuning within NeuralChat:
| Pretrained model | Text Generation (Completions) | Text Generation (Chat Completions) | Summarization | Code Generation or SQL Generation |
|---|:---:|:---:|:---:|:---:|
| Intel/neural-chat-7b-v1-1 | ✅ | ✅ | ✅ | ✅ |
| Intel/neural-chat-7b-v3-1 | ✅ | ✅ | ✅ | ✅ |
| meta-llama/Llama-2-7b-chat-hf | ✅ | ✅ | ✅ | ✅ |
| meta-llama/Llama-2-70b-chat-hf | ✅ | ✅ | ✅ | ✅ |
| EleutherAI/gpt-j-6b | ✅ | ✅ | ✅ | ✅ |
| mosaicml/mpt-7b-chat | ✅ | ✅ | ✅ | ✅ |
| mistralai/Mistral-7B-v0.1 | ✅ | ✅ | ✅ | ✅ |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | ✅ | ✅ | ✅ | ✅ |
| upstage/SOLAR-10.7B-Instruct-v1.0 | ✅ | ✅ | ✅ | ✅ |
| THUDM/chatglm2-6b | ✅ | ✅ | ✅ | ✅ |
| THUDM/chatglm3-6b | ✅ | ✅ | ✅ | ✅ |
| Qwen/Qwen-7B | ✅ | ✅ | ✅ | ✅ |
| microsoft/phi-2 | ✅ | ✅ | ✅ | ✅ |
| Deci/DeciLM-7B | ✅ | ✅ | ✅ | ✅ |
| Deci/DeciLM-7B-instruct | ✅ | ✅ | ✅ | ✅ |
| bigcode/starcoder | | | | ✅ |
| codellama/CodeLlama-7b-hf | | | | ✅ |
| codellama/CodeLlama-34b-hf | | | | ✅ |
| Phind/Phind-CodeLlama-34B-v2 | | | | ✅ |
| Salesforce/codegen2-7B | | | | ✅ |
| ise-uiuc/Magicoder-S-CL-7B | | | | ✅ |
| defog/sqlcoder2 | | | | ✅ |
| defog/sqlcoder-34b-alpha | | | | ✅ |
Modify the `model_name_or_path` parameter in the YAML configuration file to load different models.
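For example, pointing the service at another validated model from the table above might look like this hypothetical excerpt:

```yaml
# Hypothetical excerpt of the YAML configuration
model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
```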
NeuralChat includes support for various plugins to enhance its capabilities:
- Text-to-Speech (TTS)
- Automatic Speech Recognition (ASR)
In addition to the text-based chat RESTful API, NeuralChat offers several helpful plugins in its RESTful API lineup to aid users in building multimodal applications. NeuralChat supports the following RESTful APIs:
| Tasks List | RESTful APIs |
|---|---|
| textchat | /v1/chat/completions |
| | /v1/completions |
| voicechat | /v1/audio/speech |
| | /v1/audio/transcriptions |
| | /v1/audio/translations |
| retrieval | /v1/rag/create |
| | /v1/rag/append |
| | /v1/rag/upload_link |
| | /v1/rag/chat |
| codegen | /v1/code_generation |
| | /v1/code_chat |
| text2image | /v1/text2image |
| image2image | /v1/image2image |
| faceanimation | /v1/face_animation |
| finetune | /v1/finetune |
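For instance, since the textchat endpoints follow the OpenAI schema, `/v1/completions` can be exercised with a plain prompt. The sketch below assumes the standard OpenAI completions payload (`model` plus `prompt`):

```python
import requests

# Hedged example: call the OpenAI-compatible completions endpoint with the
# standard OpenAI payload shape (model + prompt).
url = 'http://127.0.0.1:80/v1/completions'
data = {
    "model": "Intel/neural-chat-7b-v3-1",
    "prompt": "Tell me about Intel Xeon Scalable Processors.",
}
response = requests.post(url, json=data)
print(response.json())
```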
Modify the `tasks_list` parameter in the YAML configuration file to seamlessly leverage different RESTful APIs as per your project needs.
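For example, enabling the voice and retrieval endpoints alongside text chat might look like this illustrative excerpt; the task names are taken from the table above:

```yaml
# Illustrative excerpt of the YAML configuration
tasks_list: ['textchat', 'voicechat', 'retrieval']
```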