This repository contains the Python and Unity code for a paper titled "Mitigating Response Delays in Free-Form Conversations with LLM-powered Intelligent Virtual Agents" to appear in the Proceedings of the 7th ACM Conference on Conversational User Interfaces (CUI '25). If you use this code or Unity environments in your research, please cite our paper (see Citation section below).
A list of licenses for third-party code and assets used in this project can be found in the ASSET_LICENSES.md file. All scenes are located in iva-cui-unity/Assets/Scenes/:
- `City_Scene.unity` -> Scenario 1
- `Hotel_Scene.unity` -> Scenario 2
- `Museum_Scene.unity` -> Scenario 3
- Unity version: 2022.3.21
- Run the Python backend before running the Unity scenes.
- Both VR and Desktop (non-VR) modes are supported; follow the instructions in the Desktop Mode and VR Mode sections below.
- To speak with agents, toggle the mic on before you speak and toggle it off after (see Controls). Set the microphone on the `SceneControls` gameobject in the scene hierarchy (see the screenshot below and the Desktop Mode and VR Mode sections).
- Agents respond after a short delay. If no agent can hear you, or an agent is currently thinking or speaking, you will hear a broken-mic sound.
### Desktop Mode

- Enable the `WASD Player` gameobject in the hierarchy.
- Disable the `XR Interaction Setup` gameobject in the hierarchy.
- On the `SceneControls` gameobject, set a working microphone.
### VR Mode

- Enable the `XR Interaction Setup` gameobject in the hierarchy.
- Disable the `WASD Player` gameobject in the hierarchy.
- On the `SceneControls` gameobject, set the microphone to `Oculus Virtual Audio Device` (or your device's equivalent).
### Controls

| Action | VR Mode | Desktop Mode |
|---|---|---|
| Toggle microphone | A | M |
| Move | Left Stick | WASD |
| Look around | Right Stick | Mouse |
| Sprint | – | Left Shift |
| Interact with objects | Side Trigger (Grab) | – |
The backend (which we also call "middleware") handles requests from Unity, processes audio files, and interacts with the LLM server. It is located in the iva-cui-backend directory.
The outcome of following these instructions should be (a quick way to verify is sketched after the list):

- A local LLM server running on port `8082` (or `11434` for Ollama)
- A local ASR server running on port `8083`
- A local Python middleware server running on port `8000`
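To confirm all three servers are up, you can probe the ports from the list above. A minimal sketch (localhost and the Ollama port swap are assumptions; adjust as needed):

```python
import socket

# Port numbers from the list above; use 11434 for the LLM entry if you
# talk to Ollama directly.
SERVERS = [("LLM", 8082), ("ASR", 8083), ("middleware", 8000)]

for name, port in SERVERS:
    with socket.socket() as s:
        s.settimeout(1.0)
        up = s.connect_ex(("127.0.0.1", port)) == 0
    print(f"{name} server on port {port}: {'up' if up else 'down'}")
```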
By default, the backend runs using Ollama, which we recommend; however, OpenAI API-style LLM server endpoints and locally deployed options (llamafile and LM Studio) are also supported. LLM API endpoints are specified in iva-cui-backend/python_middleware/llm_backends.py. To switch to OpenAI-style endpoints, change the `LLM_BACKEND` variable in iva-cui-backend/python_middleware/app.py.
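For reference, the values used in the steps below all go through that single variable. A sketch of the assignment (the exact surrounding code in app.py may differ):

```python
# In iva-cui-backend/python_middleware/app.py (sketch):
LLM_BACKEND = "ollama"  # or "llamafile_llama3", "openai_4", "openai_4mini"
```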
- Download and install Ollama.
- Run `ollama run llama3.1:8b-instruct-q5_K_M` (a quick check that the model responds is sketched below).
- Set the `LLM_BACKEND` variable in iva-cui-backend/python_middleware/app.py to `ollama`.
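To verify that Ollama is serving the model outside of the middleware, a minimal check with the `ollama` Python package (installed during the middleware setup below) could look like this; the prompt is just a placeholder:

```python
import ollama

# Ask the model pulled above for a one-line reply via the local Ollama
# server (default port 11434).
resp = ollama.chat(
    model="llama3.1:8b-instruct-q5_K_M",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp["message"]["content"])
```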
- Download, install, and run LM Studio.
- Download the model `lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf`.
- Set the UI mode to "Developer" or "Power User" (bottom left corner).
- Go to the "Developer" tab -> Settings -> Server Port and set it to `8082`.
- Start the server by toggling the switch in the top left corner.
- Set the `LLM_BACKEND` variable in iva-cui-backend/python_middleware/app.py to `llamafile_llama3`.
- Download llamafile-0.9.0.
- Rename `llamafile-0.9.0` to `llamafile-0.9.0.exe`.
- Download `Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf` from Hugging Face.
- Run `llamafile-0.9.0.exe --server -ngl 9999 -m Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf --host 0.0.0.0 --port 8082`.
- Set the `LLM_BACKEND` variable in iva-cui-backend/python_middleware/app.py to `llamafile_llama3` (a quick request against the local server is sketched below).
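Both LM Studio and llamafile serve an OpenAI-compatible endpoint on port `8082`, so you can verify either server with the `openai` package before starting the middleware. A sketch (some local servers ignore the model name, and the API key is unused):

```python
from openai import OpenAI

# Point the client at the local OpenAI-compatible server on port 8082.
client = OpenAI(base_url="http://localhost:8082/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct-Q5_K_M",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(resp.choices[0].message.content)
```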
- Create the file mentioned in the `load_openai_key()` function in iva-cui-backend/python_middleware/llm_backends.py and put your OpenAI API key there. The file should contain only the key, no other text. Alternatively, modify that function to load the key from an environment variable (a sketch follows this list). You can also make the function return the key directly in code (not recommended).
- Set the `LLM_BACKEND` variable in iva-cui-backend/python_middleware/app.py to `openai_4` or `openai_4mini`. You can also use other models by directly setting `model="gpt-4o"` in the appropriate class in the iva-cui-backend/python_middleware/llm_backends.py file.
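If you prefer the environment-variable route, a minimal replacement for `load_openai_key()` might look like this (a sketch; the variable name `OPENAI_API_KEY` is an assumption, not something the repository prescribes):

```python
import os

def load_openai_key() -> str:
    # Read the key from the environment instead of a key file.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```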
Set up and run the Python middleware (Windows commands shown):

```
# create and activate a virtual environment
python -m venv venv
venv\Scripts\activate

# install the required packages
pip install openai ollama edge-tts FastAPI[all]

# navigate to the directory and run the server
cd iva-cui-backend\python_middleware
uvicorn app:app --reload
```
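Once uvicorn is running, you can confirm the middleware is reachable; FastAPI serves interactive API docs at `/docs` by default (assuming the app does not disable them):

```python
import urllib.request

# Expect HTTP 200 from the default FastAPI docs page on port 8000.
with urllib.request.urlopen("http://127.0.0.1:8000/docs") as resp:
    print(resp.status)
```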
Set up and run the ASR server (Linux commands shown):

```
# create a virtual environment
sudo apt update
sudo apt install python3-venv
python3 -m venv venv

# activate the virtual environment
source venv/bin/activate

# install the required packages
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
export LD_LIBRARY_PATH=`python -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
pip install faster_whisper FastAPI[all]

# navigate to the directory and run the ASR server
cd iva-cui-backend/transcription_server
python whisper_server.py
```
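The ASR server is built on faster-whisper. Independent of whisper_server.py, a minimal transcription call looks like this (a sketch; the model size, device, and audio path are placeholders rather than the server's actual settings):

```python
from faster_whisper import WhisperModel

# Load a Whisper model onto the GPU; the cuBLAS/cuDNN wheels installed
# above provide the CUDA libraries faster-whisper needs.
model = WhisperModel("base", device="cuda", compute_type="float16")

segments, info = model.transcribe("sample.wav")
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```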
To test the backend without Unity, run the included test script:

```
cd iva-cui-backend\python_middleware
python test_conv.py
```
### Authors

Mykola Maslych, Mohammadreza Katebi, Christopher Lee, Yahya Hmaiti, Amirpouya Ghasemaghaei, Christian Pumarada, Janneese Palmer, Esteban Segarra Martinez, Marco Emporio, Warren Snipes, Ryan P. McMahan, Joseph J. LaViola Jr.
### Citation

If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{Maslych2025Mitigating,
  author = {Maslych, Mykola and Katebi, Mohammadreza and Lee, Christopher and Hmaiti, Yahya and Ghasemaghaei, Amirpouya and Pumarada, Christian and Palmer, Janneese and Segarra Martinez, Esteban and Emporio, Marco and Snipes, Warren and McMahan, Ryan P. and LaViola Jr., Joseph J.},
  title = {Mitigating Response Delays in Free-Form Conversations with LLM-powered Intelligent Virtual Agents},
  year = {2025},
  isbn = {9798400715273},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3719160.3736636},
  doi = {10.1145/3719160.3736636},
  booktitle = {Proceedings of the 7th ACM Conference on Conversational User Interfaces},
  articleno = {49},
  numpages = {15},
  month = {jul},
  series = {CUI '25},
  location = {Waterloo, ON, Canada},
}
```