A feature-rich Next.js web application demonstrating real-time conversational AI capabilities using Agora's Real-Time Communication SDK. This demo showcases voice-first interactions with live transcriptions, multi-device audio input support, and an agent ready to help you with your Agora build.
This application demonstrates how to build a production-ready conversational AI interface with:
- Real-time voice conversations with AI agents powered by Agora's Conversational AI Engine
- Live text transcriptions with streaming message updates and visual status indicators
- Advanced audio controls including device selection and visual feedback
- Modern UX patterns like smart auto-scrolling, mobile responsiveness, and accessibility features
- Flexible backend integration supporting multiple LLM providers (OpenAI, Anthropic, etc.) and TTS services (Microsoft Azure, ElevenLabs)
- Guide.md - Complete step-by-step guide on how to build this application from scratch.
- User Interaction Diagram - Visual diagram showing how the application interacts with different services.
- Text Streaming Guide - Deep dive into implementing real-time conversation transcriptions.
- Microphone Selector Implementation - Guide for adding device selection functionality.
Before you begin, ensure you have the following installed:
You must have an Agora account and a project to use this application.
- Clone the repository:
git clone https://github.com/AgoraIO-Community/conversational-ai-nextjs-client
cd conversational-ai-nextjs-client
- Install dependencies:
pnpm install
- Create a .env.local file in the root directory and add your environment variables:
cp .env.local.example .env.local
The following environment variables are required:
  - NEXT_PUBLIC_AGORA_APP_ID - Your Agora App ID
  - NEXT_AGORA_APP_CERTIFICATE - Your Agora App Certificate
  - NEXT_AGORA_CONVO_AI_BASE_URL - Agora Conversation AI Base URL
  - NEXT_AGORA_CUSTOMER_ID - Your Agora Customer ID
  - NEXT_AGORA_CUSTOMER_SECRET - Your Agora Customer Secret
  - NEXT_AGENT_UID - Agent UID (defaults to "Agent")
  - NEXT_LLM_URL - LLM API endpoint URL
  - NEXT_LLM_TOKEN - LLM API authentication token
  - NEXT_LLM_MODEL - LLM model to use (optional)
Choose one of the following TTS providers:
  - NEXT_TTS_VENDOR=microsoft
    - NEXT_MICROSOFT_TTS_KEY - Microsoft TTS API key
    - NEXT_MICROSOFT_TTS_REGION - Microsoft TTS region
    - NEXT_MICROSOFT_TTS_VOICE_NAME - Voice name (optional, defaults to 'en-US-AndrewMultilingualNeural')
    - NEXT_MICROSOFT_TTS_RATE - Speech rate (optional, defaults to 1.0)
    - NEXT_MICROSOFT_TTS_VOLUME - Volume (optional, defaults to 100.0)
  - NEXT_TTS_VENDOR=elevenlabs
    - NEXT_ELEVENLABS_API_KEY - ElevenLabs API key
    - NEXT_ELEVENLABS_VOICE_ID - ElevenLabs voice ID
    - NEXT_ELEVENLABS_MODEL_ID - Model ID (optional, defaults to 'eleven_flash_v2_5')
Input and output modalities can also be configured:
  - NEXT_INPUT_MODALITIES - Comma-separated list of input modalities (defaults to 'text')
  - NEXT_OUTPUT_MODALITIES - Comma-separated list of output modalities (defaults to 'text,audio')
- Run the development server:
pnpm dev
- Open your browser and navigate to http://localhost:3000 to see the application in action.
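The environment variables listed above can be checked once at startup, so that a missing key shows up as a clear message rather than a failed request later. The following is a minimal sketch and not code from this repository: `findMissingEnv`, `requiredTtsKeys`, and `REQUIRED_CORE` are hypothetical names, and the sketch assumes that only the variables listed without an "optional" note or a default are mandatory.

```typescript
// Hypothetical startup check; none of these names exist in the repository.
type Env = Record<string, string | undefined>;

// Variables the README lists without an "optional" note or a default value.
const REQUIRED_CORE: string[] = [
  "NEXT_PUBLIC_AGORA_APP_ID",
  "NEXT_AGORA_APP_CERTIFICATE",
  "NEXT_AGORA_CONVO_AI_BASE_URL",
  "NEXT_AGORA_CUSTOMER_ID",
  "NEXT_AGORA_CUSTOMER_SECRET",
  "NEXT_LLM_URL",
  "NEXT_LLM_TOKEN",
];

// Returns the keys a TTS vendor needs, or null for an unknown vendor.
function requiredTtsKeys(vendor: string): string[] | null {
  if (vendor === "microsoft") {
    return ["NEXT_MICROSOFT_TTS_KEY", "NEXT_MICROSOFT_TTS_REGION"];
  }
  if (vendor === "elevenlabs") {
    return ["NEXT_ELEVENLABS_API_KEY", "NEXT_ELEVENLABS_VOICE_ID"];
  }
  return null;
}

// Returns the names of required variables that are unset or blank.
function findMissingEnv(env: Env): string[] {
  const isSet = (name: string): boolean => (env[name] ?? "").trim() !== "";
  const missing = REQUIRED_CORE.filter((name) => !isSet(name));

  const ttsKeys = requiredTtsKeys((env["NEXT_TTS_VENDOR"] ?? "").trim());
  if (ttsKeys === null) {
    // Vendor is unset or not one of the two documented values.
    missing.push("NEXT_TTS_VENDOR");
  } else {
    for (const name of ttsKeys) {
      if (!isSet(name)) missing.push(name);
    }
  }
  return missing;
}
```

In server-side code this could be called as `findMissingEnv(process.env)`, logging or throwing when the returned list is not empty.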
This project is configured for quick deployments to Vercel.
Deploying to Vercel will:
- Clone the repository to your GitHub account
- Create a new project on Vercel
- Prompt you to fill in the required environment variables:
  - Required: Agora credentials (NEXT_PUBLIC_AGORA_APP_ID, NEXT_AGORA_APP_CERTIFICATE, etc.)
  - Required: LLM API key (NEXT_LLM_API_KEY) - OpenAI API key by default
  - Required: Either Microsoft TTS key (NEXT_MICROSOFT_TTS_KEY) or ElevenLabs API key (NEXT_ELEVENLABS_API_KEY)
  - Other variables have defaults if values are not provided
- Deploy the application automatically
- Microphone Toggle: Easy-to-use button to enable/disable your microphone
- Device Selection: Choose from multiple microphone inputs with the microphone selector dropdown
- Hot-Swap Support: Automatically detects when devices are plugged in/unplugged
- Audio Visualization: Real-time visual feedback showing microphone input levels
- Live Transcriptions: See what you say and the AI's responses in real-time as text
- Message Status Indicators: Visual feedback for in-progress, completed, and interrupted messages
- Smart Auto-Scroll: Automatically scrolls to new messages while preserving scroll position when reviewing history
- Mobile-Responsive Chat UI: Collapsible chat window that adapts to different screen sizes
- Desktop Auto-Open: Chat window automatically opens on first message (desktop only)
- Message Persistence: Full conversation history maintained throughout the session
- Custom LLM Integration: Connect your preferred LLM (OpenAI, Anthropic, etc.)
- Multiple TTS Providers: Support for Microsoft Azure TTS and ElevenLabs
- Voice Activity Detection: Smart VAD settings for natural conversation flow
- Token Management: Automatic token renewal to prevent disconnections
- Agent Control: Start, stop, and restart AI agent during the conversation
- Audio Visualizations: Animated frequency bars for both user and AI audio
- Connection Status: Real-time connection indicators
- Error Handling: Graceful error messages and recovery options
- Accessibility: ARIA labels and keyboard-friendly controls
Male voices (Microsoft Azure TTS):
- en-US-AndrewMultilingualNeural (default)
- en-US-ChristopherNeural (casual, friendly)
- en-US-GuyNeural (professional)
- en-US-JasonNeural (clear, energetic)
- en-US-TonyNeural (enthusiastic)
Female voices (Microsoft Azure TTS):
- en-US-JennyNeural (assistant-like)
- en-US-AriaNeural (professional)
- en-US-EmmaNeural (friendly)
- en-US-SaraNeural (warm)
Try Microsoft Azure voices: https://speech.microsoft.com/portal/voicegallery
Try ElevenLabs voices: https://elevenlabs.io/app/voice-lab
The application is built with a modular component architecture:
- LandingPage.tsx: Entry point that initializes the Agora client and manages the conversation lifecycle
- ConversationComponent.tsx: Main conversation container handling RTC connections, agent management, and audio/text streaming
- MicrophoneButton.tsx: Interactive button with built-in audio visualization for microphone control
- MicrophoneSelector.tsx: Dropdown component for selecting audio input devices with hot-swap support
- ConvoTextStream.tsx: Real-time text transcription display with smart scrolling and message management
- AudioVisualizer.tsx: Visual feedback component showing audio frequency data for remote users
- lib/message.ts: MessageEngine for processing and managing conversation transcriptions
- lib/utils.ts: Helper functions including markdown rendering for chat messages
- types/conversation.ts: TypeScript type definitions for conversation data structures
Contributions are welcome! Please feel free to submit a Pull Request.
The application provides the following API endpoints:
- Endpoint: /api/generate-agora-token
- Method: GET
- Query Parameters:
  - uid (optional) - User ID (defaults to 0)
  - channel (optional) - Channel name (auto-generated if not provided)
- Response: Returns token, uid, and channel information
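A client-side call to the token endpoint described above might look like the following sketch. It is an illustration rather than the repository's own code: `buildTokenUrl`, `fetchAgoraToken`, and `TokenResponse` are hypothetical names, and the response field names and types are assumptions, since the description above only says that token, uid, and channel information are returned.

```typescript
// Assumed response shape; the field names and types are guesses based on
// the endpoint description, not taken from the repository.
interface TokenResponse {
  token: string;
  uid: string | number;
  channel: string;
}

interface TokenParams {
  uid?: number;      // optional; the server defaults to 0
  channel?: string;  // optional; the server generates one if omitted
}

// Builds the request URL, adding only the query parameters that were given.
function buildTokenUrl(params: TokenParams = {}): string {
  const query = new URLSearchParams();
  if (params.uid !== undefined) query.set("uid", String(params.uid));
  if (params.channel !== undefined) query.set("channel", params.channel);
  const qs = query.toString();
  const path = "/api/generate-agora-token";
  return qs === "" ? path : `${path}?${qs}`;
}

// Requests a token and fails loudly on a non-2xx status.
async function fetchAgoraToken(params: TokenParams = {}): Promise<TokenResponse> {
  const response = await fetch(buildTokenUrl(params));
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  return (await response.json()) as TokenResponse;
}
```

Calling `fetchAgoraToken()` with no arguments relies on the server-side defaults for both parameters.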
- Endpoint: /api/invite-agent
- Method: POST
- Body:
{
  requester_id: string;
  channel_name: string;
  input_modalities?: string[];
  output_modalities?: string[];
}
- Endpoint: /api/stop-conversation
- Method: POST
- Body:
{
  agent_id: string;
}
The text streaming feature uses Agora's MessageEngine to handle real-time transcriptions:
- MessageEngine (lib/message.ts) processes incoming stream messages from the Agora data channel
- ConversationComponent manages message state and updates, separating in-progress messages from completed ones
- ConvoTextStream renders the UI with smart scrolling and visual indicators for message status
Message states include:
- IN_PROGRESS: Currently being transcribed/streamed
- END: Successfully completed message
- INTERRUPTED: Message cut off by user or system
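The split between in-progress and completed messages can be pictured as a small state update. The sketch below is one possible way to do it, not the actual logic in ConversationComponent or lib/message.ts: `TranscriptMessage`, `ConversationState`, and `applyMessageUpdate` are hypothetical names, and the `id` field used to match updates to a message is an assumption.

```typescript
// The three state names come from the list above; everything else is assumed.
type MessageStatus = "IN_PROGRESS" | "END" | "INTERRUPTED";

interface TranscriptMessage {
  id: string;   // hypothetical identifier shared by all updates of one message
  uid: string;  // who is speaking (user or agent)
  text: string;
  status: MessageStatus;
}

interface ConversationState {
  inProgress: TranscriptMessage | null; // message still being streamed
  completed: TranscriptMessage[];       // finished or interrupted messages
}

// Returns a new state; the input state is never mutated.
function applyMessageUpdate(
  state: ConversationState,
  update: TranscriptMessage
): ConversationState {
  if (update.status === "IN_PROGRESS") {
    // A streaming update replaces the current partial message.
    return { inProgress: update, completed: state.completed };
  }
  // END or INTERRUPTED: move the message into history and clear the
  // in-progress slot if it held the same message.
  const current = state.inProgress;
  const stillStreaming = current !== null && current.id !== update.id ? current : null;
  const history = state.completed.filter((message) => message.id !== update.id);
  return { inProgress: stillStreaming, completed: [...history, update] };
}
```

Keeping the partial message separate from the history means the UI can re-render only the streaming bubble on each update, while interrupted messages still appear in the history with their own status.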
The MicrophoneSelector component provides:
- Device enumeration via AgoraRTC.getMicrophones()
- Hot-swap detection through AgoraRTC.onMicrophoneChanged callbacks
- Seamless switching using localMicrophoneTrack.setDevice(deviceId)
- Automatic fallback when the current device is disconnected
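The fallback decision can be isolated from the SDK calls as a pure function, which makes it easy to test. The sketch below is illustrative and not the component's actual code: `MicDevice`, `MicChange`, and `chooseDeviceAfterChange` are hypothetical names, the "ACTIVE"/"INACTIVE" values are an assumption about what the hot-swap callback reports, and keeping the current selection when a new device is plugged in is a design choice made only for this example.

```typescript
// Minimal device shape; browsers expose these fields on MediaDeviceInfo.
interface MicDevice {
  deviceId: string;
  label: string;
}

// Assumed shape of a hot-swap event (plugged in = ACTIVE, unplugged = INACTIVE).
interface MicChange {
  deviceId: string;
  state: "ACTIVE" | "INACTIVE";
}

// Decides which device should be selected after a hot-swap event.
// Returns null when no microphone is left.
function chooseDeviceAfterChange(
  devices: MicDevice[],
  currentId: string | null,
  change: MicChange
): string | null {
  if (change.state === "ACTIVE") {
    // Keep the user's selection; adopt the new device only if none is selected.
    return currentId ?? change.deviceId;
  }
  if (change.deviceId !== currentId) {
    // Some other device was unplugged; nothing to do.
    return currentId;
  }
  // The selected device disappeared: fall back to the first remaining one.
  const remaining = devices.filter((device) => device.deviceId !== change.deviceId);
  return remaining.length > 0 ? remaining[0].deviceId : null;
}
```

In the component, the device list would come from AgoraRTC.getMicrophones(), and a changed result would be passed to localMicrophoneTrack.setDevice(deviceId).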
Both the MicrophoneButton and AudioVisualizer components use the Web Audio API:
- Creates an AudioContext and AnalyserNode
- Connects to the Agora audio track's MediaStream
- Uses getByteFrequencyData() to extract frequency information
- Animates visual bars using requestAnimationFrame for smooth 60fps updates
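The last two steps above can be sketched as a reduction from frequency bins to bar heights, driven by an animation loop. This is an assumed approach for illustration, not the code in MicrophoneButton or AudioVisualizer: `toBarLevels`, `startLevelLoop`, and `AnalyserLike` are hypothetical names, and averaging equal-sized groups of bins is only one of several reasonable ways to map bins to bars.

```typescript
// Averages equal-sized groups of frequency bins into barCount levels in [0, 1].
// Bins left over after the last full group are ignored.
function toBarLevels(frequencyData: Uint8Array, barCount: number): number[] {
  const levels: number[] = [];
  if (barCount <= 0) return levels;
  const binsPerBar = Math.floor(frequencyData.length / barCount);
  if (binsPerBar === 0) return levels;
  for (let bar = 0; bar < barCount; bar++) {
    let sum = 0;
    for (let i = 0; i < binsPerBar; i++) {
      sum += frequencyData[bar * binsPerBar + i];
    }
    // Byte frequency data ranges from 0 to 255.
    levels.push(sum / binsPerBar / 255);
  }
  return levels;
}

// The subset of AnalyserNode this sketch relies on.
interface AnalyserLike {
  frequencyBinCount: number;
  getByteFrequencyData(array: Uint8Array): void;
}

// Browser-only: reads the analyser once per animation frame and reports
// bar levels. Returns a function that stops the loop.
function startLevelLoop(
  analyser: AnalyserLike,
  barCount: number,
  onLevels: (levels: number[]) => void
): () => void {
  const data = new Uint8Array(analyser.frequencyBinCount);
  let frameId = 0;
  const tick = (): void => {
    analyser.getByteFrequencyData(data);
    onLevels(toBarLevels(data, barCount));
    frameId = requestAnimationFrame(tick);
  };
  frameId = requestAnimationFrame(tick);
  return () => cancelAnimationFrame(frameId);
}
```

In a React component, the function returned by `startLevelLoop` would typically be called from an effect cleanup so the loop stops when the component unmounts.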
