This project, "Emotion Feedback," aims to recommend conversation topics for video chats based on real-time emotion analysis.
- 이원재
- 이준범
- 권영우
- 심재호
"Emotion Feedback" is a system designed to alleviate awkward atmospheres and lack of conversation topics during video chats by recommending topics based on real-time emotion analysis.
- Project Overview
- Problem Statement
- System Features
- Data Sources
- Models and Outputs
- Performance Metrics
- Recommendation Types
- System Architecture
- Evaluation and Future Plans
- Team and Contributions
- License
- Awkward atmosphere
- Lack of conversation topics
- Emotion-based conversation topic recommendation
- Real-time Emotion Analysis: Analyze and store the emotions of the participants in real-time.
- Emotion-based Topic Recommendation: Recommend conversation topics based on the stored emotions and conversation content.
- Understanding the Partner's Favorability: Analyze the conversation partner's favorability and display it as a graph after the conversation ends.
- FER-2013 (Kaggle): Dataset labeled with seven types of emotions (28,709 samples).
- Self-Collected Data (Google): Dataset labeled from 0 to 9 (597 samples).
- Multimodal Video Data (AI Hub): Labeled data showing conversation and actions in specific situations, including positive/negative emotions, intensity, and dialogues.
- Video Model: Outputs the probability of favorability based on visual data.
- Audio/Text Model: Outputs the probability of favorability based on audio and text data.
- Combines the outputs from the video and audio/text models:
Final Favorability = (0.43 * Video Model Score) + (0.57 * Audio/Text Model Score)
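The fusion step above can be expressed directly in code. The sketch below assumes both model outputs are probabilities in [0, 1]; the function and variable names are illustrative and not taken from the project's codebase.

```python
def combine_favorability(video_score: float, audio_text_score: float) -> float:
    """Late fusion of the two model outputs with the weights stated above.

    Both inputs are favorability probabilities in [0, 1]; the 0.43 / 0.57
    weights are the ones given in this README.
    """
    return 0.43 * video_score + 0.57 * audio_text_score


# Example: video model outputs 0.70, audio/text model outputs 0.80
print(combine_favorability(0.70, 0.80))  # ~0.757
```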
- Profile-Based Recommendation
- Favorability and Conversation Content-Based Recommendation
- Random Topic Recommendation
- Video Call: Users start a video call through the system.
- Data Collection and Analysis:
- Image Analysis: Analyze the user's facial expressions captured from the video.
- Text Analysis: Convert the conversation into text and analyze the sentiment.
- Audio Analysis: Analyze the tone and speed of the user's voice to determine the emotional state.
- Preprocessing images to fit CNN models.
- Converting and preprocessing audio and text data for analysis.
- Combining data from text, image, and audio analysis to evaluate the favorability between users.
- Using the stored data and the GPT API to recommend conversation topics in real time (see the sketch after this list).
- Storing analyzed text data and favorability scores in a database.
- Providing feedback to users based on stored data after the conversation ends.
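The topic-recommendation step ("stored data + GPT API") is not shown in this README; the sketch below is a minimal illustration of such a call using the OpenAI Python client directly (the project lists LangChain in its stack, but the actual chain and prompt engineering are not reproduced here). The model name, prompt wording, and function signature are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def recommend_topic(favorability: float, recent_utterances: list[str]) -> str:
    """Ask GPT for a conversation topic based on the stored favorability
    score and the most recent transcribed utterances.

    The prompt wording here is illustrative, not the project's actual prompt.
    """
    prompt = (
        f"The current favorability score between two video-chat users is "
        f"{favorability:.2f} (0 = negative, 1 = positive).\n"
        "Recent conversation:\n" + "\n".join(recent_utterances) + "\n"
        "Suggest one natural follow-up conversation topic in a single sentence."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```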
- Frontend: React, Figma, WebRTC, MediaStream API
- Backend: FastAPI, SpringBoot, WebSocket, PostgreSQL, AWS
- Modeling: CNN models for image, audio, and text analysis; LangChain for natural language processing
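The README does not include the model code; below is a minimal sketch of the kind of CNN image-emotion classifier the stack describes, assuming PyTorch and FER-2013's 48x48 grayscale images with seven emotion classes. Layer sizes are illustrative and may differ from the project's actual architecture.

```python
import torch
import torch.nn as nn


class EmotionCNN(nn.Module):
    """Small CNN for 48x48 grayscale FER-2013 frames, 7 emotion classes."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 48x48 -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 24x24 -> 12x12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


# Sanity check with a dummy batch of four 48x48 grayscale frames
logits = EmotionCNN()(torch.randn(4, 1, 48, 48))
print(logits.shape)  # torch.Size([4, 7])
```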
Although the emotion-detection models already achieve solid accuracy, continuous improvements are planned, including expanding the dataset, refining the models, and exploring additional applications such as enhancing conversational flow and user engagement.
- STT Delay: Mitigated by adding a 3-second delay to accommodate processing time.
- Model Accuracy: Improved by modifying preprocessing steps and model parameters, achieving 78% accuracy for image models and 82% for audio/text models.
- Real-time Processing: Achieved using asynchronous processing with FastAPI to handle model server communications (see the sketch below).
- Increase the amount of quality data to improve model performance.
- Explore further applications, such as enhancing conversational flow and user engagement.
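A minimal sketch of the asynchronous FastAPI pattern mentioned under Real-time Processing: both model servers are queried concurrently and their scores are fused with the weights from the Models and Outputs section. The endpoint path, server URLs, and response fields are assumptions, not the project's actual API.

```python
import asyncio

import httpx
from fastapi import FastAPI

app = FastAPI()

# Illustrative model-server URLs; the real deployment addresses are not documented here.
VIDEO_MODEL_URL = "http://video-model:8001/predict"
AUDIO_TEXT_MODEL_URL = "http://audio-text-model:8002/predict"


@app.post("/favorability")
async def favorability(payload: dict) -> dict:
    """Query both model servers concurrently and fuse their scores."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        video_resp, audio_text_resp = await asyncio.gather(
            client.post(VIDEO_MODEL_URL, json=payload),
            client.post(AUDIO_TEXT_MODEL_URL, json=payload),
        )
    video_score = video_resp.json()["score"]
    audio_text_score = audio_text_resp.json()["score"]
    # Same late-fusion weights as in the Models and Outputs section
    return {"favorability": 0.43 * video_score + 0.57 * audio_text_score}
```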
- 이원재: Model Server API Development and Deployment, Prompt Engineering
- 이준범: Backend Server API Development, Signaling Server Development
- 권영우: Frontend Development
- 심재호: Modeling
This project is licensed under the MIT License.