A modern web application that transforms spoken audio into synchronized text transcriptions with AI-generated visual descriptions.
- Real-time Audio Transcription: Convert spoken words to text using OpenAI's Whisper model
- Synchronized Text Display: View transcriptions that animate in sync with audio playback
- AI Visual Descriptions: Experience generated visual descriptions that represent the content
- Streaming Architecture: Process audio in chunks for a responsive experience
- Intuitive Interface: Simple and elegant design for easy interaction
- Python 3.10+ for the backend
- Node.js 18+ for the frontend
- Whisper for speech-to-text
- Ollama with the LLaMA3 model for visual descriptions
- Navigate to the backend directory: `cd back`
- Create and activate a virtual environment:
  `python -m venv venv`
  `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install the required packages: `pip install django django-cors-headers openai-whisper ffmpeg-python`
- Start the Django server: `python manage.py runserver`
- Navigate to the frontend directory: `cd front`
- Install dependencies: `npm install`
- Start the development server: `npm run dev`
- Install Ollama
- Pull the LLaMA3 model: `ollama pull llama3`
- Ensure the Ollama service is running at http://localhost:11434
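To confirm the service is reachable before starting the backend, you can query Ollama's local REST API. This is a minimal sketch, assuming the default Ollama port and that `ollama pull llama3` has already been run; it is not part of the project code:

```python
import requests

# Ask the local Ollama service which models it has available.
# Assumes Ollama's default port (11434).
response = requests.get("http://localhost:11434/api/tags", timeout=5)
response.raise_for_status()

models = [m["name"] for m in response.json().get("models", [])]
if any(name.startswith("llama3") for name in models):
    print("Ollama is running and llama3 is available:", models)
else:
    print("llama3 not found; run `ollama pull llama3` first.")
```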
- Open the application in your web browser (typically at http://localhost:5173)
- Click on "Choose Audio File" to upload an audio recording
- Wait for the transcription process to begin
- Watch as the text appears synchronized with the audio playback
- Read the AI-generated visual descriptions that accompany each part of the content
The application is structured with:
- Frontend: React with Vite for a fast and responsive UI
- Backend: Django for robust server-side processing
- Speech-to-Text: OpenAI's Whisper model for high-quality transcription
- Visual Descriptions: LLaMA3 through Ollama for generating descriptive imagery
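For reference, the synchronized display depends on the per-segment timestamps Whisper produces. The sketch below shows roughly how the backend might obtain them; the model size and file path are placeholders, not taken from the project code:

```python
import whisper

# Load a Whisper model; "base" is a placeholder size, the project may use another.
model = whisper.load_model("base")

# Transcribe an audio file; Whisper returns the full text plus timestamped segments.
result = model.transcribe("recording.mp3")

for segment in result["segments"]:
    # Each segment carries start/end times (in seconds) and its text,
    # which is what lets the frontend sync the words to audio playback.
    print(f'{segment["start"]:6.2f} -> {segment["end"]:6.2f}  {segment["text"].strip()}')
```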
- Audio is uploaded from the client to the Django backend
- The backend processes the audio in chunks using Whisper
- Transcription results are streamed back to the frontend
- Each chunk is sent to Ollama for visual description generation
- The frontend synchronizes the display with audio playback
- Visual descriptions update as the audio progresses
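The project's actual view code is not reproduced here, but the flow above could look roughly like this on the Django side. Treat it as a simplified sketch under assumptions: it streams per-segment results rather than true chunked decoding, and the helper names (`describe_segment`, `save_upload`), prompt wording, and model size are illustrative, not the project's real identifiers:

```python
import json
import requests
import whisper
from django.http import StreamingHttpResponse

model = whisper.load_model("base")  # placeholder model size

def describe_segment(text):
    """Ask the local Ollama service for a short visual description of one chunk."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": f"Describe a visual scene that represents: {text}",
            "stream": False,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def transcribe_view(request):
    """Transcribe the uploaded audio and stream results back chunk by chunk."""
    audio_path = save_upload(request.FILES["audio"])  # hypothetical helper

    def event_stream():
        result = model.transcribe(audio_path)
        for segment in result["segments"]:
            yield json.dumps({
                "start": segment["start"],
                "end": segment["end"],
                "text": segment["text"],
                "description": describe_segment(segment["text"]),
            }) + "\n"

    # Newline-delimited JSON lets the frontend render each chunk as it arrives.
    return StreamingHttpResponse(event_stream(), content_type="application/x-ndjson")
```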
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Multiple language support
- Custom visual styling options
- User accounts for saving and sharing transcriptions
- Integration with Stable Diffusion for actual image generation
- Mobile application support