Audio Transcriber is a web application built with Next.js that lets users transcribe audio files directly in the browser. It uses Transformers.js and Hugging Face models to perform speech-to-text conversion on the client, so audio does not have to be uploaded to a server. The result is a fast, privacy-focused way to transcribe audio.
- Client-side audio transcription using Transformers.js
- Utilizes Hugging Face's pre-trained speech recognition models
- Support for various audio file formats
- Real-time audio recording and transcription
- Export transcriptions as TXT or JSON
- Next.js - React framework for building the web application
- React - JavaScript library for building user interfaces
- TypeScript - Typed superset of JavaScript
- Tailwind CSS - Utility-first CSS framework
- Transformers.js - JavaScript library for state-of-the-art Machine Learning
- Hugging Face Models - Pre-trained models for various AI tasks, including speech recognition
- Web Workers API - For running transcription in background threads
- Web Audio API - For audio processing and recording
- Node.js (version 14 or later)
- npm
- Clone the repository:
    git clone https://github.com/subigya-js/audio-transcriber.git
    cd audio-transcriber
- Install dependencies:
    npm install
- Run the development server:
    npm run dev
- Open http://localhost:3000 with your browser to see the application.
- Upload an audio file or record audio directly in the browser.
- Click the "Transcribe" button to start the transcription process.
- The application will use Transformers.js to load the appropriate Hugging Face model for speech recognition.
- View the transcription results in real-time as they are processed.
- Once transcription is complete, you can export the results as TXT or JSON.
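The TXT and JSON export from the last step could be implemented along these lines. This is a minimal sketch, not the app's actual code: the `Transcript` and `TranscriptChunk` types and the `formatTime`, `toTxt`, and `toJson` helpers are hypothetical names, and the chunk shape (text plus a `[start, end]` timestamp in seconds) is an assumption about what the transcription step produces.

```typescript
// Hypothetical shape of one transcribed segment; the app's real types may differ.
interface TranscriptChunk {
  text: string;
  timestamp: [number, number | null]; // [start, end] in seconds
}

interface Transcript {
  text: string;
  chunks: TranscriptChunk[];
}

// Left-pad a number to two digits, e.g. 5 -> "05".
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Format a time in seconds as mm:ss for the plain-text export.
function formatTime(seconds: number): string {
  const total = Math.floor(seconds);
  return `${pad2(Math.floor(total / 60))}:${pad2(total % 60)}`;
}

// TXT export: one line per chunk, prefixed with the chunk's start time.
function toTxt(transcript: Transcript): string {
  return transcript.chunks
    .map((c) => `[${formatTime(c.timestamp[0])}] ${c.text.trim()}`)
    .join("\n");
}

// JSON export: pretty-printed so the timestamps stay machine-readable.
function toJson(transcript: Transcript): string {
  return JSON.stringify(transcript, null, 2);
}
```

In the browser, either string could then be wrapped in a `Blob` and offered as a download through a temporary object URL.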
- src/app/ - Next.js app router and page components
- src/components/ - React components used throughout the application
- src/hooks/ - Custom React hooks, including the transcription logic
- src/utils/ - Utility functions and helpers
- public/ - Static assets and files
The Audio Transcriber uses Transformers.js to load and run Hugging Face's speech recognition models directly in the browser. This approach allows for:
- Privacy: All processing happens on the client-side, so audio data never leaves the user's device.
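- Speed: Audio files are not uploaded to a server, so there is no upload or network round-trip delay before transcription starts. Processing time itself depends on the user's device and the size of the model.
- Offline Capability: Once the model has been downloaded, the app can transcribe without an internet connection.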
The transcription process is handled by a Web Worker, ensuring that the main thread remains responsive during computationally intensive tasks.
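The hand-off between the page and the Web Worker can be sketched as a small message protocol. This is an illustration under stated assumptions rather than the project's implementation: the `WorkerRequest` and `WorkerResponse` message shapes, the `Transcriber` type, and the `handleRequest` function are all hypothetical. The transcriber is passed in as a parameter; in the app it would presumably wrap a Transformers.js speech recognition pipeline.

```typescript
// Hypothetical messages exchanged between the page and the worker.
type WorkerRequest = { type: "transcribe"; audio: Float32Array };
type WorkerResponse =
  | { type: "status"; message: string }
  | { type: "result"; text: string }
  | { type: "error"; message: string };

// Any async function that turns audio samples into text.
// Assumption: in the app this would call into Transformers.js.
type Transcriber = (audio: Float32Array) => Promise<string>;

// Core of the worker: handle one request and report back through `post`.
// It avoids worker globals so it can also run outside a browser.
async function handleRequest(
  request: WorkerRequest,
  transcribe: Transcriber,
  post: (response: WorkerResponse) => void
): Promise<void> {
  post({ type: "status", message: "transcribing" });
  try {
    const text = await transcribe(request.audio);
    post({ type: "result", text });
  } catch (err) {
    // Report failures to the page instead of letting the worker die silently.
    const message = err instanceof Error ? err.message : String(err);
    post({ type: "error", message });
  }
}

// Inside an actual worker file, the wiring would look roughly like:
//   self.onmessage = (e) =>
//     handleRequest(e.data, transcribe, (r) => self.postMessage(r));
```

Because the heavy call is awaited inside the worker, the page only receives small status and result messages, and its main thread stays free to handle user input.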
Contributions are welcome! Please feel free to submit a Pull Request.
- Hugging Face for providing state-of-the-art NLP models and tools
- Transformers.js for enabling client-side machine learning in JavaScript
- Next.js for the amazing React framework
If you have any questions or feedback, please open an issue in the GitHub repository.