A simple finance application that allows users to add expenses and income using voice commands. The app uses Google's Gemini AI to parse voice input and extract transaction details.
- Voice Input: Record your financial transactions by speaking naturally
- AI Processing: Gemini AI extracts transaction details from voice input
- Expense & Income Tracking: Add both expenses and income to your financial records
- Simple UI: Clean, modern interface built with Next.js and Tailwind CSS
- Enhanced Voice Input Dialog: A reusable dialog component that provides voice input functionality across different pages
- Context-Aware Voice Processing: The system now recognizes the context (income/expense) and enhances voice commands accordingly
- Improved AI Integration: Updated to use Gemini 1.5 Flash for better recognition and response times
- Consistent UI: Standardized voice input interface across all transaction pages
- Frontend: Next.js, React, Tailwind CSS, shadcn/ui
- Backend: Express.js, MongoDB
- AI: Google Gemini API for natural language processing
- Voice Recognition: Web Speech API (built into modern browsers)
- Node.js (v16 or higher)
- MongoDB (running locally or MongoDB Atlas)
- Google Gemini API key
- Clone the repository
git clone <repository-url>
cd finance-voice-app
- Install dependencies
npm install --legacy-peer-deps
- Create a .env file in the server directory with the following:
PORT=5000
MONGODB_URI=mongodb://localhost:27017/finance_app
GEMINI_API_KEY=your_gemini_api_key_here
- Start the backend server
npm run server
- In a separate terminal, start the Next.js frontend
npm run dev
- Run both frontend and backend concurrently (recommended)
npm run dev:5001
- Open your browser and navigate to
http://localhost:3000
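The three .env variables from the setup steps above are easy to mistype or forget, so it can help to validate them once when the server starts. The following is a minimal sketch, not the project's actual startup code: `ServerConfig` and `loadConfig` are hypothetical names, and defaulting `PORT` to 5000 when it is unset is an assumption based on the sample file.

```typescript
// Hypothetical config shape; the fields mirror the .env keys shown above.
interface ServerConfig {
  port: number;
  mongodbUri: string;
  geminiApiKey: string;
}

// Reads and validates the three variables from an env-like object
// (for example process.env on the Express server).
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Assumption: fall back to 5000, the value used in the sample .env file.
  const port = Number(env.PORT ?? "5000");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`PORT must be a valid port number, got "${env.PORT}"`);
  }

  const mongodbUri = env.MONGODB_URI;
  if (!mongodbUri) {
    throw new Error("MONGODB_URI is not set");
  }

  // Reject the placeholder value so a forgotten key fails at startup.
  const geminiApiKey = env.GEMINI_API_KEY;
  if (!geminiApiKey || geminiApiKey === "your_gemini_api_key_here") {
    throw new Error("GEMINI_API_KEY is not set to a real key");
  }

  return { port, mongodbUri, geminiApiKey };
}
```

On the server this would typically be called once as `loadConfig(process.env)` before connecting to MongoDB, so that a missing key stops the server at startup instead of failing on the first voice request.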
You can add transactions using voice input from multiple pages:
- Dashboard page: Use the Voice Transaction button in the main dashboard
- Income page: Click the "Add via Speech" button
- Expenses page: Click the "Add via Speech" button
- Click the "Add via Speech" button on any transaction page
- Click "Start Recording" and speak about a financial transaction
- For expenses: "I spent $45 on groceries at Trader Joe's yesterday"
- For income: "Got paid $2000 as salary on Friday"
- Click "Stop Recording" when finished
- Review the transcription
- Click "Create Transaction from Voice" to process
- The AI extracts the transaction details, and the app saves the transaction to the database
The application uses Google's Gemini 1.5 Flash model to process voice commands. The system:
- Captures your voice input through your browser
- Sends the text to the Gemini AI
- Extracts key transaction details (amount, category, description, date, etc.)
- Creates a properly formatted transaction in the database
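The two steps around the Gemini call, building the prompt and turning the model's reply into a transaction, can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: `buildPrompt`, `parseModelReply`, the `ParsedTransaction` shape, and the prompt wording are all hypothetical, and the network call to Gemini itself is left out.

```typescript
type TransactionType = "income" | "expense";

// Hypothetical shape for the details listed above (amount, category, description, date).
interface ParsedTransaction {
  type: TransactionType;
  amount: number;
  category: string;
  description: string;
  date: string | null; // ISO date string, or null if the model gave none
}

// Builds the text sent to the model. The optional context reflects the page
// (income or expense) the user spoke from; `today` lets the model resolve
// relative dates such as "yesterday".
function buildPrompt(transcript: string, today: string, context?: TransactionType): string {
  const lines = [
    "Extract one financial transaction from the text below.",
    'Reply with JSON only, using the keys "type" ("income" or "expense"),',
    '"amount" (number), "category", "description" and "date" (YYYY-MM-DD).',
    `Today is ${today}.`,
  ];
  if (context) {
    lines.push(`The user is on the ${context} page; prefer type "${context}".`);
  }
  lines.push(`Text: ${transcript}`);
  return lines.join("\n");
}

// Parses the model's reply defensively: the JSON may be surrounded by extra
// text, so only the outermost {...} is kept. Returns null if it is unusable.
function parseModelReply(reply: string): ParsedTransaction | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let data: unknown;
  try {
    data = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;

  let type: TransactionType;
  if (d.type === "income") type = "income";
  else if (d.type === "expense") type = "expense";
  else return null;

  // Accept the amount as a number or a numeric string such as "45".
  const amount =
    typeof d.amount === "number" || typeof d.amount === "string" ? Number(d.amount) : NaN;
  if (!Number.isFinite(amount) || amount <= 0) return null;

  return {
    type,
    amount,
    category: typeof d.category === "string" ? d.category : "other",
    description: typeof d.description === "string" ? d.description : "",
    date: typeof d.date === "string" ? d.date : null,
  };
}
```

Returning `null` instead of throwing gives the caller a single signal for "the AI result is unusable", which is a natural point to hand over to a simpler parser.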
If the AI service is temporarily unavailable, the system falls back to rule-based parsing to extract the basic transaction details.
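A rule-based fallback of this kind could look like the sketch below. It is not the project's actual parser: `parseWithRules`, `BasicTransaction`, and the income keyword list are assumptions made for illustration, and only the amount, type, and raw description are extracted.

```typescript
// Hypothetical minimal result; category and date are left to the AI path.
interface BasicTransaction {
  type: "income" | "expense";
  amount: number;
  description: string;
}

// Assumed phrases that suggest income; anything else is treated as an expense.
const INCOME_PHRASES = ["got paid", "salary", "received", "earned", "income"];

function parseWithRules(
  transcript: string,
  context?: "income" | "expense",
): BasicTransaction | null {
  // First number in the text: optional "$", optional thousands separators,
  // optional cents (for example "$45", "2000", "1,250.50").
  const match = transcript.match(/\$?\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d{1,2})?/);
  if (!match) return null;

  const amount = Number((match[1] ?? "").replace(/,/g, "") + (match[2] ?? ""));
  if (!Number.isFinite(amount) || amount <= 0) return null;

  // The page context (income or expense), when known, wins over keyword guessing.
  const text = transcript.toLowerCase();
  const looksLikeIncome = INCOME_PHRASES.some((phrase) => text.includes(phrase));
  const type = context ?? (looksLikeIncome ? "income" : "expense");

  return { type, amount, description: transcript.trim() };
}
```

With the two sample phrases from the usage steps, "I spent $45 on groceries at Trader Joe's yesterday" yields an expense of 45 and "Got paid $2000 as salary on Friday" yields income of 2000. Keyword matching is deliberately crude; it is meant to keep transactions flowing during an outage, not to match the AI's accuracy.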
This is a simple demonstration app built as a university project. It includes:
- Basic authentication (placeholder)
- Local MongoDB storage
- Simple AI integration
- Voice recognition via browser APIs
For a production application, you would want to add:
- Robust authentication
- Data validation
- Error handling
- Production database setup
- Advanced security measures