
immagiov4/FieldWise_Backend


FieldWise AI Backend

Overview

Backend server built on Genkit that provides AI-driven conversation and transcription services. It uses OpenAI's GPT-4o for conversation, Whisper for audio transcription, and Google Cloud Text-to-Speech (TTS) for speech synthesis. Built with Express.js, it exposes routes for AI conversations, audio transcription, and TTS conversion.

Note: Currently, this project only uses the OpenAI-powered features (GPT-4o conversations and Whisper transcription). Google Cloud TTS integration and Firebase authentication are available but not in active use.

Features

  • AI Conversations

    • Context-aware, human-like responses using GPT-4o
    • Multiple language support
    • Customizable conversation scripts
  • Audio Transcription

    • High accuracy with OpenAI's Whisper
    • Handles common audio formats such as MP3 and WAV (see POST /ai/transcribe for the full list)
  • Text-to-Speech

    • Converts text to natural-sounding speech with Google Cloud TTS
    • Supports multiple languages and voices
  • Authentication

    • Firebase authentication secures the endpoints (can be bypassed by setting BYPASS_AUTH=true)
    • Easy integration for user management
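The README does not show how the BYPASS_AUTH switch is wired into the request pipeline. The sketch below is one possible shape, assuming an Express-style `(req, res, next)` middleware and a Firebase ID token sent as `Authorization: Bearer <token>`. `createAuthMiddleware` and the injected `verifyToken` function are hypothetical names, not part of the repository.

```javascript
// Hypothetical auth middleware factory; NOT the project's actual implementation.
// env: an object like process.env.
// verifyToken: (idToken) => decoded claims, or a Promise of them (assumed injectable).
function createAuthMiddleware(env, verifyToken) {
  // Assumption: only the exact string "true" enables the bypass.
  const bypass = env.BYPASS_AUTH === "true";

  return function authMiddleware(req, res, next) {
    if (bypass) {
      // Auth disabled (e.g. local development): let every request through.
      return next();
    }

    // Assumed header format: "Authorization: Bearer <Firebase ID token>".
    const header = (req.headers && req.headers.authorization) || "";
    const match = /^Bearer\s+(.+)$/i.exec(header);
    if (!match) {
      res.status(401).json({ error: "Missing bearer token" });
      return;
    }

    // verifyToken may be sync or async; normalize it to a Promise.
    Promise.resolve()
      .then(() => verifyToken(match[1]))
      .then(
        (decoded) => {
          req.user = decoded; // expose the verified claims to route handlers
          next();
        },
        () => {
          res.status(401).json({ error: "Invalid token" });
        }
      );
  };
}
```

In a real setup, `verifyToken` could wrap Firebase Admin's `admin.auth().verifyIdToken(idToken)`, and the middleware could be mounted with `app.use("/ai", createAuthMiddleware(process.env, (t) => admin.auth().verifyIdToken(t)))`. Injecting the verifier keeps the bypass logic testable without Firebase credentials.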

Installation

  1. Clone the repository:
    git clone <repository-url>
    cd <repository-directory>
  2. Install dependencies:
    npm install
  3. Set up environment variables by creating a .env file:
    OPENAI_API_KEY=your_openai_api_key
    GOOGLE_API_KEY=your_google_api_key
    GOOGLE_CLOUD_CREDENTIALS=./secret_keys/google_cloud_key.json
    FIREBASE_SERVICE_ACCOUNT_KEY=./secret_keys/firebase_key.json
    PORT=4000
    BYPASS_AUTH=true
  4. Add your Firebase service account key to ./secret_keys/firebase_key.json.
  5. Add your Google Cloud service account key to ./secret_keys/google_cloud_key.json.
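To catch a missing or malformed variable at startup rather than on the first request, the values can be validated in one place. The following is a hypothetical sketch (`loadConfig` is not part of the repository) that takes an env object such as `process.env`. Defaulting `PORT` to 4000, treating only the literal string `"true"` as enabling `BYPASS_AUTH`, and treating the Google/Firebase paths as optional are assumptions based on the note that those integrations are not in active use.

```javascript
// Hypothetical config loader; variable names follow the .env example above.
function loadConfig(env) {
  if (!env.OPENAI_API_KEY) {
    // Conversation and transcription both depend on OpenAI.
    throw new Error("OPENAI_API_KEY is required");
  }

  // Assumption: fall back to port 4000 when PORT is unset.
  const port = Number.parseInt(env.PORT ?? "4000", 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    openaiApiKey: env.OPENAI_API_KEY,
    port,
    bypassAuth: env.BYPASS_AUTH === "true",
    // Assumed optional: only needed for Google Cloud TTS and Firebase auth.
    googleCloudCredentials: env.GOOGLE_CLOUD_CREDENTIALS ?? null,
    firebaseServiceAccountKey: env.FIREBASE_SERVICE_ACCOUNT_KEY ?? null,
  };
}
```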

Usage

Start the Server

npm start

API Endpoints

Health Check

  • GET /: Simple health check route.
    curl http://localhost:4000/

AI Conversation

  • POST /ai/converse: Starts or continues a conversation.
    • Request Body:
      {
         "language": "string",     // ISO 639-1 language code (e.g., "en", "es")
         "script": "string",       // Required. Defines conversation context and rules
         "history": [              // Required. Array of messages
            {
               "role": "user" | "assistant",
               "content": "string"
            }
         ]
      }
      • language: Determines the language for AI responses
      • script: Required. Contains the conversation rules, context, and flow logic. Example script:
      Name: Relational Databases
      Topics:
      1. What are relational databases?
      2. How do they work?
      3. What are the benefits of using them?
      ...
      • history: The conversation so far, as an array of previous user and assistant messages.
    • Response Body:
      {
         "reply": "string",              // AI's response message. Can contain special tokens.
         "feedback": "string",           // Negative-only feedback on user's input
         "correctnessPercent": number    // Accuracy score of user's input (0-100%)
      }
      • reply: May contain special tokens prefixed with @. Currently:
        • @END_CONVERSATION: indicates that the conversation has finished
      • feedback: Constructive criticism of the user's input, containing only negative feedback; if there is nothing to criticize, it equals @NO_FEEDBACK.
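Because each request carries the history, a client presumably keeps the message list itself and resends it on every turn, then reacts to the special tokens in the response. The sketch below illustrates that flow under those assumptions. `buildScript`, `buildConverseBody`, and `interpretConverseResponse` are hypothetical helpers, not part of the repository, and stripping `@END_CONVERSATION` from the displayed text is a client-side design choice rather than documented server behavior.

```javascript
// Hypothetical client-side helpers for POST /ai/converse (names are illustrative).

// Build a script string in the "Name / Topics" shape of the README example.
function buildScript(name, topics) {
  const lines = topics.map((topic, i) => `${i + 1}. ${topic}`);
  return [`Name: ${name}`, "Topics:", ...lines].join("\n");
}

// Build the JSON request body, checking the documented required fields.
function buildConverseBody(language, script, history) {
  if (!script) throw new Error("script is required");
  if (!Array.isArray(history)) throw new Error("history must be an array");
  for (const msg of history) {
    if (msg.role !== "user" && msg.role !== "assistant") {
      throw new Error(`Invalid role: ${msg.role}`);
    }
  }
  return { language, script, history };
}

// Interpret the response body, handling the documented special tokens.
function interpretConverseResponse(body) {
  const ended = body.reply.includes("@END_CONVERSATION");
  return {
    // Remove every occurrence of the token so it is not shown to the user.
    text: body.reply.split("@END_CONVERSATION").join("").trim(),
    ended,
    // Map the "no feedback" sentinel to null for easier UI checks.
    feedback: body.feedback === "@NO_FEEDBACK" ? null : body.feedback,
    correctnessPercent: body.correctnessPercent,
  };
}
```

After each turn, the client would append the user's message and the assistant's `text` to `history` before building the next request, and stop once `ended` is true.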

Audio Transcription

  • POST /ai/transcribe: Transcribes an audio file.
    • Request: multipart/form-data with an audio field
      {
         "audio": "file"    // Supported formats: 'mp3', 'mp4', 'mpeg', 'mpga', 'wav', 'webm' (max 25MB)
      }
    • Response Body:
      {
         "transcript": "string"    // Transcribed text from the audio file
      }
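Rejecting unsupported or oversized files before uploading saves a round trip. The following is a hypothetical sketch (`validateAudio` and `buildTranscribeForm` are illustrative names) that mirrors the documented format list and size limit, and builds the multipart body with the required `audio` field using the `FormData` and `Blob` globals available in browsers and Node.js 18+. Checking the format by file extension, and reading "25MB" as 25 MiB, are assumptions; the server's own validation may differ.

```javascript
// Hypothetical client-side check mirroring the documented limits of POST /ai/transcribe.
const SUPPORTED_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "wav", "webm"];
const MAX_BYTES = 25 * 1024 * 1024; // assumption: "25MB" means 25 MiB

// Returns an error message, or null if the file looks acceptable.
function validateAudio(filename, sizeBytes) {
  const ext = filename.includes(".") ? filename.split(".").pop().toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.includes(ext)) {
    return `Unsupported format: ${ext || "(none)"}`;
  }
  if (sizeBytes > MAX_BYTES) {
    return "File exceeds the 25MB limit";
  }
  return null;
}

// Build the multipart/form-data body; the field name must be "audio".
function buildTranscribeForm(bytes, filename, mimeType) {
  const error = validateAudio(filename, bytes.byteLength);
  if (error) throw new Error(error);
  const form = new FormData();
  form.append("audio", new Blob([bytes], { type: mimeType }), filename);
  return form;
}
```

The form can then be sent with `fetch("http://localhost:4000/ai/transcribe", { method: "POST", body: form })`. Leave the `Content-Type` header unset so that `fetch` adds the multipart boundary itself.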

Text-to-Speech

  • POST /ai/text-to-speech: Converts text to speech.
    • Request Body:
      {
         "text": "string",         // Text to convert to speech
         "languageCode": "string", // BCP-47 language code (e.g., "en-US")
         "name": "string"          // Voice name (e.g., "en-US-Standard-A")
      }
    • Response: Returns audio file in audio/mpeg format
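The request is plain JSON, while the response is binary audio. Below is a hypothetical sketch of a helper (`buildTtsRequest` is an illustrative name) that builds the `fetch()` options. The voice/language consistency check is a client-side assumption based on Google Cloud's naming convention, where a voice name starts with its language code (for example, `en-US-Standard-A` belongs to `en-US`). It assumes both values use the same casing.

```javascript
// Hypothetical helper that builds fetch() options for POST /ai/text-to-speech.
function buildTtsRequest(text, languageCode, name) {
  if (!text) throw new Error("text must not be empty");
  // Assumed sanity check: Google Cloud voice names are prefixed with their
  // language code, so "en-US-Standard-A" should be paired with "en-US".
  if (name && !name.startsWith(`${languageCode}-`)) {
    throw new Error(`Voice ${name} does not match language ${languageCode}`);
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, languageCode, name }),
  };
}
```

Since the server answers with `audio/mpeg`, a Node.js client could read the body with `Buffer.from(await response.arrayBuffer())` and write it to an `.mp3` file, rather than calling `response.json()`.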

Testing

Run tests using Jest:

npm test

About

Backend providing access to AI APIs
