A Streamlit web application that generates captions for uploaded images using Google's Gemini AI model.
- Upload images in JPG, JPEG, or PNG format
- Choose from three description styles:
- Short (1-line description)
- Detailed (paragraph description)
- Poetic (caption as a poem)
- Instant AI-powered image descriptions
- Simple and intuitive user interface
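As an illustration of how the three description styles could map to prompts, here is a minimal sketch. The prompt wording and the names `STYLE_PROMPTS` and `build_prompt` are assumptions made for this example and may not match what the app actually sends.

```python
# Hypothetical prompt wording; the app's real prompts may differ.
STYLE_PROMPTS = {
    "Short": "Describe this image in one line.",
    "Detailed": "Describe this image in a detailed paragraph.",
    "Poetic": "Write a caption for this image as a short poem.",
}


def build_prompt(style: str) -> str:
    """Return the prompt for a description style, rejecting unknown styles."""
    try:
        return STYLE_PROMPTS[style]
    except KeyError:
        raise ValueError(f"Unknown description style: {style!r}") from None
```

Keeping the prompts in one dictionary means a new style only needs one new entry.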
Before running this application, you'll need:
- Python 3.7+
- A Google AI Studio API key (Gemini API)
- Obtain an API key from Google AI Studio
- Clone this repository or download the source code:
git clone https://github.com/DSC-UIT-khi/build-with-ai.git
cd build-with-ai
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
Replace the API key in the code with your own Google AI Studio API key:
client = genai.Client(api_key="YOUR_API_KEY_HERE")
Important: It's recommended to use environment variables for API keys in production environments.
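One possible way to read the key from an environment variable instead of hard-coding it is sketched below. The variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions for this example, not names taken from the project.

```python
import os


def load_api_key(env=None, var_name="GEMINI_API_KEY"):
    """Read the API key from the environment and fail early if it is missing.

    `var_name` is an assumed name; use whatever variable you export.
    `env` defaults to os.environ and can be replaced with a dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before starting the app."
        )
    return key


# In the app, the hard-coded key could then be replaced with:
# client = genai.Client(api_key=load_api_key())
```

Export the variable in the shell that launches Streamlit (for example `export GEMINI_API_KEY=...` on Linux or macOS) so the key never appears in the source.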
- Run the Streamlit application:
streamlit run image_caption_app.py
- Access the application in your web browser (typically at http://localhost:8501)
- Upload an image using the file uploader
- Select your preferred description style from the dropdown menu
- Click "Generate Description" to get your AI-generated caption
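The usage steps above correspond roughly to three Streamlit widgets. The sketch below is a guess at how such a page could be laid out; the function name `render_ui`, the page title, and the widget labels other than "Generate Description" are hypothetical.

```python
ALLOWED_TYPES = ["jpg", "jpeg", "png"]
STYLES = ["Short", "Detailed", "Poetic"]


def render_ui():
    """Draw the page widgets and return (uploaded_file, style, clicked)."""
    # Imported inside the function so the constants above can be reused
    # without Streamlit installed.
    import streamlit as st

    st.title("Image Caption Generator")  # hypothetical title
    # File uploader restricted to the formats the app accepts
    uploaded = st.file_uploader("Upload an image", type=ALLOWED_TYPES)
    # Dropdown menu for the description style
    style = st.selectbox("Description style", STYLES)
    # The button returns True only on the run triggered by a click
    clicked = st.button("Generate Description")
    return uploaded, style, clicked
```

Streamlit reruns the script on every interaction, so the caller would typically generate a caption only when `uploaded` is not `None` and `clicked` is `True`.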
- The application accepts image uploads from the user
- The uploaded image is saved to a temporary file
- The image is sent to Google's Gemini model along with a prompt based on the selected description style
- Gemini analyzes the image and generates a description
- The description is displayed on the web interface
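The steps above can be sketched as follows. This is an illustration under assumptions, not the app's actual code: the helper names, the model name `gemini-2.0-flash`, and the use of the `google-genai` SDK's file upload call are all choices made for this example.

```python
import os
import tempfile


def save_to_temp(data: bytes, suffix: str = ".png") -> str:
    """Write uploaded bytes to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


def describe_image(image_bytes: bytes, prompt: str, api_key: str,
                   model: str = "gemini-2.0-flash") -> str:
    """Send the image and prompt to Gemini and return the generated text.

    The model name is an assumption; use whichever Gemini model you have
    access to.
    """
    # Imported lazily so save_to_temp works without the SDK installed.
    from google import genai

    path = save_to_temp(image_bytes)
    try:
        client = genai.Client(api_key=api_key)
        # Upload the temporary file, then pass it alongside the text prompt.
        uploaded = client.files.upload(file=path)
        response = client.models.generate_content(
            model=model, contents=[uploaded, prompt]
        )
        return response.text
    finally:
        # Remove the temporary file whether or not the request succeeded.
        os.remove(path)
```

In a Streamlit page, the returned text could then be shown with `st.write`.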
- Streamlit - Web application framework
- Google Generative AI - AI model for image analysis and caption generation
The current code includes an API key directly in the source. For production use:
- Move the API key to an environment variable or a secure configuration file
- Add any configuration files with sensitive information to .gitignore
Contributions are welcome! Feel free to submit issues or pull requests.
Made with ❤️ by Moosa Raza
This project is open source and available under the MIT License.
- Google for providing the Gemini AI model
- Streamlit for the excellent web app framework