
Hear the world using Azure OpenAI and a Raspberry Pi

Overview

In my free time, I have a strong interest in electronics, particularly microcomputers like the Raspberry Pi, microcontrollers like the ESP32, and various sensors and circuits. I'm fascinated by how hardware, physics, and software come together to create amazing projects, something that is now more accessible than ever. I learn best through practical tasks, so I always set my own challenges. This project is one of those challenges, and I'm excited to share it with you to demonstrate that not everything is as complex as it seems. I am not an expert, so your feedback is more than welcome.

This project is designed to help visually impaired individuals by recognizing objects in images and providing audio feedback. To achieve this, it combines a Raspberry Pi Zero 2 W, a camera, an OLED display, a speaker, and a vibration motor with Azure OpenAI for image recognition and the Azure Speech service for text-to-speech synthesis. A touch sensor serves as the user interface, triggering image capture and analysis.

The project is explained in greater detail in my blog post: Hear the world using Azure OpenAI and a Raspberry Pi - marcogerber.ch

Components overview (front)

Table of contents

  • Requirements
  • Wiring
  • Raspberry Pi setup
  • Functions
  • Running the project

Requirements

Hardware

  • Raspberry Pi Zero 2 W
  • Zero Spy Camera
  • SSD1306 OLED Display
  • Vibration Motor
  • Touch Sensor
  • MAX98357A Amplifier
  • Adafruit Mini Oval Speaker
  • LED and 220 Ohm Resistor
  • Jumper cables

Software

  • Azure OpenAI Service with a gpt-4o model deployed
  • Azure Speech Service
  • Python 3.x
  • Required Python libraries: os, time, and base64 (standard library), plus python-dotenv, requests, RPi.GPIO, gpiozero, openai, Adafruit-SSD1306, adafruit-python-shell, pillow==9.5.0, and pygame (installed via pip)
  • Other libraries and tools: git, curl, libsdl2-mixer-2.0-0, libsdl2-image-2.0-0, libsdl2-2.0-0, libopenjp2-7, libcap-dev, python3-picamera2, i2samp.py

Wiring

Please check my blog post for further wiring information: Hear the world using Azure OpenAI and a Raspberry Pi - marcogerber.ch

Components overview breadboard

Raspberry Pi setup

  1. Enable I2C serial communication protocol in raspi-config:

    sudo raspi-config > Interface Options > I2C > Yes > Finish
    sudo reboot
  2. Install missing libraries and tools:

    sudo apt install -y git curl libsdl2-mixer-2.0-0 libsdl2-image-2.0-0 libsdl2-2.0-0 libopenjp2-7
    sudo apt install -y python3-picamera2 libcap-dev
  3. Install I2S Amplifier prerequisites:

    sudo apt install -y wget
    wget https://github.com/adafruit/Raspberry-Pi-Installer-Scripts/raw/main/i2samp.py
    sudo -E env PATH=$PATH python3 i2samp.py
  4. Create a Python virtual environment:

    python3 -m venv --system-site-packages .venv
    source .venv/bin/activate
  5. Install Python modules:

    python3 -m pip install python-dotenv requests RPi.GPIO gpiozero openai Adafruit-SSD1306 adafruit-python-shell pillow==9.5.0 pygame
  6. Update the .env file in the root directory of the project with your own values:

    AZURE_OPENAI_ENDPOINT=<your_azure_openai_endpoint>      # The endpoint for your Azure OpenAI service.
    AZURE_OPENAI_API_KEY=<your_azure_openai_api_key>        # The API key for your Azure OpenAI service.
    AZURE_OPENAI_DEPLOYMENT=<your_azure_openai_deployment>  # The deployment name for your Azure OpenAI service.
    SPEECH_KEY=<your_azure_speech_key>                      # The key for your Azure Speech service.
    SPEECH_REGION=<your_azure_speech_region>                # The region for your Azure Speech service.
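
For reference, here is a minimal sketch of how these values can be loaded at runtime with python-dotenv (one of the libraries listed in the requirements); the variable names match the keys above.

    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads the .env file from the project root

    azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
    azure_openai_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT")
    speech_key = os.getenv("SPEECH_KEY")
    speech_region = os.getenv("SPEECH_REGION")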

Functions

display_screen()

Updates the OLED display with the current image.
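
A minimal sketch of what this could look like with the Adafruit-SSD1306 library from the requirements; the panel size and the shared image buffer are assumptions, not the repository's actual code.

    import Adafruit_SSD1306
    from PIL import Image

    disp = Adafruit_SSD1306.SSD1306_128_32(rst=None)  # assumed 128x32 panel
    disp.begin()
    image = Image.new("1", (disp.width, disp.height))  # shared drawing buffer

    def display_screen():
        # Push the current contents of the image buffer to the OLED panel.
        disp.image(image)
        disp.display()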

scroll_text(display, text)

Scrolls text on the OLED display if it exceeds the screen size.
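
One plausible implementation using Pillow, shifting the text one pixel left per frame; the font and frame delay are assumptions, and `display` is an initialized SSD1306 object as in the sketch above.

    import time
    from PIL import Image, ImageDraw, ImageFont

    def scroll_text(display, text, delay=0.02):
        # Render the text one pixel further left each frame until it has
        # scrolled completely off the panel.
        font = ImageFont.load_default()
        image = Image.new("1", (display.width, display.height))
        draw = ImageDraw.Draw(image)
        text_width = int(draw.textlength(text, font=font))
        for offset in range(display.width, -text_width, -1):
            draw.rectangle((0, 0, display.width, display.height), fill=0)
            draw.text((offset, 0), text, font=font, fill=255)
            display.image(image)
            display.display()
            time.sleep(delay)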

vibration_pulse()

Activates the vibration motor for a short pulse.
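
A sketch using gpiozero from the requirements; the GPIO pin and pulse length are assumptions.

    from time import sleep
    from gpiozero import OutputDevice

    vibration_motor = OutputDevice(27)  # assumed BCM pin for the motor driver

    def vibration_pulse(duration=0.2):
        # Switch the motor on briefly to give short haptic feedback.
        vibration_motor.on()
        sleep(duration)
        vibration_motor.off()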

encode_image(image_path)

Encodes the image at the given path to a base64 string.
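
This only needs the standard library; a sketch could look like the following (the repository's actual implementation may differ).

    import base64

    def encode_image(image_path):
        # Read the raw image bytes and return them as a base64 string,
        # the format Azure OpenAI expects for inline image input.
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")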

play_audio(audio_file_path)

Plays the audio file at the given path.
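
A sketch using pygame's mixer, which the requirements include; blocking until playback finishes is an assumption about the desired behavior.

    import pygame

    def play_audio(audio_file_path):
        # Load the file and block until playback has finished.
        pygame.mixer.init()
        pygame.mixer.music.load(audio_file_path)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            pygame.time.wait(100)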

synthesize_speech(text_input)

Synthesizes speech using Azure Speech services from the given text input.
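
Since the requirements list requests rather than the Azure Speech SDK, one plausible sketch is a call to the Speech text-to-speech REST API; the voice name, output format, and output path are assumptions.

    import os
    import requests

    def synthesize_speech(text_input, output_path="speech.mp3"):
        # POST SSML to the Azure Speech text-to-speech REST endpoint and
        # save the returned audio to disk.
        region = os.getenv("SPEECH_REGION")
        url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
        headers = {
            "Ocp-Apim-Subscription-Key": os.getenv("SPEECH_KEY"),
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
        }
        ssml = (
            "<speak version='1.0' xml:lang='en-US'>"
            "<voice name='en-US-JennyNeural'>"
            f"{text_input}"
            "</voice></speak>"
        )
        response = requests.post(url, headers=headers, data=ssml.encode("utf-8"))
        response.raise_for_status()
        with open(output_path, "wb") as audio_file:
            audio_file.write(response.content)
        return output_path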

main()

Main loop that initializes the system, waits for user input via the touch sensor, captures and analyzes images, and provides audio feedback.
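
Putting it together, an end-to-end sketch of that loop could look like this, reusing the helper sketches above; the touch-sensor pin, prompt text, file names, and API version are assumptions.

    import os
    from dotenv import load_dotenv
    from gpiozero import Button
    from openai import AzureOpenAI
    from picamera2 import Picamera2

    def main():
        load_dotenv()
        client = AzureOpenAI(
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            api_version="2024-02-01",  # assumed API version
        )
        camera = Picamera2()
        camera.start()
        touch_sensor = Button(17)  # assumed BCM pin for the touch sensor
        play_audio(synthesize_speech("Device is ready"))
        while True:
            touch_sensor.wait_for_press()
            vibration_pulse()  # confirm the touch with haptic feedback
            camera.capture_file("capture.jpg")
            response = client.chat.completions.create(
                model=os.getenv("AZURE_OPENAI_DEPLOYMENT"),
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image briefly."},
                        {"type": "image_url", "image_url": {
                            "url": "data:image/jpeg;base64,"
                                   + encode_image("capture.jpg"),
                        }},
                    ],
                }],
            )
            description = response.choices[0].message.content
            play_audio(synthesize_speech(description))

    if __name__ == "__main__":
        main()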

Running the project

  1. Ensure your Raspberry Pi is properly set up with the necessary hardware and software prerequisites.

  2. Run the Python script:

    python3 main.py
  3. The system will initialize, display "Device is ready" on the OLED screen, and play a matching audio announcement.

  4. Touch the sensor to capture an image, which will be analyzed and described using Azure OpenAI services. The description will be played back via audio.
