App that helps people with limited vision see the world by using an image-captioning deep learning model, Ionic Angular, a translation API, and a text-to-speech library. Works in 11 languages.

Image-Captioning-Project

By Eduardo Venegas, Moises Chávez, Leonardo Galindo, and Alberto Cortés.

DEMO: https://youtu.be/mwhK7TP2GxQ

📖 Project Overview

Technology is a great way to help those in need. As it continues to develop, it presents new possibilities, one of which is human vision aided and complemented by computer vision.

This project is a web- and mobile-based app that generates captions from images captured by the device's camera. Its goal is to help people with limited vision see the world around them: through an easy-to-use UI, users take a picture, upload it to the app, and are presented with an audio caption, in any of the supported languages, describing what they are seeing.

The app can run in a web browser or be compiled for an Android or iOS device using the Ionic Capacitor tool.



💻 Technologies Used

  • Docker
  • Google Cloud
  • Ionic Angular
  • TensorFlow
  • GitHub Actions
  • Flask
  • Python
  • TypeScript

📚 Workflow

The architecture of the app is composed of a client app that captures images and a server app that processes them using a Deep Learning model and returns the generated captions.

Architecture

  • The client app is an Ionic Angular app that takes a picture and encodes the captured image as a base64 string, which is sent to the server in a POST request.
  • The server app is a Flask container running in Docker that holds a trained captioning neural network, composed of a Convolutional Neural Network that extracts features from the image and a Recurrent Neural Network with Long Short-Term Memory (LSTM) cells that generates captions from those features.
  • Once an image is passed as input to the captioning network, it produces a text caption that is returned to the client app.
  • Finally, the client app translates the caption into any of the 11 available languages and reads it aloud using text-to-speech; once the audio finishes, the app returns to the main activity.
  • The client app can be used in a web browser or compiled for an Android or iOS device using Ionic.
  • Response time is under 1.5 seconds.
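The first and third bullets describe a simple base64-over-JSON exchange between client and server. Here is a minimal sketch of that contract; the field names ("image", "lang") and payload shape are illustrative assumptions, not taken from the repository source:

```python
import base64
import json
from typing import Tuple

def build_caption_request(image_bytes: bytes, lang: str = "en") -> str:
    """Client side: encode raw image bytes as base64 and wrap them in JSON."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded, "lang": lang})

def parse_caption_request(body: str) -> Tuple[bytes, str]:
    """Server side: recover the raw image bytes and the target language."""
    payload = json.loads(body)
    return base64.b64decode(payload["image"]), payload["lang"]

# Round trip: the server recovers exactly the bytes the client captured.
fake_image = b"\x89PNG\r\n"  # stand-in for real camera bytes
body = build_caption_request(fake_image, "es")
image, lang = parse_caption_request(body)
assert image == fake_image and lang == "es"
```

Base64 inflates the payload by roughly a third, but it keeps the image safely embeddable in a JSON POST body, which is why it is a common choice for camera-to-API uploads.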

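To make the CNN + LSTM step concrete, here is a schematic greedy-decoding loop of the kind such captioners use: the CNN turns the image into a feature vector, and the decoder emits one word at a time until it predicts an end token. The toy decoder below stands in for the trained model and is purely illustrative, not the repository's actual code:

```python
from typing import Callable, List

def generate_caption(
    features: List[float],
    next_word: Callable[[List[float], List[str]], str],
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding: ask the decoder for the next word given the image
    features and the words so far; stop at "<end>" or max_len."""
    words: List[str] = ["<start>"]
    for _ in range(max_len):
        word = next_word(features, words)
        if word == "<end>":
            break
        words.append(word)
    return words[1:]  # drop the <start> token

# Toy decoder standing in for the trained LSTM: emits a fixed sentence.
def toy_decoder(features: List[float], words: List[str]) -> str:
    vocab = ["a", "dog", "on", "grass", "<end>"]
    return vocab[len(words) - 1]

print(generate_caption([0.1, 0.2], toy_decoder))  # ['a', 'dog', 'on', 'grass']
```

In the real model the decoder call would run the LSTM one step and pick the highest-probability vocabulary word, but the control flow is the same.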

🔍 Site Overview

Home Page

Users can select the language of the audio output and press the button that launches a camera intent.


Supported Languages

Currently 11 languages are supported, covering both translation and pronunciation.


Camera Page

The user can capture an image and confirm or reject the captured image.


🤖 CI/CD

This project has a full Continuous Integration and Continuous Delivery (CI/CD) pipeline.

  • All code is tested the moment a pull request is created, by building it in GitHub Actions.
  • Merging into main is allowed once all tests pass.
  • When Continuous Delivery is triggered, GitHub Actions builds the API image and pushes it to the GitHub Package Registry.
  • It then SSHs into a Google Cloud instance, pulls the new image, stops the running Docker Compose stack, and starts it again.
  • GitHub Actions also connects to Firebase to deliver automatic client deployments.
  • As an extra, Android and iOS apps can be compiled from the main source code.

⬇️ Installation

Make sure you have Python 3 and pip installed.

Create and activate a virtual environment using the venv module:

$ python -m venv python3-virtualenv
$ source python3-virtualenv/bin/activate

Use the package manager pip to install all dependencies:

$ pip install -r requirements.txt

Install the node modules:

$ npm i

💼 Usage

Make sure you have Ionic installed, then run the API server and the client (in separate terminals):

$ flask run
$ ionic serve

📝 Contributing

Contributions are welcome! Please refer to the guidelines.
