Extract-text-from-image-and-audio-using-google-vision-api

We are going to see two mini projects that use Google Cloud APIs to extract text from images and audio: the Cloud Vision API for images and the Cloud Speech API for audio. To start, we need an API key from Google Cloud in order to use these services.

Step 1:

Setting up Your Google Platform Account

A free Google Cloud Platform account is required to get an api-key.json file.

Sign in to Google Cloud Console
Click “API Manager”
Click “Credentials”
Click “Create Credentials”
Select “Service Account Key”
Under “Service Account” select “New service account”
Name the service account (whatever you’d like)
Select Role: “Project” -> “Owner”
Leave the “JSON” option selected
Click “Create”
Save the generated API key file
Rename the file to api-key.json
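
Once the key file is saved, the Google client libraries need to be told where it is. The following is a minimal sketch, not taken from the project's own code: the helper name `use_credentials` is hypothetical, and it assumes the file is a standard service-account key. It checks a few fields such a key normally contains and then sets the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which the Google Cloud client libraries read to find the key.

```python
import json
import os

# Fields that a service-account key file is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def use_credentials(path="api-key.json"):
    """Validate the key file and expose it to the Google client libraries.

    Hypothetical helper for illustration; returns the project id from the key.
    """
    with open(path, "r", encoding="utf-8") as handle:
        key = json.load(handle)

    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"{path} is not a service-account key")

    # The Google Cloud client libraries read this variable to locate the key.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(path)
    return key["project_id"]
```

Calling `use_credentials("api-key.json")` before creating any API client should be enough for the later steps, assuming the key file sits next to the script.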

Step 2:

Convert the audio file to .wav format. Any online converter can be used for this.
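
As an alternative to an online converter, the conversion can also be scripted. The sketch below assumes ffmpeg is installed and on the PATH; the function names are hypothetical, and the choice of mono audio at 16000 Hz is an assumption for illustration, not a setting taken from this project.

```python
import subprocess


def build_wav_command(source, target, sample_rate=16000):
    """Build an ffmpeg command that converts `source` to a mono WAV file."""
    return [
        "ffmpeg",
        "-i", source,             # input file in any format ffmpeg can read
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        target,
    ]


def convert_to_wav(source, target):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_wav_command(source, target), check=True)
```

For example, `convert_to_wav("song.mp3", "source/filename.wav")` would produce the input file used in the next step, where `song.mp3` is a placeholder name.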

Step 3 (For Audio to text):

Break up the audio file into smaller parts. The Google Cloud Speech API only accepts audio no longer than 60 seconds in a single synchronous request. To be on the safe side, either break your files into 30-second chunks or pick an audio file shorter than 60 seconds.
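
To decide whether a file needs splitting at all, its duration can be read from the WAV header. This is a small sketch using only the Python standard library; the function names are hypothetical and the 30-second default simply mirrors the chunk length suggested above.

```python
import math
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def chunks_needed(duration_seconds, chunk_seconds=30):
    """Return how many chunks of `chunk_seconds` cover the whole duration."""
    return max(1, math.ceil(duration_seconds / chunk_seconds))
```

A file for which `wav_duration_seconds` returns less than 60 can be sent as it is; anything longer should be split first.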

Break the large file:

We can either use an online tool or an open source command-line tool called ffmpeg, which can be downloaded from its site and installed on your machine. Here is the command to break up the file.

First, clean out old parts if needed:
rm -rf parts/*
Then use this command to break up the file:
ffmpeg -i source/filename.wav -f segment -segment_time 30 -c copy parts/out%09d.wav

Here, source/filename.wav is the name of the input file, and parts/out%09d.wav is the pattern for the output files. %09d means that the file number is zero-padded to 9 digits (e.g. out000000001.wav), so the files sort alphabetically in the same order as numerically. This way the ls command returns the files in the right order.
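
The effect of the zero padding can be seen with a few lines of Python, whose `%` formatting uses the same `%09d` notation. This is only an illustration; the helper name `part_name` is hypothetical.

```python
def part_name(index, pattern="parts/out%09d.wav"):
    """Format an output file name the same way the %09d pattern does."""
    return pattern % index


indices = (0, 1, 2, 10)

# With padding, alphabetical order matches numeric order.
padded = [part_name(i) for i in indices]

# Without padding, "out10.wav" sorts before "out2.wav".
unpadded = ["parts/out%d.wav" % i for i in indices]

print(sorted(padded) == padded)
print(sorted(unpadded) == unpadded)
```

The first comparison holds and the second does not, which is why the padded pattern is the safer choice when the chunks are later processed in sorted order.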

Step 3 (For Image to text):

For images we don’t need to do much preparation. Either keep the images in the local project directory, or give the correct path if an image is stored somewhere else.
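
The notebook's own code is not reproduced here, so the following is only a sketch of how text detection on a local image might look. It assumes the `google-cloud-vision` package in version 2.x or later and credentials already configured; the function names `read_image_bytes` and `extract_text` are hypothetical.

```python
from pathlib import Path


def read_image_bytes(path):
    """Read a local image file as raw bytes."""
    return Path(path).read_bytes()


def extract_text(path):
    """Send a local image to the Vision API and return the detected text."""
    # Imported lazily so read_image_bytes works without the library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=read_image_bytes(path))
    response = client.text_detection(image=image)

    if response.error.message:
        raise RuntimeError(response.error.message)

    annotations = response.text_annotations
    # The first annotation, when present, holds the full detected text.
    return annotations[0].description if annotations else ""
```

Usage would be along the lines of `print(extract_text("path/to/image.jpg"))`, where the path is a placeholder for one of your own images.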

Step 4:

Install the required libraries listed in requirements.txt using the pip command.
pip install -r requirements.txt
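
After installing, it can be useful to confirm that the libraries are importable before running anything. The sketch below uses only the standard library; the function names are hypothetical, and the module names in the usage note are assumptions, since the contents of requirements.txt are not listed here.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (such as "google") is missing.
        return False


def missing_modules(module_names):
    """Return the subset of `module_names` that cannot be imported."""
    return [name for name in module_names if not is_installed(name)]
```

For instance, `missing_modules(["google.cloud.vision", "google.cloud.speech"])` returning an empty list would suggest both client libraries are available, assuming those are the ones the project depends on.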

Step 5:

Run the code.
For Audio to text: python3 audio-to-text.py
For Image to text: run the Jupyter Notebook Image-to-text-using-google-vision-api.ipynb
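
The contents of audio-to-text.py are not shown in this document. As a rough idea of what such a script might do, here is a sketch that transcribes each chunk in the parts directory and joins the results. It assumes the `google-cloud-speech` package in version 2.x or later, mono 16-bit WAV chunks, configured credentials, and English audio; all function names are hypothetical.

```python
import glob
import wave


def join_transcripts(pieces):
    """Join non-empty transcript pieces into one string."""
    cleaned = (piece.strip() for piece in pieces)
    return " ".join(piece for piece in cleaned if piece)


def transcribe_chunk(path, language_code="en-US"):
    """Transcribe one short WAV file with the Cloud Speech API."""
    # Imported lazily so join_transcripts works without the library installed.
    from google.cloud import speech

    # Read the sample rate from the WAV header instead of hard-coding it.
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
    with open(path, "rb") as handle:
        audio = speech.RecognitionAudio(content=handle.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code=language_code,
    )
    response = speech.SpeechClient().recognize(config=config, audio=audio)
    return join_transcripts(
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    )


def transcribe_parts(pattern="parts/*.wav"):
    """Transcribe every chunk matching `pattern`, in sorted file order."""
    return join_transcripts(
        transcribe_chunk(path) for path in sorted(glob.glob(pattern))
    )
```

Sorting the paths relies on the zero-padded file names produced when the audio was split, so the pieces of the transcript come out in the original order.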

Step 6:

These two mini projects should give good results. The audio project recognizes words properly even in a song. The same goes for the image-to-text project: it reads the words properly, but it does not preserve their formatting, which is something we have to take care of ourselves.


Special thanks to Alex
