Image captioning is the process of generating a textual description of an image; it combines Natural Language Processing and Computer Vision to generate the captions. This work (Mohammad Mohammadifar's master's thesis) generates Persian captions for images.
Example 1 | Example 2 |
---|---|
![]() | ![]() |
کودکی با لباس آبی در حال بازی با توپ قرمز است ("a child in blue clothes is playing with a red ball") | دختربچه ای در حال بازی است ("a little girl is playing") |
Install the requirements with:

```shell
pip install -r requirements.txt
```
This work uses the Flickr8k dataset, which is available here. You can download and unzip it with the commands below:

```shell
wget https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip
unzip Flickr8k_Dataset.zip
```
This section consists of three pre-processing steps, as follows.
The first step extracts features from the Flickr8k images and saves them into a features.pkl file:

```shell
python 1_feature_exctract.py
```
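The exact network used by 1_feature_exctract.py is not documented here, so the sketch below only shows the overall shape of the step: run every image through a pretrained CNN and pickle a dictionary mapping image IDs to feature vectors. The `extract_feature` stub stands in for the real CNN call (Flickr8k pipelines commonly use VGG16 without its classification head, which yields 4096-dimensional vectors).

```python
import pickle
import numpy as np

def extract_feature(image_path):
    # Stand-in for a pretrained CNN forward pass (e.g. VGG16 minus the
    # classifier). Returns one feature vector per image.
    return np.random.rand(4096).astype("float32")

def build_features(image_paths):
    features = {}
    for path in image_paths:
        # Key each vector by the bare image ID: strip directory and extension.
        image_id = path.rsplit("/", 1)[-1].split(".")[0]
        features[image_id] = extract_feature(path)
    return features

features = build_features(["Flicker8k_Dataset/123.jpg", "Flicker8k_Dataset/456.jpg"])
with open("features.pkl", "wb") as f:
    pickle.dump(features, f)
```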
The second step prepares the captions from farsi_8k_human.txt and saves them into a descriptions.txt file:

```shell
python 1_text_prep.py
```
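The format of farsi_8k_human.txt is an assumption here (one "image_id TAB caption" pair per line, like the original Flickr8k token file), but caption preparation for this kind of pipeline typically means grouping captions by image ID and cleaning punctuation. A minimal sketch:

```python
import re

def clean_caption(caption):
    # \w is Unicode-aware in Python 3, so Persian letters are preserved.
    caption = re.sub(r"[^\w\s]", " ", caption)  # drop punctuation
    return " ".join(caption.split())            # collapse whitespace

def prepare(lines):
    # Assumed input format: "image_id.jpg<TAB>caption" per line.
    descriptions = {}
    for line in lines:
        image_id, caption = line.rstrip("\n").split("\t", 1)
        image_id = image_id.split(".")[0]
        descriptions.setdefault(image_id, []).append(clean_caption(caption))
    return descriptions
```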
The third step trains a tokenizer on the training-set image descriptions and saves it into tokenizer.pkl:

```shell
python 1_tokenizer.py
```
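The real 1_tokenizer.py presumably fits a Keras `Tokenizer` on the training captions and pickles it; this dependency-free stand-in shows the idea: assign every word seen in the training captions an integer index, with 0 reserved for padding.

```python
import pickle

def fit_tokenizer(captions):
    # Map each word to an integer index in order of first appearance.
    vocab = {}
    for caption in captions:
        for word in caption.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # 0 is reserved for padding
    return vocab

word_index = fit_tokenizer(["startseq یک زن endseq", "startseq یک مرد endseq"])
with open("tokenizer.pkl", "wb") as f:
    pickle.dump(word_index, f)
```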
In this section we train our Persian image captioning model based on features.pkl, descriptions.txt, and tokenizer.pkl. This process may take a while. At the end it produces checkpoint files named like model-ep*-loss*-val_loss*-attention-final.h5, any of which can be used in the evaluation section. Rename your preferred one to model.h5. Alternatively, you can download our pretrained model from here.

```shell
python 2_train_nic2.py
```
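The checkpoint names above follow the placeholder pattern that Keras' `ModelCheckpoint` fills in from training metrics; how 2_train_nic2.py configures its callbacks is not shown here, but the naming itself is plain Python string formatting:

```python
# The same template Keras would receive as the checkpoint filepath; epoch,
# loss, and val_loss are substituted at the end of each epoch.
pattern = "model-ep{epoch:02d}-loss{loss:.3f}-val_loss{val_loss:.3f}-attention-final.h5"
name = pattern.format(epoch=3, loss=2.512, val_loss=3.104)
# e.g. model-ep03-loss2.512-val_loss3.104-attention-final.h5
```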
In this section you can evaluate the trained model and then test it on any given image.
This part evaluates the model on the test data using the BLEU score:

```shell
python 3_eval.py
```
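3_eval.py presumably relies on a library implementation such as NLTK's `corpus_bleu`; to show what is being measured, here is a dependency-free sketch of BLEU-1 (clipped unigram precision) for a single caption pair:

```python
from collections import Counter

def bleu1(reference, candidate):
    # Fraction of candidate words that also appear in the reference,
    # with per-word counts clipped to the reference counts.
    ref_counts = Counter(reference.split())
    cand = candidate.split()
    matches = sum(min(ref_counts[w], c) for w, c in Counter(cand).items())
    return matches / len(cand)
```

Full BLEU also combines higher-order n-gram precisions and a brevity penalty, which the sketch omits.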
You can get the caption for your own images with the command below:

```shell
python test.py [path_to_image]
```
For example, for the picture below we should get:

```shell
$ python test.py test.jpg
startseq یک زن در حال عکس گرفتن از یک صخره بزرگ است endseq
```

("a woman is taking a picture of a large rock"; startseq and endseq are the special tokens marking the beginning and end of a generated caption)
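A script like test.py typically generates this output with a greedy decoding loop: start from "startseq", repeatedly feed the partial caption plus the image feature to the model, append the most likely next word, and stop at "endseq". The sketch below shows that loop with `predict_next` as a stub standing in for the trained model:

```python
def generate_caption(predict_next, feature, max_len=30):
    # Greedy decoding: extend the caption one word at a time until the
    # end token appears or the length limit is hit.
    words = ["startseq"]
    for _ in range(max_len):
        next_word = predict_next(feature, words)
        words.append(next_word)
        if next_word == "endseq":
            break
    return " ".join(words)
```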