
ArchaeolDA

Data Augmentation tool for Deep Learning algorithms

Image1_DA

Data Augmentation (DA) is a technique to multiply a small training dataset by creating new data from it, so that a Deep Learning (DL) algorithm can be trained even when little original data is available, as is common in Computational Archaeology.

The methods developed here are random translation, rotation, resizing, elastic deformation, the so-called Doppelgänger technique, and refinement. This tool is designed for the input format of the Mask R-CNN segmentation algorithm (Waleed, 2017), but the code can be adapted to others, such as YOLO, using VIAtoYOLO.

VIAtoYOLO: https://github.com/iberganzo/VIAtoYOLO

Mask R-CNN: https://github.com/matterport/Mask_RCNN

Image2_DA

Workflow

  1. First of all, you need to annotate all the images that you will use as training, validation and testing data. To label the features to be detected in them, you can use the VGG Image Annotator (VIA) tool from the University of Oxford (Dutta and Zisserman, 2019).

VIA: https://www.robots.ox.ac.uk/~vgg/software/via/via_demo.html

As an example, to illustrate the workflow, we will detect different Greek sculptures in pictures by the hand of Homer Simpson.

Image3_DA

Therefore, to automate the process and make it more accurate, we will train a DL segmentation algorithm; but since we only have a single image of the Hellenistic Venus de Milo as training data, we will implement DA. We will first use the VIA tool to label the Venus de Milo image that will be used for DA (an example of the resulting annotation file is sketched below the screenshot).

Image4_VGG
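
For reference, VIA exports the polygon annotations as a JSON file with one entry per image, each holding the polygon vertices under shape_attributes. The snippet below is only a minimal sketch of how such an export could be inspected in Python; the file name venus.json and the exact field layout (here following the VIA 2.x annotation export) are assumptions for this example and not part of ArchaeolDA itself.

# Minimal sketch (assumptions noted above): inspect a VIA 2.x polygon export
import json

with open("1_PNG/json/venus.json") as f:  # hypothetical file name
    via = json.load(f)

for key, entry in via.items():  # one entry per annotated image
    for region in entry["regions"]:
        shape = region["shape_attributes"]  # VIA stores the polygon here
        if shape["name"] == "polygon":
            xs, ys = shape["all_points_x"], shape["all_points_y"]
            print(entry["filename"], list(zip(xs, ys)))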

  2. Save the labeled images in the /1_PNG/img/ folder and each corresponding JSON file in the /1_PNG/json/ folder. Keep the same name for each image as for its associated JSON file. In addition to the /1_PNG/img/ folder, save each training, validation and testing image to its corresponding folder: /1_PNG/train/, /1_PNG/val/ or /1_PNG/test/. Now run the python codes 1_PNG, 2_Polygons and 3_Objects to crop the labeled features for use in the DA (a quick check of the folder layout is sketched after the commands).
python3 1_PNG.py
python3 2_Polygons.py
python3 3_Objects.py
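
Before running these scripts, it can help to confirm that every image has a JSON file with the same base name, as required above. The following snippet is only an optional check written for this guide, assuming PNG images and the folder layout described in this step; it is not part of the ArchaeolDA scripts.

# Optional sanity check (assumption: PNG images in the layout described above)
from pathlib import Path

img_dir, json_dir = Path("1_PNG/img"), Path("1_PNG/json")
images = {p.stem for p in img_dir.glob("*.png")}
jsons = {p.stem for p in json_dir.glob("*.json")}
missing = images - jsons
print("Images without a matching JSON:", sorted(missing) if missing else "none")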

Translation

  3. Before creating the labeled synthetic images through DA, we have to set a series of configuration parameters in the 4_DA python code, as well as add several background images (without the desired objects) to the /4_DA/img/ folder.

DA Dataset

createBackImages = 0 # Background images creation: 0: PNG images, 1: TIFF images, 2: Already created
backImagesStorage = 1 # Number of original background images
backImagesNumberIni = 1 # Initial number of background back%d.png images to use
backImagesNumber = 10 # Number of background back%d.png images to use
numBacksImg = math.ceil(backImagesNumber/backImagesStorage) # Number of back%d.png per background image
cropImagesPerBackImages = 30 # Number of cropped crop%d.png objects to be in each background image
cropImageStoreSize = 1 # List of cropped crop%d.png objects to be used
cropImageStoreSizeInitial = 1 # Initial number of the list of cropped crop%d.png objects to be used

DA Configuration

margin = 200 # Maximum size of an object
fileWeigthTh = 0 # Minimum file size for DA Resizing: 0: All, e.g. 4000: 4 KB
maxPointsNumber = 250 # Maximum number of points in an object polygon
timeoutMin = 5 # Minutes for a timeout during a synthetic image creation

  4. Now, run the 4_DA python code to create new synthetic images. The default DA technique is random translation, which copies randomly chosen features for DA and pastes them into new random locations.
python3 4_DA.py

Image5_synthTranslate
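
The random translation shown above can be illustrated with a few lines of PIL code. This is only a conceptual sketch, not the 4_DA.py implementation: the file names back1.png and crop1.png are assumptions, the crop is assumed to be smaller than the background, and the real tool also shifts the polygon label by the same offset.

# Conceptual sketch of random translation (assumptions noted above)
import random
from PIL import Image

background = Image.open("4_DA/img/back1.png").convert("RGBA")  # assumed name
crop = Image.open("crop1.png").convert("RGBA")                 # assumed name

x = random.randint(0, background.width - crop.width)           # random location
y = random.randint(0, background.height - crop.height)
background.paste(crop, (x, y), mask=crop)     # the alpha channel keeps the shape
background.convert("RGB").save("synthetic_translate.png")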

Rotation

  5. Set daRotate to 1 in the 4_DA python code to randomly rotate (between 0 and 359 degrees) the translated features for DA in the synthetic images.
python3 4_DA.py

Image6_synthRotate
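
As a rough illustration of what daRotate adds, the cropped object can be rotated by a random angle before being pasted. Again, this is only a sketch under the same assumptions as before, not the actual 4_DA.py code.

# Conceptual sketch of random rotation (crop1.png is an assumed file name)
import random
from PIL import Image

crop = Image.open("crop1.png").convert("RGBA")
angle = random.randint(0, 359)                # degrees, as in the step above
rotated = crop.rotate(angle, expand=True)     # expand keeps the rotated corners
rotated.save("crop1_rotated.png")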

Resizing

  6. Set daResize to 1, 2 or 3 in the 4_DA python code to randomly resize the translated features for DA in the synthetic images. Also, configure the parameters of this technique to your needs.
minResized = 45 # Minimum object size after DA Resizing
maxResized = margin - 1 # Maximum object size after DA Resizing
python3 4_DA.py

Image7_synthResize
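
The resizing step can be pictured as rescaling the cropped object to a random size between minResized and maxResized while keeping its aspect ratio. The sketch below is only illustrative and reuses the values from the configuration above; the actual daResize modes 1, 2 and 3 may behave differently.

# Conceptual sketch of random resizing (illustrative only)
import random
from PIL import Image

minResized, maxResized = 45, 199                # values from the configuration above
crop = Image.open("crop1.png").convert("RGBA")  # assumed file name
new_size = random.randint(minResized, maxResized)
scale = new_size / max(crop.size)               # keep the aspect ratio
resized = crop.resize((max(1, int(crop.width * scale)),
                       max(1, int(crop.height * scale))))
resized.save("crop1_resized.png")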

Elastic Deformation

  7. Set daElastic to 1 in the 4_DA python code to randomly apply elastic deformation to the translated features for DA in the synthetic images. Also, configure the parameters of this technique to your needs.
nDiamRMax = 3 # The maximum side ratio for DA Elastic Deformation
python3 4_DA.py

Image8_synthElastic
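
Elastic deformation warps the object with a smooth random displacement field, a classic approach in the data augmentation literature. The sketch below shows that general idea with NumPy and SciPy; the parameters alpha and sigma, the file name and the exact warping used by 4_DA.py (controlled by nDiamRMax) are assumptions for this illustration.

# Conceptual sketch of an elastic deformation (illustrative only)
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates
from PIL import Image

alpha, sigma = 30.0, 4.0                      # deformation strength and smoothness
img = np.asarray(Image.open("crop1.png").convert("RGBA"), dtype=np.float32)

rng = np.random.default_rng()
dx = gaussian_filter(rng.uniform(-1, 1, img.shape[:2]), sigma) * alpha
dy = gaussian_filter(rng.uniform(-1, 1, img.shape[:2]), sigma) * alpha
ys, xs = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]), indexing="ij")
coords = [ys + dy, xs + dx]                   # displaced sampling coordinates

warped = np.stack([map_coordinates(img[..., c], coords, order=1, mode="reflect")
                   for c in range(img.shape[2])], axis=-1)
Image.fromarray(warped.astype(np.uint8)).save("crop1_elastic.png")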

Doppelgänger

  8. But what happens when the object for DA has a hole inside, like a Doughnut? The object and the random background context seen through its hole will be incorrectly added as the desired feature to train the algorithm.

Image9_DonutShape

To avoid this, you can use the so-called Doppelgänger technique from our ArchaeolDA tool. This technique ensures that these random background contexts (the ones visible through the Doughnut hole) are also included as negative training data by copying them outside the object feature.

Image10_Doppel

Let's see a case where we will use a Doughnut for DA! Set doppel to 1 in the 4_DA python code to use the Doppelgänger technique.

python3 4_DA.py

As can be seen in the image below, the face of the statue of Homer, which is a random background context, is duplicated to ensure that the algorithm learns exactly what a Doughnut is (positive training) and understands the rest as background (negative training). The same happens, for example, with the grapes, the leg of the table, Marge's belt and the ceiling of the room.

Image11_synthDoppel
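
To make the idea more concrete, the sketch below copies the background pixels that show through the transparent part of the object to a second, unlabeled location. It is only a rough illustration under assumed file names, not the actual Doppelgänger implementation in 4_DA.py; for simplicity it also copies the transparent margin around the object outline, not just the hole.

# Conceptual sketch of the Doppelgänger idea (illustrative only)
import random
import numpy as np
from PIL import Image

background = Image.open("4_DA/img/back1.png").convert("RGBA")   # assumed name
crop = Image.open("crop_doughnut.png").convert("RGBA")          # assumed name

x = random.randint(0, background.width - crop.width)
y = random.randint(0, background.height - crop.height)

# Background pixels that will remain visible where the crop is transparent
region = np.asarray(background.crop((x, y, x + crop.width, y + crop.height)))
hole_mask = np.asarray(crop)[..., 3] == 0
doppel = np.where(hole_mask[..., None], region, 0).astype(np.uint8)

background.paste(crop, (x, y), mask=crop)                       # positive sample
dx = random.randint(0, background.width - crop.width)           # unlabeled copy
dy = random.randint(0, background.height - crop.height)
doppel_img = Image.fromarray(doppel)
background.paste(doppel_img, (dx, dy), mask=doppel_img)
background.convert("RGB").save("synthetic_doppel.png")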

Refinement

  9. Sometimes, after the initial training, we find a series of false positives similar to the object feature used to train the algorithm. Therefore, it is recommended to add a refinement step that includes these false positives in the training as negative data. To do this, the ArchaeolDA tool allows us to include those false positives before creating the DA. We can see an example below where we will use Homer's doughnut-shaped head as a false positive for the Doughnut training. Add the false positives to the /4_DA/FP/ folder and run the Refinement python code from the /4_DA/ folder.
backImagesStorage = 1 # Number of original background images
backImagesNumber = 10 # Number of background back%d.png images to use
numBacksImg = math.ceil(backImagesNumber/backImagesStorage) # Number of back%d.png per background image
createBackImages = 0 # Background images creation: 0: PNG images, 1: TIFF images, 2: Already created
imgFP = 50 # Average number of each false positive per background image

rotateFP = 1 # Negative training data rotation
margin = 200 # Maximum size of a false positive
python3 4_DA/Refinement.py
python3 4_DA.py

Image12_synthRefinement
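
The refinement step essentially pastes the false positives onto the backgrounds without any label, so they enter the training as negative examples. The sketch below is only an illustration of that idea under the assumptions already noted (assumed file names, crops smaller than the background); it is not the Refinement.py code.

# Conceptual sketch of the refinement idea (illustrative only)
import random
from pathlib import Path
from PIL import Image

rotateFP = 1                                                    # as configured above
background = Image.open("4_DA/img/back1.png").convert("RGBA")   # assumed name
for fp_path in Path("4_DA/FP").glob("*.png"):
    fp = Image.open(fp_path).convert("RGBA")
    if rotateFP:
        fp = fp.rotate(random.randint(0, 359), expand=True)
    x = random.randint(0, background.width - fp.width)
    y = random.randint(0, background.height - fp.height)
    background.paste(fp, (x, y), mask=fp)                       # no label is stored
background.convert("RGB").save("back1_refined.png")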

Image Division

  10. Likewise, this tool allows us to divide the generated images into a size more affordable for computation, keeping the multiple created objects labeled so that they can be used directly in the training of the DL algorithm. A via_region_data.json file associated with the training, validation and test data will be generated.
tesela = 512 # Image size to train, validate, test and detect in pixels
addPointsNumber = 8 # Number of points between labeled points
python3 5_Split.py
tesela = 512 # Image size to train, validate, test and detect in pixels
trainVal = 0 # 0: To create training data
			# 1: To create validation data
			# 2: To create test data
python3 6_Merge.py
trainVal = 0 # 0: To create training data
			# 1: To create validation data
			# 2: To create test data
python3 7_Data.py

Image13_Divided
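
The division itself can be pictured as cutting the synthetic image into tesela x tesela tiles. The sketch below shows only that tiling idea with an assumed file name; the actual 5_Split.py, 6_Merge.py and 7_Data.py scripts also split and regenerate the via_region_data.json labels, which is not handled here.

# Conceptual sketch of the image division (labels are not handled here)
from PIL import Image

tesela = 512                                   # tile size in pixels, as above
img = Image.open("synthetic1.png")             # assumed file name
for top in range(0, img.height, tesela):
    for left in range(0, img.width, tesela):
        tile = img.crop((left, top, left + tesela, top + tesela))
        tile.save(f"tile_{top}_{left}.png")    # border tiles are zero-padded by PIL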

Citation

To cite this repository:

Berganzo-Besga, I. ArchaeolDA: Data Augmentation tool for Deep Learning algorithms. GitHub repository 2023. Available online: https://github.com/iberganzo/ArchaeolDA

This repository was created thanks to:

Orengo, H.A.; Garcia-Molsosa, A.; Berganzo-Besga, I.; Landauer, J.; Aliende, P.; Tres-Martínez, S. New developments in drone-based automated surface survey: Towards a functional and effective survey system. Archaeol. Prospect. 2021, 1–8. https://doi.org/10.1002/arp.1822
Berganzo-Besga, I.; Orengo, H.A.; Lumbreras, F.; Alam, A.; Campbell, R.; Gerrits, P.J.; Gregorio de Souza, J.; Khan, A.; Suárez-Moreno, M.; Tomaney, J.; Roberts, R.C.; Petrie, C.A. Curriculum learning-based strategy for low-density archaeological mound detection from historical maps in India and Pakistan. Sci. Rep. 2023, 13, 11257. https://doi.org/10.1038/s41598-023-38190-x

Image14_TheEnd
