quaternion-phocnet

Introduction

PHOCNet is a state-of-the-art deep CNN for Keyword Spotting (KWS) in handwritten documents. Using Pyramidal Histogram of Characters (PHOC) as labels, PHOCNet can achieve outstanding performance in KWS for both Query-by-Example (QbE) and Query-by-String (QbS).
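To make the PHOC label concrete, here is a minimal sketch of a unigram-only PHOC builder. The function name `build_phoc`, the 36-symbol alphabet and the pyramid levels are assumptions for illustration and are not taken from this repository. At each level the word is split into equal regions, and a character is assigned to a region when at least half of the character lies inside it.

```python
import string

# Assumed alphabet: lowercase letters and digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def build_phoc(word, levels=(1, 2, 4, 8), alphabet=ALPHABET):
    """Return a binary PHOC vector (list of 0/1) for `word`, unigrams only."""
    word = word.lower()
    n = len(word)
    phoc = []
    for level in levels:
        for region in range(level):
            # Work on an integer grid of n * level units to avoid float rounding:
            # the region spans [region * n, (region + 1) * n].
            r_start, r_end = region * n, (region + 1) * n
            bits = [0] * len(alphabet)
            for k, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # characters outside the alphabet are ignored
                # Character k spans [k * level, (k + 1) * level], length = level.
                c_start, c_end = k * level, (k + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Assign the character if at least 50% of it falls in the region.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            phoc.extend(bits)
    return phoc


if __name__ == "__main__":
    vector = build_phoc("about")
    print(len(vector), sum(vector))
```

With the assumed defaults the vector has (1 + 2 + 4 + 8) × 36 = 540 entries. The descriptor used by the actual model may differ, for example in its levels or by including bigrams.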

In our work, we transform PHOCNet from a conventional CNN into a quaternionic CNN (QCNN). The objective is a parameter-efficient network (approximately 1/4 of the parameters) with equivalent or better performance and better generalization ability. We focused on QbE, but the system can be adapted to QbS with minimal changes. The final system uses KWS to retrieve, from a collection of page images, the pages that contain a query word.
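The 1/4 figure follows from how a quaternion convolution shares weights. The sketch below is an illustration only: the helper names are hypothetical, biases are ignored, and it does not reproduce the layer code in this repository. A real convolution has one real kernel per input/output channel pair. A quaternion convolution groups channels in fours and uses one quaternion kernel (4 real components) per pair of groups.

```python
def real_conv_params(c_in, c_out, kernel_size):
    """Weights of a standard 2D convolution (biases ignored)."""
    return c_in * c_out * kernel_size * kernel_size


def quaternion_conv_params(c_in, c_out, kernel_size):
    """Weights of a quaternion 2D convolution (biases ignored).

    Channels are grouped into quaternions of 4 real channels each, so both
    channel counts must be divisible by 4.
    """
    if c_in % 4 or c_out % 4:
        raise ValueError("channel counts must be multiples of 4")
    # One quaternion kernel per (input group, output group) pair,
    # each holding 4 real components (r, i, j, k).
    return 4 * (c_in // 4) * (c_out // 4) * kernel_size * kernel_size


if __name__ == "__main__":
    real = real_conv_params(64, 128, 3)
    quat = quaternion_conv_params(64, 128, 3)
    print(real, quat, real / quat)
```

For a 3×3 layer with 64 input and 128 output channels this gives 73,728 real weights against 18,432 quaternion weights, a factor of 4.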

Installation

Use the requirements.txt file to set up the environment with the necessary dependencies, for example with pip install -r requirements.txt.

Usage

Training

To train a Q-PHOCNet model, run the train.py script with the minimum required arguments as follows:

python train.py -ds <dataset_name> -sn <trained_model_name>.pt

The datasets currently available for training are GW and IAM. To use another dataset, implement a corresponding subclass of torch.utils.data.Dataset. Other useful training arguments:

option	description
-lrs	learning rate step
-gpu_id	the ID of the GPU
-pul	PHOC unigram levels

For all available arguments see def train() in train.py.

Retrieval

To retrieve the images that contain a word given as a query by example (QbE), run retrieval_with_qbe.py as follows:

python retrieval_with_qbe.py -ds <dataset_name> -i <path_to_doc_collection> -m <trained_model_path>

The query image is specified by the user at runtime.

Datasets

We trained our Q-PHOCNet on the following datasets:

GW dataset: The dataset is single-writer and contains 4,894 words. We applied data augmentation to get a total of 500,000 word instances.
IAM dataset: The dataset is multi-writer (657 writers) and contains 115,320 words.

Examples

QbE results for the word "about" of arbitrary writing style

QbE results for the word "last" of arbitrary and curved writing style

This example demonstrates the tolerance of Q-PHOCNet to distorted word images used as queries.

Evaluation

Retrieval metrics

$mAP = \frac{\sum_{q=1}^{Q} AveP(q)}{Q}$ , where:

q: current query
Q: total number of queries
AveP(q): average precision of query q
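The mAP definition above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code of this repository. It assumes each query's result list is given as a ranked list of 0/1 relevance flags and that AveP is normalized by the number of relevant items found in that list. The function names are hypothetical.

```python
def average_precision(relevance):
    """AveP for one query, given ranked 0/1 relevance flags."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(relevance_per_query):
    """mAP: mean of AveP(q) over all Q queries."""
    if not relevance_per_query:
        return 0.0
    total = sum(average_precision(r) for r in relevance_per_query)
    return total / len(relevance_per_query)


if __name__ == "__main__":
    ranked_results = [[1, 0, 1], [0, 1]]
    print(mean_average_precision(ranked_results))
```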

$mAP_2 = \frac{\sum_{q=1}^{Q} (AP@n)_q }{Q}$, $AP@n = \frac{1}{GTP} \sum_{k=1}^{n} P@k \times rel@k$, where:

q: current query
Q: total number of queries
AP@n: average precision at n
GTP: number of ground-truth positives (relevant items) for the query
n: number of results we are interested in
P@k: precision at k
rel@k: relevance function, equal to 1 if the k-th result is relevant to the query and 0 otherwise
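As a worked illustration of AP@n and $mAP_2$, the following sketch evaluates the formula directly. It assumes the results of each query are a ranked list of 0/1 relevance flags and that GTP is supplied by the caller. The function names are hypothetical and the code is not taken from this repository.

```python
def ap_at_n(relevance, n, gtp):
    """AP@n = (1 / GTP) * sum_{k=1..n} P@k * rel@k for one query."""
    if gtp <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance[:n], start=1):
        if rel:
            hits += 1
            total += hits / k  # P@k, counted only where rel@k = 1
    return total / gtp


def map_2(queries, n):
    """mAP_2: mean of AP@n over all queries.

    `queries` is a list of (relevance, gtp) pairs, one per query.
    """
    if not queries:
        return 0.0
    return sum(ap_at_n(rel, n, gtp) for rel, gtp in queries) / len(queries)


if __name__ == "__main__":
    # Top-3 results of one query with 3 relevant items in the ground truth.
    print(ap_at_n([1, 0, 1, 1], n=3, gtp=3))
```

In this example only the first three results are scored, so the relevant item at rank 4 does not contribute: AP@3 = (1/1 + 2/3) / 3 ≈ 0.556.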

Performance

model	params (millions)	GW mAP (%)	IAM mAP (%)
Q-PHOCNet (full)	17.8	96.15	72.12
PHOCNet (1/2)	18	95.45	69.55
Q-PHOCNet (1/2)	4.5	95.55	56.84
PHOCNet (1/4)	4.6	94.14	54.32
Q-PHOCNet (1/4)	1.1	85.13	34.49
PHOCNet (1/8)	1.2	81.49	27.17

Generalization

We evaluated the generalization ability of our system using 10 words written in various arbitrary and distorted styles. The words and the result are shown below:

	$mAP_2$ (%)
Q-PHOCNet (full)	86.6

Citations

Disclaimer:

This project incorporates parts of other repositories. The corresponding licenses and repositories are included. In third-party files, the original repository is also listed in the comments.

