Merge pull request #111 from BU-Spark/research-doc-patch

Research doc patch

Showing 10 changed files with 591 additions and 251 deletions.
```dockerfile
# Use an official Miniconda3 image as the parent image
FROM continuumio/miniconda3:latest

# Set the working directory in the container
WORKDIR /usr/src/app

# Declare a build argument for the conda environment name
ARG CONDA_ENV_NAME=trocr_env

# Clone the repository and switch to the dev branch
RUN git clone https://github.com/BU-Spark/ml-herbarium.git . && \
    git checkout dev

# Work from the trocr subdirectory (a `cd` inside RUN does not persist across layers)
WORKDIR /usr/src/app/trocr

# Create a new conda environment from the YAML file and activate it on login
RUN conda env create -n $CONDA_ENV_NAME --file=trocr_env.yml && \
    echo "conda activate $CONDA_ENV_NAME" >> ~/.bashrc

# Install Jupyter and other required packages into the environment
RUN conda install -n $CONDA_ENV_NAME jupyter -y && \
    /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install transformers==4.27.0 --no-deps && \
    /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_md-1.0.2.tar.gz && \
    /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_biobert-1.0.2.tar.gz && \
    /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_sm && \
    /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_md && \
    /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_trf

# Persist the environment name for runtime use (ARG values are build-time only)
ENV CONDA_ENV_NAME=$CONDA_ENV_NAME

# Make port 8888 available to the world outside this container
EXPOSE 8888

# Run Jupyter Notebook when the container launches (shell form so $CONDA_ENV_NAME expands)
CMD /opt/conda/envs/$CONDA_ENV_NAME/bin/jupyter notebook --ip='*' --port=8888 --no-browser --allow-root
```
# Build and Run Instructions
## **Build the Docker Image:**
Navigate to the directory containing the Dockerfile and run:
```sh
docker build --build-arg CONDA_ENV_NAME=<your-conda-env-name> -t my-herbarium-app .
```
Replace `<your-conda-env-name>` with the desired conda environment name.

> ### Notes
> - If you don't provide `--build-arg` when building, the default value `trocr_env` is used as the conda environment name.
> - Remember to replace `<your-conda-env-name>` with the actual name you want to give your conda environment when building the Docker image.

## **Run the Docker Container:**
### Using Docker Bind Mounts
When you run the container, use the `-v` or `--mount` flag to bind-mount a directory or file from your host into the container.

#### Example
If your input images are in a directory named `images` on your host, you can mount it into the container like this:
```sh
docker run -v $(pwd)/images:/usr/src/app/images -p 8888:8888 my-herbarium-app
```
or
```sh
docker run --mount type=bind,source=$(pwd)/images,target=/usr/src/app/images -p 8888:8888 my-herbarium-app
```
Here:
- `$(pwd)/images` is the absolute path to the `images` directory on your host machine.
- `/usr/src/app/images` is the path where the `images` directory will be accessible from within the container.

> ### Note
> With bind mounts, changes to files in the mounted directory are reflected in both the host and the container, since they are the same files on the host's filesystem.

> ### Modification in Script
> You will need to modify the script to read images from the mounted directory (`/usr/src/app/images` in this example) instead of the original host directory.
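As a sketch of that change, a loader that picks up images from the mount point might look like the following; the directory constant and helper name are illustrative, not taken from the repository:

```python
from pathlib import Path

# Illustrative mount point matching the `docker run` examples above
IMAGE_DIR = Path("/usr/src/app/images")

def list_input_images(image_dir: Path = IMAGE_DIR) -> list[Path]:
    """Return image files from the bind-mounted directory, sorted by name."""
    extensions = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
    return sorted(p for p in image_dir.iterdir() if p.suffix.lower() in extensions)
```

Pointing the script at a fixed in-container path like this keeps it independent of where the images live on any particular host.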
trocr/evaluation-dataset/handwritten-typed-text-classification/research.md (45 additions, 0 deletions)
# [Research] TrOCR Encoder + FFN Decoder

#### Overview
To create a robust classification model for our task, multiple Convolutional Neural Network (CNN) models were explored and assessed. Details of each attempted model, along with their respective implementations, can be found in the [Introduction section](https://github.com/BU-Spark/ml-herbarium/blob/dev/trocr/evaluation-dataset/handwritten-typed-text-classification/notebooks/Classifier_NN.ipynb) of the project's Jupyter Notebook.

#### Issues Encountered with CNNs
During experimentation, I identified fundamental limitations in how CNNs process images containing text, which affected our ability to classify the text in an image as handwritten or machine-printed.

Specifically, the text in an image, particularly handwritten text, constitutes a minimal portion of the image in terms of pixel count, which reduces our region of interest (ROI). This small ROI posed challenges for information retention and propagation as image filters were applied, leading to the loss of textual detail. To mitigate this, I applied the morphological operation of **erosion** to binarized images to emphasize the text, effectively enlarging the ROI. This proved useful in counteracting some of the undesirable effects of CNN filters and preserving the integrity of the text in the images.
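The effect can be sketched with a minimal erosion (minimum filter) in NumPy; the array below is a toy stand-in for a binarized label image with dark text (0) on a light background (1), and in practice a library routine such as OpenCV's `cv2.erode` would be used instead:

```python
import numpy as np

def erode(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Morphological erosion: each pixel becomes the minimum over a k x k window.

    On a binarized image with dark text (0) on a light background (1),
    this thickens the dark strokes, enlarging the region of interest.
    """
    pad = k // 2
    padded = np.pad(binary, pad, mode="edge")
    out = np.empty_like(binary)
    for i in range(binary.shape[0]):
        for j in range(binary.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

# A single dark pixel (a thin stroke) grows into a 3x3 dark block
img = np.ones((5, 5), dtype=np.uint8)
img[2, 2] = 0
thickened = erode(img)
```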
#### Methodology
Given the limitations encountered with CNNs, I approached the classification task in two primary steps:

1. **Feature Extraction with TrOCR Encoder:**
Leveraged the encoder of the TrOCR model to obtain reliable feature representations of the images. The TrOCR encoder was chosen because its feature representations retain the textual details needed for decoding to characters; CNNs, by contrast, may not preserve such fine-grained textual information.

2. **Training a Custom FFN Decoder:**
Employed a custom feed-forward neural network (FFN) as the decoder to make predictions from the feature representations produced by the encoder. The model was trained specifically to discern the subtle differences in features between the two categories.

This methodology enabled us to maintain a high level of accuracy and reliability in the classification task while overcoming the shortcomings identified in CNN models for processing images containing text.
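The two-step pipeline can be sketched as follows. The shapes and layer sizes are illustrative assumptions (a ViT-style TrOCR encoder produces 768-dimensional token embeddings, mean-pooled here to one vector per image), and the randomly initialized weights stand in for a trained decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (assumed done offline): TrOCR encoder output, mean-pooled over
# image patches to a single 768-d feature vector per image.
FEATURE_DIM, HIDDEN, N_CLASSES = 768, 256, 2  # classes: handwritten, typed

def ffn_decode(features: np.ndarray, w1, b1, w2, b2) -> np.ndarray:
    """Step 2: two-layer feed-forward decoder mapping features to class logits."""
    hidden = np.maximum(features @ w1 + b1, 0.0)  # ReLU
    return hidden @ w2 + b2

# Randomly initialized weights stand in for the trained decoder parameters.
w1 = rng.normal(0.0, 0.02, (FEATURE_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
w2 = rng.normal(0.0, 0.02, (HIDDEN, N_CLASSES)); b2 = np.zeros(N_CLASSES)

batch = rng.normal(size=(4, FEATURE_DIM))   # pooled features for 4 images
logits = ffn_decode(batch, w1, b1, w2, b2)  # shape (4, 2)
predictions = logits.argmax(axis=1)         # 0 = handwritten, 1 = typed
```

Keeping the decoder this small is deliberate: the encoder already carries the textual information, so the classification head only needs to separate two feature distributions.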
#### Readings

The [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215) paper inspired the use of this encoder-decoder architecture. In it, the authors use a multilayered Long Short-Term Memory (LSTM) network to map the input sequence to a vector of fixed dimensionality, and a second deep LSTM to decode the target sequence from that vector. BERT-like architectures were a further inspiration for the encoder-decoder paradigm.

Using an FFN as a decoder after feature extraction is valuable for a variety of classification tasks, especially with specialized data such as text in images, because it lets us define a custom network specific to our task.

#### Results Summary

In our handwritten vs. typed-text classification task, the model performed impressively, with an overall accuracy of \(96\%\). The test samples were handpicked to be challenging for the model to classify (some were misclassified even by a human).

- *Handwritten Text Class:*
  - *Precision:* \(97.96\%\)
  - *Recall:* \(96.00\%\)
  - *F1-Score:* \(96.97\%\)
  - *Support:* 50 samples

- *Typed Text Class:*
  - *Precision:* \(96.23\%\)
  - *Recall:* \(98.08\%\)
  - *F1-Score:* \(97.14\%\)
  - *Support:* 52 samples

The balanced performance across both classes, reflected in the nearly identical macro-average and weighted-average metrics, demonstrates the model's robustness in distinguishing handwritten from typed text.
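As a quick sanity check, the reported F1 scores are consistent with the precision and recall figures; agreement is to within rounding, since the per-class percentages are themselves rounded:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures reported above, in percent
assert abs(f1_score(97.96, 96.00) - 96.97) < 0.01  # handwritten class
assert abs(f1_score(96.23, 98.08) - 97.14) < 0.01  # typed class
```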