To help experimental scientists get hands-on experience with training their own YOLO-based object detection models, we've prepared a step-by-step tutorial, using the detection of different material phases (e.g. solid, heterogeneous liquid, homogeneous liquid, etc.) in a glass vial as a working example. We also recommend looking at Roboflow's official documentation for guides on dataset annotation, to complement this tutorial.
A quick overview of the ready-to-use Python scripts:
- `dataset.py` – Converts annotated data (from Roboflow) into YOLO format.
- `train.py` – Splits the dataset into train/validation sets and trains a YOLOv8 model.
  - Key parameters: batch size, image size, epochs.
  - Outputs: `best.pt` weights, training metrics.
- `test.py` – Runs inference on new images.
  - Outputs: normalized bounding box coordinates, class predictions, and confidence scores for each detected object, saved as a `.txt` file plus an annotated image for every input image.
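Since `dataset.py` writes labels in YOLO format, it may help to see what that format encodes. Each line of a label file holds one object: a class index followed by the box center and size, normalized to the image dimensions. The helper below is a minimal illustrative sketch (not part of the tutorial scripts) that converts one such line back to pixel coordinates:

```python
# Minimal sketch (hypothetical helper): decode one YOLO-format label line.
# YOLO labels store one object per line: class_id, then the box center
# (x, y) and size (w, h), all normalized to [0, 1].

def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to pixel-space corner coordinates."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    x1, y1 = xc - w / 2, yc - h / 2
    x2, y2 = xc + w / 2, yc + h / 2
    return int(cls), (x1, y1, x2, y2)

cls_id, box = yolo_to_pixels("2 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
# class 2, box (240.0, 120.0, 400.0, 360.0) in a 640x480 image
```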
Create and activate a virtual environment, using either conda:

```shell
conda create -n cv-hte-tutorial python=3.9
conda activate cv-hte-tutorial
```

or venv on Linux/macOS:

```shell
python3.9 -m venv cv-hte-tutorial
source cv-hte-tutorial/bin/activate
```

or venv on Windows:

```shell
python -m venv cv-hte-tutorial
cv-hte-tutorial\Scripts\activate
```
Clone the tutorial repository and pull its Git LFS files:

```shell
git clone https://github.com/ac-rad/CV-HTE-Tutorial.git
git lfs install
cd CV-HTE-Tutorial
git lfs pull
```

Clone YOLOv5, install its requirements, then install the tutorial's own requirements:

```shell
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip3 install -r requirements.txt
cd ..
pip3 install -r requirements.txt
```
⚠️ If you encounter issues with `torch` installation, run this command:

```shell
pip3 install torch==2.2.0+cu118 torchvision==0.17.0+cu118 torchaudio==2.2.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
```

⚠️ Due to compatibility issues with NumPy 2.x, we require NumPy 1.x. This is already specified in `requirements.txt`, but if you encounter related errors, reinstall it manually with:

```shell
pip install numpy==1.26.3
```
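To confirm the pinned dependencies resolved correctly, a small sanity-check sketch (a hypothetical helper, not part of the tutorial scripts) can report which packages are importable and at what version, without failing if one is missing:

```python
# Hypothetical helper: report installed versions of the key dependencies.
# Returns None for any module that is not installed, rather than raising.
from importlib import util

def check_env(modules=("numpy", "torch")):
    """Return {module_name: version string, or None if not installed}."""
    report = {}
    for name in modules:
        if util.find_spec(name) is None:
            report[name] = None  # not installed
        else:
            report[name] = getattr(__import__(name), "__version__", "installed")
    return report

print(check_env())
```

On a correct install, numpy should report a 1.x version; if torch was installed with the CUDA wheel above, its version string should contain `+cu118`.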
For this tutorial, we've provided pre-trained model weights (`best.pt` and `best_nocap.pt`) for phase and vial detection, respectively.
- Place your image dataset as a subfolder (e.g. `user-study`) in the `datasets` folder.
- Update `DATASET_PATH` and `RESULTS_PATH` in `test.py`. For example:

```python
DATASET_PATH = "./datasets/user-study/"  # This should point directly to the image dataset folder
RESULTS_PATH = "./output/user-study-annotations"  # This should point to the folder where you want the results to be saved
```

- Run the script:

```shell
python3 test.py
```
This will output `.txt` files containing absolute bounding box coordinates (x1, y1, x2, y2 in pixels), predicted classes, and confidence scores for each test image. Images annotated with the predicted bounding boxes are also saved.
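If you want to post-process these results, a short parsing sketch may help. The column order assumed here (x1 y1 x2 y2, class, confidence, whitespace-separated) is an assumption based on the description above; check one of your own output files and adjust accordingly:

```python
# Hedged sketch: parse a test.py-style detection file. The column order
# (x1 y1 x2 y2 class confidence) is an assumption; verify it against an
# actual output file before relying on this.
from pathlib import Path

def load_detections(txt_path):
    """Return a list of detection dicts from one output .txt file."""
    detections = []
    for line in Path(txt_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        x1, y1, x2, y2 = map(float, parts[:4])
        cls, conf = parts[4], float(parts[5])
        detections.append({"box": (x1, y1, x2, y2), "class": cls, "conf": conf})
    return detections
```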
- Download and place the Roboflow annotations in the `datasets` directory.
- Update the source and target directories in `dataset.py` (the target directory need not exist). For example:

```python
SOURCE_DIR = "./datasets/CV-MOF.v6i.yolov8"
DESTINATION_DIR = "./datasets/cropped_images"
```

- Set the class lists `SOURCE_CLASS_LIST` (i.e. the list of classes in `data.yaml` of the Roboflow annotation) and the target classes `TARGET_CLASS_LIST`. The target classes need not be in any particular order unless you are combining with other datasets, in which case the list should match the class list of the dataset being merged in. For example:

```python
SOURCE_CLASS_LIST = ['Empty', 'Hetero', 'Homo', 'Residue', 'Solid']
TARGET_CLASS_LIST = ["Homo", "Hetero", "Empty", "Residue", "Solid"]
```

- Run `dataset.py` to crop out the images and write their annotations in YOLO format:

```shell
python3 dataset.py
```

- Edit `dataset.yaml` to point to the correct path of the training dataset you just created, and ensure that the order of the classes matches the `TARGET_CLASS_LIST` set in `dataset.py`.
- Run `train.py`:

```shell
python3 train.py
```
⚠️ If you encounter memory-related or GPU-related errors, try reducing the batch size or training on CPU instead.
Once you are finished training, place the trained model in `./models/combined` and update the path to this model in `./test.py` to use it.
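The class-list remapping step above can be illustrated with a short sketch. This is a hypothetical helper, not the actual `dataset.py` code: the idea is that each label's class index under the source list is replaced by that class name's index under the target list.

```python
# Hypothetical illustration of remapping YOLO class indices from a source
# class list to a target class list (not the actual dataset.py code).
SOURCE_CLASS_LIST = ['Empty', 'Hetero', 'Homo', 'Residue', 'Solid']
TARGET_CLASS_LIST = ["Homo", "Hetero", "Empty", "Residue", "Solid"]

def remap_label_line(line: str) -> str:
    """Rewrite the class index of one YOLO label line for the target list."""
    cls_idx, *coords = line.split()
    class_name = SOURCE_CLASS_LIST[int(cls_idx)]
    new_idx = TARGET_CLASS_LIST.index(class_name)
    return " ".join([str(new_idx)] + coords)

print(remap_label_line("0 0.5 0.5 0.2 0.3"))
# 'Empty' is index 0 in the source list but index 2 in the target list,
# so this prints "2 0.5 0.5 0.2 0.3"
```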
Here, we provide a non-exhaustive list of available chemistry-related image datasets.
| Dataset name | Description | DOI |
|---|---|---|
| Vector-LabPics | 2187 images of chemical experiments with materials within mostly transparent vessels in various laboratory settings and in everyday conditions such as beverage handling | 10.5281/zenodo.3697451 |
| ChemEq25 | 4,599 annotated images of chemical laboratory equipment, designed to advance machine learning applications in real-time equipment detection | 10.17632/zptphkynt6.3 |
| HeinSight3.0 | 6,800 annotated images capturing vials with diverse phase compositions, including combinations of empty regions, homogeneous liquids, and heterogeneous mixtures | 10.5281/zenodo.11053822 |
| HeinSight2.0 | 823 images capturing vials with diverse phase compositions, including combinations of empty regions, homogeneous liquids, and heterogeneous mixtures | 10.1039/d3sc05491h |
- Edit the `processLabPics.py` file to extract annotations from the LabPics dataset.
  - `LABPICS_VESSEL_CLASSES` defines the classes to extract; by default it is set to all class labels in LabPics that are "Vessels". This creates a dataset of various vessels in common lab settings.
  - `TEST_SPLIT` defines the ratio of images to randomly split into test vs. train sets.
- To augment this dataset with vessel annotations from a lab setup for your use case, simply copy the YOLO annotations and images from your vessel dataset into the `train/images`, `train/labels`, `val/images`, and `val/labels` folders of the vessel dataset extracted from LabPics.
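The copy step above can be scripted. This is a hedged sketch under the assumption that both datasets use the standard `<split>/images` and `<split>/labels` layout; the function name and paths are illustrative, not part of the tutorial scripts:

```python
# Hedged sketch: copy YOLO images and labels from your own vessel dataset
# into the LabPics-derived dataset's split folders. Paths are illustrative.
import shutil
from pathlib import Path

def merge_into_split(src_root: str, dst_root: str, split: str = "train") -> int:
    """Copy images/labels from src_root/<split> into dst_root/<split>.

    Returns the number of files copied. Existing files with the same
    name in the destination are overwritten, so make sure filenames
    between the two datasets do not collide.
    """
    copied = 0
    for sub in ("images", "labels"):
        src = Path(src_root) / split / sub
        dst = Path(dst_root) / split / sub
        dst.mkdir(parents=True, exist_ok=True)
        for f in src.glob("*"):
            shutil.copy2(f, dst / f.name)  # keep original filenames
            copied += 1
    return copied
```

Run it once per split (e.g. `merge_into_split("./my-vessels", "./labpics-vessels", "train")` and again with `"val"`), then retrain as described above.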