AccelerationConsortium/CV-HTE-Tutorial

Table of contents

  1. Getting Started
  2. Installation
  3. Running Inference
  4. Training Your Own Model
  5. Datasets

1. Getting Started

To help experimental scientists get hands-on experience with training their own YOLO-based object detection models, we've prepared a step-by-step tutorial using the detection of different material phases (e.g. solid, heterogeneous liquid, homogeneous liquid) in a glass vial as a working example. To complement this tutorial, we also recommend Roboflow's official documentation for guides on dataset annotation.

What’s Included

A quick overview of the ready-to-use Python scripts:

  1. dataset.py – Converts annotated data (from Roboflow) into YOLO format.

  2. train.py – Splits the dataset into train/validation sets and trains a YOLOv8 model.
    Key parameters: batch size, image size, epochs.
    Outputs: best.pt weights, training metrics.

  3. test.py – Runs inference on new images.
    Outputs: bounding box coordinates, class predictions, and confidence scores for each detected object, saved as a .txt file together with an annotated copy of each input image.
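For reference, YOLO-format labels (the output of dataset.py) are plain-text files with one line per annotated object, in the order <class_id> <x_center> <y_center> <width> <height>, with all coordinates normalized to [0, 1]. An illustrative two-object label file (values are made up):

2 0.512 0.634 0.220 0.310
0 0.498 0.205 0.240 0.150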

2. Installation

2.1 Create a new virtual environment

Using Conda (recommended on all platforms):

conda create -n cv-hte-tutorial python=3.9
conda activate cv-hte-tutorial

OR using venv:

# For macOS/Linux
python3.9 -m venv cv-hte-tutorial
source cv-hte-tutorial/bin/activate

# For Windows
python -m venv cv-hte-tutorial
cv-hte-tutorial\Scripts\activate

2.2 Clone the repositories

Clone our tutorial repository

git clone https://github.com/ac-rad/CV-HTE-Tutorial.git
git lfs install
cd CV-HTE-Tutorial
git lfs pull

Clone YOLOv5

git clone https://github.com/ultralytics/yolov5

2.3 Install dependencies

cd yolov5
pip3 install -r requirements.txt
cd ..
pip3 install -r requirements.txt

⚠️ If you encounter issues with torch installation, run this command:

pip3 install torch==2.2.0+cu118 torchvision==0.17.0+cu118 torchaudio==2.2.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

⚠️ Due to compatibility issues with NumPy 2.x, we require NumPy 1.x. This is already specified in requirements.txt, but if you encounter related errors, reinstall it manually with:

pip install numpy==1.26.3
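To quickly confirm that the pinned versions are installed and that PyTorch can see your GPU, you can run:

python3 -c "import torch, numpy; print(torch.__version__, numpy.__version__, torch.cuda.is_available())"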

3. Running Inference on a Trained Model

For this tutorial, we've provided pre-trained model weights (best.pt and best_nocap.pt) for phase and vial detection, respectively.

  • Place your image dataset as a subfolder (e.g. user-study) of the datasets folder.
  • Update DATASET_PATH and RESULTS_PATH in test.py. For example:
DATASET_PATH = "./datasets/user-study/"  # Point this directly at the image dataset folder
RESULTS_PATH = "./output/user-study-annotations"  # Point this at the folder where the results will be saved
  • Run the script:
python3 test.py

This will output one .txt file per test image containing absolute bounding box coordinates (x1, y1, x2, y2 in pixels), predicted classes, and confidence scores. Copies of the images annotated with the predicted bounding boxes are also saved.
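If you want to post-process these results, the .txt files can be parsed with a few lines of Python. The sketch below assumes one detection per line in the order class, x1, y1, x2, y2, confidence; check a file actually produced by test.py and adjust the column order if it differs (the file name used here is hypothetical):

# Hypothetical parser for the test.py output files; verify the column
# order against an actual output file before relying on it.
from pathlib import Path

def read_detections(txt_path):
    detections = []
    for line in Path(txt_path).read_text().splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip lines that don't match the assumed format
        cls, x1, y1, x2, y2, conf = parts
        detections.append({
            "class": cls,
            "box": tuple(float(v) for v in (x1, y1, x2, y2)),  # pixels
            "confidence": float(conf),
        })
    return detections

print(read_detections("./output/user-study-annotations/vial_001.txt"))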

4. Training Your Own YOLO Model

4.1 Preparing the training dataset

  • Download your Roboflow annotations and place them in the datasets directory.
  • Update the source and target directories in dataset.py (the target directory need not already exist). For example:
SOURCE_DIR = "./datasets/CV-MOF.v6i.yolov8"
DESTINATION_DIR = "./datasets/cropped_images"
  • Set the class lists SOURCE_CLASS_LIST (i.e. the list of classes in the data.yaml of the Roboflow annotation) and TARGET_CLASS_LIST.
  • The target classes need not be in any particular order unless you are combining with other datasets, in which case TARGET_CLASS_LIST should be set to the class list of the dataset being merged in. For example:
SOURCE_CLASS_LIST = ['Empty', 'Hetero', 'Homo', 'Residue', 'Solid']
TARGET_CLASS_LIST = ["Homo", "Hetero", "Empty", "Residue", "Solid"]

  • Run dataset.py to crop the images and write the annotations in YOLO format:

python3 dataset.py
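Conceptually, dataset.py remaps each annotation's class index from its position in SOURCE_CLASS_LIST to the position of the same name in TARGET_CLASS_LIST. A minimal sketch of that remapping (the helper function is ours, not from the repository):

# Illustrative remapping of YOLO class indices between two class lists.
SOURCE_CLASS_LIST = ['Empty', 'Hetero', 'Homo', 'Residue', 'Solid']
TARGET_CLASS_LIST = ["Homo", "Hetero", "Empty", "Residue", "Solid"]

def remap_class(source_index):
    """Map a class index from the source list to the target list by name."""
    return TARGET_CLASS_LIST.index(SOURCE_CLASS_LIST[source_index])

# 'Empty' is index 0 in the source list but index 2 in the target list.
assert remap_class(0) == 2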

4.2 Training the model

  • Edit dataset.yaml to point to the training dataset you just created, and ensure that the order of the classes matches the TARGET_CLASS_LIST set in dataset.py (a sketch of this file follows this list).

  • Run train.py

python3 train.py
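As noted above, dataset.yaml ties the training run to your cropped dataset. Here is a sketch in the standard Ultralytics layout, assuming the default train/val folder names; treat the dataset.yaml shipped with this repository as the authoritative template:

# Illustrative dataset.yaml; paths and keys must match what train.py expects.
path: ./datasets/cropped_images
train: train/images
val: val/images
names: ["Homo", "Hetero", "Empty", "Residue", "Solid"]  # same order as TARGET_CLASS_LIST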

⚠️ If you encounter memory or GPU related errors, try reducing the batch size or training on the CPU instead.

Once you are finished training, place the trained model in ./models/combined and update the path to this model in ./test.py to use it.
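For example, if the scripts use the Ultralytics Python API (an assumption; check how test.py actually loads weights), pointing at the new model looks like this:

from ultralytics import YOLO

# Load the newly trained weights from the path referenced in test.py.
model = YOLO("./models/combined/best.pt")
# Run it on a sample image (the image path here is a placeholder).
results = model("./datasets/user-study/example.jpg")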


5. Datasets

Here, we provide a non-exhaustive list of available chemistry-related image datasets.

  • Vector-LabPics – 2,187 images of chemical experiments with materials in mostly transparent vessels, in various laboratory settings and in everyday conditions such as beverage handling. DOI: 10.5281/zenodo.3697451
  • ChemEq25 – 4,599 annotated images of chemical laboratory equipment, designed to advance machine learning applications in real-time equipment detection. DOI: 10.17632/zptphkynt6.3
  • HeinSight3.0 – 6,800 annotated images capturing vials with diverse phase compositions, including combinations of empty regions, homogeneous liquids, and heterogeneous mixtures. DOI: 10.5281/zenodo.11053822
  • HeinSight2.0 – 823 images capturing vials with diverse phase compositions, including combinations of empty regions, homogeneous liquids, and heterogeneous mixtures. DOI: 10.1039/d3sc05491h

5.1 Processing LabPics Dataset

  • Edit the processLabPics.py file to extract annotations from the LabPics dataset.
  • LABPICS_VESSEL_CLASSES defines the classes to extract; by default it is set to all LabPics class labels that are "Vessels", which yields a dataset of various vessels in common lab settings.
  • TEST_SPLIT defines the fraction of images randomly assigned to the test set rather than the train set.
  • To augment this dataset with vessel annotations from your own lab setup, copy the YOLO annotations and images from your vessel dataset into the train/images, train/labels, val/images, and val/labels folders of the vessel dataset extracted from LabPics, as sketched below.
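A minimal sketch of that copy step, assuming both datasets use the standard YOLO train/val, images/labels layout (both folder names below are illustrative):

import shutil
from pathlib import Path

SRC = Path("./datasets/my-vessel-dataset")  # your own YOLO vessel dataset (illustrative)
DST = Path("./datasets/labpics-vessels")    # dataset extracted by processLabPics.py (illustrative)

for split in ("train", "val"):
    for kind in ("images", "labels"):
        dst_dir = DST / split / kind
        dst_dir.mkdir(parents=True, exist_ok=True)
        for f in (SRC / split / kind).glob("*"):
            if f.is_file():
                shutil.copy(f, dst_dir / f.name)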
