To help experimental scientists get hands-on experience with training their own YOLO-based object detection models, we've prepared a step-by-step tutorial, using the detection of different material phases (e.g. solid, heterogeneous liquid, homogeneous liquid, etc.) in a glass vial as a working example. We also recommend looking at Roboflow's official documentation for guides on dataset annotation, to complement this tutorial.
A quick overview of the ready-to-use Python scripts:
- `dataset.py` – Converts annotated data (from Roboflow) into YOLO format.
- `train.py` – Splits the dataset into train/validation sets and trains a YOLOv8 model.
  - Key parameters: batch size, image size, epochs.
  - Outputs: `best.pt` weights, training metrics.
- `test.py` – Runs inference on new images.
  - Outputs: normalized bounding box coordinates, class predictions, and confidence scores for each detected object, saved as a `.txt` file plus an annotated image for every input image.
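Since `dataset.py` writes labels in YOLO format, it may help to see what that format encodes. Each line of a label file holds one object: a class index followed by the box center and size, normalized to the image dimensions. The helper below is a minimal illustrative sketch (not part of the tutorial scripts) that converts one such line back to pixel coordinates:

```python
# Minimal sketch (hypothetical helper): decode one YOLO-format label line.
# YOLO labels store one object per line: class_id, then the box center
# (x, y) and size (w, h), all normalized to [0, 1].

def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to pixel-space corner coordinates."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    x1, y1 = xc - w / 2, yc - h / 2
    x2, y2 = xc + w / 2, yc + h / 2
    return int(cls), (x1, y1, x2, y2)

cls_id, box = yolo_to_pixels("2 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
# class 2, box (240.0, 120.0, 400.0, 360.0) in a 640x480 image
```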
Create and activate a virtual environment, using either conda:

```shell
conda create -n cv-hte-tutorial python=3.9
conda activate cv-hte-tutorial
```

or venv on Linux/macOS:

```shell
python3.9 -m venv cv-hte-tutorial
source cv-hte-tutorial/bin/activate
```

or venv on Windows:

```shell
python -m venv cv-hte-tutorial
cv-hte-tutorial\Scripts\activate
```
Clone the tutorial repository and pull its Git LFS files:

```shell
git clone https://github.com/ac-rad/CV-HTE-Tutorial.git
git lfs install
cd CV-HTE-Tutorial
git lfs pull
```

Clone YOLOv5, install its requirements, then install the tutorial's own requirements:

```shell
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip3 install -r requirements.txt
cd ..
pip3 install -r requirements.txt
```
⚠️ If you encounter issues with `torch` installation, run this command:

```shell
pip3 install torch==2.2.0+cu118 torchvision==0.17.0+cu118 torchaudio==2.2.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
```

⚠️ Due to compatibility issues with NumPy 2.x, we require NumPy 1.x. This is already specified in `requirements.txt`, but if you encounter related errors, reinstall it manually with:

```shell
pip install numpy==1.26.3
```
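To confirm the pinned dependencies resolved correctly, a small sanity-check sketch (a hypothetical helper, not part of the tutorial scripts) can report which packages are importable and at what version, without failing if one is missing:

```python
# Hypothetical helper: report installed versions of the key dependencies.
# Returns None for any module that is not installed, rather than raising.
from importlib import util

def check_env(modules=("numpy", "torch")):
    """Return {module_name: version string, or None if not installed}."""
    report = {}
    for name in modules:
        if util.find_spec(name) is None:
            report[name] = None  # not installed
        else:
            report[name] = getattr(__import__(name), "__version__", "installed")
    return report

print(check_env())
```

On a correct install, numpy should report a 1.x version; if torch was installed with the CUDA wheel above, its version string should contain `+cu118`.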
For this tutorial, we've provided pre-trained model weights (`best.pt` and `best_nocap.pt`) for phase and vial detection, respectively.
- Place your image dataset as a subfolder (e.g. `user-study`) in the `datasets` folder.
- Update `DATASET_PATH` and `RESULTS_PATH` in `test.py`. For example:

```python
DATASET_PATH = "./datasets/user-study/"  # This should point directly to the image dataset folder
RESULTS_PATH = "./output/user-study-annotations"  # This should point to the folder where you want the results to be saved
```

- Run the script:

```shell
python3 test.py
```
This will output `.txt` files containing absolute bounding box coordinates (x1, y1, x2, y2 in pixels), predicted classes, and confidence scores for each test image. Images annotated with the predicted bounding boxes are also saved.
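If you want to post-process these results, a short parsing sketch may help. The column order assumed here (x1 y1 x2 y2, class, confidence, whitespace-separated) is an assumption based on the description above; check one of your own output files and adjust accordingly:

```python
# Hedged sketch: parse a test.py-style detection file. The column order
# (x1 y1 x2 y2 class confidence) is an assumption; verify it against an
# actual output file before relying on this.
from pathlib import Path

def load_detections(txt_path):
    """Return a list of detection dicts from one output .txt file."""
    detections = []
    for line in Path(txt_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        x1, y1, x2, y2 = map(float, parts[:4])
        cls, conf = parts[4], float(parts[5])
        detections.append({"box": (x1, y1, x2, y2), "class": cls, "conf": conf})
    return detections
```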
- Download and place the Roboflow annotations in the `datasets` directory.
- Update the source and target directories in `dataset.py` (the target directory need not exist). For example:

```python
SOURCE_DIR = "./datasets/CV-MOF.v6i.yolov8"
DESTINATION_DIR = "./datasets/cropped_images"
```

- Set the class lists `SOURCE_CLASS_LIST` (i.e. the list of classes in `data.yaml` of the Roboflow annotation) and the target classes `TARGET_CLASS_LIST`. The target classes need not be in any particular order unless you are combining with other datasets, in which case the list should match the class list of the dataset being merged in. For example:

```python
SOURCE_CLASS_LIST = ['Empty', 'Hetero', 'Homo', 'Residue', 'Solid']
TARGET_CLASS_LIST = ["Homo", "Hetero", "Empty", "Residue", "Solid"]
```

- Run `dataset.py` to crop out the images and write their annotations in YOLO format:

```shell
python3 dataset.py
```

- Edit `dataset.yaml` to point to the correct path of the training dataset you just created, and ensure that the order of the classes matches the `TARGET_CLASS_LIST` set in `dataset.py`.
- Run `train.py`:

```shell
python3 train.py
```
⚠️ If you encounter memory-related or GPU-related errors, try reducing the batch size or training on CPU instead.
Once you are finished training, place the trained model in `./models/combined` and update the path to this model in `./test.py` to use it.
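The class-list remapping step above can be illustrated with a short sketch. This is a hypothetical helper, not the actual `dataset.py` code: the idea is that each label's class index under the source list is replaced by that class name's index under the target list.

```python
# Hypothetical illustration of remapping YOLO class indices from a source
# class list to a target class list (not the actual dataset.py code).
SOURCE_CLASS_LIST = ['Empty', 'Hetero', 'Homo', 'Residue', 'Solid']
TARGET_CLASS_LIST = ["Homo", "Hetero", "Empty", "Residue", "Solid"]

def remap_label_line(line: str) -> str:
    """Rewrite the class index of one YOLO label line for the target list."""
    cls_idx, *coords = line.split()
    class_name = SOURCE_CLASS_LIST[int(cls_idx)]
    new_idx = TARGET_CLASS_LIST.index(class_name)
    return " ".join([str(new_idx)] + coords)

print(remap_label_line("0 0.5 0.5 0.2 0.3"))
# 'Empty' is index 0 in the source list but index 2 in the target list,
# so this prints "2 0.5 0.5 0.2 0.3"
```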
Here, we provide a non-exhaustive list of available chemistry-related image datasets.
| Dataset name | Description | DOI |
|---|---|---|
| Vector-LabPics | 2187 images of chemical experiments with materials within mostly transparent vessels in various laboratory settings and in everyday conditions such as beverage handling | 10.5281/zenodo.3697451 |
| ChemEq25 | 4,599 annotated images of chemical laboratory equipment, designed to advance machine learning applications in real-time equipment detection | 10.17632/zptphkynt6.3 |
| HeinSight3.0 | 6,800 annotated images capturing vials with diverse phase compositions, including combinations of empty regions, homogeneous liquids, and heterogeneous mixtures | 10.5281/zenodo.11053822 |
| HeinSight2.0 | 823 images capturing vials with diverse phase compositions, including combinations of empty regions, homogeneous liquids, and heterogeneous mixtures | 10.1039/d3sc05491h |
- Edit the `processLabPics.py` file to extract annotations from the LabPics dataset.
  - `LABPICS_VESSEL_CLASSES` defines the classes to extract; by default it is set to all class labels in LabPics that are "Vessels". This creates a dataset of various vessels in common lab settings.
  - `TEST_SPLIT` defines the ratio of images to randomly split into test vs. train sets.
- To augment this dataset with vessel annotations from a lab setup for your use case, simply copy the YOLO annotations and images from your vessel dataset into the `train/images`, `train/labels`, `val/images`, and `val/labels` folders of the vessel dataset extracted from LabPics.
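The copy step above can be scripted. This is a hedged sketch under the assumption that both datasets use the standard `<split>/images` and `<split>/labels` layout; the function name and paths are illustrative, not part of the tutorial scripts:

```python
# Hedged sketch: copy YOLO images and labels from your own vessel dataset
# into the LabPics-derived dataset's split folders. Paths are illustrative.
import shutil
from pathlib import Path

def merge_into_split(src_root: str, dst_root: str, split: str = "train") -> int:
    """Copy images/labels from src_root/<split> into dst_root/<split>.

    Returns the number of files copied. Existing files with the same
    name in the destination are overwritten, so make sure filenames
    between the two datasets do not collide.
    """
    copied = 0
    for sub in ("images", "labels"):
        src = Path(src_root) / split / sub
        dst = Path(dst_root) / split / sub
        dst.mkdir(parents=True, exist_ok=True)
        for f in src.glob("*"):
            shutil.copy2(f, dst / f.name)  # keep original filenames
            copied += 1
    return copied
```

Run it once per split (e.g. `merge_into_split("./my-vessels", "./labpics-vessels", "train")` and again with `"val"`), then retrain as described above.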