Natalia Valderrama1,2, Paola Ruiz Puentes1,2*, Isabela Hernández1,2*, Nicolás Ayobi1,2, Mathilde Verlyck1,2, Jessica Santander3, Juan Caicedo3, Nicolás Fernández4,5, Pablo Arbeláez1,2
* Equal contribution.
1 Center for Research and Formation in Artificial Intelligence (CinfonIA).
2 Universidad de los Andes, Bogotá, Colombia.
3 Fundación Santafé de Bogotá, Bogotá, Colombia.
4 Seattle Children’s Hospital, Seattle, USA.
5 University of Washington, Seattle, USA.
- Oral presentation and best paper nominee at MICCAI 2022.
- Proceedings available at Springer Link.
- Preprint available at arXiv.
Visit the project on our website and our YouTube channel.
We present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action Recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and Steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset’s multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our experimental results in both PSI-AVA and other publicly available databases demonstrate the adequacy of our framework to spur future research on holistic surgical scene understanding.
This repository provides instructions to download the PSI-AVA dataset and run the PyTorch implementation of TAPIR, both presented in the paper Towards Holistic Surgical Scene Understanding, an oral presentation at MICCAI 2022.
Check out GraSP, an extended version of our PSI-AVA dataset that provides surgical instrument segmentation annotations and more data. Also check out TAPIS, the improved version of our method. GraSP and TAPIS are presented in this arXiv paper.
At this link, you will find the sampled frames of the original radical prostatectomy surgical videos and the annotations that compose the Phases, Steps, Instruments, and Atomic Visual Actions recognition dataset. You will also find the preprocessed data we used for training TAPIR, the instrument detector predictions, and the trained model weights for each task.
We recommend downloading the compressed data archive and extracting all files with the following commands:
$ wget http://157.253.243.19/PSI-AVA/PSI-AVA.tar.gz
$ tar -xzvf PSI-AVA.tar.gz
After extracting all files, the data is organized as follows:
PSI-AVA:
|
|_TAPIR_trained_models
| |_ACTIONS
| | |_Fold1
| | | |_checkpoint_best_actions.pyth
| | |_Fold2
| | |_checkpoint_best_actions.pyth
| |_INSTRUMENTS
| | ...
| |_PHASES
| | ...
| |_STEPS
| ...
|
|_def_DETR_box_ftrs
| |_fold1
| | |_train
| | | |_box_features.pth
| | |_val
| | |_box_features.pth
| |_fold2
| ...
|
|_keyframes
|_CASE001
| |_00000.jpg
| |_00001.jpg
| |_00002.jpg
| ...
|_CASE002
| ...
...
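Optionally, you can sanity-check the extracted archive with a couple of standard shell commands (run from the directory where you extracted it; adjust the path to your setup):
# List the first surgical cases and count the sampled frames
$ ls PSI-AVA/keyframes | head
$ find PSI-AVA/keyframes -name '*.jpg' | wc -l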
You will find PSI-AVA's data partition and annotations in the outputs/data_annotations directory.
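As an optional, illustrative way to inspect these annotations, the one-liner below opens one of the JSON files under fold1/coco_anns and prints its top-level keys and annotation count; it assumes those files follow the standard COCO layout:
# Print the keys and annotation count of the first COCO-style file in fold1
$ python -c "import json, glob; p = sorted(glob.glob('outputs/data_annotations/psi-ava/fold1/coco_anns/*.json'))[0]; d = json.load(open(p)); print(p, list(d.keys()), len(d.get('annotations', [])))"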
For further details on frame preprocessing, please read the Supplementary Material of our extended article on arXiv. Similarly, if you require frames extracted at higher frame rates, the original surgery videos, or the raw frames, please refer to the GraSP repository.
Please follow these steps to run TAPIR:
$ conda create --name tapir python=3.8 -y
$ conda activate tapir
$ conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
$ conda install av -c conda-forge
$ pip install -U iopath
$ pip install -U opencv-python
$ pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
$ pip install 'git+https://github.com/facebookresearch/fvcore'
$ pip install 'git+https://github.com/facebookresearch/fairscale'
$ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
$ git clone https://github.com/BCV-Uniandes/TAPIR
$ cd TAPIR
$ pip install -r requirements.txt
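As an optional sanity check (not part of the original setup steps), you can verify that the main dependencies import correctly and that CUDA is visible before moving on:
# Quick environment check
$ python -c "import torch, torchvision, detectron2; print(torch.__version__, torchvision.__version__, 'CUDA:', torch.cuda.is_available())"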
Our code builds upon Multiscale Vision Transformers [1]. For more information, please refer to that work.
First, extract the "keyframes" folder from the PSI-AVA data link and place it in this repository's ./outputs/PSIAVA/ directory.
PSI-AVA/keyframes/* ===> ./outputs/PSIAVA/keyframes/
Then, extract the instrument features computed by Deformable DETR from the "def_DETR_box_ftrs" folder in the PSI-AVA data link and place these files in this repository as follows (example copy commands are sketched after the mappings):
PSI-AVA/def_DETR_box_ftrs/fold1/* ===> ./outputs/data_annotations/psi-ava/fold1/*
PSI-AVA/def_DETR_box_ftrs/fold2/* ===> ./outputs/data_annotations/psi-ava/fold2/*
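For reference, the commands below sketch this relocation, assuming the archive was extracted to /path/to/PSI-AVA and that you run them from the repository root (adapt the paths to your setup):
$ mkdir -p outputs/PSIAVA
$ cp -r /path/to/PSI-AVA/keyframes outputs/PSIAVA/
$ cp -r /path/to/PSI-AVA/def_DETR_box_ftrs/fold1/* outputs/data_annotations/psi-ava/fold1/
$ cp -r /path/to/PSI-AVA/def_DETR_box_ftrs/fold2/* outputs/data_annotations/psi-ava/fold2/
# Verify: should list train/val box_features.pth for both folds
$ find outputs/data_annotations/psi-ava -name box_features.pth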
Ultimately, the outputs directory must have the following structure:
outputs
|_data_annotations
| |_psi-ava
| | |_fold1
| | | |_annotations
| | | | ...
| | | |_coco_anns
| | | | ...
| | | |_frame_lists
| | | | ...
| | | |_train
| | | | |_box_features.pth
| | | |_val
| | | |_box_features.pth
| | |_fold2
| | ...
| |_psi-ava_extended
| ...
|_PSIAVA
|_keyframes
|_CASE001
| |_00000.jpg
| |_00001.jpg
| ...
|_CASE002
...
...
We provide our pretrained Deformable-DETR weights in this link.
If you cannot download our data from our servers, you can also download the PSI-AVA compressed archive from this Google Drive link. Similarly, you can download the compressed Deformable-DETR weights from this link.
First, add this repository to your $PYTHONPATH:
$ export PYTHONPATH=/path/to/TAPIR/slowfast:$PYTHONPATH
To train TAPIR, run:
# For the Instrument detection or Atomic Action recognition tasks
$ bash run_examples/mvit_short_term.sh
# For the Phases or Steps recognition tasks
$ bash run_examples/mvit_long_term.sh
Task | mAP | config | run file |
---|---|---|---|
Phases | 56.55 | PHASES | long_term |
Steps | 45.56 | STEPS | long_term |
Instruments | 80.85 | TOOLS | short_term |
Actions | 28.68 | ACTIONS | short_term |
Our pretrained models are stored in the PSI-AVA data link.
Add the checkpoint path to the run_examples/mvit_*.sh file corresponding to the task you want to evaluate, and enable testing by setting TEST.ENABLE True in the config.
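As an illustrative example only (the exact contents of the run scripts may differ), evaluating the Fold1 Atomic Action model could look like appending the test options to the python command inside run_examples/mvit_short_term.sh; TEST.CHECKPOINT_FILE_PATH is the standard SlowFast/MViT config option, which we assume this fork keeps:
# Inside run_examples/mvit_short_term.sh, append to the python invocation:
#   TEST.ENABLE True \
#   TEST.CHECKPOINT_FILE_PATH /path/to/PSI-AVA/TAPIR_trained_models/ACTIONS/Fold1/checkpoint_best_actions.pyth
$ bash run_examples/mvit_short_term.sh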
If you have any doubts, questions, issues, corrections, or comments, please email [email protected].
If you use PSI-AVA or TAPIR (or their extended versions, GraSP and TAPIS) in your research, please include the following BibTex citations in your papers.
@InProceedings{valderrama2020tapir,
author={Natalia Valderrama and Paola Ruiz and Isabela Hern{\'a}ndez and Nicol{\'a}s Ayobi and Mathilde Verlyck and Jessica Santander and Juan Caicedo and Nicol{\'a}s Fern{\'a}ndez and Pablo Arbel{\'a}ez},
title={Towards Holistic Surgical Scene Understanding},
booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022},
year={2022},
publisher={Springer Nature Switzerland},
address={Cham},
pages={442--452},
isbn={978-3-031-16449-1}
}
@article{ayobi2024pixelwise,
title={Pixel-Wise Recognition for Holistic Surgical Scene Understanding},
author={Nicolás Ayobi and Santiago Rodríguez and Alejandra Pérez and Isabela Hernández and Nicolás Aparicio and Eugénie Dessevres and Sebastián Peña and Jessica Santander and Juan Ignacio Caicedo and Nicolás Fernández and Pablo Arbeláez},
year={2024},
url={https://arxiv.org/abs/2401.11174},
eprint={2401.11174},
journal={arXiv},
primaryClass={cs.CV}
}
[1] H. Fan, Y. Li, B. Xiong, W.-Y. Lo, C. Feichtenhofer, ‘PySlowFast’, 2020. https://github.com/facebookresearch/slowfast.