<p align="center">
<a href="https://quest.mit.edu/">
<img src="https://github.com/icatcherplus/icatcherplus.github.io/blob/main/images/quest_logo_cropped.png?raw=true" height="80">
</a>
</p>

---
# About iCatcher+

[iCatcher+](https://doi.org/10.1177/25152459221147250) is a tool for performing automatic annotation of discrete
infant gaze directions from videos collected in the lab, in the field, or online (remotely). This process is highly
customizable; you can choose between different face detectors, infant face classifiers, and gaze direction classifiers
to create an optimized pipeline that produces the most accurate infant gaze annotations.

<img src="https://github.com/icatcherplus/icatcherplus.github.io/blob/main/gaze.gif?raw=true" align="center"/>

Note1:
If you require speedy performance, prior to installing icatcher you should install [PyTorch](https://pytorch.org/) with GPU support.

Note2:
When using iCatcher+ for the first time, neural network model files will automatically be downloaded to a local cache folder. To control where they are downloaded, set the `ICATCHER_DATA_DIR` environment variable.
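For example, to keep the downloaded model files in a custom cache directory (the path below is just a placeholder), you
can set the variable before running iCatcher+:

```bash
# Store downloaded model files in a custom cache directory (placeholder path)
export ICATCHER_DATA_DIR=/path/to/icatcher_cache
```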

---
# iCatcher+ Pipeline
The iCatcher+ pipeline can be segmented into three distinct components: face detection, face classification, and gaze
classification. Several tests have been run on each of these components to optimize overall performance on annotation
accuracy. Each of these components has plug-and-play options to allow for higher accuracy and quicker results in varying
circumstances. Below, the default iCatcher+ pipeline is defined:

## Face Detection
### Face Detector
iCatcher+ utilizes an out-of-the-box face detector based on [RetinaFace](https://arxiv.org/abs/1905.00641), a robust,
state-of-the-art pretrained model with a ResNet50 backbone. A variation of RetinaFace that allows for batch inference,
found [here](https://github.com/elliottzheng/batch-face), is used in order to decrease compute time.

## Face Classification
### Face Classifier
Although a face classifier has been trained to distinguish between infant faces and adult faces within input videos
(and is readily available for use with the flag `--use_fc_model`), the best approach tends to be the default "lowest
face" selector. Since studies typically require infants' faces to be below their parents', this method selects the face
based on a mix of its bounding box's y-coordinate and how close its height:width ratio is to 1 (parents are normally instructed to look away
from the camera during studies, so their relative bounding boxes tend to be more rectangular).
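If you do want to opt in to the trained infant face classifier instead of the lowest-face selector, a minimal
invocation might look like this (assuming the `icatcher` command-line entry point and a placeholder video path):

```bash
# Use the trained infant/adult face classifier rather than the default "lowest face" heuristic
icatcher /path/to/video.mp4 --use_fc_model
```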

## Gaze Classification

### Gaze Classifier
The gaze classifier receives the cropped face region of the selected infant face, along with a short window of
surrounding frames for temporal context, and outputs a discrete gaze direction (e.g., left, right, or away) for each
frame of the video.

---
# Usability
After installing iCatcher+, it can be easily run by following [these instructions](https://github.com/icatcherplus/icatcher_plus#running-icatcher).

In order to increase the accuracy of the iCatcher+ gaze coding system, several design decisions were made that, while
increasing accuracy, may affect other aspects of iCatcher+ such as run time. Your preference for speed vs. accuracy, as
well as your available resources (CPU vs. GPU), are taken into account, and you may customize the iCatcher+ pipeline to
align with whichever result you prioritize.

Two tracks of the pipeline have been created based on the compute resources of the user: a GPU track and a CPU track.

---
## Track A (GPU)
Track A (the recommended track) relies on a GPU to run the iCatcher+ pipeline. This track utilizes the default pipeline
components specified above, but you must pass the flag `--gpu_id #`, with # corresponding to the GPU you would like to
use (use 0 if your system has a single GPU).
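A minimal Track A invocation might look like the following (assuming the `icatcher` command-line entry point and a
placeholder video path):

```bash
# Run the default GPU pipeline on the first (and possibly only) GPU
icatcher /path/to/video.mp4 --gpu_id 0
```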

When tested on 96 videos within the [Lookit dataset](https://osf.io/ujteb/), each averaging 9 minutes 20 seconds in
length, Track A automatically annotated each video in about **20 minutes**. The iCatcher+ default pipeline correctly
annotated approximately 91% of frames when compared to two human coders.

---
## Track B (CPU)
If a GPU is not available, the iCatcher+ pipeline defaults to a CPU version of the track, which increases the run time
of the annotation process. As a result of this increase in run time, several customizations have been added to help
keep Track B a viable option.

### Face Detection
The default face detection model used in the iCatcher+ pipeline is **RetinaFace**. This model is computationally
intensive and results in long run times when used in the CPU track. For this reason, another face detection model
(**opencv_dnn**, an OpenCV DNN) is offered. This model is less accurate, but allows for quicker run times if that's
your priority.

If you elect to prioritize accuracy and stick with the default RetinaFace face detector, there are certain routes you
can take to decrease the associated long run time.

#### Parallel Processing
Parallel processing is used by default when RetinaFace is run on CPU. This will split the frames into batches that will
be processed on each core of your system. After testing, a batch size of 16 frames was chosen as the default setting.
This can be customized using the `--fd_batch_size` flag. If you don't want all of your cores to be utilized in the
parallelization process (for instance, if you have other things to run or work on), you can specify a number of CPUs to
withhold from the pipeline through the `--num_cpus_saved` flag. One caveat of parallel processing input frames is that
the full video is buffered in order to divide it into batches and send them out to each core. Buffering can be switched
off with the `--dont_buffer` flag, which may help with memory issues caused by loading the full video, but will
increase overall run time.
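As a sketch of how these options might be combined on the CPU track (the batch size and number of saved CPUs below are
illustrative values, and the `icatcher` command and video path are placeholders for however you normally invoke the
tool):

```bash
# CPU track: use smaller face-detection batches and keep 2 cores free for other work
icatcher /path/to/video.mp4 --fd_batch_size 8 --num_cpus_saved 2
```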

### Skipping Frames
When using RetinaFace, you can choose to alter the number of frames you run through the face detector by specifying a
number of frames to 'skip' with the `--fd_skip_frames` flag. For the skipped frames, the last known bounding box
coordinates of the infant's face are reused and fed back into the gaze classifier. Since videos are typically 30 FPS,
infant faces tend to stay relatively stationary between frames. Skipping frames will decrease the run time of the
pipeline, but may decrease the accuracy to a varying degree.
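For example, to run the face detector on only every third frame, reusing the last known bounding box for the two frames
in between (again assuming the `icatcher` entry point and a placeholder video path):

```bash
# Skip 2 frames between face detections, i.e. detect faces on every 3rd frame
icatcher /path/to/video.mp4 --fd_skip_frames 2
```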

---
## Flags
In order to see all of the available options iCatcher+ offers, you can run iCatcher+ with the `--help` flag. The table
below lists some useful flags, including whether they are most relevant when working in Track A (GPU) vs. Track B (CPU).

| Flag | Track | Type | Default | Description |
| --- | --- | --- | --- | --- |
| --fd_skip_frames | B | int | 0 | The number of frames to skip between each face detection. If frames are skipped, the last known bounding box is reused for the skipped frames. |
| --dont_buffer | B | True, False | False | When set, frames will not be buffered, decreasing memory usage but increasing processing time. Turning off the buffer also allows results to be streamed live. |


---
