Skip to content

Driving environment detection using a multiclass classifier

Notifications You must be signed in to change notification settings

VTTI/Driving-Environment-Detection

Repository files navigation

Driving Environment Detection (Locality Calssification)

Introduction

Driving Environment/Locality has significant impact on driving styles, speed, driver attention and various other factors that help study and improve driver safety in both traditional and autonomous driving systems. This project aims to percive and identify driving environment from image/video feeds based on the visual cues contained in them.

Code use: Setting up Docker Environment and Dependencies

  • Step 1: Clone the repository to local machine
    git clone https://github.com/VTTI/Driving-Environment-Detection.git 
  • Step 2: cd to downloaded repository
    cd [repo-name]
  • Step 3: Build the docker image using Dockerfile.ML
    docker build -f Dockerfile.ML -t driving_env .
  • Step 4: Run container from image and mount data volumes
    docker run -it --rm -p 9999:8888 -v $(pwd):/opt/app -v [path to data]:/opt/app/data --shm-size=20G driving_env
    example:
    docker run -it --rm -p 9999:8888 --user=12764:10001 -v $(pwd):/opt/app -v /vtti:/vtti --gpus all --shm-size=20G driving_env
  • You may get an error
    failed: port is already allocated
    If so, expose a different port number on the server, e.g. '9898:8888'
  • If you wish to run the jupyter notebook, type 'jupyter' on the container's terminal
  • On your local machine perform port forwarding using
    ssh -N -f -L 9999:localhost:9999 [email protected] 

Dataset Information

Organize the data as follows in the repository. We use a custom dataset 70/20/10 train/val/test split respectively the dataset compiled from SHRP2 and Signal Phase video data. Our data set contains:

  • 17174 training images.
  • 2120 validation images.
  • and 2147 test images.
./
 |__ data
        |__ Interstate
        |__ Urban
	|__ Residential
        

Models

  1. Baseline built on resnext50 backbone : To run and train the model use the configs/config_baseline.yaml file as input to --config flag and run.

  2. Baseline_2 built on Vision Transformer backbone : To run and train the model use the configs/config_ViT.yaml file as input to --config flag and run.

  3. To test a model with pretrained weights. Use --mode='test'/'test_single' and appropriate config file as input to --config flag and run.

To run the code

cd /opt/app
python main.py \
--config [optional:path to config file] \
--mode ['train', 'test', 'test_single'] \
--comment [optional:any comment while training] \
--weight [optional:custom path to weight] \
--device [optional:set device number if you have multiple GPUs]

Training & Testing

We trained the network on train and validation sets and tested its performance on a test set that the network never sees during training. The performance of the network is evaluated based on a combination of its loss, F-score and accuracy curves for training and validation, and its performance on the same metrics with the test data. Further, we also analyze the saliency maps of the calssified images to gather insights on the basis of classification. Note that all models are initialized with pretrained weights from training on ImageNet calssification task.

Resnext50

Training and Validation

The best model obtianed from training with various configurations of optimizers and hyperparameters including learning rate and epochs is with the use of AdamW optimizer. We trained the network for 200 epochs and ploted the performance curves which are as shown here.

1 1 1

Test

The results obtained by this base line on the entire test set :

  • Loss: 0.6871
  • Fscore: 71.15%
  • Confusion Matrices by class:
    • residential [tp,tn,fp,fn] : [369, 1313, 271, 193]
    • Urban [tp,tn,fp,fn] : [650, 935, 233, 328]
    • Interstate [tp,tn,fp,fn] : [501, 1418, 122, 105]
  • Accuracy : 80.55%

The confusion matrix on test set is as follows:

1

Vision Transformer

Training and Validation

Alternate model was trained using Vision Transformer abd best wegiths for this were from training with various configurations of optimizers and hyperparameters including learning rate and epochs is with the use of AdamW optimizer. We trained the network for 200 epochs and ploted the performance curves which are as shown here.

1 1 1
Test

The results obtained by this vit model on the entire test set :

  • Loss: 0.925
  • Fscore: 56.5%
  • Confusion Matrices by class:
    • residential [tp,tn,fp,fn] : [339, 1217, 329, 234]
    • Urban [tp,tn,fp,fn] : [516, 800, 335, 468]
    • Interstate [tp,tn,fp,fn] : [332, 1289, 268, 230]
  • Accuracy : 71.45%

The confusion matrix on test set is as follows:

1

Saliency

Some examples of saliency maps observed for each class.

  • Interstate
1
  • Urban
1
  • Residential
1

About

Driving environment detection using a multiclass classifier

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published