Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting (ICPR 2020)
Official Implementation of "Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting" LINK
Many thanks to BL, SFANet and CAN for their useful publications and repositories.
For complete UCF-QNRF and Shanghaitech training code, please refer to BL and SFANet respectively.
Please see models for our M-SFANet and M-SegNet implementations.
To reproduce the results reported in the paper, you may use these preprocessed datasets. The list is not yet complete and may be updated in the future.
Shanghaitech B dataset preprocessed with a Gaussian kernel Link
Bayesian preprocessed (following BL) Shanghaitech datasets (A&B) Link
The Beijing-BRT dataset Link (Originally from BRT)
Shanghaitech A&B Link
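For readers unfamiliar with the Gaussian-kernel preprocessing mentioned in the list above, the following is a minimal sketch of how point annotations are commonly turned into a density map. It is not the script used to build the linked datasets: the function names `gaussian_kernel` and `points_to_density`, the fixed `sigma=4.0`, and the border renormalization are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Return a normalized 2-D Gaussian kernel covering about +/- 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def points_to_density(points, height, width, sigma=4.0):
    """Build a density map whose sum equals the number of annotated heads.

    `points` is an iterable of (x, y) head positions in pixel coordinates.
    The fixed sigma is an illustrative assumption, not the paper's setting.
    """
    density = np.zeros((height, width), dtype=np.float64)
    kernel = gaussian_kernel(sigma)
    radius = kernel.shape[0] // 2
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue  # skip annotations that fall outside the image
        # Clip the kernel window to the image borders.
        top, bottom = max(row - radius, 0), min(row + radius + 1, height)
        left, right = max(col - radius, 0), min(col + radius + 1, width)
        patch = kernel[top - row + radius: bottom - row + radius,
                       left - col + radius: right - col + radius]
        # Renormalize so that each person still contributes exactly 1.
        density[top:bottom, left:right] += patch / patch.sum()
    return density


# Three hypothetical head positions, one of them in the image corner.
heads = [(10.0, 12.0), (50.5, 40.2), (0.0, 0.0)]
density = points_to_density(heads, height=64, width=96)
print(float(density.sum()))
```

Because every annotation adds a kernel that integrates to one, summing the map recovers the number of people, which is the same property the inference example relies on when it sums the predicted density map.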
To test the visualization code, use the pretrained M_SegNet* on UCF_QNRF Link (the pretrained weights of M_SFANet* are also included).
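As a rough idea of what visualizing a predicted density map involves, the sketch below rescales a map to an 8-bit grayscale image. This is not the repository's visualization code; `density_to_uint8` is a hypothetical helper, and the input is assumed to be a 2-D NumPy array (a model output tensor would first need to be converted, for example with `.squeeze().cpu().numpy()`).

```python
import numpy as np


def density_to_uint8(density: np.ndarray) -> np.ndarray:
    """Linearly rescale a density map to the 0..255 range for display."""
    density = np.asarray(density, dtype=np.float64)
    span = density.max() - density.min()
    if span == 0:
        # A constant map carries no contrast; return an all-black image.
        return np.zeros(density.shape, dtype=np.uint8)
    scaled = (density - density.min()) / span
    return (scaled * 255).round().astype(np.uint8)


# A tiny hypothetical density map standing in for a model prediction.
demo = np.array([[0.0, 0.5], [1.0, 2.0]])
gray = density_to_uint8(demo)
print(gray)
```

The resulting array can then be colored with any colormap of your choice, for instance OpenCV's `cv2.applyColorMap(gray, cv2.COLORMAP_JET)`.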
Example code showing how to use the pretrained M-SFANet* on UCF-QNRF to count the number of people in an image. The test image is ./images/img_0071.jpg (from the UCF-QNRF test set).
import cv2
from PIL import Image
import numpy as np
import torch
from torchvision import transforms
from datasets.crowd import Crowd
from models import M_SFANet_UCF_QNRF
# Simple preprocessing.
trans = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# An example image whose ground-truth count is 1236.
img = Image.open("./images/img_0071.jpg").convert('RGB')
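# PIL reports size as (width, height); round both to the nearest multiple of 16.
height, width = img.size[1], img.size[0]
height = round(height / 16) * 16
width = round(width / 16) * 16
# Pass the interpolation flag by keyword: the third positional argument of cv2.resize is dst.
img = cv2.resize(np.array(img), (width, height), interpolation=cv2.INTER_CUBIC)
# Normalize and add a batch dimension.
img = trans(Image.fromarray(img))[None, :]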
model = M_SFANet_UCF_QNRF.Model()
# The weights are stored at the Google Drive link.
# The model was originally trained on a GPU, but it can also be tested on a CPU.
# For ShanghaitechWeights, use torch.load("./ShanghaitechWeights/...")["model"] with M_SFANet.Model() or M_SegNet.Model().
model.load_state_dict(torch.load("./Paper's_weights_UCF_QNRF/best_M-SFANet*_UCF_QNRF.pth",
                                 map_location=torch.device('cpu')))
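# Switch to evaluation mode and disable gradient tracking for inference.
model.eval()
with torch.no_grad():
    density_map = model(img)
# Estimated count reported by the authors: 1168.37 (67.63 below the ground truth of 1236).
print(torch.sum(density_map).item())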
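The example above resizes the image so that both sides are multiples of 16, presumably so that the network's downsampling stages divide the input evenly. The sketch below isolates that step; `snap_to_multiple` and `target_size` are hypothetical helpers that are not part of this repository, and the lower bound of one full multiple is an added safeguard that the README code does not have.

```python
def snap_to_multiple(value: int, base: int = 16) -> int:
    """Round `value` to the nearest multiple of `base`, never below `base`."""
    # Same arithmetic as the example above, plus a floor of one full multiple
    # so that very small images do not collapse to a zero-sized dimension.
    return max(base, round(value / base) * base)


def target_size(width: int, height: int, base: int = 16):
    """Return the (width, height) pair to pass to the resize call."""
    return snap_to_multiple(width, base), snap_to_multiple(height, base)


# A hypothetical 1023 x 681 image.
print(target_size(1023, 681))
```

Note that Python's built-in `round` rounds exact ties to the nearest even integer, so a side of 1000 pixels (62.5 multiples of 16) becomes 992 rather than 1008.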
If you find the code useful for your research, please cite our paper:
@inproceedings{thanasutives2021encoder,
title={Encoder-decoder based convolutional neural networks with multi-scale-aware modules for crowd counting},
author={Thanasutives, Pongpisit and Fukui, Ken-ichi and Numao, Masayuki and Kijsirikul, Boonserm},
booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
pages={2382--2389},
year={2021},
organization={IEEE}
}
Erratum: In Fig. 1 of the paper, "ASSP" should be "ASPP".
You may watch this 6-minute presentation video as a short introduction.