CCCNet: An Attention Based Deep Learning
Framework for Categorized Counting of Crowd in
Different Body States
Sarkar Snigdha Sarathi Das*
Department of CSE
Bangladesh University of
Engineering and Technology
Dhaka, Bangladesh
sarathismg@gmail.com
Syed Md. Mukit Rashid*
Department of CSE
Bangladesh University of
Engineering and Technology
Dhaka, Bangladesh
mukitrashid270596@gmail.com
Mohammed Eunus Ali
Department of CSE
Bangladesh University of
Engineering and Technology
Dhaka, Bangladesh
mohammed.eunus.ali@gmail.com
Abstract—The crowd counting problem, which counts the number of
people in an image, has been extensively studied in recent years.
In this paper, we introduce a new variant of the crowd counting
problem, namely categorized crowd counting, which counts the
number of people sitting and standing in a given image. Categorized
crowd counting has many real-world applications such as
crowd monitoring, customer service, and resource management.
The major challenges in categorized crowd counting come from
high occlusion, perspective distortion and the seemingly identical
upper body posture of sitting and standing persons. Existing
density-map-based approaches approximate large crowds well
but lose important local information necessary for
categorization. On the other hand, traditional detection-based
approaches perform poorly in occluded environments, especially
when the crowd size gets bigger. Hence, to solve the categorized
crowd counting problem, we develop a novel attention-based
deep learning framework that addresses the above limitations.
In particular, our approach works in three phases: i) we first
generate basic detection-based sitting and standing density maps
to capture the local information; ii) we then generate a crowd
counting based density map as a global counting feature; and iii)
finally, a cross-branch segregating refinement phase
splits the crowd density map into final sitting and standing
density maps using an attention mechanism. Extensive experiments
show the efficacy of our approach in solving the categorized
crowd counting problem.
Index Terms—Crowd Counting, Convolutional Neural Networks,
Attention Mechanism, Human Pose Estimation
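To make the final phase described in the abstract more concrete, the following toy sketch shows one way a crowd density map could be split into sitting and standing maps with per-pixel attention weights. It is only an illustration under stated assumptions: the two-way softmax weighting, the function names (softmax2, split_density, count), and all numeric values are hypothetical and are not taken from the CCCNet architecture itself. The sketch relies on the standard property that the count estimate is the sum of a density map.

```python
import math


def softmax2(a, b):
    """Numerically stable softmax over two logits (assumed weighting scheme)."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    total = ea + eb
    return ea / total, eb / total


def split_density(crowd, sit_logits, stand_logits):
    """Split a crowd density map into sitting and standing density maps.

    Each pixel's density is divided by two attention weights that sum to 1,
    so the two output maps always add up to the input crowd map.
    """
    sitting, standing = [], []
    for row_c, row_s, row_t in zip(crowd, sit_logits, stand_logits):
        sit_row, stand_row = [], []
        for c, s, t in zip(row_c, row_s, row_t):
            w_sit, w_stand = softmax2(s, t)
            sit_row.append(c * w_sit)
            stand_row.append(c * w_stand)
        sitting.append(sit_row)
        standing.append(stand_row)
    return sitting, standing


def count(density):
    """The count estimate is the sum (discrete integral) of a density map."""
    return sum(sum(row) for row in density)


# Toy 2x2 example with hypothetical values: the crowd map sums to 2 people.
# In this sketch the logits stand in for local, detection-based evidence.
crowd = [[0.5, 0.5], [1.0, 0.0]]
sit_logits = [[2.0, 2.0], [-2.0, 0.0]]
stand_logits = [[-2.0, -2.0], [2.0, 0.0]]

sitting, standing = split_density(crowd, sit_logits, stand_logits)
print(count(crowd), count(sitting), count(standing))
```

Because the weights at each pixel sum to 1, the sitting and standing counts in this sketch always add up to the total crowd count, which is the kind of consistency a splitting-based refinement is meant to preserve.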
I. INTRODUCTION
The crowd counting problem, which counts the number of
people in a given image, has gained considerable attention
in recent years due to its intense demand in video surveillance,
public safety, and urban planning. Counting a crowd
by automatic scene analysis is a challenging task due to
occlusion, complex backgrounds, non-uniform distributions, and
variations in scale and perspective. A plethora of techniques have
been proposed in recent years (e.g., [1]–[3]) to address these
challenges and to increase the accuracy of crowd counts in
different real-world environments.
* Equal Contribution
Fig. 1: Example Images From Our Dataset
In this paper, we introduce a new variant of crowd counting,
namely categorized crowd counting, that counts the number
of persons sitting and standing separately in a given image.
There are many practical applications of categorized crowd
counting. For example, a bank manager may want to know
the number of customers who are standing and waiting inside the
service area of the bank so that s/he can allocate additional
on-demand resources to serve them better; a bus/tram
operator may want to know the number of standing
and sitting passengers in the bus/tram, which will help them
decide on the frequency and size of transports needed at
different times of the day; a service provider may want to
know the number of standing and sitting customers in a room
to decide on the facilities that they should provide. In general,
categorized crowd counting will add a new dimension
to providing quality services, especially in restaurants, banks,
airport waiting areas, subways, and public transport, where
delivering quality customer service is crucial. To the best of
our knowledge, we are the first to attempt the problem of
categorized crowd counting.
Existing approaches for general crowd counting can be
largely divided into two groups: (i) the most recent density-based
approaches (e.g., [1]–[6]) that generate a density map of the
crowd to approximate a large crowd in outdoor environments,
and (ii) the detection-based approaches that detect visible human
body parts [7], [8] to count the number of persons in a given
(mostly indoor) image. Though density-based counting is
quite promising when counting people in a high-density crowd,
it has the following limitations: (i) For images with a low-
978-1-7281-6926-2/20/$31.00 ©2020 IEEE