TRAFFIC ANALYSIS USING VISUAL OBJECT DETECTION AND TRACKING

Yi Wei 1, Nenghui Song 1, Lipeng Ke 2, Ming-Ching Chang 1, Siwei Lyu 1
1 University at Albany, SUNY  2 University of Chinese Academy of Sciences

ABSTRACT

Smart transportation based on big-data traffic analysis is an important component of a smart city. With millions of ubiquitous street cameras and intelligent analysis algorithms, public transit systems of the next generation can be safer and smarter. We participated in the IEEE Smart World 2017 NVIDIA AI City Challenge, which consists of two contest tracks that serve this spirit. In the Track 1 contest on visual detection, we built a competitive object detector for vehicle localization and classification. In the Track 2 contest, we developed a traffic analysis framework based on vehicle tracking that improves the surveillance and visualization of traffic flow. Both methods demonstrated practical, effective, and competitive performance compared with state-of-the-art methods evaluated on real-world traffic videos in the challenge contest.

Index Terms— Object detection, Multi-object tracking, Traffic analysis, Smart city

1. INTRODUCTION

Cities around the world are equipped with large camera networks for the purposes of security, management, and, in particular, transportation monitoring. By the end of 2020, there will be 1 billion cameras installed ubiquitously throughout cities. While the increasing number of street cameras provides massive data that can make public transit systems safer and smarter, at present these data are far from well exploited. The major bottleneck is the lack of efficient automatic or semi-automatic methods to analyze the vast amount of video with little or no human intervention.
Nowadays, machine learning methods such as deep neural networks have demonstrated great improvements in image recognition [1] and object detection, shedding light on video-based smart traffic analysis and management. To foster the development of efficient algorithms that can improve smart transportation and smart cities, NVIDIA, partnering with IEEE and academia, organized the first AI City Challenge [2] in conjunction with the IEEE Smart World Congress 2017.

The challenge consists of two R&D contest tracks: Track 1 focuses on the detection and classification of street/traffic objects, and Track 2 on the application of video analytics to smart transportation, including the safety, congestion, and management of traffic in urban scenarios.

As a participating team, we submitted proposed methods with contest results to both AIC challenge tracks. For the Track 1 challenge, we combined two state-of-the-art models, namely Faster R-CNN [3] and ResNet [4], to construct a fast and accurate object detector, which is evaluated on the AI City Challenge dataset. For the Track 2 challenge, we combined the developed object detector with hypergraph-based Multi-Object Tracking (MOT) [5] and developed an efficient traffic analysis method that can generate and analyze traffic flow patterns, which we demonstrate on real-world traffic videos.

The paper is organized as follows. In §2, we briefly introduce the datasets used for training our methods. In §3, we discuss our object detection model. In §4, we present the method and results of our traffic analysis based on hypergraph tracking. §5 concludes the paper with a discussion of future work.

2. DATASETS AND CHALLENGE PREPARATION

We describe the datasets (including the AI City Challenge dataset and others) used to build our vehicle detection module according to the challenge protocol.
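Detection quality in a localization task such as Track 1 is commonly judged by the overlap between predicted and ground-truth bounding boxes, measured as intersection-over-union (IoU). The following is a minimal illustrative sketch of this overlap measure, not the challenge's exact evaluation protocol:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted box half-overlapping a ground-truth box of the same size:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 = 1/3
```

A predicted box is typically counted as a correct detection when its IoU with a ground-truth box exceeds a fixed threshold (0.5 is a common choice in detection benchmarks).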
The NVIDIA AI City Challenge (AIC) dataset consists of three subsets of traffic videos taken from three different locations: (1) a Silicon Valley intersection, (2) a Virginia Beach intersection, and (3) Lincoln, Nebraska, with different video resolutions. The videos are recorded under diverse environmental and lighting conditions, ranging from day to night. About 150,000 key frames extracted from 80 hours of video are manually annotated with bounding boxes around the objects of interest and corresponding labels. The labels for the datasets are: Car, SUV, SmallTruck, MediumTruck, LargeTruck, Pedestrian, Bus, Van, Group of People, Bicycle, Motorcycle, TrafficSignal-Green, TrafficSignal-Red, TrafficSignal-Yellow. For the object detection task, the whole dataset is divided into three subsets according to video resolution. The AIC480 dataset contains videos with a resolution of 720×480 pixels. The AIC1080 dataset contains videos