SEGMENT-WISE ONLINE LEARNING BASED ON GREEDY ALGORITHM FOR REAL-TIME MULTI-TARGET TRACKING

Changhoon Lee and Chang D. Yoo

Korea Advanced Institute of Science and Technology
Department of Electrical Engineering
373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Korea

ABSTRACT

This paper proposes a tracklet-based algorithm for online multiple-target tracking. The algorithm performs tracking in three steps: (1) tracklet initialization, (2) tracklet refinement, and (3) tracklet association. Given detection responses, tracklets are initialized by using a greedy algorithm to find a near-optimum path in a min-cost flow network. Using an appearance model, the tracklets are then refined so that the detection responses within each tracklet become more homogeneous. Finally, the tracklets are linked using a novel affinity measure, and the final tracks are generated by optimizing a min-cost flow network over these links. To achieve real-time multi-target tracking, every step is processed segment-wise. Evaluated strictly online on popular public datasets, the proposed multi-target tracking algorithm performs comparably to many state-of-the-art algorithms.

Index Terms: Multi-target, tracking, online, tracklet, greedy algorithm

1. INTRODUCTION

Multiple-target tracking has received a great deal of interest for applications in surveillance, traffic control, activity recognition, and sports video analysis. Occlusion and variations in appearance and illumination make it a difficult problem. Various algorithms have been proposed; notable state-of-the-art examples are as follows. Wang et al. [1] use appearance-model-based metric learning to determine the affinity between tracklets, yielding an occlusion-robust tracking algorithm. Pauwels et al. [2] use a mixture model of dense motion and stereo cues with feedback to generate robust tracklets.
These algorithms achieve their state-of-the-art performance in a manner that is not strictly online. This is a serious limitation that hinders their use in many applications.

This work was supported by the ICT R&D program of MSIP/IITP. [B0101-15-0307, Basic Software Research in Human-level Lifelong Machine Learning (Machine Learning Center)]

This paper proposes an online tracklet-based multi-target tracking algorithm. The contributions of this paper are as follows. (1) A simple greedy algorithm is constructed to find a near-optimum path in the min-cost flow network, and this path is used to initialize the tracklets. The entire tracking process runs in real time and provides highly accurate tracks. (2) An affinity score between the tail of a tracklet and the head of the subsequent tracklet is estimated by computing the 1-norm of the outer product of the detection responses in the two tracklet regions. The score is approximated using probes at the two regions, which allows tracklets to be associated accurately at low complexity.

The rest of the paper is organized as follows. Section 2 discusses two maximum-a-posteriori (MAP) data association problems, tracklet initialization and tracklet association, in the context of cost-flow networks. Section 3 describes an online greedy learning algorithm. Section 4 discusses experimental results, and Section 5 concludes the paper.

2. COST-FLOW NETWORK FOR TRACKLET INITIALIZATION AND ASSOCIATION

In a multi-target environment with frequent occlusion, a cost-flow network provides a systematic and efficient framework for inferring tracks that are relatively robust to occlusion. Here, two separate cost-flow networks are constructed, one for tracklet initialization and one for tracklet association: the nodes represent detection responses in the first network and tracklets in the second. The details of the two networks are discussed below.

2.1. Cost-flow Network with Detection Responses

To initialize the tracklets, a network with object observation nodes analogous to that considered in [3] is constructed. Let $x = (p, s, t) \in \mathcal{X}$ be the space-time location of an object, where $p$, $s$, and $t$ are the position, scale, and frame index, respectively, and assume that a human detector finds $N$ potential locations $\{x_1, x_2, \ldots, x_N\}$, where $x_i \in \mathcal{X}$ for $i \in \mathbb{Z}_N$.¹ Let $y_i$

¹For a positive integer $n$, let $\mathbb{Z}_n$ be the set $\{1, 2, \ldots, n\}$.