SEGMENT-WISE ONLINE LEARNING BASED ON GREEDY ALGORITHM FOR REAL-TIME MULTI-TARGET TRACKING

Changhoon Lee and Chang D. Yoo

Korea Advanced Institute of Science and Technology
Department of Electrical Engineering
373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Korea

ABSTRACT

This paper proposes a tracklet-based algorithm for online multiple-target tracking. The algorithm performs tracking in three steps: (1) tracklet initialization, (2) tracklet refinement, and (3) tracklet association. Given detection responses, tracklets are initialized by finding a near-optimum path in the min-cost flow network using a greedy algorithm. Using an appearance-based model, the tracklets are then refined so that the detection responses within each tracklet become more homogeneous. Finally, the tracklets are linked based on a novel affinity measure, and the final tracks are generated by optimizing a min-cost flow network over these links. For real-time multi-target tracking, every step is processed in a segment-wise manner. On popular public datasets, and operating strictly online, the proposed multi-target tracking algorithm performs comparably to many state-of-the-art algorithms.

Index Terms: Multi-target, tracking, online, tracklet, greedy algorithm

1. INTRODUCTION

Multiple-target tracking has received a great deal of interest for applications in surveillance, traffic control, activity recognition and sports video analysis. Occlusion and variations in appearance and illumination make it a difficult problem. Various algorithms have been proposed, and some notable state-of-the-art algorithms are as follows. Wang et al. [1] use appearance-model-based metric learning to determine the affinity between tracklets, yielding an occlusion-robust tracking algorithm. Pauwels et al. [2] use a mixture model of dense motion and stereo cues with feedback to generate robust tracklets.
These algorithms achieve their state-of-the-art performance in a manner that is not strictly online. This is a serious limitation that hinders their use in many applications.

This work was supported by ICT R&D program of MSIP/IITP. [B0101-15-0307, Basic Software Research in Human-level Lifelong Machine Learning (Machine Learning Center)]

This paper proposes an online tracklet-based multi-target tracking algorithm. The contributions of this paper are as follows. (1) A simple greedy algorithm is constructed to find a near-optimum path in the min-cost flow network, and this path is used to initialize the tracklets. The entire tracking process is performed in real time and provides highly accurate tracks. (2) An affinity score between the tail of a tracklet and the head of the subsequent tracklet is estimated by computing the 1-norm of the outer product of the detection responses of the two tracklet regions. The score is approximated using probes at the two regions, which allows accurate association between tracklets at low complexity.

The rest of the paper is organized as follows. Section 2 discusses two maximum-a-posteriori (MAP) data association problems in the context of a cost-flow network, one for tracklet initialization and one for tracklet association. Section 3 describes an online greedy learning algorithm. Section 4 discusses experimental results, and Section 5 concludes the paper.

2. COST-FLOW NETWORK FOR TRACKLET INITIALIZATION AND ASSOCIATION

In a multi-target environment with frequent occlusion, the cost-flow network provides a systematic and efficient framework for inferring tracks that are relatively robust to occlusion. Here, two separate cost-flow networks are constructed for tracklet initialization and association: the nodes represent detection responses in one network and tracklets in the other. The details of the two networks are discussed below.

2.1. Cost-flow Network with Detection Responses

To initialize the tracklets, a network with object observation nodes analogous to that considered in [3] is constructed. Let $x = (p, s, t) \in X$ be the space-time location of an object, where $p$, $s$ and $t$ are respectively the position, scale and frame index, and assume a human detector finds $N$ potential locations $\{x_1, x_2, \ldots, x_N\}$, where $x_i \in X$ for $i \in Z_N$ (for a positive integer $n$, $Z_n$ denotes the set $\{1, 2, \ldots, n\}$). Let $y_i$