Lane Discovery in Traffic Video

Nicholas Miller, David M. Swart, Akshaya Mishra, Andrew Achkar
Miovision, Kitchener, ON, Canada

Abstract

Video sensing has become very important in Intelligent Transportation Systems (ITS) due to its relatively low cost and non-invasive deployment. An effective ITS requires detailed traffic information, including vehicle volume counts for each lane in surveillance video of a highway or an intersection. The multiple-target vehicle-tracking and counting problem is most reliably solved in a reduced space defined by the constraints of vehicles driving within lanes. This requires lanes to be pre-specified. An off-line pre-processing method is presented which automatically discovers traffic lanes from vehicle motion in uncalibrated video from a stationary camera. A moving-vehicle density map is constructed, then multiple lane curves are fitted. Traffic lanes are found without relying on possibly noisy tracked vehicle trajectories.

1 Introduction

In computer vision, the heaviest focus on traffic lane detection is in the autonomous driving application. Only the driver's own lane boundaries and adjacent lanes need to be detected, and lanes must be localized in real-world space from video image space. Hence the camera for this set-up is typically a calibrated, front-facing camera, relatively close to the road. The standard approaches often rely on line fitting to edges from road markings which are clearly visible to the driver from this vantage point [1, 2]. Ego motion in the image is present, and initial lane detections are tracked and updated.

Another area where lane detection is required is vehicle counting from a statically mounted traffic surveillance camera. Often, in the computer vision literature, traffic lanes are specified manually as a region of interest [4, 5, 6, 7, 8, 9].
This may be sufficient in some situations where only total volumes are required, but many ITS applications require a precise breakdown of counts for each lane. Yet manual lane annotation can prove to be error prone, especially when lane markings are unclear and only a small segment of video or a still frame is presented to the user, which does not fully demonstrate vehicle paths for every possible lane.

For the vehicle counting application, it is sufficient to fit the traffic lanes in image space only. This avoids the need for calibration and extrinsic camera parameter estimation. Furthermore, if lane detection can be performed off-line as a pre-processing step, lanes may be discovered and traced from the motion of detected vehicles. This results in more reliable lane placement than relying on possibly noisy, degraded, or invisible edge markings. Lane detection from moving vehicles was performed in [3] by tracking trajectories and simple clustering of fitted lines. This assumes that lanes are straight on a planar road and does not address noise due to vehicle tracking errors. Most other automatic lane detection work assumes straight planar lanes and employs manual annotation, camera calibration, or homographies [9].

2 Methodology

The objective of this article is to use the positions of vehicles as they move through the image in order to discover all of the traffic lanes. The general approach is to tightly localize vehicles in every frame and register their positions over time using a density image. A random search is used to fit multiple curve models to the density image and thereby identify each separate lane as an individual curve. See Fig. 1.

Since traffic cameras are mounted upright, vehicles are vertically aligned, with their tops appearing above and their bottoms below in the image. There may be a variety of vehicles of different heights, so the locations of vehicle tops and centroids may vary in the image.
Nonetheless, the defining characteristic of all vehicles traveling inside a particular lane is the curve traced out by the bottom points of their apparent boundaries as they move through the image; see Fig. 2.

Fig. 1: Example traffic video. Three lanes (blue) are discovered from moving vehicles. Vehicle density is also overlaid (red).

Fig. 2: A demonstration of the appearance of various sized vehicles (car, van, transport truck) in the image. The fronts of several vehicles are shown, all from the same lane. The vehicle tops have various image projections, but the bottoms all converge near one image point. This may be used to define the location of the lane in the image.

A greedy random search is employed to fit multiple quadratic spline models to the density image. It is similar to sequentially applying Random Sample Consensus (RANSAC) to fit multiple models by removing fitted samples, with a few key differences to address the well-known pitfalls of the greedy sequential RANSAC strategy [11, 12].

2.1 Vehicle Density Image

Vehicle trajectories provide the locations of the lanes in which they travel. Unfortunately, solving the multiple-target tracking and data association problem reliably enough to discover lanes from vehicle trajectories proves difficult in practice. Furthermore, especially for vehicle counting, reliable identification and tracking are best done in a reduced space where the traffic lanes are already identified [6, 7, 10].

In order to discover lanes from vehicle motion without explicitly tracking them, each vehicle is independently localized in every frame of video. Vehicles are detected using a deep-learning vehicle contour detector which is fused with background subtraction [8]. This provides a precise closed boundary contour around each moving vehicle, from which the bottom point can be located in each frame. These points are added as samples to a vehicle motion density image.
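As a rough illustration of the pipeline described above (not the authors' implementation), the density accumulation and a greedy, RANSAC-style search for quadratic lane curves can be sketched as follows. The Gaussian smoothing width, the lateral band used for scoring and suppression, the trial count, and the curve parameterization x = ay² + by + c are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def build_density(points, shape, sigma=2.0):
    """Accumulate vehicle-bottom points (x, y) into a density image of the
    given (height, width), then blur with an assumed Gaussian width so
    nearby detections reinforce one another."""
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:
            density[yi, xi] += 1.0
    # Separable Gaussian blur via 1-D convolutions along each axis.
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    density = np.apply_along_axis(np.convolve, 0, density, k, mode="same")
    density = np.apply_along_axis(np.convolve, 1, density, k, mode="same")
    return density

def fit_lanes(density, n_lanes, trials=500, band=4):
    """Greedy sequential search: fit one quadratic x = a*y^2 + b*y + c at a
    time by random density-weighted sampling, score candidates by the density
    mass inside a lateral band, then suppress the winner's support before
    searching for the next lane."""
    density = density.copy()
    h, w = density.shape
    rows = np.arange(h)
    ys, xs = np.nonzero(density)
    offs = np.arange(-band, band + 1)
    lanes = []
    for _ in range(n_lanes):
        weights = density[ys, xs]
        p = weights / weights.sum()
        best_score, best_coef = -np.inf, None
        for _ in range(trials):
            # Three density-weighted samples determine one quadratic in y.
            idx = rng.choice(len(xs), size=3, replace=False, p=p)
            if len(set(ys[idx])) < 3:  # need distinct rows for a unique fit
                continue
            coef = np.polyfit(ys[idx].astype(float), xs[idx].astype(float), 2)
            cols = np.clip(np.round(np.polyval(coef, rows)).astype(int), 0, w - 1)
            grid = np.clip(cols[:, None] + offs[None, :], 0, w - 1)
            score = density[rows[:, None], grid].sum()
            if score > best_score:
                best_score, best_coef = score, coef
        lanes.append(best_coef)
        # Zero the fitted lane's band so the next search finds a new lane.
        cols = np.clip(np.round(np.polyval(best_coef, rows)).astype(int), 0, w - 1)
        grid = np.clip(cols[:, None] + offs[None, :], 0, w - 1)
        density[rows[:, None], grid] = 0.0
    return lanes
```

A single quadratic per lane stands in here for the quadratic splines of the paper, and the fixed suppression band stands in for the paper's unspecified safeguards against greedy sequential RANSAC's pitfalls; both are simplifications.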
The resulting density image has