Clustering of test scenes by use of Wasserstein metric and analysis of tracking quality in optically degraded videos

Pawel Kowalczyk, Paulina Bugiel, Marcin Szelest§ and Jacek Izydorczyk*
pawel.2.kowalczyk@aptiv.com, paulina.bugiel@aptiv.com, marcin.szelest@aptiv.com, jacek.izydorczyk@ieee.org
* Silesian University of Technology, Gliwice, Poland
§ Aptiv Technical Center, Kraków, Poland

Abstract—The goal of this paper is to present the results of an analysis of the impact of artificially generated disturbances imitating real defects that may occur in the process of testing autonomous vehicles, both during rides and later, in vehicle software simulation. We focus on one perception module, responsible for the detection of other moving vehicles on the road. The injected faults are Gaussian blur, obliteration by particles, and gray-level resolution degradation. At the same time, we propose an examination scheme that provides detailed information about the distribution of quality in this comparative experiment and that can be reused for different perception modules, faults, and scene sets, as well as to compare new releases of the main recognition software. To this end, we combine statistical methods (Welch's ANOVA) and topological analysis (clustering over a space of distributions, Wasserstein metric). The results include a summary of the experiment for all the data used and, described with the mentioned tools, examples of specific cases that illustrate general conclusions and less standard outcomes.

I. INTRODUCTION

The main motivation for conducting this experiment is to study the influence of artificially generated visual disturbances, injected into the video stream, on the detection quality of an object recognition algorithm. Algorithms of this type form automotive perception modules and are used to create a virtual description of the environment, on the basis of which the control protocols in the car operate. Prior work on this topic already exists [5], [13], [16], [18].
The value of such analysis is discussed, inter alia, in [3], [4], [8], [19], and lies mostly in the possibility of reusing already collected and labelled test data in order to broaden knowledge about the behavior of vision algorithms under conditions that did not occur in the original recordings. In this work we focus on the module responsible for the description of other moving vehicles. The analysis includes three types of disturbances that imitate real defects that can occur during tests. The comparison is conducted on performance measures of the software results for the original videos and for videos with differing levels of injected distortion. The data were divided into groups according to the conditions on the road and the shape of the detection quality distribution in the original videos (not affected by disturbances). A statistical analysis based on tests comparing the main parameters of the distributions under increasingly degraded conditions is applied to the experimental setup prepared in this way. In addition, the results of these tests are supplemented by a comparison of quality distributions using a dedicated metric.

II. DATA

All the research presented in this paper was based on car test drive video recordings, 1280x971 pixels in size, in gray scale, with 12-bit depth. All the objects of interest appearing in the videos were annotated in a dedicated laboratory to produce ground truth for the quality measures. We also subjected the videos to automatic object detection in a system that emulates the real in-car detector. We repeated this process, introducing different but controlled disturbances each time. The obtained results served as the data set for the presented methods. We analyzed three perturbation types with varying intensity.

A. Fault Injection

1) Gaussian blur: By introducing Gaussian blur, we tried to mimic certain weather conditions, e.g. rain, ice or fog, as well as the camera being out of focus for some reason.
The blur was performed by convolving the image pixels with a rectangular Gaussian kernel [12]. Within a single test case, the size and parameters of the kernel were the same for all available videos; between consecutive test cases, we changed only the size of the kernel. We chose 11 kernel sizes in the range 3-65, differing by roughly 6 pixels. Examples can be seen in Fig. 1 and Fig. 2.

2) Particle injection: By a particle we mean e.g. dust or dirt obstructing the camera, as well as lens and camera sensor faults. In our experiments, the particles were black circular spots scattered over the image but fixed for all frames of a video and identical in all the videos. This approach helped us minimize the risk that some regions of the field of view might be of different significance. In each experiment we changed only the radius of the circles, which gave us control over the size of the blocked area. We decided to put 15 particles onto the images and to change their radius in 10-pixel steps from 0 to 100, which gave 11 blockage levels. Table I shows how the radius of the 15 particles placed in the image affects the percentage of blocked pixels in every frame. Examples are shown in Fig. 3 and Fig. 4.
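The blur injection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not specify the kernel's sigma or the exact list of 11 sizes, so the `size/6` sigma rule and the demo size of 11 below are our assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size, sigma=None):
    """Square, normalized Gaussian kernel of odd `size`.
    The sigma = size/6 default is an assumption (not stated in the paper)."""
    if sigma is None:
        sigma = size / 6.0
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def inject_blur(frame, kernel_size):
    """Blur a 2-D gray-scale frame by convolving it with the Gaussian kernel."""
    return convolve(frame.astype(np.float64),
                    gaussian_kernel(kernel_size), mode='nearest')

# Demo on a small synthetic 12-bit frame (the real frames are 1280x971).
frame = np.zeros((64, 64))
frame[32, 32] = 4095.0            # single bright impulse
blurred = inject_blur(frame, 11)  # one of the tested kernel sizes
```

Because the kernel is normalized, an interior impulse keeps its total energy while its peak is spread over the kernel footprint, which is exactly the loss of local contrast the fault is meant to simulate.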
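The particle injection can be sketched in the same spirit. The paper fixes the spot layout across all frames and videos but does not publish the positions; the seeded random layout below is therefore only an assumption that reproduces the "fixed for every video" property, while frame size (1280x971), particle count (15) and the radius grid 0-100 in steps of 10 are taken from the text.

```python
import numpy as np

H, W = 971, 1280   # frame size from the paper (rows x columns)
N_PARTICLES = 15

# Fixed seed -> the same spot layout for every frame and video (positions
# themselves are our assumption; the paper does not list them).
_rng = np.random.default_rng(42)
CENTERS = np.column_stack([_rng.integers(0, H, N_PARTICLES),
                           _rng.integers(0, W, N_PARTICLES)])

def particle_mask(radius, shape=(H, W), centers=CENTERS):
    """Boolean mask of pixels covered by circular spots of the given radius."""
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for cy, cx in centers:
        mask |= (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return mask

def inject_particles(frame, radius):
    """Black out the spot pixels; radius in {0, 10, ..., 100} gives 11 levels."""
    out = frame.copy()
    out[particle_mask(radius, frame.shape)] = 0
    return out

def blocked_percentage(radius):
    """Share of blocked pixels per frame, as reported in Table I."""
    return 100.0 * particle_mask(radius).mean()
```

Computing `blocked_percentage` over the radius grid reproduces the kind of radius-to-blockage mapping the paper tabulates; note that spots may overlap or be clipped at the border, so the blocked area grows slightly slower than 15*pi*r^2.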