Accid. Anal. & Prev. Vol. 18, No. 1, pp. 1-12, 1986
0001-4575/86 $3.00 + .00
Printed in Great Britain. © 1986 Pergamon Press Ltd.

ON THE ESTIMATION OF THE EXPECTED NUMBER OF ACCIDENTS

EZRA HAUER
Transport Safety Studies Group, Department of Civil Engineering, University of Toronto, Toronto, Ont., M5S 1A4, Canada

(Received 18 February 1985)

Abstract-We show that similar entities (drivers, intersections, bus companies, rail crossings) which in one period recorded “x” accidents do not have, on the average, “x” accidents in the subsequent period. The difference is large and systematic. This leads us to conclude that in circumstances in which only the safety estimates in these two periods matter, use of “x” to estimate the expected number of accidents has definite shortcomings. Better estimators are suggested, explored and their use is illustrated. We note that the suggested estimators are similar to what is used when estimation is based on a “treatment-control” type experimental design. It is hoped that the suggested estimators will alleviate some practical problems in the structuring of controlled experiments in safety research, eliminate bias-by-selection from uncontrolled studies and in general enhance the accuracy of safety estimates.

INTRODUCTION

Safety, or rather its absence, is the property of specific entities (city A, driver B, intersection C, bus company D) during a certain period of time. Because the actual count of accidents (to be denoted by x) is subject to random variation, we define the safety of an entity to be the expected number of accidents m. It follows, that to measure the safety of some entity means to obtain an estimate of its m. It is common practice to use x (the actual count of accidents) to serve as an estimate of m (the expected number of accidents).
In the first part of this paper we will show that in many cases of practical interest, perhaps in most, x is not a very good estimate of m. With some added effort one can obtain a better estimate. How to do so is the subject of the second part of this paper. Section 3 is devoted to a recapitulation of the principal conclusions and their discussion.

1. THE COUNT OF ACCIDENTS IS NOT A GOOD ESTIMATE OF SAFETY

Consider the entries in Table 1. The table is based on the count of accidents occurring during the years 1974 and 1975 at 1142 intersections in San Francisco. All intersections in this population had stop signs on the two approaches carrying the lesser flows. Column 1 gives the number of intersections [n(x)] at which the count of accidents in 1974 was x = 0, 1, 2, ... as shown in column 2. Column 3 gives the average of the count of accidents [x̄] for the same n(x) intersections during 1975.

The use of x as an estimate of m means that if an intersection registered, say, x = 3 accidents in 1974 and if it has remained largely unchanged for 1975, we think and assert that 3 is a sensible estimate of the accident count for 1975. This belief and assertion applies to each of the 65 intersections in the group. Therefore, while for any one intersection in this group the 1975 count of accidents may differ from 3, the average count for all 65 intersections should be quite close to 3. However, inspection of Table 1 reveals that the intersections which registered 3 accidents in 1974 had, on the average, 1.97 accidents in 1975. Similar discrepancies between the entries of columns 2 and 3 exist for all values of x (except for x = 1, which will turn out to be the rule-confirming exception).

These discrepancies cannot reasonably be attributed to chance. Nor are they likely to reflect a sudden, large and peculiarly systematic change between these two years. (The total number of accidents at these intersections was 1253 in 1974 and 1216 in 1975.)
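The pattern described for Table 1 can be explored with a small simulation. The sketch below is not taken from the paper. It assumes a hypothetical population in which each entity's expected count m is drawn from a gamma distribution, the count in each period is Poisson with mean m, and m does not change between the two periods. The function names, the gamma parameters (shape 1.0, scale 1.1) and the seed are illustrative assumptions, not values fitted to the San Francisco data.

```python
import math
import random
from collections import defaultdict


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def next_period_means(n_entities, shape, scale, seed):
    """Group simulated entities by first-period count x.

    Returns {x: (number of entities with that x,
                 their average count in the second period)}.
    Assumption: m ~ Gamma(shape, scale), and both periods share the same m.
    """
    rng = random.Random(seed)
    totals = defaultdict(lambda: [0, 0])  # x -> [entities, sum of 2nd counts]
    for _ in range(n_entities):
        m = rng.gammavariate(shape, scale)  # hypothetical expected count
        x_first = poisson_draw(rng, m)      # count in the first period
        x_second = poisson_draw(rng, m)     # count in the second period
        totals[x_first][0] += 1
        totals[x_first][1] += x_second
    return {x: (n, s / n) for x, (n, s) in sorted(totals.items())}


if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration only.
    table = next_period_means(20000, shape=1.0, scale=1.1, seed=1)
    print("x  n(x)  average count in next period")
    for x, (n, mean_next) in table.items():
        print(f"{x:2d} {n:6d} {mean_next:6.2f}")
```

Under these assumptions, entities with a first-period count of zero average more than zero in the second period, and entities with a high first-period count average less than that count, even though no entity's m has changed.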
We observe, therefore, that for these San Francisco intersections the 1974 count of accidents is not a good indication of the average count in 1975 for any value of x (except for x = 1). One is led to conclude