Replacement sort revisited: The "gold standard" unearthed!

Soubhik Chakraborty a,*, Suman Kumar Sourabh a, Mausumi Bose b, Kumar Sushant c

a University Department of Statistics and Computer Applications, T.M. Bhagalpur University, Bhagalpur 812 007, India
b Applied Statistics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700 108, India
c Department of Applied Physics, B.I.T. Mesra, Ranchi 835 215, India

Abstract

The present paper shows that, for certain algorithms such as sorting, the parameters of the input distribution must be taken into account in addition to the input size for a more precise evaluation of the computational and time complexity (average case only) of the algorithm in question (the so-called "gold standard"). Some concrete results are presented to warrant a new and improved model for replacement sort (also called selection sort):

$$T_{avg}(n, p_1, p_2, \ldots, p_k) = a_0 + b_0 \, \frac{n(n-1)}{2} + c_0 \, i(n, p_1, p_2, \ldots, p_k) + \epsilon,$$

where the left-hand side gives the average-case time complexity, $n$ is the input size, the $p_i$ are the parameters of the input distribution characterizing the sorting elements, and $i$ is the average number of interchanges, which is a function of both the input size and the parameters. The remaining terms arise from the linear regression and have their usual meanings. The error term $\epsilon$ arises because the model fixes only the input size $n$, while the specific input elements and their relative positions in the array vary for a particular distribution [H. Mahmoud, Sorting: A Distribution Theory, John Wiley and Sons, 2000]. The term $n(n-1)/2$ represents the number of comparisons. We claim this to be an improvement over the conventional model, namely

$$T_{avg}(n) = a + bn + cn^2 + \epsilon,$$

which stems from the $O(n^2)$ complexity of this algorithm.
We argue that the new model can be a guiding factor in distinguishing this algorithm from other sorting algorithms of similar order of average complexity, such as bubble sort and insertion sort. Note carefully that the dependence of the number of interchanges on the parameters is more prominent for discrete distributions than for continuous ones; we suspect this is because the probability of a tie is zero in the continuous case. For discrete cases, however, the presence of ties and their relative positions in the array are crucial, and this is precisely where the parameters of the input distribution come into play. Algorithms in which ties have a greater influence on some of the computations will be more strongly influenced by the parameters of the input distribution. Another strength of the paper is that it brings out the close connection between algorithmic complexity and computer experiments, a crucial issue that is overlooked in textbooks on algorithms. This is a paper on modeling rather than speed.

© 2006 Elsevier Inc. All rights reserved.

0096-3003/$ - see front matter
doi:10.1016/j.amc.2006.11.093
* Corresponding author. E-mail addresses: soubhikc@yahoo.co.in (S. Chakraborty), sourabh.suman@rediffmail.com (S.K. Sourabh), mausumi@isical.ac.in (M. Bose), kumarsushant@yahoo.com (K. Sushant).
Applied Mathematics and Computation 189 (2007) 384–394
www.elsevier.com/locate/amc
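The two quantities in the proposed model, the fixed comparison count n(n-1)/2 and the input-dependent interchange count, can be illustrated with a small experiment. The following is a minimal sketch, not the authors' code. It assumes that "replacement sort" is the exchange variant, in which a[i] is compared with every later element and swapped immediately whenever it is larger. The function names, the choice of a binomial input distribution, and all parameter values are hypothetical and serve only as an illustration.

```python
import random


def selection_sort_counts(values):
    """Sort a copy of `values`; return (sorted_list, comparisons, interchanges).

    Assumed variant: compare a[i] with each later a[j] and swap at once
    when a[j] is strictly smaller, so tied elements are never interchanged.
    """
    a = list(values)
    n = len(a)
    comparisons = 0
    interchanges = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1          # always executed: n(n-1)/2 in total
            if a[j] < a[i]:
                a[i], a[j] = a[j], a[i]
                interchanges += 1     # depends on the input values and ties
    return a, comparisons, interchanges


def binomial_sample(m, p, rng):
    """Draw one Binomial(m, p) value as a sum of m Bernoulli(p) trials."""
    return sum(1 for _ in range(m) if rng.random() < p)


def mean_interchanges(n, m, p, trials, seed=0):
    """Average interchange count over `trials` arrays of n Binomial(m, p) values."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        data = [binomial_sample(m, p, rng) for _ in range(n)]
        _, comparisons, interchanges = selection_sort_counts(data)
        # The comparison count does not depend on the data.
        assert comparisons == n * (n - 1) // 2
        total += interchanges
    return total / trials


if __name__ == "__main__":
    # Hypothetical settings: fixed n and m, varying the parameter p.
    for p in (0.1, 0.3, 0.5):
        print(p, mean_interchanges(n=50, m=5, p=p, trials=200))
```

In this sketch the comparison count is the same for every input of size n, while the interchange count varies with the distribution parameter p through the number and placement of ties. In the degenerate case p = 0 every element is tied and no interchange occurs at all.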