Weighted Bootstrap with Probability in Regression

M. R. NORAZAN 1, M. HABSHAH 2, AND A. H. M. R. IMON 3
1 Faculty of Computer and Mathematical Sciences, University Technology MARA, 40450 Shah Alam, Selangor, MALAYSIA
2 Laboratory of Applied and Computational Statistics, Institute for Mathematical Research, University Putra Malaysia, 43400 Serdang, Selangor, MALAYSIA
3 Department of Mathematical Sciences, Ball State University, Muncie, IN 47306, U.S.A.
Email: 1 norazan@tmsk.uitm.edu.my, 2 habshahmidi@gmail.com, 3 imon_ru@yahoo.com

Abstract: - In statistical inference we are often concerned with obtaining the standard errors of parameter estimates and their confidence intervals. The t and z statistics used to construct confidence intervals require the observations to come from a normal distribution. In many practical situations, however, normality cannot be guaranteed, especially when outliers are present in the data. As an alternative, one may consider bootstrap techniques, which do not rely on the normality assumption. It is now evident that outliers in the original sample can cause serious problems for classical bootstrap estimates, although statistics practitioners are, unfortunately, largely unaware of this fact. Even if the original sample contains a single outlier, the bootstrap samples may contain many outliers, and the classical bootstrap technique may consequently produce poor results. Here we present a new weighted bootstrap with probability (WBP) method in regression, in which the probability of selecting an observation decreases with its outlyingness. Outliers therefore have smaller probabilities of being selected, which keeps their effect on the entire bootstrap procedure very small. Numerical examples and simulation studies show that the newly developed WBP method performs better than the existing bootstrap methods.
Key-Words: - Outliers, Probability Bootstrap, Weighting Psi Function

1 Introduction

The bootstrap method is a procedure that can be used to obtain inferential quantities such as confidence intervals for regression coefficient estimates. It was proposed by Efron, with the basic idea of generating a large number of sub-samples by randomly drawing observations with replacement from the original dataset [3]. These sub-samples are termed bootstrap samples and are used to recalculate the estimates of the regression coefficients. The bootstrap method has attracted statistics practitioners because its use does not rely on the normality assumption. An interesting property of the method is that it can provide the standard error of any complicated estimator without requiring theoretical calculations.

It is now evident that the presence of outliers has an undue effect on the bootstrap estimates. Outliers are observations that are markedly different from the bulk of the data or from the pattern set by the majority of the observations. In a regression problem, observations corresponding to excessively large residuals are treated as outliers. Because bootstrap re-sampling is done with replacement, an outlier can be drawn more than once, so a bootstrap sample may contain more outliers than the original sample [8]. As a consequence, the variance estimates and the confidence intervals are affected, which can lead to a breakdown of the bootstrap distribution. We may use a robust estimator to deal with possible outliers, but this may not be enough, since robust estimation is expected to perform well only up to a certain percentage of outliers.

In this paper, we propose a modification of the bootstrap procedure proposed by Imon and Ali [8]. The main idea is to form each bootstrap sample by re-sampling with probabilities, so that the more outlying observations have smaller probabilities of selection.
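To make the contrast between equal-probability re-sampling and re-sampling with probabilities concrete, the following is a minimal Python sketch, not the WBP method itself. The selection weights used here are an illustrative Huber-type choice based on residuals from an initial least-squares fit; they are an assumption made for this example, and the function names (`ols_fit`, `selection_probs`, `bootstrap_coefs`), the tuning constant `c = 1.345`, and the simulated data are all hypothetical.

```python
import numpy as np


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def selection_probs(x, y, c=1.345):
    """Illustrative selection probabilities (an assumption, not the paper's weights).

    Observations with large standardized residuals get Huber-type weights
    min(1, c / |u|), which are then normalized to sum to one.
    """
    b0, b1 = ols_fit(x, y)
    r = y - (b0 + b1 * x)
    r = r - np.median(r)                                   # center the residuals
    scale = max(1.4826 * np.median(np.abs(r)), 1e-12)      # MAD-based robust scale
    u = np.maximum(np.abs(r) / scale, 1e-12)
    w = np.minimum(1.0, c / u)
    return w / w.sum()


def bootstrap_coefs(x, y, n_boot=500, probs=None, seed=0):
    """Re-sample (x, y) pairs with replacement and refit the line each time.

    probs=None draws every observation with probability 1/n (classical
    bootstrap); otherwise probs[i] is the selection probability of case i.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    out = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.choice(n, size=n, replace=True, p=probs)
        out[b] = ols_fit(x[idx], y[idx])
    return out


# Simulated data with one planted outlier (hypothetical example data).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=30)
y[15] += 50.0

p = selection_probs(x, y)
classical = bootstrap_coefs(x, y)             # equal probabilities
weighted = bootstrap_coefs(x, y, probs=p)     # outlier is rarely selected
print("classical SE (intercept, slope):", classical.std(axis=0, ddof=1))
print("weighted  SE (intercept, slope):", weighted.std(axis=0, ddof=1))
```

In the classical run the planted outlier enters some bootstrap samples several times and others not at all, which inflates the spread of the re-estimated intercept. In the weighted run its selection probability is far below 1/n, so it appears in few bootstrap samples.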
We organize this paper as follows: in Section 2 we discuss and summarize several existing bootstrap procedures; in Section 3 we present the newly proposed bootstrap method and examine its performance; and in Section 4 we draw some conclusions.

Proceedings of the 8th WSEAS International Conference on Applied Computer and Applied Computational Science ISSN: 1790-5117 135 ISBN: 978-960-474-075-8