LOGARITHMIC TRANSFORMATION: A TOOL FOR NORMALIZING RESIDUALS IN ANOVA MODELS

ABBAS ULLAH JAN¹, DAWOOD JAN¹, YOUSAF HAYAT², AISHA SADIQA³, GHAFFAR ALI¹ and MUHAMMAD FAYAZ¹

¹ Department of Agricultural and Applied Economics, The University of Agriculture, Peshawar - Pakistan.
² Department of Mathematics, Statistics and Computer Science, The University of Agriculture, Peshawar - Pakistan.
³ Department of Economics, Hazara University - Pakistan.
*Corresponding author: abbasjan@aup.edu.pk

ABSTRACT
This study investigates the statistical significance of differences in the cost of production of potato, tomato and cauliflower in district Mansehra, Pakistan, using linear and semi-logarithmic ANOVA models. The linear model suggests that the average cost of production of potato is Rs. 23460 per acre, while that of tomato is Rs. 3284.8 more and that of cauliflower Rs. 3188 less than that of potato. These differences are statistically significant. The semi-logarithmic model indicates that the median cost of production per acre of potato, tomato and cauliflower is Rs. 22471.4, Rs. 26135.2 and Rs. 19920.8, respectively. The estimated semi-elasticities reveal that the median cost of production of tomato is 16.3% more, and that of cauliflower 11.35% less, than that of potato. Based on the log model, the difference in cost of production between potato and tomato is statistically significant (P < 0.05), while that between potato and cauliflower is not (P > 0.05). Various normality tests show that the residuals are normally distributed in the semi-log model but not in the linear model, suggesting that log transformation is an effective technique for normalizing residuals. The non-normal distribution of residuals in the linear model undermines the validity of the t and F statistics, so hypothesis tests based on them can be misleading, especially in small samples.
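The link between the reported medians and the semi-elasticities can be checked with a short Python sketch. It assumes, as in a standard semi-log dummy-variable model, that each crop's median is the antilog of its fitted log-mean. Under that assumption a dummy coefficient $b$ equals the log of the ratio of the crop's median to the potato median and converts to a percentage difference as $100(e^{b} - 1)$. The function name `pct_difference` is hypothetical; the input figures are the medians quoted in the abstract.

```python
import math

# Per-acre median costs (Rs.) reported in the abstract for the semi-log model.
medians = {"potato": 22471.4, "tomato": 26135.2, "cauliflower": 19920.8}


def pct_difference(b):
    """Percentage difference implied by a dummy coefficient b in ln(Y) = b0 + b*D + u."""
    return 100.0 * (math.exp(b) - 1.0)


diffs = {}
for crop in ("tomato", "cauliflower"):
    # In the semi-log model the group median is exp(b0 + b) and the base median is exp(b0),
    # so b = ln(median_crop / median_potato).
    b = math.log(medians[crop] / medians["potato"])
    diffs[crop] = pct_difference(b)
    print(f"{crop}: b = {b:.4f}, difference vs potato = {diffs[crop]:+.2f}%")
```

Running it gives differences of about +16.30% for tomato and -11.35% for cauliflower, matching the 16.3% and 11.35% in the abstract. The conversion $100(e^{b} - 1)$ matters here. Reading $100b$ directly as a percentage would give roughly 15.1% and -12% instead.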
However, in the log-transformed model the t and F statistics can be used with confidence for hypothesis testing.

Keywords: Linear and semi-logarithmic ANOVA models, cost of production, semi-elasticities, normality, data transformation

Citation: Jan, A. U., D. Jan, Y. Hayat, A. Sadiqa, G. Ali and M. Fayaz. 2014. Logarithmic transformation: a tool for normalizing residuals in ANOVA models. Sarhad J. Agric. 30(3): 375-378

INTRODUCTION
In regression analysis the dependent variable is frequently influenced not only by ratio-scale variables but also by variables that are essentially qualitative and need to be included among the explanatory variables. Such variables usually indicate the presence or absence of a quality or attribute. They can therefore be quantified by constructing artificial variables that take the value 1 to indicate the presence of the attribute and 0 to indicate its absence. Variables that assume only the values 0 and 1 are called dummy variables (Gujarati, 2003; Greene, 2002; Wooldridge, 2006). Such variables are thus essentially a device for classifying the data into mutually exclusive categories. Regression models whose regressors are all dummy (qualitative) variables are called Analysis of Variance (ANOVA) models.

One of the important assumptions of the classical normal linear regression model (CNLRM) is that the disturbance term $\upsilon_i$ entering the regression model is normally and independently distributed with zero mean and a finite constant variance $\sigma^2$, i.e. $\upsilon_i \sim \mathrm{NID}(0, \sigma^2)$. The CNLRM differs from the classical linear regression model (CLRM) in that it specifically assumes that $\upsilon_i$ is normally distributed. The CLRM makes no assumption about the probability distribution of $\upsilon_i$. It only requires that the mean of $\upsilon_i$ be zero and its variance a finite constant (Greene, 2002; Gujarati, 2003). The assumption of normality of $\upsilon_i$ is not essential if the objective is estimation only.
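To make the dummy-variable ANOVA setup and the normality argument concrete, the following is a minimal Python sketch on simulated, purely hypothetical cost data, assuming NumPy is available. These are not the study's data. The sketch fits a linear and a semi-log ANOVA model by OLS with potato as the base category, then compares the residuals using the Jarque-Bera statistic. That statistic is chosen here only for illustration, since the source does not name its normality tests. The helper names `fit_anova` and `jarque_bera`, the medians, the sample size and the log-scale spread are all hypothetical choices.

```python
import numpy as np


def jarque_bera(resid):
    """Jarque-Bera statistic n/6 * (S**2 + (K - 3)**2 / 4) from sample skewness S and kurtosis K."""
    e = resid - resid.mean()
    m2 = np.mean(e ** 2)
    skew = np.mean(e ** 3) / m2 ** 1.5
    kurt = np.mean(e ** 4) / m2 ** 2
    return e.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def fit_anova(y, groups, levels, base):
    """OLS of y on an intercept plus one 0/1 dummy for every level except the base category."""
    others = [g for g in levels if g != base]
    X = np.column_stack([np.ones(y.size)] + [(groups == g).astype(float) for g in others])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept"] + others, beta)), y - X @ beta


# Hypothetical simulated data: log-normal per-acre costs for three crops.
rng = np.random.default_rng(0)
crops = ["potato", "tomato", "cauliflower"]
medians = [20000.0, 24000.0, 18000.0]  # hypothetical medians, not the study's estimates
n = 200                                # hypothetical number of farms per crop
groups = np.repeat(crops, n)
cost = np.exp(np.repeat(np.log(medians), n) + rng.normal(0.0, 0.6, 3 * n))

linear, linear_resid = fit_anova(cost, groups, crops, base="potato")
semilog, semilog_resid = fit_anova(np.log(cost), groups, crops, base="potato")

print("linear intercept (mean potato cost):", linear["intercept"])
print("semi-log exp(intercept) (geometric mean, a median estimate):", np.exp(semilog["intercept"]))
print("tomato vs potato, % difference:", 100 * (np.exp(semilog["tomato"]) - 1))
print("JB, linear residuals:", jarque_bera(linear_resid))
print("JB, semi-log residuals:", jarque_bera(semilog_resid))
```

With potato as the base category, the linear intercept reproduces the mean potato cost, and each dummy coefficient gives the difference between that crop's mean and the potato mean. In the semi-log model, the antilog of the intercept is the geometric mean of potato cost, which estimates the median when log cost is normally distributed. Because the simulated costs are log-normal, the Jarque-Bera statistic of the linear-model residuals lies far above the 5% $\chi^2(2)$ critical value of about 5.99, while that of the semi-log residuals is typically small.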
The ordinary least squares (OLS) estimators are Best Linear Unbiased Estimators (BLUE) regardless of whether the $\upsilon_i$ are normally distributed. The normality assumption, however, allows us to establish that the OLS estimators of the regression coefficients are themselves normally distributed. The t, F and $\chi^2$ tests can then be used to test various statistical hypotheses regardless of the sample size. If the disturbances are not normally distributed, the OLS estimators are still normally distributed asymptotically. However, in practice, researchers do not have the luxury of large sample