Inference problems in binary regression model with misclassified responses

A. Chatterjee*1, T. Bandyopadhyay†2 and S. Adhya‡3

1 Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, Delhi
2 Production & Quantitative Methods Group, Indian Institute of Management, Ahmedabad
3 Department of Statistics, West Bengal State University, Kolkata

Abstract

Misclassification of binary responses, if ignored, may severely bias the maximum likelihood estimators (MLEs) of the regression parameters. For such data, a binary regression model incorporating misclassification probabilities is widely used by researchers in different application contexts. The model, however, suffers from a serious estimation problem because the unknown misclassification probabilities are confounded with the regression parameters. To overcome this problem, the use of internal validation data, in addition to the main sample, is proposed. However, the resulting MLEs are found to be substantially biased. Investigating further, we propose a maximum pseudo-likelihood method of estimation, which leads to bias reduction. For drawing inference on the regression parameters, we develop a rigorous asymptotic theory for the maximum pseudo-likelihood estimators under standard assumptions. To facilitate easy implementation, a bootstrapped version of the estimator is proposed, and its distributional consistency is proved. Extensions of these results are also provided for more general misclassification models. The results of the simulation studies are encouraging. The methodology is illustrated with survey data.

MSC2010 classification codes: 62F12, 62F40, 62J12.

Keywords: Contaminated data, Validation sample, Likelihood method, Z-estimators, Bootstrap.

* cha@isid.ac.in. Research partially supported by SERB grant MTR/2017/000224.
† tathagata@iima.ac.in
‡ sumanta.adhya@gmail.com

arXiv:1611.06727v2 [math.ST] 13 Dec 2019
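
The abstract's opening claim, that ignoring misclassification can bias the MLE, can be illustrated with a small simulation. The sketch below is not the authors' pseudo-likelihood method. It assumes a logistic link, and the sample size, the true coefficients `beta_true` and the misclassification probabilities `alpha0` and `alpha1` are hypothetical values chosen only for illustration. It fits an ordinary logistic regression once to the true responses and once to the misclassified ones.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Newton-Raphson MLE for logistic regression; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)               # score vector
        hess = (X * w[:, None]).T @ X      # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 20000                                  # hypothetical sample size
beta_true = np.array([-0.5, 1.0])          # hypothetical intercept and slope
alpha0, alpha1 = 0.10, 0.15                # assumed P(observe 1 | true 0), P(observe 0 | true 1)

x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-X @ beta_true))
y_true = rng.binomial(1, p_true)

# Flip each true response with the misclassification probability for its class.
flip = np.where(y_true == 1, rng.random(n) < alpha1, rng.random(n) < alpha0)
y_obs = np.where(flip, 1 - y_true, y_true)

beta_clean = fit_logistic(X, y_true)       # MLE from the error-free responses
beta_naive = fit_logistic(X, y_obs)        # MLE that ignores misclassification

print("true slope :", beta_true[1])
print("clean MLE  :", beta_clean[1])
print("naive MLE  :", beta_naive[1])
```

Under these assumptions the naive slope estimate is pulled towards zero, because the observed success probability is confined to the interval from `alpha0` to `1 - alpha1` instead of ranging over the whole unit interval.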