Inference problems in binary regression model with misclassified responses

A. Chatterjee*1, T. Bandyopadhyay†2 and S. Adhya‡3

1 Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, Delhi
2 Production & Quantitative Methods Group, Indian Institute of Management, Ahmedabad
3 Department of Statistics, West Bengal State University, Kolkata

Abstract

Misclassification of binary responses, if ignored, may severely bias the maximum likelihood estimators (MLE) of regression parameters. For such data, researchers in different application contexts widely use a binary regression model that incorporates misclassification probabilities. This model, however, suffers from a serious estimation problem because the unknown misclassification probabilities are confounded with the regression parameters. To overcome this problem, we propose using internal validation data in addition to the main sample. However, the resulting MLE are found to be substantially biased. Investigating further, we propose a maximum pseudo-likelihood method of estimation that reduces this bias. For drawing inference on the regression parameters, we develop a rigorous asymptotic theory for the maximum pseudo-likelihood estimators under standard assumptions. To facilitate easy implementation, we propose a bootstrapped version of the estimator and prove its distributional consistency. We also extend these results to more general misclassification models. The results of the simulation studies are encouraging. The methodology is illustrated with survey data.

MSC2010 classification codes: 62F12, 62F40, 62J12.

Keywords: Contaminated data, Validation sample, Likelihood method, Z-estimators, Bootstrap.

* cha@isid.ac.in. Research partially supported by SERB grant MTR/2017/000224.
† tathagata@iima.ac.in
‡ sumanta.adhya@gmail.com

arXiv:1611.06727v2 [math.ST] 13 Dec 2019
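As an informal illustration of the opening claim of the abstract, the following Python sketch simulates binary responses from a logistic model, flips them with fixed misclassification probabilities, and then fits an ordinary logistic regression that ignores the misclassification. It assumes the commonly used form P(Y* = 1 | x) = θ0 + (1 − θ0 − θ1) F(β0 + β1 x), where Y is the true response, Y* the observed one, F the logistic distribution function, θ0 = P(Y* = 1 | Y = 0) and θ1 = P(Y* = 0 | Y = 1). This model form, the function names (observed_prob, simulate, naive_logistic_mle) and the parameter values are assumptions made for illustration and are not taken from the paper. The sketch does not implement the maximum pseudo-likelihood estimator proposed here.

```python
import math
import random


def expit(t):
    """Logistic distribution function F(t)."""
    return 1.0 / (1.0 + math.exp(-t))


def observed_prob(x, beta0, beta1, theta0, theta1):
    """P(Y* = 1 | x) under the ASSUMED misclassification model.

    theta0 = P(Y* = 1 | Y = 0), theta1 = P(Y* = 0 | Y = 1) (hypothetical names).
    """
    return theta0 + (1.0 - theta0 - theta1) * expit(beta0 + beta1 * x)


def simulate(n, beta0, beta1, theta0, theta1, seed=0):
    """Draw x ~ N(0, 1), a true logistic response Y, then a misclassified Y*."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y_true = 1 if rng.random() < expit(beta0 + beta1 * x) else 0
        # Flip the true response with the assumed misclassification probabilities.
        if y_true == 1:
            y_obs = 0 if rng.random() < theta1 else 1
        else:
            y_obs = 1 if rng.random() < theta0 else 0
        xs.append(x)
        ys.append(y_obs)
    return xs, ys


def naive_logistic_mle(xs, ys, iters=25):
    """Newton-Raphson MLE for ordinary logistic regression (intercept + slope),
    treating the observed responses as if they were correctly classified."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score, intercept component
            g1 += (y - p) * x      # score, slope component
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


if __name__ == "__main__":
    xs, ys = simulate(5000, beta0=0.0, beta1=2.0, theta0=0.1, theta1=0.1, seed=1)
    b0_hat, b1_hat = naive_logistic_mle(xs, ys)
    print(f"naive intercept {b0_hat:.3f}, naive slope {b1_hat:.3f} (true slope 2.0)")
```

With β1 = 2 and θ0 = θ1 = 0.1, the naive slope estimate should fall noticeably below 2, because under the assumed model the observed success probability is confined to [θ0, 1 − θ1] = [0.1, 0.9] and its log-odds are therefore flattened.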