Unsupervised Learning of Question Difficulty Levels using Assessment Responses

Sankaran Narayanan, Vamsi Sai Kommuri, N Sethu Subramanian, Kamal Bijlani, and Nandu C Nair

Amrita e-Learning Research Lab (AERL), Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, Amrita University, India
nsankaran@am.amrita.edu, vamsi5712@gmail.com, sethus@am.amrita.edu, kamal@amrita.edu, cnandu@am.amrita.edu

Abstract. Question difficulty level is an important factor in determining assessment outcome. Accurate mapping of the difficulty levels in question banks offers a wide range of benefits beyond higher assessment quality: improved personalized learning, adaptive testing, automated question generation, and cheating detection. Adopting unsupervised machine learning techniques, we propose an efficient method derived from assessment responses to enhance consistency and accuracy in the assignment of question difficulty levels. We show that effective feature extraction is achieved by partitioning test takers based on their test scores. We validate our model using a large dataset collected from a two-thousand-student university-level proctored assessment. Preliminary results show our model is effective, achieving a mean accuracy of 84% under instructor validation. We also show the model's effectiveness in flagging mis-calibrated questions. Our approach can easily be adapted for a wide range of applications in e-learning and e-assessments.

Keywords: e-assessments, unsupervised learning, personalized learning, question bank, difficulty levels

1 Introduction

Question banks are a basic part of e-assessment systems. Questions in a bank span a wide spectrum of subject matter. High-quality questions can be effective in measuring the extent to which test takers have succeeded in meeting learning objectives [1]. The nature of the questions directly influences assessment quality as well as the experience of test takers ([2], [10]).
Precise estimation and dynamic calibration of question difficulty can be highly beneficial in personalized e-learning as well ([3], [4]). In e-assessments, examiners routinely define the proportion of questions to be chosen from each subject area, learning objective [1], difficulty level [2], and concept [13]. In randomized close-ended e-assessments, each test taker is given a random set of questions, selected according to these attributes. The quality of evaluation in randomized assessments thus depends on accuracy in mapping the