Unsupervised Learning of Question Difficulty Levels using Assessment Responses

Sankaran Narayanan, Vamsi Sai Kommuri, N Sethu Subramanian, Kamal Bijlani, and Nandu C Nair

Amrita e-Learning Research Lab (AERL), Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, Amrita University, India
nsankaran@am.amrita.edu, vamsi5712@gmail.com, sethus@am.amrita.edu, kamal@amrita.edu, cnandu@am.amrita.edu

Abstract. Question difficulty level is an important factor in determining assessment outcome. Accurate mapping of the difficulty levels in question banks offers a wide range of benefits beyond higher assessment quality: improved personalized learning, adaptive testing, automated question generation, and cheating detection. Adopting unsupervised machine learning techniques, we propose an efficient method derived from assessment responses to enhance consistency and accuracy in the assignment of question difficulty levels. We show that effective feature extraction is achieved by partitioning test takers based on their test scores. We validate our model using a large dataset collected from a two-thousand-student university-level proctored assessment. Preliminary results show our model is effective, achieving a mean accuracy of 84% under instructor validation. We also show the model's effectiveness in flagging mis-calibrated questions. Our approach can easily be adapted for a wide range of applications in e-learning and e-assessments.

Keywords: e-assessments, unsupervised learning, personalized learning, question bank, difficulty levels

1 Introduction

Question banks are a basic part of e-assessment systems. Questions in a bank span a wide spectrum of subject matter. High-quality questions can be effective in measuring the extent to which test takers have succeeded in meeting learning objectives [1]. The nature of the questions directly influences assessment quality as well as the experience of test takers ([2], [10]).
Precise estimation and dynamic calibration of question difficulty can be highly beneficial in personalized e-learning as well ([3], [4]). In e-assessments, examiners routinely define the proportion of questions to be chosen from each subject area, learning objective [1], difficulty level [2], and concept [13]. In randomized close-ended e-assessments, each test taker is given a random set of questions, selected according to these attributes. The quality of evaluation in randomized assessments thus depends on accuracy in mapping the