International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 13 (2017) pp. 3887-3893
© Research India Publications. http://www.ripublication.com

Evaluating the Reliability and Quality of Examination Paper for Multi-tier Application Development Course using Rasch Measurement Model

Zuhaira Muhammad Zain
Information Systems Department, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, KSA.
ORCID: 0000-0002-5973-387X

Abstract
The final exam is widely used as an assessment tool to measure students' academic performance in most higher education institutions in the Kingdom of Saudi Arabia. A well-constructed set of final exam items/questions can measure both students' academic performance and their cognitive skills. In this study, the Rasch Measurement Model was used to evaluate the reliability and quality of the final exam questions for the Multi-tier Application Development course. The analysis indicated that the reliability and quality of the constructed final exam questions were relatively good and calibrated with students' learned ability.

Keywords: Bloom's Taxonomy; information systems; item construction; quality; Rasch Model; reliability; students' performance

INTRODUCTION
Universities in Saudi Arabia must now comply with the program accreditation requirements of the American Accreditation Board for Engineering and Technology (ABET). One of the ABET general criteria concerns students: student performance must be evaluated to monitor student progress and foster success in attaining student outcomes, thereby enabling graduates to attain the program's educational objectives [1]. Student performance measurement has typically depended on how students carry out tasks such as quizzes, assignments, midterm examinations, projects, and final exams.
A quality task should demand the same level of cognitive thinking skills from all students on what they have learned. To improve the quality of students' performance, tasks should be well organized and constructed on the basis of Bloom's cognitive thinking skills and the level of students' ability. Reliable and high-quality assessment tools are required in the teaching and learning process to measure students' understanding and ability.

Multi-tier Application Development (IS333D) is one of the new courses introduced in the Information Systems (IS) Department at the College of Computer and Information Sciences (CCIS), Princess Nourah Bint Abdulrahman University (PNU). It is one of the core courses that IS students must complete before they can graduate. The main objective of this course is to introduce the concept of multi-tier architecture to students, who then apply it in Web application development. In this paper, the final examination questions for IS333D in Semester 1, Session 2015/2016 are taken as the assessment tool. In constructing these examination questions, it is vital that the questions be fairly distributed according to Bloom's cognitive thinking skills, the level of students' ability, and the level of item/question difficulty.

According to Morales, a discussion of reliability is essential in evaluating the quality of the questions [2]. Reliability is the degree to which an instrument consistently measures the ability of an individual or group. To the best of the author's knowledge, there has been no statistical measurement of the reliability of any examination questions at CCIS; questions were checked only for their format, spelling, and relevance by the course specialist. Consequently, there is no statistical evidence to verify that a set of examination questions is reliable.
The Rasch Measurement Model has been used to assess the reliability and quality of examination papers for several Engineering courses in Malaysia [3, 4, 5, 6, 7]; nevertheless, to the best of the researcher's knowledge, it has not been applied to Information Systems courses, especially in Saudi Arabia. The model fulfills the guidelines emphasized by Wright and Mok [8]: a measurement model must produce linear measures, overcome missing data, provide estimates of precision, detect misfits, and distinguish the parameters of the object being measured from those of the measuring instrument. Thus, it can generate meaningful inferences by transforming an ordinal score into a linear, interval-level variable through estimating the fit of the data to the Rasch model's expectations. The basic principle underlying the Rasch Model is that the probability of a respondent/student successfully answering a particular item/question is governed by the difference between the item/question's difficulty and the respondent/student's ability [9, 10, 11]. The logic underlying this principle is that all respondents/students have a higher probability of answering easier items/questions correctly and a lower probability of answering more difficult items/questions correctly [9]. Moreover, the Rasch Model is one of the reliable and appropriate methods in
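The difficulty–ability relationship described above corresponds to the standard dichotomous Rasch model, in which both quantities are expressed on a common logit scale. The sketch below is a minimal illustration of that formula and is not part of the paper's analysis; the function name and example logit values are hypothetical.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability that a student with the given ability (in logits)
    answers an item of the given difficulty correctly, under the
    dichotomous Rasch model: P = exp(b - d) / (1 + exp(b - d))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When ability equals difficulty, the success probability is exactly 0.5.
# An easier item (lower difficulty) gives the same student a higher
# probability of success, matching the principle stated in the text.
p_matched = rasch_probability(ability=0.0, difficulty=0.0)   # 0.5
p_easier = rasch_probability(ability=0.0, difficulty=-1.0)   # > 0.5
p_harder = rasch_probability(ability=0.0, difficulty=1.0)    # < 0.5
```

Because the model depends only on the difference between ability and difficulty, person and item estimates can be placed on the same linear interval scale, which is what allows the ordinal raw scores to be transformed into interval-level measures.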