Abstract: Many Chinese proficiency tests (CPTs) are currently available for measuring examinees' proficiency in Chinese as a Second Language (CSL). Most of them, including the AP Chinese Language and Culture Examination (AP), the Hanyu Shuiping Kaoshi (HSK), and the Test of Proficiency-Huayu (TOP), classify their test results into proficiency levels that correspond to the levels of the CEFR (Common European Framework of Reference for Languages: Learning, Teaching, Assessment). However, the proficiency levels of some TOP Reading and Listening subjects have not yet been fully mapped to the CEFR. Therefore, the items used in this study were developed on the basis of the CEFR for constructing the Reading and Listening subjects of a CSL CPT. This study applied the IRT three-parameter logistic (3PL) model to analyze and interpret empirical data from 751 Reading and 762 Listening test takers, collected at Grace Christian College in the Philippines in September 2009 via a computer-based test (CBT). The contribution of this study lies not only in constructing a CEFR-based CSL proficiency test but also in comparing examinees' proficiency scales with reference to their backgrounds and exploring factors that may affect the effectiveness of CSL learning.

Keywords: CEFR, Chinese as Second Language, Proficiency Test, CSL proficiency scales

I. INTRODUCTION

Under a globalized market, multilingual proficiency has become very important in business and in other industries today. One example is the recent surge of interest in learning Chinese as a Second Language (CSL). The ability to use Chinese to communicate with others, however, is an important area that has been neglected in the levels of Chinese language proficiency tests. Growing demand for Chinese language prerequisites for entry into education or employment has motivated more learners to take Chinese proficiency tests (CPTs).
Measuring examinees' proficiency and classifying them into proficiency levels accurately and effectively depends on how items are implemented together with a proficiency index when a CPT is constructed. Many CPTs worldwide now classify their test results into proficiency levels that correspond to the levels of the CEFR (Common European Framework of Reference for Languages: Learning, Teaching, Assessment). Aligning with the CEFR allows proficiency scales to be compared across different tests, so that differences in proficiency between examinees can be distinguished, which in turn helps justify their curriculum engagement. Examples include TOEIC, TOEFL, BULATS, TestDaF, and DELF [1, 2]. For the CSL CPT in Taiwan (TOP), the Speaking and Writing subjects have also had their proficiency scales compared with the CEFR. However, some TOP Reading and Listening subjects have not yet been fully compared with the CEFR. The proficiency level comparison between TOP and the CEFR is shown in Table 1. Therefore, the items implemented in this study were based on the CEFR B1 level for constructing the CSL CPT Reading and Listening subjects.

Table 1. The proficiency level comparison between TOP and CEFR

TOP Reading & Listening | CEFR | TOP Speaking | CEFR | TOP Writing | CEFR
Beginner                | A2   | Beginner     | A2   | Beginner    | A2
Basic                   |      | Learner      | B1   | Learner     | B1
Intermediate            |      | Superior     | B2   | Superior    | B2
Advanced                |      | Master       | C1   | N/A         |

Source: TOP website

Examinees' proficiency on CSL CPTs has conventionally been analyzed using Classical Test Theory (CTT), which uses the observed score (raw score) to classify an examinee's CSL proficiency level [3, 4]. For example, the proficiency scale reported on the HSK test report is transformed from the raw score [5, 6].
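The abstract states that this study applies the IRT three-parameter logistic (3PL) model. The sketch below is a minimal illustration, not the authors' procedure, of why a raw score on its own is tied to the particular items taken. It uses the standard 3PL item response function P(θ) = c + (1 - c) / (1 + exp(-D·a·(θ - b))), with discrimination a, difficulty b, and pseudo-guessing parameter c, and assumes the conventional scaling constant D = 1.7. The item parameters for the two forms are hypothetical values chosen for illustration, not estimates from the Grace Christian College data.

```python
import math

# Assumed scaling constant; 1.7 makes the logistic curve approximate the normal ogive.
D = 1.7


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def expected_raw_score(theta: float, items: list[tuple[float, float, float]]) -> float:
    """Expected number-correct (CTT raw) score: the sum of item success probabilities."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


# Hypothetical (a, b, c) parameters for two short forms; the second form is harder.
easy_form = [(1.0, -1.0, 0.20), (1.2, -0.5, 0.20), (0.8, 0.0, 0.25)]
hard_form = [(1.0, 0.5, 0.20), (1.2, 1.0, 0.20), (0.8, 1.5, 0.25)]

theta = 0.0  # the same examinee ability on both forms
print(f"expected raw score, easier form: {expected_raw_score(theta, easy_form):.2f}")
print(f"expected raw score, harder form: {expected_raw_score(theta, hard_form):.2f}")
```

The same θ produces a noticeably lower expected raw score on the harder form, so a fixed raw-score cut would place this examinee at different levels depending on the form taken. In IRT, by contrast, ability is estimated on the θ scale using the item parameters, which is what makes results from forms of differing difficulty comparable.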
As another example, the SAT test report presents three different proficiency scales at the same time: the raw score, the composite total score, and the percentile [7]. Using raw scores to represent a test's proficiency scale is not justified by the assumptions of meaningful measurement, unidimensionality, linearity, and mutuality of data characteristics. In addition, cut scores cannot distinguish the proficiency of an examinee who takes a different test containing more difficult items. In other words, under the CTT model, different tests may measure the same examinee's proficiency differently. In contrast, the IRT model overcomes these shortcomings of the CTT model [8]. This study applied the IRT three-parameter logistic (3PL) model to analyze and interpret

A study on CSL proficiency evaluation - Reading and Listening subject. Rih-Chang Chao, Bor-Chen Kuo, Hsuan-Po Wang, Ya-Hsun Tsai. INTERNATIONAL JOURNAL OF EDUCATION AND INFORMATION TECHNOLOGIES, Issue 1, Volume 5, 2011