Randwick International of Education and Linguistics Science (RIELS) Journal
Vol. 4, No. 4, December 2023 | Page 937-944 | https://www.randwickresearch.com/index.php/rielsj

Examining the Validity and Reliability of ChatGPT 3.5-Generated Reading Comprehension Questions for Academic Texts

| Meida Rabia Sihite 1 | Meisuri 2 | Berlin Sibarani 3 |

1 English Education Study Program, Universitas Alwashliyah, Medan, Indonesia
2,3 Postgraduate Program, Universitas Negeri Medan, Medan, Indonesia
1 meidarabia55@gmail.com, 2 meisuri@yahoo.com, 3 berlinsibarani@unimed.ac.id

ABSTRACT
This research examines the capacity of ChatGPT 3.5 to generate reading comprehension questions for academic texts, focusing on their alignment with the higher-order cognitive skills of Bloom's Taxonomy. A paper-based test of 30 multiple-choice questions was constructed with ChatGPT 3.5 from three selected TOEFL ITP reading comprehension passages. The study employed a mixed-methods approach, integrating qualitative content analysis to assess the cognitive level of each question with quantitative analysis of student responses. Data collection involved administering the AI-generated questions to students and scoring their responses. Validity was determined using Pearson correlation coefficients, and internal consistency was measured using Cronbach's Alpha. The findings revealed that ChatGPT 3.5 can produce questions spanning a range of cognitive levels, from analysis to creation; however, only 10 of the 30 questions met the validity criterion, indicating a need for improvement in the AI's question-generation process. The reliability of these questions was moderate, suggesting a reasonable level of internal consistency. The study concludes that while AI-generated questions show promise for educational assessment, ongoing improvement of AI models is necessary to enhance their effectiveness. These findings bear on the future integration of AI in educational settings, indicating a potential role for AI in developing meaningful assessment tools. The study recommends that future research explore various question types and incorporate student feedback to optimize the effectiveness of AI in education.

KEYWORDS
ChatGPT 3.5; validity; reliability; reading comprehension questions

DOI: https://doi.org/10.47175/rielsj.v4i4.835
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

INTRODUCTION
The integration of AI language models such as ChatGPT 3.5 into education and language teaching has garnered significant attention because of their potential to transform the creation of educational materials and language assessments. These models have demonstrated utility in generating virtual patient simulations, quizzes, and other educational materials, showcasing their potential to enhance learning experiences (Eysenbach, 2023; Hung et al., 2023). However, concerns have been raised regarding the reliability and validity of AI-generated content, especially when it is used to assess reading comprehension of academic texts (Rahman & Watanobe, 2023; Tyson, 2023; Vandiver, 2008).
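The two statistical techniques the study relies on, item validity via the Pearson correlation of each item's scores with the total test score, and internal consistency via Cronbach's Alpha, can be sketched in a few lines. The function names and the toy data below are illustrative only, not the authors' instrument or scoring data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists.

    With dichotomous (0/1) item scores against total scores, this is the
    point-biserial correlation commonly used for item validity.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cronbach_alpha(items):
    """Cronbach's Alpha for internal consistency.

    items: a list of k per-item score lists, each of length n (one score
    per student). Alpha = k/(k-1) * (1 - sum of item variances / total variance).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-student totals

    def var(v):  # sample variance
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)

    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))
```

In a study like this one, each of the 30 items' 0/1 scores would be correlated against students' total scores, the resulting r compared to the critical r for the sample size to decide validity, and Alpha computed over all items to judge reliability.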