A Case Study of Novice Programmers on Parallel Programming Models

Xuechao Li 1*, Po-Chou Shih 2, Xueqian Li 3, Cheryl Seals 1

1 Dept. of Computer Science, Auburn University, 345 W. Magnolia Ave, Auburn, Alabama, USA.
2 Graduate Institute of Industrial and Business Management, National Taipei University of Technology, 1, Sec. 3, Zhongxiao E. Rd, Taipei, Taiwan.
3 Dept. of Civil Engineering, Auburn University, 238 Harbert Engineering Center, Auburn, Alabama, USA.

* Corresponding author. Tel.: +13344984008; email: xcl@auburn.edu

Manuscript submitted May 8, 2017; accepted July 25, 2017.
doi: 10.17706/jcp.13.5.490-502
Journal of Computers, Volume 13, Number 5, May 2018

Abstract: High performance computing (HPC) languages have been developed over many years, but the ability to write parallel programs has not been fully studied from the perspective of programmer effort. To obtain quantitative comparison results, we conducted an empirical experiment that quantified four indices related to the programming productivity of CUDA and OpenACC: the achieved speedup, the programmers' workload on parallel solutions, the effort per line of code, and the dispersion of speedup. Twenty-eight students were recruited to complete two parallel problems, Machine Problem 3 (MP3) and Machine Problem 4 (MP4), and the research data was collected through WebCode, a website we developed. The statistical results indicated that (1) the OpenACC speedup is 11.3x less than the CUDA speedup; (2) the workload of developing OpenACC solutions is at least 2.3x less than that of CUDA solutions; (3) the effort per line of code in OpenACC is not significantly less than in CUDA; and (4) the OpenACC dispersion of speedup is significantly less than that of CUDA.

Key words: Development effort, programming productivity, empirical experiment, OpenACC, CUDA.

1. Introduction
Among interfaces for GPU computing, the Compute Unified Device Architecture (CUDA) [1] programming model offers a more controllable interface to programmers than OpenACC [2]. Not surprisingly, programming GPUs is still time-consuming and error-prone compared to serial coding [3]. For the CUDA programming model, the professional knowledge of hardware and software it demands is a major challenge to programmers, especially novices: GPU memory management, optimization techniques, and an understanding of GPU architectures. Because some applications and benchmarks are massive and complex, rewriting them is theoretically possible but practically infeasible for developers [4]. To simplify parallel programming, OpenACC was released in 2012; it allows the compiler to convert directives into the corresponding parallel code. But this conversion may consume more time, so users will naturally weigh the tradeoff between acceleration performance and programmer effort. Specifically, even though performance acceleration can be obtained with parallel models, writing parallel code may not be realistic if programmers must spend more time or effort on parallelizing problems. Many researchers have worked on the performance evaluation of OpenACC and CUDA, but to date, few studies have investigated performance based on human subjects such as programmers' effort per line of