Computational Biology as a Compelling Pedagogical Tool in Computer Science Education

Vijayalakshmi Saravanan
Rochester Institute of Technology
Rochester, USA
vsavse@rit.edu

Anpalagan Alagan
Ryerson University
Toronto, Canada
alagan@ee.ryerson.ca

Kshirasagar Naik
University of Waterloo
Waterloo, Canada
snaik@uwaterloo.edu

ABSTRACT

High-performance computing (HPC) and parallel and distributed computing (PDC) are widely discussed topics in computer science (CS) and computer engineering (CE) education. In the past decade, high-performance computing has also contributed significantly to addressing complex problems in bio-engineering, healthcare and systems biology. Computational biology applications therefore provide several compelling examples that can serve as potent pedagogical tools in teaching high-performance computing. In this paper, we introduce a novel course curriculum to teach high-performance, parallel and distributed computing to senior graduate students (PhD) in a hands-on setup through examples drawn from a wealth of areas in computational biology. We introduce the concepts of parallel programming, algorithms, architectures and implementations via carefully chosen examples from computational biology. We believe that this course curriculum will provide students with an engaging and refreshing introduction to this well-established domain.

Keywords

Pedagogical Tools · High-Performance Computing (HPC) · Parallel and Distributed Computing (PDC) · Computational Biology.

1. INTRODUCTION

Over the last few years, computational biology has revolutionized medical research, bringing in novel analysis tools that accelerate diagnosis and drug discovery. The enormous amount of experimental data generated by the human genome project, proteomics, and clinical research has fostered this revolution by enabling extremely accurate, albeit complex, models for various biological phenomena.
The analysis of these models requires substantial processing power and time to find accurate solutions, making them attractive candidates for parallelization. For example, high-performance parallel computing has successfully contributed to the understanding of protein dynamics [1], ion channels and cellular reaction kinetics [2], resulting in several specialized high-throughput tools such as GROMACS, a parallelized molecular simulation toolkit [3]. Further, novel projects such as Folding@home [4] have enabled the pooling of distributed computing resources from around the world to analyze proteins.

Recently, bioengineers have begun focusing on reverse engineering biological systems by reconstructing, from experimental data, the gene and metabolic networks that describe the interactions between various genes and proteins. This relatively new area of research requires novel computational tools due to the vastly heterogeneous nature of the data involved [5].

While computer scientists have been able to contribute to improving the performance and accuracy of biological analysis, the striking applications found in this domain can also provide a wealth of motivation for computer scientists. In addition, there are several different methods of implementing these applications, some more easily parallelized than others (and the best implementation can depend on the application). They therefore provide an excellent opportunity for computer science students to gain insight into the issues faced by programmers of parallel algorithms. For this reason, we believe that biomedical applications can be a powerful tool in teaching parallel and distributed computing. In this paper, we propose a novel course curriculum that introduces parallel and distributed computing to senior graduate students in a hands-on manner through a set of carefully chosen computational biology applications.
We also propose several sample research term projects that can be carried out as a direct extension of the learning outcomes of this course.

1.1 Contribution and Related Work

The ACM and NSF/TCPP guidelines recommend that parallel computing be introduced in CS and CE courses from the early stages [28][29]. As parallelism and multi-core computing become more accessible, academic institutions in India are exploring the introduction of interdisciplinary concepts in CS and CE education. In this context, several courses have been developed to teach parallel programming concepts with real-world examples [30] [31] [32] [33]. The first author also introduced a course teaching parallelism with hands-on experimental learning activities as a member of the Board of Studies (BoS)/Curriculum Design Committee at Amrita/VIT University, India, in 2005-2009. This pilot course introduced certain concepts in HPC and PDC using real-world applications, including those in computational biology. Drawing upon this experience, the key contribution of this paper is the design of an interdisciplinary course curriculum that uses problems in computational biology as educational tools in computer science education.

Currently, several courses designed for biology majors focus on the fundamentals of parallel and distributed computing [6] [7]. Recently, courses incorporating high-performance computing for medical applications have also been developed [8]. Advanced courses in

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright © JOCSE, a supported publication of the Shodor Education Foundation Inc. © 2020 Journal of Computational Science Education. DOI: https://doi.org/10.22369/issn.2153-4136/11/1/8. Journal of Computational Science Education, Volume 11, Issue 1, January 2020. ISSN 2153-4136.