Computational Biology as a Compelling Pedagogical Tool
in Computer Science Education
Vijayalakshmi Saravanan
Rochester Institute of Technology
Rochester, USA
vsavse@rit.edu
Anpalagan Alagan
Ryerson University
Toronto, Canada
alagan@ee.ryerson.ca
Kshirasagar Naik
University of Waterloo
Waterloo, Canada
snaik@uwaterloo.edu
ABSTRACT
High-performance computing (HPC) and parallel and distributed
computing (PDC) are widely discussed topics in computer science
(CS) and computer engineering (CE) education. In the past decade,
high-performance computing has also contributed significantly to
addressing complex problems in bio-engineering, healthcare and
systems biology. Therefore, computational biology applications
provide several compelling examples that can be potent
pedagogical tools in teaching high-performance computing. In this
paper, we introduce a novel course curriculum to teach high-
performance, parallel and distributed computing to senior graduate
students (PhD) in a hands-on setup through examples drawn from
a wealth of areas in computational biology. We introduce the
concepts of parallel programming, algorithms, architectures, and
implementations via carefully chosen examples from
computational biology. We believe that this course curriculum will
provide students with an engaging and refreshing introduction to this
well-established domain.
Keywords
Pedagogical Tools · High-Performance Computing (HPC) ·
Parallel and Distributed Computing (PDC) · Computational
Biology.
1. INTRODUCTION
Over the last few years, computational biology has revolutionized
medical research, bringing in novel analysis tools that accelerate
diagnosis and drug discovery. The enormous amount of
experimental data generated by the human genome project,
proteomics, and clinical research has fostered this revolution by
enabling extremely accurate, albeit complex models for various
biological phenomena. The analysis of these models requires high
processing power and time to find accurate solutions, making them
attractive candidates for parallelization. For example, high-
performance parallel computing has successfully contributed to the
understanding of protein dynamics [1], ion channels and cellular
reaction kinetics [2], resulting in several specialized high-
throughput tools such as GROMACS, a parallelized molecular
simulation toolkit [3]. Further, novel projects such as
Folding@home [4] have enabled the pooling of distributed
computing resources from around the world to analyze proteins.
Recently, bioengineers have begun focusing on reverse engineering
biological systems by reconstructing, from experimental data, gene
and metabolic networks that describe the interactions between
various genes and proteins. This relatively new area of research
requires novel computational tools due to the vastly heterogeneous
nature of the data involved [5]. While computer scientists have been
able to contribute to improving the performance and accuracy of
biological analysis, the striking applications found in the domain
can also serve to provide a wealth of motivation for computer
scientists. In addition, there are several different methods of
implementing these applications, some more easily parallelized
than others, and the best implementation can depend on the
application. Therefore, they provide an excellent opportunity for
computer science students to gain insight into issues faced by
programmers of parallel algorithms. For this reason, we believe that
biomedical applications can be a powerful tool in teaching parallel
and distributed computing. In this paper, we propose a novel course
curriculum that introduces parallel and distributed computing to
senior graduate students in a hands-on manner through a set of
carefully chosen computational biology applications. We also
propose several sample research term projects that can be carried
out as a direct extension of the learning outcomes of this course.
1.1 Contribution and Related Work
The ACM and NSF/TCPP guidelines recommend that parallel
computing be introduced in CS and CE courses from early stages
[28][29]. As parallelism and multi-core computing become more
accessible, academic institutions in India are exploring the
introduction of interdisciplinary concepts in CS and CE education.
In this context, several courses have been developed to teach
parallel programming concepts with real-world
examples [30] [31] [32] [33]. The first author has also introduced a
course teaching parallelism with hands-on experimental learning
activities as a member of the Board of Studies (BoS)/Curriculum
Design Committee at Amrita/VIT University, India in 2005-2009.
In this pilot course, the author introduced certain
concepts in HPC and PDC using real-world applications, including
those in computational biology. Drawing upon this experience, the
key contribution of this paper is the design of an interdisciplinary
course curriculum that uses problems in computational biology as
educational tools in computer science education. Currently, several
courses designed for biology majors focus on the fundamentals
of parallel and distributed computing [6] [7]. Recently, courses
incorporating high-performance computing for medical
applications have also been developed [8]. Advanced courses in
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Copyright ©JOCSE, a
supported publication of the Shodor Education Foundation Inc.
© 2020 Journal of Computational Science Education
DOI: https://doi.org/10.22369/issn.2153-4136/11/1/8
Journal of Computational Science Education Volume 11, Issue 1
January 2020 ISSN 2153-4136