BioSystems 101 (2010) 222–232
Applying DNA computation to intractable problems in social network analysis
Rick C.S. Chen, Stephen J.H. Yang∗
Department of Computer Science & Information Engineering, National Central University, Taiwan
Article info
Article history: Received 3 November 2009; received in revised form 12 February 2010; accepted 28 May 2010
Keywords: Social network analysis; Cohesive subgroup; N-clique; N-clan; N-club; DNA-computing
Abstract
Since ancient times, social networks have played an important role in the formation of organizations
of many kinds and in a wide range of social behaviors. As such, social networks inherently describe the
complicated relationships between elements around the world. Based on mathematical graph theory,
social network analysis (SNA) has been developed and applied in various fields, such as Web 2.0
applications and industrial product development. However, several tasks defined in SNA, such as
finding a clique, N-clique, N-clan, N-club or K-plex, are NP-complete problems, which are not easily
solved on traditional computer architectures. These challenges have restricted the use of SNA. This
paper presents DNA-computing-based approaches, which offer inherently high information density and
massive parallelism. Using these approaches, we aim to solve three primary problems in social networks:
N-clique, N-clan and N-club. The accuracy and feasible time complexities of the approaches, discussed
in the paper, demonstrate that DNA computing can be used to facilitate the development of SNA.
© 2010 Elsevier Ireland Ltd. All rights reserved.
1. Introduction
Social relations link actors such as people, firms, communication users, Internet peers and animals,
forming complex social networks around the world. Since 1943, social network analysis (SNA) has been
employed to analyze these relations, and the ensuing technology has inspired many applications in
various fields. In 2006, Batallas and Yassine used SNA to facilitate product development, where
companies find it beneficial to develop their products systematically under the coordination of
hundreds or even thousands of specialists. The term Web 2.0 was popularized at the O’Reilly Media
Web 2.0 Conference in 2004. Web 2.0-based applications are booming because SNA plays an important
role in implementing the notions of information sharing, interoperability and collaboration on the
World Wide Web. Furthermore, the technology is used extensively in other academic fields, including
economics, biology, communication studies, geography, social psychology and sociolinguistics. In
1994, Wasserman and Faust carried out comprehensive investigations of SNA, using various measures
on different levels. Cohesive subgroups form one of their measures (defined in Wasserman, 1994,
p. 249), and the authors state that “Cohesive subgroups are subsets of actors among whom there are
relatively strong, direct, intense, frequent or positive ties.” Based on this measure, actors are
classified into several subsets according to a specific
relation in a social network. To detect cohesive subgroups, researchers usually use graph theory to
define the different types of cohesive subgroup precisely, where vertices stand for actors and edges
stand for relations among them. From 1949 to 1991, the proposed definitions included cliques,
N-cliques, N-clans, N-clubs, K-plexes, K-cores, LS sets and lambda sets. They are commonly deployed
in a large number of applications. However, finding most of these subgroups using silicon-based
computers is an NP-complete problem, for which the required computing time increases rapidly as the
problem size grows. This challenge restricts the use of SNA. In 1994, Adleman first solved an
instance of the Hamiltonian path problem, a well-known NP-complete problem, using deoxyribonucleic
acid (DNA) computation technology, which draws on DNA, biochemistry and molecular biology. In that
work, a solution space is generated from encoded DNA strands. The feasible solutions are then
revealed with the help of biochemical procedures that filter infeasible solutions out of the space
in parallel. The parallelism and vast memory of DNA allow DNA-based approaches to tackle large
NP-complete problems. In addition, Lipton (1995) proposed DNA experiments to solve the
satisfiability (SAT) problem. In 1997, Ouyang et al. pioneered the solution of the maximal clique
problem using a DNA-based approach. Although their proposed algorithm runs efficiently in O(n), its
extensions to the maximal N-clique, maximal N-clan and maximal N-club problems mix silicon-based and
biochemical computation, which leads to heavy computational load and errors from labor-intensive
manual steps.
∗ Corresponding author.
E-mail address: chungshiuan@csie.ncu.edu.tw (S.J.H. Yang).
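For readers less familiar with the three subgroup types, the following minimal Python sketch spells them out under the usual graph-theoretic definitions. This section names the definitions but does not state them, so they are an assumption here: an N-clique is a maximal vertex set whose members are pairwise within distance N in the whole graph; an N-clan is an N-clique whose induced subgraph has diameter at most N; an N-club is a maximal vertex set whose induced subgraph has diameter at most N. The sketch is a brute-force, silicon-based illustration of the generate-then-filter idea, not the DNA algorithm of this paper, and all function and variable names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical six-vertex example graph (adjacency sets), used only for illustration.
EXAMPLE_GRAPH = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 5},
                 4: {2, 6}, 5: {3, 6}, 6: {4, 5}}


def bfs_distances(adj, source, allowed=None):
    """Shortest-path lengths from source; if allowed is given, paths stay inside it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def within_n(adj, group, n, inside):
    """True if every pair of vertices in group is at distance <= n.

    inside=False measures distance in the whole graph (N-clique condition);
    inside=True measures it in the subgraph induced by group (diameter condition).
    """
    allowed = set(group) if inside else None
    for u in group:
        dist = bfs_distances(adj, u, allowed)
        # Unreachable vertices count as farther than n.
        if any(dist.get(v, n + 1) > n for v in group):
            return False
    return True


def maximal_sets(candidates):
    """Keep only the sets that are not strictly contained in another candidate."""
    return [s for s in candidates if not any(s < t for t in candidates)]


def cohesive_subgroups(adj, n):
    """Generate every vertex subset of size >= 2, then filter (exponential time)."""
    vertices = sorted(adj)
    subsets = [frozenset(c)
               for k in range(2, len(vertices) + 1)
               for c in combinations(vertices, k)]
    n_cliques = maximal_sets([s for s in subsets
                              if within_n(adj, s, n, inside=False)])
    n_clans = [s for s in n_cliques if within_n(adj, s, n, inside=True)]
    n_clubs = maximal_sets([s for s in subsets
                            if within_n(adj, s, n, inside=True)])
    return n_cliques, n_clans, n_clubs
```

Calling `cohesive_subgroups(EXAMPLE_GRAPH, 2)` gives the 2-cliques {1,2,3,4,5} and {2,3,4,5,6}, the single 2-clan {2,3,4,5,6}, and the 2-clubs {1,2,3,4}, {1,2,3,5} and {2,3,4,5,6}. The enumeration of all subsets is the step whose cost grows exponentially with the number of vertices on a silicon-based computer.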
This paper revises the idea proposed by Ouyang et al. on the basis of Adleman’s approach, and aims
to solve the problems of finding the maximal N-clique, maximal N-clan and maximal N-club. In
addition to showing their accuracy, we will prove the algorithms to be efficient. The
remainder of this paper is organized as follows: Section 2 provides
0303-2647/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.biosystems.2010.05.006