BioSystems 101 (2010) 222–232

Applying DNA computation to intractable problems in social network analysis

Rick C.S. Chen, Stephen J.H. Yang ∗
Department of Computer Science & Information Engineering, National Central University, Taiwan

Article history: Received 3 November 2009; received in revised form 12 February 2010; accepted 28 May 2010

Keywords: Social network analysis; Cohesive subgroup; N-clique; N-clan; N-club; DNA-computing

Abstract

From ancient times to the present day, social networks have played an important role in the formation of various organizations for a range of social behaviors. As such, social networks inherently describe the complicated relationships between elements around the world. Based on mathematical graph theory, social network analysis (SNA) has been developed in and applied to various fields such as Web 2.0 for Web applications and product developments in industries, etc. However, some definitions of SNA, such as finding a clique, N-clique, N-clan, N-club and K-plex, are NP-complete problems, which are not easily solved via traditional computer architecture. These challenges have restricted the uses of SNA. This paper provides DNA-computing-based approaches with inherently high information density and massive parallelism. Using these approaches, we aim to solve the three primary problems of social networks: N-clique, N-clan, and N-club. Their accuracy and feasible time complexities discussed in the paper will demonstrate that DNA computing can be used to facilitate the development of SNA.

© 2010 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Social relations link actors such as people, firms, communication users, Internet peers, animals, etc., forming complex social networks around the world.
Since 1943, social network analysis (SNA) has been employed to analyze these relations, and the ensuing technology has also inspired many applications in various fields. In 2006, Batallas and Yassine used SNA to facilitate product development. Companies find it beneficial to systematically develop their products under the coordination of hundreds and even thousands of specialists. Web 2.0 was first mentioned at the O’Reilly Media Web 2.0 conference in 2004. Web 2.0-based applications are booming because SNA plays an important role in implementing the notions of information sharing, interoperability and collaboration on the World Wide Web. Furthermore, the technology is also used extensively in other academic fields, including economics, biology, communication studies, geography, social psychology and sociolinguistics.

∗ Corresponding author. E-mail address: chungshiuan@csie.ncu.edu.tw (S.J.H. Yang).

In 1994, Wasserman and Faust carried out comprehensive investigations of SNA, using various measures on different levels. Cohesive subgroups form one of their measures (defined in Wasserman, 1994, p. 249), and the authors conclude that “Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent or positive ties.” Based on this measure, actors are classified into several subsets according to a specific relation in a social network. To detect cohesive subgroups, researchers usually use graph theory to define the different types of subgroup precisely, with vertices standing for actors and edges standing for the relations among them. From 1949 to 1991, the proposed definitions included cliques, N-cliques, N-clans, N-clubs, K-plexes, K-cores, LS sets and lambda sets. They are also commonly deployed in a large number of applications. However, finding most of these subgroups using silicon-based computers presents an NP-complete problem, which requires rapidly increasing computing time as the problem size grows.
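As a rough illustration of this graph-theoretic view (vertices as actors, edges as relations), the following Python sketch checks the distance conditions behind two of these definitions on a small example graph. It assumes the standard definitions: a vertex set meets the N-clique condition when every pair of its members lies within N hops in the whole graph, and the N-club condition when every pair lies within N hops using only vertices of the set itself; an N-clan is an N-clique that also meets the second condition. The example graph, the function names and the omission of the maximality requirement are illustrative assumptions, not taken from this paper.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, source, allowed=None):
    """Breadth-first hop counts from source.

    If allowed is given, the search only visits vertices in that set,
    i.e. it measures distances in the induced subgraph.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def _all_pairs_within(adj, group, n, induced):
    group = set(group)
    allowed = group if induced else None
    for u in group:
        dist = hop_distances(adj, u, allowed)
        # Unreachable members count as infinitely far away.
        if any(dist.get(v, float("inf")) > n for v in group):
            return False
    return True


def satisfies_n_clique(adj, group, n):
    """Every pair in group is within n hops in the whole graph (maximality not checked)."""
    return _all_pairs_within(adj, group, n, induced=False)


def satisfies_n_club(adj, group, n):
    """Every pair in group is within n hops inside the induced subgraph (maximality not checked)."""
    return _all_pairs_within(adj, group, n, induced=True)


# Hypothetical six-actor network used only for illustration.
EXAMPLE_EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]
adj = build_graph(EXAMPLE_EDGES)

print(satisfies_n_clique(adj, {1, 2, 3, 4, 5}, 2))  # True: 4 and 5 are linked via 6
print(satisfies_n_club(adj, {1, 2, 3, 4, 5}, 2))    # False: without 6 they are 3 hops apart
print(satisfies_n_club(adj, {2, 3, 4, 5, 6}, 2))    # True: the induced 5-cycle has diameter 2
```

Checking a single candidate set in this way is cheap. The difficulty noted above comes from searching among the exponentially many candidate sets for the maximal ones.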
This challenge restricts the uses of SNA.

In 1994, Adleman first solved an instance of the Hamiltonian path problem, a well-known NP-complete problem, using deoxyribonucleic acid (DNA) computation, which draws on DNA, biochemistry and molecular biology. In this excellent work, a solution space is generated by encoding candidate solutions as DNA strands. Biochemical procedures then filter the infeasible solutions out of the space in parallel, leaving the feasible ones. Massive parallelism and vast memory allow DNA-based approaches to tackle huge NP-complete problems. In addition, Lipton (1995) proposed DNA experiments to solve the satisfiability (SAT) problem. In 1997, Ouyang et al. pioneered the solution of the maximal clique problem using a DNA-based approach. Although their proposed algorithm is performed efficiently in O(n), its extensions to the maximal N-clique, maximal N-clan and maximal N-club problems mix silicon-based and biochemical computation, which leads to a heavy computational load and to labor-intensive, error-prone procedures.

doi:10.1016/j.biosystems.2010.05.006

This paper revises the idea proposed by Ouyang et al. based on Adleman’s approach, and aims to solve the problems of finding the maximal N-clique, maximal N-clan and maximal N-club. In addition to establishing their accuracy, we will prove the algorithms to be efficient. The remainder of this paper is organized as follows: Section 2 provides