© 2000 Macmillan Magazines Ltd

“How often have I said that when you eliminate the impossible, whatever remains, however improbable, must be the truth?” exhorted the great sleuth¹. The principle of arriving at the truth by elimination is ancient; but on page 175 of this issue, Liu et al.² report a new technique for massively parallel elimination, which harnesses the power of DNA chemistry and biotechnology to solve a particularly difficult problem in mathematical logic.

The difficulty of finding solutions to mathematical problems is classified by the speed at which the best algorithm can compute their solutions. ‘Easy’ problems have algorithms with ‘running times’ that scale as a polynomial function of the number of variables (polynomial time or P problems). There is also a class of problems characterized by proofs that are easy to verify (non-deterministic polynomial time or NP problems), such as the famous travelling salesman problem. In the worst case, the best known algorithms for ‘hard’ NP problems have running times that grow exponentially with the number of variables. For example, no polynomial-time algorithm is known for finding a factor of a given natural number N, but verifying that another number d is a factor of N is easy. Computer scientists have been intensively studying whether sequential algorithms can solve all NP problems in polynomial time, but the answer is still unknown.

In 1994, Leonard Adleman³ shocked the computing world by presenting a DNA-based polynomial-time method for the Hamilton path problem (Fig. 1a), the problem of finding an airline flight path between several cities on a map such that each city is visited exactly once. This NP problem is known to be one of the hardest. To achieve the small computation time, Adleman traded space (the amount of DNA needed) for time (the number of biochemical steps to be used). His key insight was that cities on a map, and paths between pairs of cities, may be encoded in strands of DNA.
Millions of DNA strands, diffusing in a liquid, can self-assemble into all possible flight-path configurations, from which a judicious series of molecular manoeuvres can fish out the correct solution. Adleman, combining elegance with brute force, could isolate the one true solution out of many possibilities.

Every NP problem can be seen as the search for a solution that simultaneously satisfies a number of logical clauses, each composed of three variables (which can be true or false), connected by ‘or’ statements: for example (x₁ OR x₂ OR NOT x₃) AND (NOT x₄ OR x₅ OR NOT x₆). This particular problem, known as 3-SAT, is therefore among the hardest of all NP problems. Liu et al.² show how to solve a simple case of 3-SAT in a reasonable amount of time by using a brute-force search made possible by the parallel nature of their DNA computing techniques.

They begin with a string of binary digits representing the values of the variables in a given 3-SAT formula. Such a binary string can be represented by a unique sequence of nucleotides in single-stranded DNA; for example, TGCGG might stand for 001. For n variables, there are 2ⁿ unique answer (or Watson) strands, so for three variables you need eight Watson strands. For each Watson strand, there is also a complementary Crick strand created by the base-pairing rule: A bonds to T, and C bonds to G. The goal is to identify those strings out of a library of eight that satisfy all the clauses of a particular 3-SAT formula (Fig. 1b).

Liu et al.² first immobilized the Watson DNA strings corresponding to all candidate solutions on a specially treated gold surface. Next they added all possible Crick strands that will stick to a Watson string satisfying the first clause. Such pairing creates double-stranded DNA. The remaining single-stranded molecules are those that do not satisfy the first clause, and these are destroyed by enzymes.
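
The encoding and the first marking cycle described so far can be mimicked in a few lines of Python. This is only an illustrative sketch, not the authors' protocol: the eight five-base words are those listed in Fig. 1b, while the function names (`crick`, `satisfies`, `mark_and_destroy`) and the representation of a clause as a list of (variable index, required bit) pairs are assumptions made for the example. Real hybridization pairs antiparallel strands; the sketch ignores strand orientation and complements base by base.

```python
# Watson strands from Fig. 1b: each assignment of (x1, x2, x3) maps to a 5-base word.
WATSON = {
    "000": "ATGCC", "001": "TGCGG", "010": "AAGCG", "011": "CCTAT",
    "100": "TAGAC", "101": "GGATT", "110": "CTTCG", "111": "GTAAT",
}

# Base-pairing rule from the text: A bonds to T, and C bonds to G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def crick(strand):
    """Return the base-by-base complement (orientation ignored in this sketch)."""
    return "".join(PAIR[base] for base in strand)


def satisfies(bits, clause):
    """A clause is a list of (variable index, required bit) literals joined by OR."""
    return any(bits[index] == wanted for index, wanted in clause)


def mark_and_destroy(surface, clause):
    """Run one cycle: protect strands that satisfy the clause, remove the rest."""
    # Crick strands are added for every assignment that satisfies the clause.
    added = {crick(WATSON[bits]) for bits in WATSON if satisfies(bits, clause)}
    # A Watson strand survives only if its complement was added (it is double-stranded).
    return {watson for watson in surface if crick(watson) in added}


first_clause = [(0, "1"), (1, "1"), (2, "0")]  # x1 OR x2 OR NOT x3
survivors = mark_and_destroy(set(WATSON.values()), first_clause)
print(sorted(survivors))
```

With the first clause of Fig. 1b, the only strand removed is TGCGG, the word for 001, which is the single assignment that fails that clause.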
The surface is then heated to melt away the complementary strands and washed, and a fresh collection of Crick strands is paired with strings satisfying the second clause. This cycle is repeated for each of the remaining clauses. At the end, only those strands whose sequence satisfies the original formula survive.

In this system, the DNA ‘answers’ are attached randomly to the surface (rather than in an ordered array) so, to read out the answer, the surviving strands first have to be amplified using the polymerase chain reaction. Their identities are then determined by pairing with an ordered array of strings identical to the original set of sequences. Not counting the number of steps required to produce the DNA molecules in the first place, the algorithm takes only (3k+1) steps, where k is the number of clauses, for a brute-force evaluation of all 2ⁿ possible answers. This represents a remarkable improvement over the best conventional

NATURE | VOL 403 | 13 JANUARY 2000 | www.nature.com | 143

news and views

DNA computing on a chip
Mitsunori Ogihara and Animesh Ray

In a DNA computer, the input and output are both strands of DNA. A computer in which the strands are attached to the surface of a chip can now solve difficult problems quite quickly.

[Figure 1, panel a: graph with nodes numbered 1 to 7. Panel b: binary strings and the DNA strings attached to the surface, as listed below.]

Binary string (x₁ x₂ x₃)   DNA string   Strand number
0 0 0                      ATGCC        1
0 0 1                      TGCGG        2
0 1 0                      AAGCG        3
0 1 1                      CCTAT        4
1 0 0                      TAGAC        5
1 0 1                      GGATT        6
1 1 0                      CTTCG        7
1 1 1                      GTAAT        8

Figure 1 The parallel power of DNA computing. a, An example of the Hamilton path problem solved by Adleman³. Can you go from node 1 to node 7 using only the paths shown such that you visit all the nodes exactly once? The answer is yes. b, Among the hardest of such computationally difficult, or NP, problems is 3-SAT. To find a solution to the 3-SAT problem defined by the two clauses (x₁ OR x₂ OR NOT x₃) AND (NOT x₁ OR x₂ OR NOT x₃), Liu et al.² attach DNA strings encoding all possible answers to a specially treated surface.
DNA strands complementary to the answers that satisfy the first clause are added to the solution, and stick to strands numbered 1 and 3–8. Strand 2, which remains single-stranded, is destroyed by enzymes. The complementary strands are removed and the surface is washed. The cycle is repeated for the second clause, which results in the destruction of strand 6. The identities of the remaining strands are read out to give the correct solutions to the problem: 000, 010, 011, 100, 110, 111.
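
The example in Fig. 1b can be cross-checked in software by running the same elimination over all 2ⁿ bit strings, one clause at a time. The sketch below is a plain simulation under stated assumptions rather than the authors' method: the function name `surface_search` and the clause encoding are invented for illustration, and the step counter assumes that the three steps per clause are the pairing, enzymatic digestion and melt-and-wash operations, followed by a single readout, which gives the (3k+1) total quoted in the text.

```python
from itertools import product

# The two clauses of Fig. 1b as lists of (variable index, required value) literals:
# (x1 OR x2 OR NOT x3) AND (NOT x1 OR x2 OR NOT x3)
CLAUSES = [
    [(0, 1), (1, 1), (2, 0)],
    [(0, 0), (1, 1), (2, 0)],
]


def surface_search(n_vars, clauses):
    """Eliminate candidate answers clause by clause, counting laboratory steps."""
    # Start with every possible assignment attached to the 'surface'.
    surviving = list(product((0, 1), repeat=n_vars))
    steps = 0
    for clause in clauses:
        # Pairing: answers satisfying the clause become double-stranded.
        protected = [a for a in surviving if any(a[i] == v for i, v in clause)]
        steps += 1
        # Digestion: answers left single-stranded are destroyed.
        surviving = protected
        steps += 1
        # Melting and washing: complements are removed before the next clause.
        steps += 1
    # Readout of the surviving strands (assumed to count as one step).
    steps += 1
    return ["".join(str(bit) for bit in a) for a in surviving], steps


solutions, steps = surface_search(3, CLAUSES)
print(solutions)  # ['000', '010', '011', '100', '110', '111']
print(steps)      # 7
```

The simulation visits the candidates one after another, so its running time still grows as 2ⁿ; on the surface, every strand is tested at the same time, which is why only the number of clauses enters the step count.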