Physica A 264 (1999) 570–580

Random aggregation models for the formation and evolution of coding and non-coding DNA

A. Provata

Institute of Physical Chemistry, NRCPS Demokritos, 15310, Athens, Greece

Received 28 July 1998

Abstract

A random aggregation model with influx is proposed for the formation of the non-coding DNA regions via random co-aggregation and influx of biological macromolecules such as viruses, parasite DNA, and replication segments. The constant mixing (transpositions) and influx drive the system into an out-of-equilibrium steady state characterised by a power law size distribution. The model predicts the long range distributions found in non-coding eucaryotic DNA and explains the observed correlations. For the formation of coding DNA, a random closed aggregation model is proposed, which predicts short range coding size distributions. The closed aggregation process drives the system into an almost "frozen" stable state which is robust to external perturbations and which is characterised by well defined space and time scales, as observed in coding sequences. © 1999 Elsevier Science B.V. All rights reserved.

PACS: 02.50; 05.40; 87.10

Keywords: Power law; Long range correlations; Coding/non-coding DNA sequences; Out-of-equilibrium steady state; Aggregation

1. Introduction

Recent studies on the structure of DNA macromolecules have revealed the existence of unexpected long range correlations in the non-coding part of higher eucaryotic DNA, while the coding part of most organisms seems to be statistically uncorrelated [1–7]. The long range correlations were first demonstrated via the "DNA walk" model proposed by Peng et al. in 1992 [1]. Later, the same conclusions were reached by examining the size distribution of purine and pyrimidine clusters in pure coding and non-coding regions of different organisms [2,3].

* E-mail: aprovata@limnos.nrcps.ariadne-t.gr.

0378-4371/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0378-4371(98)00546-9
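
The two diagnostics mentioned in the Introduction, the "DNA walk" and the size distribution of purine and pyrimidine clusters, can be illustrated with a minimal sketch. It is not taken from the paper: the function names `dna_walk` and `cluster_sizes` are hypothetical, the sign convention (+1 for a pyrimidine, -1 for a purine) is an assumption, and sequences are assumed to contain only the letters A, C, G and T.

```python
from itertools import groupby

PURINES = frozenset("AG")      # adenine, guanine
PYRIMIDINES = frozenset("CT")  # cytosine, thymine


def classify(base):
    """Return 'R' for a purine and 'Y' for a pyrimidine."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError(f"unexpected base: {base!r}")


def dna_walk(seq):
    """Cumulative displacement of a one-dimensional walk along the sequence.

    Assumed convention: step +1 for a pyrimidine, -1 for a purine.
    """
    position = 0
    walk = []
    for base in seq.upper():
        position += 1 if classify(base) == "Y" else -1
        walk.append(position)
    return walk


def cluster_sizes(seq):
    """Lengths of maximal runs of consecutive purines or pyrimidines."""
    labels = (classify(base) for base in seq.upper())
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]
```

For a real sequence, a histogram of the run lengths returned by `cluster_sizes` gives the kind of cluster size distribution discussed in the text, and the fluctuations of `dna_walk` over windows of increasing length are what a correlation analysis would examine.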