Six (one archaean and five eukaryotic) protein families have a similar domain architecture that includes a central globular Brix domain, and optional N- and obligatory C-terminal segments, both with charged low-complexity regions. Biological data for some proteins in this superfamily suggest a role in ribosome biogenesis and rRNA binding.

The biogenesis of ribosomes is a multistep process requiring dozens of proteins and small nucleolar RNAs as essential supporting factors [1]. Recently, a previously uncharacterized transcript isolated from stage 11 Xenopus laevis embryos and later named Brix (biogenesis of ribosomes in Xenopus, AF319877) was linked with the pathway of ribosome formation. In HeLa cells transiently transfected with a cDNA encoding a green fluorescent protein–Brix fusion protein, this protein was found exclusively in nucleoli and coiled (Cajal) bodies, showing complete colocalization with fibrillarin. Binding assays indicate an association of Brix with rRNAs (preferentially of the large ribosomal subunit) but not with U3 or U8 snoRNAs. Additionally, the knockout of yol077c/brx1 (S66770), the yeast orthologue of Brix, was lethal and displayed defects in the synthesis of large ribosomal subunits. Publication of these experimental data is currently in preparation (Kaser, A., Bogengruber, E. et al., unpublished).

Six families of homologous proteins with similar domain architecture

A thorough analysis of the Brix protein sequence (339 residues) yielded several surprising findings. We applied a homologue-searching strategy to the non-redundant protein sequence database, combining iterative profile searches with the PSI-BLAST tool (with the standard sequence inclusion condition E<0.002 and filters for compositional bias [2]) and a fan-like search heuristic (starting individual new searches with each of the significant hits). In this way, we collected a closed set of ~50 protein sequences.
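The fan-like search heuristic described above can be read as a traversal that keeps launching new searches from every significant hit until no new sequences appear. The following is a minimal sketch of that reading only, not the authors' actual pipeline: the `search` callable, the `collect_closed_set` function and the toy hit table are hypothetical stand-ins for real PSI-BLAST runs, and only the inclusion threshold E<0.002 is taken from the text.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

E_VALUE_CUTOFF = 0.002  # inclusion threshold quoted in the text

Hit = Tuple[str, float]  # (sequence identifier, E-value)


def collect_closed_set(
    seed: str,
    search: Callable[[str], Iterable[Hit]],
    cutoff: float = E_VALUE_CUTOFF,
) -> Set[str]:
    """Start a new search from every significant hit until no new sequence is found."""
    collected = {seed}
    queue = deque([seed])
    while queue:
        query = queue.popleft()
        for hit_id, e_value in search(query):
            # Only significant, previously unseen hits seed further searches.
            if e_value < cutoff and hit_id not in collected:
                collected.add(hit_id)
                queue.append(hit_id)
    return collected


# Hypothetical toy data standing in for database search results.
TOY_HITS: Dict[str, List[Hit]] = {
    "seqA": [("seqB", 1e-30), ("seqX", 0.5)],
    "seqB": [("seqA", 1e-28), ("seqC", 1e-5)],
    "seqC": [("seqB", 1e-6)],
    "seqX": [("seqY", 1e-40)],
}


def toy_search(query: str) -> List[Hit]:
    """Hypothetical search function: looks up precomputed hits for a query."""
    return TOY_HITS.get(query, [])


family = collect_closed_set("seqA", toy_search)
print(sorted(family))  # ['seqA', 'seqB', 'seqC']
```

In this toy run, seqC is reached only through seqB, which is the point of restarting searches from each hit; seqX is never included because its only link to the seed lies above the threshold. The set is "closed" in the sense that a further search from any member adds nothing new.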
The common region of similarity among all superfamily members includes a sequence domain of 150–180 residues in length that we propose to call the Brix domain (Fig. 1, Brix homepage http://mendel.imp.univie.ac.at/SEQUENCES/BRIX/). Based on the degree of sequence similarity, the superfamily can be divided into six families: one archaean family (I) including hypothetical proteins (one per genome); and five eukaryote families, each named according to a representative member and including close homologues of this prototype: (II) Peter Pan (D. melanogaster) and Ssf1/2 (S. cerevisiae); (III) yhr088wp/Rpf1p (S. cerevisiae); (IV) IMP4 (S. cerevisiae); (V) brix (X. laevis) and yol077c/brx1 (S. cerevisiae); and (VI) ykr081cp (S. cerevisiae).

In database searches, each sequence identifies members of its own family first and encounters the first non-family hit only after a gap in E-value of many orders of magnitude. The nature of the first non-family hit reveals the relative positions of the families in sequence space and possible evolutionary relationships: groups I and IV are closely related; searches with sequences from groups III and V find members of the IMP4 family (IV) to be the closest non-family neighbours; and similarly, groups II and VI are linked to groups III and V, respectively. Interestingly, each of the extensively sequenced eukaryote genomes (baker’s and fission yeasts, worm, fly, human and Arabidopsis thaliana) has at least one representative in each of the five eukaryote families.

We compared the Brix domain with available domain libraries and found that the PFAM entry PF01945 (Ref. 3) unifies groups I, III and IV into one class but does not recognize the other three groups. Sequences of subfamily II have, except for the L. major sequence, already been collected by Migeon et al. [4]. While this work was in press, an independently collected subset of the Brix domain family, including members from all six groups, was published by Mayer et al.
[5]

Typically, a protein sequence belonging to the Brix domain superfamily contains a highly charged N-terminal segment (50 residues) followed by a single copy of the Brix domain and another highly charged C-terminal region (100 residues). Generally, positive charges are more frequent (the content of Lys and Arg is usually >20%). In the regions flanking the Brix domain, low-complexity segments are common [6]. The archaean sequences have two unique characteristics: (1) the charged regions are totally absent at the N-terminus and are reduced to 10 residues at the C-terminus; and (2) the C-terminal part of the Brix domain itself is minimal.

Two eukaryote groups have large insertions within the C-terminal region: 70 residues in group III (yhr088wp/Rpf1p-like proteins) and 120 residues in the Peter Pan group II (Fig. 1). This finding is in agreement with the results of similarity searches suggesting that group III is the closest neighbour of group II. Three worm sequences [T32923 (family II), and T19409 and P54073 (both family III)] are much longer (700 residues), in contrast to the strong conservation of sequence architecture in other species; it is possible that these are in part gene prediction or genome assembly artifacts. Notably, the hypothetical human protein CAB77112.1 (CAC18877.1, group II) also has a similarly excessive length [7]. Additional information on Brix family members, family collection history, analysis of terminal charged regions and sequence artifacts is given in detail on the Brix homepage (http://mendel.imp.univie.ac.at/SEQUENCE/BRIX/).

with a suggested role in ribosomal biogenesis

More than 80% of all superfamily members (including essentially all family I, III and VI proteins) lack any functional

TRENDS in Biochemical Sciences Vol.26 No.6 June 2001 http://tibs.trends.com 0968-0004/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved.
PII: S0968-0004(01)01851-5

Research Update
Protein Sequence Motif

The Brix domain protein family – a key to the ribosomal biogenesis pathway?

Frank Eisenhaber, Christian Wechselberger and Günther Kreil