Six (one archaean and five eukaryotic)
protein families have similar domain
architecture that includes a central globular
Brix domain, and optional N- and
obligatory C-terminal segments, both with
charged low-complexity regions. Biological
data for some proteins in this superfamily
suggest a role in ribosome biogenesis and
rRNA binding.
The biogenesis of ribosomes is a multistep
process requiring dozens of proteins and
small nucleolar RNAs as essential
supporting factors [1]. Recently, a
previously uncharacterized transcript
isolated from stage 11 Xenopus laevis
embryos and later named Brix (biogenesis
of ribosomes in Xenopus, AF319877) was
linked with the pathway of ribosome
formation. In HeLa cells transiently
transfected with a cDNA encoding a green
fluorescent protein–Brix fusion protein,
this protein was found exclusively in
nucleoli and coiled (Cajal) bodies,
showing complete colocalization with
fibrillarin. Binding assays indicate an
association of Brix with rRNAs
(preferentially of the large ribosomal
subunit) but not with U3 or U8 snoRNAs.
Additionally, the knockout of yol077c/brx1
(S66770), the yeast orthologue of Brix,
was lethal and displayed defects in the
synthesis of large ribosomal subunits.
Publication of these experimental data is
currently in preparation (Kaser, A.,
Bogengruber, E. et al., unpublished).
Six families of homologous proteins with
similar domain architecture…
A thorough analysis of the Brix protein
sequence (339 residues) yielded several
surprising findings. After applying a
homologue-searching strategy in the
non-redundant protein sequence
database including iterative profile
searches with the PSI-BLAST tool
(with standard sequence inclusion
condition E<0.002 and filters for
compositional bias [2]) and a fan-like search
heuristic (starting individual new
searches with each of the significant hits),
we collected a closed set of ~50 protein
sequences. The common region of
similarity among all superfamily members
includes a sequence domain of 150–180
residues in length that we propose to call the
Brix domain (Fig. 1, Brix homepage
http://mendel.imp.univie.ac.at/
SEQUENCES/BRIX/).
Based on the degree of sequence
similarity, the superfamily can be divided
into six families: one archaean family (I)
including hypothetical proteins (one per
genome); and five eukaryote families, each
named according to a representative
member and including close homologues of
this prototype: (II) Peter Pan
(D. melanogaster) and Ssf1/2 (S. cerevisiae);
(III) yhr088wp/Rpf1p (S. cerevisiae);
(IV) IMP4 (S. cerevisiae); (V) brix
(X. laevis) and yol077c/brx1 (S. cerevisiae);
and (VI) ykr081cp (S. cerevisiae). In
database searches, each sequence was
found to identify its own family first and
then, after a gap in E-value of many orders
of magnitude, to encounter the first
non-family hit. The nature of the first
non-family hit reveals the relative
positions of the families in the sequence
space and possible evolutionary
relationships: groups I and IV are closely
related; searches with sequences from
groups III and V find members of the IMP4
family (IV) to be the closest non-family
neighbours; and similarly, groups II and VI
get linked to groups III and V, respectively.
Interestingly, each of the extensively
sequenced eukaryote genomes (baker’s
and fission yeasts, worm, fly, human and
Arabidopsis thaliana) has at least one
representative in each of the six families.
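The "first non-family hit" criterion used above to place the families relative to one another can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name, the hit identifiers and all E-values are hypothetical, and only the family labels (I to VI) follow the text.

```python
from typing import Dict, List, Optional, Tuple


def first_non_family_hit(
    query_family: str,
    hits: List[Tuple[str, float]],
    family_of: Dict[str, str],
) -> Optional[Tuple[str, str, float]]:
    """Return (hit_id, family, e_value) of the best hit outside the
    query's own family, or None if every hit is a family member."""
    for hit_id, e_value in sorted(hits, key=lambda h: h[1]):
        fam = family_of.get(hit_id)
        if fam is not None and fam != query_family:
            return hit_id, fam, e_value
    return None


# Invented example: a family V query whose own family scores far
# better than the nearest outside family (IV).
FAMILY_OF = {"brx1": "V", "brix_hs": "V", "imp4": "IV", "rpf1": "III"}
HITS = [("imp4", 1e-9), ("brx1", 1e-60), ("rpf1", 1e-4), ("brix_hs", 1e-70)]

neighbour = first_non_family_hit("V", HITS, FAMILY_OF)
print(neighbour)
```

Repeating this for one query per family yields the linkage pattern described in the text (for example, family V pointing to family IV).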
We compared the Brix domain with
available domain libraries and found
that the PFAM entry PF01945 (Ref. 3)
unifies groups I, III and IV into one class
but does not recognize the other three
groups. Sequences of family II have,
except for the L. major sequence,
already been collected by Migeon et al. [4].
While this work was in press, an
independently collected subset of the
Brix domain family, including members
from all six groups, was published
by Mayer et al. [5].
Typically, a protein sequence
belonging to the Brix domain superfamily
contains a highly charged N-terminal
segment (∼50 residues) followed by a
single copy of the Brix domain and
another highly charged C-terminal
region (∼100 residues). Generally,
positive charges are more frequent (the
content of Lys and Arg is usually >20%).
In the regions flanking the Brix domain,
low-complexity segments are common [6].
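The charge criterion mentioned above (combined Lys and Arg content usually >20% in the terminal segments) is simple to check for any sequence. The sketch below is an illustration only; the function name and the example segment are invented and do not come from a real Brix family protein.

```python
def basic_fraction(segment: str) -> float:
    """Fraction of Lys (K) and Arg (R) residues in a protein segment
    given in one-letter code."""
    segment = segment.upper()
    if not segment:
        return 0.0
    return sum(segment.count(aa) for aa in "KR") / len(segment)


# Invented 20-residue example of a charged, low-complexity tail.
toy_tail = "KKRSEEKRKAAKKDELRKKG"
is_highly_basic = basic_fraction(toy_tail) > 0.20
print(round(basic_fraction(toy_tail), 2), is_highly_basic)
```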
The archaean sequences have two unique
characteristics: (1) the charged regions
are totally absent at the N-terminus and
are reduced to ∼10 residues at
the C-terminus; and (2) the C-terminal
part of the Brix domain itself is minimal.
Two eukaryote groups have large
insertions within the C-terminal region:
∼70 residues in group III
(yhr088wp/Rpf1p-like proteins) and ∼120
residues in the Peter Pan group II (Fig. 1).
This finding is in agreement with the
results of similarity searches suggesting
that group III is the closest neighbour of
group II. Three worm sequences [T32923
(family II), and T19409 and P54073
(both family III)] are much longer
(∼700 residues) in contrast to the strong
conservation of sequence architecture
between other species; it is possible that
these are in part gene prediction or
genome assembly artifacts. Notably,
the hypothetical human protein
CAB77112.1 (CAC18877.1, group II) also
has a similarly excessive length [7].
Additional information on Brix family
members, family collection history,
analysis of terminal charged regions
and sequence artifacts is given in great
detail on the Brix homepage
(http://mendel.imp.univie.ac.at/
SEQUENCE/BRIX/).
…with a suggested role in ribosomal
biogenesis
More than 80% of all superfamily
members (including essentially all family
I, III and VI proteins) lack any functional
TRENDS in Biochemical Sciences Vol.26 No.6 June 2001
http://tibs.trends.com 0968-0004/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0968-0004(01)01851-5
345 Research Update
Protein Sequence Motif
The Brix domain protein family – a key to the ribosomal
biogenesis pathway?
Frank Eisenhaber, Christian Wechselberger and Günther Kreil