Journal of Biotechnology 78 (2000) 221 – 234
The role SWISS-PROT and TrEMBL play in the genome research environment
Vivien Junker *, Sergio Contrino, Wolfgang Fleischmann, Henning Hermjakob,
Fiona Lang, Michele Magrane, Maria Jesus Martin, Nicoletta Mitaritonna,
Claire O’Donovan, Rolf Apweiler
EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
Received 1 February 1999; accepted 5 July 1999
Abstract
SWISS-PROT, a curated protein sequence data bank, contains not only sequence data but also annotation relevant to each sequence. This annotation is added by a team of biologists and comes primarily from journal articles reporting the sequencing, and sometimes the characterisation, of the protein. Review articles and collaboration with external experts also contribute, as do secondary databases such as PROSITE and Pfam and a variety of feature prediction methods. Annotation derived from these methods is checked for its relevance and likely applicability to the particular sequence. The onset of genome sequencing has led to a dramatic increase in the amount of sequence data to be included in SWISS-PROT, which prompted the production of TrEMBL (Translation of the EMBL database). TrEMBL consists of entries in SWISS-PROT format derived from the translation of all coding sequences in the EMBL nucleotide sequence database that are not in SWISS-PROT. Unlike SWISS-PROT entries, those in TrEMBL are awaiting manual annotation. However, rather than representing only basic sequence and source information, TrEMBL entries are given features and annotation added automatically. It is hoped that these steps enhance TrEMBL entries with some indication of what a protein is, or could be. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Genome sequence data; Annotation; Automation; SWISS-PROT; TrEMBL
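The TrEMBL production step summarised in the abstract (translate every coding sequence, keep only those not in SWISS-PROT, then add features automatically) can be illustrated with a minimal sketch. It is not the actual TrEMBL pipeline. The function names, the toy inputs, the exact-sequence-match redundancy filter and the simplified flat-file layout (ID, FT, SQ and // lines only, no column-exact formatting) are all illustrative assumptions. Feature prediction is represented by scanning the translation with a pattern written in basic PROSITE syntax, here the N-glycosylation motif N-{P}-[ST]-{P}, and each hit is marked POTENTIAL so that a curator can later check it, as the abstract describes for SWISS-PROT.

```python
import re

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first in-frame stop codon."""
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def prosite_to_regex(pattern: str) -> str:
    """Convert basic PROSITE syntax (x, [..], {..}, (n), (n,m)) to a regex.

    Terminal anchors ('<', '>') and other extensions are not handled here.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) positions of all, possibly overlapping, matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1))) for m in regex.finditer(sequence)]


def build_trembl_like_entries(cds_records, swissprot_sequences, motifs):
    """Hypothetical sketch: translate each CDS, skip translations already present
    in SWISS-PROT (exact match, a simplifying assumption) and add predicted
    features flagged as POTENTIAL, awaiting manual curation."""
    entries = []
    for entry_id, cds in cds_records:
        protein = translate(cds)
        if not protein or protein in swissprot_sequences:
            continue
        lines = [f"ID   {entry_id}"]
        for feature_key, pattern in motifs.items():
            for start, end in scan(protein, pattern):
                lines.append(f"FT   {feature_key:<8} {start:>6} {end:>6}  POTENTIAL.")
        lines.append(f"SQ   SEQUENCE   {len(protein)} AA;")
        lines.extend("     " + protein[i:i + 60] for i in range(0, len(protein), 60))
        lines.append("//")
        entries.append("\n".join(lines))
    return entries


if __name__ == "__main__":
    # Hypothetical toy inputs, not real EMBL or SWISS-PROT data.
    cds_records = [("TOY1", "ATGAACGTTTCTGCGTAA"), ("TOY2", "ATGGCTGCTTAA")]
    known = {"MAA"}  # pretend TOY2's translation is already in SWISS-PROT
    motifs = {"CARBOHYD": "N-{P}-[ST]-{P}."}  # N-glycosylation motif, PROSITE syntax
    for entry in build_trembl_like_entries(cds_records, known, motifs):
        print(entry)
```

With these toy inputs only TOY1 yields an entry: its translation MNVSA contains one N-glycosylation motif match (residues 2 to 5), while TOY2 is skipped because its translation is treated as already present in SWISS-PROT.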
www.elsevier.com/locate/jbiotec
1. Introduction
SWISS-PROT (Bairoch and Apweiler, 1999) is a curated protein sequence data bank that strives to meet the key requirements of a public sequence database: a high level of annotation (such as descriptions of protein function, post-translational modifications and variants), a minimal level of redundancy, and a high level of integration with other databases. TrEMBL (Translation of EMBL) (Bairoch and Apweiler, 1999) is a computer-annotated supplement to
* Corresponding author. Fax: +44-1223-494472.
E-mail address: junker@ebi.ac.uk (V. Junker)
0168-1656/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII:S0168-1656(00)00198-X