These are the results for the example in the "Example of an E-Mail Request" section. Each plot is also available as a PDF version at higher resolution.
Requesting e-mail message
This shows the e-mail message as it would be composed by the user. The
WWW interface generates an equivalent e-mail message internally, but
the user sees it only as an attachment to the acknowledgement message.
To: psa-request@darwin.bu.edu
Subject: Seq 23
Analysis-assumptions: monomeric-soluble
; psa-plot-format: postscript
; Wilson Brandlesnarf
; BMERC
; Boston MA
; 617-353-7123
Sequence 23
GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
EWLVNHIKGTDFKYKGKL
Regards, Wilson
This is the same message as illustrated in the "Example of an E-Mail Request"
section; see there for an explanation of the syntax of e-mail messages.
The first three lines are part of the e-mail header, which will probably
look different on your system, since every program for composing e-mail
displays it in its own way.
The acknowledgement also includes the request ID assigned by the server
upon receipt, and an indication of the server queue size. If the sequence
length and/or label are not as expected, the server may have had
trouble parsing the message; in that case, please recheck the message and try again.
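The check described above can be sketched in Python. This is only an illustration: the function names and the regular expression are assumptions based on the wording of the example acknowledgement on this page, and are not part of the server software.

```python
import re

# Assumed pattern, derived only from the example acknowledgement text.
ACK_PATTERN = re.compile(
    r'sequence of (?P<length>\d+) residues labelled "(?P<label>[^"]+)"'
    r'.*?request number (?P<request>\d+)'
)

def parse_acknowledgement(body):
    """Extract length, label, and request ID from an acknowledgement body."""
    text = " ".join(body.split())  # undo line wrapping inside the paragraph
    m = ACK_PATTERN.search(text)
    if m is None:
        return None
    return {"length": int(m["length"]),
            "label": m["label"],
            "request": int(m["request"])}

def matches_submission(ack, label, sequence_lines):
    """True if the acknowledged length and label agree with what was sent."""
    residues = "".join("".join(line.split()) for line in sequence_lines)
    return (ack is not None
            and ack["length"] == len(residues)
            and ack["label"] == label)

ACK_BODY = """We have received your request dated "Wed, 18 Nov 1998 17:52:38 -0500"
containing an amino acid sequence of 118 residues labelled "Sequence
23" for a protein structure analysis run; it has been queued as
request number 14756. There are no requests ahead of it in the queue."""

SUBMITTED = [
    "GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV",
    "TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK",
    "EWLVNHIKGTDFKYKGKL",
]

ack = parse_acknowledgement(ACK_BODY)
print(ack, matches_submission(ack, "Sequence 23", SUBMITTED))
```

If the comparison fails, the message should be rechecked before resubmitting.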
This section covers the first e-mail message returned to the user
when the analysis is complete. Since it is fairly large, we break it
into pieces for purposes of discussion; the full text of the cover letter is available separately.
After the initial "announcement" paragraph, there are several
paragraphs explaining the other messages and how to view the plots; we
have omitted those here.
Finally, the transcript from the compute engine is included.
E-mail acknowledgement from the server
The acknowledgement consists mostly of an echo of the original mail
message (together with whatever e-mail headers were added in transit).
From: psa@darwin.bu.edu (Protein Structure Analysis server)
To: wb@darwin.bu.edu
Subject: Received request 14756: [Seq 23]
Date: Wed, 18 Nov 1998 17:52:50 -0500
We have received your request dated "Wed, 18 Nov 1998 17:52:38 -0500"
containing an amino acid sequence of 118 residues labelled "Sequence
23" for a protein structure analysis run; it has been queued as
request number 14756. There are no requests ahead of it in the queue.
--------------------------- Original message ---------------------------
Date: Wed, 18 Nov 1998 17:52:38 -0500
Message-Id: <199811182252.RAA15950@gamow>
From: Wilson Brandlesnarf <wb@darwin.bu.edu>
To: psa-request@darwin.bu.edu
Subject: Seq 23
Analysis-assumptions: monomeric-soluble
; psa-plot-format: postscript
; Wilson Brandlesnarf
; BMERC
; Boston MA
; 617-353-7123
Sequence 23
GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
EWLVNHIKGTDFKYKGKL
Regards, Wilson
E-mail results cover letter
From: psa@darwin.bu.edu (Protein Structure Analysis server)
To: wb@darwin.bu.edu
Subject: Request 14756 result (1 of 4): [Seq 23]
Date: Wed, 18 Nov 1998 17:55:15 -0500
The analysis of your protein sequence has been completed. A search of
the Protein Data Bank, using Blast, indicates that your sequence is
similar to the proteins 1A7D (length 118), 2MHR (length 118), 1A7E
(length 118), 1HRB (length 113), 2HMQA (length 113), 2HMZA (length
113), 1HMDA (length 113), and 1HMOA (length 113), which all have known
structures. The following analysis results were generated without
reference to these known structures or any of their known homologs.
Note how the server has caught the fact that we have submitted a
sequence of known structure in order to test the server;
"Sequence 23" is in fact the sequence of PDB locus 2mhr.
------------------------------- Sequence -------------------------------
; This is the actual sequence used.
Sequence 23
GWEIP EPYVW DESFR VFYEQ LDEEH KKIFK GIFDC IRDNS APNLA TLVKV
TTNHF THEEA MMDAA KYSEV VPHKK MHKDF LEKIG GLSAP VDAKN VDYCK
EWLVN HIKGT DFKYK GKL
Following the text, the sequence is echoed in the form used by the
server software.
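The echoed layout (blocks of five residues, ten blocks per line) can be reproduced with a few lines of Python. This is a sketch inferred from the example above; the function name and its parameters are hypothetical, and the server's own formatting code may differ.

```python
def group_sequence(sequence, block=5, blocks_per_line=10):
    """Format a sequence in space-separated blocks, as in the echoed example."""
    residues = "".join(sequence.split())  # drop any existing whitespace
    blocks = [residues[i:i + block] for i in range(0, len(residues), block)]
    lines = [" ".join(blocks[i:i + blocks_per_line])
             for i in range(0, len(blocks), blocks_per_line)]
    return "\n".join(lines)

RAW = (
    "GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV"
    "TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK"
    "EWLVNHIKGTDFKYKGKL"
)

grouped = group_sequence(RAW)
print(grouped)
```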
------------------------------ Transcript ------------------------------
Analyzing Sequence 23. This is 18-Nov-98 (17:51:3).
Using the Type-1 DSM library mdata11o.
The sequence contains 118 residues.
30 length-compatible Type-1 DSMs are available for analyzing this sequence.
FILTERING RESULTS:
3 Most Probable Super Classes:
1st Superclass alpha has probability 0.41285
2nd Superclass irregular has probability 0.32734
3rd Superclass alpha-beta has probability 0.23807
3 Most Probable Macro Classes:
1st Macroclass apb has probability 0.41285
2nd Macroclass ir has probability 0.32734
3rd Macroclass sab has probability 0.15959
Secondary-Structure Probabilities:
RESIDUE LOOP HELIX TURN STRAND
1 1 0 0 0
2 0.995 0.000 0 0.004
3 0.967 0.012 0 0.021
4 0.953 0.017 0 0.030
5 0.945 0.022 0.001 0.032
6 0.913 0.038 0.005 0.043
7 0.878 0.053 0.005 0.063
8 0.754 0.127 0.007 0.112
9 0.687 0.183 0.010 0.119
10 0.483 0.386 0.022 0.109
11 0.436 0.437 0.077 0.050
12 0.281 0.592 0.090 0.037
13 0.189 0.683 0.091 0.038
14 0.163 0.709 0.076 0.053
15 0.151 0.727 0.024 0.097
16 0.136 0.743 0.012 0.109
17 0.117 0.767 0.012 0.104
18 0.073 0.837 0.016 0.074
19 0.052 0.883 0.014 0.052
20 0.033 0.928 0.016 0.024
. . .
110 0.756 0.161 0.052 0.031
111 0.808 0.104 0.026 0.062
112 0.850 0.055 0.014 0.081
113 0.875 0.030 0.004 0.091
114 0.893 0.018 0.003 0.087
115 0.913 0.011 0.004 0.071
116 0.947 0.003 0.004 0.046
117 0.967 0.001 0.002 0.030
118 1 0 0 0
End of Log file for Sequence 23.
The transcript gives exact values, rather than values read off the plots, for
the three most probable superclasses and macroclasses (shown graphically
in the structural class probability plot) and for the
secondary-structure probabilities (shown graphically in the secondary-structure probability plot). The
superclasses and macroclasses themselves are described in more detail on
the "Description of Type-1 DSMs" page.
For brevity, we omit here the secondary-structure probabilities for residues 21 through 109; they are available in the full transcript.
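For readers who want to extract the class probabilities from a transcript programmatically, the following Python sketch shows one way to do it. The regular expression is an assumption based solely on the line format in the example transcript above, and the function name is hypothetical.

```python
import re

# Assumed line format: "1st Superclass alpha has probability 0.41285"
CLASS_LINE = re.compile(
    r"\d+(?:st|nd|rd|th)\s+(?P<kind>Superclass|Macroclass)\s+"
    r"(?P<name>\S+)\s+has probability\s+(?P<prob>[\d.]+)"
)

def parse_class_probabilities(transcript):
    """Return {'Superclass': {name: p}, 'Macroclass': {name: p}}."""
    result = {"Superclass": {}, "Macroclass": {}}
    for m in CLASS_LINE.finditer(transcript):
        result[m["kind"]][m["name"]] = float(m["prob"])
    return result

SAMPLE = """\
3 Most Probable Super Classes:
1st Superclass alpha has probability 0.41285
2nd Superclass irregular has probability 0.32734
3rd Superclass alpha-beta has probability 0.23807
3 Most Probable Macro Classes:
1st Macroclass apb has probability 0.41285
2nd Macroclass ir has probability 0.32734
3rd Macroclass sab has probability 0.15959
"""

classes = parse_class_probabilities(SAMPLE)
print(classes)
```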
In the structural class probability plot, we see that the alpha
superclass has a probability of about 0.4, the irregular superclass has
a probability above 0.3, the alpha-beta superclass has a probability slightly
above 0.2, and the beta superclass has a probability near zero. This
means that the PSA server is confident that the protein sequence
has properties that tend to rule out the all-beta superclass.
Furthermore, the alpha superclass is more probable than the
other two candidates, but not by much.
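The "near zero" reading for the beta superclass can be cross-checked against the exact transcript values. The sketch below assumes that the superclass probabilities sum to one over all superclasses; that assumption is ours and is not stated in the transcript.

```python
# Exact values for the three most probable superclasses, from the transcript.
top_superclasses = {
    "alpha": 0.41285,
    "irregular": 0.32734,
    "alpha-beta": 0.23807,
}

# Assumption: probabilities over all superclasses sum to one, so whatever
# is left is an upper bound on the beta superclass probability.
remaining = 1.0 - sum(top_superclasses.values())
print(f"probability left for all other superclasses: {remaining:.5f}")
```

Under that assumption, only about 0.02 is left for every superclass not listed, beta included.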
Looking at the macroclass probabilities, we see that the apb (antiparallel bundle) macroclass is
more probable than any other macroclass, but that ir (general irregular) runs a close
second.
The secondary-structure probabilities plot provides a detailed view of secondary-structure
probabilities. Each row corresponds to a different secondary structural
state, and each column corresponds to a different residue position.
The probability of each residue being in each structural state is
depicted using contour lines of constant
probability in increments of 0.1. Areas surrounded by many contour
lines are regions of high probability, while areas outside all of the
contours have probabilities of less than 0.1.
The structural states are shown in four groups, with horizontal lines
drawn for each row that denotes a structural state, and empty rows in
between the groups. The first group has the three beta strand states, labelled "buried", "STRAND", and "exposed".
The four turn states are numbered 1 through 4 from the bottom up
(with the "2" on the "TURN" line implicit). These
denote a tight turn in the secondary structure, and usually appear in
the Type-1 models as part of a beta hairpin structure. Turn states are
comparatively rare in the Type-1 models overall, so their probabilities
are usually low. However, they are shown in order to aid researchers
who have other reasons to suspect that their sequence includes a turn,
for example, evidence of a hairpin. See the
"Summary of Type-1 DSMs" table for more information on the typical
number of turns included in each of the Type-1 models.
Next, the helix-buried and helix-exposed states are displayed
together in the same way as for strands, except that there is no helix
state for "undeclared" exposure. The advantage to showing the
helix-exposed and helix-buried states separately in this format is that
amphipathic helices show up clearly as an easily-recognizable pattern of
high probability that alternates between these two states, with a clear
periodicity of 3.6 residues per helical winding. In this example, the
sequence is predicted to be mostly amphipathic helix, but there are two
breaks in the pattern, so it looks like three helices.
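To see why a periodicity of 3.6 residues produces an alternating buried/exposed pattern, consider an idealized helix in which each residue is rotated 360/3.6 = 100 degrees from the previous one. The Python sketch below is purely illustrative: the function name, the arbitrary phase, and the rule "buried if the residue points toward one fixed side" are assumptions for the example, not the method used by the DSMs.

```python
import math

RESIDUES_PER_WINDING = 3.6  # periodicity quoted in the text

def idealized_exposure(n_residues, phase_deg=0.0):
    """Label positions of an idealized helix as 'buried' or 'exposed'."""
    step = 360.0 / RESIDUES_PER_WINDING  # 100 degrees per residue
    labels = []
    for i in range(n_residues):
        angle = math.radians(phase_deg + i * step)
        # Residues pointing toward the (arbitrary) buried side have cos > 0.
        labels.append("buried" if math.cos(angle) > 0 else "exposed")
    return labels

pattern = idealized_exposure(8)
print(pattern)
```

In the plot, the same alternation shows up as high probability switching between the helix-buried and helix-exposed rows every two residues or so.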
Finally, the loop state appears across the bottom. There is only one
type of loop as far as the DSMs are concerned, so this "group" uses only
one row. The loop state is defined loosely as "none of the above,"
i.e. if it's neither helix nor strand nor turn, it must be a loop. As a
consequence of this definition, the probabilities in each row of the
transcript add up to one (to within rounding). Since the Type-1 DSMs
were constructed to reflect the fact that proteins are roughly half
secondary structure and half loop, the loop probability is usually above
10 percent, and often above 50 percent, especially near the start and
end of the sequence.
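The claim that each transcript row sums to one can be checked directly. The following Python sketch uses a few rows copied from the transcript excerpt; the function name and the tolerance are our own choices, the tolerance allowing for four values each printed to three decimal places.

```python
def row_sums(table_text):
    """Map residue number to the sum of its LOOP, HELIX, TURN, STRAND values."""
    sums = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Skip the header, the ". . ." elision, and anything malformed.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        sums[int(fields[0])] = sum(float(x) for x in fields[1:])
    return sums

ROWS = """\
RESIDUE LOOP HELIX TURN STRAND
2 0.995 0.000 0 0.004
10 0.483 0.386 0.022 0.109
20 0.033 0.928 0.016 0.024
. . .
118 1 0 0 0
"""

TOLERANCE = 0.002  # 4 values, each rounded to 3 decimals (at most 0.0005 off)

sums = row_sums(ROWS)
for residue, total in sums.items():
    print(residue, round(total, 3), abs(total - 1.0) <= TOLERANCE + 1e-9)
```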
For example, the 40th residue has probability between 0.7 and 0.8 of
being in a loop, because there are seven contour lines surrounding the
point on the loop row for this residue. The actual value is 0.709, as
can be seen in the full transcript.
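The relationship between contour lines and probability can be written down explicitly. With contours every 0.1, a point enclosed by n contour lines has a probability between 0.1 n and 0.1 (n + 1). The Python sketch below uses hypothetical function names and is only meant to illustrate how to read the plot.

```python
CONTOUR_STEP = 0.1  # contour increment stated in the text

def contour_lines_enclosing(probability):
    """Number of contour lines around a point with the given probability."""
    # The small epsilon guards against floating-point error at exact multiples.
    return int(probability / CONTOUR_STEP + 1e-9)

def probability_range(n_lines):
    """Probability interval implied by counting n_lines contour lines."""
    low = round(n_lines * CONTOUR_STEP, 1)
    high = round((n_lines + 1) * CONTOUR_STEP, 1)
    return (low, high)

# Residue 40, loop state: exact value 0.709 according to the text.
n = contour_lines_enclosing(0.709)
print(n, probability_range(n))
```

Counting contours therefore recovers the probability only to within one contour step; the transcript gives the exact value.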
Looking at the plot as a whole, three amphipathic helices are clearly
depicted with their buried and exposed residues (there might be four if
one considers the middle helix to consist of two helices with minimal
transition between them). In contrast, the strand state is improbable:
every point along the strand row has no more than two contour lines
around it.
The probability contours in this plot, together with the reasonable
hypothesis that the protein belongs to the alpha parallel-bundle class
(which has four helices of approximately equal length), support the
following summary prediction of the protein's secondary structural
topology:
For reference, here are the DSSP secondary structure
assignments for PDB locus 2mhr:
The three graphs in the strand/turn/helix probabilities plot show the
probability of each residue position being in a strand, turn, or helix.
This is the same information as in the transcript section of
the cover letter, which provides exact values for all residues, and
a subset of the information in the
secondary-structure probabilities plot, which also breaks the helix
and strand probabilities down by exposure.
For example, the 20th residue has a probability of greater than 0.9
of being in a helix, and negligible probabilities (< 0.03 each) of being
in a turn or strand. The remaining probability, about 0.03, is the probability of
being in a loop state. (The exact values for all residues are included
in the cover letter.) The undulating
helical probabilities in the third graph support the presence of four
helices of nearly equal lengths connected to each other by short loops
or turns.
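The "remaining probability" for residue 20 can be worked out from the transcript values. The short Python sketch below uses a hypothetical helper name; the inputs are the helix, turn, and strand values printed for residue 20.

```python
def loop_probability(helix, turn, strand):
    """Loop is 'none of the above', so it takes whatever probability is left."""
    return 1.0 - (helix + turn + strand)

# Values for residue 20, copied from the transcript.
residue_20 = {"helix": 0.928, "turn": 0.016, "strand": 0.024}

loop = loop_probability(**residue_20)
print(f"loop probability for residue 20: {loop:.3f}")
```

The result, 0.032, agrees with the 0.033 printed in the transcript to within the rounding of the three-decimal values.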
Please direct your questions and comments about these Web pages and
the PSA e-mail server to:
Structural Class Probabilities
Secondary-Structure Probabilities
Strand rows: buried, STRAND, exposed
Some strands appear in the model as amphipathic, with alternating
exposed (to the solvent) and buried (not exposed to the solvent) states.
[The other strand state on the middle line is called "average strand";
that is probably for strands of "undeclared" exposure, using "average"
statistics, but I'm not certain of this. Looking at this plot, it would
appear to be the sum of the other two rows. -- rgr, 27-Sep-00.]
(Loop)-(amph. helix)-(short loop)-(amph. helix)-(short loop or turn)-(amph. helix)-(short loop)-(amph. helix)-(short loop)
SS       start  end
Helix 1  19     37
Helix 2  41     64
Helix 3  70     85
Helix 4  93     109
Strand/Turn/Helix Probabilities
Bob Rogers <rogers@darwin.bu.edu>
Last modified: Mon Mar 12 13:29:55 EST 2001
BioMolecular Engineering Research Center
Boston University, Boston Massachusetts