This example uses the same sequence as in the "Example of an E-Mail Request" section, but requests Type-2 analysis instead of Type-1 analysis (which is shown on the "Example of Type-1 analysis" page). Click each plot to view a PDF version locally at higher resolution.
Requesting e-mail message
This shows the e-mail message as it would be composed by the user. The
WWW interface also generates something that
looks like an e-mail message internally, but the user only sees this as
an attachment to the acknowledgement message.
To: psa-request@darwin.bu.edu
Subject: Seq 23
Analysis-assumptions: minimal
; psa-plot-format: postscript
; Wilson Brandlesnarf
; BMERC
; Boston MA
; 617-353-7123
Sequence 23
GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
EWLVNHIKGTDFKYKGKL
Regards, Wilson
Except for the "Analysis-assumptions:" line, this is the same
message as illustrated in the "Example of an E-Mail Request"
section; see there for an explanation of the syntax of e-mail messages.
The first three lines are part of the e-mail header (it probably looks
different in every system ever written for composing e-mail, so your
system is unlikely to be an exception).
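As a rough illustration of the layout shown above, the following Python sketch assembles a similar message with the standard library. The function name `build_request` and its parameters are hypothetical, and the code only mirrors the example message (header fields, then the label line and the sequence); it is not a statement of the server's full input syntax, and it does not send anything.

```python
from email.message import EmailMessage

SERVER_ADDRESS = "psa-request@darwin.bu.edu"  # address used in the example above


def build_request(subject, label, sequence, assumptions="minimal", width=50):
    """Lay out a request like the example: header fields, label line, sequence."""
    msg = EmailMessage()
    msg["To"] = SERVER_ADDRESS
    msg["Subject"] = subject
    # Header field that selects the analysis type, as in the example message.
    msg["Analysis-assumptions"] = assumptions
    # Strip whitespace from the sequence and re-wrap it at a fixed width
    # (50 per line matches the example; the width itself is an assumption).
    residues = "".join(sequence.split()).upper()
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    msg.set_content("\n".join([label] + lines) + "\n")
    return msg


SEQ_23 = (
    "GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV"
    "TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK"
    "EWLVNHIKGTDFKYKGKL"
)
request = build_request("Seq 23", "Sequence 23", SEQ_23)
```

Printing `request` shows the header fields followed by the body, which can be compared by eye against the example message before mailing it with whatever mail program is at hand.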
The acknowledgement also includes the request ID assigned by the server upon receipt,
a statement of the fact that the request was for a Type-2 analysis, and
an indication of the server queue size. If the sequence length and/or
label are not as expected, it could mean that the server had trouble
parsing the message; in that case, please recheck and try again. If the
message does not explicitly state that the request is for a Type-2
analysis run (i.e. it looks like an e-mail acknowledgement for a
Type-1 analysis request), then the server was unable to parse the "Analysis-Assumptions:"
field and started a Type-1 analysis by
default. (This sort of confusion should never happen for requests
submitted via the Web interface.)
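The checks described above can be sketched in a few lines of Python. This is only an illustration: the function name `check_acknowledgement` is hypothetical, and the regular expressions are written against the wording of the one acknowledgement reproduced on this page, so they are assumptions about the message text rather than a documented format.

```python
import re


def check_acknowledgement(text, expected_label, expected_length):
    """Compare an acknowledgement with what was actually submitted."""
    # Collapse line breaks first: the label can be wrapped across lines.
    flat = " ".join(text.split())
    seq = re.search(r'sequence of (\d+) residues labelled "([^"]+)"', flat)
    req = re.search(r"request number (\d+)", flat)
    return {
        "request_id": int(req.group(1)) if req else None,
        "length_ok": bool(seq) and int(seq.group(1)) == expected_length,
        "label_ok": bool(seq) and seq.group(2) == expected_label,
        # A missing Type-2 note suggests a fallback to Type-1 analysis.
        "type2": re.search(r"\btype[- ]2\b", flat, re.IGNORECASE) is not None,
    }


ACK_TEXT = '''We have received your request dated "Wed, 18 Nov 1998 17:58:25 -0500"
containing an amino acid sequence of 118 residues labelled "Sequence
23" for a protein structure analysis run; it has been queued as
request number 14757. There are no requests ahead of it in the queue.
Note: You are getting the new Type 2 models because you requested them
explicitly.'''

status = check_acknowledgement(ACK_TEXT, "Sequence 23", 118)
```

If `length_ok` or `label_ok` comes back false, the request is worth rechecking and resubmitting; if `type2` is false, the "Analysis-assumptions:" line is the first thing to inspect.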
This section covers the first e-mail message returned to the user
when the analysis is complete. Since it is fairly large, we abbreviate
and break it into pieces for purposes of discussion; the full text of
the cover letter is available separately.
After the initial "announcement" paragraph, there are several
paragraphs explaining the other messages, and how to view the plots; we
have omitted those here.
Finally, the transcript from the compute engine is included.
E-mail acknowledgement from the server
The acknowledgement consists mostly of an echo of the original mail
message (together with whatever e-mail headers were added in transit).
From: psa@darwin.bu.edu (Protein Structure Analysis server)
To: wb@darwin.bu.edu
Subject: Received request 14757: [Seq 23]
Date: Wed, 18 Nov 1998 17:59:24 -0500
We have received your request dated "Wed, 18 Nov 1998 17:58:25 -0500"
containing an amino acid sequence of 118 residues labelled "Sequence
23" for a protein structure analysis run; it has been queued as
request number 14757. There are no requests ahead of it in the queue.
Note: You are getting the new Type 2 models because you requested them
explicitly. If this is not what you expected, change the
"analysis-assumptions:" line in the message header to read:
analysis-assumptions: monomeric-soluble
starting in column one. (Alphabetic case does not matter.)
--------------------------- Original message ---------------------------
Date: Wed, 18 Nov 1998 17:58:25 -0500
Message-Id: <199811182258.RAA15982@gamow>
From: Wilson Brandlesnarf <wb@darwin.bu.edu>
To: psa-request@darwin.bu.edu
Subject: Seq 23
Analysis-assumptions: minimal
; psa-plot-format: postscript
; Wilson Brandlesnarf
; BMERC
; Boston MA
; 617-353-7123
Sequence 23
GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
EWLVNHIKGTDFKYKGKL
Regards, Wilson
E-mail results cover letter
From: psa@darwin.bu.edu (Protein Structure Analysis server)
To: wb@darwin.bu.edu
Subject: Request 14757 result (1 of 3): [Seq 23]
Date: Wed, 18 Nov 1998 18:00:19 -0500
The analysis of your protein sequence has been completed. A search of
the Protein Data Bank, using Blast, indicates that your sequence is
similar to the proteins 1A7D (length 118), 2MHR (length 118), 1A7E
(length 118), 1HRB (length 113), 2HMQA (length 113), 2HMZA (length
113), 1HMDA (length 113), and 1HMOA (length 113), which all have known
structures. The following analysis results were generated without
reference to these known structures or any of their known homologs.
Note how the server has caught the fact that we have submitted a
sequence of known structure in order to test the server;
"Sequence 23" is in fact the sequence of PDB locus
2mhr.
------------------------------- Sequence -------------------------------
; This is the actual sequence used.
Sequence 23
GWEIP EPYVW DESFR VFYEQ LDEEH KKIFK GIFDC IRDNS APNLA TLVKV
TTNHF THEEA MMDAA KYSEV VPHKK MHKDF LEKIG GLSAP VDAKN VDYCK
EWLVN HIKGT DFKYK GKL
Following the text, the sequence is echoed in the form used by the
server software.
Analyzing Sequence 23. This is 18-Nov-98 (17:57:38).
The sequence contains 118 residues.
Using the Type-2 DSM library.
2 Type-2 DSMs are available for analyzing this sequence.
FILTERING RESULTS:
Model generic has probability 0.96541
Model mem_span has probability 0.034587
Secondary-Structure Probabilities:
RESIDUE LOOP HELIX TURN STRAND
1 0.400 0.205 0.140 0.255
2 0.419 0.226 0.062 0.292
3 0.415 0.187 0.126 0.272
4 0.525 0.132 0.209 0.133
5 0.490 0.148 0.261 0.101
6 0.570 0.112 0.211 0.107
7 0.448 0.199 0.120 0.233
8 0.404 0.235 0.094 0.267
9 0.313 0.375 0.089 0.222
10 0.349 0.396 0.126 0.129
11 0.275 0.486 0.125 0.115
12 0.252 0.516 0.097 0.135
13 0.177 0.594 0.056 0.173
14 0.173 0.624 0.024 0.180
15 0.155 0.643 0.016 0.186
16 0.185 0.648 0.019 0.148
17 0.128 0.734 0.027 0.112
18 0.128 0.771 0.035 0.066
19 0.107 0.819 0.031 0.043
20 0.090 0.852 0.022 0.036
. . .
110 0.536 0.100 0.126 0.238
111 0.404 0.110 0.096 0.390
112 0.374 0.114 0.073 0.439
113 0.323 0.111 0.123 0.444
114 0.324 0.110 0.192 0.373
115 0.386 0.110 0.194 0.310
116 0.364 0.190 0.117 0.329
117 0.347 0.238 0.069 0.346
118 0.360 0.199 0.052 0.389
End of Log file for Sequence 23.
The transcript gives exact values (as opposed to reading the plot) for
the "generic" versus the "mem_span" model probabilities and
secondary-structure probabilities (shown graphically in the secondary-structure probability plot). The models
themselves are described in more detail on the "Description of Type-2 DSMs" page.
For brevity, we omit the secondary-structure probabilities for residues 21 through 109.
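Because the transcript is plain text, its numbers are easy to pull out programmatically. The sketch below is one possible way to do so in Python; the function names are hypothetical, and the parsing rules (a "Model ... has probability ..." line, and table rows consisting of a residue number followed by four probabilities) are assumptions based on the excerpt shown here.

```python
import re

STATES = ("LOOP", "HELIX", "TURN", "STRAND")  # column order in the transcript


def parse_transcript(text):
    """Return (model probabilities, per-residue state probabilities)."""
    models, residues = {}, {}
    for line in text.splitlines():
        m = re.match(r"\s*Model (\S+) has probability ([\d.eE+-]+)", line)
        if m:
            models[m.group(1)] = float(m.group(2))
            continue
        fields = line.split()
        # Table rows: residue number plus one probability per state.
        if len(fields) == 5 and fields[0].isdigit():
            residues[int(fields[0])] = dict(zip(STATES, map(float, fields[1:])))
    return models, residues


def most_probable_state(probs):
    """Name of the state with the largest probability for one residue."""
    return max(probs, key=probs.get)


SAMPLE = """Model generic has probability 0.96541
Model mem_span has probability 0.034587
RESIDUE LOOP HELIX TURN STRAND
1 0.400 0.205 0.140 0.255
20 0.090 0.852 0.022 0.036
. . .
113 0.323 0.111 0.123 0.444
"""
models, residues = parse_transcript(SAMPLE)
```

With the three sample rows, `most_probable_state` picks the loop state for residue 1, helix for residue 20, and strand for residue 113, which matches a direct reading of the table.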
Note that, unlike Type-1 analysis, there is no structural class probability plot, as there are only two models, neither of which denotes a specific tertiary structure.
This plot provides a detailed view of secondary-structure probabilities. Each row corresponds to a different secondary structural state, and each column corresponds to a different residue position. (See the "Secondary-Structure Probabilities" section of the Type-1 example for a detailed discussion of the secondary structural states used by the psa-request DSMs.) The probabilities of each residue being in each of the structural states are depicted using contour lines of constant probability in increments of 0.1. Areas surrounded by many contour lines are regions of high probability, while areas outside of the contours have low probabilities of less than 0.1.
For example, the 40th residue has probability slightly over 0.5 of being in a loop, because there are five contour lines surrounding the point on the loop row for this residue, though it lies very close to the innermost line.
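The relationship between contour count and probability can be written down as a small Python sketch. The function names are hypothetical, and treating a point that lies exactly on a contour as enclosed by it is an assumption made only for the illustration.

```python
import math

CONTOUR_STEP = 0.1  # contour spacing stated in the text


def contours_enclosing(p, step=CONTOUR_STEP):
    """Number of contour lines drawn around a point with probability p."""
    # The small epsilon guards against floating-point round-off at multiples of step.
    return math.floor(p / step + 1e-9)


def probability_range(n_contours, step=CONTOUR_STEP):
    """Probability interval implied by counting n_contours enclosing lines."""
    return (n_contours * step, (n_contours + 1) * step)
```

Counting five enclosing lines, as for the loop state at residue 40, therefore places the probability between 0.5 and 0.6; a point just inside the fifth line sits near the bottom of that interval.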
For reference, here are the DSSP secondary structure assignments for PDB locus 2mhr:
SS       start  end
Helix 1     19   37
Helix 2     41   64
Helix 3     70   85
Helix 4     93  109
Note that this sequence's secondary structure prediction under Type-2 analysis differs markedly from the secondary-structure probabilities in the Type-1 example. This is because all secondary structure predictions are made in light of the most probable DSM; since the "generic" DSM supports nearly arbitrary combinations of secondary structure, while the Type-1 DSMs are constructed for known folding classes, the Type-1 secondary structure prediction is necessarily much more constrained. In this case, since the answer is known, the Type-1 prediction is also much more nearly correct. In the Type-2 plot, we see two out of the four amphipathic helices, the first and third, depicted clearly with their buried and exposed residues. The second and fourth helices are also visible, with the start of the second helix apparently shifted left somewhat. This shift would represent an improvement if not for the fact that the loop state gets greater weight for the first three-quarters of the length of the second helix, and hence nearly misses the second helix altogether.
In contrast, the strand state is still considered rather improbable, except for an odd suggestion of a single amphipathic strand at the very end of the sequence. This serves as another example of how Type-1 analysis is in general preferable, since the Type-2 models do not describe specific structures and hence it is not possible to rule out such spurious isolated strands. However, if the requirements for Type-1 analysis as described in the "Overview" section are not met, it is quite possible, even likely, that no Type-1 DSM is correct for the sequence, in which case the secondary structure predictions would be made with the wrong underlying structural assumption.
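A comparison of this kind can also be made numerically. The Python sketch below uses the DSSP helix ranges listed above and scores a set of per-residue predictions against them; the function names are hypothetical, the ranges are assumed to be inclusive at both ends, and reducing the comparison to "helix versus not helix" is a simplification chosen for the example rather than the server's own measure.

```python
# DSSP helix ranges for PDB locus 2mhr, as listed above (assumed inclusive).
DSSP_HELICES = [(19, 37), (41, 64), (70, 85), (93, 109)]


def is_dssp_helix(pos, helices=DSSP_HELICES):
    """True if residue position pos falls inside any DSSP helix."""
    return any(start <= pos <= end for start, end in helices)


def helix_agreement(predicted, helices=DSSP_HELICES):
    """Fraction of residues whose helix/non-helix call matches DSSP.

    predicted maps residue position -> predicted state name, for example
    the most probable state taken from the transcript.
    """
    hits = sum(
        (state == "HELIX") == is_dssp_helix(pos, helices)
        for pos, state in predicted.items()
    )
    return hits / len(predicted)


# Residues 17-20 all have helix as their most probable state in the
# transcript excerpt, but DSSP starts the first helix at residue 19.
excerpt = {17: "HELIX", 18: "HELIX", 19: "HELIX", 20: "HELIX"}
agreement = helix_agreement(excerpt)
```

On this four-residue excerpt the agreement is 0.5; a meaningful figure would of course need all 118 residues from the full transcript.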
These plots show the probabilities for each residue position being in
a strand, turn, or helix. This is the same information as in the transcript section of the cover letter, which
provides exact values for all residues, and a subset of the information
in the secondary-structure probabilities plot,
which also breaks the helix and strand probabilities down by exposure.
For example, the 20th residue has a probability of greater than 0.8
of being in a helix, and a negligible probability (< 0.04) of being in a turn or
strand. The remaining probability, about 0.09, is the probability of
being in a loop state. (The exact values for all residues are included
in the cover letter.) The undulating
helical probabilities in the third graph support the presence of at
least three helices; strands are not likely, but not completely ruled
out, either.
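The arithmetic behind the "remaining probability" is simply the complement of the three plotted states. A minimal Python sketch, with a hypothetical function name and the assumption that the four states are exhaustive (as the transcript columns suggest):

```python
def loop_probability(helix, turn, strand):
    """Loop probability implied by the three plotted state probabilities."""
    # Rounded to three decimals, the precision used in the transcript.
    return round(1.0 - (helix + turn + strand), 3)


# Residue 20 from the transcript: helix 0.852, turn 0.022, strand 0.036.
residue_20_loop = loop_probability(0.852, 0.022, 0.036)
```

For residue 20 this gives approximately 0.09, in agreement with the LOOP column of the transcript.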
Please direct your questions and comments about these Web pages and
the PSA e-mail server to:
Bob Rogers
<rogers@darwin.bu.edu>
Last modified: Mon Mar 12 13:30:03 EST 2001
BioMolecular Engineering Research
Center
Boston University, Boston Massachusetts