Example of Type-1 analysis


These results are for the example in the "Example of an E-Mail Request" section. Each plot is also available as a higher-resolution PDF.

  1. Example of Type-1 analysis
    1. Submitted e-mail message
    2. E-mail acknowledgement from the server
    3. E-mail results cover letter
    4. Structural Class Probabilities
    5. Secondary-Structure Probabilities
    6. Strand/Turn/Helix Probabilities


Submitted e-mail message

This is the e-mail message as the user would compose it. The WWW interface also generates an e-mail-like message internally, but the user sees it only as an attachment to the acknowledgement message.
    To: psa-request@darwin.bu.edu
    Subject: Seq 23
    Analysis-assumptions: monomeric-soluble

    ; psa-plot-format: postscript
    ; Wilson Brandlesnarf
    ; BMERC
    ; Boston MA
    ; 617-353-7123
    Sequence 23
    GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
    TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
    EWLVNHIKGTDFKYKGKL

    Regards, Wilson
This is the same message as illustrated in the "Example of an E-Mail Request" section; see there for an explanation of the e-mail message syntax. The first three lines are part of the e-mail header. How header fields are entered differs from one mail program to the next, so yours will probably look different.
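If you prefer to script submissions, a message like the one above can be assembled with Python's standard `email` package. This is a minimal sketch, not part of psa-request: `build_request` is a hypothetical helper, the header values are copied from the example, and the body layout (comment lines starting with ";", then the label line, then the sequence wrapped at 50 residues) simply mirrors the example rather than a documented specification. Sending the message (for instance with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_request(subject, label, sequence, comments=(),
                  assumptions="monomeric-soluble"):
    """Hypothetical helper: assemble a psa-request message like the example."""
    msg = EmailMessage()
    msg["To"] = "psa-request@darwin.bu.edu"
    msg["Subject"] = subject
    # Extra header copied from the example request.
    msg["Analysis-assumptions"] = assumptions
    lines = ["; " + c for c in comments]  # comment lines, as in the example
    lines.append(label)                    # sequence label line
    # Wrap the sequence at 50 residues per line, as in the example.
    lines.extend(sequence[i:i + 50] for i in range(0, len(sequence), 50))
    msg.set_content("\n".join(lines) + "\n")
    return msg
```

For example, `build_request("Seq 23", "Sequence 23", seq, comments=["psa-plot-format: postscript"]).as_string()` yields the headers and body in the layout shown above; your mail program or `smtplib` would add the `From` and `Date` headers.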


E-mail acknowledgement from the server

The acknowledgement consists mostly of an echo of the original mail message (together with whatever e-mail headers were added in transit).

    From: psa@darwin.bu.edu (Protein Structure Analysis server)
    To: wb@darwin.bu.edu
    Subject: Received request 14756: [Seq 23]
    Date: Wed, 18 Nov 1998 17:52:50 -0500

    We have received your request dated "Wed, 18 Nov 1998 17:52:38 -0500"
    containing an amino acid sequence of 118 residues labelled "Sequence
    23" for a protein structure analysis run; it has been queued as
    request number 14756.  There are no requests ahead of it in the queue.

    --------------------------- Original message ---------------------------
    Date: Wed, 18 Nov 1998 17:52:38 -0500
    Message-Id: <199811182252.RAA15950@gamow>
    From: Wilson Brandlesnarf <wb@darwin.bu.edu>
    To: psa-request@darwin.bu.edu
    Subject: Seq 23
    Analysis-assumptions: monomeric-soluble

    ; psa-plot-format: postscript
    ; Wilson Brandlesnarf
    ; BMERC
    ; Boston MA
    ; 617-353-7123
    Sequence 23
    GWEIPEPYVWDESFRVFYEQLDEEHKKIFKGIFDCIRDNSAPNLATLVKV
    TTNHFTHEEAMMDAAKYSEVVPHKKMHKDFLEKIGGLSAPVDAKNVDYCK
    EWLVNHIKGTDFKYKGKL

    Regards, Wilson

It also includes the request ID assigned by the server upon receipt, and an indication of the server queue size. If the sequence length and/or label are not as expected, it could mean that the server had trouble parsing the message; in that case, please recheck and try again.
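The sanity check described above can be automated. The sketch below assumes the acknowledgement uses the wording shown in the example (a residue count followed by the quoted label); `check_ack` and `ACK_PATTERN` are hypothetical names, and the regular expression would need adjusting if the server's wording differs.

```python
import re

# Assumes the acknowledgement wording shown above, e.g.
#   ... an amino acid sequence of 118 residues labelled "Sequence
#   23" for a protein structure analysis run ...
ACK_PATTERN = re.compile(r'sequence of (\d+) residues\s+labelled\s+"([^"]+)"')


def check_ack(ack_text, label, sequence):
    """Return True if the acknowledged residue count and label match what was sent."""
    m = ACK_PATTERN.search(ack_text)
    if m is None:
        return False  # wording not recognised; inspect the message by hand
    acked_count = int(m.group(1))
    # The label may be wrapped across lines in the acknowledgement.
    acked_label = " ".join(m.group(2).split())
    # Count only letters, so spaces and line breaks in the sequence are ignored.
    sent_count = sum(ch.isalpha() for ch in sequence)
    return acked_count == sent_count and acked_label == label
```

For the example request, `check_ack(ack_text, "Sequence 23", sequence)` should return True; a False result is the cue to recheck the message and resubmit.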


E-mail results cover letter

This section covers the first e-mail message returned to the user when the analysis is complete. Since it is fairly long, we break it into pieces for discussion; the full text of the cover letter is available separately.


    From: psa@darwin.bu.edu (Protein Structure Analysis server)
    To: wb@darwin.bu.edu
    Subject: Request 14756 result (1 of 4): [Seq 23]
    Date: Wed, 18 Nov 1998 17:55:15 -0500

    The analysis of your protein sequence has been completed.  A search of
    the Protein Data Bank, using Blast, indicates that your sequence is
    similar to the proteins 1A7D (length 118), 2MHR (length 118), 1A7E
    (length 118), 1HRB (length 113), 2HMQA (length 113), 2HMZA (length
    113), 1HMDA (length 113), and 1HMOA (length 113), which all have known
    structures.  The following analysis results were generated without
    reference to these known structures or any of their known homologs.
Note that the server has detected that we submitted a sequence of known structure in order to test it; "Sequence 23" is in fact the sequence of PDB locus 2mhr.

After the initial "announcement" paragraph, there are several paragraphs explaining the other messages, and how to view the plots; we have omitted those here.


    ------------------------------- Sequence -------------------------------
    ; This is the actual sequence used.
    Sequence 23
    GWEIP EPYVW DESFR VFYEQ LDEEH KKIFK GIFDC IRDNS APNLA TLVKV
    TTNHF THEEA MMDAA KYSEV VPHKK MHKDF LEKIG GLSAP VDAKN VDYCK
    EWLVN HIKGT DFKYK GKL
Following the text, the sequence is echoed in the form used by the server software.
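In the echoed form, residues are grouped in blocks of five, with ten blocks (50 residues) per line. A minimal sketch that reproduces that layout, inferred from the example output rather than from the server's own code:

```python
def format_sequence(seq, block=5, blocks_per_line=10):
    """Lay out a sequence in space-separated blocks, as in the echo above."""
    per_line = block * blocks_per_line
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        # Split this line's residues into blocks of `block` letters.
        lines.append(" ".join(chunk[i:i + block] for i in range(0, len(chunk), block)))
    return "\n".join(lines)
```

Applied to the 118-residue example sequence, this gives two full lines of 50 residues and a final line of 18.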

Finally, the transcript from the compute engine is included.


    ------------------------------ Transcript ------------------------------
    Analyzing Sequence 23. This is 18-Nov-98 (17:51:3).

    Using the Type-1 DSM library mdata11o.

    The sequence contains 118 residues.

    30 length-compatible Type-1 DSMs are available for analyzing this sequence.

    FILTERING RESULTS:
    3 Most Probable Super Classes:
    1st Superclass alpha      has probability 0.41285
    2nd Superclass irregular  has probability 0.32734
    3rd Superclass alpha-beta has probability 0.23807

    3 Most Probable Macro Classes:
    1st Macroclass apb        has probability 0.41285
    2nd Macroclass ir         has probability 0.32734
    3rd Macroclass sab        has probability 0.15959


    Secondary-Structure Probabilities:

     RESIDUE      LOOP     HELIX      TURN    STRAND
           1         1         0         0         0
           2     0.995     0.000         0     0.004
           3     0.967     0.012         0     0.021
           4     0.953     0.017         0     0.030
           5     0.945     0.022     0.001     0.032
           6     0.913     0.038     0.005     0.043
           7     0.878     0.053     0.005     0.063
           8     0.754     0.127     0.007     0.112
           9     0.687     0.183     0.010     0.119
          10     0.483     0.386     0.022     0.109
          11     0.436     0.437     0.077     0.050
          12     0.281     0.592     0.090     0.037
          13     0.189     0.683     0.091     0.038
          14     0.163     0.709     0.076     0.053
          15     0.151     0.727     0.024     0.097
          16     0.136     0.743     0.012     0.109
          17     0.117     0.767     0.012     0.104
          18     0.073     0.837     0.016     0.074
          19     0.052     0.883     0.014     0.052
          20     0.033     0.928     0.016     0.024
         . . .
         110     0.756     0.161     0.052     0.031
         111     0.808     0.104     0.026     0.062
         112     0.850     0.055     0.014     0.081
         113     0.875     0.030     0.004     0.091
         114     0.893     0.018     0.003     0.087
         115     0.913     0.011     0.004     0.071
         116     0.947     0.003     0.004     0.046
         117     0.967     0.001     0.002     0.030
         118         1         0         0         0

    End of Log file for Sequence 23.
The transcript gives exact values, rather than values read off the plots, for the three most probable superclasses and macroclasses (shown graphically in the structural class probability plot) and for the secondary-structure probabilities (shown graphically in the secondary-structure probability plot). The superclasses and macroclasses themselves are described in more detail on the "Description of Type-1 DSMs" page.

For brevity, we omit here the secondary-structure probabilities for residues 21 through 109; they are available in the full transcript.
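If you want the exact values in machine-readable form, the secondary-structure rows of the transcript are easy to parse. This sketch assumes each data row holds five whitespace-separated numbers in the order RESIDUE, LOOP, HELIX, TURN, STRAND, as in the transcript above, and skips every other line; `parse_ss_table` and `most_probable_state` are hypothetical helpers.

```python
STATES = ("loop", "helix", "turn", "strand")


def parse_ss_table(text):
    """Return {residue: {state: probability}} for every data row in text."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly five fields and start with a residue number;
        # the header, the ". . ." elision and all other lines are skipped.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        try:
            probs = [float(f) for f in fields[1:]]
        except ValueError:
            continue
        rows[int(fields[0])] = dict(zip(STATES, probs))
    return rows


def most_probable_state(rows):
    """Pick the state with the highest probability for each residue."""
    return {res: max(probs, key=probs.get) for res, probs in rows.items()}
```

Residue 11 shows that the most probable state can win by a tiny margin: helix 0.437 against loop 0.436.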


Structural Class Probabilities

[Figure: two bar charts of superclass and macroclass probabilities]

In the structural class probability plot, the alpha superclass has a probability of about 0.4, the irregular superclass just above 0.3, the alpha-beta superclass slightly above 0.2, and the beta superclass near zero. This means that psa-request is confident that the sequence has properties that rule out the all-beta superclass. The alpha superclass is the most probable of the remaining three, but not by much: the transcript gives 0.41 for alpha against 0.33 for irregular.
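The transcript lists only the top three superclasses, but the near-zero beta probability can be recovered from it. This assumes (the plot suggests it, but the transcript does not state it) that alpha, beta, alpha-beta and irregular are the only superclasses and that their probabilities sum to one:

```python
# Top three superclass probabilities, copied from the transcript.
top3 = {"alpha": 0.41285, "irregular": 0.32734, "alpha-beta": 0.23807}

# Assumption: the four superclasses are exhaustive, so whatever is left
# over belongs to the only one not listed, the beta superclass.
beta = 1.0 - sum(top3.values())
print(f"implied beta superclass probability: {beta:.5f}")
# -> implied beta superclass probability: 0.02174
```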

Looking at the macroclass probabilities, we see that the apb (antiparallel bundle) macroclass is more probable than any other macroclass, but that ir (general irregular) runs a close second.


Secondary-Structure Probabilities

[Figure: contour plot of secondary-structure probabilities]

This plot provides a detailed view of secondary-structure probabilities. Each row corresponds to a different secondary structural state, and each column corresponds to a different residue position. The probabilities of each residue being in each of the structural states are depicted using contour lines of constant probability in increments of 0.1. Areas surrounded by many contour lines are regions of high probability, while areas outside of the contours have low probabilities of less than 0.1.

The structural states are shown in four groups, with horizontal lines drawn for each row that denotes a structural state, and empty rows in between the groups. The first group has the three beta strand states:

    buried
    STRAND
    exposed
Some strands appear in the model as amphipathic, with alternating exposed (to the solvent) and buried (not exposed to the solvent) states. [The other strand state on the middle line is called "average strand"; that is probably for strands of "undeclared" exposure, using "average" statistics, but I'm not certain of this. Looking at this plot, it would appear to be the sum of the other two rows. -- rgr, 27-Sep-00.]

The four turn states are numbered 1 through 4 from the bottom up (with the "2" on the "TURN" line implicit). They denote a tight turn in the secondary structure and usually appear in the Type-1 models as part of a beta hairpin. Turn states are comparatively rare in the Type-1 models overall, so their probabilities are usually low. They are shown anyway to help researchers who have other reasons to suspect that their sequence includes a turn, e.g. evidence of a hairpin. See the "Summary of Type-1 DSMs" table for the typical number of turns included in each of the Type-1 models.

Next, the helix-buried and helix-exposed states are displayed together in the same way as for strands, except that there is no helix state for "undeclared" exposure. The advantage to showing the helix-exposed and helix-buried states separately in this format is that amphipathic helices show up clearly as an easily-recognizable pattern of high probability that alternates between these two states, with a clear periodicity of 3.6 residues per helical winding. In this example, the sequence is predicted to be mostly amphipathic helix, but there are two breaks in the pattern, so it looks like three helices.
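The 3.6-residue periodicity can be made concrete with standard helical-wheel geometry: each residue advances 360/3.6 = 100 degrees around the helix axis, so residues three or four positions apart end up on nearly the same face. The sketch below only illustrates that geometry and is not part of the DSM computation; the 50-degree half-width that defines a "face" is an arbitrary choice.

```python
DEG_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn


def wheel_angle(i):
    """Angle in degrees (0 to 360) of residue offset i on a helical wheel."""
    return (i * DEG_PER_RESIDUE) % 360


def same_face(n, half_width=50.0):
    """Offsets 0..n-1 whose wheel angle lies within half_width of residue 0."""
    face = []
    for i in range(n):
        a = wheel_angle(i)
        # Angular distance from 0 degrees, going whichever way round is shorter.
        if min(a, 360 - a) <= half_width:
            face.append(i)
    return face
```

With these settings, `same_face(19)` returns offsets 0, 4, 7, 11, 14 and 18, which is why a buried (or exposed) stretch recurs every three to four residues along an amphipathic helix.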

Finally, the loop state appears across the bottom. There is only one type of loop as far as the DSMs are concerned, so this "group" uses only one row. The loop state is defined loosely as "none of the above," i.e. if it's neither helix nor strand nor turn, it must be a loop. As a consequence of this definition, the probabilities in each row of the transcript add up to one. Since the Type-1 DSMs were constructed to reflect the fact that proteins are roughly half secondary structure and half loop, the loop probability is usually above ten percent, and often above 50 percent, especially near the start and end of the sequence.
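Because loop is defined as "none of the above", the loop column should equal one minus the other three columns, up to the three-decimal rounding in the transcript. A quick check on a few rows copied from the transcript above; the 0.002 tolerance is an assumption chosen to cover that rounding:

```python
# A few rows copied from the transcript: residue -> (loop, helix, turn, strand).
ROWS = {
    2: (0.995, 0.000, 0.0, 0.004),
    10: (0.483, 0.386, 0.022, 0.109),
    20: (0.033, 0.928, 0.016, 0.024),
    110: (0.756, 0.161, 0.052, 0.031),
    118: (1.0, 0.0, 0.0, 0.0),
}


def loop_as_complement(helix, turn, strand):
    """Loop is "none of the above", so its probability is 1 minus the rest."""
    return 1.0 - (helix + turn + strand)


def max_deviation(rows):
    """Largest gap between the printed loop value and the implied complement."""
    return max(abs(loop_as_complement(h, t, s) - loop)
               for loop, h, t, s in rows.values())


# Rounding to three decimals makes deviations of about 0.001 expected.
print(f"max deviation: {max_deviation(ROWS):.3f}")
```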

For example, the 40th residue has probability between 0.7 and 0.8 of being in a loop, because there are seven contour lines surrounding the point on the loop row for this residue. The actual value is 0.709, as can be gleaned from the full transcript.
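The contour-counting rule used here can be written down directly: with contours every 0.1, a point enclosed by n lines has a probability of at least 0.1n but less than 0.1(n+1). A small sketch, valid for probabilities below 1; `contour_count` and `probability_range` are hypothetical helpers.

```python
import math

STEP = 0.1  # contour spacing used in the plot


def contour_count(p, eps=1e-9):
    """Number of contour lines enclosing a point of probability p (0 <= p < 1)."""
    # eps keeps exact multiples such as 0.7 from being counted one line short,
    # since 0.7 / 0.1 evaluates to 6.999... in floating point.
    return math.floor(p / STEP + eps)


def probability_range(n):
    """Probability interval [low, high) implied by n enclosing contour lines."""
    return (round(n * STEP, 10), round((n + 1) * STEP, 10))
```

For residue 40, `contour_count(0.709)` gives 7, and `probability_range(7)` gives (0.7, 0.8), matching the reading from the plot.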

Looking at the plot as a whole, three amphipathic helices are clearly depicted with their buried and exposed residues (there might be four if one considers the middle helix to consist of two helices with minimal transition between them). In contrast, the strand state is improbable: every point along the strand row has no more than two contour lines around it, i.e. a strand probability below 0.3.

The probability contours in this plot, together with the reasonable hypothesis that the protein belongs to the alpha parallel-bundle class (which has four helices of approximately equal length), support the following summary prediction of the protein's secondary structural topology:

(Loop)-(amph. helix)-(short loop)-(amph. helix)-(short loop or turn)-(amph. helix)-(short loop)-(amph. helix)-(short loop)

For reference, here are the DSSP secondary structure assignments for PDB locus 2mhr:

    SS        Start   End
    Helix 1      19    37
    Helix 2      41    64
    Helix 3      70    85
    Helix 4      93   109
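Reading the DSSP table as four helices spanning residues 19-37, 41-64, 70-85 and 93-109 of the 118-residue sequence, a few lines of arithmetic give the helix lengths and the lengths of the non-helix stretches around them, which can be compared with the summary prediction above. The helper names are hypothetical.

```python
# Helix segments from the DSSP assignments (inclusive residue numbers).
HELICES = [(19, 37), (41, 64), (70, 85), (93, 109)]
SEQ_LEN = 118


def segment_lengths(segments):
    """Number of residues in each inclusive (start, end) segment."""
    return [end - start + 1 for start, end in segments]


def loop_gaps(segments, seq_len):
    """Non-helix stretch lengths: N-terminus, between helices, C-terminus."""
    gaps = [segments[0][0] - 1]
    gaps += [nxt[0] - prev[1] - 1 for prev, nxt in zip(segments, segments[1:])]
    gaps.append(seq_len - segments[-1][1])
    return gaps
```

This gives helices of 19, 24, 16 and 17 residues, separated by non-helix stretches of 3, 5 and 7 residues, with 18 residues before the first helix and 9 after the last.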


Strand/Turn/Helix Probabilities

[Figure: three x-y plots of strand, turn, and helix probabilities by residue]

The three graphs in this plot show the probabilities for each residue position being in a strand, turn, or helix. This is the same information as in the transcript section of the cover letter, which provides exact values for all residues, and a subset of the information in the secondary-structure probabilities plot, which also breaks the helix and strand probabilities down by exposure.

For example, the 20th residue has a probability greater than 0.9 of being in a helix, and only small probabilities of being in a turn (0.016) or a strand (0.024). The remaining probability, about 0.03, is the probability of being in a loop state. (The exact values for all residues are included in the cover letter.) The undulating helix probabilities in the third graph support the presence of four helices of nearly equal length connected to each other by short loops or turns.




Please direct your questions and comments about these Web pages and the PSA e-mail server to:

Bob Rogers <rogers@darwin.bu.edu>
BioMolecular Engineering Research Center
Boston University, Boston Massachusetts
Last modified: Mon Mar 12 13:29:55 EST 2001