Example of WD-repeat analysis


This is similar to the example in the "Example of an E-Mail Request" section, but it uses a different sequence that does contain a WD repeat (it is the SwissProt locus CAFA_HUMAN, which has a region of four WD repeats) and requests a WD-repeat analysis.

  1. Example of WD-repeat analysis
    1. Requesting e-mail message
    2. E-mail acknowledgement from the server
    3. E-mail results cover letter
    4. Structural Class Probabilities
    5. Beta propeller core model


Requesting e-mail message

This shows the e-mail message as it would be composed by the user. The Web interface also generates something that looks like an e-mail message internally, but the user sees it only as an attachment to the acknowledgement message.
    To: psa-request@darwin.bu.edu
    Subject: Seq 42

    ; analysis-assumptions: wd-repeat
    ; Wilson Brandlesnarf
    ; BMERC
    ; Boston MA
    ; 617-353-7123
    Sequence 42
    MKVITCEIAWHNKEPVYSLDFQHGTAGRIHRLASAGVDTNVRIWKVEKGP
    DGKAIVEFLSNLARHTKAVNVVRFSPTGEILASGGDDAVILLWKVNDNKE
    PEQIAFQDEDEAQLNKENWTVVKTLRGHLEDVYDICWATDGNLMASASVD
    NTAIIWDVSKGQKISIFNEHKSYVQGVTWDPLGQYVATLSCDRVLRVYSI
    QKKRVAFNVSKMLSGIGAEGEARSYRMFHDDSMKSFFRRLSFTPDGSLLL
    TPAGCVESGENVMNTTYVFSRKNLKRPIAHLPCPGKATLAVRCCPVYFEL
    RPVVETGVELMSLPYRLVFAVASEDSVLLYDTQQSFPFGYVSNIHYHTLS
    DISWSSDGAFLAISSTDGYCSFVTFEKDELGIPLKEKPVLNMRTPDTAKK
    TKSQTHRGSSPGPRPVEGTPASRTQDPSSPGTTPPQARQAPAPTVIRDPP
    SITPAVKSPLPGPSEEKTLQPSSQNTKAHPSRRVTLNTLQAWSKTTPRRI
    NLTPLKTDTPPSSVPTSVISTPSTEEIQSETPGDAQGSPPELKRPRLDEN
    KGGTESLDP
See the "Example of an E-Mail Request" section for an explanation of the syntax of e-mail messages. The first two lines are the e-mail header. How headers are entered differs from one mail program to another, so yours will probably look different.
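
The request above can also be assembled programmatically. The following is a minimal Python sketch using the standard `email` package. The helper name `build_psa_request`, its parameters, and the 50-residue line width are assumptions modeled on the example and are not part of the server's documented interface. The sketch only builds the message; sending it (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_psa_request(label, sequence, comments, sender,
                      subject="Seq 42", width=50):
    """Hypothetical helper: compose a WD-repeat request like the example.

    The layout (semicolon comment lines, a label line, then the sequence
    wrapped at `width` residues) is copied from the example message.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "psa-request@darwin.bu.edu"
    msg["Subject"] = subject
    # The analysis-assumptions line selects the WD-repeat analysis.
    lines = ["; analysis-assumptions: wd-repeat"]
    # Free-text comment lines start with a semicolon.
    lines += ["; " + c for c in comments]
    # The label line precedes the sequence.
    lines.append(label)
    # Wrap the sequence into fixed-width lines (width is an assumption).
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    msg.set_content("\n".join(lines) + "\n")
    return msg
```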


E-mail acknowledgement from the server

The acknowledgement consists mostly of an echo of the original mail message (together with whatever e-mail headers were added in transit).

    From: psa@darwin.bu.edu (Protein Structure Analysis server)
    To: wb@darwin.bu.edu
    Subject: Received request 425: [Seq 42]
    Date: Tue, 5 Jan 1999 18:02:28 -0500

    We have received your request dated "Tue, 5 Jan 1999 18:02:16 -0500"
    containing an amino acid sequence of 559 residues labelled "Sequence
    42" for a WD repeat analysis run; it has been queued as request number
    425.  There are no requests ahead of it in the queue.

    --------------------------- Original message ---------------------------
    Date: Tue, 5 Jan 1999 18:02:16 -0500
    Message-Id: <199901052302.SAA15340@gamow>
    From: Wilson Brandlesnarf <wb@darwin.bu.edu>
    To: psa-request@darwin.bu.edu
    Subject: Seq 42

    ; analysis-assumptions: wd-repeat
    ; Wilson Brandlesnarf
    ; BMERC
    ; Boston MA
    ; 617-353-7123
    Sequence 42
    MKVITCEIAWHNKEPVYSLDFQHGTAGRIHRLASAGVDTNVRIWKVEKGP
    DGKAIVEFLSNLARHTKAVNVVRFSPTGEILASGGDDAVILLWKVNDNKE
    PEQIAFQDEDEAQLNKENWTVVKTLRGHLEDVYDICWATDGNLMASASVD
    NTAIIWDVSKGQKISIFNEHKSYVQGVTWDPLGQYVATLSCDRVLRVYSI
    QKKRVAFNVSKMLSGIGAEGEARSYRMFHDDSMKSFFRRLSFTPDGSLLL
    TPAGCVESGENVMNTTYVFSRKNLKRPIAHLPCPGKATLAVRCCPVYFEL
    RPVVETGVELMSLPYRLVFAVASEDSVLLYDTQQSFPFGYVSNIHYHTLS
    DISWSSDGAFLAISSTDGYCSFVTFEKDELGIPLKEKPVLNMRTPDTAKK
    TKSQTHRGSSPGPRPVEGTPASRTQDPSSPGTTPPQARQAPAPTVIRDPP
    SITPAVKSPLPGPSEEKTLQPSSQNTKAHPSRRVTLNTLQAWSKTTPRRI
    NLTPLKTDTPPSSVPTSVISTPSTEEIQSETPGDAQGSPPELKRPRLDEN
    KGGTESLDP

The acknowledgement also includes the request ID assigned by the server upon receipt, a statement that the request was for a WD-repeat analysis, and an indication of the server queue size. If the sequence length or label is not as expected, the server may have had trouble parsing the message; in that case, please try again. If the message does not explicitly state that the request is for a WD-repeat analysis run (i.e. it looks like an e-mail acknowledgement for a Type-1 analysis request), then the server was unable to parse the "Analysis-Assumptions:" field and started a Type-1 analysis by default. (This sort of confusion should never happen for requests submitted via the Web interface.)
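
These checks can be automated. The sketch below is a hypothetical Python helper, `check_ack`, that extracts the length, label, analysis type, and request number from an acknowledgement. It assumes the server's wording matches the example above exactly, so treat it as an illustration and not as a robust parser.

```python
import re

# Wording assumed from the example acknowledgement above.
ACK_PATTERN = re.compile(
    r'containing an amino acid sequence of (\d+) residues '
    r'labelled "(.*?)" for a (.*?) run; it has been queued as '
    r'request number (\d+)')


def check_ack(ack_text, expected_length, expected_label):
    """Return (request_number, problems) for an acknowledgement text.

    request_number is None if the acknowledgement is not recognized.
    """
    # Join wrapped lines so phrases split across line breaks still match.
    text = " ".join(ack_text.split())
    m = ACK_PATTERN.search(text)
    if m is None:
        return None, ["acknowledgement not recognized"]
    length, label, analysis, request = m.groups()
    problems = []
    if int(length) != expected_length:
        problems.append(f"length {length} != {expected_length}")
    if label != expected_label:
        problems.append(f"label {label!r} != {expected_label!r}")
    # A missing "WD repeat" suggests a Type-1 analysis was started instead.
    if "WD repeat" not in analysis:
        problems.append(f"analysis is {analysis!r}, not WD repeat")
    return int(request), problems
```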


E-mail results cover letter

This section covers the first e-mail message returned to the user when the analysis is complete. Because it is fairly large, we break it into pieces for purposes of discussion.


    From: psa@darwin.bu.edu (Protein Structure Analysis server)
    To: wb@darwin.bu.edu
    Subject: Request 425 result (1 of 3): [Seq 42]
    Date: Tue, 5 Jan 1999 18:05:46 -0500

    The analysis of your protein sequence has been completed.

    The tertiary class and profile probabilities we computed for your
    sequence are appended below.  The profile probabilities "profile1" and
    "profile2" shown below identify residues that match the two diagnostic
    profiles, to which the regular expression on the WD-repeat protein
    webpage at http://bmerc-www.bu.edu/wdrepeat/ is an approximation.  A
    page of additional graphical output, in PostScript format, is being
    sent to you in an additional e-mail message.  This plot shows the
    tertiary-class probability distributions, indicating the degree to
    which the psa-request server believes that the sequence you submitted
    could be a WD-repeat and how many repeats it believes the sequence
    has.  The final message contains a core format file with backbone
    coordinates for the sequence as a beta propeller.  For more
    information, please see the WD repeat example on the
    http://bmerc-www.bu.edu/psa/wd-example.htm page.
The initial "announcement" paragraph would have warned of PDB homologs if any had been found; see the "E-mail results cover letter" section of the "Example of Type-1 analysis" page. After the announcement paragraph, several paragraphs explain the other messages and how to view the plots; we have omitted some of those here.

    -------------------------- Original sequence ---------------------------
    ; This is the actual sequence used.
    Sequence 42
    MKVITCEIAW HNKEPVYSLD FQHGTAGRIH RLASAGVDTN VRIWKVEKGP
    DGKAIVEFLS NLARHTKAVN VVRFSPTGEI LASGGDDAVI LLWKVNDNKE
    PEQIAFQDED EAQLNKENWT VVKTLRGHLE DVYDICWATD GNLMASASVD
    NTAIIWDVSK GQKISIFNEH KSYVQGVTWD PLGQYVATLS CDRVLRVYSI
    QKKRVAFNVS KMLSGIGAEG EARSYRMFHD DSMKSFFRRL SFTPDGSLLL
    TPAGCVESGE NVMNTTYVFS RKNLKRPIAH LPCPGKATLA VRCCPVYFEL
    RPVVETGVEL MSLPYRLVFA VASEDSVLLY DTQQSFPFGY VSNIHYHTLS
    DISWSSDGAF LAISSTDGYC SFVTFEKDEL GIPLKEKPVL NMRTPDTAKK
    TKSQTHRGSS PGPRPVEGTP ASRTQDPSSP GTTPPQARQA PAPTVIRDPP
    SITPAVKSPL PGPSEEKTLQ PSSQNTKAHP SRRVTLNTLQ AWSKTTPRRI
    NLTPLKTDTP PSSVPTSVIS TPSTEEIQSE TPGDAQGSPP ELKRPRLDEN
    KGGTESLDP
Following the text, the sequence is echoed in the form used by the server software.

The next two sections are included only if the sequence is found to contain a WD repeat. The first gives the predicted structure of the WD repeat in the same tabular format used on the pages describing members of the WD repeat family. (In fact, since this sequence is really the SwissProt locus CAFA_HUMAN, this very alignment appears, in slightly different form, on the CAFA_HUMAN WD repeat page.)


    --------------------- Predicted WD-repeat structure --------------------

    Sequence42      [  1]  MKVITCEIAW
                                   ------            ------       ------  
    Sequence42.1    [ 11]  HNKEPV  YSLDFQ  HGTAGRIH  RLASAG  VDT  NVRIWK  VEKGPDGKAI
                    [ 56]  VEFLSNLA
    Sequence42.2    [ 64]  RHTKAV  NVVRFS  PTGE      ILASGG  DDA  VILLWK  VNDNKEPEQI
                    [105]  AFQDEDEAQLNKENWTVVKTLR
    Sequence42.3    [127]  GHLEDV  YDICWA  TDGN      LMASAS  VDN  TAIIWD  VSKGQKISIF
                    [168]  N
    Sequence42.4    [169]  EHKSYV  QGVTWD  PLGQ      YVATLS  CDR  VLRVYS  IQKKRVAFNV
                    [210]  SKMLSGIGAEGEARSYRMFHDDSMKSFFRRLSFTPDGSLLLTPAGCVESG
                    [260]  ENVMNTTYVFSRKNLKRPIAHLPCPGKATLAVRCCPVYFELRPVVETGVE
                    [310]  LMSLPYRLVFAVASEDSVLLYDTQQSFPFGYVSNIHYHTLSDISWSSDGA
                    [360]  FLAISSTDGYCSFVTFEKDELGIPLKEKPVLNMRTPDTAKKTKSQTHRGS
                    [410]  SPGPRPVEGTPASRTQDPSSPGTTPPQARQAPAPTVIRDPPSITPAVKSP
                    [460]  LPGPSEEKTLQPSSQNTKAHPSRRVTLNTLQAWSKTTPRRINLTPLKTDT
                    [510]  PPSSVPTSVISTPSTEEIQSETPGDAQGSPPELKRPRLDENKGGTESLDP
                                   ------            ------       ------  

    Residues in columns marked with "------" are predicted to fold into
    beta-strands.
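Each line starting with the sequence name followed by ".n" describes the nth repeat. Each repeat sequence is broken into columns that describe the structure of the repeat; leading, trailing, and inter-repeat loops are wrapped as necessary to maintain readability. The first three predicted strands of each blade are shown in the second, fourth, and sixth sequence columns, identified by sets of six dashes ("------") placed above and below the columns. The columns are related to the profiles as follows: the first two columns are the residues matched by profile1, the third column is the sequence between the two profile hits, and the fourth through sixth columns are the residues matched by profile2 (compare the profile probabilities for the first repeat in the transcript below). The seventh column holds the rest of the repeat; the exact position of the fourth strand varies within this column and is not predicted by the profiles.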
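
The repeat lines have a fixed column layout, so the predicted strands can be extracted mechanically. A minimal sketch, assuming the layout shown above. The function name `predicted_strands` is hypothetical.

```python
import re

# A repeat line: "<name>.<n>  [<index>]  col1  col2 ... col7".
REPEAT_LINE = re.compile(r"\s*\S+\.(\d+)\s+\[\s*\d+\]\s+(.*)")


def predicted_strands(table_lines):
    """Return {repeat number: [strand1, strand2, strand3]}.

    Header lines, wrapped loop lines, and "------" marker lines do not
    match the pattern and are skipped. The strands are taken from the
    2nd, 4th, and 6th whitespace-separated columns.
    """
    strands = {}
    for line in table_lines:
        m = REPEAT_LINE.match(line)
        if m is None:
            continue
        columns = m.group(2).split()
        strands[int(m.group(1))] = [columns[1], columns[3], columns[5]]
    return strands
```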

Finally, the server attempts to find sequence similarities, not involving the WD-repeat motif itself, between the submitted sequence and other known WD repeat proteins.


    ----------------------- Similar WD repeat proteins ---------------------

    Identification of the WD repeat 'domain' implicitly divides the
    protein into three fragments:  the WD repeat region itself, and the
    subsequences before and after it.  Regions that were at least 40
    residues in length were used independently to search a BLASTP database
    of corresponding regions in known WD repeat proteins.  The conserved
    portions of the WD repeat region itself were shadowed with X's in
    order to search for sequences with similar loops, disregarding the
    conserved repeat region.  For more information, please see the WD
    repeat example on the http://bmerc-www.bu.edu/psa/wd-example.htm page.

    The BLASTP search results are as follows:

       * The amino subsequence is too small for BLASTP searching (less
    than 40 residues).

       * The loop subsequence (length 189) has no BLASTP matches.

       * The carboxy subsequence (length 349) matches CAFA_HUMAN (length
    360) with a score of 1825 (P=1.7e-191).

    This may help to characterize the protein.
Once the sequence is identified as a WD repeat, one can splice out the non-WD-repeat leader, trailer, and loop portions, and use those to search a database made from the corresponding pieces of known WD-repeat sequences. This is done for subsequences that are at least 40 residues in length; shorter sequences are presumed not to represent independent domains, and therefore to be of little use for searching. Raw scores and P values are reported for all hits with at least 40% equivalent identities over the match region.

The conserved portions of the WD repeat region were shadowed with X's in order to search for sequences with similar loops; the loops within the WD-repeat region were therefore searched for collectively, as a single subsequence, rather than independently.
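
The splitting, length filter, and shadowing steps described above can be sketched as follows. This sketch rests on stated assumptions and is not the server's code. The function `search_fragments` is hypothetical. The caller supplies the WD-region boundaries and also decides which positions count as "conserved" (for example, residues matched by the profiles).

```python
MIN_LEN = 40  # shorter fragments are not searched (from the text above)


def search_fragments(sequence, wd_start, wd_end, conserved):
    """Split a sequence around its WD-repeat region and shadow it.

    wd_start and wd_end are 1-based, inclusive. `conserved` is a set of
    1-based positions inside the WD region to replace with "X" so that
    only the loops drive the search. Returns {name: fragment} for the
    fragments that are long enough to search.
    """
    amino = sequence[:wd_start - 1]          # leader before the region
    region = sequence[wd_start - 1:wd_end]   # the WD-repeat region
    carboxy = sequence[wd_end:]              # trailer after the region
    # Shadow conserved positions; the loops are kept as one subsequence.
    loops = "".join("X" if wd_start + i in conserved else aa
                    for i, aa in enumerate(region))
    fragments = {"amino": amino, "loop": loops, "carboxy": carboxy}
    return {name: frag for name, frag in fragments.items()
            if len(frag) >= MIN_LEN}
```

For the example sequence, the table places the WD region at roughly residues 11 to 209. The 10-residue amino leader would therefore be dropped, as in the server's report. The exact boundaries the server uses may differ slightly from this estimate.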

The BLASTP search is done in the hope that finding homologous domains (in the case of leader and trailer sequences) or similarities in potential active sites or protein-protein interaction sites (in the case of intra-repeat loops) will aid in characterizing the protein.

For more information on BLASTP itself, see either WU-BLAST (Washington University BLAST) version 2.0 or the earlier NCBI-BLAST version 1.4. At BMERC, we use WU-BLAST because it can handle gaps, but NCBI-BLAST is in the public domain.

Finally, the transcript from the compute engine is included. Once again, this has been excerpted for space; see the full text of the cover letter for the complete transcript.


    ------------------------------ Transcript ------------------------------
    Analyzing Sequence 42. This is 5-Jan-99 (18:0:25).

    Using the Type-3 DSM library mdatawd.

    The sequence contains 559 residues.

    FILTERING RESULTS:
    3 Most Probable Super Classes:
    1st Superclass wd repeat  has probability 1
    2nd Superclass generic    has probability 2.414e-34

    3 Most Probable Macro Classes:
    1st Macroclass wd4        has probability 1
    2nd Macroclass wd7        has probability 4.2808e-08
    3rd Macroclass wd5        has probability 1.0328e-08


    Profile probabilities 

    seq       profile1       profile2
    M       0       0
    K       0       0
    V       0       0
    I       0       0
    T       0       0
    C       0       0
    E       0       0
    I       0       0
    A       0       0
    W       0       0
    H       1       0
    N       1       0
    K       1       0
    E       1       0
    P       1       0
    V       1       0
    Y       1       0
    S       1       0
    L       1       0
    D       1       0
    F       1       0
    Q       1       0
    H       0       0
    G       0       0
    T       0       0
    A       0       0
    G       0       0
    R       0       0
    I       0       0
    H       0       0
    R       0       1
    L       0       1
    A       0       1
    S       0       1
    A       0       1
    G       0       1
    V       0       1
    D       0       1
    T       0       1
    N       0       1
    V       0       1
    R       0       1
    I       0       1
    W       0       1
    K       0       1
    V       0       0
    E       0       0
    K       0       0
    G       0       0
    . . .
    T       0       0
    E       0       0
    S       0       0
    L       0       0
    D       0       0
    P       0       0
    O       0       0

    End of Log file for Sequence 42.
The transcript gives exact values (rather than values read off the plots) for both of the superclasses considered and for the highest-scoring macroclasses (shown graphically in the structural class probability plot). After these, if the sequence has been predicted to be a WD repeat, the profile probabilities are shown in tabular format. The number in each column is the probability of that residue appearing in any position in the corresponding profile. The DSMs themselves and the profiles they use are described in more detail on the "Description of WD-repeat DSMs" page.
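
When many results need to be compared, the probabilities can be read from the transcript instead of from the plots. A minimal sketch, assuming the line format shown above. `class_probabilities` is a hypothetical name.

```python
import re

# e.g. "1st Superclass wd repeat  has probability 1"
PROB_LINE = re.compile(
    r"^\s*\d+(?:st|nd|rd|th)\s+(Superclass|Macroclass)\s+(.+?)\s+"
    r"has probability\s+(\S+)\s*$")


def class_probabilities(transcript):
    """Collect {"Superclass": {...}, "Macroclass": {...}} from a transcript."""
    result = {"Superclass": {}, "Macroclass": {}}
    for line in transcript.splitlines():
        m = PROB_LINE.match(line)
        if m:
            kind, name, prob = m.groups()
            result[kind][name] = float(prob)
    return result
```

When applied to the transcript above, this sketch assigns the largest macroclass probability to wd4.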

For brevity, we show only enough of the sequence to illustrate the relative placement of the two profiles over the first repeat. Notice that the sequence between the two profile hits (i.e. the sequence of residues with double zeros between the "1/0" and "0/1" residues) is "HGTAGRIH", as it appears in the "Sequence42.1" line in the "Predicted WD-repeat structure" section of the cover letter.
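
The observation above can be checked mechanically from the profile table. The sketch assumes that each table line holds a one-letter residue followed by the two profile values. It treats a value of at least 0.5 as a hit. That threshold is an assumption, because the excerpt shows only 0 and 1. All function names are hypothetical.

```python
def parse_profile_table(lines):
    """Turn "X  p1  p2" lines into (residue, p1, p2) tuples; skip the rest."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and len(parts[0]) == 1 and parts[0].isalpha():
            rows.append((parts[0], float(parts[1]), float(parts[2])))
    return rows


def first_run(flags, start=0):
    """Half-open (begin, end) of the first run of True at or after start."""
    begin = next((i for i in range(start, len(flags)) if flags[i]), None)
    if begin is None:
        return None
    end = begin
    while end < len(flags) and flags[end]:
        end += 1
    return begin, end


def first_repeat_segments(rows, threshold=0.5):
    """Return (profile1 hit, gap, profile2 hit) for the first repeat."""
    seq = "".join(aa for aa, _, _ in rows)
    run1 = first_run([p1 >= threshold for _, p1, _ in rows])
    if run1 is None:
        return None
    # Look for the profile2 hit only after the profile1 hit ends.
    run2 = first_run([p2 >= threshold for _, _, p2 in rows], run1[1])
    if run2 is None:
        return None
    return seq[run1[0]:run1[1]], seq[run1[1]:run2[0]], seq[run2[0]:run2[1]]
```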


Structural Class Probabilities

[Figure: structural class probability plot (two bar charts).]

In the structural class probability plot, we see that the generic superclass has essentially zero probability, and the wd repeat superclass has a probability near unity. This means that psa-request is quite confident that the protein sequence is a WD repeat.

Looking at the macroclass probabilities, we see a strong (not to say exclusive) preference for wd4, a WD-repeat domain with four repeats.


Beta propeller core model

If the server determines that the sequence contains a WD-repeat domain with four to ten repeats, then the final e-mail message of the set consists of a skeleton structure for the sequence.

In this example, the server discovered four repeats, so the model contains four beta sheets of four strands each. The sheets are designated E1 through E4; each strand is preceded by the secondary structure designator for the sheet to which it belongs. Here we show the first strand only; the full text of the core file message has sixteen such strands.


    E1
    ATOM      1  N   TYR    17       0.87    2.22    8.89   1.00  0.00
    ATOM      2  CA  TYR    17       0.88    3.40    7.94   1.00  0.00
    ATOM      3  C   TYR    17       1.23    3.06    6.45   1.00  0.00
    ATOM      4  O   TYR    17       1.07    3.93    5.55   1.00  0.00
    ATOM      5  CB  TYR    17       1.87    4.52    8.44   1.00  0.00
    ATOM      6  N   SER    18       1.68    1.80    6.20   1.00  0.00
    ATOM      7  CA  SER    18       2.05    1.39    4.82   1.00  0.00
    ATOM      8  C   SER    18       2.33   -0.14    4.68   1.00  0.00
    ATOM      9  O   SER    18       2.72   -0.84    5.67   1.00  0.00
    ATOM     10  CB  SER    18       3.30    2.24    4.34   1.00  0.00
    ATOM     11  N   LEU    19       2.08   -0.66    3.45   1.00  0.00
    ATOM     12  CA  LEU    19       2.33   -2.08    3.10   1.00  0.00
    ATOM     13  C   LEU    19       2.71   -2.11    1.59   1.00  0.00
    ATOM     14  O   LEU    19       2.38   -1.16    0.83   1.00  0.00
    ATOM     15  CB  LEU    19       1.07   -2.99    3.37   1.00  0.00
    ATOM     16  N   ASP    20       3.47   -3.15    1.18   1.00  0.00
    ATOM     17  CA  ASP    20       3.88   -3.29   -0.24   1.00  0.00
    ATOM     18  C   ASP    20       4.09   -4.77   -0.63   1.00  0.00
    ATOM     19  O   ASP    20       4.93   -5.49   -0.01   1.00  0.00
    ATOM     20  CB  ASP    20       5.21   -2.52   -0.51   1.00  0.00
    ATOM     21  N   PHE    21       3.31   -5.21   -1.66   1.00  0.00
    ATOM     22  CA  PHE    21       3.38   -6.61   -2.23   1.00  0.00
    ATOM     23  C   PHE    21       4.73   -6.88   -2.99   1.00  0.00
    ATOM     24  O   PHE    21       5.30   -5.97   -3.66   1.00  0.00
    ATOM     25  CB  PHE    21       2.23   -6.82   -3.28   1.00  0.00
    ATOM     26  N   GLN    22       5.23   -8.14   -2.90   1.00  0.00
    ATOM     27  CA  GLN    22       6.48   -8.51   -3.61   1.00  0.00
    ATOM     28  C   GLN    22       6.09   -8.97   -5.05   1.00  0.00
    ATOM     29  O   GLN    22       5.00   -9.58   -5.24   1.00  0.00
    ATOM     30  CB  GLN    22       7.18   -9.70   -2.86   1.00  0.00
The model gives coordinates only for the backbone atoms and beta carbons of the blade strands. No attempt is made to place loop residues or sidechain atoms beyond the beta carbon.

The model is produced by starting with a beta-propeller model constructed with the appropriate number of blades. At present, there is a fixed set of models, one for each number of blades from four to ten. These models were each constructed by selecting a representative blade from an actual beta propeller structure and replicating it with the requisite geometry.
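
The replication step can be illustrated by rotating one blade's coordinates about the propeller axis in steps of 360/n degrees. This is a simplified sketch and not the procedure actually used to build the server's models. It assumes that the axis is the z axis and that neighboring blades differ by a pure rotation (real propellers are not perfectly symmetric). `replicate_blade` is a hypothetical name.

```python
import math


def replicate_blade(blade, n_blades):
    """Copy one blade's (x, y, z) coordinates around the z axis.

    Hypothetical sketch: blade k is blade 0 rotated by 2*pi*k/n_blades.
    Returns a list of n_blades coordinate lists.
    """
    if not 4 <= n_blades <= 10:
        raise ValueError("models exist only for 4 to 10 blades")
    propeller = []
    for k in range(n_blades):
        theta = 2 * math.pi * k / n_blades
        c, s = math.cos(theta), math.sin(theta)
        # Standard rotation in the xy plane; z is unchanged.
        propeller.append([(c * x - s * y, s * x + c * y, z)
                          for x, y, z in blade])
    return propeller
```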

The psa-request server inserts the amino acid names and sequence indices from the query sequence into the model at the positions dictated by the profile matches. Accordingly, we see that the sequence for this strand, "YSLDFQ", is the first strand of the first blade (the second column on the line labeled "Sequence42.1") in the "Predicted WD-repeat structure" section of the cover letter. Atom numbers are assigned arbitrarily from 1. No additional processing (e.g. CHARMm relaxation) is done; users who submit multiple queries that have the same number of repeats will find that the returned numeric coordinates are identical.
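
To check which query residues were placed in each strand, the core file can be converted back into one-letter sequences. A minimal sketch, assuming the whitespace-separated layout shown above (not strict PDB fixed columns) and sheet designators of the form E1, E2, and so on. `strands_from_core` is a hypothetical name.

```python
import re

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def strands_from_core(text):
    """Return {sheet: [strand sequences]} from a core-format message.

    A designator line ("E1", "E2", ...) starts a new strand in that
    sheet; ATOM records carry the residue name and number in their
    4th and 5th whitespace-separated fields.
    """
    sheets = {}
    strand = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if re.fullmatch(r"E\d+", fields[0]):
            strand = []
            sheets.setdefault(fields[0], []).append(strand)
        elif fields[0] == "ATOM":
            if strand is None:
                raise ValueError("ATOM record before any sheet designator")
            residue = (fields[3], int(fields[4]))
            # Several atoms share a residue; record each residue once.
            if not strand or strand[-1] != residue:
                strand.append(residue)
    return {sheet: ["".join(THREE_TO_ONE[name] for name, _ in s)
                    for s in strands]
            for sheet, strands in sheets.items()}
```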




Please direct your questions and comments about these Web pages and the PSA e-mail server to:

Bob Rogers <rogers@darwin.bu.edu>
BioMolecular Engineering Research Center
Boston University, Boston Massachusetts
Last modified: Wed Sep 27 21:24:15 EDT 2000