About the PSA Server

Table of Contents

  1. About the PSA Server
    1. Table of Contents
    2. Overview
    3. Limitations
    4. Using the PSA Server
    5. Commercial Users
    6. Publication of Results


Overview

The Protein Sequence Analysis (PSA) server predicts protein secondary and tertiary structure from sequence. It is intended for researchers who have amino acid sequences for proteins of unknown structure for which no homologous sequences are known. To use PSA, submit a single amino acid sequence to the server and choose one of three analyses: Type-1, Type-2, or WD-repeat DSMs. DSMs are Discrete State-space Models for patterns of alpha-helices, strands, tight turns, and loops in specific structural classes.

Type-1 models are for complete sequences from monomeric, single-domain, globular, water-soluble proteins in several recognized structural classes. There is one exception: a set of Type-1 DSMs has been included for transmembrane proteins with a beta-barrel fold, such as porin. In either case, Type-1 analysis may be applied to sequences from 40 to 350 residues long. Type-1 DSMs are also appropriate for those subsequences of membrane-spanning proteins that are believed, based on a hydropathy profile, to extend beyond the membrane. See the "Description of Type-1 DSMs" section for more information on Type-1 models.

Type-2 models, in contrast, are for either partial or complete sequences from potentially large proteins that violate one or more of the modeling assumptions embodied in Type-1 models. For example, Type-2 models are appropriate for proteins that have one or more of the following properties: (1) they are multimeric; (2) they have more than one structural domain; or (3) they are not globular or soluble (e.g., membrane-spanning proteins). Type-2 models can be applied to sequences up to 1000 residues long. See the "Description of Type-2 DSMs" section for more information on Type-2 models.

WD-repeat models are specially designed for the WD-repeat family of proteins (see the http://bmerc-www.bu.edu/wdrepeat/ pages). These models combine a particular Type-1 structural model with sequence-specific pattern information. They can be applied to sequences up to 1000 residues long. Multi-domain proteins can be handed to the server intact; the region containing the WD-repeat domain will be identified by the server automatically. See the "Description of WD-repeat DSMs" section for more information on WD-repeat models.
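The choice among the three analyses can be sketched as a small decision function. This is only an illustration of the guidance above, not part of the PSA server: the `ProteinInfo` fields and the `choose_analysis` function are hypothetical names, the fallback from Type-1 to Type-2 for sequences outside the Type-1 length range is an assumption, and the beta-barrel and extramembrane-subsequence cases for Type-1 are not modeled.

```python
from dataclasses import dataclass

# Length limits quoted in the Overview. The Limitations section gives 35
# rather than 40 as the Type-1 lower bound, so treat these as adjustable.
TYPE1_MIN_LEN = 40
TYPE1_MAX_LEN = 350
MAX_LEN_OTHER = 1000  # limit for Type-2 and WD-repeat analyses


@dataclass
class ProteinInfo:
    """Hypothetical summary of what the requester knows about a sequence."""
    sequence: str
    is_wd_repeat: bool = False
    is_multimeric: bool = False
    is_multidomain: bool = False
    is_soluble_globular: bool = True
    is_partial: bool = False


def choose_analysis(info: ProteinInfo) -> str:
    """Return 'type-1', 'type-2', or 'wd-repeat' for the given sequence."""
    n = len(info.sequence)
    if n > MAX_LEN_OTHER:
        raise ValueError(f"sequence has {n} residues; the limit is {MAX_LEN_OTHER}")
    if info.is_wd_repeat:
        return "wd-repeat"
    violates_type1 = (
        info.is_multimeric
        or info.is_multidomain
        or not info.is_soluble_globular
        or info.is_partial
    )
    if not violates_type1 and TYPE1_MIN_LEN <= n <= TYPE1_MAX_LEN:
        return "type-1"
    # Assumption: anything that does not fit Type-1 falls back to Type-2.
    return "type-2"
```

For example, `choose_analysis(ProteinInfo("A" * 100, is_multimeric=True))` selects the Type-2 analysis, because a multimeric protein violates a Type-1 modeling assumption.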

The PSA System determines the probable placement of secondary structural elements along the sequence. When using Type-1 models, it also determines the probable tertiary structural class of the protein, and it uses knowledge of this structural class when computing the probabilities for secondary structural elements. The output of the server can either be e-mailed to the requester or posted on the Web at a private address, in which case the address is e-mailed instead. The exact nature of the output depends on the type of analysis requested.
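As the Limitations section explains, the structural-class probabilities are relative weights over the DSMs in the library, not probabilities that a prediction is correct. The sketch below illustrates that idea only. It assumes that each model yields a log-likelihood for the sequence and that all models are equally likely a priori; the function name, the model names, and the scores are hypothetical, and the server's actual computation is the one described in the papers cited below.

```python
import math


def class_weights(log_likelihoods: dict[str, float]) -> dict[str, float]:
    """Normalize per-model log-likelihoods into weights that sum to 1.

    Assumes a uniform prior over the models in the library.
    """
    if not log_likelihoods:
        raise ValueError("at least one model is required")
    # Subtract the largest value before exponentiating to avoid underflow.
    peak = max(log_likelihoods.values())
    unnormalized = {name: math.exp(ll - peak) for name, ll in log_likelihoods.items()}
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}


# Hypothetical scores for two models, then the same library with a third model.
two_models = class_weights({"class_a": -120.0, "class_b": -122.0})
three_models = class_weights({"class_a": -120.0, "class_b": -122.0, "class_c": -119.0})
```

The weight of `class_a` drops when `class_c` is added even though its own score is unchanged, which is why a high weight only means "best explanation among the available models".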

Plots may be produced in either PostScript or Portable Document Format (PDF). When results are returned by e-mail, the plots are e-mailed to the requester. When results are posted on the Web, the plots are converted to PNG format for immediate viewing, with a link for downloading them in the requested printable format.

The analysis algorithm is based on probabilistic Discrete State-space Models (DSMs) and optimal filtering and smoothing algorithms, as described in the paper "Structural analysis based on state-space modeling," by C.M. Stultz, J.V. White, and T.F. Smith, Protein Science (1993), 2:305-314. The mathematical basis for the models and algorithms is presented in "Protein Classification by Stochastic Modeling and Optimal Filtering of Amino-Acid Sequences," by J.V. White, C.M. Stultz, and T.F. Smith, Mathematical Biosciences (1994), 119:35-75. For an extended discussion of our approach, see "Predicting Protein Structure with Probabilistic Models," by C.M. Stultz, R. Nambudripad, R.H. Lathrop, and J.V. White, pp. 447-506 in: "Protein Structural Biology in Bio-Medical Research" (1997) (Editors: N. Allewell and C. Woodward), Vol. 22B, "Advances in Molecular and Cell Biology" (Editor: E.E. Bittar), JAI Press, Greenwich.

Limitations

The psa-request server and its libraries of DSMs are subject to the following limitations:
  1. When using Type-1 models to determine the probable tertiary structural class of the protein, the probability calculated for each structural class indicates the relative probability of that structural class compared with all of the other structural classes in our library. The probabilities are weights of support for the available DSMs as explanations of your sequence. Protein folds for which we have no DSM are not predicted by the PSA server. Moreover, the probabilities reported by our system are not the probabilities that the analysis is "correct" or "true." Rather, the probabilities indicate which of our DSMs are the best explanations of your sequence.

  2. The current library of Type-1 models is primarily intended for monomeric, single-domain, soluble, globular proteins having sequences in the length range from 35 through 350 residues. The Type-2 models and WD-repeat models are based on fewer analysis assumptions and deal with sequences of any length. However, for practical reasons, sequences analyzed by the PSA System using Type-2 models and WD-repeat models are limited to 1000 residues. To analyze longer sequences, you may split them into 1000-residue subsequences, with some overlap, and analyze each of the subsequences separately.
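Splitting a long sequence into overlapping subsequences, as suggested in item 2, can be sketched as follows. The function name is hypothetical, and the default overlap of 100 residues is an assumption chosen for illustration; the text above says only "some overlap".

```python
def split_with_overlap(sequence: str, window: int = 1000, overlap: int = 100) -> list[tuple[int, str]]:
    """Split a sequence into windows of at most `window` residues.

    Consecutive windows share `overlap` residues. Each result is a pair
    (1-based start position in the original sequence, subsequence).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if len(sequence) <= window:
        return [(1, sequence)]

    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start + 1, sequence[start:start + window]))
        if start + window >= len(sequence):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks
```

Each subsequence can then be submitted as a separate request, and the recorded start positions let you map the returned predictions back onto the full-length sequence.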

Using the PSA Server

You may submit a sequence to the PSA server for analysis in either of two ways:
See the "Results from the psa-request server" section for more information on what data are returned and how to interpret them.

Commercial Users

There is a limit of one request per month from unlicensed commercial users. If you are interested in obtaining a commercial license for using the PSA server, please contact TASC at http://www.tasc.com/.

To obtain information about more detailed applications of our analysis technique, which may be useful for your specific research project, please contact James White:

James V. White
JVWhite.Com
5 Kelly Road
Cambridge MA 02139
e-mail: jvwhite@jvwhite.com
v-mail: 617-868-3045
fax: 617-868-3372

These more detailed applications involve the use of novel DSMs that are not in the standard libraries used by the PSA server. Such DSMs can be constructed to model the specific structural hypotheses being considered for a single sequence or set of related sequences.

Publication of Results

If you publish any information provided by our PSA e-mail server, please reference this Web site and these technical articles:


Please direct your questions and comments about these Web pages and the PSA e-mail server to:

Bob Rogers <rogers@darwin.bu.edu>
BioMolecular Engineering Research Center
Boston University, Boston, Massachusetts
Last modified: Fri Dec 15 08:25:25 EST 2000