Overview of the prediction process



This page describes the larger context of sequence-based structure prediction within which the needle tools are useful.

Table of contents

  1. Overview of the prediction process
    1. Table of contents
    2. Prediction steps
    3. Generating the core library
    4. Generating multiple homolog alignments
    5. Generating MRF files
    6. Running needle

Prediction steps

Starting from scratch with a set of PDB files, the overall process of using needle for structure prediction consists of the following four steps, one of which is optional:
Core generation
Produce core, sequence, and exposure files from the PDB files. This is described in detail below.
Homolog alignment generation (optional)
Given the core and sequence files, obtain suitable homologous sequences from a database, and produce a structure-directed multiple alignment. This step is also described below. [It is still being documented/designed/revised. -- rgr, 28-Mar-97.]
MRF score/environment file generation
Produce MRF score/environment files from the core, sequence, and exposure files (and also homolog alignments, if desired). This is documented on the MRF score generation page, with examples for the simpler cases.
needle prediction
Feed almost all of the above, along with the sequence(s) to be predicted, into the needle program. This is poorly documented in the "seq-choose-core" section of the needle page. [I generated this from a mail message from Rick a year ago when I didn't understand the process myself. -- rgr, 7-Mar-97.]
The process of selecting the PDB file set is not covered here. [Presumably it will be when we revisit this step in the not-too-distant future. -- rgr, 7-Mar-97.]
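
The ordering of the steps above can be sketched as a small dataflow table. This is only an illustration: the step and artifact names are labels taken from this page, not needle program names or file formats, and the exact inputs listed for the needle step are an assumption (the page says only "almost all of the above").

```python
# Illustrative only: step and artifact names are labels from this page,
# not actual needle program names or file formats.
STEPS = [
    # (step, required inputs, optional inputs, outputs)
    ("core generation", {"PDB files"}, set(),
     {"core files", "sequence files", "exposure files"}),
    ("homolog alignment generation", {"core files", "sequence files"}, set(),
     {"homolog alignments"}),
    ("MRF score/environment file generation",
     {"core files", "sequence files", "exposure files"},
     {"homolog alignments"},
     {"MRF score/environment files"}),
    # Assumption: the inputs to needle are guessed from the text of this page.
    ("needle prediction",
     {"core files", "sequence files", "MRF score/environment files",
      "sequences to predict"},
     set(),
     {"predictions"}),
]

def plan(available, skip=()):
    """Return the steps that can run, in order, given the starting artifacts."""
    have = set(available)
    runnable = []
    for name, required, _optional, outputs in STEPS:
        if name in skip:
            continue
        missing = required - have
        if missing:
            raise ValueError(f"{name}: missing {sorted(missing)}")
        runnable.append(name)
        have |= outputs
    return runnable

# Skip the optional homolog step; the other three still chain together.
order = plan({"PDB files", "sequences to predict"},
             skip={"homolog alignment generation"})
print(order)
```

Skipping the homolog step still yields a valid plan because the alignments are listed only as an optional input to MRF generation.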

Generating the core library

This step generates the sequence and exposure files as well as the core files themselves. The MRF score/environment generation process requires all three, one of each per core in the library. The needle program itself uses only the core and sequence file sets, so the exposure files can be skipped if the MRF score function is not wanted.
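
The per-core file requirements can be expressed as a simple completeness check. This is a hypothetical sketch: the kind names ("core", "sequence", "exposure") are labels from this page, and how the files are actually named or stored is not specified here.

```python
# Hypothetical sketch: kind names are labels from this page, not file formats.
MRF_NEEDS = {"core", "sequence", "exposure"}   # MRF score/environment generation
NEEDLE_NEEDS = {"core", "sequence"}            # needle itself

def check_library(library):
    """Map each core name to the consumers whose per-core files are all present.

    `library` maps a core name to the set of file kinds available for it.
    """
    report = {}
    for core_name, kinds in library.items():
        ready = []
        if MRF_NEEDS <= kinds:
            ready.append("mrf")
        if NEEDLE_NEEDS <= kinds:
            ready.append("needle")
        report[core_name] = ready
    return report

library = {
    "coreA": {"core", "sequence", "exposure"},
    "coreB": {"core", "sequence"},  # exposure file skipped
}
print(check_library(library))
```

A core without an exposure file is reported as usable by needle but not by MRF score generation.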

Figure 1: Generating exposure, core, and sequence files

Figure 1 shows the dataflow involved in generating exposure, core, and sequence files from a given PDB file. Program names appear in rectangular boxes, and file format names in ovals.

Note that the only external input is the PDB file, and that the three files can be generated independently of one another. The core file is built from the PDB file together with a segment definition file, which is itself derived either from DSSP or, if the PDB file includes HELIX and/or SHEET records, from the crystallographers' secondary structure assignments in the PDB file.
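
The choice of segment definition source described above could be made by scanning the PDB file for HELIX and SHEET records, whose names occupy the first six columns of a line in the PDB format. The sketch below is an assumption about how one might automate that choice, not a description of the needle tools; the function names and the policy of preferring the PDB records when present are hypothetical.

```python
def has_secondary_structure_records(pdb_lines):
    """True if any line is a PDB HELIX or SHEET record.

    The PDB format puts the record name in columns 1-6.
    """
    return any(line[:6].strip() in ("HELIX", "SHEET") for line in pdb_lines)

def segment_source(pdb_lines, prefer_dssp=False):
    """Choose where the segment definitions would come from (illustrative)."""
    if not prefer_dssp and has_secondary_structure_records(pdb_lines):
        return "pdb-records"
    return "dssp"

# Made-up, abbreviated lines used purely for illustration.
with_records = [
    "HEADER    EXAMPLE",
    "HELIX    1   1 ALA A    2  GLY A   10  1",
    "ATOM      1  N   ALA A   1",
]
without_records = ["HEADER    EXAMPLE", "ATOM      1  N   ALA A   1"]

print(segment_source(with_records))
print(segment_source(without_records))
```

The `prefer_dssp` flag covers the case where DSSP assignments are wanted even though the crystallographers' records are available.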

Generating multiple homolog alignments

Structure-directed multiple alignments of sequences homologous to a given core segment are used to "sweeten" the amino acid counts made by the mrf-counts program. Instead of counting only the amino acid that actually appears at a singleton position in a given core, we can use the homology data to count each amino acid that could potentially appear there (and similarly each unique pair of amino acids for pairwise environments). Doing so appears to produce better threading results [data or reference?].
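
A minimal sketch of the counting idea, under stated assumptions: this is not the mrf-counts implementation. It assumes the alignment is a list of equal-length strings with the core's own sequence first, that "-" marks a gap, and that a pair is counted when both residues occur in the same aligned sequence. The function names are hypothetical.

```python
from collections import Counter

def singleton_counts(aligned, position):
    """Count each distinct amino acid seen at one alignment column, once."""
    residues = {seq[position] for seq in aligned if seq[position] != "-"}
    return Counter(residues)

def pair_counts(aligned, i, j):
    """Count each unique amino acid pair seen together at columns i and j."""
    pairs = {(seq[i], seq[j]) for seq in aligned
             if seq[i] != "-" and seq[j] != "-"}
    return Counter(pairs)

# Core sequence first, then two made-up homologs.
aligned = ["ACDE", "ACNE", "GC-E"]

# Without homolog data only the core's own residue would be counted.
core_only = singleton_counts(aligned[:1], 2)
sweetened = singleton_counts(aligned, 2)
print(core_only, sweetened)
print(pair_counts(aligned, 0, 2))
```

Counting each distinct residue once per column, rather than once per sequence, keeps a large family of near-identical homologs from dominating the totals; whether mrf-counts weights homologs this way is not stated here.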

In the future, we would like to be able to thread sets of related sequences simultaneously, and hope to be able to bring these tools to bear on aligning the sequences to thread.

[The original multiple alignment implementation that was used to produce the files in the ~thread/aligment/ directory at BMERC (a) required a fair amount of manual intervention, and (b) was not a true multiple alignment in any case. The right thing to do would be to re-engineer it to use the pima_profile program. Since we are not actually using homolog data at present, preferring to use a large set of actual cores instead, this project is on indefinite hold. -- rgr, 20-Jan-98.]

Generating MRF files

[move new MRF documentation here? or integrate this stuff there? -- rgr, 7-Mar-97.]

Running needle

[elsewhere. -- rgr, 7-Mar-97.]


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Tue Jan 18 21:28:40 EST 2000