Structure file formats
These formats cover secondary as well as tertiary structure. The
former is usually based on the DSSP program, and
the latter on the PDB ATOM format.
Segment file format
A segment file describes the secondary structures for a PDB entry
(file), one structure per line. Secondary structure elements (segments)
must appear in the same order as the ATOM records in the PDB file. Each
segment is characterized by the following tab-delimited fields:
Segment files have various extensions, generally denoting their origin.
Most of the segment files in use at BMERC have a ".dssp"
extension because they were derived from DSSP, and some end in
".dssp2" or ".dssp3" because they were hand-edited or
postprocessed versions derived from an original ".dssp" file.
[Unfortunately, this convention is a little too terse. A
".dssp" file is neither an
abbreviated DSSP format file, nor the output of
the DSSP program. -- rgr, 29-Jan-97.]
See [Kabsch,W., and Sander,C., Biopolymers 22(1983) 2577-2637], or the DSSP web pages for more information on DSSP, and how to obtain a copy. See the Description of the DSSP program page for details of how to run dssp and how to interpret the "raw" DSSP output format.
An abbreviated DSSP file has a single header line, followed by a series of records each with eight tab-delimited fields. There is one record per residue, plus extra records with an "amino acid" of "!" to denote chain breaks or transitions between chains. Some of these fields, described below, have subfields.
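A reader for this layout could be sketched in Python as follows. The field names are not listed here, so the sketch keeps each record as a plain list of eight strings. The `aa_column` parameter (the position of the amino acid field) is a hypothetical placeholder chosen for illustration, not something this page specifies.

```python
def read_abbreviated_dssp(lines, aa_column=1):
    """Split an abbreviated DSSP file into header, records, and chain breaks.

    aa_column is an ASSUMED 0-based index of the amino acid field;
    adjust it to match the real field order.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    header, records = rows[0], []
    for number, row in enumerate(rows[1:], start=1):
        fields = row.split("\t")
        if len(fields) != 8:
            raise ValueError(
                f"record {number}: expected 8 fields, got {len(fields)}")
        records.append(fields)
    # Records whose "amino acid" is "!" mark chain breaks or chain changes.
    breaks = [i for i, f in enumerate(records) if f[aa_column] == "!"]
    return header, records, breaks
```

Subfields are left unparsed, since their separators are not described in this section.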
DSSP state codes:
These are interpreted in the order given; if more than one applies, the first is chosen. (Based on [Kabsch&Sander].)
Note that the helical states (H, G, and I) need not consist of an unbroken series of bridged residues; a residue will qualify for a given helical state if it is flanked by at least two pairs of helically bridged residues. The first and last bridged residues are not considered part of the helix, however, which is why the minimum alpha helix length is four (residues i through i+3, with one bridge between i-1 and i+3, and another between i and i+4).
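The flanking rule in this note can be illustrated with a small Python sketch. It assumes the helical bridges are already known and are given as the list of residue numbers k at which a bridge from k to k+n starts; the function name and input form are illustrative only and are not part of DSSP.

```python
def helix_residues(bridge_starts, n=4):
    """Return residues in a helical state, given starts of k -> k+n bridges.

    Two consecutive bridges (starting at k and k+1) put residues
    k+1 .. k+n into the helix; the first bridged residue (k) and the
    last one (k+1+n) stay outside it.
    """
    starts = set(bridge_starts)
    helix = set()
    for k in starts:
        if k + 1 in starts:
            helix.update(range(k + 1, k + n + 1))
    return sorted(helix)
```

For example, with bridges 9 to 13 and 10 to 14 (i = 10, n = 4), `helix_residues([9, 10])` gives residues 10 through 13, which is the minimum alpha helix of length four described above.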
Letter  Name                          Definition
H       Alpha helix (4-12)            Two or more consecutive bridge partners at i and i+4.
B       Isolated beta-bridge residue  Must not have a neighbor that qualifies it for H, E, G, or I status. Bridge partner is identified in BP1 or BP2 column.
E       Strand ("extended")           Has at least one bridge partner and at least one neighbor bridged in parallel or antiparallel.
G       3-10 helix                    Two or more consecutive bridge partners at i and i+3.
I       pi helix                      Two or more consecutive bridge partners at i and i+5.
T       Turn                          Bridge partner at i+3, i+4, or i+5, but no bridged neighbor that would qualify them for H, G, or I status.
S       Bend                          Local curvature greater than 70 degrees, measured as the angle between alpha carbons at i-2, i, and i+2.
blank   None                          Meets none of the criteria above.
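The "first applicable state wins" rule can be sketched in Python as below. The sketch assumes that the set of states a residue qualifies for has already been worked out elsewhere; `DSSP_PRIORITY` and `choose_state` are names made up for this illustration.

```python
# State letters in the order they are listed (and tested) above.
DSSP_PRIORITY = "HBEGITS"

def choose_state(applicable):
    """Return the first state in table order that applies, or blank."""
    for code in DSSP_PRIORITY:
        if code in applicable:
            return code
    return " "  # meets none of the criteria
```

A residue that qualifies for both T and S is therefore reported as T, and one that qualifies for nothing gets a blank.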
Backbone phi and psi angles, chirality, disulphide bonds, and solvent exposure are also computed, but do not affect the state code.
Known bugs:
The file consists of 50-character lines, one per residue, with fields as described below. Columns not mentioned must be spaces (though columns 10 through 43 are universally ignored).
Old files may have more fields, not documented here; the extra whitespace is for compatibility. Note that old files will NOT have the chain ID, though the space in that position should serve as a blank chain ID.
Multiple chains may appear in a core file, but it is assumed that it makes sense to thread the entire model with a single sequence in the order in which the segments appear.
Residues within each core segment and the segments themselves within the file appear in amino to carboxyl order. The backbone must be contiguous, and each residue will have exactly five atoms that appear in the order N, CA, C, O, and CB. It is recommended that core files that are not generated automatically by make-core.pl be passed through filter-pdb-atoms.pl in order to guarantee these conditions. Use

    filter-pdb-atoms.pl -output cb -hcb -pass-through all \
        handmade.core > checked.core

to perform these checks, eliminate extra variants, and generate missing glycine beta carbons. (The -pass-through all option copies the segment designators to the output, but does not check them for validity.)
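The five-atom condition can also be checked independently with a few lines of Python. This is only a sketch of that one condition, not a substitute for filter-pdb-atoms.pl: it assumes the atoms have already been read into (residue id, atom name) pairs in file order, and it does not test backbone contiguity or generate missing atoms.

```python
from itertools import groupby

EXPECTED_ATOMS = ["N", "CA", "C", "O", "CB"]

def residues_with_bad_atoms(atoms):
    """Return ids of residues whose atoms are not exactly N, CA, C, O, CB.

    atoms: iterable of (residue_id, atom_name) pairs in file order.
    """
    bad = []
    for residue_id, group in groupby(atoms, key=lambda pair: pair[0]):
        names = [name for _, name in group]
        if names != EXPECTED_ATOMS:
            bad.append(residue_id)
    return bad
```

An empty result means every residue has the expected five atoms in the expected order.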
A core abstraction file is a series of lines, each of which contains one or more tab-delimited fields, with the first field being the record tag. All tags and string values are in lower case.
As an example, here is the entire contents of 1puc.cab, a core of minimal (if not sub-minimum) size:
    loop a 10
    seg H 5 13.361 13.423 57.722 21.622 13.524 57.136
    res 121.997
    res 103.892
    res 21.241
    res 41.551
    res 81.506
    loop i 51
    seg H 6 11.361 27.337 46.732 16.827 21.339 52.382
    res 72.848
    res 80.841
    res 1.468
    res 15.439
    res 66.237
    res 29.616
    loop c 33
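A file of this shape can be read with a short Python sketch like the one below. It relies only on what is stated above (tab-delimited fields, record tag first) plus the three tags seen in the 1puc.cab example; the meaning of the individual fields is not interpreted, and attaching each "res" record to the preceding "seg" is an assumption drawn from the example's layout.

```python
def parse_cab(text):
    """Group core abstraction records into loops and segments.

    Returns (loops, segments); each segment keeps its own fields and
    the fields of the "res" records that follow it.
    """
    loops, segments = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, *fields = line.split("\t")
        if tag == "loop":
            loops.append(fields)
        elif tag == "seg":
            segments.append({"fields": fields, "res": []})
        elif tag == "res":
            if not segments:
                raise ValueError("res record before any seg record")
            segments[-1]["res"].append(fields)
        else:
            raise ValueError(f"unknown record tag: {tag!r}")
    return loops, segments
```

In the example, the "seg" records with 5 and 6 in their second field are followed by five and six "res" records respectively, which a caller could use as a consistency check.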