Additional tools that are not distributed
This page describes various software tools loosely associated with the needle tools set that are presently not
available outside of BMERC. They are documented here for convenience.
For more information, please see the needle overview page.
Table of contents
- Additional tools that are not distributed
- Table of contents
- pdb-to-dsm.pl
- expand-dssp.pl
pdb-to-dsm.pl
pdb-to-dsm.pl takes an exposure file and an abbreviated DSSP
format file and generates a series of DSM secondary structure states
on the standard output. The DSSP file is passed through dssp4.pl with the
-clean option in order to generate "smoothed" states.
Usage:
pdb-to-dsm.pl -locus locus [ -chain chain-id ]
[ dssp-file [ exposure-file ] ]
Arguments:
- -locus locus
- string that is used as the locus name, mainly for error messages,
and for constructing default file names. (The locus may also
come before the DSSP file name without the keyword, but that
syntax is now deprecated. -- rgr, 26-Aug-97.)
- -chain chain-id
- optional single-character chain identifier. If
the -chain argument is not supplied,
pdb-to-dsm.pl uses the first chain encountered in the
DSSP file.
- dssp-file
- abbreviated
DSSP format file for locus. If the dssp-file
is not supplied it defaults to
"/structure/dssp/pdblocus.ent.out", the standard
BMERC location. If the dssp-file is "-", then
the secondary structure information is taken from the standard
input, and it is assumed to already be in the
dssp4.pl default output format.
- exposure-file
- exposure file for
locus. If the exposure-file is not supplied, it
defaults to "locus.nexp" in the current
directory, and then (if that does not exist),
"~thread/structure/full/dssp/exposure/locus.nexp".
pdb-to-dsm.pl will die with an error message if it can't
read either of the required files, or if the entries don't correspond,
or if the requested chain ID doesn't exist in either file. It will also
die if it generates zero states, though that shouldn't happen when the
chain exists.
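The default file lookup described in the argument list can be sketched as follows. This is a minimal Python illustration, not the script's actual Perl code. It assumes that the locus is spliced literally into the documented default paths, and the function names and the example locus "1abc" are hypothetical.

```python
import os


def dssp_default(locus):
    # Assumption: the locus replaces the word "locus" in the documented
    # default "/structure/dssp/pdblocus.ent.out".
    return f"/structure/dssp/pdb{locus}.ent.out"


def exposure_candidates(locus):
    # Documented search order: the current directory first, then the
    # ~thread tree. expanduser leaves "~thread" unchanged if no such user.
    return [
        f"{locus}.nexp",
        os.path.expanduser(
            f"~thread/structure/full/dssp/exposure/{locus}.nexp"
        ),
    ]


def find_exposure_file(locus, exists=os.path.exists):
    # Return the first candidate that exists; fail loudly otherwise,
    # mirroring the "die with an error message" behaviour.
    for path in exposure_candidates(locus):
        if exists(path):
            return path
    raise FileNotFoundError(f"no exposure file found for locus {locus}")


if __name__ == "__main__":
    print(dssp_default("1abc"))
    print(exposure_candidates("1abc"))
```

Passing the existence check in as a parameter keeps the search order testable without touching the real file system.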
Normally, the DSSP and exposure file entries are one-to-one, but
pdb-to-dsm.pl tries to be smart about minor inconsistencies.
If a residue is present in the DSSP data but not in the exposure file,
the residue is assumed to be buried; this is only relevant for helix or
strand states, in which case a warning is printed. If a residue is
present in the exposure file but not the DSSP, then the secondary
structure is assumed to be a loop; a warning is always printed.
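The reconciliation rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's implementation: residues are keyed by plain integer numbers, secondary structure is given as the hypothetical labels "helix", "strand", and "loop", and exposure has already been reduced to an exposed/buried flag.

```python
import sys


def merge_residues(dssp, exposure, warn=None):
    """Combine per-residue DSSP and exposure data for one chain.

    dssp:     {residue_number: secondary structure label}
    exposure: {residue_number: True if exposed, False if buried}
    Returns a list of (residue_number, label, exposed) tuples.
    """
    if warn is None:
        warn = lambda msg: print(msg, file=sys.stderr)
    merged = []
    for res in sorted(set(dssp) | set(exposure)):
        if res in dssp and res in exposure:
            merged.append((res, dssp[res], exposure[res]))
        elif res in dssp:
            # Missing exposure entry: assume buried. This only matters
            # for helix and strand, so warn only in those cases.
            if dssp[res] in ("helix", "strand"):
                warn(f"residue {res}: no exposure entry, assuming buried")
            merged.append((res, dssp[res], False))
        else:
            # Missing DSSP entry: assume loop, and always warn.
            warn(f"residue {res}: no DSSP entry, assuming loop")
            merged.append((res, "loop", exposure[res]))
    return merged


if __name__ == "__main__":
    dssp = {1: "helix", 2: "strand", 3: "loop"}
    exposure = {1: True, 4: True}
    for row in merge_residues(dssp, exposure):
        print(row)
```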
The output is a single line of space-separated decimal state codes,
one code per residue in the combined files. The output codes are
defined by the following table:
1 helix buried
2 helix exposed
3 strand buried
4 strand exposed
5 loop
6 turn 1
7 turn 2
8 turn 3
9 turn 4
[Anything else is turned into a loop, with a warning since
dssp4.pl should produce only these. -- rgr, 20-Dec-96.]
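As an illustration of the table above, the following Python sketch maps a residue's secondary structure and exposure to a state code. Several details are assumptions rather than documented behaviour: the labels "helix", "strand", "loop", and "turn1" through "turn4" are hypothetical stand-ins for whatever dssp4.pl emits, and a residue is treated as exposed when its exposure value exceeds the 33.6 threshold mentioned under "Known bugs" (the direction of that comparison is not stated on this page).

```python
import sys

EXPOSURE_THRESHOLD = 33.6  # documented value; comparison direction assumed

LOOP_CODE = 5
TURN_CODES = {"turn1": 6, "turn2": 7, "turn3": 8, "turn4": 9}
# (buried code, exposed code) for the two exposure-sensitive states
PAIRED_CODES = {"helix": (1, 2), "strand": (3, 4)}


def state_code(label, exposure):
    """Return the DSM state code for one residue.

    exposure may be None (no exposure entry), which counts as buried.
    """
    if label in PAIRED_CODES:
        exposed = exposure is not None and exposure > EXPOSURE_THRESHOLD
        return PAIRED_CODES[label][1 if exposed else 0]
    if label in TURN_CODES:
        return TURN_CODES[label]
    if label != "loop":
        # Anything unrecognised becomes a loop, with a warning.
        print(f"unknown state {label!r}, treating as loop", file=sys.stderr)
    return LOOP_CODE


def encode(residues):
    # One line of space-separated decimal codes, one per residue.
    return " ".join(str(state_code(label, exp)) for label, exp in residues)


if __name__ == "__main__":
    print(encode([("helix", 10.0), ("helix", 50.0), ("loop", 80.0)]))
```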
Known bugs:
- We should use
generate-exposure to generate the exposure file if we can't
find it. -- rgr, 20-Dec-96.
- The exposure threshold is wired at 33.6; there is no way to
change it. -- rgr, 26-Aug-97.
- pdb-to-dsm.pl has hardwired knowledge of standard BMERC
file locations. -- rgr, 26-Aug-97.
expand-dssp.pl
In general, PDB atom records may represent a subset of amino acids of
the complete protein, because loops may be disordered enough not to be
visible in the crystal structure. Furthermore, the residues output by
the DSSP program may be an even smaller subset,
since DSSP requires all backbone atoms to be present. If one is willing
to consider all missing residues as being in "loop" states, then this
limitation can be lifted using expand-dssp.pl, which takes DSSP
data in the
dssp4.pl default output format and a complete protein
sequence. The resulting output has the same DSSP data in the same
format, but with loop residues inserted where needed to make it look as
if DSSP had been run on the complete sequence.
Note that expand-dssp.pl always operates on exactly one
chain, even if the -chain argument is not specified.
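The idea of filling in missing residues as loops can be sketched as below. This Python example is only an illustration and makes its own assumptions: it aligns the observed residues to the full sequence by greedy left-to-right matching on the one-letter codes, which may not be how expand-dssp.pl does it (the script may well use residue numbers instead), and the state labels "H", "E", and "L" are hypothetical.

```python
def expand_states(full_sequence, observed, loop_state="L"):
    """Pad observed (amino_acid, state) pairs out to the full sequence.

    Residues of full_sequence with no observed counterpart are emitted
    with loop_state. Raises ValueError if the observed residues are not
    a subsequence of full_sequence.
    """
    expanded = []
    pos = 0
    for aa, state in observed:
        # Skip forward to the next matching residue, marking every
        # skipped position as a loop.
        while pos < len(full_sequence) and full_sequence[pos] != aa:
            expanded.append((full_sequence[pos], loop_state))
            pos += 1
        if pos == len(full_sequence):
            raise ValueError(f"residue {aa!r} not found in full sequence")
        expanded.append((aa, state))
        pos += 1
    # Anything after the last observed residue is also a loop.
    expanded.extend((aa, loop_state) for aa in full_sequence[pos:])
    return expanded


if __name__ == "__main__":
    observed = [("K", "H"), ("T", "H"), ("Y", "E")]
    for aa, state in expand_states("MKTAYG", observed):
        print(aa, state)
```

Greedy matching always succeeds when the observed residues form a subsequence of the full sequence, but where the same amino acid repeats next to a gap it cannot tell which copy was actually observed, so the inserted loop may land one position off.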
Usage:
expand-dssp.pl [-chain L] [-verbose] full-sequence < dssp-in > dssp-out
Arguments:
- -chain L
- optional single-character chain identifier. If
the -chain argument is not supplied, it defaults to the
first chain encountered in the DSSP file.
- full-sequence
- the full protein sequence as a string of one-letter abbreviations
on the command line.
- dssp-in
- input in the
dssp4.pl default output format; can be from a named
file or from the standard input.
- dssp-out
- output in the
dssp4.pl default output format, sent to the standard
output.
- -verbose
- if specified, prints debugging info to the standard error stream.
Known bugs:
-
Bob Rogers
<rogers@darwin.bu.edu>
Last modified: Fri Nov 26 19:46:49 EST 1999