Additional tools that are not distributed



This page describes various software tools loosely associated with the needle tools set that are presently not available outside of BMERC. They are documented here for convenience.

Table of contents

  1. abstract-core.pl
  2. pdb-to-dsm.pl
  3. expand-dssp.pl


abstract-core.pl

abstract-core.pl produces a description of a core file ("abstract" is a verb here), along with exposure information, in the "core abstraction" or ".cab" file format. It accepts an optional exposure file and an optional visible volume singleton file. Loop records are added if a -seq-file is specified.

Usage:

    abstract-core.pl -core-file core-file-name [ -seq-file seq-file-name ]
	  [ -o core-state-file ]
	  [ -exposure-file exposure-file-name ] [ -vv-file vv-file-name ]

Arguments:

-core-file core-file-name
names a core format file, required for input.
-o core-state-file
if given, names the output file; the default is the standard output, which may also be named explicitly as "-".
-seq-file seq-file-name
names an optional IG-format sequence file. If this is supplied, the generated core state file will contain loop records.
-exposure-file exposure-file-name
optional exposure file for this core locus, usually included.
-vv-file vv-file-name
optional visible volume singleton file for this core locus, usually omitted.

Note that both the -exposure-file and -vv-file arguments may be omitted, but the current core DSM generator requires at least EFA exposure.


pdb-to-dsm.pl

pdb-to-dsm.pl takes an exposure file and an abbreviated DSSP format file and generates a series of DSM secondary structure states on the standard output. The DSSP file is passed through dssp4.pl with the -clean option in order to generate "smoothed" states.

Usage:

    pdb-to-dsm.pl -locus locus [ -chain chain-id ]
		  [ dssp-file [ exposure-file ] ]

Arguments:

-locus locus
string that is used as the locus name, mainly for error messages, and for constructing default file names. (The locus may also come before the DSSP file name without the keyword, but that syntax is now deprecated. -- rgr, 26-Aug-97.)
-chain chain-id
optional single-character chain identifier. If the -chain argument is not supplied, pdb-to-dsm.pl uses the first chain encountered in the DSSP file.
dssp-file
abbreviated DSSP format file for locus. If the dssp-file is not supplied it defaults to "/structure/dssp/pdblocus.ent.out", the standard BMERC location. If the dssp-file is "-", then the secondary structure information is taken from the standard input, and it is assumed to already be in the dssp4.pl default output format.
exposure-file
exposure file for locus. If the exposure-file is not supplied, it defaults to "locus.nexp" in the current directory, and then (if that does not exist), "~thread/structure/full/dssp/exposure/locus.nexp".
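
The exposure-file fallback described above can be sketched as follows. This is an illustrative Python rendering, not the Perl code of pdb-to-dsm.pl; the function name default_exposure_file and its parameters are hypothetical, and only the two paths come from the text.

```python
import os

# Fallback directory quoted in the text above. "~thread" is a home-directory
# reference; os.path.expanduser leaves it unchanged if no such user exists.
BMERC_EXPOSURE_DIR = "~thread/structure/full/dssp/exposure"


def default_exposure_file(locus, cwd=".", fallback_dir=BMERC_EXPOSURE_DIR):
    """Return the exposure file path that would be tried for `locus`.

    Hypothetical helper: prefer "<locus>.nexp" in `cwd`; if that file does
    not exist, fall back to "<fallback_dir>/<locus>.nexp".
    """
    name = locus + ".nexp"
    local = os.path.join(cwd, name)
    if os.path.exists(local):
        return local
    return os.path.join(os.path.expanduser(fallback_dir), name)
```

The fallback path is returned without checking that it exists, so a caller would still need to report an unreadable file, as pdb-to-dsm.pl is described as doing.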

pdb-to-dsm.pl will die with an error message if it can't read either of the required files, if the entries don't correspond, or if the requested chain ID doesn't exist in either file. It will also die if it generates zero states, though that shouldn't happen when the chain exists.

Normally, the DSSP and exposure file entries are one-to-one, but pdb-to-dsm.pl tries to be smart about minor inconsistencies. If a residue is present in the DSSP data but not in the exposure file, the residue is assumed to be buried; this is only relevant for helix or strand states, in which case a warning is printed. If a residue is present in the exposure file but not the DSSP, then the secondary structure is assumed to be a loop; a warning is always printed.
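
The reconciliation rules in the preceding paragraph can be sketched in Python. The data layout is an assumption made for illustration: each input is a dict keyed by residue identifier, DSSP classes are plain strings such as "helix", and exposure is already reduced to a boolean. The name reconcile is hypothetical and this is not the script's actual code.

```python
import sys


def _warn_stderr(message):
    # Default warning sink: print to the standard error stream.
    print("warning: " + message, file=sys.stderr)


def reconcile(dssp, exposure, warn=_warn_stderr):
    """Merge per-residue DSSP classes with per-residue exposure flags.

    dssp:     dict residue-id -> "helix", "strand", "loop", ... (assumed layout)
    exposure: dict residue-id -> True if exposed, False if buried (assumed)
    Returns a list of (residue-id, structure, exposed), sorted by residue id.
    """
    merged = []
    for res in sorted(set(dssp) | set(exposure)):
        if res in dssp and res in exposure:
            structure, exposed = dssp[res], exposure[res]
        elif res in dssp:
            # Missing from the exposure file: assume buried. This only
            # matters for helix or strand, so warn only in those cases.
            structure, exposed = dssp[res], False
            if structure in ("helix", "strand"):
                warn("residue %s has no exposure entry; assuming buried" % res)
        else:
            # Missing from the DSSP data: assume loop, and always warn.
            structure, exposed = "loop", exposure[res]
            warn("residue %s has no DSSP entry; assuming loop" % res)
        merged.append((res, structure, exposed))
    return merged
```

Passing a list's append method as warn collects the warnings instead of printing them, which is convenient when checking how many inconsistencies a given pair of files contains.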

The output is a single line of space-separated decimal state codes, one code per residue in the combined files. The output codes are defined by the following table:

 
   1	helix buried
   2	helix exposed
   3	strand buried
   4	strand exposed
   5	loop
   6	turn 1
   7	turn 2
   8	turn 3
   9	turn 4
[Anything else is turned into a loop, with a warning since dssp4.pl should produce only these. -- rgr, 20-Dec-96.]
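
For readers who want to inspect the output, a minimal Python sketch that decodes one output line back into the state names from the table above. The names STATE_NAMES and decode_states are hypothetical; the code-to-name mapping is taken directly from the table.

```python
# Mapping copied from the state code table above.
STATE_NAMES = {
    1: "helix buried",
    2: "helix exposed",
    3: "strand buried",
    4: "strand exposed",
    5: "loop",
    6: "turn 1",
    7: "turn 2",
    8: "turn 3",
    9: "turn 4",
}


def decode_states(line):
    """Turn a line of space-separated decimal state codes into state names.

    Raises KeyError for a code outside 1..9, which the table does not define.
    """
    return [STATE_NAMES[int(token)] for token in line.split()]
```

For example, decode_states("1 4 5") returns ['helix buried', 'strand exposed', 'loop'].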

Known bugs:

  1. We should use generate-exposure to generate the exposure file if we can't find it. -- rgr, 20-Dec-96.
  2. The exposure threshold is wired at 33.6; there is no way to change it. -- rgr, 26-Aug-97.
  3. pdb-to-dsm.pl has hardwired knowledge of standard BMERC file locations. -- rgr, 26-Aug-97.


expand-dssp.pl

In general, PDB atom records may represent only a subset of the amino acids of the complete protein, because loops may be disordered enough not to be visible in the crystal structure. Furthermore, the residues output by the DSSP program may be an even smaller subset, since DSSP requires all backbone atoms to be present. If one is willing to consider all missing residues as being in "loop" states, then this limitation can be lifted using expand-dssp.pl, which takes DSSP data in the dssp4.pl default output format and a complete protein sequence. The resulting output has the same DSSP data in the same format, but with loop residues inserted where needed to make it look as if DSSP had been run on the complete sequence.

Note that expand-dssp.pl always operates on exactly one chain, even if the -chain argument is not specified.
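
The gap-filling idea can be illustrated with a small Python sketch. It rests on a simplifying assumption: each residue reported by DSSP already carries its 1-based position in the full sequence. How expand-dssp.pl actually aligns the DSSP residues against the full sequence is not described here, and the function name, the dict layout, and the "L" loop symbol are all hypothetical.

```python
def expand_with_loops(full_sequence, observed, loop_state="L"):
    """Give every residue of `full_sequence` a state, filling gaps with loops.

    full_sequence: string of one-letter amino acid codes
    observed:      dict 1-based position -> state, for residues DSSP reported
                   (assumed layout; the real input is dssp4.pl output)
    loop_state:    placeholder symbol for an inserted loop residue (assumed)
    Returns a list of (position, residue, state) covering the whole sequence.
    """
    for pos in observed:
        if not 1 <= pos <= len(full_sequence):
            raise ValueError("position %r is outside the full sequence" % (pos,))
    return [
        (pos, residue, observed.get(pos, loop_state))
        for pos, residue in enumerate(full_sequence, start=1)
    ]
```

With full_sequence "MKVLA" and observed {2: "H", 3: "H"}, positions 1, 4 and 5 come back with the loop placeholder while positions 2 and 3 keep their observed states.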

Usage:


    expand-dssp.pl [-chain L] [-verbose] full-sequence < dssp-in > dssp-out

Arguments:

-chain L
optional single-character chain identifier. If the -chain argument is not supplied, it defaults to the first chain encountered in the DSSP file.
full-sequence
the full protein sequence as a string of one-letter abbreviations on the command line.
dssp-in
input in the dssp4.pl default output format; can be from a named file or from the standard input.
dssp-out
output in the dssp4.pl default output format, sent to the standard output.
-verbose
if specified, prints debugging info to the standard error stream.

Known bugs:


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Wed Apr 4 11:29:40 EDT 2001