Core programs

The first two programs documented below produce core files from PDB atomic coordinates. The third extracts the secondary structure information implicit in the core structure.

Table of contents

  1. Core programs
    1. make-core.pl
    2. make-domain-core.pl
    3. core-ss-states


make-core.pl

make-core.pl takes the atom records of a PDB file and a segment definition file and produces on the standard output a BMERC-format core file.

Usage:


	make-core.pl [-seg seg-file] [-chain X] [-min-segs n]
		[-locus locus] [-pdb-file pdb-file-name]
   or
	make-core.pl [-seg seg-file] [-chain X] [-min-segs n]
		locus pdb-file-name
If the first form is used, the keyword arguments may be in any order, and the locus is optional. If the second form is used, the locus must be present, and must come after the keyword arguments and before the PDB file name. The PDB file name is required in either form, but may be "-" to denote the standard input, in which case the -seg option must be specified explicitly, since the segment definition file would otherwise also default to the standard input.

Arguments:

-seg seg-file-name (segment definition file)
name of the file that defines the segment start and end points, which defaults to the standard input. See make-dssp-segs for info on how to make these from "smoothed DSSP" structure assignments.
-chain chain-letter (a single character)
letter identifying the chain to select; defaults to the first chain in the segment definition file, not all chains.
-min-segs n (integer)
defines the minimum number of segments desired; default is 4. If fewer total segments than this are found, a warning message is generated. Regardless of the value of n, it is an error if no segments are found.
-locus locus (string)
identifying string used for warnings.
-pdb-file pdb-file-name
name of a PDB format file.

make-core.pl selects the appropriate ATOM records out of the named PDB file, and pipes them through filter-pdb-atoms.pl to clean up the format, hallucinate (fill in) missing beta carbons, and report any chain breaks or other potential problems. Because only the ATOM records for the core segments are passed along, any problems reported are in the core segments themselves, and not in the loops.

If either the starting or ending residue of a segment cannot be found, make-core.pl exits abnormally with an error message. This also happens if no segments are found for the desired chain, which can mean either that the chain does not exist, or that it exists (has residues) but has no secondary structure. A warning message is generated if the core has at least one segment, but fewer than the -min-segs value; the warning can be turned off by setting -min-segs to 0, but this does not disable the error for zero-segment cores.
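
The interaction between -min-segs and the zero-segment error described above can be sketched in a few lines. This is only an illustration of the documented rules, not the code in make-core.pl; the function name, return values, and message wording are all assumptions made for the example.

```python
def check_segment_count(n_segs, min_segs=4):
    """Classify a core by its segment count.

    Returns a (status, message) pair, where status is "error",
    "warning", or "ok".  The message text is hypothetical.
    """
    if n_segs == 0:
        # Zero segments is always an error, whatever -min-segs says.
        return ("error", "no segments found")
    if n_segs < min_segs:
        # At least one segment, but fewer than requested: warn only.
        return ("warning",
                f"only {n_segs} segment(s) found; wanted at least {min_segs}")
    return ("ok", "")


if __name__ == "__main__":
    for n, m in [(0, 0), (2, 4), (2, 0), (5, 4)]:
        print(n, m, check_segment_count(n, m))
```

Note that setting the minimum to 0 silences the warning for a two-segment core, but a zero-segment core is still reported as an error.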

make-core.pl makes a core from exactly one chain. To treat two or more chains as continuous within the same core model, simply concatenate the core files. For example:


   make-core.pl -chain 'A' -seg 1lts.dssp -pdb-file 1lts.ent > 1ltsAC.core
   make-core.pl -chain 'C' -seg 1lts.dssp -pdb-file 1lts.ent >> 1ltsAC.core
This makes 1ltsAC.core from chains A and C of the PDB file 1lts.ent based on segment definitions in the 1lts.dssp file.

Note: In Release 1.0 and earlier, the default for the -seg option was "locus.dssp"; this was changed to "-" (the standard input) in Release 1.1.

Note: In Release 1.0 and earlier, make-core.pl passed -hcb to filter-pdb-atoms.pl; it now fills in all missing beta carbons with -hcb-all.

Known bugs:

  1. If both the starting and ending residues of a segment have negative PDB sequence numbers, the segment is truncated to the first atom of the first residue. -- rgr, 23-Feb-97. [Fixed in Release 1.0. -- rgr, 4-Mar-97.]
  2. Segments are prematurely truncated by HETATM records (or any other kind of record, for that matter) that appear within the sequence of ATOM records. [Fixed in Release 1.0. -- rgr, 4-Mar-97.]
  3. *** If beta carbons are missing for residues other than glycine, make-core.pl silently fills them in by passing -hcb-all to filter-pdb-atoms.pl. -- rgr, 20-Mar-98.


make-domain-core.pl

make-domain-core.pl takes the atom records of a PDB file and a segment definition file and produces on the standard output a BMERC-format core file. This is essentially identical to make-core.pl except that make-domain-core.pl supports an extended chain specification syntax that makes it possible to construct cores from multiple chains and/or portions of chains.

Usage:

	make-domain-core.pl [-seg seg-file] [-chain spec] [-min-segs n]
		[-locus locus] [-pdb-file pdb-file-name]
   or
	make-domain-core.pl [-seg seg-file] [-chain spec] [-min-segs n]
		locus pdb-file-name
If the first form is used, the keyword arguments may be in any order, and the locus is optional. If the second form is used, the locus must be present, and must come after the keyword arguments and before the PDB file name. The PDB file name is required in either form, but may be "-" to denote the standard input (but see the multichain caveat below). In that case, the -seg option must be explicitly specified as something other than the standard input.

Arguments:

-seg seg-file-name (segment definition file)
name of the file that defines the segment start and end points, which defaults to the standard input. See make-dssp-segs for info on how to make these from "smoothed DSSP" structure assignments.
-chain chain-spec
-chains chain-spec
specifies which chain or chains or portions thereof to use; see the "Core chain specification syntax" section for chain-spec syntax details. (For backward compatibility, if only whole chains are desired, the commas between the chain letters can be omitted.) The default is "_", which selects the chain with a chain ID of space (" "). Note that this is different from make-core.pl, which selects the first chain by default. (-chain and -chains are synonyms; at most one may be specified, as they are not cumulative.)
-min-segs n (integer)
defines the minimum number of segments desired; default is 4. If fewer total segments than this are found, a warning message is generated. Regardless of the value of n, it is an error if no segments are found.
-locus locus (string)
identifying string used for warnings.
-pdb-file pdb-file-name
name of a PDB format file. Note that this is a required parameter for which there is no default.

make-domain-core.pl can make cores from more than one chain. For example:


   make-domain-core.pl -chains 'A,C' -seg 1lts.dssp -pdb-file 1lts.ent > 1ltsAC.core
This makes 1ltsAC.core from chains A and C of the PDB file 1lts.ent based on segment definitions in the 1lts.dssp file. This can be done in one pass over the PDB file since the ATOM records for these chains appear in this order. If one had specified "-chains 'C,A'" instead (the single quotes are not strictly necessary), make-domain-core.pl would have required two passes over the PDB file, in which case it would not have worked to supply the PDB file on the standard input by specifying "-pdb-file -".
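
To see why the order of chains in the specification matters, here is a rough sketch that counts how many sequential passes over the PDB file a given chain order would need. It is a model of the reasoning above, not of the actual implementation: the function name is made up, it works at whole-chain granularity (subranges are ignored), and the chain order used in the demonstration is hypothetical.

```python
def passes_needed(file_chain_order, requested_chains):
    """Count sequential passes needed to read chains in the requested order.

    file_chain_order: chain IDs in the order their ATOM records appear.
    requested_chains: chain IDs in the order given to -chains.
    A new pass is needed whenever the next requested chain appears
    earlier in the file than the one just read.
    """
    position = {chain: i for i, chain in enumerate(file_chain_order)}
    passes = 1
    last = -1
    for chain in requested_chains:
        if chain not in position:
            raise ValueError(f"chain {chain!r} not in file")
        if position[chain] < last:
            passes += 1  # must rewind, which is impossible on standard input
        last = position[chain]
    return passes


if __name__ == "__main__":
    order = ["A", "B", "C"]  # hypothetical order of chains in a PDB file
    print(passes_needed(order, ["A", "C"]))  # one pass is enough
    print(passes_needed(order, ["C", "A"]))  # needs a second pass
```

Anything that needs more than one pass cannot be fed from the standard input, since a pipe cannot be rewound.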

More precisely, make-domain-core.pl processes each chain or chain subrange in the chain specification one at a time (e.g. "A,B:1-85" specifies all of chain A in the first subrange, followed by residues 1 through 85 of chain B in the second subrange). If make-domain-core.pl cannot find the chain, or finds the chain but cannot find the indicated starting residue, it exits abnormally with a "can't find chain subrange 'subrange' in PDB file" message. If the chain or chain subrange exists but has no secondary structure, make-domain-core.pl generates a "No segs in 'subrange'?" warning, but continues to make the core. However, if no segments are found for the entire core, make-domain-core.pl exits abnormally with a "Found no segments for chain 'chain-spec'" message. A warning message will be generated if the core has at least one segment, but fewer than the -min-segs value; this can be turned off by setting -min-segs to 0, but this does not disable the error for zero-segment cores.
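
As an illustration of how a chain specification breaks down into subranges, the sketch below handles only the forms that appear on this page: whole chains ("A,C" or the older "AC"), a chain with a residue range ("B:1-85"), and "_" for the chain whose ID is a space. The full syntax is described in the "Core chain specification syntax" section and may well allow forms this sketch rejects (negative residue numbers, for instance); the function name and the tuple layout are assumptions made for the example.

```python
import re


def parse_chain_spec(spec):
    """Split a chain spec into (chain_id, start, end) tuples.

    start and end are None for a whole chain.  Only non-negative
    integer residue numbers are recognized here.
    """
    if "," not in spec and ":" not in spec:
        # Backward-compatible form: 'AC' means the same as 'A,C'.
        parts = list(spec)
    else:
        parts = spec.split(",")
    subranges = []
    for part in parts:
        match = re.fullmatch(r"(.)(?::(\d+)-(\d+))?", part)
        if match is None:
            raise ValueError(f"unrecognized subrange {part!r}")
        chain, start, end = match.groups()
        if chain == "_":
            chain = " "  # "_" stands for a chain ID of space
        subranges.append((chain,
                          int(start) if start is not None else None,
                          int(end) if end is not None else None))
    return subranges


if __name__ == "__main__":
    print(parse_chain_spec("A,B:1-85"))
    print(parse_chain_spec("AC"))
    print(parse_chain_spec("_"))
```

Each tuple then corresponds to one subrange that is looked up in the PDB file in turn.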

make-domain-core.pl includes only whole segments (as defined by the segment definition file) in the core it constructs. Therefore, in order for a residue to make it into the core file, it must lie within both the inclusive segment limits of a segment and the inclusive chain subrange boundaries. A warning is generated if the segment runs past the end of the subrange, where it is truncated. (Segments that overlap the start of a subrange aren't detected, which asymmetry could be considered a bug. However, subranges that interrupt core segments are probably pathological anyway, so I'm not inclined to fix it. -- rgr, 20-Sep-98.)
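
The inclusion rule above (a residue must fall inside both a segment and the subrange) amounts to intersecting two inclusive intervals. The following sketch shows that rule, including the asymmetry just mentioned: a warning is produced only when a segment runs past the end of the subrange. It treats residue numbers as plain integers, which is a simplification, and the function name and warning text are invented for the example.

```python
def clip_segments(segments, sub_start, sub_end):
    """Intersect inclusive (start, end) segments with an inclusive subrange.

    Returns (kept, warnings).  Segments entirely outside the subrange
    are dropped; segments running past the end are truncated with a
    warning; overlap at the start is clipped silently.
    """
    kept = []
    warnings = []
    for seg_start, seg_end in segments:
        start = max(seg_start, sub_start)
        end = min(seg_end, sub_end)
        if start > end:
            continue  # no residues in common with the subrange
        if seg_end > sub_end:
            warnings.append(
                f"segment {seg_start}-{seg_end} runs past end of subrange; "
                f"truncated at {sub_end}")
        kept.append((start, end))
    return kept, warnings


if __name__ == "__main__":
    segs = [(5, 12), (20, 30), (80, 90), (95, 99)]
    kept, warns = clip_segments(segs, 10, 85)
    print(kept)
    for w in warns:
        print("warning:", w)
```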

On the other hand, if the last residue of a segment can't be found at all, make-domain-core.pl produces a "Ran out of chain" message, which is fatal (though it continues to process any other chain subranges).

Once make-domain-core.pl selects the appropriate ATOM records out of the PDB file, it pipes them through filter-pdb-atoms.pl to clean up the format, hallucinate (fill in) missing beta carbons, and report any chain breaks or other potential problems. Because only the ATOM records for the core segments are passed along, any problems reported are in the core segments themselves, and not in the loops.

After making the core, make-domain-core.pl generates a warning when any sheets are split. A sheet is considered split if some of its segments lie within the core but others do not.
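
The split-sheet test is simple to state in code: a sheet is split when it has at least one segment inside the core and at least one outside. The sketch below assumes a mapping from segment names to sheet names, which is purely a convenience for the example; this page does not say how make-domain-core.pl represents sheet membership internally.

```python
def split_sheets(sheet_of_segment, core_segments):
    """Return the sorted names of sheets that are split by the core.

    sheet_of_segment: dict mapping segment name -> sheet name
                      (hypothetical representation).
    core_segments:    set of segment names that made it into the core.
    """
    sheets_in_core = set()
    sheets_outside = set()
    for segment, sheet in sheet_of_segment.items():
        if segment in core_segments:
            sheets_in_core.add(sheet)
        else:
            sheets_outside.add(sheet)
    # A split sheet has segments on both sides of the boundary.
    return sorted(sheets_in_core & sheets_outside)


if __name__ == "__main__":
    membership = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
    print(split_sheets(membership, {"s1", "s3"}))
```

Here sheet A is reported because s1 is in the core and s2 is not; sheet B is wholly inside and sheet C wholly outside, so neither is split.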

Known bugs:

  1. *** If beta carbons are missing for residues other than glycine, make-domain-core.pl silently fills them in by passing -hcb-all to filter-pdb-atoms.pl. -- rgr, 20-Mar-98.
  2. *** Should also generate warnings if barrels (not just sheets) are split. -- rgr, 18-Sep-98.
  3. For the -chains option, only old-style (sequential) SCOP residue indices are supported. SCOP stopped using these sometime between July 1998 and March 1999. [Fixed in Release 1.6. -- rgr, 10-Jan-00.]


core-ss-states

Given a core file and its matching sequence file, the core-ss-states program prints a string of secondary structure letters: 'E' for strands, 'H' for helices, and 'L' for loops. One letter is produced on the standard output for each residue in the sequence (the sequence file is how it knows how long the loops are).
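
The idea can be sketched as follows: start with a loop letter for every residue in the sequence, then overwrite the positions covered by each core segment. This is an illustration only. It assumes the segments are already available as 1-based inclusive sequence positions with a type letter, which glosses over how the real program reads the core and sequence files; the function name and input layout are made up.

```python
def ss_states(sequence_length, segments):
    """Build a secondary structure string with one letter per residue.

    segments: list of (start, end, kind) with 1-based inclusive
              positions and kind 'E' (strand) or 'H' (helix).
    Residues not covered by any segment are loops ('L').
    """
    states = ["L"] * sequence_length
    for start, end, kind in segments:
        for position in range(start, end + 1):
            states[position - 1] = kind
    return "".join(states)


if __name__ == "__main__":
    # A 10-residue sequence with one strand and one helix.
    print(ss_states(10, [(2, 4, "E"), (7, 9, "H")]))  # LEEELLHHHL
```

The sequence length is what fixes the number of 'L' letters before, between, and after the segments, which is why the sequence file is required.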

Usage:

	core-ss-states -core-file core-file-name
		-sequence-file sequence-file-name

Arguments:

-core-file core-file-name
name of a "core" format file, e.g. "2mhr.core", required for input.
-sequence-file sequence-file-name
name of an IG format sequence file, e.g. "2mhr.seq", required for input.
Note that there are no defaults for either argument, and one cannot specify the standard input by giving "-" as one of the file names.

[Specifying both arguments keyword-style may be somewhat onerous, but it allows me a choice of alternatives for supplying core/sequence data in the future. -- rgr, 28-Mar-97.] [Like I've needed it. -- rgr, 29-Oct-99.]


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Mon Jan 10 17:19:20 EST 2000