Dependency file generators

This page describes the needle tools dependency file generator scripts in detail. General documentation is provided on the "Dependency files for make" page.

Table of contents

  1. Dependency file generators
    1. Table of contents
    2. make-core-depends.pl
      1. make-core-depends.pl file types
      2. Macros used by make-core-depends.pl
    3. make-mrf-depends.pl
      1. make-mrf-depends.pl file types
    4. Known bugs in the dependency file generators


make-core-depends.pl

By default, make-core-depends.pl writes to the standard output targets that build a set of core and sequence files. It can also generate targets for exposure files. These three sets of files (core, sequence, and exposure) are orthogonal: each can be generated from PDB files independently of the others (see the core file generation flowchart).

If core files are requested, make-core-depends.pl also generates targets for the smoothed DSSP secondary structure definitions they need. Where necessary, these are built from abbreviated DSSP files, which in turn are built from the PDB file where necessary.
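
The chain described above can be sketched as a small dependency graph. This is only an illustration in Python, not part of the needle tools: the graph is transcribed from the descriptions on this page (the intermediate "ss" format is left out), and the function name build_order is made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# File type -> the file types it is built from, as described on this page.
# The intermediate "ss" format between abbrev-dssp and seg is omitted.
DEPENDS_ON = {
    "abbrev-dssp": {"pdb"},
    "seg": {"abbrev-dssp"},
    "core": {"pdb", "seg"},
    "seq": {"pdb"},
    "exposure": {"pdb"},
}


def build_order(deps=DEPENDS_ON):
    """Return the file types in an order in which they could be built."""
    return list(TopologicalSorter(deps).static_order())


order = build_order()
print(order)
```

Whatever order is printed, pdb comes first and core comes after seg; seq and exposure depend only on pdb, which is why they can be generated independently of the core files.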

Additional arguments:

-make-exposure-files
    Generate exposure file targets for Eisenberg "fat alanine" exposure, which are not made by default.
-no-core-files
    Suppress generation of core and segment file targets, which are otherwise made by default.
-no-seq-files
    Suppress generation of sequence file targets, which are otherwise made by default.

make-core-depends.pl file types

The following table contains an alphabetical listing of all file types supported by make-core-depends.pl, together with their default prefix, suffix, and macro name values.

[need to include file naming convention used. -- rgr, 8-Jul-98.]

Each entry gives the file type, its description (with file format), the file name prefix, suffix, macro, and make state arguments with their defaults, and the invocation macro name with its default(s).

abbrev-dssp
    Description: Abbreviated DSSP file; file naming is based on the PDB entry.
    File name prefix: -abbrev-dssp-file-prefix ''
    File name suffix: -abbrev-dssp-file-suffix '.ent.out'
    Macro: -abbrev-dssp-macro abbrev-dssp-files
    Make state: -path-abbrev-dssp-files
    Invocation macro: GENERATE-DSSP = generate-dssp

core
    Description: Core file.
    File name prefix: -core-file-prefix ''
    File name suffix: -core-file-suffix '.core'
    Macro: -core-macro core-files
    Make state: -make-core-files
    Invocation macro: MAKE-CORE = make-core.pl
                      MAKE-CORE = make-domain-core.pl

exposure
    Description: Eisenberg "fat ALA" exposure (.nexp) file.
    File name prefix: -exposure-file-prefix ''
    File name suffix: -exposure-file-suffix '.nexp'
    Macro: -exposure-macro exposure-files
    Make state: -local-exposure-files
    Invocation macro: GENERATE-EXPOSURE = efa.pl
                      GENERATE-EXPOSURE = generate-exposure

gmt-env
    Description: GMT environment file (singleton environment format).
    File name prefix: -gmt-env-file-prefix 'singleton_environments_MRF_'
    File name suffix: -gmt-env-file-suffix '.dat'
    Macro: -gmt-env-macro gmt-environments
    Make state: -local-gmt-env-files
    Invocation macro: MRF-GMT-ENVS = mrf-envs

pdb
    Description: PDB file.
    File name prefix: -pdb-file-prefix ''
    File name suffix: -pdb-file-suffix '.ent'
    Macro: -pdb-macro pdb-files
    Make state: -path-pdb-files
    Invocation macro: (none)

seg
    Description: Segment definition file.
    File name prefix: -seg-file-prefix ''
    File name suffix: -seg-file-suffix '.dssp'
    Macro: -seg-macro segment-files
    Make state: -local-seg-files
    Invocation macro: MAKE-SS-DESIGNATIONS = make-ss-designations

seq
    Description: Sequence (IG) file.
    File name prefix: -seq-file-prefix ''
    File name suffix: -seq-file-suffix '.seq'
    Macro: -seq-macro sequence-files
    Make state: -make-seq-files
    Invocation macro: MAKE-SEQ-FILE = make-seq-file.pl
                      MAKE-SEQ-FILE = pdb-domain-seq.pl
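
Since the naming convention is not yet written up here, the following Python sketch shows one plausible reading of the prefix and suffix defaults listed above. It ASSUMES that a file name is simply prefix + name + suffix; that rule, the function file_name, and the entry name "1abc" are illustrative guesses, not taken from the scripts themselves.

```python
# Default (prefix, suffix) pairs copied from the make-core-depends.pl table.
DEFAULTS = {
    "abbrev-dssp": ("", ".ent.out"),
    "core": ("", ".core"),
    "exposure": ("", ".nexp"),
    "gmt-env": ("singleton_environments_MRF_", ".dat"),
    "pdb": ("", ".ent"),
    "seg": ("", ".dssp"),
    "seq": ("", ".seq"),
}


def file_name(file_type, name, prefix=None, suffix=None):
    """Build a file name, ASSUMING the convention prefix + name + suffix.

    prefix and suffix stand in for the -<type>-file-prefix and
    -<type>-file-suffix arguments; None means "use the default".
    """
    default_prefix, default_suffix = DEFAULTS[file_type]
    if prefix is None:
        prefix = default_prefix
    if suffix is None:
        suffix = default_suffix
    return f"{prefix}{name}{suffix}"


# "1abc" is a hypothetical entry name used only for illustration.
for file_type in ("pdb", "abbrev-dssp", "seg", "core"):
    print(file_type, file_name(file_type, "1abc"))
```

Under this assumption, overriding a suffix (for example suffix=".exp" for the exposure type) changes only the generated name and leaves the other file types untouched.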

Macros used by make-core-depends.pl

[probably need to generalize this, or at least refer to the general discussion. -- rgr, 28-Apr-98.]

The following make macros must be defined by the including makefile in order for the output of make-core-depends.pl to run successfully. For the most part, these macros are named after the programs they are typically used to invoke. Somewhere in the makefile, these should be defined as follows:

    GENERATE-DSSP = generate-dssp
    DSSP4 = dssp4.pl -dont-fill-e-gaps
    MAKE-SS-DESIGNATIONS = make-ss-designations
    GENERATE-EXPOSURE = generate-exposure
    MAKE-CORE = make-core.pl
    MAKE-SEQ-FILE = pdb-to-seq.pl

Notice how the DSSP4 macro supplies an optional argument in addition to the program name. Other programs could be used, but they must accept similar arguments; consult the output of make-core-depends.pl to see exactly which arguments are passed in any given case.

Each entry gives the macro name, its usage, and a compatible program.

Unless core files were suppressed via -no-core-files:

GENERATE-DSSP
    Usage: Invokes a program that makes abbreviated DSSP file format from a PDB file.
    Compatible program: generate-dssp
DSSP4
    Usage: Invokes a program that produces filtered "ss" format from abbreviated DSSP file format.
    Compatible program: dssp4.pl
MAKE-SS-DESIGNATIONS
    Usage: Invokes a program that produces segment file format from the "ss" format.
    Compatible program: make-ss-designations
MAKE-CORE
    Usage: Invokes a program that makes a core file from a PDB file and its corresponding segment file.
    Compatible program: make-core.pl

Unless sequence files were suppressed via -no-seq-files:

MAKE-SEQ-FILE
    Usage: Invokes a program that makes a sequence file from a PDB file.
    Compatible program: pdb-to-seq.pl

If exposure files were requested via -make-exposure-files:

GENERATE-EXPOSURE
    Usage: Invokes a program that makes an Eisenberg "fat alanine" exposure file from a PDB file.
    Compatible program: generate-exposure
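
The conditions above can be turned into a quick sanity check on the including makefile. The Python sketch below is not part of the needle tools; the function names are invented for the example, and the makefile scan is deliberately naive (it only recognizes simple "NAME = value" style definitions).

```python
def required_macros(no_core_files=False, no_seq_files=False,
                    make_exposure_files=False):
    """List the macros the including makefile must define.

    The keyword arguments mirror the -no-core-files, -no-seq-files, and
    -make-exposure-files options of make-core-depends.pl.
    """
    macros = []
    if not no_core_files:
        macros += ["GENERATE-DSSP", "DSSP4", "MAKE-SS-DESIGNATIONS", "MAKE-CORE"]
    if not no_seq_files:
        macros.append("MAKE-SEQ-FILE")
    if make_exposure_files:
        macros.append("GENERATE-EXPOSURE")
    return macros


def missing_macros(makefile_text, required):
    """Return the required macros that have no definition in makefile_text."""
    defined = set()
    for line in makefile_text.splitlines():
        name, sep, _value = line.partition("=")
        if sep:
            # Tolerate ":=", "+=", and "?=" by trimming the operator prefix.
            defined.add(name.rstrip(":+? \t").strip())
    return [macro for macro in required if macro not in defined]


sample = "GENERATE-DSSP = generate-dssp\nDSSP4 = dssp4.pl -dont-fill-e-gaps\n"
print(missing_macros(sample, required_macros(no_seq_files=True)))
```

For the sample text, which defines only GENERATE-DSSP and DSSP4, the check reports MAKE-SS-DESIGNATIONS and MAKE-CORE as missing.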


make-mrf-depends.pl

[the cross-validation discussion in the "Types of cross-validation" section belongs in mrf.html instead. -- rgr, 24-Apr-98.]

The make-mrf-depends.pl script takes a list of core names and produces on the standard output the appropriate rules that tell the Unix make utility how to create the MRF score and environment files.

make-mrf-depends.pl has a great many options, so every aspect of score and environment file generation can be customized, but the defaults work well for the simplest case of producing scores for a new core library. See elsewhere for a simple example of usage. The important arguments are -core-list-file, which is required and identifies the model set, and -search-path, which allows make-mrf-depends.pl to find files in other directories. For more complicated cases, one can (a) modify one of the examples [give link]; (b) pipe the output of make-mrf-depends.pl through a script (e.g. sed or awk) that makes the required changes; or (c) hand-edit the results of running with default parameters, though this makes it more difficult to change the model set later.
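
As a rough illustration of option (b), the Python sketch below does the kind of substitution one might otherwise write in sed. The sample rule text is invented for the example and is NOT actual make-mrf-depends.pl output; the replacement macro MY-PAIR-SCORES and the function names are likewise hypothetical.

```python
def rewrite_rules(text, old, new):
    """Replace every occurrence of old with new, preserving line endings."""
    return "".join(line.replace(old, new)
                   for line in text.splitlines(keepends=True))


def filter_stream(in_stream, out_stream, old, new):
    """Filter form: read rules from in_stream, write rewritten rules."""
    out_stream.write(rewrite_rules(in_stream.read(), old, new))


# Invented sample input, for illustration only.
sample_rules = (
    "example-target: example-prerequisite\n"
    "\t$(MRF-PAIR-SCORES) example-prerequisite > example-target\n"
)
rewritten = rewrite_rules(sample_rules, "$(MRF-PAIR-SCORES)", "$(MY-PAIR-SCORES)")
print(rewritten, end="")
```

Calling filter_stream with sys.stdin and sys.stdout would let such a script sit at the end of a pipe. Because the edit is applied mechanically, it can be rerun whenever the model set changes, which is the main advantage over hand-editing.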

Arguments:

See the "Dependency file generator arguments" and "MRF-specific dependency generator arguments" sections for details of common arguments.

Note that GMT environment files are independent of the singleton and pairwise environment files. GMT environments depend on nothing except the length of the core, and are used only for threading, not for counting or score generation, so they are completely orthogonal to the rest of the MRF process. (Indeed, the fact that they are still generated by the mrf-envs program is something of a historical accident.)

make-mrf-depends.pl file types

The following table contains an alphabetical listing of all file types supported by make-mrf-depends.pl, together with their default prefix, suffix, and macro name values.

[need to include file naming convention used. -- rgr, 8-Jul-98.]

Each entry gives the file type, its description (with file format), the file name prefix, suffix, macro, and make state arguments with their defaults, and the invocation macro name with its default(s).

abbrev-dssp
    Description: Abbreviated DSSP file; file naming is based on the PDB entry.
    File name prefix: -abbrev-dssp-file-prefix ''
    File name suffix: -abbrev-dssp-file-suffix '.ent.out'
    Macro: -abbrev-dssp-macro abbrev-dssp-files
    Make state: -path-abbrev-dssp-files
    Invocation macro: GENERATE-DSSP = generate-dssp

core
    Description: Core file.
    File name prefix: -core-file-prefix ''
    File name suffix: -core-file-suffix '.core'
    Macro: -core-macro core-files
    Make state: -path-core-files
    Invocation macro: MAKE-CORE = make-core.pl
                      MAKE-CORE = make-domain-core.pl

counts
    Description: Counts file.
    File name prefix: -counts-file-prefix ''
    File name suffix: -counts-file-suffix '-mrf.cnt'
    Macro: -counts-macro core-counts
    Make state: -local-counts-files
    Invocation macro: MRF-COUNTS = mrf-counts

exposure
    Description: Eisenberg "fat ALA" exposure (.nexp) file.
    File name prefix: -exposure-file-prefix ''
    File name suffix: -exposure-file-suffix '.nexp'
    Macro: -exposure-macro exposure-files
    Make state: -path-exposure-files
    Invocation macro: GENERATE-EXPOSURE = efa.pl
                      GENERATE-EXPOSURE = generate-exposure

gmt-env
    Description: GMT environment file (singleton environment format).
    File name prefix: -gmt-env-file-prefix 'singleton_environments_MRF_'
    File name suffix: -gmt-env-file-suffix '.dat'
    Macro: -gmt-env-macro gmt-environments
    Make state: -make-gmt-env-files
    Invocation macro: MRF-GMT-ENVS = mrf-envs

gmt-score
    Description: GMT score file (singleton score format).
    File name prefix: -gmt-score-file-prefix 'singleton_scores_x_MRF_'
    File name suffix: -gmt-score-file-suffix '.dat'
    Macro: -gmt-score-macro gmt-scores
    Make state: -make-gmt-score-files
    Invocation macro: MRF-GMT-SCORES = mrf-scores \
                          -gmt-marginal-file mrf.msd \
                          -min-pair-count 4

loop-score
    Description: Loop score file.
    File name prefix: -loop-score-file-prefix 'loop_scores_x_MRF_'
    File name suffix: -loop-score-file-suffix '.dat'
    Macro: -loop-score-macro loop-scores
    Make state: -make-loop-score-files
    Invocation macro: MRF-LOOP-SCORES = mrf-scores -poisson -normalize

pairwise-env
    Description: Pairwise environment file.
    File name prefix: -pairwise-env-file-prefix 'pairwise_environments_reference_'
    File name suffix: -pairwise-env-file-suffix '.dat'
    Macro: -pairwise-env-macro pairwise-environments
    Make state: -local-pairwise-env-files
    Invocation macro: MRF-PAIR-ENVS = mrf-envs

pairwise-score
    Description: Pairwise score file.
    File name prefix: -pairwise-score-file-prefix 'pairwise_scores_x_MRF_'
    File name suffix: -pairwise-score-file-suffix '.dat'
    Macro: -pairwise-score-macro pairwise-scores
    Make state: -make-pairwise-score-files
    Invocation macro: MRF-PAIR-SCORES = mrf-scores -pair-poisson 1

pdb
    Description: PDB file.
    File name prefix: -pdb-file-prefix ''
    File name suffix: -pdb-file-suffix '.ent'
    Macro: -pdb-macro pdb-files
    Make state: -path-pdb-files
    Invocation macro: (none)

seg
    Description: Segment definition file.
    File name prefix: -seg-file-prefix ''
    File name suffix: -seg-file-suffix '.dssp'
    Macro: -seg-macro segment-files
    Make state: -path-seg-files
    Invocation macro: MAKE-SS-DESIGNATIONS = make-ss-designations

seq
    Description: Sequence (IG) file.
    File name prefix: -seq-file-prefix ''
    File name suffix: -seq-file-suffix '.seq'
    Macro: -seq-macro sequence-files
    Make state: -path-seq-files
    Invocation macro: MAKE-SEQ-FILE = make-seq-file.pl
                      MAKE-SEQ-FILE = pdb-domain-seq.pl

singleton-env
    Description: Singleton environment file.
    File name prefix: -singleton-env-file-prefix 'mrf-se-10efa-2ss-'
    File name suffix: -singleton-env-file-suffix '.dat'
    Macro: -singleton-env-macro singleton-environments
    Make state: -local-singleton-env-files
    Invocation macro: MRF-SING-ENVS = mrf-envs
                      MRF-SING-ENVS = sing-envs.pl

singleton-score
    Description: Singleton score file.
    File name prefix: -singleton-score-file-prefix 'mrf_ss_x_'
    File name suffix: -singleton-score-file-suffix '.dat'
    Macro: -singleton-score-macro singleton-scores
    Make state: -make-singleton-score-files
    Invocation macro: MRF-SING-SCORES = mrf-scores -poisson -normalize


Known bugs in the dependency file generators

  1. *** There is no (user-accessible) way to define how to make secondary structure definitions other than smoothed DSSP, though the dependency generators will use files created with the -seg-file-suffix if they already exist on the search path. -- rgr, 20-Feb-97.
  2. *** If you specify an -exposure-file-suffix other than ".nexp", any locus.nexp files that happen to be in that directory get overwritten anyway. This is a limitation of the generate-exposure program. -- rgr, 27-Mar-97. [which is why this should be rewritten to use efa.pl instead. -- rgr, 6-Nov-98.]


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Tue Apr 4 22:39:28 EDT 2000