Dependency files for make


[ make-vvenv-depends.pl is being documented here for completeness, even though it may not be distributed as part of needle tools for some time. -- rgr, 24-Apr-98.]

Table of contents

  1. Dependency files for make
    1. Table of contents
    2. About make and dependency files
      1. makefile syntax
      2. Macros used to invoke commands
      3. How make works
    3. Dependency file generator arguments
      1. Standard arguments to dependency file generators
      2. File-type-specific arguments
      3. Defined file types
      4. The -search-path feature
      5. Only one generator per directory
      6. The -use-plus feature
    4. Which files are constructed
      1. Exceptional cases
    5. Targets and macros
    6. MRF-specific dependency generator arguments
      1. Cross-validation
      2. Sharing cross-validation set score files

About make and dependency files

In order to build a core set and a scoring function for use by needle, a great many files must be generated, either directly for use by needle, or for use by other programs as intermediates. Using the Unix make utility greatly speeds the process of constructing all of these files -- once a makefile has been created, that is. In order to ease the burden of managing such relatively large makefiles, needle tools includes three Perl scripts -- make-core-depends.pl, make-mrf-depends.pl, and make-vvenv-depends.pl -- that automate most of the task of writing the rules for constructing specific files. The user need only supply the following information in the makefile:
  1. Which programs and standard options to use for constructing each type of file.
  2. Catch-all "targets" that may specify, among other things, which types of files to make by default, which to "clean" (i.e. remove) when rebuilding everything, how to build any common files (e.g. cross-validation sets) that are used in standard options, and how to build exceptions to the standard rules.
  3. How to generate the necessary dependencies, in the form of an invocation of one of the dependency generators listed above.
The information supplied to the dependency generator consists of the following:
  1. A set of loci for which to build files.
  2. The types of files to build, e.g. core, sequence, etc.
  3. A set of directories in which to find input files (i.e. those that are not to be built).
  4. Any modifications to the file naming conventions.
  5. Any exceptions to the standard rules for constructing files.

makefile syntax

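For most purposes, a makefile consists of five kinds of lines: comment lines, macro definition lines, dependency lines, command lines, and include lines. Each is described below.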

Macro, dependency, and command lines may be continued by putting a backslash ("\") character immediately before the end-of-line. Macro definitions continued this way are turned into single lines by substituting the backslash, newline, and any leading whitespace on the next line with a single space character. Command lines continued this way are executed (by a subshell) as a unit, but the backslashes and newlines are kept intact.
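The joining rule for continued macro definitions can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not code taken from make or from needle tools; the function name is made up for the example.

```python
import re

def join_macro_continuations(text):
    """Collapse continued macro definition lines into single lines.

    Each backslash, the newline after it, and any leading whitespace on the
    next line are replaced by one space, as described in the text.
    """
    return re.sub(r"\\\n[ \t]*", " ", text)

definition = "search-path = ../foo/ \\\n        /usr/bar/ \\\n        /usr/baz/"
joined = join_macro_continuations(definition)
print(joined)
```

Command lines are deliberately not handled here, since their backslashes and newlines are passed to the subshell intact.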

Comment lines start with "#" in the first column, and run to the end of the line. They may not be continued with backslashes.

Macro definition lines are of the form

    name = value
Whitespace before the name and around the "=" is not significant. Environment variables are implicitly defined as macros. make also predefines a few macros, such as "CC", which invokes the C compiler; most of these are standard across different versions, but some are not. If a given name is defined in more than one of these ways, then explicitly defined macros take precedence over environment variable values, which in turn take precedence over "built-in" macros.

For example, a macro defined using

    search-path = ../foo/ /usr/bar/ /usr/baz/
may be invoked with "${search-path}" or "$(search-path)", and so

    random-target:  ${search-path}
            make-core-depends.pl -search-path ${search-path} . . .
is understood as

    random-target:  ../foo/ /usr/bar/ /usr/baz/
            make-core-depends.pl -search-path ../foo/ /usr/bar/ /usr/baz/ . . .
Macros that are not defined expand to the null string when invoked, without causing an error.
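As a rough model of the expansion and precedence rules just described, consider the following Python sketch. It is an approximation under stated assumptions: the built-in table holds a single stand-in value, expansion is a single pass (a real make also expands macros nested inside macro values), and the function names are invented for the example.

```python
import re

# Stand-in for make's built-in macros; the value is an assumption.
BUILTIN_MACROS = {"CC": "cc"}

def effective_macros(explicit, environment, builtin=BUILTIN_MACROS):
    """Merge macro sources so that explicit > environment > built-in."""
    macros = dict(builtin)
    macros.update(environment)   # environment overrides built-ins
    macros.update(explicit)      # makefile definitions override both
    return macros

# Matches both ${name} and $(name).
_REFERENCE = re.compile(r"\$\{([^}]+)\}|\$\(([^)]+)\)")

def expand(text, macros):
    """Replace macro references; undefined macros become the null string."""
    def replace(match):
        name = match.group(1) or match.group(2)
        return macros.get(name, "")
    return _REFERENCE.sub(replace, text)

macros = effective_macros(
    explicit={"search-path": "../foo/ /usr/bar/ /usr/baz/"},
    environment={"CC": "gcc"},
)
print(expand("random-target:  ${search-path}", macros))
print(expand("$(CC) ${no-such-macro}-x", macros))
```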

The convention used by the needle tools code is that macros used for lists of file names are in lower case, and macros used for command invocations are in upper case.

Dependency lines define which files (the dependencies) are used to construct a given target, and are of the form:

    target: dependency1 dependency2 . . .
In the example above, make would decide that the random-target file was out of date and needed to be reconstructed if it was missing, or if one or more of its dependencies -- ../foo/, /usr/bar/, and /usr/baz/ -- was newer (i.e. had a more recent file write date). In that case, make would execute the associated command lines.

Command lines start with a tab character and must appear directly after a dependency line. If continued onto multiple lines via backslashes, each continuation line must also start with a tab. (Command lines are not strictly required, since make has implicit rules for constructing object files from source files for a variety of compiler languages, but none of these rules apply to needle files.) Multiple command lines may be specified for a single target; the target is considered to have been successfully built only if all commands return zero status when executed.

Note that there need not be any commands at all. Furthermore, the target need not correspond to a file (which means that the commands are not obliged to create one). This is useful for creating shorthand targets, such as the "all" target in the example in the next section.

Here is an example taken from the "Making the dependency file" section of the "MRF Generation Examples" page:


    1bmtA-mrf.cnt: mrf-se-10efa-2ss-1bmtA.dat \
		pairwise_environments_reference_1bmtA.dat ../cores/1bmtA.core
	    ${MRF-COUNTS} -pair-env-file pairwise_environments_reference_1bmtA.dat \
		    -sing-env-file mrf-se-10efa-2ss-1bmtA.dat -core-file \
		    ../cores/1bmtA.core -sequence-file ../cores/1bmtA.seq \
		    -write-counts 1bmtA-mrf.cnt
The dependency line (split into two physical lines for readability) states that 1bmtA-mrf.cnt is to be made from mrf-se-10efa-2ss-1bmtA.dat, pairwise_environments_reference_1bmtA.dat, and ../cores/1bmtA.core. The command line immediately following (also continued across several physical lines) dictates how; if make determines that 1bmtA-mrf.cnt needs to be rebuilt, then it will execute that command after replacing "${MRF-COUNTS}" with the current value of the MRF-COUNTS macro.

Include lines are of the form

    include filename
The first eight characters on the line must be "include " (note the space), and the file name must occupy the rest of the line. The named file is read as if it had been inserted in the makefile in place of the include line.

Macros used to invoke commands

One broadly used makefile convention is to employ macros to supply command names and standard options. For example, the following standard definition causes the targets generated by make-mrf-depends.pl to use the sing-envs.pl script for computing singleton environments:
    MRF-SING-ENVS = sing-envs.pl
This is because make-mrf-depends.pl generates singleton environment targets that look something like this:

    mrf-se-f10vv-2ss-1aba.dat: ../cores/1aba.core ../vv-data/1aba_bb.singl \
		../cores/1aba.seq
	    ${MRF-SING-ENVS} -sing-env-file mrf-se-f10vv-2ss-1aba.dat -core-file \
		    ../cores/1aba.core -sequence-file ../cores/1aba.seq \
		    -exposure-file ../vv-data/1aba_bb.singl
Instead, one can use any program that accepts the same pattern of arguments and computes the same result (more or less) using the same file formats, such as mrf-envs:
    MRF-SING-ENVS = mrf-envs
In order for this target to work, something must be assigned to this macro so that expanding "${MRF-SING-ENVS}" results in a complete command.

Additionally, the program can be supplied with extra options that influence its behavior.

    MRF-SING-ENVS = sing-envs.pl ${sing-env-opts}
Such macro-supplied options always appear before the file name options produced by the dependency generator; it is not possible to reorder or alter these (unless, of course, you write a script that parses all the options, massages them, and passes them on to the appropriate program).

How make works

make operates by reading in the entire makefile and all included files (including a system-defined file of standard definitions), and then constructing the target or targets named on the command line, or the first target in the file if none were given. In order to build a target, make first ensures that its dependencies (if any) are up to date by building them recursively. If the target file itself is not present, or any dependency file needed rebuilding, or the target file is older than any of the dependency files, then make considers the target out of date and executes the command lines associated with the target.
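The out-of-date test can be modelled with a short recursive function. The Python sketch below is a simplification for illustration only (a real make does a good deal more, such as implicit rules and error handling); the function name, the rule table layout, and the placeholder command string are all assumptions made for the example, and it runs against a fake file system so that it has no side effects.

```python
import os

def bring_up_to_date(target, rules, run, exists=os.path.exists,
                     mtime=os.path.getmtime):
    """Return True if the commands for `target` were executed.

    `rules` maps a target name to a (dependencies, commands) pair.
    """
    dependencies, commands = rules.get(target, ((), ()))
    # First bring every dependency up to date, recursively.
    rebuilt = [bring_up_to_date(d, rules, run, exists, mtime)
               for d in dependencies]
    missing = not exists(target)
    older = not missing and any(
        exists(d) and mtime(d) > mtime(target) for d in dependencies)
    if target in rules and (missing or any(rebuilt) or older):
        for command in commands:
            run(target, command)
        return True
    return False

# Demonstration on a fake file system (file name -> write time).
files = {"1aba.core": 5, "1aba.seq": 3}
log = []

def run(target, command):
    log.append(command)
    files[target] = max(files.values()) + 1  # pretend the command wrote the file

rules = {
    "all": (["1aba-mrf.cnt"], []),  # shorthand target with no commands
    "1aba-mrf.cnt": (["1aba.core", "1aba.seq"], ["build 1aba-mrf.cnt"]),
}
bring_up_to_date("all", rules, run, files.__contains__, files.__getitem__)
print(log)
```

Running it a second time executes nothing, because the counts file is now newer than both of its dependencies; touching 1aba.core in the fake file system makes it stale again.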

Since the first target is built by default, it is traditionally called "all", and is typically a shorthand target for building all of the interesting targets of which the makefile is capable. For example, the following "all" target is suitable for a core makefile:

    all: core-files segment-files sequence-files
Invoking "make" (or explicitly "make all") will simply cause each of the dependency targets -- core-files, segment-files, and sequence-files -- to be built. These are shorthand targets for the three classes of files for which make-core-depends.pl has presumably been told to generate dependencies; each of them in turn depends on the actual files, which are then created. There are no additional commands to execute for this target, and make does not complain about the lack of a file named "all" in the current directory.

Other traditional targets are "clean", which deletes the files that are made, presumably in order to remake them, and "install", which, for a software package such as needle tools, causes the newly built programs to be moved to the correct place in the system so that they may be used by users other than the person who constructed the system. The clean target is usually implemented with "rm -f ${files}", where "${files}" is a macro (or series of macros) that names all of the files to be deleted. The install target is not necessary for needle tools files, since they are used "in place."

Dependency file generator arguments

The dependency file generator programs (make-core-depends.pl, make-mrf-depends.pl, and make-vvenv-depends.pl) support a large set of options, so that most aspects of dependency file generation can be customized. These arguments break down into three categories:

  1. general arguments that are accepted by all dependency file generators;
  2. file-type-specific arguments, which have the same form for each file type; and
  3. program-specific arguments, which are mostly for backward compatibility with older implementations.
In any case, most options can be safely ignored on the first pass, since using the defaults works well for most cases. The important arguments are -core-list-file, which is required to identify the model set; and -search-path, which allows the dependency file generators to find files in other directories.

Standard arguments to dependency file generators

-core-list-file core-def-file
supplies the name of a file in core definition file format. The default is to read the core list from the standard input.
-search-path path-prefix . . .
list of prefixes (not directory names) with which to search for files. See the -search-path discussion below for an example. (These are prefixes for historical reasons, in order to accommodate naming conventions formerly in effect at BMERC.)
-but-not file . . .
-input-files file . . .
specifies the names of intermediate files for which the dependency generator must not generate targets, even when they are required for something else. This allows them to be hand-tweaked to fix problems. -but-not just suppresses the target; -input-files also keeps the files off the macro containing all files of that type. See the "Exceptional cases" section for details.
-use-plus
formerly, when specified, -use-plus directed the dependency file generator to use the "+" feature of some make programs. This is no longer supported. See below for details.
-use-links
[finish; see below. -- rgr, 11-Oct-98.]
-line-length len (integer)
specifies the line length to use for output; default is 80 characters. Lines are wrapped with a backslash before this point for readability.
-dump-rules
requests rule debugging. If this option is specified, the makefile generator produces a table of rules on the standard output after all arguments are processed, and exits without generating the normal makefile syntax output. The rule table is a tab-delimited listing of internal rule index, rule name, alias index for this rule, prefix, suffix, macro name, finder ('default' means they are found normally), and make state. After all of the rules are shown, aliases are listed with a null string for the internal index, the alias name, the index of the aliased rule, and the rule's "true" name. All of these values will reflect all options specified on the command line, which makes -dump-rules useful (if not exactly convenient) for figuring out why the makefile generator isn't doing what you expected with them.
-verbose
specifies verbose debugging output. This argument may be repeated for added effect (but the output volume tends to grow exponentially).
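To make the effect of -line-length concrete, here is a minimal Python sketch of wrapping a long dependency line with backslash continuations. It is not the generators' actual code: the function name, the greedy wrapping strategy, and the use of a tab for continuation lines are assumptions chosen to resemble the examples on this page.

```python
def wrap_with_backslashes(words, line_length=80, indent="\t"):
    """Join a non-empty list of words into lines no longer than line_length.

    Every line except the last ends in " \\" so that make treats the
    result as one logical line. Tabs are counted as 8 columns.
    """
    lines = []
    current = words[0]
    for word in words[1:]:
        candidate = current + " " + word
        # Reserve two columns for the trailing space and backslash.
        if len(candidate.expandtabs(8)) + 2 > line_length:
            lines.append(current + " \\")
            current = indent + word
        else:
            current = candidate
    lines.append(current)
    return "\n".join(lines)

words = ["1bmtA-mrf.cnt:", "mrf-se-10efa-2ss-1bmtA.dat",
         "pairwise_environments_reference_1bmtA.dat", "../cores/1bmtA.core"]
wrapped = wrap_with_backslashes(words, line_length=60)
print(wrapped)
```

A single word longer than the limit is left on a line by itself rather than being split, since file names cannot be broken.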

File-type-specific arguments

Each type of file (e.g. core or abbreviated DSSP files) has its own standard file naming convention, macro name, and default for whether or not it should be built for all loci. These are separately controllable through the following type-specific arguments, where type stands for any of the defined file types. The file types are listed in the table below, together with the default values for these options.
-make-type-files
specifies that type files should be made for all loci in the core list, except for those files explicitly mentioned in a -but-not or -input-files list. The defaults depend on which program was invoked (see the "Which files are constructed" section below).
-local-type-files
specifies that type files should be made locally for loci in the core list but only if they are required for constructing something else. The search path is not checked in this case. For most types, this is the default, but these options may be used to override a dependency file generator's implicit -make-type-files default (see the "Which files are constructed" section below). (These used to be documented as the -no-type-files arguments, but "local" makes more sense than "no" in most contexts, especially when considering these options in contrast with -path-type-files. The "no" versions are also supported, however.)
-path-type-files
specifies that type files should not be made for loci in the core list unless they are required for constructing something else and cannot be found on the search path. This is the default in make-mrf-depends.pl and make-vvenv-depends.pl for files constructed by make-core-depends.pl, since the score file generators do not expect to have to make (e.g.) core files.
-never-make-type-files
specifies that type files should never be made under any circumstances. [This should probably be the default for files we can't make. -- rgr, 8-May-98.] It is an error if such a file is ever needed (see the "Which files are constructed" section below).
-type-macro string
specifies the name of the macro and target for all files constructed for type.
-type-file-prefix string
-type-file-suffix string
specifies the standard prefix (or suffix) for all files of the type. All locus-specific file names are generated by concatenating a type-specific prefix, the locus name, and a type-specific suffix. The user must take care that the prefix/locus/suffix combinations are unique for all files.
-type-prefix string
-type-suffix string
same as -type-file-prefix and -type-file-suffix, respectively; included for backward compatibility with Release 1.0 arguments.

Defined file types

The following table contains the complete alphabetical listing of all file types known to any of the makefile generators, together with their default prefix, suffix, and macro name values, and whether they are made or sought on the search path (or both) by default. In cases where the default is different for different makefile generators, the value "***" is shown. (But note that, for instance, the edge-env and edge-score values apply only to make-vvenv-depends.pl, since that is the only program that knows how to make them.)

[need to include constructor macro name. -- rgr, 9-May-98.] [should also tell which programs know how to make which files, since they don't know about files they don't need to make by default. -- rgr, 18-Sep-98.]

Each entry below gives the file type; its file name prefix, suffix, and macro arguments with their defaults, followed by the default make option; a description (with file format); and the constructor macro definition(s), where any are listed.

abbrev-dssp
  Arguments and defaults:
    -abbrev-dssp-file-prefix ''
    -abbrev-dssp-file-suffix '.ent.out'
    -abbrev-dssp-macro abbrev-dssp-files
    -path-abbrev-dssp-files
  Description: Abbreviated DSSP file; file naming is based on the PDB entry.
  Constructor macro:
    GENERATE-DSSP = generate-dssp

core
  Arguments and defaults:
    -core-file-prefix ''
    -core-file-suffix '.core'
    -core-macro core-files
    -***-core-files
  Description: Core file.
  Constructor macro:
    MAKE-CORE = make-core.pl
    MAKE-CORE = make-domain-core.pl

count-env
  Arguments and defaults:
    -count-env-file-prefix 'vvenv_ce_'
    -count-env-file-suffix '.dat'
    -count-env-macro counting-environments
    -local-count-env-files
  Description: Singleton environment file.
  Constructor macro:
    VV-COUNT-ENVS = vv-envs.pl -ss6 \
        -contact-defs ${contact-def-file}

counts
  Arguments and defaults:
    -counts-file-prefix ''
    -counts-file-suffix ***
    -counts-macro core-counts
    -local-counts-files
  Description: Counts file.
  Constructor macro:
    MRF-COUNTS = mrf-counts

edge-env
  Arguments and defaults:
    -edge-env-file-prefix 'vvenv_ee_'
    -edge-env-file-suffix '.dat'
    -edge-env-macro edge-environments
    -make-edge-env-files
  Description: Edge environment file (pairwise environment format).
  Constructor macro:
    VVENV-PAIR-SCORES = vvenv-pair-scores.pl -ss6 -contact-defs ${contact-def-file}

edge-score
  Arguments and defaults:
    -edge-score-file-prefix 'vvenv_es_x_'
    -edge-score-file-suffix '.dat'
    -edge-score-macro edge-scores
    -make-edge-score-files
  Description: Edge score file (pairwise score format).
  Constructor macro:
    VVENV-PAIR-SCORES = vvenv-pair-scores.pl -ss6 -contact-defs ${contact-def-file}

exposure
  Arguments and defaults:
    -exposure-file-prefix ''
    -exposure-file-suffix '.nexp'
    -exposure-macro exposure-files
    -***-exposure-files
  Description: Eisenberg "fat ALA" exposure (.nexp) file.
  Constructor macro:
    GENERATE-EXPOSURE = efa.pl
    GENERATE-EXPOSURE = generate-exposure

gmt-env
  Arguments and defaults:
    -gmt-env-file-prefix 'singleton_environments_MRF_'
    -gmt-env-file-suffix '.dat'
    -gmt-env-macro gmt-environments
    -***-gmt-env-files
  Description: GMT environment file (singleton environment format).
  Constructor macro:
    MRF-GMT-ENVS = mrf-envs

gmt-score
  Arguments and defaults:
    -gmt-score-file-prefix 'singleton_scores_x_MRF_'
    -gmt-score-file-suffix '.dat'
    -gmt-score-macro gmt-scores
    -***-gmt-score-files
  Description: GMT score file (singleton score format).
  Constructor macro:
    MRF-GMT-SCORES = mrf-scores \
        -gmt-marginal-file mrf.msd \
        -min-pair-count 4

hyperenv
  Arguments and defaults:
    -hyperenv-file-prefix 'env_'
    -hyperenv-file-suffix '.pair'
    -hyperenv-macro line-of-sight-files
    -***-hyperenv-files
  Description: Line-of-sight pairwise contact information (for VVenv); file naming is based on the PDB entry and chain ID.
  Constructor macro:
    LOS = los.pl -use-old-format

loop-score
  Arguments and defaults:
    -loop-score-file-prefix ***
    -loop-score-file-suffix '.dat'
    -loop-score-macro loop-scores
    -make-loop-score-files
  Description: Loop score file.
  Constructor macro:
    MRF-LOOP-SCORES = mrf-scores -poisson -normalize

pairwise-env
  Arguments and defaults:
    -pairwise-env-file-prefix ***
    -pairwise-env-file-suffix '.dat'
    -pairwise-env-macro pairwise-environments
    -local-pairwise-env-files
  Description: Pairwise environment file.
  Constructor macro:
    MRF-PAIR-ENVS = mrf-envs

pairwise-score
  Arguments and defaults:
    -pairwise-score-file-prefix ***
    -pairwise-score-file-suffix '.dat'
    -pairwise-score-macro pairwise-scores
    -***-pairwise-score-files
  Description: Pairwise score file.
  Constructor macro:
    MRF-PAIR-SCORES = mrf-scores -pair-poisson 1

pdb
  Arguments and defaults:
    -pdb-file-prefix ''
    -pdb-file-suffix '.ent'
    -pdb-macro pdb-files
    -path-pdb-files
  Description: PDB file.

seg
  Arguments and defaults:
    -seg-file-prefix ''
    -seg-file-suffix '.dssp'
    -seg-macro segment-files
    -***-seg-files
  Description: Segment definition file.
  Constructor macro:
    MAKE-SS-DESIGNATIONS = make-ss-designations

seq
  Arguments and defaults:
    -seq-file-prefix ''
    -seq-file-suffix '.seq'
    -seq-macro sequence-files
    -***-seq-files
  Description: Sequence (IG) file.
  Constructor macro:
    MAKE-SEQ-FILE = make-seq-file.pl
    MAKE-SEQ-FILE = pdb-domain-seq.pl

singleton-env
  Arguments and defaults:
    -singleton-env-file-prefix 'mrf-se-10efa-2ss-'
    -singleton-env-file-suffix '.dat'
    -singleton-env-macro singleton-environments
    -local-singleton-env-files
  Description: Singleton environment file.
  Constructor macro:
    MRF-SING-ENVS = mrf-envs
    MRF-SING-ENVS = sing-envs.pl

singleton-score
  Arguments and defaults:
    -singleton-score-file-prefix ***
    -singleton-score-file-suffix '.dat'
    -singleton-score-macro singleton-scores
    -make-singleton-score-files
  Description: Singleton score file.
  Constructor macro:
    MRF-SING-SCORES = mrf-scores -poisson -normalize

vv-data
  Arguments and defaults:
    -vv-data-file-prefix ''
    -vv-data-file-suffix '_vv'
    -vv-data-macro vv-data-files
    -***-vv-data-files
  Description: Raw "line-of-sight" pairwise contact information (for VVenv); file naming is based on the PDB entry and chain ID.
  Constructor macro:
    VV-DATA = calculate-vv-all ${NORMALISATION} -distance ${DISTANCE} . . .

vv-singleton
  Arguments and defaults:
    -vv-singleton-file-prefix ''
    -vv-singleton-file-suffix '_bb.singl'
    -vv-singleton-macro vv-singleton-files
    -***-vv-singleton-files
  Description: VVenv singleton definitions; file naming is based on the PDB entry and chain ID.
  Constructor macro:
    VV-BB-SINGL = vv-bb-singl.pl

[There is currently no way of specifying the total counts file name -- it is always the standard counts file name constructed with a locus of "total", e.g. total-mrf.cnt or total-vvenv.cnt. -- rgr, 27-Jun-97.]

The -search-path feature

The -search-path feature allows the dependency file generators to figure out where scattered prerequisite files live without having to explicitly enumerate their locations individually. The only catch is that the files must already exist. For example, specifying
    -search-path ../cores/ ../seqs/
will tell the dependency file generator to look for the file 1foo.seq first as ../cores/1foo.seq, and then as ../seqs/1foo.seq (where one would imagine it would be found). If not found anywhere, the file must be made in the current directory. Relative pathnames are preserved, and appear in the generated makefile as shown, which of course means that "make depend" needs to run in the same directory as "make" so it can find the right relative files. Because of the file naming conventions, the same path can be used to search for all kinds of files (just don't leave random sequence files lying around in the ../cores directory).

The search path is only used if the dependency file generator has the option of searching. If -make-foo-files, -never-make-foo-files, or -local-foo-files has been specified (indeed, anything but -path-foo-files), then these files are never sought on the search path.
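Because the search path consists of prefixes rather than directory names, the lookup amounts to plain string concatenation followed by an existence check. The following Python sketch shows the idea under that reading; the function name is invented, and the existence test is passed in as a parameter so that the example can run against a pretend file system.

```python
import os

def find_on_search_path(file_name, prefixes, exists=os.path.exists):
    """Return the first prefix + file_name that exists, or None.

    The prefixes are used exactly as given: no path separator is added,
    which is why they are written with a trailing slash.
    """
    for prefix in prefixes:
        candidate = prefix + file_name
        if exists(candidate):
            return candidate
    return None

# Pretend file system for the example.
on_disk = {"../seqs/1foo.seq"}
search_path = ["../cores/", "../seqs/"]
print(find_on_search_path("1foo.seq", search_path, on_disk.__contains__))
print(find_on_search_path("1bar.seq", search_path, on_disk.__contains__))
```

A result of None corresponds to the case where the file must be made in the current directory. The relative path is returned unchanged, matching the statement that relative pathnames appear in the generated makefile as given.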

Only one generator per directory

The dependency file generators expect that each directory is dedicated to a single makefile generator, which creates all targets for automatically-generated files in a single invocation. This greatly simplifies things, because just as each program assumes that all files in other directories are not to be touched, it can also assume that all files in the current directory are up for grabs.

This is why it is not desirable to have a dependency file generator search for files in the current directory. All files in the current directory should either be constructed explicitly (via program-generated rules, or human-written rules for exceptional files identified to the dependency file generator with -but-not), or declared explicitly as input files (via the -input-files option). See the "Exceptional cases" section below for more details. [NB: This is not compatible with versions prior to Release 1.1, which always searched the current directory. -- rgr, 6-May-98.]

Otherwise, if files are found in the current directory, then doing

    make depend
    make
    make depend
will give different results between the first "make depend" (when any such file will not yet exist) and the second (when it will). If this is acceptable, then you are free to include "./" in the search path.

The -use-plus feature

[For technical reasons, this feature is no longer supported in the new versions of the dependency file generators. Limited support may be reintroduced in the future, but the need is not great, since -use-plus is mostly just an efficiency hack anyway. -- rgr, 24-Apr-98.]

The -use-plus option directs the dependency file generator to use the "+" feature of some make implementations. By saying


file1 + file2:  depends
	command-that-makes -out1 file1 -out2 file2 -from depends
one can tell make that the two "target" files (file1 and file2) are made at the same time from the same "dependencies" (the depends file(s)) by a single invocation of the command. This saves a certain amount of redundant computation (about a third) when using a make that supports this feature. The Sun Microsystems version supports this feature, but the GNU gmake program surprisingly does not. (And that's all I know about other make implementations. -- rgr, 12-Feb-97.)

Which files are constructed

Given a locus and file type, the dependency file generator uses the following algorithm to decide whether and how to construct a target:
  1. First, the raw file name is constructed by concatenating the standard prefix, the core locus, and the standard suffix. (This is modified somewhat for certain file types; see the "Naming of other files" section.) Either prefix or suffix may be empty (the null string), but neither may contain slashes. The resulting file name therefore does not specify a directory.
  2. If the file is on either of the -input-files or -but-not lists, we assume it exists (or will exist) in the current directory (see below for details).
  3. If we have been allowed to find the file, look for it in the search path. If found, we'll use that.
  4. If not found in the previous step, then we have to make it.
    1. If we've been told never to make files of this type, then print a "Can't find 'filename'" message, and don't generate a target (but keep going, rather than dying immediately).
    2. If we don't know how to make files of this type (e.g. PDB files), then print a "Can't find 'filename'; assuming it's local" message, and don't generate a target.
    3. Otherwise, generate a target for this file.
    In the latter two cases, also put the file on the list of targets for this file type.
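The steps above can be restated as a single function. The Python sketch below follows the numbered algorithm as written; it is not the Perl implementation, and the names (plan_file, Plan) and the mode strings "make", "local", "path", and "never" (standing for the -make-type-files, -local-type-files, -path-type-files, and -never-make-type-files options) are shorthand invented for the example.

```python
from typing import NamedTuple, Optional

class Plan(NamedTuple):
    path: str               # file name to use in the generated makefile
    generate_target: bool   # whether a target is written for the file
    on_type_list: bool      # whether the file goes on the list for its type
    message: Optional[str] = None

def plan_file(locus, prefix, suffix, mode, can_make, search_path=(),
              input_files=(), but_not=(), exists=lambda path: False):
    # Step 1: prefix + locus + suffix, with no directory part.
    name = prefix + locus + suffix
    # Step 2: exceptional files are assumed to be in the current directory.
    if name in input_files:
        return Plan(name, False, False)
    if name in but_not:
        return Plan(name, False, True)
    # Step 3: only the "path" mode is allowed to search.
    if mode == "path":
        for path_prefix in search_path:
            if exists(path_prefix + name):
                return Plan(path_prefix + name, False, False)
    # Step 4: the file was not found, so it would have to be made.
    if mode == "never":
        return Plan(name, False, False, f"Can't find '{name}'")
    if not can_make:
        return Plan(name, False, True,
                    f"Can't find '{name}'; assuming it's local")
    return Plan(name, True, True)

on_disk = {"../cores/1aba.core"}
print(plan_file("1aba", "", ".core", "path", True,
                search_path=["../cores/"], exists=on_disk.__contains__))
print(plan_file("1aba", "", ".ent", "path", False))
```

Returning a message instead of printing it keeps the function easy to check; the real generators print the message and keep going.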

Exceptional cases

The -but-not and -input-files options list files that are to be treated as special cases by the dependency generator. In neither case do we generate a target, even when the file is required for something else. The distinction between the two is that for the -but-not files, we do put the file on the list of targets for this file type. Therefore, those specified with -input-files are considered to be unchanging -- truly inputs -- and the -but-not files are those that are made by explicit targets elsewhere in the makefile, presumably because they were constructed by hand.

As mentioned above in the "Only one generator per directory" section, each dependency generator expects to be completely responsible for a single directory. This can be relaxed somewhat via appropriate use of the -input-files option. Two makefile generators can coexist in the same directory if the second generator only uses input files from the first generator that are mentioned in an -input-files list (and the first generator must not use any files from the second, of course). One can use the macro naming all files of that type produced by the first generator in the -input-files argument to the second. There is no reason why using two makefile generators in the same directory would be necessary in ordinary circumstances, but a user might wish to write his or her own makefile generator for a new file type, or for a new method of using (e.g.) secondary structure information in core generation.

[note that -but-not files can be specified for files that cannot ordinarily be made (e.g. PDB files), in which case there will be a macro and target defined for such file types. This makes it possible to write a target by hand that produces a "fixed" copy of a PDB file from a buggy original; the "${pdb-files}" macro can be put in a "clean" target, and the copy will then behave as a normal intermediate file. -- rgr, 8-May-98.]

Targets and macros

For each file type for which a target was generated (or used for a -but-not file), the dependency file generators emit a macro and a target which includes all of the type files. For instance, if any core files are generated, the "core-files" macro may be used anywhere in the makefile (or its include files) to name all such core files, and building the target (e.g. "make core-files" at the command line) will construct all core files. All macros include only those files in the current directory for which targets have been created; some may therefore be empty if all such files were found on the search path. It is therefore safe to use these macros in a "make clean" target.

MRF-specific dependency generator arguments

This section describes arguments particular to make-mrf-depends.pl and make-vvenv-depends.pl that are not shared by the other dependency generators. Predictably, they have to do with score and environment generation.
-warn-if-no-hlg-file
-xval-sets
[this is in cross-validation set file format. -- rgr, 27-Apr-99.]
-make-no-xval
-no-xval-only
[***here***: finish these. -- rgr, 15-Oct-98.]
-use-links
specifies that all score files for cores within a cross-validation set should be shared by making them once for the cross-validation set, then linking them to the various core-specific file names for members of the cross-validation set. If not specified (the default), singleton, loop, and pairwise score files for each cross-validation set member are made independently, even though they will have identical contents. See the "Sharing cross-validation set score files" section.
-use-vvenv-exposure
uses visible volume (instead of EFA) exposure. Not supported outside of BMERC. [***finish***: flesh out, needs hyperlinks. -- rgr, 18-Sep-98.]

Cross-validation

Cross-validation sets are allowed to have members that do not appear on the core list. In that case, make-mrf-depends.pl and make-vvenv-depends.pl will insist on finding that core's counts file on the search path. If the file is not found, a warning message is generated, and the locus is ignored.

In order to compute scores, the individual core counts files for each of the cross-validation sets are summed together. Then, a "total counts" file is produced by adding all of the cross-validation sets together with all the counts for cores not belonging to any cross-validation set; the resulting sum is over the union of all cores in the core list file and the cross-validation sets file. Finally, for a given core (or cross-validation set), one computes the score files using the difference between the total counts and that core's (resp., cross-validation set's) counts.
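The counts arithmetic can be illustrated with a small Python sketch. The real counts file format is not described in this section, so each counts file is represented here simply as a table of named tallies; the keys "x" and "y", the locus 1foo, the set name, and the function names are all hypothetical, and each core is assumed to belong to at most one cross-validation set.

```python
from collections import Counter

def sum_counts(tables):
    """Add several counts tables together."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total

def held_out_counts(core_counts, xval_sets):
    """Return (total, per-core counts to score with).

    core_counts maps locus -> Counter; xval_sets maps set name -> loci.
    """
    set_of = {locus: name
              for name, members in xval_sets.items() for locus in members}
    # Sum the member counts of each cross-validation set.
    set_counts = {name: sum_counts(core_counts[locus] for locus in members)
                  for name, members in xval_sets.items()}
    # Total = all sets plus all cores that are in no set.
    unassigned = [core_counts[l] for l in core_counts if l not in set_of]
    total = sum_counts(list(set_counts.values()) + unassigned)
    result = {}
    for locus, own in core_counts.items():
        removed = set_counts[set_of[locus]] if locus in set_of else own
        remaining = Counter(total)
        remaining.subtract(removed)  # total minus the held-out counts
        result[locus] = remaining
    return total, result

core_counts = {
    "1aba": Counter(x=2, y=1),
    "1bmtA": Counter(x=1),
    "1foo": Counter(x=4, y=4),
}
xval_sets = {"set1": ["1aba", "1bmtA"]}
total, scoring_counts = held_out_counts(core_counts, xval_sets)
print(dict(total))
print({locus: dict(c) for locus, c in scoring_counts.items()})
```

Both members of set1 end up with the same counts (the total minus the whole set), which is why their score files have identical contents.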

Sharing cross-validation set score files

What should happen for cross-validation sets is that the pairwise, "true" singleton, and loop score files should be shared. The code doesn't do this by default, but if you supply the -use-links option to make-mrf-depends.pl or make-vvenv-depends.pl, it will make each cross-validation set's pair, true singleton, and loop files once, and create links to them for each member of the set. GMT scores and filtered pairwise scores, since they depend on each residue's particular set of pairwise arcs, must still be made individually.


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Thu Apr 6 16:08:26 EDT 2000