Dependency files
[ make-vvenv-depends.pl is being documented here for completeness, even though it may not be distributed as part of needle tools for some time. -- rgr, 24-Apr-98.]
For most purposes, a makefile consists of five kinds of lines: comment lines, macro definition lines, dependency lines, command lines, and include lines.
Macro, dependency, and command lines may be continued by putting a
backslash ("\") character immediately before the end-of-line.
Macro definitions continued this way are turned into single lines by
replacing the backslash, the newline, and any leading whitespace on the
next line with a single space character. Command lines continued this
way are executed (by a subshell) as a unit, but the backslashes and
newlines are kept intact.
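The joining rule for continued macro definitions can be sketched in a few lines of Python. This only illustrates the rule as described above; it is not code from make or from needle tools, and the function name join_continued is made up for the example.

```python
import re

def join_continued(text):
    """Join backslash-continued macro definition lines.

    Each backslash, the newline after it, and any leading whitespace on
    the following line are replaced by a single space.
    (Illustrative helper; the name is an assumption, not part of make.)
    """
    return re.sub(r"\\\n[ \t]*", " ", text)

# A macro definition spread over three physical lines.
definition = "files = 1aba.core\\\n\t1bmtA.core\\\n\t1foo.core"
print(join_continued(definition))
```

Command lines are deliberately not handled here, since their backslashes and newlines are passed through to the subshell unchanged.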
Comment lines start with "#" in the first column, and
run to the end of the line. They may not be continued with
backslashes.
Macro definition lines are of the form
For example, a macro defined using
The convention used by the needle tools code is that macros
used for lists of file names are in lower case, and macros used for
command invocations are in upper case.
Dependency lines define which files (the dependencies) are
used to construct a given target, and are of the form:
Command lines start with a tab character and must appear
directly after a dependency line. If continued onto multiple lines via
backslashes, each continuation line must also start with a tab.
(Command lines are not strictly required, since make has
implicit rules for constructing object files from source files for a
variety of compiler languages, but none of these rules apply to
needle files.) Multiple command lines may be specified for a
single target; the target is considered to have been successfully built
only if all commands return zero status when executed.
Note that there need not be any commands at all. Furthermore, the
target need not correspond to a file (which means that the commands are
not bound to create one). This is useful for creating shorthand
targets, such as the "all" target in the example in the next section.
Here is an example taken from the "Making
the dependency file" section of the "MRF
Generation Examples" page:
Include lines are of the form
Additionally, the program can be supplied with extra options that
influence its behavior.
Since the first target is built by default, it is traditionally called
"all", and is typically a shorthand target for building all of
the interesting targets of which the makefile is capable. For
example, the following "all" target is suitable for a core
makefile:
Other traditional targets are "clean", which deletes the
files that are made, presumably so that they can be remade, and
"install", which, for a software package such as needle
tools, causes the newly built programs to be moved to the correct place
in the system so that they may be used by users other than the person
who constructed the system. The clean target is usually
implemented with "rm -f ${files}", where
"${files}" is a macro (or series of macros) that names all of
the files to be deleted. The install target is not necessary
for needle tools files, since they are used "in place."
The dependency file generator programs (make-core-depends.pl,
make-mrf-depends.pl, and make-vvenv-depends.pl) support a large set of
options, so that most aspects of dependency file generation can be customized.
These arguments break down into three categories:
[need to include constructor macro name. -- rgr, 9-May-98.] [should
also tell which programs know how to make which files, since they don't
know about files they don't need to make by default. -- rgr,
18-Sep-98.]
name = value
Whitespace before the name and around the "=" is not
significant. Environment variables are implicitly defined as macros.
make predefines a few macros such as "CC", which
invokes the C compiler; most of these are standard across different
versions, but some are not. If a given name would be defined by more
than one of these ways, then explicitly defined macros take precedence
over environment variable values, which in turn take precedence over
"built-in" macros.
search-path = ../foo/ /usr/bar/ /usr/baz/
may be invoked with "${search-path}" or
"$(search-path)", and so
random-target: ${search-path}
make-core-depends.pl -search-path ${search-path} . . .
is understood as
random-target: ../foo/ /usr/bar/ /usr/baz/
make-core-depends.pl -search-path ../foo/ /usr/bar/ /usr/baz/ . . .
Macros that are not defined expand to the null string when invoked,
without causing an error.
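As a rough illustration of the lookup order and expansion behavior described here, the following Python sketch expands "${name}" and "$(name)" references against three tables. The function names and the three-dictionary layout are assumptions made for the example, not how make is implemented. The sketch makes a single pass only: a macro reference inside a macro's value is not expanded again, although make itself does expand such nested references.

```python
import re

# Matches both invocation styles: ${name} and $(name).
MACRO_REFERENCE = re.compile(r"\$\{([^}]+)\}|\$\(([^)]+)\)")

def lookup(name, explicit, environment, builtin):
    """Explicit definitions win over environment variables, which win
    over built-in macros; an undefined macro is the null string."""
    for table in (explicit, environment, builtin):
        if name in table:
            return table[name]
    return ""

def expand(text, explicit, environment=None, builtin=None):
    """Replace every macro reference in text with its value (one pass)."""
    environment = environment or {}
    builtin = builtin or {}

    def substitute(match):
        name = match.group(1) or match.group(2)
        return lookup(name, explicit, environment, builtin)

    return MACRO_REFERENCE.sub(substitute, text)

macros = {"search-path": "../foo/ /usr/bar/ /usr/baz/"}
print(expand("random-target: ${search-path}", macros))
```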
target: dependency1 dependency2 . . .
In the example above, make would
decide that the random-target file was out of date and needed
to be reconstructed if it was missing, or if one or more of its
dependencies -- foo, bar, and baz -- was
newer (i.e. had a more recent file write date). In that case, make
would execute the associated command lines.
1bmtA-mrf.cnt: mrf-se-10efa-2ss-1bmtA.dat \
pairwise_environments_reference_1bmtA.dat ../cores/1bmtA.core
${MRF-COUNTS} -pair-env-file pairwise_environments_reference_1bmtA.dat \
-sing-env-file mrf-se-10efa-2ss-1bmtA.dat -core-file \
../cores/1bmtA.core -sequence-file ../cores/1bmtA.seq \
-write-counts 1bmtA-mrf.cnt
The dependency line (split into two physical lines for readability)
states that 1bmtA-mrf.cnt is to be made from
mrf-se-10efa-2ss-1bmtA.dat,
pairwise_environments_reference_1bmtA.dat, and
../cores/1bmtA.core. The command line immediately following
(also continued across several physical lines) dictates how; if
make determines that 1bmtA-mrf.cnt needs to be
rebuilt, then it will execute that command after replacing
"${MRF-COUNTS}" with the current value of the
MRF-COUNTS macro.
include filename
The first eight characters on the line must be "include "
(note the space), and the file name must occupy the rest of the line.
The named file is read as if it had been inserted in the
makefile in place of the include line.
Macros used to invoke commands
One broadly-used makefile convention is to employ macros to
supply command names and standard options. For example, the following
standard definition causes
make-mrf-depends.pl to use the sing-envs.pl script for
computing singleton environments.
MRF-SING-ENVS = sing-envs.pl
This is because make-mrf-depends.pl generates singleton
environment targets that look something like this:
mrf-se-f10vv-2ss-1aba.dat: ../cores/1aba.core ../vv-data/1aba_bb.singl \
../cores/1aba.seq
${MRF-SING-ENVS} -sing-env-file mrf-se-f10vv-2ss-1aba.dat -core-file \
../cores/1aba.core -sequence-file ../cores/1aba.seq \
-exposure-file ../vv-data/1aba_bb.singl
Instead, one can use any program that accepts the same pattern of
arguments and computes the same result (more or less) using the same
file formats, such as
mrf-envs:
MRF-SING-ENVS = mrf-envs
In order for this target to work, something must be assigned to this
macro so that expanding "${MRF-SING-ENVS}" results in a
complete command.
MRF-SING-ENVS = sing-envs.pl ${sing-env-opts}
Such macro-supplied options always appear before the file name options
produced by the dependency generator; it is not possible to reorder or
alter these (unless, of course, you write a script that parses all the
options, massages them, and passes them on to the appropriate program).
How make works
make operates by reading in the entire makefile and
all included files (including a system-defined file of standard
definitions), and then constructing the target or targets
named on the command line, or the first target in the file if none was
given. In order to build a target, make first ensures that its
dependencies (if any) are up to date by building them recursively. If
the target file itself is not present, or any dependency file needed
rebuilding, or the target file is older than any of the dependency
files, then make considers the target out of date and
executes the command lines associated with the target.
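The procedure just described can be sketched in Python as follows. This is a simplified model for illustration, not make's actual implementation: the rules dictionary (target name mapped to a pair of dependency list and command list) and the run callback are assumptions of the sketch, and implicit rules are ignored.

```python
import os

def is_out_of_date(target, dependencies, rebuilt):
    """Apply the three tests described above to one target."""
    if not os.path.exists(target):
        return True                      # target file is missing
    if any(dep in rebuilt for dep in dependencies):
        return True                      # a dependency had to be rebuilt
    target_time = os.path.getmtime(target)
    return any(os.path.exists(dep) and os.path.getmtime(dep) > target_time
               for dep in dependencies)  # a dependency is newer

def build(target, rules, run, state=None):
    """Bring target up to date.

    rules maps a target name to (dependencies, commands); run executes one
    command and returns its exit status. Both are assumptions made for
    this sketch. Returns the set of targets whose commands were executed.
    """
    if state is None:
        state = {"visited": set(), "rebuilt": set()}
    if target in state["visited"] or target not in rules:
        return state["rebuilt"]          # already handled, or a plain input file
    state["visited"].add(target)
    dependencies, commands = rules[target]
    for dep in dependencies:             # dependencies first, recursively
        build(dep, rules, run, state)
    if is_out_of_date(target, dependencies, state["rebuilt"]):
        for command in commands:
            if run(command) != 0:        # every command must return zero status
                raise RuntimeError("command failed for target " + target)
        state["rebuilt"].add(target)
    return state["rebuilt"]
```

A target that names no file is always treated as out of date in this sketch; when it has no commands, the only effect is that its dependencies are brought up to date.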
all: core-files segment-files sequence-files
Invoking "make" (or explicitly "make all") will
simply cause each of the dependency targets -- core-files,
segment-files, and sequence-files -- to be built.
These are shorthand targets for the three classes of files for which
make-core-depends.pl has presumably been told to generate
dependencies; each of them in turn depends on the actual files, which
are then created. There are no additional commands to execute for this
target, and make does not complain about the lack of a file
named "all" in the current directory.
Dependency file generator arguments
In any case, most options can be safely ignored on the first pass, since
using the defaults works well for most cases. The important arguments
are -core-list-file, which is required to identify the model
set; and -search-path, which allows the dependency file
generators to find files in other directories.
Standard arguments to dependency file generators
File-type-specific arguments
Each type of file (e.g. core or abbreviated DSSP files) has its own
standard file naming convention, macro name, and default for whether or
not it should be built for all loci. These are separately controllable
through the following type-specific arguments, where each type
may be replaced by any of the defined file types. These are listed in
the table below, together with the default values for these options.
Defined file types
The following table contains the complete alphabetical listing of all
file types known to any of the makefile generators, together with their
default prefix, suffix, and macro name values, and whether they are made
or sought on the search path (or both) by default. In cases where the
default is different for different makefile generators, the value
"***" is shown. (But note that, for instance, the
edge-env and edge-score values apply only to make-vvenv-depends.pl,
since that is the only program that knows how to make them.)
| File Type | File name prefix/suffix/macro args & defaults | Description (with file format) | Default constructor macro definition(s) |
|---|---|---|---|
| abbrev-dssp | -abbrev-dssp-file-prefix '' -abbrev-dssp-file-suffix '.ent.out' -abbrev-dssp-macro abbrev-dssp-files -path-abbrev-dssp-files | Abbreviated DSSP file; file naming is based on the PDB entry. | GENERATE-DSSP = generate-dssp |
| core | -core-file-prefix '' -core-file-suffix '.core' -core-macro core-files -***-core-files | Core file. | MAKE-CORE = make-core.pl; MAKE-CORE = make-domain-core.pl |
| count-env | -count-env-file-prefix 'vvenv_ce_' -count-env-file-suffix '.dat' -count-env-macro counting-environments -local-count-env-files | Singleton environment file. | VV-COUNT-ENVS = vv-envs.pl -ss6 \ -contact-defs ${contact-def-file} |
| counts | -counts-file-prefix '' -counts-file-suffix *** -counts-macro core-counts -local-counts-files | Counts file. | MRF-COUNTS = mrf-counts |
| edge-env | -edge-env-file-prefix 'vvenv_ee_' -edge-env-file-suffix '.dat' -edge-env-macro edge-environments -make-edge-env-files | Edge environment file (pairwise environment format). | VVENV-PAIR-SCORES = vvenv-pair-scores.pl -ss6 -contact-defs ${contact-def-file} |
| edge-score | -edge-score-file-prefix 'vvenv_es_x_' -edge-score-file-suffix '.dat' -edge-score-macro edge-scores -make-edge-score-files | Edge score file (pairwise score format). | VVENV-PAIR-SCORES = vvenv-pair-scores.pl -ss6 -contact-defs ${contact-def-file} |
| exposure | -exposure-file-prefix '' -exposure-file-suffix '.nexp' -exposure-macro exposure-files -***-exposure-files | Eisenberg "fat ALA" exposure (.nexp) file. | GENERATE-EXPOSURE = efa.pl; GENERATE-EXPOSURE = generate-exposure |
| gmt-env | -gmt-env-file-prefix 'singleton_environments_MRF_' -gmt-env-file-suffix '.dat' -gmt-env-macro gmt-environments -***-gmt-env-files | GMT environment file (singleton environment format). | MRF-GMT-ENVS = mrf-envs |
| gmt-score | -gmt-score-file-prefix 'singleton_scores_x_MRF_' -gmt-score-file-suffix '.dat' -gmt-score-macro gmt-scores -***-gmt-score-files | GMT score file (singleton score format). | MRF-GMT-SCORES = mrf-scores \ -gmt-marginal-file mrf.msd \ -min-pair-count 4 |
| hyperenv | -hyperenv-file-prefix 'env_' -hyperenv-file-suffix '.pair' -hyperenv-macro line-of-sight-files -***-hyperenv-files | Line-of-sight pairwise contact information (for VVenv); file naming is based on the PDB entry and chain ID. | LOS = los.pl -use-old-format |
| loop-score | -loop-score-file-prefix *** -loop-score-file-suffix '.dat' -loop-score-macro loop-scores -make-loop-score-files | Loop score file. | MRF-LOOP-SCORES = mrf-scores -poisson -normalize |
| pairwise-env | -pairwise-env-file-prefix *** -pairwise-env-file-suffix '.dat' -pairwise-env-macro pairwise-environments -local-pairwise-env-files | Pairwise environment file. | MRF-PAIR-ENVS = mrf-envs |
| pairwise-score | -pairwise-score-file-prefix *** -pairwise-score-file-suffix '.dat' -pairwise-score-macro pairwise-scores -***-pairwise-score-files | Pairwise score file. | MRF-PAIR-SCORES = mrf-scores -pair-poisson 1 |
| pdb | -pdb-file-prefix '' -pdb-file-suffix '.ent' -pdb-macro pdb-files -path-pdb-files | PDB file. | |
| seg | -seg-file-prefix '' -seg-file-suffix '.dssp' -seg-macro segment-files -***-seg-files | Segment definition file. | MAKE-SS-DESIGNATIONS = make-ss-designations |
| seq | -seq-file-prefix '' -seq-file-suffix '.seq' -seq-macro sequence-files -***-seq-files | Sequence (IG) file. | MAKE-SEQ-FILE = make-seq-file.pl; MAKE-SEQ-FILE = pdb-domain-seq.pl |
| singleton-env | -singleton-env-file-prefix 'mrf-se-10efa-2ss-' -singleton-env-file-suffix '.dat' -singleton-env-macro singleton-environments -local-singleton-env-files | Singleton environment file. | MRF-SING-ENVS = mrf-envs; MRF-SING-ENVS = sing-envs.pl |
| singleton-score | -singleton-score-file-prefix *** -singleton-score-file-suffix '.dat' -singleton-score-macro singleton-scores -make-singleton-score-files | Singleton score file. | MRF-SING-SCORES = mrf-scores -poisson -normalize |
| vv-data | -vv-data-file-prefix '' -vv-data-file-suffix '_vv' -vv-data-macro vv-data-files -***-vv-data-files | Raw "line-of-sight" pairwise contact information (for VVenv); file naming is based on the PDB entry and chain ID. | VV-DATA = calculate-vv-all ${NORMALISATION} -distance ${DISTANCE} . . . |
| vv-singleton | -vv-singleton-file-prefix '' -vv-singleton-file-suffix '_bb.singl' -vv-singleton-macro vv-singleton-files -***-vv-singleton-files | VVenv singleton definitions; file naming is based on the PDB entry and chain ID. | VV-BB-SINGL = vv-bb-singl.pl |
[There is currently no way of specifying the total counts file name
-- it is always the standard counts file name constructed with a locus
of "total", e.g. total-mrf.cnt or
total-vvenv.cnt. -- rgr, 27-Jun-97.]
The search path is only used if the dependency file generator has the
option of searching. If -make-foo-files,
-never-make-foo-files, or -local-foo-files has been
specified (indeed, anything but
-path-foo-files), then these files are never sought on the
search path.
This is why it is not desirable to have a dependency file
generator search for files in the current directory. All files in the
current directory should either be constructed explicitly (via
program-generated rules, or human-written rules for exceptional files
identified to the dependency file generator with -but-not), or
declared explicitly as input files (via the -input-files
option). See the "Exceptional
cases" section below for more details. [NB: This is not
compatible with versions prior to Release 1.1, which always searched the
current directory. -- rgr, 6-May-98.]
Otherwise, if files are found in the current directory, then doing
The -use-plus option directs the dependency file generator
to use the "+" feature of some make implementations. By saying
As mentioned above in the "Only one
generator per directory" section, each dependency generator expects
to be completely responsible for a single directory. This can be
relaxed somewhat via appropriate use of the -input-files
option. Two makefile generators can coexist in the same directory if
the second generator only uses input files from the first generator that
are mentioned in an -input-files list (and the first generator
must not use any files from the second, of course). One can use the macro naming all files of that
type produced by the first generator in the -input-files
argument to the second. There is no reason why using two
makefile generators in the same directory would be necessary in
ordinary circumstances, but a user might wish to write his or her own
makefile generator for a new file type, or for a new method of
using (e.g.) secondary structure information in core generation.
[note that -but-not files can be specified for files that
cannot ordinarily be made (e.g. PDB files), in which case there will be
a macro and target defined for such file types. This makes it
possible to write a target by hand that produces a "fixed" copy of a PDB
file from a buggy original; the "${pdb-files}" macro can be put
in a "clean" target, and the copy will then behave as a normal
intermediate file. -- rgr, 8-May-98.]
In order to compute scores, the individual core counts files for each
of the cross-validation sets are summed together. Then, a "total counts"
file is produced by adding all of the cross-validation sets together with
all the counts for cores not belonging to any cross-validation set; the
resulting sum is over the union of all cores in the core list file and
the cross-validation sets file. Finally, for a given core (or
cross-validation set), one computes the score files using the difference
between the total counts and that core's (resp., cross-validation set's)
counts.
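The arithmetic behind this scheme can be shown with a small Python sketch. It does not model the real counts file format or the behavior of mrf-counts and mrf-scores; the dictionary layout, the key names ("env-a", "env-b"), and both function names are hypothetical, chosen only to show the "total minus own counts" step.

```python
from collections import Counter

def total_counts(counts_by_locus):
    """Sum the counts of every core (or cross-validation set)."""
    total = Counter()
    for counts in counts_by_locus.values():
        total.update(counts)             # adds counts key by key
    return total

def held_out_counts(total, own_counts):
    """Counts used for scoring one core: the total minus its own counts."""
    return {key: total[key] - own_counts.get(key, 0) for key in total}

# Hypothetical counts for two loci.
counts_by_locus = {
    "1aba": {"env-a": 3, "env-b": 1},
    "1bmtA": {"env-a": 2},
}
total = total_counts(counts_by_locus)
print(held_out_counts(total, counts_by_locus["1aba"]))
```

Computing the total once and subtracting avoids re-summing all other cores for each locus, which is presumably the point of keeping a separate total counts file.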
The -search-path feature
The -search-path feature allows the dependency file generators
to figure out where scattered prerequisite files live without having to
explicitly enumerate their locations individually. The only catch is
that the files must already exist. For example, specifying
-search-path ../cores/ ../seqs/
will tell the dependency file generator to look for the file
1foo.seq first as ../cores/1foo.seq, and then as
../seqs/1foo.seq (where one would imagine it would be found).
If not found anywhere, the file must be made in the current directory.
Relative pathnames are preserved, and appear in the generated
makefile as shown, which of course means that
"make depend" needs to run in the same directory as
"make" so it can find the right relative files. Because of the
file naming conventions, the same path can be used to search for all
kinds of files (just don't leave random sequence files lying around in
the ../cores directory).
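The lookup described in this section amounts to a first-match search over the listed directories. The Python sketch below shows the idea; the function name locate and its return convention (None meaning "not found, so the file must be made locally") are assumptions for illustration, not the generators' actual Perl code.

```python
import os

def locate(filename, search_path):
    """Return the first existing path for filename on search_path.

    Directories are tried in order and the relative path is returned
    unchanged, as it would appear in the generated makefile. None means
    the file was not found anywhere on the path.
    """
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None

# Looks for ../cores/1foo.seq first, then ../seqs/1foo.seq.
print(locate("1foo.seq", ["../cores/", "../seqs/"]))
```

Because the result depends on which files exist when the search runs, the same call can give different answers before and after a build.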
Only one generator per directory
The dependency file generators expect that each directory is dedicated
to a single makefile generator, which creates all targets for
automatically-generated files in a single invocation. This greatly
simplifies things, because just as each program assumes that all files
in other directories are not to be touched, it can also assume that all
files in the current directory are up for grabs.
make depend
make
make depend
will give different results between the first
"make depend" (when any such file will not yet exist) and
the second (when it will). If this is acceptable, then you are free to
include "./" in the search path.
The -use-plus feature
[For technical reasons, this feature is no longer supported in the new
versions of the dependency file generators. Limited support may be
reintroduced in the future, but the need is not great, since
-use-plus is mostly just an efficiency hack anyway. -- rgr,
24-Apr-98.]
file1 + file2: depends
command-that-makes -out1 file1 -out2 file2 -from depends
one can tell make that the two "target" files (file1
and file2) are made at the same time from the same
"dependencies" (the depends file(s)) by a single invocation of
the command. This saves a certain amount of redundant computation
(about a third) when using a make that supports this feature.
The Sun Microsystems version supports this feature, but the GNU
gmake program surprisingly does not. (And that's all I know
about other make implementations. -- rgr, 12-Feb-97.)
Which files are constructed
Given a locus and file type, the dependency file generator uses the
following algorithm to decide whether and how to construct a target:
In the latter two cases, also put the file on the list of targets
for this file type.
Exceptional cases
The -but-not and -input-files options list files that
are to be treated as special cases by the dependency generator. In
neither case do we generate a target, even when the file is required for
something else. The distinction between the two is that for the
-but-not files, we do put the file on the list of targets for
this file type. In other words, files specified with -input-files
are considered to be unchanging -- truly inputs -- whereas the
-but-not files are those that are made by explicit targets
elsewhere in the makefile, presumably targets that were
written by hand.
Targets and macros
For each file type for which a target was generated (or used for a -but-not file), the
dependency file generators emit a macro and a target that include all
files of that type. For instance, if any core files are
generated, the "core-files" macro may be used anywhere in the
makefile (or its include files) to name all such core files, and
building the target (e.g. "make core-files" at the command
line) will construct all core files. All macros include only those
files in the current directory for which targets have been created; some
may therefore be empty if all such files were found on the search path.
It is therefore safe to use these macros in a "make clean"
target.
MRF-specific dependency generator arguments
This section describes arguments particular to
make-mrf-depends.pl and
make-vvenv-depends.pl that are not shared by the other
dependency generators. Predictably, they have to do with score and
environment generation.
Cross-validation
Cross-validation sets are allowed to have members that do not appear
on the core list. In that case, make-mrf-depends.pl and
make-vvenv-depends.pl will insist on finding that core's counts
file on the search path. If the file is not found, a warning message is
generated, and the locus is ignored.
Sharing cross-validation set score files
What should happen for cross-validation sets is that the pairwise,
"true" singleton, and loop scores files should be shared. The code
doesn't do this by default, but if you supply the -use-links
option to
make-mrf-depends.pl or
make-vvenv-depends.pl, it will make each cross-validation
set's pair, true singleton, and loop files once, and create links to
them for each member of the set. GMT scores and filtered pairwise
scores, since they depend on each residue's particular set of pairwise
arcs, must still be made individually.
Bob Rogers
<rogers@darwin.bu.edu>
Last modified: Thu Apr 6 16:08:26 EDT 2000