Exposure computation programs


The programs documented below produce Eisenberg "fat alanine" (EFA) exposure files in the .nexp (exposure) file format from PDB atomic coordinates. The original version, generate-exposure, has a fixed argument pattern and implements "standard" EFA exposure. efa.pl is a newer version that does the same thing by default, but provides options that allow variations on the EFA theme.

Table of contents

  1. Exposure computation programs
    1. Table of contents
    2. efa.pl
    3. generate-exposure


efa.pl

The efa.pl script (for "Eisenberg fat alanine") takes a full (i.e. loops intact) PDB file, and produces a file describing the Eisenberg "fat alanine" exposure value for each residue. [need reference. -- rgr, 20-Dec-96.] For example, the command
    efa.pl < 2mhr.ent > 2mhr.nexp
writes 2mhr.nexp in the current directory, in the .nexp (exposure) file format, which is the format the mrf-envs program expects.

Usage:

    efa.pl [-locus locus] [-pdb-file pdb-file-name] [-chain chain-spec]
		[-output-file output-file-name]
		[-dssp-file dssp-file-name] [-verbose]
		[-radius [ss] atom res radius]
		[-default-radius atom radius]

Arguments:

-pdb-file pdb-file-name
name of the PDB file. Only the PDB ATOM records are actually used. If not specified, the atom records are read from the standard input.
-output-file output-file-name
name of the output file, or "-" for the standard output. The default depends on the -locus argument: if -locus is not specified, the result is written to the standard output; if -locus is specified, the output is written to a file called "locus.nexp", for backward compatibility with generate-exposure.
-dssp-file dssp-file-name
name of the DSSP file. This argument is only required if there are exceptional radii that depend on secondary structure state.
-chain chain-spec
compute exposure only for the specified chain or chains in isolation; defaults to "all" to get all chains. If not the token "all" (in lower case), the chain-spec must be a comma-separated list of chain IDs in uppercase, followed by optional residue subranges (see the "Core chain specification syntax" section for detailed syntax). This argument is passed directly to the filter-pdb-atoms.pl script.
-radius [ ss ] atom res radius
defines an exceptional radius value to use for specified atoms, where ss is an optional secondary structure state (e.g. E), atom is an atom name (e.g. CB), res is a residue name (e.g. ALA), and radius is the radius value in Å. Note that ss, atom, and res are all case-sensitive, and must be in upper case to have an effect. The -radius argument may be repeated to define multiple exceptions. [This argument is intended to correspond to the -atom-radius argument of calculate-vv-all; one or both may need to change in order to bring them closer. -- rgr, 8-Jun-99.]
-default-radius atom radius
defines the standard radius value to use for "unexceptional" atoms, where atom is the atom species (e.g. C) and radius is the radius value in Å. See below for a table of default values.
-locus locus
optional locus name. This is an arbitrary string that is put in warning messages, and used for constructing the default output file name.
-verbose
If specified, the list of exceptional radii is written to the standard error stream, the temporary files created by efa.pl are not deleted if an error happens, and output from the hydro2 script is echoed live and unconditionally (normally it is saved up and shown only if the script fails to produce output). If specified twice, debugging output is enabled in addition to the above.
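
The interaction between -output-file and -locus described above can be summarized in a few lines. The following Python sketch is illustrative only: the function name resolve_output is hypothetical and is not part of efa.pl, and it assumes that an explicit -output-file always takes priority over the locus-based default.

```python
def resolve_output(output_file=None, locus=None):
    """Return the output destination implied by the documented defaults.

    "-" stands for the standard output, as in the -output-file argument.
    (Hypothetical helper; not part of efa.pl.)
    """
    if output_file is not None:
        return output_file       # an explicit -output-file is used as given
    if locus is not None:
        return locus + ".nexp"   # generate-exposure compatibility
    return "-"                   # no locus: standard output


print(resolve_output())                    # neither argument given
print(resolve_output(locus="1rcf"))        # only -locus given
print(resolve_output("-", locus="1rcf"))   # explicit "-" with a locus
```

Under these assumptions, giving "-output-file -" together with -locus keeps the locus in warning messages while still writing to the standard output.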

The first thing that efa.pl does is to pass the PDB data through filter-pdb-atoms.pl in order to standardize atom variants and catch anomalies. This step also converts all residues to alanine and "hallucinates" beta carbons for native glycines. Accordingly, you may see error messages at the top of the transcript, such as those below:


    gamow% efa.pl -locus 3rub -pdb-file 3rub.ent
    filter-pdb-atoms.pl: 3rub.ent:  Chain break: 10.3110037338758 A between
    THR L  63  and VAL L  69 .
    filter-pdb-atoms.pl: 3rub.ent:  Missing beta carbon for MET L 405 .
    filter-pdb-atoms.pl: 3rub.ent:  Missing backbone atom  CA  for ASN L 468 
    filter-pdb-atoms.pl: 3rub.ent:  Missing backbone atom  C   for ASN L 468 
    filter-pdb-atoms.pl: 3rub.ent:  Missing backbone atom  O   for ASN L 468 
    filter-pdb-atoms.pl: 3rub.ent:  Missing beta carbon for ASN L 468 .
    . . .
If the -chain argument was specified, filter-pdb-atoms.pl will also extract the specified chain(s) or chain subrange(s). See the filter-pdb-atoms.pl documentation for more details.

After filtering, efa.pl determines which atom radius to use for each selected atom, using the following precedence rules.

  1. If ss is specified in a -radius specification, then an atom will be assigned that radius only if the secondary structure, atom name, and residue all match.
  2. If ss is omitted from a -radius specification, then that radius is used for that atom/residue combination whenever the secondary structure state of the actual residue is undefined or does not match any explicitly specified exception.
  3. If no such exception exists, then any value specified by -default-radius for that species is used.
  4. Finally, if there was no -default-radius for that species, efa.pl uses the appropriate value from the table below.
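
The four precedence rules above can be sketched as a single lookup function. This is a minimal Python illustration, not the code efa.pl actually uses: the names pick_radius and DEFAULT_RADII are hypothetical, the atom species is assumed to be supplied by the caller, and the exceptions are assumed to be stored in a dictionary keyed by (ss, atom, res), with ss set to None when it was omitted.

```python
# Built-in defaults, copied from the "Default atom radii" table.
DEFAULT_RADII = {"C": 1.9, "N": 1.7, "O": 1.4, "S": 1.8,
                 "P": 1.8, "M": 1.7, "I": 2.0}


def pick_radius(species, atom, res, ss, exceptions, default_radii=None):
    """Choose a radius for one atom by the four precedence rules.

    exceptions maps (ss, atom, res) to a radius; ss is None for a
    -radius specification given without a secondary structure state.
    default_radii holds any -default-radius overrides, keyed by species.
    (Hypothetical helper; the data layout is an assumption.)
    """
    default_radii = default_radii or {}
    # Rule 1: an exception that names this secondary structure state.
    if ss is not None and (ss, atom, res) in exceptions:
        return exceptions[(ss, atom, res)]
    # Rule 2: an exception given without ss.
    if (None, atom, res) in exceptions:
        return exceptions[(None, atom, res)]
    # Rule 3: a -default-radius value for this species.
    if species in default_radii:
        return default_radii[species]
    # Rule 4: the built-in table.
    return DEFAULT_RADII[species]


# The built-in "fat alanine" exception plus one ss-specific exception.
exceptions = {(None, "CB", "ALA"): 2.1, ("E", "CB", "ALA"): 2.5}
print(pick_radius("C", "CB", "ALA", "E", exceptions))  # rule 1 applies
print(pick_radius("C", "CB", "ALA", "H", exceptions))  # rule 2 applies
print(pick_radius("C", "CA", "ALA", "E", exceptions))  # rule 4 applies
```

Keying the exceptions on a tuple keeps rules 1 and 2 as two plain dictionary lookups, so the order of the if statements is the whole precedence policy.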

By default, the exception "-radius CB ALA 2.1" is built in; this is the "fat" in "fat alanine", since the standard carbon radius is 1.9Å. One may explicitly specify "-radius CB ALA 1.9" to reinstate the standard value (obtaining "nonfat alanine", one presumes). [Using "-radius E CB ALA 2.5", which might be termed "high-fat alanine", may become the new standard. By contrast, the original recipe might better be called "lowfat alanine". -- rgr, 20-May-98.]

Note that if any exception specifies an ss, then the -dssp-file argument is required. Use "-dssp-file -" to read DSSP data from the standard input. Either "raw" abbreviated DSSP or "filtered" (dssp4.pl output in the default format) is acceptable. To use dssp4.pl, it is convenient to read the DSSP data (rather than the PDB file) from the standard input:


    gamow% dssp4.pl -clean 1rcf.ent.out \
		| efa.pl -radius E CB ALA 2.5 -dssp-file - \
			 -pdb-file 1rcf.ent > 1rcf.nexp
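
The rule that any ss-bearing exception makes -dssp-file mandatory can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the function names parse_radius_spec and needs_dssp are hypothetical, and telling the two forms of -radius apart by the number of values (three versus four) is a guess at a reasonable approach, not a description of how efa.pl parses its arguments.

```python
def parse_radius_spec(tokens):
    """Split the values of one -radius argument into (ss, atom, res, radius).

    Three values are read as "atom res radius"; four as
    "ss atom res radius". (Hypothetical helper.)
    """
    if len(tokens) == 4:
        ss, atom, res, radius = tokens
    elif len(tokens) == 3:
        ss = None
        atom, res, radius = tokens
    else:
        raise ValueError("expected [ss] atom res radius, got %r" % (tokens,))
    return ss, atom, res, float(radius)


def needs_dssp(specs):
    """True if any parsed -radius specification names an ss state."""
    return any(ss is not None for ss, _atom, _res, _radius in specs)


specs = [parse_radius_spec(["CB", "ALA", "2.1"]),
         parse_radius_spec(["E", "CB", "ALA", "2.5"])]
print(needs_dssp(specs))      # the second spec names E
print(needs_dssp(specs[:1]))  # only the ss-free spec
```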

[Must add a note on the *.eng data files -- when I figure out what they mean. They are copied by the hydro script into the current directory, except for those that already exist, which allows some customization. -- rgr, 24-May-96.]

[Some programs used by efa.pl still insist that locus be exactly four characters long. For that reason, efa.pl uses the locus "tmp0" internally. This will become apparent if the script fails for any reason. -- rgr, 20-May-96.]

Default atom radii

These values are used when none of the exceptional values apply.

    Atom    Radius (Å)
    C       1.9
    N       1.7
    O       1.4
    S       1.8
    P       1.8
    M       1.7
    I       2.0

Compatibility with generate-exposure

efa.pl is backward compatible with generate-exposure; if you give it the same arguments, you get the same result:

    gamow% efa.pl 1rcf 1rcf.ent
    gamow% cmp 1rcf.nexp 1rcf.nexp.orig
    gamow% 
Notice how all of the old generate-exposure output has been silenced. If the code had encountered an error (or if the -verbose option had been given), some of it would have been echoed to the standard error stream. In the normal course of events, this output is discarded.

Instead of specifying the locus and PDB file name arguments positionally, it is preferable to give them as keyword arguments:

    gamow% efa.pl -locus 1rcf -pdb-file 1rcf.ent
Done this way, either can be omitted. If the locus is omitted, the default is to write the output to the standard output. If the PDB file name is omitted, the PDB data are read from the standard input. The following is therefore equivalent (except for the presence of the locus in any error messages):
    gamow% efa.pl > 1rcf.nexp < 1rcf.ent
It is also more readable, since it doesn't rely on hidden naming conventions.
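
When efa.pl is driven from another program, the keyword style maps naturally onto an argument list. The following Python sketch builds such a list using only the -locus, -pdb-file, and -output-file options documented above; the function name efa_argv is hypothetical, and the sketch only constructs the command without running it.

```python
def efa_argv(locus=None, pdb_file=None, output_file=None):
    """Build an efa.pl command line in the keyword style.

    Omitted arguments are simply left out, so efa.pl falls back to its
    documented defaults. (Hypothetical helper.)
    """
    argv = ["efa.pl"]
    if locus is not None:
        argv += ["-locus", locus]
    if pdb_file is not None:
        argv += ["-pdb-file", pdb_file]
    if output_file is not None:
        argv += ["-output-file", output_file]
    return argv


print(efa_argv(locus="1rcf", pdb_file="1rcf.ent"))
print(efa_argv())
```

Assuming efa.pl is on the search path, the resulting list could be handed to subprocess.run(argv, check=True), with the standard input and output redirected as needed when -pdb-file or -locus is omitted.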

Known bugs:

  1. *** If running on an extremely large core, some internal programs may blow cookies. For instance, the access program emits some obscure "INCREASE X" messages, then gets a segmentation violation when it tries to run anyway. When you increase the requested value, you then also have to increase the dimension of the array A, from which this FORTRAN program allocates its other data structures. (If you don't change the size of A, running it with just a new L, M, N, or ICT value, the program will kindly tell you what the new value for A should be before it blows up.)

  2. On the Alphas, the internal access program will sometimes die with the following obscure error message:
           forrtl: error (65): floating invalid
    
    The resulting exposure file will be incomplete (leading to further complaints). -- rgr, 18-Feb-98. [Fixed in Release 1.1. -- rgr, 8-Jun-98.]
  3. Sending the .nexp data to the standard output doesn't work; you just get the frens transcript. -- rgr, 20-May-98. [Fixed. -- rgr, 22-May-98.]


generate-exposure

The generate-exposure script takes the locus name and the full (i.e. loops intact) PDB file as its arguments, and produces a file describing the Eisenberg "fat alanine" exposure values for each residue. [need reference. -- rgr, 20-Dec-96.] For example, the command
    generate-exposure 2mhr 2mhr.ent
generates a file called 2mhr.nexp in the current directory, in the .nexp (exposure) file format, which is the format the mrf-envs program expects.

Note: The default behavior of efa.pl produces the same results, so generate-exposure is considered obsolescent.

Usage:

	generate-exposure locus pdb-file-name

Arguments:

locus (string)
this is just an arbitrary string that is used for constructing filenames. In particular, the result file is always called "locus.nexp".
pdb-file-name
name of the PDB file. Only the PDB ATOM records are actually used.

The first thing that generate-exposure does is to pass the PDB file through filter-pdb-atoms.pl in order to standardize atom variants and catch anomalies. Accordingly, you may see error messages at the top of the transcript, such as those below:

 
    generate-exposure 3rub 3rub.ent
    filter-pdb-atoms.pl: 3rub.ent:  Chain break: 10.3110037338758 A between
    THR L  63  and VAL L  69 .
    filter-pdb-atoms.pl: 3rub.ent:  Missing beta carbon for MET L 405 .
    filter-pdb-atoms.pl: 3rub.ent:  Missing backbone atom  CA  for ASN L 468 
    filter-pdb-atoms.pl: 3rub.ent:  Missing backbone atom  C   for ASN L 468 
    filter-pdb-atoms.pl: 3rub.ent:  Missing backbone atom  O   for ASN L 468 
    filter-pdb-atoms.pl: 3rub.ent:  Missing beta carbon for ASN L 468 .
    . . .
See the filter-pdb-atoms.pl documentation for more details.

[Must add a note on the *.eng data files -- when I figure out what they mean. They are copied by the hydro script into the current directory, except for those that already exist, which allows some customization. -- rgr, 24-May-96.]

[Some programs used by generate-exposure still insist that locus be exactly four characters long. For that reason, generate-exposure uses the locus "tmp0" internally. This will become apparent if the script fails for any reason. -- rgr, 20-May-96.]

Known bugs:

  1. *** If running on an extremely large core, some internal programs may blow cookies. For instance, the access program emits some obscure "INCREASE X" messages, then gets a segmentation violation when it tries to run anyway. When you increase the requested value, you then also have to increase the dimension of the array A, from which this FORTRAN program allocates its other data structures. (If you don't change the size of A, running it with just a new L, M, N, or ICT value, the program will kindly tell you what the new value for A should be before it blows up.)

  2. On the Alphas, the internal access program will sometimes die with the following obscure error message:
           forrtl: error (65): floating invalid
    
    The resulting exposure file will be incomplete (leading to further complaints). -- rgr, 18-Feb-98. [Fixed in Release 1.1. -- rgr, 8-Jun-98.]
  3. *** Sending the .nexp data to the standard output doesn't work. [Fixed for efa.pl (q.v.). -- rgr, 22-May-98.]


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Wed Dec 22 16:52:37 EST 1999