Core library construction examples


This page gives examples showing how to use needle tools to build a core library. The makefile example is used as a test suite to verify the installation of needle tools.

For further documentation and instructions on how to obtain and install this software, please refer to the Overview of needle software page.

Table of contents

  1. Core library construction examples
    1. Table of contents
    2. Sample script
    3. Automatic generation of cores via makefiles
      1. The core list file
      2. The makefile
      3. Making the dependency file
      4. Making the core and sequence files
      5. Testing the core generation results

Sample script

[finish. -- rgr, 1-Jul-98.]

Automatic generation of cores via makefiles

For purposes of illustration, we will construct core and sequence files for a trivial set of 15 cores. Although this set is far too small to yield anything resembling reasonable statistics, it serves admirably to illustrate the mechanics of core file generation. In the new core directory, we must start by creating the following three files:

  1. The core list file (cores.loci in this example).
  2. The makefile, which must be named "makefile".
  3. The dependency file (depends.make), which is generated by "make depend".

We will discuss all of these files in turn below.

The core list file

The core list file in this example is conventionally called cores.loci, but the name is arbitrary (as long as the same name is used in the makefile's "depend" entry). The first tab-delimited field defines the core locus, the second names the PDB locus, and the third field defines which chains (and optionally which residues) comprise the domain. The format is described in detail in the "Core definition file format" section.

Here are the entire contents of the cores.loci file, with the tab-delimited fields shown as aligned columns:

    1abrB1  1abr  B:1-140    d1abrb1  2.28.2.1.2  Plant cytotoxin B-chain (lectin) [Abrus precatorius; Abrin]
    1ad2    1ad2  _          d1ad2__  5.20.1.1.1  Ribosomal protein L1 [Thermus thermophilus]
    1af5    1af5  _          d1af5__  4.53.2.1.1  DNA endonuclease I-CreI [Chlamydomonas reinhardtii]
    1agrE   1agr  E          d1agre_  1.64.1.1.1  Regulator of G-protein signalling 4, RGS4 [rat (Rattus norvegicus)]
    1aihA   1aih  A          d1aiha_  4.93.1.1.1  Integrase [Bacteriophage HP1]
    1alo_1  1alo  _:81-193   d1alo_1  1.47.1.1.1  Aldehyde oxidoreductase, domain 2 [(Desulfovibrio gigas)]
    1alo_4  1alo  _:311-442  d1alo_4  4.77.1.1.1  Aldehyde oxidoreductase, molybdemum cofactor-binding domain [(Desulfovibrio gigas)]
    1aol    1aol  _          d1aol__  2.15.1.1.1  F-MuLV receptor-binding domain [Friend murine leukemia virus, F-MuLV]
    1arb    1arb  _          d1arb__  2.31.1.1.1  Achromobacter protease [Achromobacter lyticus, strain m497-1]
    1bco_1  1bco  _:216-295  d1bco_1  2.32.1.1.1  Mu transposase, C-terminal domain [Bacteriophage mu]
    1bp1_1  1bp1  _:1-217    d1bp1_1  4.43.1.1.1  Bactericidal permeability-increasing protein, BPI [human (Homo sapiens)]
    1burA1  1bur  A:137-463  d1bura1  3.1.10.1.2  Ribulose 1,5-bisphosphate carboxylase-oxygenase [spinach (Spinacia oleracea)]
    1burA2  1bur  A:1-136    d1bura2  4.33.9.1.2  Ribulose 1,5-bisphosphate carboxylase-oxygenase [spinach (Spinacia oleracea)]
    1bv1    1bv1  _          d1bv1__  4.74.3.1.1  Major birch pollen allergen Bet v 1 [white birch (Betula verrucosa)]
    1cus    1cus  _          d1cus__  3.13.7.1.1  Cutinase [fungus (Fusarium solani, subsp. pisi)]

Do not attempt to cut and paste this from your browser; the original tab-delimited fields cannot be recovered from the HTML. Instead, take it from the distributed version of the test directory described on the needle tools installation page.

If you need to enter this file manually, note that only the first three fields (columns) are necessary. It is important not to include any spaces in the first three fields, or make will be very confused by the generated file names.
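
A hand-entered core list can be checked against the two constraints just stated before running make. The following is only a sketch: check_core_list is a hypothetical helper, not part of needle tools, and it tests nothing beyond the three-field minimum and the no-spaces rule (the full format may allow things this check rejects, such as blank lines).

```shell
# check_core_list FILE
# Hypothetical helper (not part of needle tools): report lines that have
# fewer than three tab-delimited fields, or a space in any of the first
# three fields.  Exits nonzero if any such line is found.
check_core_list() {
    awk -F '\t' '
        NF < 3 {
            printf "line %d: fewer than three tab-delimited fields\n", NR
            bad = 1
            next
        }
        $1 ~ / / || $2 ~ / / || $3 ~ / / {
            printf "line %d: space in one of the first three fields\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, "check_core_list cores.loci && echo OK" prints OK only when no problem lines were reported.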

The makefile

As mentioned above, the makefile must be named "makefile", and should look something like this (minus the numbered and hypertexted footnotes in the left margin):

    # makefile for .core and .seg file testing.
    #
    #    Modification history:
    #
    # created test core makefile.  -- rgr, 16-Jun-98.
    # generate DSSP files as well.  -- rgr, 1-Jul-98.
    # use make-domain-core.pl and pdb-domain-seq.pl.  -- rgr, 16-Sep-98.
    #

[1] GENERATE-DSSP = generate-dssp
[2] DSSP4 = dssp4.pl -dont-fill-e-gaps -turns-are-loops
    MAKE-SS-DESIGNATIONS = make-ss-designations
[3] MAKE-CORE = make-domain-core.pl
[4] MAKE-SEQ-FILE = pdb-domain-seq.pl

[5] core-file-name = cores.loci

[6] all: core-files sequence-files

[7] clean:
	    rm -f ${core-files} ${segment-files}
	    rm -f ${sequence-files} ${abbrev-dssp-files}

    # Search path for PDB files.
[8] search-path = ../pdb/pdb
    depend:		${core-file-name}
	    make-core-depends.pl -search-path ${search-path} \
		    -core-list-file ${core-file-name} > depends.make

    # Real rules -- generated by the above script.
    include depends.make

(If you cut & paste this out of your browser, be sure to (a) remove any leading whitespace and footnote numbers; (b) ensure that the nonblank lines immediately after the "depend" and "clean" targets start with ASCII tab characters rather than spaces; and (c) ensure that there are no spaces after the backslash characters at the ends of the make-core-depends.pl line.)
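
Whitespace problems of kinds (a) through (c) are invisible in most editors, so a mechanical check can save some head-scratching. The sketch below assumes a hypothetical helper named check_makefile (not part of needle tools); it flags any line that begins with a space and any line with whitespace after a trailing backslash.

```shell
# check_makefile FILE
# Hypothetical helper (not part of needle tools): flag lines that start
# with a space (leftover indentation, or a command line indented with
# spaces instead of a tab) and lines where whitespace follows the final
# backslash.  Exits nonzero if anything is flagged.
check_makefile() {
    awk '
        /^ / {
            printf "line %d: starts with a space, not a tab\n", NR
            bad = 1
        }
        /\\[ \t]+$/ {
            printf "line %d: whitespace after a trailing backslash\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Running "check_makefile makefile" prints nothing and succeeds when neither problem is present.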

Notes:

  1. At the top of the makefile, a series of required macros is defined.
  2. Notice how the makefile uses the DSSP4 macro to specify both a command name ("dssp4.pl") and a set of common options ("-dont-fill-e-gaps" and "-turns-are-loops").
  3. This line defines the command that makes each of the core files. We must use make-domain-core.pl rather than make-core.pl because our core list contains cases where only a part of the chain is used.
  4. This line defines the command that makes each of the sequence files. Once again, we must use the more general pdb-domain-seq.pl rather than pdb-to-seq.pl because our core list contains cases where only a part of the chain is used.
  5. Although the ${core-file-name} macro is only used once here, it is often convenient to define and use it this way, especially in MRF score makefiles, where the core list file might live in another directory and be referenced more than once.
  6. The first target in a makefile is traditionally named "all"; being first, it is what gets made by default.
  7. While it is never necessary to have a "clean" target, it can come in handy when rebuilding everything. ${core-files} and ${segment-files} are predefined file name macros that are constructed by make-core-depends.pl and defined in the dependency file. Using these macros in the "clean" target ensures that the only things deleted are those that can be rebuilt.
  8. You may need to customize this search path for your site in order to use this makefile. At the very least, you will need to obtain the 13 PDB files in order to run the example.

Making the dependency file

To create the depends.make file, simply do "touch depends.make" (since the file must exist for make to read the makefile properly), then run the makefile's make-core-depends.pl command by doing "make depend". The transcript will look something like this:

    % touch depends.make
    % make depend
    make-core-depends.pl -search-path ../pdb/pdb \
    -core-list-file cores.loci > depends.make
    Generated targets for 13 abbrev-dssp files, 13 seg files, 15 core files, 15 seq files.
    % 

Note how the command is echoed with the "${search-path}" macro invocation filled out. Note also how make-core-depends.pl prints counts of the number of targets generated for each file type. (Although we have 15 cores and therefore 15 corresponding sequence files, there are only 13 unique PDB loci, and hence 13 abbreviated DSSP format and segment format files.)
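
The two counts can be cross-checked directly against the core list file. This is a rough sketch under two assumptions: that the second tab-delimited field holds the bare PDB locus (for example 1abr or 1bur), and that count_loci is a hypothetical helper name, not a needle tools command.

```shell
# count_loci FILE
# Hypothetical helper (not part of needle tools): count distinct core
# loci (field 1) and distinct PDB loci (field 2) in a core list file.
# cut splits on tabs by default.
count_loci() {
    cores=$(cut -f1 "$1" | sort -u | wc -l | tr -d ' ')
    pdbs=$(cut -f2 "$1" | sort -u | wc -l | tr -d ' ')
    echo "$cores cores, $pdbs unique PDB loci"
}
```

If those assumptions hold, "count_loci cores.loci" should agree with the 15 and 13 reported in the transcript, since 1alo and 1bur each contribute two cores.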

As shown in the initial fragment below, make-core-depends.pl also puts the arguments it sees, along with the date and time, at the head of the file it creates.


    # This file was generated automatically by make-core-depends.pl; do not edit!
    # Created on Thu Sep 17 17:50:09 EDT 1998 with the following options:
    #
    #	make-core-depends.pl -search-path ../pdb/pdb -core-list-file cores.loci
    #
    # perl libraries used:
    #
    #	/home2/staff/thread/bin/scripts/rule-based-make.pm
    #	/home2/staff/thread/bin/scripts/ppml.pm
    #
    # 15 loci were found in cores.loci.
    #

    . . .

After doing make depend, the directory will look something like this:


    % ls -l
    total 13
    -rw-rw-r--   1 thread   thread      1458 Sep 16 19:20 cores.loci
    -rw-rw-r--   1 thread   thread      8332 Sep 16 19:29 depends.make
    -rw-rw-r--   1 thread   thread       975 Sep 16 19:31 makefile
    % 

At this point, the directory is initialized. The make depend step will need to be redone only if the make-core-depends.pl arguments, the location of the PDB files, or the contents of the core list file change.

Making the core and sequence files

Once the dependency file is set up, the final step is to do "make", which will create the actual core and sequence files. This takes 33 minutes or so on my ancient Sparcstation, all but two of which are spent on exposure computation, so it will probably be at least as fast on whatever you're running. We do not bother to show the transcript, which is a copy of the command lines in the depends.make file with warning messages interspersed. Afterwards, the directory will look something like this:

    % ls -l
    total 960
    -rw-rw-r--   1 thread   thread       694 Sep 17 17:49 1abr.dssp
    -rw-rw-r--   1 thread   thread     27548 Sep 17 17:49 1abr.ent.out
    -rw-rw-r--   1 thread   thread     13392 Sep 17 17:49 1abrB1.core
    -rw-rw-r--   1 thread   thread       161 Sep 17 18:20 1abrB1.seq
    -rw-rw-r--   1 thread   thread     44184 Sep 17 17:50 1ad2.core
    -rw-rw-r--   1 thread   thread       317 Sep 17 17:50 1ad2.dssp
    -rw-rw-r--   1 thread   thread     11913 Sep 17 17:50 1ad2.ent.out
    -rw-rw-r--   1 thread   thread       247 Sep 17 18:20 1ad2.seq
    -rw-rw-r--   1 thread   thread     27560 Sep 17 17:50 1af5.core
    -rw-rw-r--   1 thread   thread       168 Sep 17 17:50 1af5.dssp
    -rw-rw-r--   1 thread   thread      6772 Sep 17 17:50 1af5.ent.out
    -rw-rw-r--   1 thread   thread       143 Sep 17 18:20 1af5.seq
    -rw-rw-r--   1 thread   thread      1059 Sep 17 17:54 1agr.dssp
    -rw-rw-r--   1 thread   thread     49808 Sep 17 17:54 1agr.ent.out
    -rw-rw-r--   1 thread   thread     33222 Sep 17 17:54 1agrE.core
    -rw-rw-r--   1 thread   thread       225 Sep 17 18:20 1agrE.seq
    -rw-rw-r--   1 thread   thread       876 Sep 17 17:57 1aih.dssp
    -rw-rw-r--   1 thread   thread     36028 Sep 17 17:57 1aih.ent.out
    -rw-rw-r--   1 thread   thread     34449 Sep 17 17:57 1aihA.core
    -rw-rw-r--   1 thread   thread       190 Sep 17 18:20 1aihA.seq
    -rw-rw-r--   1 thread   thread      1084 Sep 17 18:00 1alo.dssp
    -rw-rw-r--   1 thread   thread     48112 Sep 17 18:00 1alo.ent.out
    -rw-rw-r--   1 thread   thread     21475 Sep 17 18:00 1alo_1.core
    -rw-rw-r--   1 thread   thread       134 Sep 17 18:20 1alo_1.seq
    -rw-rw-r--   1 thread   thread     27968 Sep 17 18:00 1alo_4.core
    -rw-rw-r--   1 thread   thread       153 Sep 17 18:20 1alo_4.seq
    -rw-rw-r--   1 thread   thread     34862 Sep 17 18:01 1aol.core
    -rw-rw-r--   1 thread   thread       254 Sep 17 18:01 1aol.dssp
    -rw-rw-r--   1 thread   thread     12072 Sep 17 18:01 1aol.ent.out
    -rw-rw-r--   1 thread   thread       247 Sep 17 18:20 1aol.seq
    -rw-rw-r--   1 thread   thread     34066 Sep 17 18:02 1arb.core
    -rw-rw-r--   1 thread   thread       341 Sep 17 18:02 1arb.dssp
    -rw-rw-r--   1 thread   thread     13980 Sep 17 18:02 1arb.ent.out
    -rw-rw-r--   1 thread   thread       287 Sep 17 18:20 1arb.seq
    -rw-rw-r--   1 thread   thread       318 Sep 17 18:03 1bco.dssp
    -rw-rw-r--   1 thread   thread     15729 Sep 17 18:03 1bco.ent.out
    -rw-rw-r--   1 thread   thread     14597 Sep 17 18:03 1bco_1.core
    -rw-rw-r--   1 thread   thread       115 Sep 17 18:20 1bco_1.seq
    -rw-rw-r--   1 thread   thread       614 Sep 17 18:04 1bp1.dssp
    -rw-rw-r--   1 thread   thread     24209 Sep 17 18:04 1bp1.ent.out
    -rw-rw-r--   1 thread   thread     65239 Sep 17 18:04 1bp1_1.core
    -rw-rw-r--   1 thread   thread       240 Sep 17 18:20 1bp1_1.seq
    -rw-rw-r--   1 thread   thread      2654 Sep 17 18:19 1bur.dssp
    -rw-rw-r--   1 thread   thread    124856 Sep 17 18:18 1bur.ent.out
    -rw-rw-r--   1 thread   thread     71322 Sep 17 18:19 1burA1.core
    -rw-rw-r--   1 thread   thread       351 Sep 17 18:20 1burA1.seq
    -rw-rw-r--   1 thread   thread     21076 Sep 17 18:19 1burA2.core
    -rw-rw-r--   1 thread   thread       169 Sep 17 18:20 1burA2.seq
    -rw-rw-r--   1 thread   thread     41742 Sep 17 18:19 1bv1.core
    -rw-rw-r--   1 thread   thread       213 Sep 17 18:19 1bv1.dssp
    -rw-rw-r--   1 thread   thread      8468 Sep 17 18:19 1bv1.ent.out
    -rw-rw-r--   1 thread   thread       177 Sep 17 18:20 1bv1.seq
    -rw-rw-r--   1 thread   thread     36473 Sep 17 18:20 1cus.core
    -rw-rw-r--   1 thread   thread       189 Sep 17 18:20 1cus.dssp
    -rw-rw-r--   1 thread   thread     10482 Sep 17 18:20 1cus.ent.out
    -rw-rw-r--   1 thread   thread       218 Sep 17 18:20 1cus.seq
    -rw-rw-r--   1 thread   thread      1458 Sep 17 17:45 cores.loci
    -rw-rw-r--   1 thread   thread      7578 Sep 17 17:47 depends.make
    -rw-rw-r--   1 thread   thread       846 Sep 17 17:45 makefile

    % 
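
A quick sanity check on a listing of this size is to count the generated files by suffix. The helper names below (count_by_suffix and summarize_outputs) are hypothetical and not part of needle tools; the suffixes are simply the ones that appear in the listing above.

```shell
# count_by_suffix DIR SUFFIX
# Hypothetical helper: print the number of regular files in DIR whose
# names end in SUFFIX.
count_by_suffix() {
    n=0
    for f in "$1"/*"$2"; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# summarize_outputs DIR
# Print one count per suffix seen in the example listing.
summarize_outputs() {
    for suffix in .core .seq .dssp .ent.out; do
        echo "$suffix: $(count_by_suffix "$1" "$suffix")"
    done
}
```

For the directory listed above, "summarize_outputs ." would be expected to show 15 .core and 15 .seq files, and 13 each of .dssp and .ent.out, in line with the target counts printed by make-core-depends.pl.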

Testing the core generation results

After following this example, you should have duplicated the needle-tools-1.2/test-cores/ directory that came with the needle tools distribution. If this directory is still available online, you can use diff to check your results.

    % diff . /usr/local/needle-tools-1.2/test-cores
    diff ./depends.make /usr/local/needle-tools-1.2/test-cores/depends.make
    2c2
    < # Created on Thu Jul  9 11:52:31 EDT 1998 with the following options:
    ---
    > # Created on Thu Jul  2 15:18:47 EDT 1998 with the following options:
    9c9
    < #	/home2/staff/thread/code/dist/test/bin/make-lib.pm
    ---
    > #	/home2/staff/thread/code/dist/test3/bin/make-lib.pm
    %

This shows that the depends.make files have different timestamps and use perl libraries loaded from different places, as expected. Any necessary makefile modifications will also show up, of course.
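
Since the timestamp line will always differ, it can be filtered out so that only meaningful differences remain. The sketch below assumes a diff that supports the -I (ignore lines matching a regular expression) option, as GNU and BSD diff do; compare_cores is a hypothetical wrapper name, not part of needle tools.

```shell
# compare_cores MINE REFERENCE
# Hypothetical wrapper: recursively compare two directories, suppressing
# any hunk whose changed lines all match the "Created on" timestamp
# comment written at the head of depends.make.
compare_cores() {
    diff -r -I '^# Created on ' "$1" "$2"
}
```

With this filter, the example above would still report the differing perl library path, but not the timestamp.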

If there are discrepancies, they are most likely due to differences in PDB versions or to roundoff errors. It may help to compare the messages in the transcript file in the needle-tools-1.5/test-cores/ directory to whatever messages were produced on your system.

If neither of these possibilities explains the differences between the two directories, then they should be reported to Bob Rogers <rogers@darwin.bu.edu> as a bug. Be sure to give full configuration details (hardware type, operating system name and version, perl version, etc.), since that is the most likely source of the problem.


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Wed Dec 15 17:53:27 EST 1999