Alignment manipulation programs


See the Threading alignment experiments page for more information on how needle uses alignments.

Table of contents

  1. Alignment manipulation programs
    1. Table of contents
    2. fssp-core-corr.pl
    3. correct-alignment.pl
    4. align.pl
    5. pima-multi-align.pl
    6. pima2fssp.pl
    7. make-align-depends.pl


fssp-core-corr.pl

The fssp-core-corr.pl program produces a map of "FSSP core correspondences" (hence the name) between two cores, given those cores plus an FSSP-format alignment file. The resulting file is said to be in "ctimap" format because it maps between two sets of "core total indices".

Usage:

    fssp-core-corr.pl [-verbose] [-print-align]
                [-use-fssp-structural-equivalence] [-fssp fssp-file-name]
                -core1 core-file-name-1 -core2 core-file-name-2
                [-locus1 locus-name-1] [-locus2 locus-name-2]

Arguments:

-use-fssp-structural-equivalence
if specified, uses FSSP definitions of structural equivalence, indicated by case in the alignment. If this option is given, only columns in the alignment where both residues are in uppercase are considered; otherwise, case is ignored. The default is not to require structural equivalence (which is probably a mistake).
-core1 core-file-name-1
name of the first "core" format file, e.g. "2mhr.core", required for input.
-locus1 locus-name-1
specifies a locus name for the first core. If omitted, the file name minus the extension and directory (e.g. "2mhr" for "../2mhr.core") is used (but the name must contain at least one dot).
-core2 core-file-name-2
name of the second "core" format file, e.g. "2mhr.core", required for input.
-locus2 locus-name-2
specifies a locus name for the second core. If omitted, the file name minus the extension and directory (e.g. "2mhr" for "../2mhr.core") is used (but the name must contain at least one dot).
-fssp fssp-file-name
specifies the FSSP-format alignment file to use; if not specified, this is read from the standard input.
-print-align
if specified, this boolean option causes the original FSSP alignment to be printed in multiple alignment format, along with the secondary structures as defined by the core files. After this alignment is printed, fssp-core-corr.pl exits immediately, and does not produce ctimap output.
-verbose
if specified, causes the -print-align alignment to be printed, and uses a three-column output format which includes mappings where a given aligned position belongs to either core, instead of requiring both. This is useful only for debugging.
[probably need more verbiage here. -- rgr, 15-Sep-99.]
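
Two of the rules above, the default locus name and the -use-fssp-structural-equivalence column filter, can be sketched in a few lines of Python. This is only an illustration, not the fssp-core-corr.pl code: the function names are hypothetical, the gap character is assumed to be "-", and only the last extension is assumed to be stripped from the file name.

```python
import os


def default_locus(core_path):
    """Derive a locus name from a core file path, e.g. '../2mhr.core' -> '2mhr'."""
    base = os.path.basename(core_path)  # drop the directory part
    if "." not in base:
        # The documentation says the name must contain at least one dot.
        raise ValueError("core file name must contain a dot: %r" % core_path)
    # Assumption: only the last extension is removed.
    return base.rsplit(".", 1)[0]


def usable_columns(row1, row2, require_equivalence=False, gap="-"):
    """Return the indices of the alignment columns that would be considered."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    keep = []
    for i, (a, b) in enumerate(zip(row1, row2)):
        if a == gap or b == gap:
            continue  # a gap on either side is not a residue pair
        if require_equivalence and not (a.isupper() and b.isupper()):
            continue  # lowercase marks structurally nonequivalent residues
        keep.append(i)
    return keep
```

With the rows "ACd-E" and "AcDFE", the default keeps every ungapped column, while require_equivalence=True keeps only the columns where both residues are uppercase.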


correct-alignment.pl

correct-alignment.pl takes an alignment between two sequences (in FSSP format) and additional sequence files for one or both of them, and extends the alignment with residues in the "correct" sequences. Since correct-alignment.pl must never give a false appearance of alignment when both sequences happen to have residues inserted in the same place, it treats the sequences independently, i.e. by doing:
    insert1----------
    -------insertion2
rather than trying to deal with "insertion-in-both" as a special case. It does this even when inserting the identical residues in both aligned sequences.

Usage:

    correct-alignment.pl [-verbose] [-preserve-case] [-fssp fssp-file-name]
                [-seq1 seq-file-name-1] [-seq2 seq-file-name-2]

Arguments:

-fssp fssp-file-name
specifies the FSSP-format alignment file to read; if not specified, this is read from the standard input.
-seq1 seq-file-name-1
-seq2 seq-file-name-2
specifies the name of a sequence file to use to correct one of the sequences in the alignment; either or both may be specified. In fact, the sequence name stored in the file is what is used to decide which aligned sequence to correct, so it doesn't matter which is sequence 1 and which is sequence 2. (If neither sequence file is specified, the original alignment is regurgitated.)
-preserve-case
if specified, case is preserved for residues inserted from the additional sequence(s). Otherwise, the default is to apply the FSSP convention of lowercase to indicate "structurally nonequivalent" residues.
-verbose
if specified, causes extra debugging information to be printed.
[***finish***: an example would be good here. -- rgr, 29-Oct-99.]
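
The independent treatment of insertions described above can be sketched as follows. This is a hypothetical Python illustration of the layout rule only, not the correct-alignment.pl implementation; the function name and the "-" gap character are assumptions.

```python
def place_insertions(insert1, insert2, preserve_case=False, gap="-"):
    """Lay out two insertions so that they never share an alignment column."""
    if not preserve_case:
        # Default: lowercase marks the inserted residues as
        # "structurally nonequivalent" (the FSSP convention).
        insert1, insert2 = insert1.lower(), insert2.lower()
    row1 = insert1 + gap * len(insert2)  # insertion 1, then gaps
    row2 = gap * len(insert1) + insert2  # gaps, then insertion 2
    return row1, row2
```

Each insertion is padded with gaps opposite the other one, so identical residues inserted at the same place in both sequences still end up in separate columns.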

Known bugs:

  1. Sequence names (labels) in sequence files must match the sequence names in the FSSP file, as the program is currently written. In a sense, this is a feature, because you don't have to know the order in which the sequences appear in the alignment. But it's a bug if you do know the order and need to use it to override mismatched sequence names.


align.pl

The align.pl program takes two IG format sequence files and produces a global alignment on the standard output in FSSP alignment format. This is just a simple user interface to the globalS program (which only understands FA sequence file format, and produces a different alignment format).

Usage:

    align.pl seq1 [ seq2 ]

Arguments:

seq1
name of the first IG format file, e.g. "2mhr.seq", required for input.
seq2
name of the second IG format file; this comes from the standard input if omitted.

Known problems:

  1. At present, align.pl converts the sequence to uppercase, which makes it unsuitable for patterns. -- rgr, 15-Sep-99. [I think this only applies if you use the unpatched version of the globalS program. -- rgr, 29-Oct-99.]


pima-multi-align.pl

Given a .tbl file of sequences and a PIMA .pScore pattern file, pima-multi-align.pl produces a consolidated multiple alignment of all of the sequences aligned to the pattern. It does this by running pima_profile on these two files with the -showAlign argument, and using the internally generated alignment file.

Alternatively, you can specify an alignment file with the -pima argument, in which case a -pscore argument is not necessary.
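
The interplay between -tbl, -pima, and -pscore (see the argument descriptions for this program) can be summarized with a small check. This is a hypothetical Python sketch, not code from pima-multi-align.pl; the function name is made up, and it ignores .tbl file names given as plain command-line arguments.

```python
def check_arguments(tbl=None, pima=None, pscore=None):
    """Return a list of problems with a hypothetical set of option values."""
    problems = []
    if tbl is None:
        problems.append("-tbl is required")
    elif tbl == "-":
        problems.append("-tbl cannot be '-' (standard input)")
    if pima is None:
        # Without a ready-made alignment file, pima_profile has to be run,
        # and that needs a .pScore pattern file.
        if pscore is None:
            problems.append("-pscore is required unless -pima is given")
        elif pscore == "-":
            problems.append("-pscore cannot be '-' (standard input)")
    return problems
```

An empty list means the combination is acceptable under the documented rules; for example, giving -tbl and -pima without -pscore passes.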

Arguments:

-verbose
If specified, produces debugging output. May be specified multiple times to increase the amount of output.
-help
Prints a brief help message, and exits.
-html-help
Prints a brief help message in HTML, and exits.
-tbl sequence-file
Name of a .tbl format file of sequences. Required, and cannot be '-' (standard input), since pima-multi-align.pl must both read the file itself and pass it to pima_profile.
-pima pima-alignment-file
Name of a PIMA .align file, in lieu of running pima_profile on sequence-file. If pima-alignment-file is specified, -pscore is not needed.
-pscore pima-pscore-file
.pScore file name to pass to pima_profile. Cannot be '-' (standard input) due to pima_profile limitations.
-markup markup-file-name
Reads HTML markup replacement instructions from the named file. This file should contain the initial text and the replacement, separated by tabs, one pair per line.
pima-multi-align.pl also takes .tbl format sequence file names on the command line.
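
The -markup file format described above (initial text and replacement, separated by tabs, one pair per line) could be read along these lines. This is a sketch under assumptions, not the program's own parser: the function names are hypothetical, blank lines are assumed to be skipped, and how pima-multi-align.pl actually applies the pairs to its output is not documented here.

```python
def parse_markup(lines):
    """Parse 'initial<TAB>replacement' lines into a list of pairs."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines are skipped
        initial, sep, replacement = line.partition("\t")
        if not sep:
            raise ValueError("no tab separator in line: %r" % line)
        # Assumption: a run of several tabs counts as one separator.
        pairs.append((initial, replacement.lstrip("\t")))
    return pairs


def apply_markup(text, pairs):
    """Apply each replacement in file order (an assumed ordering)."""
    for initial, replacement in pairs:
        text = text.replace(initial, replacement)
    return text
```

Because the pairs are applied one after another, a later pair can match text produced by an earlier one; a real implementation may well handle this differently.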


pima2fssp.pl

The pima2fssp.pl program takes a PIMA alignment file (undocumented) and converts it to FSSP alignment format on the standard output.

Usage:

    pima2fssp.pl [-locus1 new-locus1] [-locus2 new-locus2]
                [ pima-alignment-file ]

Arguments:

-locus1 new-locus1
optionally specifies a new locus name to override whatever pima2fssp.pl found for the first sequence in the alignment file.
-locus2 new-locus2
optionally specifies a new locus name to override whatever pima2fssp.pl found for the second sequence in the alignment file.
pima-alignment-file
name of an input PIMA alignment file; if not specified, standard input is read.


make-align-depends.pl

[***here***: finish this. -- rgr, 15-Sep-99.]


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Tue Jan 18 21:10:05 EST 2000