Alignment manipulation programs
See the Threading
alignment experiments page for more information on how
needle uses alignments.
Table of contents
- fssp-core-corr.pl
- correct-alignment.pl
- align.pl
- pima-multi-align.pl
- pima2fssp.pl
- make-align-depends.pl
fssp-core-corr.pl
The fssp-core-corr.pl program produces a map of "FSSP core
correspondences" (hence the name) between two cores, given those cores
plus an FSSP-format
alignment file. The resulting file is said to be in "ctimap" format because
it maps between two sets of "core total
indices".
Usage:
fssp-core-corr.pl [-verbose] [-print-align]
[-use-fssp-structural-equivalence] [-fssp fssp-file-name]
-core1 core-file-name-1 -core2 core-file-name-2
[-locus1 locus-name-1] [-locus2 locus-name-2]
Arguments:
- -use-fssp-structural-equivalence
- if specified, uses the FSSP definition of structural equivalence,
which is indicated by case in the alignment: only columns where
both residues are uppercase are considered. If this option is
omitted, case is ignored. The default is not to require structural
equivalence (which is probably a mistake).
- -core1 core-file-name-1
- name of the first "core" format file,
e.g. "2mhr.core", required for input.
- -locus1 locus-name-1
- specifies a locus name for the first core. If omitted, the file
name minus the directory and extension (e.g. "2mhr" for
"../2mhr.core") is used; for this default to work, the file name
must contain at least one dot.
- -core2 core-file-name-2
- name of the second "core" format file,
e.g. "2mhr.core", required for input.
- -locus2 locus-name-2
- specifies a locus name for the second core. If omitted, the file
name minus the directory and extension (e.g. "2mhr" for
"../2mhr.core") is used; for this default to work, the file name
must contain at least one dot.
- -fssp fssp-file-name
- specifies the FSSP-format
alignment file to use; if not specified, this is read from
the standard input.
- -print-align
- if specified, this boolean option causes the original FSSP
alignment to be printed in multiple alignment format, along with
the secondary structures as defined by the core files. After
this alignment is printed, fssp-core-corr.pl exits
immediately, and does not produce ctimap output.
- -verbose
- if specified, causes the -print-align alignment to be
printed, and uses a three-column output format which includes
mappings where a given aligned position belongs to either
core, instead of requiring both. This is useful only for
debugging.
[probably need more verbiage here. -- rgr, 15-Sep-99.]
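The default locus name and the structural-equivalence rule described above can be sketched as follows. This is only a Python illustration, not the code of fssp-core-corr.pl. The function names default_locus and equivalent_columns are hypothetical. It assumes the two aligned sequences are equal-length strings, that "-" marks a gap, and that only the last extension is removed from the file name.

```python
import os

def default_locus(core_file_name):
    """Derive a locus name from a core file name (directory and extension dropped).

    Assumption: only the last extension is removed, and the dot must appear
    in the file name itself rather than in the directory part.
    """
    base = os.path.basename(core_file_name)
    if "." not in base:
        raise ValueError("file name must contain at least one dot: " + core_file_name)
    return base.rsplit(".", 1)[0]

def equivalent_columns(row1, row2, require_structural_equivalence=False, gap="-"):
    """Return the indices of alignment columns that pair two residues.

    With require_structural_equivalence set, a column counts only if both
    residues are uppercase; otherwise case is ignored.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    columns = []
    for i, (a, b) in enumerate(zip(row1, row2)):
        if a == gap or b == gap:
            continue  # a gap on either side never forms a correspondence
        if require_structural_equivalence and not (a.isupper() and b.isupper()):
            continue  # lowercase marks a structurally nonequivalent residue
        columns.append(i)
    return columns
```

With the option set, a column such as "C" over "c" is dropped; without it, the same column is kept.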
correct-alignment.pl
correct-alignment.pl takes an alignment between two sequences
(in FSSP
format) and additional sequence files for one or both of them, and
extends the alignment with residues in the "correct" sequences. Since
correct-alignment.pl must never give a false appearance of
alignment when both sequences happen to have residues inserted in the
same place, it treats the sequences independently, i.e. by doing:
insert1----------
-------insertion2
rather than trying to deal with "insertion-in-both" as a special case.
It does this even when inserting the identical residues in both aligned
sequences.
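The independent treatment of insertions shown above can be sketched in a few lines of Python. This is an illustration of the layout only, not the code of correct-alignment.pl. The function name place_independently is hypothetical, and "-" is assumed to be the gap character, as in the diagram.

```python
def place_independently(insert1, insert2, gap="-"):
    """Lay out two insertions at the same position so that they never share a column.

    The first insertion is followed by gaps, and the second is preceded by
    gaps, so no residue of one is ever shown aligned to a residue of the other.
    """
    padded1 = insert1 + gap * len(insert2)
    padded2 = gap * len(insert1) + insert2
    return padded1, padded2

row1, row2 = place_independently("insert1", "insertion2")
print(row1)
print(row2)
```

The same layout applies when both insertions contain identical residues: they still occupy separate columns.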
Usage:
correct-alignment.pl [-verbose] [-preserve-case] [-fssp fssp-file-name]
[-seq1 seq-file-name-1] [-seq2 seq-file-name-2]
Arguments:
- -fssp fssp-file-name
- specifies the FSSP-format
alignment file to read; if not specified, this is read from
the standard input.
- -seq1 seq-file-name-1
- -seq2 seq-file-name-2
- specifies the name of a sequence file to use to correct one of
the sequences in the alignment; either or both may be specified.
In fact, the sequence name stored in the file is what is used to
decide which aligned sequence to correct, so it doesn't matter
which is sequence 1 and which is sequence 2. (If neither
sequence file is specified, the original alignment is
regurgitated.)
- -preserve-case
- if specified, case is preserved for residues inserted from the
additional sequence(s). Otherwise, the default is to apply the
FSSP convention of lowercase to indicate "structurally
nonequivalent" residues.
- -verbose
- if specified, causes extra debugging information to be printed.
[***finish***: an example would be good here. -- rgr, 29-Oct-99.]
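As a rough illustration of two of the rules above (case handling for inserted residues, and choosing which aligned sequence to correct by name), here is a minimal Python sketch. It is not the code of correct-alignment.pl. The function names and the sequence names "seqA" and "seqB" are hypothetical, and exact, case-sensitive name comparison is an assumption.

```python
def residues_to_insert(residues, preserve_case=False):
    """Return the residues as they would be inserted into the alignment.

    By default they are lowercased, following the FSSP convention that
    lowercase marks structurally nonequivalent residues.
    """
    return residues if preserve_case else residues.lower()

def pick_aligned_sequence(file_seq_name, aligned_names):
    """Return the index of the aligned sequence named like the sequence file entry.

    Assumption: names are compared exactly; None means no aligned sequence matched.
    """
    for i, name in enumerate(aligned_names):
        if name == file_seq_name:
            return i
    return None

# The order of -seq1 and -seq2 does not matter: only the stored name is used.
print(pick_aligned_sequence("seqB", ["seqA", "seqB"]))
print(residues_to_insert("MKV"))
```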
Known bugs:
- *** As currently written, sequence names (labels) in sequence
files must match the sequence names in the FSSP file. In a sense,
this is a feature, because you don't have to know in what order
the sequences appear in the alignment. But it's a bug if you do
know the order and need to use it to override mismatched sequence
names.
align.pl
The align.pl program takes two IG format sequence files and
produces a global alignment on the standard output in FSSP alignment
format. This is just a simple user interface to the globalS program (which only
understands FA sequence file
format, and produces a different alignment format).
Usage:
align.pl seq1 [ seq2 ]
Arguments:
- seq1
- name of the first IG
format file, e.g. "2mhr.seq", required for input.
- seq2
- name of the second IG
format file; this comes from the standard input if omitted.
Known problems:
- At present, align.pl converts the sequence to uppercase,
which makes it unsuitable for patterns. -- rgr, 15-Sep-99. [I
think this only applies if you use the unpatched version of the
globalS program. --
rgr, 29-Oct-99.]
pima-multi-align.pl
Given a .tbl file of sequences and a PIMA .pScore pattern file,
pima-multi-align.pl produces a consolidated multiple alignment of all of the
sequences aligned to the pattern. It does this by running pima_profile
on these two files with the -showAlign argument, and using the
internally generated alignment file.
Alternatively, you can specify an alignment file with the -pima
argument, in which case a -pscore argument is not necessary.
Arguments:
- -verbose
- If specified, produces debugging output. May be
specified multiple times to increase the amount of output.
- -help
- Prints a brief help message, and exits.
- -html-help
- Prints a brief help message in HTML, and exits.
- -tbl sequence-file
- Name of a .tbl format file of sequences.
Required, and cannot be '-' (standard input), since pima-multi-align.pl must both
read the file and pass it to pima_profile as well.
- -pima pima-alignment-file
- Name of a PIMA .align file, in lieu of running
pima_profile on sequence-file. If pima-alignment-file is specified,
-pscore is not needed.
- -pscore pima-pscore-file
- .pScore file name to pass to pima_profile.
Cannot be '-' (standard input) due to pima_profile limitations.
- -markup markup-file-name
- Reads HTML markup replacement instructions from the named file.
This file should contain the initial text and the replacement, separated by
tabs, one pair per line.
pima-multi-align.pl also takes .tbl format sequence file names on the command line.
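The markup file layout described for -markup (initial text and replacement, separated by tabs, one pair per line) could be read along these lines. This Python sketch is not the code of pima-multi-align.pl. The function names parse_markup and apply_markup are hypothetical. Skipping blank lines, accepting runs of several tabs, and applying the replacements one after another in file order are all assumptions.

```python
def parse_markup(lines):
    """Parse markup instructions into (initial, replacement) pairs."""
    pairs = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines are ignored
        # Assumption: one or more tabs separate the two fields.
        fields = [field for field in line.split("\t") if field]
        if len(fields) != 2:
            raise ValueError("line %d: expected two tab-separated fields" % line_number)
        pairs.append((fields[0], fields[1]))
    return pairs

def apply_markup(text, pairs):
    """Apply each replacement in order (assumption: plain substring replacement)."""
    for initial, replacement in pairs:
        text = text.replace(initial, replacement)
    return text

pairs = parse_markup(["A\t<b>A</b>\n", "\n"])
print(apply_markup("CAT", pairs))
```

Because the replacements run in sequence, a later pair can match text produced by an earlier one; a real implementation may handle this differently.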
pima2fssp.pl
The pima2fssp.pl program takes a PIMA alignment file
(undocumented) and converts it to FSSP alignment
format on the standard output.
Usage:
pima2fssp.pl [-locus1 new-locus1] [-locus2 new-locus2]
[ pima-alignment-file ]
Arguments:
- -locus1 new-locus1
- optionally specifies a new locus name to override whatever
pima2fssp.pl found for the first sequence in the alignment file.
- -locus2 new-locus2
- optionally specifies a new locus name to override whatever
pima2fssp.pl found for the second sequence in the alignment file.
- pima-alignment-file
- name of an input PIMA alignment file; if not specified, standard
input is read.
make-align-depends.pl
[***here***: finish this. -- rgr, 15-Sep-99.]
Bob Rogers
<rogers@darwin.bu.edu>
Last modified: Tue Jan 18 21:10:05 EST 2000