Alignment file formats


Table of contents

  1. Alignment file formats
    1. FSSP alignment format
    2. ctimap format
    3. Homolog file format


FSSP alignment format

[finish. -- rgr, 26-Sep-99.]


ctimap format

[finish. -- rgr, 26-Sep-99.]


Homolog file format

[This is essentially obsolete. -- rgr, 9-Nov-99.]

The homolog (.hlg) file contains homolog sequences aligned to a single core sequence, so that the homologs can be counted in that core's environments. The aligned sequences are kept in a structured sequence table format file: the first "sequence" encodes the core secondary structure, the second is the sequence of the core itself, and the third and subsequent sequences are the homologs. The alphabet for the secondary structure is "ehlt", where the letters stand for extended (strand), helix, loop, and turn respectively, plus "-" for gaps. [Not sure if case matters; there is no extant code that actually uses this information. -- rgr, 22-Apr-97.] The pattern of gaps in the secondary structure string must be identical to the pattern of gaps in the core sequence string; that is, each position holds "-" in both strings or in neither.
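
The row layout and the gap-pattern rule described above can be checked mechanically. The following Python sketch is illustrative only and is not part of the needle tools: the names parse_hlg and HomologFile are hypothetical, and it assumes each line is a name followed by whitespace (a tab in the example file) and then the aligned string. Because it is unclear whether case matters, the secondary structure is lowercased before the alphabet check.

```python
from typing import List, NamedTuple

GAP = "-"
SS_ALPHABET = set("ehlt" + GAP)  # extended, helix, loop, turn, plus gap


class HomologFile(NamedTuple):
    """Hypothetical container for the rows of a .hlg file."""
    secondary_structure: str
    core: str
    homologs: List[str]


def parse_hlg(text: str) -> HomologFile:
    """Parse .hlg text, assuming 'name <whitespace> sequence' on each line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected a name and a sequence: %r" % line)
        rows.append(parts[1].strip())
    if len(rows) < 2:
        raise ValueError("need a secondary structure row and a core row")

    ss, core, homologs = rows[0], rows[1], rows[2:]

    # Assumption: case is folded, since the format notes leave this open.
    unknown = set(ss.lower()) - SS_ALPHABET
    if unknown:
        raise ValueError("unexpected secondary structure codes: %s"
                         % "".join(sorted(unknown)))

    # The gap pattern of the secondary structure must match that of the core.
    if len(ss) != len(core):
        raise ValueError("secondary structure and core differ in length")
    if [c == GAP for c in ss] != [c == GAP for c in core]:
        raise ValueError("gap patterns of secondary structure and core differ")

    return HomologFile(ss, core, homologs)
```

For instance, calling parse_hlg on a made-up three-line input such as "toy\t--lleehh\ntoy\t--ACDEFG\ntoy\tMKACDEFG\n" yields one homolog, while changing the core row to "-AACDEFG" raises ValueError because the gap patterns no longer agree.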

The alignment is somewhat more constrained than a standard multiple alignment in that gaps are not permitted in core elements. [fill this out. -- rgr, 10-Jan-97.]

Here is an example, the original file for 1hoe (from the ~thread/alignment/1hoe.hlg file, dated July 1994). The file contains four lines; each has been wrapped here with a trailing backslash ("\") that does not appear in the data. Note that in this file the first homolog row is a copy of the core sequence itself.

1hoe	------------------------------lllllllllleeeeeettteeeeeeettt\
eeeeeeeettteeeeeeeettteeeeeellllllllleeeeeeel
1hoe	------------------------------DTTVSEPAPSCVTLYQSWRYSQADNGCAE\
TVTVKVVYEDDTEGLCYAVAPGQITTVGDGYIGSHGHARYLARCL
1hoe	------------------------------DTTVSEPAPSCVTLYQSWRYSQADNGCAE\
TVTVKVVYEDDTEGLCYAVAPGQITTVGDGYIGSHGHARYLARCL
1hoe	MRVRALRLAALVGAGAALALSPLAAGPASADTTVSEPAPSCVTLYQSWRYSQADNGCAQ\
TVTVKVVYEDDTEGLCYAVAPGQITTVGDGYIGSHGHARYLARCL
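
Because the backslashes in the listing above are a display convention only, the listing has to be rejoined before it matches the real four-line file. A minimal Python sketch of that step follows; unwrap_listing is a hypothetical helper name, not something provided by the needle tools.

```python
from typing import List


def unwrap_listing(listing: str) -> List[str]:
    """Rejoin display lines that were wrapped with a trailing backslash."""
    lines = []
    pending = ""
    for raw in listing.splitlines():
        if raw.endswith("\\"):
            # Continuation: drop the backslash and keep accumulating.
            pending += raw[:-1]
        elif raw.strip() or pending:
            lines.append(pending + raw)
            pending = ""
    if pending:
        # The listing ended on a continuation line; keep what was gathered.
        lines.append(pending)
    return lines
```

Applied to the wrapped 1hoe listing, this should give back four lines, one per row of the file.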


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Fri Nov 26 21:41:08 EST 1999