Glossary of BMERC jargon



[Just started this. -- rgr, 9-Oct-96.]

clique file
A clique file describes one or more sets of sequences (or structures, or other entities) that are related in some way. Clique files are used to specify cross-validation sets. See the "Cross-validation set file format" section for more details.
core total index (CTI)
the 1-based index of a residue in a core model, counting only those residues that appear in the core. In other words, if the last residue of a segment has CTI = n, then the first residue of the next segment has CTI = n+1. CTIs are used primarily in environment files.
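As an illustration of the numbering rule, here is a minimal sketch that assigns CTIs given only the number of core residues in each segment. The function name `assign_cti` and the list-of-lengths input are assumptions made for this example; they are not taken from the BMERC code.

```python
from typing import List


def assign_cti(segment_lengths: List[int]) -> List[List[int]]:
    """Return, for each segment, the CTIs of its residues.

    CTIs are 1-based and run continuously across segments, so the first
    residue of a segment gets the CTI following the last residue of the
    previous segment. (Hypothetical helper, for illustration only.)
    """
    ctis = []
    next_cti = 1  # CTIs are 1-based
    for length in segment_lengths:
        ctis.append(list(range(next_cti, next_cti + length)))
        next_cti += length
    return ctis


# Two segments of 3 and 2 core residues: the first segment ends at
# CTI = 3, so the second begins at CTI = 4.
print(assign_cti([3, 2]))
```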

geometric mean term (GMT)
for a given core residue in the MRF scoring scheme, the geometric mean of the pairwise marginals for all pairwise arcs involving that residue. Since score files deal in logarithms, the GMT score file contains the average (arithmetic mean) of the pairwise marginal -log(P) values. [am I missing a subscript? -- look this up. -- rgr, 23-Jan-97.]
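The relationship between the geometric mean and the averaged -log(P) values can be checked numerically. The sketch below assumes natural logarithms (the glossary does not state the base) and uses made-up marginal values; `gmt_score` and `geometric_mean` are hypothetical names, not functions from the BMERC code.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gmt_score(pairwise_marginals: Sequence[float]) -> float:
    """Arithmetic mean of -log(P) over the pairwise marginals of one residue.

    Assumes natural logarithms; the base used in score files is not
    specified in the glossary.
    """
    return sum(-math.log(p) for p in pairwise_marginals) / len(pairwise_marginals)


# Made-up marginals for the arcs involving a single core residue.
marginals = [0.5, 0.25, 0.125]

# The -log of the geometric mean equals the arithmetic mean of the -log(P)
# values, which is why a log-domain score file stores an ordinary average.
assert math.isclose(-math.log(geometric_mean(marginals)), gmt_score(marginals))
```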

MRF
Stands for "Markov random field." [Just for completeness. -- rgr, 4-Sep-97.]

pdb-index
This is an alphanumeric field that uniquely identifies a residue within a PDB model (or within the whole file if it does not use models). It looks exactly as if taken from (1-based) columns 22 through 27 of the ATOM record of the PDB file, and hence is a composite of three consecutive PDB fields: the chain ID (1 character), the residue sequence number (four digits), and the insertion code (1 character). The latter two fields (sequence number and insertion code) together are also known as the pdbres field, which identifies a residue only when the chain ID is implied.
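The column layout above can be sketched as a pair of small helpers: one slices a pdb-index out of an ATOM record, the other splits it into its three fields. The function names are hypothetical and the sample record is invented for this example; only the column positions come from the description above.

```python
from typing import Tuple


def pdb_index_from_atom_record(record: str) -> str:
    """Return (1-based) columns 22 through 27 of an ATOM record."""
    if len(record) < 27:
        raise ValueError("ATOM record is too short to contain a pdb-index")
    return record[21:27]  # 0-based slice for 1-based columns 22..27


def split_pdb_index(pdb_index: str) -> Tuple[str, int, str]:
    """Split a 6-character pdb-index into (chain ID, sequence number, insertion code)."""
    if len(pdb_index) != 6:
        raise ValueError("a pdb-index is exactly 6 characters long")
    chain_id = pdb_index[0]
    seq_num = int(pdb_index[1:5])  # int() ignores the blank left-padding
    insertion_code = pdb_index[5]
    return chain_id, seq_num, insertion_code


# Build an invented ATOM record field by field so the columns are explicit.
sample = (
    "ATOM  "   # columns 1-6: record name
    + "    1"  # columns 7-11: atom serial number
    + " "      # column 12: unused
    + " N  "   # columns 13-16: atom name
    + " "      # column 17: alternate location indicator
    + "ALA"    # columns 18-20: residue name
    + " "      # column 21: unused
    + "A"      # column 22: chain ID
    + " 235"   # columns 23-26: residue sequence number
    + "A"      # column 27: insertion code
)

index = pdb_index_from_atom_record(sample)
print(repr(index))             # 'A 235A'
print(split_pdb_index(index))  # ('A', 235, 'A')
```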

pdbres
This is an alphanumeric field that identifies a PDB residue within a chain. It looks exactly as if taken from (1-based) columns 23 through 27 of the ATOM record of the PDB file. In general, this will not be an integer, since there are sequence numbers like " 235A". The first four characters are digits, with blank padding on the left, and the fifth and last character is either a letter suffix or blank. See the PDB sequence number description for further caveats.

By prefixing the chain ID to a pdbres field, one can create a pdb-index. [The code has recently started evolving to use the more complete pdb-index fields instead of pdbres fields, since one needs an explicit chain for multichain cores. -- rgr, 27-Jul-98.]
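A minimal sketch of that composition follows, assuming the fixed widths described above (4 characters of sequence number plus 1 character of insertion code). `format_pdbres` and `make_pdb_index` are hypothetical names chosen for this example, not routines from the BMERC code.

```python
def format_pdbres(seq_num: int, insertion_code: str = " ") -> str:
    """Format a 5-character pdbres: a right-justified number plus a suffix."""
    if len(insertion_code) != 1:
        raise ValueError("the insertion code must be exactly one character")
    field = f"{seq_num:>4d}{insertion_code}"
    if len(field) != 5:
        raise ValueError("the sequence number does not fit in 4 characters")
    return field


def make_pdb_index(chain_id: str, pdbres: str) -> str:
    """Prefix a 1-character chain ID to a pdbres, giving a 6-character pdb-index."""
    if len(chain_id) != 1:
        raise ValueError("the chain ID must be exactly one character")
    if len(pdbres) != 5:
        raise ValueError("a pdbres is exactly 5 characters long")
    return chain_id + pdbres


pdbres = format_pdbres(235, "A")
print(repr(pdbres))                       # ' 235A'
print(repr(make_pdb_index("B", pdbres)))  # 'B 235A'
```

Keeping the blank padding intact matters here: the fields are fixed-width, so stripping whitespace would make " 235 " and "235" look like different residues to any code that compares the strings directly.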

segment type
Segment types are strings that convey the nature of the segment. [Note: Elsewhere this is called the "segment designator" or "segment descriptor"; the terminology should be standardized. -- rgr, 21-Aug-96.] All helices are of type "H"; all other segments are strands of various sorts. The labels fall into three classes:
  1. Helices are uniformly denoted by "H".
  2. Open-sheet strands are denoted "En", where n is the sheet number. (Sheet numbers are assigned arbitrarily in order of the first residue in the sheet.)
  3. Cyclical strands (i.e. strands comprising a barrel structure) are denoted by "Cxxyyzz", where xx is the number of this strand, yy is the number of a neighboring strand, and zz is the number of the other neighboring strand (the choice of neighbors is arbitrary at present). Numbers are assigned to strands according to the order of their first residue, whether or not they appear in a barrel.
Note that sheet indices and the strand indices used to name barrel segments are numbered for the PDB entry as a whole, and not chain by chain. This should work fine for most code, which only cares whether strand residues are part of the same sheet or not. (In principle, barrels are trickier, since one cannot assume that a core model includes only one barrel; one must trace the barrel around to find if two strands are on the same barrel. In practice, 1tie seems to be the only model with more than one barrel, and it appears to be pathological in any case. -- rgr, 21-Aug-96.)
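The three label classes can be told apart with a small parser. This is only a sketch: it assumes that n in "En" is a plain decimal integer and that xx, yy, and zz in "Cxxyyzz" are each exactly two digits, which the notation suggests but the text does not state outright. The names `parse_segment_type` and `same_sheet` are hypothetical.

```python
import re
from typing import Any, Dict

_OPEN_STRAND = re.compile(r"E(\d+)")
# Assumption: each of xx, yy, zz is a two-digit strand number.
_BARREL_STRAND = re.compile(r"C(\d{2})(\d{2})(\d{2})")


def parse_segment_type(label: str) -> Dict[str, Any]:
    """Classify a segment type string as a helix, sheet strand, or barrel strand."""
    if label == "H":
        return {"kind": "helix"}
    match = _OPEN_STRAND.fullmatch(label)
    if match:
        return {"kind": "sheet_strand", "sheet": int(match.group(1))}
    match = _BARREL_STRAND.fullmatch(label)
    if match:
        strand, first, second = (int(g) for g in match.groups())
        return {"kind": "barrel_strand", "strand": strand, "neighbors": (first, second)}
    raise ValueError(f"unrecognized segment type: {label!r}")


def same_sheet(label_a: str, label_b: str) -> bool:
    """True if both labels are open-sheet strands with the same sheet number."""
    a, b = parse_segment_type(label_a), parse_segment_type(label_b)
    return a["kind"] == b["kind"] == "sheet_strand" and a["sheet"] == b["sheet"]


print(parse_segment_type("E2"))
print(parse_segment_type("C010206"))
print(same_sheet("E1", "E1"), same_sheet("E1", "E2"))
```

Deciding whether two barrel strands belong to the same barrel is deliberately left out, since, as noted above, it requires tracing the neighbor links around the barrel.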

[And many score functions care only whether a segment is a helix or a strand. The code that generates these labels is also under development; the assignment is not always unique, so our heuristics may change. -- rgr, 31-Jul-96.]


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Tue Dec 14 16:48:34 EST 1999