BMERC : needle tools : File formats : MRF file formats
Environment file format
The environment describes what set of scores to use at a given residue
position (or between a pair of residue positions, for pairwise
environments). The environment file maps a core total index (CTI), or two CTIs for
pairwise environments, to an environment number. The choice of environment numbers
depends on the score function; the environment-generating code for a
given score function is free to assign environment numbers arbitrarily
(though for practical reasons, they should be restricted to small
positive integers).
Note that there is no loop environment file; the definition of loop environments is implicit (and besides, there are no fixed features on which to hang environment assignments).
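To make the role of environments concrete, here is a minimal sketch of the lookup they imply: a core position is mapped to an environment number, and the environment number selects a set of scores. The function names and the dictionary layouts are hypothetical illustrations, not needle's internal representation, and the numbers in the usage lines are made up.

```python
def singleton_score(env_of, scores, cti, residue):
    """Score of `residue` at core position `cti`.

    env_of -- dict: core total index -> environment number (hypothetical layout)
    scores -- dict: environment number -> {residue name: score} (hypothetical layout)
    """
    environment = env_of[cti]
    return scores[environment][residue]


def pairwise_score(pair_env_of, pair_scores, cti_a, cti_b, res_a, res_b):
    """Score of the residue pair at core positions (cti_a, cti_b).

    pair_env_of -- dict: (CTI, CTI) -> environment number (hypothetical layout)
    pair_scores -- dict: environment number -> {(residue, residue): score}
    """
    environment = pair_env_of[(cti_a, cti_b)]
    return pair_scores[environment][(res_a, res_b)]


# Toy data, invented for illustration only.
env_of = {1: 2, 2: 1}
scores = {1: {"ALA": 2.5, "CYS": 3.5}, 2: {"ALA": 2.0, "CYS": 4.0}}
ala_at_1 = singleton_score(env_of, scores, 1, "ALA")  # position 1 uses environment 2
```

Two core positions that share an environment number share the same set of scores, which is why the numbers can be assigned arbitrarily as long as the score file uses the same numbering.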
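The first two numbers in an environment file appear on the first line of the file. In the examples below, the first equals the number of entries that follow (47, 198, and 74), and the second equals the number of columns in each entry (2 for singleton files, 3 for pairwise files).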
The following example is taken from the start of a singleton file, mrf-se-10efa-2ss-2hpr.dat to be exact. Note that there are a total of 47 entries in this file; there must be exactly one for each core position in a singleton file, so the CTI in the first column runs through all values from 1 to 47. (There is no requirement that they be in order, though.)
47 2
1 14
2 18
3 11
4 15
5 7
6 9
7 1
8 1
9 5
10 4
11 1
12 1
13 8
14 2
15 1
. . .
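The singleton layout shown above could be read with something like the following sketch. It assumes that the two header numbers are the entry count and the number of columns per entry (an inference from the examples, not a documented rule), and that each entry sits on its own line. The function name is hypothetical.

```python
def parse_singleton_environments(text):
    """Parse a singleton environment file into {CTI: environment number}.

    Assumes the header line is "<entry count> <columns per entry>".
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    n_entries, n_columns = (int(tok) for tok in lines[0])
    if n_columns != 2:
        raise ValueError("expected 2 columns per entry in a singleton file")

    env_of = {}
    for fields in lines[1:]:
        cti, environment = (int(tok) for tok in fields)
        if cti in env_of:
            raise ValueError(f"core position {cti} appears more than once")
        env_of[cti] = environment

    # Exactly one entry per core position: CTIs must cover 1..n_entries,
    # though they may appear in any order.
    if set(env_of) != set(range(1, n_entries + 1)):
        raise ValueError("CTIs do not run through all values from 1 to the entry count")
    return env_of


# A shortened, out-of-order file with three core positions.
example = "3 2\n2 18\n1 14\n3 11\n"
env_of = parse_singleton_environments(example)
```

The final check enforces the rule stated above that every core position has exactly one entry, without requiring the entries to be sorted.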
The following is an example of a pairwise environment file, pairwise_environments_MRF_4cpa.dat to be exact. Although there are a total of 198 entries in this file, we show only the lines involving the first 5 core positions. Note that there is no constraint on length or order, and a given core position need not appear in the file at all (though that would be suspicious).
198 3
1 3 17
1 21 6
1 23 5
2 4 17
11 2 9
14 2 9
15 2 9
2 22 6
2 27 5
2 29 5
3 5 17
3 23 5
3 28 5
7 4 9
8 4 11
11 4 9
4 22 6
4 24 6
4 29 5
5 30 5
31 5 9
. . .
[explain about symmetry issues. -- rgr, 14-Jan-97.]
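A pairwise environment file can be read in much the same way. The sketch below keeps each pair in the order it is written in the file, since the symmetry conventions are not spelled out here, and adds a helper that reports core positions that never appear (legal, but suspicious). Both function names are hypothetical, and the header interpretation (entry count, columns per entry) is again an inference from the examples.

```python
def parse_pairwise_environments(text):
    """Parse a pairwise environment file into {(CTI, CTI): environment number}.

    Pairs are stored exactly as written; no ordering or symmetry is assumed.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    n_entries, n_columns = (int(tok) for tok in lines[0])
    if n_columns != 3:
        raise ValueError("expected 3 columns per entry in a pairwise file")
    if len(lines) - 1 != n_entries:
        raise ValueError("entry count in header does not match the file body")

    pair_env_of = {}
    for fields in lines[1:]:
        cti_a, cti_b, environment = (int(tok) for tok in fields)
        pair_env_of[(cti_a, cti_b)] = environment
    return pair_env_of


def missing_core_positions(pair_env_of, n_core_positions):
    """Core positions in 1..n_core_positions that appear in no pair."""
    seen = {cti for pair in pair_env_of for cti in pair}
    return sorted(set(range(1, n_core_positions + 1)) - seen)


# Three entries copied from the example above, with the header shortened to match.
example = "3 3\n1 3 17\n11 2 9\n2 4 17\n"
pair_env_of = parse_pairwise_environments(example)
unpaired = missing_core_positions(pair_env_of, 5)  # position 5 is in no pair
```

Because there is no constraint on length or order, the only structural check made is that the header count agrees with the number of entries actually present.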
Note that the GMT environment file is of a special form. Since each core position gets its own set of GMT values (which may not be distinct, but we don't bother checking, since the chance is small), each core position also needs its own environment. Therefore, the environment file is an identity mapping, as shown in the singleton_environments_MRF_4fxn.dat file below:
74 2
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
. . .
68 68
69 69
70 70
71 71
72 72
73 73
74 74
The associated singleton score file with the GMT probabilities must
therefore have 74 environments.
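Since the GMT environment file is just an identity mapping, it can be generated mechanically. A minimal sketch follows; the function name is hypothetical, and the header is written in the same two-number form as the example above.

```python
def identity_environment_file(n_core_positions):
    """Return the text of a singleton environment file that maps
    each core position to an environment with the same number."""
    lines = [f"{n_core_positions} 2"]
    lines += [f"{cti} {cti}" for cti in range(1, n_core_positions + 1)]
    return "\n".join(lines) + "\n"


# For a 74-position core this yields a header line plus 74 entries.
gmt_text = identity_environment_file(74)
```

The matching score file then needs one environment per core position, as noted above.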
Score file format
Score values are usually floating point, but may be integers. In either case, larger values are interpreted as unfavorable, and smaller values as favorable. Since needle converts all scores to integers for speed, all values should be in the same range in order to avoid loss of precision. (But needle has run parameters that can be used to control the scaling.)
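The precision concern can be illustrated with a small sketch. This is not needle's actual conversion, which is controlled by its run parameters; it is a hypothetical linear scaling in which the largest absolute score maps to a chosen integer limit and everything else is rounded.

```python
def scale_to_integers(scores, max_int=1000):
    """Linearly scale scores so the largest magnitude maps to max_int,
    then round to integers. Illustrative only; not needle's own scheme."""
    largest = max(abs(s) for s in scores)
    if largest == 0:
        return [0 for _ in scores]
    factor = max_int / largest
    return [round(s * factor) for s in scores]


# Scores in a similar range keep their relative differences.
similar = scale_to_integers([1.0, 2.0, 4.0], max_int=100)

# One very large value forces a small scale factor, so the two small
# scores both round to the same integer and can no longer be told apart.
mixed = scale_to_integers([0.001, 0.002, 500.0], max_int=1000)
```

Under this scheme, a single out-of-range value reduces the resolution available to every other score, which is the kind of loss the paragraph above warns about.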
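Each score file consists of the following elements, as they appear in the examples below: a header line of 20 three-letter amino acid codes that label the score columns; a line giving a count (74 in the GMT singleton example, which matches its number of environments); and, for each environment, the environment number followed by that environment's scores.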
The following singleton example is from the start of the singleton_scores_x_MRF_4fxn.dat file. These are GMT scores, which is why there are scores defined for each of 74 environments. The backslashes ("\" characters) do not actually appear in score files -- they are there to indicate where line breaks have been added for readability.
ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU MET ASN PRO GLN ARG SER THR VAL TRP TYR
74
1 2.759 3.548 3.609 3.500 2.813 2.958 3.692 2.416 3.416 2.304 \
3.416 3.653 3.950 3.634 3.366 2.973 2.900 2.088 3.652 2.821
2 2.405 3.635 3.799 3.684 2.858 3.015 3.838 2.274 3.722 2.148 \
3.435 3.792 3.907 3.816 3.650 3.068 2.979 1.965 3.786 2.991
3 2.609 3.695 3.824 3.778 2.690 2.869 3.930 2.199 3.692 2.084 \
3.497 3.872 4.298 3.866 3.595 3.048 2.931 1.864 3.851 2.832
4 2.604 3.477 3.553 3.433 2.920 3.028 3.591 2.496 3.440 2.387 \
3.337 3.558 3.640 3.567 3.401 3.006 2.952 2.226 3.576 2.956
5 2.399 3.703 3.859 3.783 2.783 2.944 3.927 2.196 3.791 2.066 \
3.474 3.869 4.095 3.894 3.697 3.082 2.974 1.877 3.871 2.959
6 2.031 3.975 3.713 3.421 2.883 3.084 3.914 2.421 3.556 1.901 \
3.329 3.700 4.193 3.511 3.307 3.198 3.165 2.297 3.817 3.166
. . .
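A singleton score file laid out as above could be read with the following sketch. It assumes the second line is the number of environments and that each environment contributes its number followed by one score per residue name in the header; both are inferences from the example. It reads the body as a stream of whitespace-separated tokens and drops any backslashes, so it accepts both real files and the wrapped form displayed here. The function name is hypothetical.

```python
def parse_singleton_scores(text):
    """Parse a singleton score file into {environment: {residue: score}}."""
    lines = text.strip().splitlines()
    residues = lines[0].split()

    # Treat the rest as a token stream; "\" only marks display line breaks.
    tokens = [tok for tok in " ".join(lines[1:]).split() if tok != "\\"]
    n_environments = int(tokens[0])
    body = tokens[1:]

    per_environment = 1 + len(residues)  # environment number + one score per residue
    if len(body) != n_environments * per_environment:
        raise ValueError("file body does not match the declared number of environments")

    scores = {}
    for start in range(0, len(body), per_environment):
        environment = int(body[start])
        values = [float(tok) for tok in body[start + 1:start + per_environment]]
        scores[environment] = dict(zip(residues, values))
    return scores


# A cut-down file with two residue columns and two environments;
# the second entry is wrapped with a backslash as in the display above.
example = "ALA CYS\n2\n1 2.759 3.548\n2 2.405 \\\n  3.635\n"
scores = parse_singleton_scores(example)
```

Reading tokens rather than lines keeps the sketch independent of where line breaks fall, at the cost of not detecting a misplaced line break in a real file.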
This is the pairwise score file that corresponds to the singleton example (more heavily edited because pairwise scores have vastly more numbers). The environment numbers (1, 2, 3) are the single integers that precede each block of scores.
ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU MET ASN PRO GLN ARG SER THR VAL TRP TYR
20
1
-0.014 0.180 0.358 -0.118 0.101 -0.092 0.353 -0.026 0.137 -0.026 \
0.247 0.224 0.135 0.156 0.254 -0.041 -0.187 -0.228 0.145 0.121
0.180 -1.227 -0.356 0.064 0.035 0.020 -0.362 -0.111 -0.278 0.461 \
-0.152 -0.441 -0.356 -0.187 -0.245 -0.079 0.274 0.403 -0.569 -0.339
0.358 -0.356 -0.871 0.242 0.213 -0.159 -0.407 -0.205 -0.100 0.233 \
0.180 -0.669 -0.178 -0.232 -0.537 -0.307 -0.241 0.581 -0.209 -0.161
. . .
0.145 -0.569 -0.209 0.434 -0.489 0.257 0.009 0.087 0.093 0.256 \
-0.321 -0.071 -0.997 0.184 -0.211 -0.115 0.057 0.243 -0.487 0.149
0.121 -0.339 -0.161 0.153 -0.228 0.138 -0.454 0.170 -0.083 -0.050 \
-0.091 -0.614 0.650 -0.279 -0.626 0.010 -0.001 0.453 0.149 0.091
2
-0.755 0.226 0.015 -0.362 0.226 0.226 0.226 -0.131 -0.262 -0.040 \
0.226 0.226 0.226 0.238 -0.131 0.226 0.250 0.238 0.226 0.226
0.226 -0.046 -0.034 0.177 -0.046 -0.046 -0.046 0.002 0.026 0.093 \
-0.046 -0.046 -0.046 -0.034 0.002 -0.046 -0.022 -0.034 -0.046 -0.046
0.226 -0.046 -0.034 0.177 -0.046 -0.046 -0.046 0.002 0.026 0.093 \
-0.046 -0.046 -0.046 -0.034 0.002 -0.046 -0.022 -0.034 -0.046 -0.046
. . .
0.226 -0.046 -0.034 0.177 -0.046 -0.046 -0.046 0.002 0.026 0.093 \
-0.046 -0.046 -0.046 -0.034 0.002 -0.046 -0.022 -0.034 -0.046 -0.046
0.015 -0.034 -0.022 0.189 -0.034 -0.034 -0.034 0.015 0.038 0.106 \
-0.034 -0.034 -0.034 -0.022 0.015 -0.034 -0.009 -0.022 -0.034 -0.034
3
-0.171 0.052 0.077 0.077 0.052 0.052 0.052 0.113 0.052 -0.378 \
0.052 0.052 0.052 0.052 0.052 0.052 0.052 -0.053 0.052 0.065
0.282 -0.054 -0.029 -0.029 -0.054 -0.054 -0.054 0.006 -0.054 0.326 \
-0.054 -0.054 -0.054 -0.054 -0.054 -0.054 -0.054 0.064 -0.054 -0.042
0.282 -0.054 -0.029 -0.029 -0.054 -0.054 -0.054 0.006 -0.054 0.326 \
-0.054 -0.054 -0.054 -0.054 -0.054 -0.054 -0.054 0.064 -0.054 -0.042
. . .
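The pairwise layout can be read along the same lines. The sketch below assumes that the second line is the number of environments and that each environment number is followed by a full square block of scores, one row per residue name in the header; the excerpt above is too heavily edited to confirm this, so treat it as an assumption. No symmetry is assumed, and which axis corresponds to which member of the pair is left open. The function name is hypothetical.

```python
def parse_pairwise_scores(text):
    """Parse a pairwise score file into (residues, {environment: matrix}),
    where each matrix is a list of rows of floats."""
    lines = text.strip().splitlines()
    residues = lines[0].split()
    n = len(residues)

    # Token stream; "\" only marks line breaks added for display.
    tokens = [tok for tok in " ".join(lines[1:]).split() if tok != "\\"]
    n_environments = int(tokens[0])
    body = tokens[1:]

    per_environment = 1 + n * n  # environment number + n x n scores (assumed)
    if len(body) != n_environments * per_environment:
        raise ValueError("file body does not match the declared number of environments")

    matrices = {}
    for start in range(0, len(body), per_environment):
        environment = int(body[start])
        flat = [float(tok) for tok in body[start + 1:start + per_environment]]
        matrices[environment] = [flat[row * n:(row + 1) * n] for row in range(n)]
    return residues, matrices


# A cut-down file with two residue names and one environment.
example = "ALA CYS\n1\n1\n-0.014 0.180\n0.180 -1.227\n"
residues, matrices = parse_pairwise_scores(example)
```

Keeping the full square block, rather than folding it into a triangle, avoids making any claim about whether a given environment's scores are symmetric.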
Loop scores are vastly simpler. Here is the loop_scores_x_MRF_4fxn.dat file in its entirety.
ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU MET ASN PRO GLN ARG SER THR VAL TRP TYR
1
0 2.636 4.097 2.568 2.961 3.401 2.141 3.759 3.374 2.842 2.826 \
4.101 2.790 2.595 3.348 3.188 2.588 2.750 3.058 4.423 3.499