| Host name | OS name | Release | CPU | new-stat | stat-test | make-core | 1bmt-cva |
|---|---|---|---|---|---|---|---|
| darcy | ULTRIX | 4.4 | RISC | 0.37 (77%) | 0.74 (85%) | ||
| darwin | SunOS | 4.1.3_U1 | sun4c | 0.70 (81%) | 0.60 (90%) | ||
| hodgkin | SunOS | 4.1.4 | sun4m | 1.1 (62%) | 1.2 (86%) | ||
| hinshelwood | SunOS | 4.1.4 | sun4m | 1.1 (62%) | 1.1 (81%) | ||
| dobzhansky | SunOS | 4.1.4 | sun4m | 0.90 (50%) | 1.2 (85%) | 0.99 (96%) | 1.1 (99%) |
| gamow | SunOS | 4.1.4 | sun4m | 1.0 (55%) | 1.0 (74%) | 1.0 (97%) | 1.0 (98%) |
| pauling | SunOS | 4.1.4 | sun4m | 1.1 (60%) | 1.2 (86%) | 1.0 (97%) | |
| monod | SunOS | 4.1.4 | sun4m | 1.2 (64%) | 1.2 (85%) | 1.1 (99%) | |
| cyrus | Solaris | 5.6 | sun4m | 1.6 (70%) | 1.9 (79%) | 6.4 (90%) | 2.0 (99%) |
| sewall | Solaris | 5.6 | sun4m | 1.6 (72%) | 2.0 (78%) | 6.0 (84%) | 2.1 (99%) |
| mcclintock | Solaris | 5.6 | sun4m | 2.3 (63%) | 2.9 (72%) | 11 (91%) | 2.8 (99%) |
| feynman | Solaris | 5.6 | sun4u | 5.8 (33%) | 6.2 (41%) | 11 (98%) | |
| mbcrrc | Solaris | 5.7 | sun4u | 5.8 (35%) | 6.6 (46%) | 28 (68%) | 9.7 (99%) |
| delbrueck | Solaris | 5.6 | sun4u | 9.1 (51%) | 12 (72%) | 35 (83%) | 11 (98%) |
| jrsadler | OSF1 | V4.0 | alpha | 0.25 (91%) | 0.85 (85%) | 6.4 (90%) | 1.9 (99%) |
| dayhoff | OSF1 | V4.0 | alpha | 1.6 (58%) | 2.5 (74%) | 3.5 (90%) | 3.7 (99%) |
| huxley | OSF1 | V4.0 | alpha | 4.6 (34%) | 6.9 (46%) | 29 (64%) | 16 (98%) |
| miescher | OSF1 | V4.0 | alpha | 3.4 (15%) | 8.9 (30%) | 27 (98%) | |
| vavilov | OSF1 | V4.0 | alpha | 3.4 (14%) | 8.9 (29%) | 27 (99%) | |
| amdk6 | Linux | 2.2.5-15 | i586 | 6.4 (77.0%) | 6.2 (97%) | ||
| sigler | Linux | 2.2.16-22 | i686 | 4.1 (12.5%) | 8.9 (14.9%) | 15 (24.6%) | 35 (97.8%) |
The next-to-last host in the list, amdk6, is my vintage 1999 home computer, a 300MHz Linux PC with an AMD K6 Pentium-clone processor and 28MB of RAM. The last host, sigler, is a much newer (March 2001) PC with a 1200MHz AMD processor and 512MB of RAM.
For reference, here are the results for miescher and vavilov before they were upgraded (i.e., the performance of the old boxes):
| Host name | OS name | Release | CPU | new-stat | stat-test | make-core | 1bmt-cva |
|---|---|---|---|---|---|---|---|
| vavilov | OSF1 | V4.0 | alpha | 3.2 (27%) | 6.6 (44%) | 28 (65%) | |
| miescher | OSF1 | V4.0 | alpha | 4.6 (37%) | 6.3 (42%) | 30 (68%) | 15 (98%) |
Results of testing the new vavilov and miescher are shown in the main
table.
General comments

The "Host description" columns (OS name, release, and CPU) were
produced by the uname program on each machine.

Timings were produced on an unloaded machine (to the extent
possible), as they are intended to give an idea of the best realistic
performance for that machine on that class of problem. The shorter
tasks (specifically, new-stat and stat-test) were run twice, and the
faster time (almost always the second one) was used.

The values given for each benchmark are of the form

```
2.3 (63%)
```

The first number gives the relative speed of the machine in terms of
elapsed time. The second number is the percentage of the CPU that was
used during that time. These come from the third and fourth numbers
given by the time command of csh, e.g.

```
12.0u 6.0s 0:28 63% 0+0k 0+0io 0pf+0w
```

The first two numbers are user and system CPU time, respectively; these
are not reported in the table, since elapsed time is more meaningful for
figuring out how long it will take to run a given program on a given
machine. Notice, however, that the sum of the CPU times is
approximately equal to the elapsed time multiplied by the processor
utilization fraction (here, 12.0 + 6.0 = 18 seconds, versus
28 × 0.63 ≈ 17.6 seconds). Consult the csh man page if you want to
know more about these values.

I use gamow as the standard mostly because it is the machine on my
desk; since the speed value for this example is 2.3, the job ran
2.3 times as fast on the machine that produced it as on gamow (which
took 1:03 on this problem). It also helps that gamow is not a server
machine, which improves the reliability and reproducibility of timings
done there. Finally, picking a slow machine as a standard makes the
speed multiples mostly greater than one, which makes comparing them
more intuitive.

The table still has a few blank spots; it has taken quite a while to
fill it out even this far, especially since I feel no pressure. Some
machines (e.g. pauling and gamow) are almost identical in configuration
and should produce almost identical results.
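The relationship between the csh fields and a table entry can be sketched in Python. The helper names below (`parse_elapsed`, `parse_csh_time`, `speed_multiple`) are hypothetical illustrations, not part of the author's tooling, and the parser assumes only the first four fields shown in the example line above, with elapsed time written as `m:ss` or `h:mm:ss`.

```python
def parse_elapsed(field: str) -> float:
    """Convert an elapsed-time field such as '0:28' or '1:09:01' to seconds."""
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_csh_time(line: str) -> dict:
    """Read user CPU, system CPU, elapsed time, and CPU percentage from the
    first four fields of csh `time` output; the remaining fields are ignored."""
    user, system, elapsed, percent = line.split()[:4]
    return {
        "user": float(user.rstrip("u")),
        "system": float(system.rstrip("s")),
        "elapsed": parse_elapsed(elapsed),
        "utilization": float(percent.rstrip("%")) / 100,
    }


def speed_multiple(elapsed: float, reference: float) -> float:
    """Speed relative to the reference machine: reference time / elapsed time."""
    return reference / elapsed


run = parse_csh_time("12.0u 6.0s 0:28 63% 0+0k 0+0io 0pf+0w")
gamow = parse_elapsed("1:03")  # gamow's time for the same job, in seconds

print(speed_multiple(run["elapsed"], gamow))  # 2.25
print(run["user"] + run["system"])            # 18.0 s of CPU time
print(run["elapsed"] * run["utilization"])    # about 17.6 s
```

The computed multiple of 2.25 is consistent with the 2.3 quoted for this example, given that elapsed time is only reported in whole seconds.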
A note on I/O traffic
Some of these benchmarks (new-stat and stat-test in particular) tend to be I/O-bound,
leading to lower speed multiples on the faster machines than for the
compute-bound benchmarks (e.g. 1bmt-cva). That
is why the percentage of processor utilization is also given along with
the speed multiple. For a process that is not page-bound on a machine
that is not doing much else, processor utilization gives an indication
of the degree to which performance is limited by I/O. For an I/O-bound
job, this number will be lower, because a greater percentage of time is
spent waiting for file access, usually over the network; these jobs are
therefore more sensitive to file server loading. (All of these tests
use files served by delbrueck exclusively, so the degree of processor
loading on delbrueck can introduce error. I have tried to avoid running
benchmarks at times when delbrueck is unusually busy, but since
transient loading can be hard to detect, some noise of this nature is
inevitable.)
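Processor utilization can be turned into a rough estimate of time spent off the CPU, which on an unloaded, non-paging machine is mostly time spent waiting for file access. The sketch below is a hypothetical Python helper (`off_cpu_estimate` is not the author's code): it reads a table cell such as `9.1 (51%)`, recovers the elapsed time from gamow's reference times given in the benchmark descriptions, and multiplies by the idle fraction. Treating all off-CPU time as I/O wait is an assumption.

```python
import re

# gamow's elapsed times in seconds, from the benchmark descriptions:
# new-stat 1:03, stat-test 8:17, make-core 32:34, 1bmt-cva 1:09:01.
GAMOW_SECONDS = {"new-stat": 63, "stat-test": 497, "make-core": 1954, "1bmt-cva": 4141}

# A table cell: speed multiple followed by CPU percentage, e.g. "9.1 (51%)".
CELL = re.compile(r"([\d.]+) \(([\d.]+)%\)")


def off_cpu_estimate(cell: str, benchmark: str) -> tuple[float, float]:
    """Return (elapsed seconds, seconds not on the CPU) for one table cell.

    Assumes off-CPU time is mostly I/O wait, which only holds for a job
    that does not page on a machine that is otherwise idle."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized table cell: {cell!r}")
    multiple = float(match.group(1))
    utilization = float(match.group(2)) / 100
    elapsed = GAMOW_SECONDS[benchmark] / multiple
    return elapsed, elapsed * (1 - utilization)


# delbrueck: an I/O-heavy benchmark versus the compute-bound one.
for benchmark, cell in [("new-stat", "9.1 (51%)"), ("1bmt-cva", "11 (98%)")]:
    elapsed, waiting = off_cpu_estimate(cell, benchmark)
    print(f"{benchmark}: {elapsed:.0f} s elapsed, about {waiting:.0f} s off the CPU")
```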
It is interesting to compare the times for delbrueck to those of the other machines in its class, feynman and mbcrrc. Since delbrueck is the file server, its speed multiples for the I/O-bound tasks are more in line with the speed multiple for 1bmt-cva. This implies that gamow is not actually I/O-bound for these tasks; delbrueck can supply gamow with data as fast as gamow can accept it (though the low processor utilization for gamow is puzzling). feynman and mbcrrc have 1bmt-cva multiples similar to delbrueck, but, although the data are ambiguous, appear to lose a factor of two when they need to rely heavily on delbrueck for file service.
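One way to make this comparison concrete is to divide each host's speed multiple on an I/O-heavy benchmark by its 1bmt-cva multiple: since every multiple is relative to gamow, a ratio near one means the host is as much faster than gamow on file-heavy work as on pure computation. This Python sketch uses figures copied from the main table; `io_ratio` is a hypothetical helper, and feynman is left out because its 1bmt-cva entry is still blank.

```python
# Speed multiples (relative to gamow) copied from the main table.
MULTIPLES = {
    "delbrueck": {"new-stat": 9.1, "stat-test": 12, "1bmt-cva": 11},
    "mbcrrc": {"new-stat": 5.8, "stat-test": 6.6, "1bmt-cva": 9.7},
}


def io_ratio(host: str, benchmark: str) -> float:
    """Speed multiple on an I/O-heavy benchmark divided by the host's
    compute-bound 1bmt-cva multiple; gamow scores 1.0 by construction."""
    row = MULTIPLES[host]
    return row[benchmark] / row["1bmt-cva"]


for host in MULTIPLES:
    ratios = ", ".join(f"{b} {io_ratio(host, b):.2f}" for b in ("new-stat", "stat-test"))
    print(f"{host}: {ratios}")
```

On these numbers mbcrrc's ratios (about 0.60 and 0.68) fall below delbrueck's (about 0.83 and 1.09).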
There may also be some vendor-dependence in file I/O performance, in that Sun will tune their NFS client using Sun servers, and DEC will tune their NFS client using DEC servers, so a DEC machine talking to a Sun server may experience a larger performance hit for an I/O-bound job than a Sun machine performing the same job. This might explain why the Alphas do proportionally less well than the Solaris boxes on the I/O-bound tasks (though data are still scant).
new-stat

```csh
cd ~thread/code/stat/new-stat/
uptime; make clean; time make install; uptime
```
The standard time for this task on gamow is 1:03. The elapsed
times range from 7 seconds to over four minutes (the latter on
jrsadler, on which gcc seems to run painfully slowly). The
roundoff error can be more than 10% on the faster machines, since
those times only have one significant digit.
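The arithmetic behind these figures is easy to check. Assuming elapsed time is reported in whole seconds, as in the csh output shown earlier, this Python snippet recovers the extreme elapsed times from the table's new-stat multiples and shows how far a one-second difference moves the multiple near the fast end; `multiple` is a hypothetical helper.

```python
GAMOW_NEW_STAT = 63  # seconds: new-stat takes 1:03 on gamow


def multiple(elapsed_seconds: float) -> float:
    """Speed multiple relative to gamow for a new-stat run."""
    return GAMOW_NEW_STAT / elapsed_seconds


# Elapsed times implied by the fastest and slowest new-stat multiples.
print(f"delbrueck (9.1x): {GAMOW_NEW_STAT / 9.1:.1f} s")   # 6.9 s
print(f"jrsadler (0.25x): {GAMOW_NEW_STAT / 0.25:.0f} s")  # 252 s, i.e. 4:12

# Near 7 seconds, one second either way shifts the multiple by roughly
# 12 to 17 percent.
for seconds in (6, 7, 8):
    print(seconds, f"{multiple(seconds):.2f}")
```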
stat-test

```csh
cd ~thread/code/stat/test
uptime; make clean > /dev/null; time make check > foo.text; uptime
```
This takes 8:17 on gamow.
make-core

```csh
cd ~thread/code/stat/dist/test-cores
uptime; make clean > /dev/null; time make; uptime
```
This takes 32:34 on gamow. Since the perl scripts use internal
pipes, it runs markedly faster on machines with multiple
processors (the Solaris boxes each have four, as I understand
it). It could run even faster on such machines by using the
parallel processing capability of GNU make (its -j option),
but I am not doing this (as far as I know).
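A rough way to see the multiprocessor effect in the table is to divide each host's make-core multiple by its 1bmt-cva multiple. The Python sketch below does that for hosts that have both entries filled in; treating 1bmt-cva as a measure of single-processor speed is an assumption made for illustration, and `SPEEDS` is simply the table transcribed by hand.

```python
# (make-core, 1bmt-cva) speed multiples transcribed from the main table.
SPEEDS = {
    "gamow (SunOS)": (1.0, 1.0),
    "dobzhansky (SunOS)": (0.99, 1.1),
    "cyrus (Solaris)": (6.4, 2.0),
    "sewall (Solaris)": (6.0, 2.1),
    "mcclintock (Solaris)": (11, 2.8),
    "mbcrrc (Solaris)": (28, 9.7),
    "delbrueck (Solaris)": (35, 11),
}

# A ratio well above 1 means make-core gains more over gamow than the
# compute-bound job does, consistent with pipeline stages running on
# several CPUs at once.
gain = {host: make_core / bmt for host, (make_core, bmt) in SPEEDS.items()}

for host, ratio in sorted(gain.items(), key=lambda item: item[1]):
    print(f"{host:22} {ratio:.1f}")
```

On these figures the SunOS hosts sit near 1, while the Solaris hosts fall between about 2.9 and 3.9.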
1bmt-cva

```csh
cd ~thread/code/stat/test
uptime; rm -f new-env_1bmtA.pair; time make cmp-1bmtA; uptime
```
The calibration run on gamow takes fully 1:09:01, so the
resolution is quite good even for the fastest machines. Note how
the processor utilization figures are all up in the high 90s.
This is because calculate-vv-all does little I/O and
(apparently) doesn't need to page.
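To put a number on the resolution: assuming whole-second timings, one second is a small fraction of even the fastest 1bmt-cva run. This Python sketch (with a hypothetical `relative_resolution` helper) converts gamow's 1:09:01 to seconds and applies the largest 1bmt-cva multiple in the table, sigler's 35.

```python
GAMOW_1BMT = 1 * 3600 + 9 * 60 + 1  # 1:09:01 on gamow = 4141 s


def relative_resolution(reference_seconds: float, multiple: float, tick: float = 1.0) -> float:
    """Fraction of the elapsed time represented by one timer tick."""
    elapsed = reference_seconds / multiple
    return tick / elapsed


fastest = relative_resolution(GAMOW_1BMT, 35)  # sigler
print(f"sigler: {GAMOW_1BMT / 35:.0f} s elapsed, one second = {fastest:.1%}")
```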