Benchmarks comparing Fractal Domains 2.0b5 and 2.0
Fractal Domains v2.0 detects the total number of cores available and allocates one calculation thread per core, allowing calculations to be performed in parallel. This yields performance improvements on all dual-CPU PPC Macs and almost all Intel Macs (I cannot say "all" because Apple was briefly selling an Intel single-core Mac Mini model).
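The "one calculation thread per core" idea can be sketched roughly as follows. This is only an illustration in Python, not Fractal Domains' actual code: the function names (escape_count, split_rows, render_band, render), the Mandelbrot formula, and the row-band partitioning are all assumptions made for the example. Note also that in CPython, threads will not actually speed up pure-Python arithmetic because of the global interpreter lock; the sketch shows the structure only.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def escape_count(c, max_iter=100):
    """Number of iterations before z -> z*z + c leaves the radius-2 disc."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def split_rows(height, workers):
    """Split range(height) into at most `workers` contiguous, non-empty bands."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        if size:
            bands.append(range(start, start + size))
        start += size
    return bands


def render_band(rows, width, height):
    """Compute escape counts for the given rows (requires width, height >= 2)."""
    band = []
    for y in rows:
        im = -1.25 + 2.5 * y / (height - 1)
        band.append([
            escape_count(complex(-2.0 + 3.0 * x / (width - 1), im))
            for x in range(width)
        ])
    return band


def render(width, height, workers=None):
    """Render the image with one worker thread per detected core."""
    workers = workers or os.cpu_count() or 1
    bands = split_rows(height, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: render_band(rows, width, height), bands))
    # Bands come back in submission order, so the rows stay in order.
    return [row for part in parts for row in part]
```

Calling render(320, 240) uses whatever os.cpu_count() reports; passing workers=1 gives the single-threaded equivalent, and both produce the same image.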
I ran some informal benchmarks on the "Christmas Tree" fractal (the parameter file can be found on the web site under the Fractal of the Week for December 16, 2006). I opened the parameter file itself and noted the generation time in the Statistics window. I then rendered the same image with 4x4 anti-aliasing. The following results were achieved on the available machines:
Note:
FD 2.0b5 was compiled with CodeWarrior and is single-threaded.
FD 2.0 is compiled with gcc 4.0 and is multi-threaded.
PowerMac MDD Dual G4 1 GHz
FD 2.0b5: No Anti-Alias, 36 s; With Anti-Alias, 9 m 33 s (573 s)
FD 2.0: No Anti-Alias, 18 s; With Anti-Alias, 5 m 32 s (332 s)
MacBook Pro (Core Duo 1.67 GHz)
FD 2.0b5: No Anti-Alias, 37 s; With Anti-Alias, 9 m 57 s (597 s)
FD 2.0: No Anti-Alias, 6.7 s; With Anti-Alias, 2 m 1 s (121 s)
Mac Pro (Xeon 2.66 GHz x 4)
FD 2.0b5: No Anti-Alias, 19 s; With Anti-Alias, 5 m 9 s (309 s)
FD 2.0: No Anti-Alias, 2.4 s; With Anti-Alias, 45 s
The results are shown below in graphical form. After the
graphs I have some additional remarks.
[Graph: Elapsed time to render fractal image (shorter is better)]
Let's look for a moment at the results for the G4 only -- since
both the old and new FD (Fractal Domains) run natively
there, we can compare "apples to apples", so to speak, and
see the effect of multithreading.
For the "no anti-aliasing" case, the ratio of rendering
times for FD 2.0b5 and FD 2.0 is 36 seconds to 18 seconds,
or two to one. This is exactly what you would expect, since
2.0 uses two threads for calculation whereas 2.0b5 uses
one, and the computer has two processors.
For the "anti-aliasing" case, however, the ratio of 573 s
to 332 s is only about 1.7 to 1.
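The two G4 ratios can be checked with a few lines of arithmetic. The sketch below uses only the timings listed above; the "efficiency" figure (speed-up divided by the number of processors) is an added convenience measure, not something reported by Fractal Domains.

```python
G4_PROCESSORS = 2

# (FD 2.0b5 seconds, FD 2.0 seconds) on the Dual G4 1 GHz
g4_times = {
    "no anti-alias": (36.0, 18.0),
    "4x4 anti-alias": (573.0, 332.0),
}


def speedup(old_s, new_s):
    """How many times faster the new build is than the old one."""
    return old_s / new_s


def efficiency(old_s, new_s, processors):
    """Fraction of the ideal (linear) speed-up that was actually achieved."""
    return speedup(old_s, new_s) / processors


for case, (old_s, new_s) in g4_times.items():
    s = speedup(old_s, new_s)
    e = efficiency(old_s, new_s, G4_PROCESSORS)
    print(f"{case:15s} speed-up {s:.2f} to 1, efficiency {e:.0%}")
```

The plain render reaches the full two-to-one speed-up, while the anti-aliased render reaches about 86% of it.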
In both cases, optimum performance is not actually being
achieved. Fractal Domains is based on a framework in which
threading was inherently "cooperative", a legacy of the old
pre-OS X way of doing things. Preemptive threads were added
for computation, but the design could not be made entirely
preemptive because of the way the old framework is designed.
Supporting this old design therefore carries some overhead,
and that overhead happens to have a greater effect in the
code that implements the anti-aliased rendering.
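One way to see this extra overhead in the numbers is to compare how much longer the anti-aliased render takes than the plain one. The sketch below assumes that 4x4 anti-aliasing computes 16 sub-samples per pixel, so that the ideal cost would be about 16 times the plain render; that is an assumption about how Fractal Domains works, not something stated here. The timings are the ones listed above.

```python
SUBSAMPLES = 4 * 4  # assumed: 4x4 anti-aliasing = 16 sub-samples per pixel

# machine -> build -> (plain seconds, anti-aliased seconds)
times = {
    "Dual G4 1 GHz": {"FD 2.0b5": (36.0, 573.0), "FD 2.0": (18.0, 332.0)},
    "MacBook Pro": {"FD 2.0b5": (37.0, 597.0), "FD 2.0": (6.7, 121.0)},
    "Mac Pro": {"FD 2.0b5": (19.0, 309.0), "FD 2.0": (2.4, 45.0)},
}


def aa_cost_factor(plain_s, aa_s):
    """How many times longer the anti-aliased render takes."""
    return aa_s / plain_s


factors = {
    (machine, build): aa_cost_factor(*pair)
    for machine, builds in times.items()
    for build, pair in builds.items()
}

for (machine, build), factor in factors.items():
    print(f"{machine:14s} {build:9s} {factor:5.1f}x (ideal {SUBSAMPLES}x)")
```

On all three machines the single-threaded 2.0b5 stays close to 16x, while the multi-threaded 2.0 comes out between 18x and 19x, which is consistent with the anti-aliasing code paying a larger threading overhead.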
Moving on to the Intel machines, we can see that for the
MacBook Pro, non-anti-alias case, the ratio is 37 s to 6.7
s or 5.5 to 1. The speed-up has two sources: 2.0b5 is
handicapped both by being single-threaded and by needing to
run in emulation under Rosetta. If we assume that the
speed-up due to multi-threading is 2 to 1, there is an
additional speed-up of about 2.8 to 1 due to running native
on the Intel processor.
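The split into a threading factor and a native-code factor is simple division, worked out here from the unrounded times. The 2 to 1 threading gain is the assumption made in the text (borrowed from the G4 result), not a measured value.

```python
# MacBook Pro, no anti-aliasing
old_s = 37.0  # FD 2.0b5: single-threaded, running under Rosetta
new_s = 6.7   # FD 2.0: multi-threaded, native Intel code

THREADING_SPEEDUP = 2.0  # assumed, as on the dual G4

total_speedup = old_s / new_s
native_speedup = total_speedup / THREADING_SPEEDUP

print(f"total speed-up:          {total_speedup:.2f} to 1")
print(f"assumed threading share: {THREADING_SPEEDUP:.2f} to 1")
print(f"remaining native share:  {native_speedup:.2f} to 1")
```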
On the Mac Pro, non-anti-alias case, the ratio is 19 s to
2.4 s or 7.9 to 1. Although the Mac Pro has twice as many
cores as the MacBook Pro, this ratio did not itself double.
This is due to the limitations of the old design, as
mentioned above, which prevent Fractal Domains from
utilizing all cores close to 100%. In Activity Monitor,
which shows the CPU usage of individual processes, Fractal
Domains 2.0 never exceeds 300% (the maximum for a Mac Pro
would be 4 x 100% = 400%).
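As a rough cross-check, the timings can be used to estimate how many cores' worth of work the Mac Pro is actually getting. The sketch rests on two assumptions that are not measured here: that the threading gain on the MacBook Pro is exactly 2 to 1, and that the gain from running native rather than under Rosetta is the same on the Xeon as on the Core Duo.

```python
MAC_PRO_CORES = 4
MACBOOK_THREADING_SPEEDUP = 2.0  # assumed, as in the text

# No anti-aliasing: (FD 2.0b5 seconds, FD 2.0 seconds)
macbook_pro = (37.0, 6.7)
mac_pro = (19.0, 2.4)

macbook_speedup = macbook_pro[0] / macbook_pro[1]
mac_pro_speedup = mac_pro[0] / mac_pro[1]

# Native-code gain, assumed to carry over unchanged to the Mac Pro.
native_factor = macbook_speedup / MACBOOK_THREADING_SPEEDUP

# What is left of the Mac Pro speed-up is attributed to threading.
implied_threading = mac_pro_speedup / native_factor
implied_utilization = implied_threading / MAC_PRO_CORES

print(f"Mac Pro speed-up:           {mac_pro_speedup:.2f} to 1")
print(f"implied threading speed-up: {implied_threading:.2f} to 1")
print(f"implied core utilization:   {implied_utilization:.0%}")
```

Under those assumptions the implied threading speed-up comes out just under 3 to 1, which is in line with the 300% ceiling seen in Activity Monitor.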
Note that for the anti-alias case the performance ratio of
2.0 to 2.0b5 is 4.9 to 1 for the MacBook Pro and 6.9 to 1
for the Mac Pro. As in the case of the G4, the additional
oomph from multiple processors is somewhat less for
anti-alias rendering.
The redesigned program Fractal Domains X will not suffer
from these limitations and should achieve higher absolute
performance numbers and higher CPU utilization rates. This
will be especially important for the anti-alias case where
you need the extra speed the most.
Finally, we can compare the performance of the G4 with that
of the Intel machines, but this is not very meaningful since
we are comparing a four-year-old computer to the latest and
greatest Intel chips. It would be more interesting to run
the benchmark on a dual G5 tower, but unfortunately I don't
have one available at the time of this writing.
Since FD 2.0b5 had almost identical rendering times on the
1 GHz G4 and the MacBook Pro, and took roughly half as long
on the Mac Pro, you could at least draw the conclusion that
for someone with a machine in the class of the Dual 1 GHz
G4, running PowerPC-only apps on a MacBook Pro will not
result in much of a slowdown, and on a Mac Pro they will
actually run faster.
Even this conclusion is probably not justified based on
this one benchmark, since rendering fractals is far from an
ordinary application -- it is much more CPU-intensive than
most tasks and generally does very little in the way of
memory access. Of course, the latter fact would actually
favor the G4, since it suffers a large handicap compared to
the newer chips in accessing external memory, and Fractal
Domains doesn't make it do that much compared with more
typical applications.