ANZIAM J. 42 (E) ppC1328--C1355, 2000.
This paper describes the design, implementation and performance of
parallel direct dense symmetric-indefinite matrix factorisation
algorithms. These algorithms use the Bunch-Kaufman diagonal pivoting
method. The starting point is numerically identical to
LAPACK _sytrf() algorithm, but
out-performs zsytrf() by approximately 15% for large matrices on
the UltraSPARC family of processors. The first variant reduces
symmetric interchanges, particularly important for parallel
implementation, by taking into account the growth attained by any
preceding columns that did not require any interchanges. However, it
achieves the same growth bound.
The second variant uses a lookahead technique with heuristic methods to predict whether interchanges are required over the next block column; if so, the block column can be eliminated using modified Cholesky methods, which can yield both computational and communication advantages.
These algorithms yield best performance gains on `weakly indefinite' matrices (i.e. those which have generally large diagonal elements), which often arise from electro-magnetic field analysis applications. On UltraSPARC processors, the first variant generally achieves a 1--2% performance gain; the second is faster still for large matrices by 2% for complex double precision and 6% for double precision.
However, larger performance gains are observed on distributed memory machines, where symmetric interchanges are relatively more expensive. On a 16 node 300MHz UltraSPARC-based Fujitsu AP3000, the first variant achieved a 10-15% improvement for small-moderate sized matrices, decreasing to 7% for large matrices. For N=10000, it achieved a sustained speed of 5.6GFLOPs and a parallel speedup of 12.8.
Published 25 December, 2000
Last Modified: Wed Dec 20 15:15:33 2000