Tuning parameters
Hermitian to tridiagonal
Two basic strategies are available for the reduction to tridiagonal form:
1. Run a pipelined algorithm designed for general (rectangular) process grids.
2. Redistribute the matrix so that it is owned by a perfect square number of processes, perform a fast reduction to tridiagonal form, and redistribute the data back to the original process grid. This algorithm is essentially an evolution of the HJS tridiagonalization approach (see “Towards an efficient parallel eigensolver for dense symmetric matrices” by Bruce Hendrickson, Elizabeth Jessup, and Christopher Smith), which is described in detail in Ken Stanley’s dissertation, “Execution time of symmetric eigensolvers”.
The extra redistributions required by the second approach incur a small penalty, but the benefit from using a square process grid is usually quite significant. By default, HermitianTridiag() will run the standard algorithm (approach 1) unless the matrix is already distributed over a square process grid. The reasoning is that good performance of the square algorithm depends upon a “good” ordering of the square (say, \(\hat p \times \hat p\)) subgrid, though usually either a row-major or column-major ordering of the first \(\hat p^2\) processes suffices.
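As an illustration of the subgrid described above, the following sketch computes the largest \(\hat p\) with \(\hat p^2 \le p\) and maps one of the first \(\hat p^2\) ranks to its position under either ordering. The names LargestSquareDim, SubgridOrder, and SubgridCoords are hypothetical and exist only for this example; they are not library routines, and the library's actual rank mapping may differ.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not a library routine): the largest phat such that
// phat*phat <= p, i.e. the side length of the largest square subgrid.
int LargestSquareDim(int p)
{
    assert(p >= 1);
    int phat = 1;
    // Compare in 64-bit arithmetic so the square cannot overflow an int.
    while (static_cast<long long>(phat + 1) * (phat + 1) <= p)
        ++phat;
    return phat;
}

// Hypothetical ordering tag used only in this sketch.
enum class SubgridOrder { RowMajor, ColumnMajor };

// Map a rank in [0, phat*phat) to its (row, column) position in the
// phat x phat subgrid under the requested ordering.
std::pair<int, int> SubgridCoords(int rank, int phat, SubgridOrder order)
{
    assert(phat >= 1 && rank >= 0 && rank < phat * phat);
    if (order == SubgridOrder::RowMajor)
        return {rank / phat, rank % phat};
    return {rank % phat, rank / phat};
}
```

With 12 processes, for example, only the first 9 would take part in a 3 x 3 subgrid, and rank 5 would sit at position (1, 2) in row-major order or (2, 1) in column-major order.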
type HermitianTridiagApproach
HERMITIAN_TRIDIAG_NORMAL: Run the pipelined rectangular algorithm.
HERMITIAN_TRIDIAG_SQUARE: Run the square grid algorithm on the largest possible square process grid.
HERMITIAN_TRIDIAG_DEFAULT: If the given process grid is already square, run the square grid algorithm; otherwise use the pipelined non-square approach.
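The selection rule for the three settings can be sketched as a small function. This is only an illustration of the rule as stated here: the Approach enumeration and ResolveApproach are hypothetical stand-ins written for this example, not the library's own definitions.

```cpp
#include <cassert>

// Hypothetical mirror of the documented settings, for this sketch only.
enum class Approach { Normal, Square, Default };

// Resolve the requested setting into the algorithm that would actually run
// on a gridHeight x gridWidth process grid, following the rule in the text:
// Default picks the square algorithm only when the grid is already square.
Approach ResolveApproach(Approach requested, int gridHeight, int gridWidth)
{
    assert(gridHeight >= 1 && gridWidth >= 1);
    if (requested != Approach::Default)
        return requested;
    return (gridHeight == gridWidth) ? Approach::Square : Approach::Normal;
}
```

An explicit request is returned unchanged, so asking for the square algorithm on a 2 x 8 grid still resolves to the square algorithm (run on the largest square subgrid), while the default setting on that grid falls back to the pipelined algorithm.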
Note
A properly tuned HERMITIAN_TRIDIAG_SQUARE approach is almost always fastest, so it is worthwhile to test it with both the COLUMN_MAJOR and ROW_MAJOR subgrid orderings, as described below.
Note
The first algorithm depends heavily upon the performance of distributed Symv(), so users interested in maximizing the performance of the first algorithm will likely want to investigate different values for the local blocksize through the routine SetLocalSymvBlocksize<T>( int blocksize ); the default value is 64.
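One way to investigate different blocksizes is a simple sweep that times each candidate and keeps the fastest. The sketch below is a generic helper, not part of the library: PickBestBlocksize and the measureSeconds callback are hypothetical names, and the caller supplies the timing logic.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: evaluate each candidate blocksize with a
// caller-supplied timing function and return the one with the smallest
// measured time. Ties keep the earliest candidate.
int PickBestBlocksize(const std::vector<int>& candidates,
                      const std::function<double(int)>& measureSeconds)
{
    assert(!candidates.empty());
    int best = candidates.front();
    double bestTime = measureSeconds(best);
    for (std::size_t i = 1; i < candidates.size(); ++i)
    {
        const double t = measureSeconds(candidates[i]);
        if (t < bestTime)
        {
            bestTime = t;
            best = candidates[i];
        }
    }
    return best;
}
```

In practice, the callback would presumably set the blocksize with SetLocalSymvBlocksize<T>( int blocksize ) and then time a run of HermitianTridiag() on a representative matrix; how that timing is set up is an assumption left to the caller.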