07/06/2012: Elemental 0.75-p1¶
New functionality¶
SVD support through the bidiagonal QR algorithm. If libFLAME is linked, a high-performance QR algorithm will be used.
Pseudoinverses and polar decompositions through the new SVD routine
QR-based Dynamically-Weighted Halley iteration (QDWH) for computing the polar decomposition, with versions for both general and Hermitian matrices
Support for fast expansions of packed Householder reflectors for a few cases (i.e., those needed for QR and LQ decompositions)
Explicit QR and LQ decompositions
Cheap two-norm estimates
‘Norm’ now supports all DistMatrix distributions, instead of just [MC,MR]
DistMatrix now supports ‘viewing’ processes that do not actively own data; this makes temporarily distributing to a subset of processes (e.g., a perfect square) less of a hack
MakeHermitian, MakeSymmetric, and MakeReal were added
LUSolve was added for solving systems using an existing LU factorization, with or without partial pivoting
The routine Hetrmm, for forming one half of the Hermitian result L^H L or U U^H, was generalized to also support symmetric updates and the name was changed to Trtrmm
The routine Trdtrmm was added in order to aid in the inversion of symmetric/Hermitian-indefinite matrices and forms L^H inv(D) L or U inv(D) U^H (or the symmetric counterpart)
Performance improvements¶
Faster ApplyPackedReflectors implementations
Many variants of Gemm are now faster due to avoiding cache-unfriendly redistributions
Bug fixes¶
Fixed subtle issue in Householder reflection generation when the norm of the lower part of the vector was zero
Fixed namespacing complaints from new versions of GCC and Clang
Fixed mistakes in 1-2-1 and Wilkinson matrix generation
Fixed missing installation of FCMangle.h and cmake-dummy-lib
Fixed leakage of viewingGroup in the Grid destructor
Fixed mistake in parallel Adjoint and Transpose routines
Avoided bug in CMake’s enable_language OPTIONAL argument
Semantic changes¶
Shortened ‘SetLocalEntry’ and friends to the form ‘SetLocal’ in order to be more consistent with the distributed equivalent, ‘Set’
Expanded routines for extracting real and imaginary parts of complex data from the form ‘Real’ to ‘RealPart’
Shortened many redundant filenames
People involved¶
Robert van de Geijn, Field van Zee, and Gregorio Quintana Orti were involved in Elemental’s support for FLAME’s high-performance bidiagonal QR algorithm
Yuji Nakatsukasa, Gregorio Quintana Orti, and Robert van de Geijn were involved in the QDWH implementation
Bryan Marker noticed the cache-unfriendly redistributions in Gemm and Trsm
Jed Brown submitted patches for the FCMangle.h and cmake-dummy-lib installation issues, as well as the viewingGroup leakage in the Grid class
If I missed you, I apologize! Please let me know and I will add you!