TransposeAxpyContract

Perform \(B := \alpha \sum_i A_i^T + B\) or \(B := \alpha \sum_i A_i^H + B\), where the sum runs over the local data of every member of the process team that is redundantly assigned entries of \(A\) but not of \(B\). In the general case, each column and row of \(A\) are respectively distributed over the process sets \(U_0 \times U_1\) and \(V_0 \times V_1\), while each column and row of \(B\) are respectively distributed over \(U_0\) and \(V_0\); the result is then of the form

\[B := \alpha \sum_{i \in U_1 \times V_1} A_i^T + B\]

or

\[B := \alpha \sum_{i \in U_1 \times V_1} A_i^H + B.\]
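The arithmetic of the two formulas above can be illustrated with a minimal serial sketch. It is not the library's implementation and uses no Elemental types: `Dense`, `MaybeConj`, and `SerialTransposeAxpyContract` are hypothetical names introduced here, and the list `pieces` stands in for the contributions \(A_i\) that the real routine would gather from the processes in \(U_1 \times V_1\). The sketch also assumes that the `conjugate` flag selects the \(A_i^H\) form.

```cpp
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical column-major dense matrix, used only for this illustration.
template <typename T>
struct Dense {
    std::size_t rows, cols;
    std::vector<T> data;
    Dense(std::size_t m, std::size_t n) : rows(m), cols(n), data(m * n, T{}) {}
    T& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Conjugation is a no-op for real scalars...
template <typename T>
T MaybeConj(const T& x, bool /*conjugate*/) { return x; }

// ...and applies std::conj for complex scalars when requested.
template <typename T>
std::complex<T> MaybeConj(const std::complex<T>& x, bool conjugate) {
    return conjugate ? std::conj(x) : x;
}

// Serial analogue of B := alpha * sum_i A_i^T + B (or A_i^H if conjugate).
// Each entry of `pieces` plays the role of one A_i; this is an assumption
// made for illustration, not how the distributed routine stores its data.
template <typename T>
void SerialTransposeAxpyContract(T alpha,
                                 const std::vector<Dense<T>>& pieces,
                                 Dense<T>& B,
                                 bool conjugate = false) {
    for (const Dense<T>& A : pieces) {
        // A_i^T must have the same shape as B.
        if (A.rows != B.cols || A.cols != B.rows) {
            throw std::invalid_argument("each A_i^T must match the shape of B");
        }
        for (std::size_t j = 0; j < B.cols; ++j) {
            for (std::size_t i = 0; i < B.rows; ++i) {
                // Entry (i, j) of A_i^T is entry (j, i) of A_i.
                B(i, j) += alpha * MaybeConj(A(j, i), conjugate);
            }
        }
    }
}
```

With two \(2 \times 3\) pieces and a \(3 \times 2\) matrix \(B\), calling `SerialTransposeAxpyContract(2.0, pieces, B)` adds twice the sum of the transposed pieces to `B`. In the distributed routine the same accumulation would involve communication across the team rather than a loop over a local list.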

C++ API

void TransposeAxpyContract(T alpha, const ElementalMatrix<T> &A, ElementalMatrix<T> &B, bool conjugate = false)
void TransposeAxpyContract(T alpha, const BlockMatrix<T> &A, BlockMatrix<T> &B, bool conjugate = false)

C API

TODO

Python API

TODO