I am a computational physics postgrad student, working with libraries like ATLAS and MAGMA. I have an upper-triangular matrix that is the result of a Cholesky decomposition. I need to convert it into a symmetric, square matrix, where the elements of the lower triangle are given by
mat(j,i) = mat(i,j)
I have a naive routine in C++ that simply does the above in a loop, but it's incredibly slow. I believe it's so slow because the CPU cache is poorly utilised - element i,j is quite far from element j,i.
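For reference, here is a minimal sketch of the kind of loop I mean, followed by a cache-blocked (tiled) variant of the same copy. Both assume a dense n x n matrix stored in column-major order in a std::vector<double>, so element (i,j) lives at a[i + j*n]. The function names and the block size of 32 are illustrative assumptions, not taken from any library, and whether the tiled version actually helps would need measuring on the target machine.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Naive mirror: copy the strict upper triangle into the lower triangle.
// Assumes column-major storage: element (i,j) is a[i + j*n].
void symmetrize_naive(std::vector<double>& a, std::size_t n) {
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < j; ++i)
            a[j + i * n] = a[i + j * n];   // mat(j,i) = mat(i,j)
}

// Tiled variant (hypothetical helper): performs the same copy one
// bs x bs tile at a time, so the source tile and the destination tile
// can both stay in cache while they are being processed.
// bs = 32 is an assumed starting point, not a tuned value.
void symmetrize_blocked(std::vector<double>& a, std::size_t n,
                        std::size_t bs = 32) {
    for (std::size_t jb = 0; jb < n; jb += bs) {
        const std::size_t jend = std::min(jb + bs, n);
        // Only tiles on or above the diagonal hold source data.
        for (std::size_t ib = 0; ib <= jb; ib += bs) {
            const std::size_t iend = std::min(ib + bs, n);
            for (std::size_t j = jb; j < jend; ++j) {
                // Inside a diagonal tile, stop before the diagonal (i < j).
                const std::size_t ilim = std::min(iend, j);
                for (std::size_t i = ib; i < ilim; ++i)
                    a[j + i * n] = a[i + j * n];
            }
        }
    }
}
```

Both functions leave the diagonal and the upper triangle untouched and only write the lower triangle, so they should produce identical results for any block size.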
Is there a clever mathematical trick that I can use to optimise this loop? Alternatively, does anyone know of any functions in BLAS or similar libraries that can do this kind of operation?
Many thanks in advance for any advice you can give.