Kandah, Farah; Ward, Michael
College of Engineering and Computer Science
University of Tennessee at Chattanooga
The polyalgorithm library, originally designed in 1991-1993 by Robert Falgout, Jin Li, and Anthony Skjellum, includes fourteen dense matrix multiplication algorithms mapped onto two-dimensional process grids using the Message Passing Interface (MPI). The goal of this thesis is to optimize the performance of parallel, dense linear algebra algorithms by varying the algorithm as a function of problem size, shape, data layout, concurrency, and architecture. We integrate these algorithms with an intra-node BLAS DGEMM kernel designed by Thomas Hines (Tennessee Tech), which improves BLAS DGEMM performance in the fat-by-thin region of dense matrix multiplication. We add a rank-k-based SUMMA algorithm, which performs better than rank-1-based SUMMA. We study performance on two cluster systems and report the performance and improvements achieved. We compare and contrast our results with COSMA, a recent, highly optimized approach, and verify that COSMA, using optimal 3D grid decompositions, has significant advantages provided its preferred data layouts can be used.
M. S.; A thesis submitted to the faculty of the University of Tennessee at Chattanooga in partial fulfillment of the requirements of the degree of Master of Science.
Mathematical optimization; Multiplication, Complex
xv, 119 leaves
Nansamba, Grace, "Second-generation polyalgorithms for parallel dense-matrix multiplication" (2020). Masters Theses and Doctoral Dissertations.