Committee Chair
Skjellum, Anthony
Committee Member
Kandah, Farah; Ward, Michael
College
College of Engineering and Computer Science
Publisher
University of Tennessee at Chattanooga
Place of Publication
Chattanooga (Tenn.)
Abstract
The polyalgorithm library, originally designed between 1991 and 1993 by Robert Falgout, Jin Li, and Anthony Skjellum, includes fourteen dense matrix multiplication algorithms mapped onto two-dimensional process grids using the Message Passing Interface (MPI). The goal of this thesis is to optimize the performance of parallel dense linear algebra algorithms by varying the algorithm as a function of problem size, shape, data layout, concurrency, and architecture. We integrate these algorithms with an intra-node BLAS DGEMM kernel designed by Thomas Hines (Tennessee Tech), which improves BLAS DGEMM performance in the fat-by-thin region of dense matrix multiplication. We add a rank-k-based SUMMA algorithm, which performs better than rank-1-based SUMMA. We study performance on two cluster systems and report the performance and improvements achieved. We compare and contrast our results with COSMA, a recent, highly optimized approach, and confirm that COSMA, which uses optimal 3D grid decompositions, has significant advantages when its preferred data layouts can be used.
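The difference between rank-1 and rank-k SUMMA can be illustrated with a minimal serial sketch, assuming plain Python lists for the matrices. The function name `summa_serial` and the parameter `panel_width` are hypothetical, not taken from the polyalgorithm library, and the MPI row and column broadcasts over the process grid are omitted; only the outer-product update structure is shown. With `panel_width=1` each step is a rank-1 update; with a larger width each step is a rank-k update.

```python
def summa_serial(A, B, panel_width=1):
    """Serial emulation of SUMMA's update loop (hypothetical helper).

    C accumulates outer-product updates of rank `panel_width`:
    1 corresponds to rank-1 SUMMA, >1 to rank-k SUMMA.
    The MPI broadcasts of the distributed algorithm are not modeled.
    """
    m, kdim, n = len(A), len(B), len(B[0])
    if any(len(row) != kdim for row in A):
        raise ValueError("inner dimensions of A and B must match")
    if panel_width < 1:
        raise ValueError("panel_width must be positive")

    C = [[0.0] * n for _ in range(m)]
    # One iteration per panel: kdim / panel_width steps instead of kdim.
    for k0 in range(0, kdim, panel_width):
        k1 = min(k0 + panel_width, kdim)
        # In distributed SUMMA, the column panel A[:, k0:k1] would be
        # broadcast along process rows and the row panel B[k0:k1, :]
        # along process columns before this local update.
        for i in range(m):
            for j in range(n):
                C[i][j] += sum(A[i][p] * B[p][j] for p in range(k0, k1))
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(summa_serial(A, B, panel_width=1))  # rank-1 updates
print(summa_serial(A, B, panel_width=2))  # rank-2 updates, same result
```

Both calls produce the same product; what changes is the number of update steps. In a distributed setting, a wider panel means fewer broadcast rounds and lets each local update use a matrix-matrix kernel such as DGEMM instead of a rank-1 kernel, which is the usual explanation for why rank-k variants tend to outperform rank-1 SUMMA.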
Degree
M.S.; A thesis submitted to the faculty of the University of Tennessee at Chattanooga in partial fulfillment of the requirements of the degree of Master of Science.
Date
12-2020
Subject
Mathematical optimization; Multiplication, Complex
Document Type
Masters theses
DCMI Type
Text
Extent
xv, 119 leaves
Language
English
Rights
http://rightsstatements.org/vocab/InC/1.0/
License
http://creativecommons.org/licenses/by/4.0/
Recommended Citation
Nansamba, Grace, "Second-generation polyalgorithms for parallel dense-matrix multiplication" (2020). Masters Theses and Doctoral Dissertations.
https://scholar.utc.edu/theses/680
Department
Dept. of Computer Science and Engineering