Committee Chair

Skjellum, Anthony

Committee Member

Kandah, Farah; Ward, Michael

Department

Dept. of Computer Science and Engineering

College

College of Engineering and Computer Science

Publisher

University of Tennessee at Chattanooga

Place of Publication

Chattanooga (Tenn.)

Abstract

The polyalgorithm library, originally designed in 1991-1993 by Robert Falgout, Jin Li, and Anthony Skjellum, includes fourteen dense matrix multiplication algorithms mapped onto two-dimensional process grids using the Message Passing Interface (MPI). The goal of this thesis is to achieve optimized performance of parallel, dense linear algebra algorithms by varying the algorithm as a function of problem size, shape, data layout, concurrency, and architecture. We integrate these algorithms with an intra-node BLAS DGEMM kernel designed by Thomas Hines (Tennessee Tech), which improves DGEMM performance in the fat-by-thin region of dense matrix multiplication. We add a rank-k-based SUMMA algorithm, which performs better than rank-1-based SUMMA. We study performance on two cluster systems, and the results show the performance obtained and the improvements achieved. We compare and contrast our results with COSMA, a recent, highly optimized approach, and verify that COSMA, using optimal 3D grid decompositions, has significant advantages provided its preferred data layouts can be used.
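The rank-1 versus rank-k distinction in SUMMA can be sketched serially. In SUMMA, each step broadcasts part of A along process rows and part of B along process columns, and every process then applies a local update to its block of C. Rank-1 SUMMA moves one column of A and one row of B per step; a rank-k variant moves a panel of k of them, giving fewer broadcast rounds and a larger local update (a single DGEMM call in practice) per step. This is a minimal sketch under stated assumptions, not the library's implementation: it ignores the process grid and MPI entirely, and the names `summa_rank1`, `summa_rankk`, and the panel width `nb` are hypothetical.

```python
def summa_rank1(A, B, C):
    """C += A*B as a sequence of rank-1 (outer-product) updates.

    Hypothetical serial illustration: each iteration of the outer loop
    stands in for one SUMMA step (one column of A, one row of B).
    Returns the number of steps, i.e. broadcast rounds in a real SUMMA.
    """
    m, kk, n = len(A), len(B), len(B[0])
    steps = 0
    for p in range(kk):
        for i in range(m):
            a = A[i][p]
            for j in range(n):
                C[i][j] += a * B[p][j]
        steps += 1
    return steps


def summa_rankk(A, B, C, nb):
    """C += A*B as a sequence of rank-nb panel updates.

    Hypothetical serial illustration: each outer iteration stands in for
    one SUMMA step that moves a panel of up to nb columns of A and nb rows
    of B; the inner update is what a DGEMM call would do locally.
    """
    m, kk, n = len(A), len(B), len(B[0])
    steps = 0
    for p0 in range(0, kk, nb):
        p1 = min(p0 + nb, kk)  # the last panel may be narrower than nb
        for i in range(m):
            for j in range(n):
                C[i][j] += sum(A[i][p] * B[p][j] for p in range(p0, p1))
        steps += 1
    return steps
```

For a 2x3 by 3x2 product, both versions yield the same C, but the rank-1 form takes three steps while the rank-k form with `nb = 2` takes two; on a real process grid, that reduction in steps means fewer broadcast rounds and larger, more efficient local kernels.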

Degree

M. S.; A thesis submitted to the faculty of the University of Tennessee at Chattanooga in partial fulfillment of the requirements of the degree of Master of Science.

Date

12-2020

Subject

Mathematical optimization; Multiplication, Complex

Keyword

data distribution independence; distributed systems; fat-by-thin; high-performance computing; linear algebra libraries; polyalgorithms; parallel dense-matrix multiplication

Document Type

Masters theses

DCMI Type

Text

Extent

xv, 119 leaves

Language

English

Rights

http://rightsstatements.org/vocab/InC/1.0/

License

http://creativecommons.org/licenses/by/4.0/
