Publisher
University of Tennessee at Chattanooga
Place of Publication
Chattanooga (Tenn.)
Abstract
Kokkos provides advanced in-memory data structures, concurrency, and algorithms to support performance-portable C++ parallel programming across CPUs and GPUs. The Message Passing Interface (MPI) provides the most widely used message-passing model for inter-node communication. Many programmers use Kokkos and MPI together. In this thesis, Kokkos is integrated within an MPI implementation to obtain performance and functionality benefits both for the MPI implementation itself and for applications that use Kokkos and MPI together. For instance, it will be possible in this model to pass first-class Kokkos objects directly to extended C++-based MPI APIs. In particular, this integrated model is realized in ExaMPI, a C++17-based subset implementation of MPI-4 developed at UTC with collaborators. Working with C++-friendly APIs and Kokkos extensions, we show examples of the functionality and performance benefits. We explain why direct use of Kokkos within certain parts of the MPI implementation is crucial to gaining performance in addition to expressivity. We also motivate why making Kokkos memory spaces visible to the MPI implementation generalizes the idea of “CPU memory” and “GPU memory” in ways that allow further optimizations on heterogeneous Exascale architectures. Besides showing the current state of the prototype, we describe future goals and show how these mesh both with a possible future C++ API for MPI-5 and with the potential for accelerating MPI on architectures that incorporate accelerators.
Document Type
posters
Language
English
Rights
http://rightsstatements.org/vocab/InC/1.0/
License
http://creativecommons.org/licenses/by-nc-nd/4.0/
Recommended Citation
Suggs, Evan Drake, "Kokkos-Enhanced ExaMPI: Modern Parallel Programming for Exascale". ReSEARCH Dialogues Conference proceedings. https://scholar.utc.edu/research-dialogues/2023/proceedings/4.
Kokkos-Enhanced ExaMPI: Modern Parallel Programming for Exascale