Publisher

University of Tennessee at Chattanooga

Place of Publication

Chattanooga (Tenn.)

Abstract

Kokkos provides advanced in-memory data structures, concurrency, and algorithms to support performance-portable C++ parallel programming across CPUs and GPUs. The Message Passing Interface (MPI) provides the most widely used message-passing model for inter-node communication. Many programmers use Kokkos and MPI together. In this thesis, Kokkos is integrated within an MPI implementation to obtain performance and functionality benefits both for the MPI implementation itself and for applications that use Kokkos and MPI together. For instance, this model makes it possible to pass first-class Kokkos objects directly to extended C++-based MPI APIs. In particular, efforts to achieve this type of integrated model are expressed using ExaMPI, a C++17-based subset implementation of MPI-4 developed at UTC with collaborators. Working with C++-friendly APIs and Kokkos extensions, we show examples of the functionality and performance benefits. We explain why direct use of Kokkos within certain parts of the MPI implementation is crucial to gaining performance in addition to expressivity. We also motivate why making Kokkos memory spaces visible to the MPI implementation generalizes the idea of “CPU memory” and “GPU memory” in ways that enable further optimizations on heterogeneous Exascale architectures. Besides showing the current state of the prototype, we describe future goals and show how they mesh both with a possible future C++ API for MPI-5 and with the potential for accelerating MPI on architectures that incorporate accelerators.

Document Type

posters

Language

English

Rights

http://rightsstatements.org/vocab/InC/1.0/

License

http://creativecommons.org/licenses/by-nc-nd/4.0/

Kokkos-Enhanced ExaMPI: Modern Parallel Programming for Exascale
