Karman, Steve L., Jr.
Newman, James C. III; Park, Michael A.; Tanis, Craig R.
College of Engineering and Computer Science
University of Tennessee at Chattanooga
Automatic and parallel mesh generation has been highlighted as a bottleneck for large-scale automated Computational Fluid Dynamics (CFD) analysis. The desire for large-scale automated CFD is driven by the growing computational capabilities of large-scale supercomputers. Unfortunately, as compute clusters grow in size, they also suffer more failures. Left unchecked, the increased frequency of failures may stymie any efforts to fully utilize these machines. This work aims to tackle one component required for automated large-scale engineering analysis by developing a fault-tolerant mesh generator. The mesh generator uses a novel communication layer built on the ZeroMQ transport layer and is made fault tolerant through an integrated in-memory checkpoint and recovery strategy. Benefits of using in-memory checkpoints vs. traditional on-disk checkpoints are discussed. By relying on in-memory checkpointing, the mesh generator is demonstrated to be capable of generating Cartesian meshes in parallel. The generator continues to operate even while the compute cluster it is running on suffers failures. The generator is shown to be high performing, including being capable of generating an 8.6 billion element mesh in just over 1 minute while creating multiple in-memory checkpoints.
This work represents not only long hours at the keyboard, but a rich array of conversations, encouragement, triumphs, and failures. No man is an island. First, I would like to thank my advisor, Dr. Steve Karman, for his inexhaustible patience and guidance. Additionally, I would like to thank the committee members: Dr. Park, Dr. Newman, and Dr. Tanis. Each helped me keep an eye on the target. I would like to thank the Computational Aerosciences Branch at NASA Langley Research Center for the support I received, including access to the compute hardware that was critical to conducting this research. But beyond funding and computer resources, this work would not have been possible without the invigorating conversations, mentoring, and culture of excellence of this fantastic group. I also want to thank the UTC SimCenter for the support throughout my graduate work, and each of the professors with a contagious excitement for high-fidelity simulations. A special thanks goes out to Kim Sapp for all the help navigating between two organizations. A big thank you to Judith Hill of the Scientific Computing Group at Oak Ridge National Laboratory for the conversations and insights into current and future hardware on leadership-class computer systems.
Ph. D.; A dissertation submitted to the faculty of the University of Tennessee at Chattanooga in partial fulfillment of the requirements of the degree of Doctor of Philosophy.
Numerical grid generation (Numerical analysis)
xvii, 153 leaves
O'Connell, Matthew D., "A fault tolerant grid generation technique" (2016). Masters Theses and Doctoral Dissertations.