Perfector

EP Analytics’ Perfector (short for Performance Inspector) helps maximize the value of each dollar spent on large-scale HPC system procurements. It does so by deploying ultra-lightweight tools that automatically analyze the communication, computation, and input/output (I/O) behavior of large HPC codes. The results are presented to code developers and system administrators as intuitive, actionable viewgraphs. The key features of Perfector are highlighted below.

Collecting performance data is transparent to the user: it requires only a few simple changes to the job script (e.g., loading a module).

Load imbalance (i.e., uneven distribution of work across compute resources) is often one of the main factors that impede the scalability of HPC applications. Perfector collects per-MPI-rank performance statistics that expose load imbalance, also taking into account the time spent in implicit synchronization during MPI collective operations.
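To make the idea concrete, here is a minimal sketch of one common way to quantify load imbalance from per-rank timings. It is not Perfector's implementation: the metric, the function name `load_imbalance`, and the per-rank numbers are all assumptions chosen for illustration.

```python
from statistics import mean

def load_imbalance(busy_times):
    """Return (max - mean) / max of per-rank busy times.

    0.0 means perfectly balanced; values near 1.0 mean one rank does
    almost all of the work. This is one common definition, assumed here.
    """
    longest = max(busy_times)
    if longest == 0:
        return 0.0
    return (longest - mean(busy_times)) / longest

# Hypothetical per-rank times in seconds: (compute, waiting inside collectives).
# Waiting inside a collective is the "implicit synchronization" time.
ranks = {0: (8.0, 2.0), 1: (6.0, 4.0), 2: (10.0, 0.0), 3: (4.0, 6.0)}

compute = [c for c, _ in ranks.values()]
waiting = [w for _, w in ranks.values()]

imbalance = load_imbalance(compute)
# Share of total rank-time lost to waiting on slower ranks.
sync_fraction = sum(waiting) / (sum(compute) + sum(waiting))

print(f"imbalance = {imbalance:.2f}, sync fraction = {sync_fraction:.2f}")
```

In this made-up data set every rank finishes at the same moment, so the time the faster ranks spend waiting in collectives accounts for exactly the imbalance in compute time. That is why measuring implicit synchronization is useful when diagnosing imbalance.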

MPI Profile

Perfector can generate an execution profile of an HPC application; shown here is the breakdown of overall application time into computation time and time spent in communication events. The latter is further broken down to show the time spent in implicit synchronization.

MPI Profile - Deep Dive

A deep dive into the MPI profiles of specific ranks. Two ranks (9 and 255) show vastly different profiles, and Perfector's data helps analyze the root causes.

MPI "Hot" Sites

Perfector can collect information to characterize “hot” MPI call sites, i.e., the specific call sites in the source code that account for a majority of the time spent in communication. In the figure, func_10 accounts for 26% of the total communication time.
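The ranking behind such a view can be sketched as follows. This is an illustrative assumption, not Perfector's method: the function `hot_sites`, the 50% threshold, and all timings are hypothetical, and only the name func_10 and its 26% share echo the figure.

```python
def hot_sites(site_times, threshold=0.5):
    """Return the top call sites whose cumulative share of communication
    time first reaches `threshold`, as (site, share) pairs sorted by share."""
    total = sum(site_times.values())
    ranked = sorted(site_times.items(), key=lambda kv: kv[1], reverse=True)
    hot, cumulative = [], 0.0
    for site, seconds in ranked:
        if cumulative >= threshold * total:
            break
        hot.append((site, seconds / total))
        cumulative += seconds
    return hot

# Hypothetical communication time per call site, in seconds.
site_times = {
    "func_10": 26.0, "func_3": 20.0, "func_7": 16.0,
    "func_1": 14.0, "func_5": 12.0, "func_8": 12.0,
}

hot = hot_sites(site_times)
for site, share in hot:
    print(f"{site}: {share:.0%} of communication time")
```

With these numbers, three call sites together cover more than half of the communication time, which is the kind of short list a developer would inspect first.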