Phil Miller

San Diego Metropolitan Area
1K followers 500+ connections

About

I'm available and eager to help with any of the following:

Software performance…

Services

Experience

  • Lynker Graphic
  • -

    Urbana-Champaign, Illinois Area

  • -

    Champaign, IL

Education

Publications

  • Position Paper: A Multi-resolution Emulation + Simulation Methodology

    MODSIM 2013

    As we design exascale applications and machines, it becomes important to be able to analyze and experiment with alternate designs of both machines and applications. These experiments have to be done before the machines are built, since it will be too expensive to build a large number of alternate designs. One of the challenges in this process is how to represent application behavior in such machines. For analyzing network performance via simulations, for example, one can use pre-designed injection patterns, but they do not capture the feedback that occurs naturally in applications: if an incoming message is late, the ordering of events may change, and outgoing message injection will also change. Achieving a high-fidelity simulation is therefore challenging. One method that has shown promise is that of emulation-followed-by-simulation: one carries out a full-scale emulation of the application with the correct number of nodes and control threads, facilitated by some overdecomposition-based system such as Charm++ [1], FG-MPI [2], or AMPI [3]. The emulation captures dependencies between sequential computations and remote data in traces. The traces generated by emulation can then be fed to a multi-component simulator, where a variable-resolution simulation can be carried out to predict performance and other attributes. We advocate this methodology and elaborate on research challenges involved in following it in exascale design. At exascale, we expect the components, which are pluggable entities similar to those used in existing frameworks such as BigSim [4, 5] and SST [6], to simulate network, resilience support, power management, thermal constraints, operating system, and file system. In addition, the adaptive runtime system, essential for scalable execution at exascale, needs to be (and can be) simulated in detail, with realistic code and strategies, in order to attain high fidelity.

  • 'Cool' Load Balancing for High Performance Computing Data Centers

    IEEE Transactions on Computers

    As we move to exascale machines, both peak power demand and total energy consumption have become prominent challenges. A significant portion of that power and energy consumption is devoted to cooling, which we strive to minimize in this work. We propose a scheme based on a combination of limiting processor temperatures using Dynamic Voltage and Frequency Scaling (DVFS) and frequency-aware load balancing that reduces cooling energy consumption and prevents hot spot formation. Our approach is particularly designed for parallel applications, which are typically tightly coupled, and tries to minimize the timing penalty associated with temperature control. This paper describes results from experiments using five different CHARM++ and MPI applications with a range of power and utilization profiles. They were run on a 32-node (128-core) cluster with a dedicated air conditioning unit. The scheme is assessed based on three metrics: the ability to control processors’ temperature and hence avoid hot spots, minimization of timing penalty, and cooling energy savings. Our results show cooling energy savings of up to 63%, with a timing penalty of only 2–23%.

  • Using Shared Arrays in Message-Driven Parallel Programs

    International Workshop on High-Level Parallel Programming Models and Supportive Environments at IPDPS (HIPS) 2011

    Superseded by journal version in Parallel Computing

    This paper describes a safe and efficient combination of the object-based message-driven execution and shared array parallel programming models. In particular, we demonstrate how this combination engenders the composition of loosely coupled parallel modules safely accessing a common shared array. That loose coupling enables both better flexibility in parallel execution and greater ease of implementing multi-physics simulations. As a case study, we describe how the parallelization of a new method for molecular dynamics simulation benefits from both of these advantages. We also describe a system of typed handle objects that embed some of the determinacy constraints of the Multiphase Shared Array programming model in the C++ type system, to catch some violations at compile time. The combined programming model communicates in terms of these handles as a natural means of detecting and preventing errors.

  • PGAS in the message-driven execution model

    1st Workshop on Asynchrony in the PGAS Programming Model (APGAS)

    Asynchrony is increasingly important for high performance on modern parallel machines. A common approach to providing asynchrony in PGAS languages is to add additional language constructs to support asynchronous execution. In this paper we describe Multiphase Shared Arrays (MSA), a restricted PGAS programming model that takes the opposite approach, layering PGAS semantics over a fundamentally asynchronous runtime environment. We sidestep many of the difficulties of asynchronous programming through a discipline that offers desirable safety properties while exposing opportunities for optimization at multiple levels. We retain generality by offering composability with general purpose parallel programming models.

Projects

  • National Water Model NextGen Framework

  • WarpX

    Contribute to simulation and analysis capabilities desired by my client, Modern Electron. Focus on C++/Python interfacing, extending physics capabilities, and enhancing performance for their use cases that diverged substantially from the upstream team's focus.

  • CHIME

    Contribute to accuracy, testing and validation, performance, and overall capabilities in a rapid response project to support hospitals and public health officials in forecasting the magnitude of short-term impacts they could expect in the early stages of the COVID pandemic.

Languages

  • English

Organizations

  • ACM, IEEE, SIAM
