On Heterogeneous Compute and Memory Systems

Computer systems are at a crossroads. Instead of relying on low-level device improvements from the semiconductor industry, processors are increasingly heterogeneous, using specialized accelerators to improve performance and energy efficiency. Simultaneously, big data is enabling new applications that consume terabytes of data in real time, and the Internet of Things is driving further data growth by enabling developers to consume data from millions of devices and users. The increasing hardware heterogeneity driven by these technology trends places a significant burden on application programmers. For instance, with current interfaces, developers must manually manage all aspects of accelerator computation, including explicitly moving data even when physical memory is shared between devices.

In this thesis, we increase the logical integration of physically integrated on-chip accelerators by extending conventional CPU properties to accelerators. We specifically target on-die graphics processing units (GPUs), which are found in most products from AMD and Intel and in most mobile systems, and which are increasingly used for general-purpose computation. Logical integration between the CPU and GPU simplifies programming these heterogeneous devices. We leverage the logically integrated systems enabled by this research to improve the performance of big-data applications. We show that using integrated GPUs for analytic database workloads can increase performance and decrease energy consumption.

We also evaluate the efficacy of designing systems with heterogeneous memory in addition to heterogeneous computational units. We show that including high-bandwidth 3D-stacked DRAM can significantly improve the performance and reduce the energy consumed for analytic database workloads and other bandwidth-constrained workloads. Finally, we propose a new metric, access amplification, to help system designers reason about the best policies for using heterogeneous memory as a hardware-managed cache. Using access amplification, we design an adaptive victim cache policy which increases the performance of a DRAM cache compared to current designs.

@phdthesis{lowepower2017heterogeneous,
    author = {Jason Lowe-Power},
    title = {On Heterogeneous Compute and Memory Systems},
    school = {University of Wisconsin, Madison},
    year = {2017},
}