HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads

Pouya Fotouhi, Marjan Fariborz, Roberto Proietti, Jason Lowe-Power, Venkatesh Akella, S. J. Ben Yoo. ISC-HPC 2021.

Abstract

We propose a new architecture called HTA for high-throughput irregular HPC applications with little data reuse. HTA reduces contention within the memory system by using a partitioned memory controller that is amenable to 2.5D implementation using silicon photonics. In terms of scalability, HTA supports 4× more compute units than state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces queuing latency by 10% to 30% and reduces the variability in memory access latency by 10% to 60%. HTA also reduces the L1 miss penalty by 2.3× to 5× relative to GPUs. Compared to a multi-GPU system with the same number of compute units, our simulation results show that HTA can provide up to 2× speedup.

@inproceedings{10.1007/978-3-030-78713-4_10,
    author = {Fotouhi, Pouya and Fariborz, Marjan and Proietti, Roberto and Lowe-Power, Jason and Akella, Venkatesh and Yoo, S. J. Ben},
    title = {HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads},
    year = {2021},
    isbn = {978-3-030-78712-7},
    publisher = {Springer-Verlag},
    address = {Berlin, Heidelberg},
    url = {https://doi.org/10.1007/978-3-030-78713-4_10},
    doi = {10.1007/978-3-030-78713-4_10},
    booktitle = {High Performance Computing: 36th International Conference, ISC High Performance 2021, Virtual Event, June 24 -- July 2, 2021, Proceedings},
    pages = {176--194},
    numpages = {19}
}
