Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems
Paper on Springer
Local Paper Download
Code available on GitHub
We propose a new data structure called CachedEmbeddings for efficiently training large scale deep learning recommendation models (DLRMs) on heterogeneous (DRAM + non-volatile) memory platforms. CachedEmbeddings implements an implicit software-managed cache with data movement optimizations, integrated with the Julia programming framework, to accelerate large scale DLRM implementations containing multiple sparse embedding table operations. In particular, we show an implementation that is 1.4X to 2X faster than the best known Intel CPU based implementations on state-of-the-art DLRM benchmarks on a real heterogeneous memory platform from Intel, and 1.32X to 1.45X faster than Intel's 2LM implementation, which treats DRAM as a hardware-managed cache.
CachedEmbeddings builds on top of CachedArrays.
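The core idea is to keep the full embedding tables in slow (non-volatile) memory while maintaining a software-managed cache of hot rows in DRAM, promoting rows into the cache on access. The Julia sketch below illustrates that idea under stated assumptions: the names (`FastSlowTable`, `lookup!`) and the round-robin eviction policy are illustrative, not the actual CachedEmbeddings/CachedArrays API.

```julia
# A minimal sketch of the software-managed cache idea behind CachedEmbeddings.
# All names here are hypothetical, not the real CachedEmbeddings/CachedArrays API.

struct FastSlowTable
    slow::Matrix{Float32}          # full embedding table; stands in for NVM-resident data
    fast::Matrix{Float32}          # small cache of hot rows; stands in for DRAM
    slot_of::Dict{Int,Int}         # table row index -> cache slot, if cached
    row_of::Vector{Int}            # cache slot -> row it currently holds (0 = empty)
    clock::Base.RefValue{Int}      # round-robin eviction pointer (a toy policy)
end

function FastSlowTable(nrows::Int, dim::Int, ncached::Int)
    slow = randn(Float32, dim, nrows)          # feature dim x rows (column major)
    fast = zeros(Float32, dim, ncached)
    FastSlowTable(slow, fast, Dict{Int,Int}(), zeros(Int, ncached), Ref(0))
end

# Return the embedding vector for `row`, promoting it into the fast cache on a
# miss and evicting round-robin when the cache is full.
function lookup!(t::FastSlowTable, row::Int)
    slot = get(t.slot_of, row, 0)
    if slot == 0                               # cache miss
        t.clock[] = mod1(t.clock[] + 1, size(t.fast, 2))
        slot = t.clock[]
        victim = t.row_of[slot]
        victim != 0 && delete!(t.slot_of, victim)   # evict previous occupant
        copyto!(view(t.fast, :, slot), view(t.slow, :, row))
        t.slot_of[row] = slot
        t.row_of[slot] = row
    end
    return view(t.fast, :, slot)               # served from fast memory
end

# Usage: gather embeddings for a sparse batch of indices.
table = FastSlowTable(10_000, 64, 256)
batch = rand(1:10_000, 32)
emb = reduce(hcat, (lookup!(table, i) for i in batch))   # 64 x 32 dense output
```

A real implementation additionally has to batch data movement, manage concurrency, and integrate with the training loop; see the paper and the GitHub repository for how CachedEmbeddings does this.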
@inproceedings{hildebrand23cachedembeddings,
author = {Hildebrand, Mark and Lowe-Power, Jason and Akella, Venkatesh},
title = {Efficient Large Scale DLRM Implementation On Heterogeneous Memory Systems},
year = {2023},
isbn = {978-3-031-32040-8},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
url = {https://doi.org/10.1007/978-3-031-32041-5_3},
doi = {10.1007/978-3-031-32041-5_3},
booktitle = {High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings},
pages = {42–61},
numpages = {20},
location = {Hamburg, Germany}
}