# **Toward Rethinking the Management Techniques in Emerging Memory Systems**



### **Motivation**

- HPC systems rely on heterogeneous memories.
  - Examples: Intel's Knights Landing, Cascade Lake, and Sapphire Rapids.
  - In these systems, fast memories can be used as a **DRAM cache** for slow memories.
- Disaggregated memory systems will also use local DRAM as a cache for remote memory.
- We need to rethink memory management techniques.
  - However, the research community lacks an accurate model of these systems.



We extend gem5 to enable design space exploration of future heterogeneous/disaggregated memory systems. We add support for:

1. **DRAM cache**
2. HBM2 interface and controller
3. Modular memory controller design
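The idea behind a modular controller is to keep request queuing and scheduling in the controller while device timing lives in a pluggable interface. The sketch below illustrates that split with two hypothetical classes, `DeviceInterface` and `SimpleController`; they are not gem5's classes, and the latencies are placeholder values rather than real DDR4 or NVM timings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DeviceInterface:
    """Hypothetical device model that only supplies fixed access latencies (ns)."""
    name: str
    read_ns: float
    write_ns: float

    def access_ns(self, is_write: bool) -> float:
        return self.write_ns if is_write else self.read_ns


@dataclass
class SimpleController:
    """Hypothetical controller: owns the request queue, delegates timing to a device."""
    device: DeviceInterface
    queue: List[Tuple[int, bool]] = field(default_factory=list)

    def enqueue(self, addr: int, is_write: bool) -> None:
        self.queue.append((addr, is_write))

    def drain(self) -> float:
        """Serve queued requests one at a time (FIFO); return total busy time in ns."""
        busy = sum(self.device.access_ns(is_write) for _, is_write in self.queue)
        self.queue.clear()
        return busy


# The same controller logic is reused unchanged with two different device models.
# Latencies are placeholders, not DDR4 or NVM datasheet numbers.
for device in (DeviceInterface("DDR4", 15.0, 15.0), DeviceInterface("NVM", 150.0, 500.0)):
    ctrl = SimpleController(device)
    ctrl.enqueue(0x1000, is_write=False)
    ctrl.enqueue(0x2000, is_write=True)
    print(device.name, ctrl.drain())  # DDR4 30.0, then NVM 650.0
```

Swapping the interface changes timing without touching the controller code, which is the property a modular design aims for.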

We refactored gem5's memory controller to improve its modularity, as follows:



### A. Dedicated DRAM Cache Controller

#### 1. Unified Cache/Memory Controller (UDCC)

- Connection: shared bus
- Models Intel's Cascade Lake
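To make the lookup step concrete, the sketch below models a DRAM cache tag check, assuming a direct-mapped, write-back, write-allocate organization with 64-byte blocks, similar to how Cascade Lake's Memory Mode is commonly described. It splits an address into a set index and a tag and classifies each request as a hit, a clean miss, or a dirty miss. The class and field names are hypothetical, and this is not the UDCC's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BLOCK_BYTES = 64  # assumed cache block size


@dataclass
class Line:
    tag: int
    dirty: bool = False


class DirectMappedDramCache:
    """Toy direct-mapped, write-back, write-allocate DRAM cache (tags only, no data)."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        self.lines: Dict[int, Line] = {}  # set index -> resident line

    def _split(self, addr: int) -> Tuple[int, int]:
        block = addr // BLOCK_BYTES
        return block % self.num_sets, block // self.num_sets  # (set index, tag)

    def access(self, addr: int, is_write: bool) -> str:
        index, tag = self._split(addr)
        line = self.lines.get(index)
        if line is not None and line.tag == tag:
            outcome = "hit"
        elif line is not None and line.dirty:
            outcome = "miss_dirty"  # dirty victim must be written back to main memory
        else:
            outcome = "miss_clean"  # empty set or clean victim: just overwrite it
        if outcome != "hit":
            line = Line(tag)        # allocate on both read and write misses
            self.lines[index] = line
        if is_write:
            line.dirty = True
        return outcome


cache = DirectMappedDramCache(num_sets=4)
print(cache.access(0, is_write=True))        # miss_clean
print(cache.access(0, is_write=False))       # hit
print(cache.access(4 * 64, is_write=False))  # miss_dirty: same set, dirty victim
```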



### B. Decoupled DRAM Cache Policy Manager



### **Performance of HPC Applications**

- Workloads: NAS Parallel Benchmarks (**NPB**) and the GAP Benchmark Suite (**GAPBS**).
- We report millions of instructions per second (MIPS) and DRAM cache misses per thousand instructions (MPKI).
- The UDCC was configured to model Intel's Cascade Lake.
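Both reported metrics are simple ratios of simulation counters. The helper functions below show the arithmetic; the counter values are hypothetical and do not come from the poster's results.

```python
def mips(instructions: int, seconds: float) -> float:
    """Millions of instructions per second."""
    return instructions / seconds / 1e6


def mpki(misses: int, instructions: int) -> float:
    """DRAM cache misses per thousand instructions."""
    return misses / (instructions / 1000)


# Hypothetical counts, e.g. taken from a simulator's statistics output.
insts, sim_seconds, dram_cache_misses = 500_000_000, 0.25, 1_200_000
print(mips(insts, sim_seconds))        # 2000.0
print(mpki(dram_cache_misses, insts))  # 2.4
```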

Maryam Babaie, Ayaz Akram, Jason Lowe-Power; DArchR Research Group, Computer Science Department, University of California, Davis

gem5 DRAM Cache Support

- Tightly coupled DRAM cache and main memory
- Supports multiple cache architectures in parallel

[Figure: block diagram with a far memory controller and a far memory interface]



#### 2. Disaggregated DRAM Cache Controller

- Flexible combination of interfaces
- Connection: configurable link
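To first order, a configurable link between the DRAM cache and far memory can be described by a fixed propagation latency plus a serialization delay set by its bandwidth. The `Link` class below is a hypothetical sketch of that first-order model; it is not how the gem5 link is implemented, and the parameter values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """First-order link model: fixed latency plus serialization at a given bandwidth."""
    latency_ns: float
    bytes_per_ns: float      # 1 byte/ns == 1 GB/s
    free_at_ns: float = 0.0  # time at which the link finishes serializing its last packet

    def send(self, now_ns: float, size_bytes: int) -> float:
        """Return the arrival time of a packet injected at now_ns."""
        start = max(now_ns, self.free_at_ns)  # wait if the link is still busy
        self.free_at_ns = start + size_bytes / self.bytes_per_ns
        return self.free_at_ns + self.latency_ns


link = Link(latency_ns=200.0, bytes_per_ns=16.0)  # hypothetical: 0.2 µs, 16 GB/s
print(link.send(0.0, 64))  # 204.0
print(link.send(0.0, 64))  # 208.0 (queued behind the first packet)
```

A read miss that crosses such a link pays the latency on both the request and the response path.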



### Verification

- Left: peak bandwidth of the DRAM cache vs. gem5's default memory controller (DMC).
- Right: access amplification of our model vs. real hardware [1].
  - Write\_Misses case: real hardware does not merge write-fills and data writes, whereas our model does.
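Access amplification counts how many device accesses (DRAM cache plus far memory) a single demand request generates. The sketch below tabulates such counts for a direct-mapped, write-back DRAM cache and exposes the Write\_Misses difference as a `merge_write_fill` flag. The per-case accounting is an assumption of this sketch (one DRAM read for every tag check, and write misses still fetch the block from far memory), not the measured values from [1] or the output of the gem5 model.

```python
from typing import Dict

KINDS = ("read_hit", "read_miss_clean", "read_miss_dirty",
         "write_hit", "write_miss_clean", "write_miss_dirty")


def accesses(kind: str, merge_write_fill: bool) -> Dict[str, int]:
    """Device accesses caused by one demand request under this sketch's assumptions."""
    op, outcome = kind.split("_", 1)  # e.g. "write", "miss_clean"
    counts = {"dram_rd": 1, "dram_wr": 0, "far_rd": 0, "far_wr": 0}  # tag-check read
    if outcome != "hit":
        counts["far_rd"] += 1         # fetch the missing block
        counts["dram_wr"] += 1        # fill it into the DRAM cache
        if outcome == "miss_dirty":
            counts["far_wr"] += 1     # write back the dirty victim
    if op == "write" and (outcome == "hit" or not merge_write_fill):
        counts["dram_wr"] += 1        # separate write of the demand data
    return counts


def amplification(kind: str, merge_write_fill: bool = True) -> int:
    """Total device accesses per demand access."""
    return sum(accesses(kind, merge_write_fill).values())


for kind in KINDS:
    print(f"{kind:17} merged={amplification(kind, True)} "
          f"unmerged={amplification(kind, False)}")
```

Only the write-miss rows change with the flag, which mirrors where the model and the hardware differ.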




With a DRAM cache, most workloads performed worse than with DRAM or NVM alone as main memory, due to the high miss rate caused by the rigid cache architecture [1].





### **Performance of Alloy DRAM Cache**

- The Decoupled DRAM Cache Policy Manager was configured to model Intel's Cascade Lake architecture, without partial writes.
- DRAM cache interfaces: original Alloy vs. Alloy using ECC bits.
- Main memory interface: DDR4.
- Results show up to 20% bandwidth degradation for original Alloy compared to Alloy using ECC bits.
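In the Alloy cache design, the tag and data of a block are stored together as one tag-and-data unit (TAD) in a direct-mapped organization, so a single DRAM read returns both. The sketch below packs a TAD and contrasts the bytes moved per lookup when the tag travels on the data lanes (original Alloy) with the case where it rides in the ECC bits. The 8-byte tag size and all names are assumptions; the sketch only counts bytes and does not model DRAM burst timing, so it is not meant to reproduce the measured 20% figure.

```python
from dataclasses import dataclass

DATA_BYTES = 64  # cache block size
TAG_BYTES = 8    # assumed per-block tag/metadata size


def pack_tad(tag: int, data: bytes) -> bytes:
    """Original Alloy-style TAD: tag stored next to the data block in one unit."""
    if len(data) != DATA_BYTES:
        raise ValueError("data must be exactly one cache block")
    return tag.to_bytes(TAG_BYTES, "little") + data


def tad_hit(tad: bytes, want_tag: int) -> bool:
    """One read returns the whole TAD, so the tag check needs no extra access."""
    return int.from_bytes(tad[:TAG_BYTES], "little") == want_tag


@dataclass(frozen=True)
class AlloyVariant:
    name: str
    tag_in_ecc: bool  # True: tag travels in the ECC bits, not on the data lanes

    def bytes_per_lookup(self) -> int:
        """Bytes moved on the data lanes to read one block plus its tag."""
        return DATA_BYTES if self.tag_in_ecc else DATA_BYTES + TAG_BYTES

    def data_fraction(self) -> float:
        """Share of transferred bytes that are useful demand data."""
        return DATA_BYTES / self.bytes_per_lookup()


tad = pack_tad(tag=0x2A, data=bytes(DATA_BYTES))
print(len(tad), tad_hit(tad, 0x2A), tad_hit(tad, 0x2B))  # 72 True False
for v in (AlloyVariant("Original Alloy", False), AlloyVariant("Alloy using ECC bits", True)):
    print(f"{v.name}: {v.bytes_per_lookup()} B/lookup, {v.data_fraction():.3f} useful")
```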



### Link Latency Case Study

- Setup: an HBM cache backed by (i) DDR4 or (ii) NVM main memory through a link, under read-only, miss-clean traffic.

| Far Memory | Link Latency (µs) | Total BW (GB/s) | Avg. Response Time (µs) |
|------------|-------------------|-----------------|-------------------------|
| DDR4       | No link           | 6.31            | 1.484                   |
| DDR4       | 0.2               | 5.86            | 1.599                   |
| DDR4       | 1                 | 5.61            | 1.833                   |
| DDR4       | 2.5               | 5.20            | 2.985                   |
| DDR4       | 5                 | 3.04            | 5.346                   |
| NVM        | No link           | 6.03            | 2.489                   |
| NVM        | 0.2               | 6.03            | 2.487                   |
| NVM        | 1                 | 6.03            | 2.491                   |
| NVM        | 2.5               | 4.86            | 3.269                   |
| NVM        | 5                 | 2.99            | 5.460                   |

At lower link latencies (0.2 and 1 µs), far NVM sustains higher total bandwidth than far DDR4. At higher link latencies, NVM performs close to DDR4.
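The short script below re-tabulates the bandwidth values reported in the table above and computes each configuration's bandwidth relative to its own no-link baseline, which makes the DDR4 vs. NVM comparison easier to read. It uses only the reported numbers; the dictionary layout and the `relative_bw` helper are our own illustrative choices.

```python
from typing import Dict, Optional

# Total bandwidth (GB/s) from the table, keyed by link latency in µs (None = no link).
BW: Dict[str, Dict[Optional[float], float]] = {
    "DDR4": {None: 6.31, 0.2: 5.86, 1: 5.61, 2.5: 5.20, 5: 3.04},
    "NVM":  {None: 6.03, 0.2: 6.03, 1: 6.03, 2.5: 4.86, 5: 2.99},
}


def relative_bw(mem: str, latency: Optional[float]) -> float:
    """Bandwidth with a link, as a fraction of the same memory's no-link bandwidth."""
    return BW[mem][latency] / BW[mem][None]


for lat in (0.2, 1, 2.5, 5):
    ddr4, nvm = BW["DDR4"][lat], BW["NVM"][lat]
    higher = "NVM" if nvm > ddr4 else "DDR4"
    print(f"{lat:>4} us: DDR4 {relative_bw('DDR4', lat):.2f}, "
          f"NVM {relative_bw('NVM', lat):.2f}, higher BW: {higher}")
```

At 5 µs both memories retain roughly half of their no-link bandwidth (0.48 for DDR4, 0.50 for NVM), consistent with the two converging at high link latency.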

### Conclusion

- In this work, we introduced heterogeneous memory modeling support in gem5.
- The models described in this work open research opportunities for the next generation of heterogeneous and disaggregated HPC systems.

#### Reference

[1] M. Hildebrand, J. T. Angeles, J. Lowe-Power, and V. Akella, "A Case Against Hardware Managed DRAM Caches for NVRAM Based Systems," in Proc. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2021.

