<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://arch.cs.ucdavis.edu/feed.xml" rel="self" type="application/atom+xml" /><link href="https://arch.cs.ucdavis.edu/" rel="alternate" type="text/html" /><updated>2026-03-10T13:45:44-07:00</updated><id>https://arch.cs.ucdavis.edu/feed.xml</id><title type="html">UC Davis Computer Architecture</title><subtitle>Computer Architecture Research Group at UC Davis ECE/CS</subtitle><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><entry><title type="html">Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory</title><link href="https://arch.cs.ucdavis.edu/security/memory/cxl/2026/03/06/space-control.html" rel="alternate" type="text/html" title="Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory" /><published>2026-03-06T00:00:00-08:00</published><updated>2026-03-06T00:00:00-08:00</updated><id>https://arch.cs.ucdavis.edu/security/memory/cxl/2026/03/06/space-control</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/security/memory/cxl/2026/03/06/space-control.html"><![CDATA[<p><a href="https://arxiv.org/abs/2603.06951" class="btn btn--primary btn--large">ArXiv</a></p>

<p>Memory disaggregation via Compute Express Link (CXL) enables multiple hosts to share remote memory, improving utilization for data-intensive workloads.
Today, virtual memory enables process-level isolation on a host and CXL enables host-level isolation.
This creates a critical security gap: the absence of process-level memory isolation in shared disaggregated memory.</p>

<p>We present Space-Control, a hardware-software co-design that provides fine-grained, process-level isolation for shared disaggregated memory.
Space-Control authenticates the execution context in hardware, enforces access control on every memory access, and amortizes lookup times with a small cache.
Our design allows up to 127 processes running concurrently on 255 hosts to share memory with only 1.56% storage overhead.
In a gem5 + Structural Simulation Toolkit (SST) based CXL model, Space-Control incurs minimal performance overhead of 3.3%, making shared disaggregated memory isolation practical.</p>

<h2 id="citation">Citation</h2>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@misc</span><span class="p">{</span><span class="nl">goswami2026spacecontrolprocesslevelisolationsharing</span><span class="p">,</span>
      <span class="na">title</span><span class="p">=</span><span class="s">{Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated Memory}</span><span class="p">,</span> 
      <span class="na">author</span><span class="p">=</span><span class="s">{Kaustav Goswami and Sean Peisert and Venkatesh Akella and Jason Lowe-Power}</span><span class="p">,</span>
      <span class="na">year</span><span class="p">=</span><span class="s">{2026}</span><span class="p">,</span>
      <span class="na">eprint</span><span class="p">=</span><span class="s">{2603.06951}</span><span class="p">,</span>
      <span class="na">archivePrefix</span><span class="p">=</span><span class="s">{arXiv}</span><span class="p">,</span>
      <span class="na">primaryClass</span><span class="p">=</span><span class="s">{cs.AR}</span><span class="p">,</span>
      <span class="na">url</span><span class="p">=</span><span class="s">{https://arxiv.org/abs/2603.06951}</span><span class="p">,</span> 
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;security&quot;, &quot;memory&quot;, &quot;CXL&quot;]" /><summary type="html"><![CDATA[Kaustav Goswami, Sean Peisert, Venkatesh Akella, Jason Lowe-Power]]></summary></entry><entry><title type="html">NOVA: A Novel Vertex Management Architecture for Scalable Graph Processing</title><link href="https://arch.cs.ucdavis.edu/accelerator/2025/03/03/nova.html" rel="alternate" type="text/html" title="NOVA: A Novel Vertex Management Architecture for Scalable Graph Processing" /><published>2025-03-03T00:00:00-08:00</published><updated>2025-03-03T00:00:00-08:00</updated><id>https://arch.cs.ucdavis.edu/accelerator/2025/03/03/nova</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/accelerator/2025/03/03/nova.html"><![CDATA[<p><a href="/assets/papers/NOVA-HPCA-2025.pdf" class="btn btn--primary btn--large">Local Download</a></p>

<p>We propose a scalable graph processing hardware accelerator called NOVA that is based on a novel vertex management architecture that decouples the execution of reduction and propagation operations in the popular vertex-centric graph processing paradigm.
This allows us to store the working set in off-chip memory and utilize the available on-chip memory as a buffer to hide the latency of DRAM accesses instead of a traditional cache.
This overcomes one of the key drawbacks of almost all prior works which require temporal partitioning of graphs to scale to large graphs.
We develop a cycle-accurate model of the architecture in gem5 and demonstrate that NOVA exhibits near-perfect weak and strong scaling while scaling to large graphs by spatially tiling multiple nodes.
In addition, our simulations show that NOVA is 2.35x better than a state-of-the-art graph accelerator (PolyGraph) while using a fraction of the on-chip memory on a synthetic graph with 134M vertices and over 2.14B edges.</p>

<h2 id="citation">Citation</h2>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@article</span><span class="p">{</span><span class="nl">fariborz2025nova</span><span class="p">,</span>
  <span class="na">author</span>       <span class="p">=</span> <span class="s">{Marjan Fariborz and Mahyar Samani and Austin York and SJ Ben Yoo and Jason Lowe-Power and Venkatesh Akella}</span><span class="p">,</span>
  <span class="na">title</span>        <span class="p">=</span> <span class="s">{NOVA: A Novel Vertex Management Architecture for Scalable Graph Processing}</span><span class="p">,</span>
  <span class="na">year</span>         <span class="p">=</span> <span class="s">{2025}</span><span class="p">,</span>
  <span class="na">url</span>          <span class="p">=</span> <span class="s">{https://doi.org/10.1109/HPCA61900.2025.00072}</span><span class="p">,</span>
  <span class="na">doi</span>          <span class="p">=</span> <span class="s">{10.1109/HPCA61900.2025.00072}</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;accelerator&quot;]" /><summary type="html"><![CDATA[Marjan Fariborz, Mahyar Samani, Austin York, SJ Ben Yoo, Jason Lowe-Power, Venkatesh Akella]]></summary></entry><entry><title type="html">TDRAM: Tag-enhanced DRAM for Efficient Caching</title><link href="https://arch.cs.ucdavis.edu/memory/2025/02/27/tdram.html" rel="alternate" type="text/html" title="TDRAM: Tag-enhanced DRAM for Efficient Caching" /><published>2025-02-27T00:00:00-08:00</published><updated>2025-02-27T00:00:00-08:00</updated><id>https://arch.cs.ucdavis.edu/memory/2025/02/27/tdram</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/memory/2025/02/27/tdram.html"><![CDATA[<p><a href="/assets/papers/TDRAM-HPCA-2025.pdf" class="btn btn--primary btn--large">Local Download</a></p>

<p>As SRAM-based caches are hitting a scaling wall, manufacturers are integrating DRAM-based caches into system designs to continue increasing cache sizes. While DRAM caches can improve the performance of memory systems, existing DRAM cache designs suffer from high miss penalties, wasted data movement, and interference between misses and demands. In this paper, we propose TDRAM, a novel DRAM microarchitecture tailored for caching. TDRAM enhances existing DRAM, such as HBM3, by adding small, low-latency mats to store tags and metadata on the same die as the data mats. These mats enable tag and data access in lockstep, in-DRAM tag comparison, and conditional data response based on the comparison result (reducing wasted data transfers), akin to SRAM cache mechanisms. TDRAM further optimizes hit and miss latencies through opportunistic early tag probing. Moreover, TDRAM introduces
a flush buffer to store conflicting dirty data on write misses, eliminating data bus turnaround delays on write demands. We
evaluate TDRAM in a full-system simulation using a set of HPC workloads with large memory footprints, showing that TDRAM,
on average, provides 2.65× faster tag checks, 1.23× speedup, and 21% less energy consumption compared to state-of-the-art commercial and research designs.</p>

<h2 id="citation">Citation</h2>

<p>Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael R. Miller, Taeksang Song, Thomas Vogelsang, Steven C. Woo, and Jason Lowe-Power, “TDRAM: Tag-enhanced DRAM for Efficient Caching,” arXiv preprint arXiv:2404.14617, 2024. doi: 10.48550/arXiv.2404.14617.</p>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@article</span><span class="p">{</span><span class="nl">babaie2024tdram</span><span class="p">,</span>
  <span class="na">author</span>       <span class="p">=</span> <span class="s">{Maryam Babaie and Ayaz Akram and Wendy Elsasser and Brent Haukness and Michael R. Miller and Taeksang Song and Thomas Vogelsang and Steven C. Woo and Jason Lowe-Power}</span><span class="p">,</span>
  <span class="na">title</span>        <span class="p">=</span> <span class="s">{TDRAM: Tag-enhanced DRAM for Efficient Caching}</span><span class="p">,</span>
  <span class="na">year</span>         <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
  <span class="na">url</span>          <span class="p">=</span> <span class="s">{https://doi.org/10.48550/arXiv.2404.14617}</span><span class="p">,</span>
  <span class="na">doi</span>          <span class="p">=</span> <span class="s">{10.48550/arXiv.2404.14617}</span><span class="p">,</span>
  <span class="na">eprint</span><span class="p">=</span><span class="s">{2404.14617}</span><span class="p">,</span>
  <span class="na">archivePrefix</span><span class="p">=</span><span class="s">{arXiv}</span><span class="p">,</span>
  <span class="na">primaryClass</span><span class="p">=</span><span class="s">{cs.AR}</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;memory&quot;]" /><summary type="html"><![CDATA[Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael R. Miller, Taeksang Song, Thomas Vogelsang, Steven C. Woo, Jason Lowe-Power]]></summary></entry><entry><title type="html">Potential and Limitation of High-Frequency Cores and Caches</title><link href="https://arch.cs.ucdavis.edu/simulation/2024/08/06/potentiallimitationhighfreqcorescaches.html" rel="alternate" type="text/html" title="Potential and Limitation of High-Frequency Cores and Caches" /><published>2024-08-06T00:00:00-07:00</published><updated>2024-08-06T00:00:00-07:00</updated><id>https://arch.cs.ucdavis.edu/simulation/2024/08/06/potentiallimitationhighfreqcorescaches</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/simulation/2024/08/06/potentiallimitationhighfreqcorescaches.html"><![CDATA[<p><a href="/assets/papers/arxiv24-potentialhighfreq.pdf" class="btn btn--primary btn--large">Local Download</a>
<a href="https://arxiv.org/abs/2408.03308" class="btn btn--primary btn--large">arXiv Link</a>
<a href="/assets/papers/modsim2024-potentialhighfreq-poster.pdf" class="btn btn--primary btn--large">Poster Download</a>
<a href="/assets/papers/modsim2024-potentialhighfreq-presentation.pdf" class="btn btn--primary btn--large">Presentation Download</a></p>

<p>This paper explores the potential of cryogenic semiconductor computing and superconductor electronics as promising alternatives to traditional semiconductor devices. As semiconductor devices face challenges such as increased leakage currents and reduced performance at higher temperatures, these novel technologies offer high performance and low power computation. Conventional semiconductor electronics operating at cryogenic temperatures (below -150°C or 123.15 K) can benefit from reduced leakage currents and improved electron mobility. On the other hand, superconductor electronics, operating below 10 K, allow electrons to flow without resistance, offering the potential for ultra-low-power, high-speed computation. This study presents a comprehensive performance modeling and analysis of these technologies and provides insights into their potential benefits and limitations. We implement models of in-order and out-of-order cores operating at high clock frequencies associated with superconductor electronics and cryogenic semiconductor computing in gem5. We evaluate the performance of these components using workloads representative of real-world applications like NPB, SPEC CPU2006, and GAPBS. Our results show the potential speedups achievable by these components and the limitations posed by cache bandwidth. This work provides valuable insights into the performance implications and design trade-offs associated with cryogenic and superconductor technologies, laying the foundation for future research in this field using gem5.</p>

<h2 id="citation">Citation</h2>

<p>Kunal Pai, Anusheel Nand, Jason Lowe-Power, “Potential and Limitation of High-Frequency Cores and Caches,” arXiv preprint arXiv:2408.03308, 2024. doi: 10.48550/arXiv.2408.03308.</p>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@misc</span><span class="p">{</span><span class="nl">pai2024potentiallimitationhighfrequencycores</span><span class="p">,</span>
      <span class="na">title</span><span class="p">=</span><span class="s">{Potential and Limitation of High-Frequency Cores and Caches}</span><span class="p">,</span>
      <span class="na">author</span><span class="p">=</span><span class="s">{Kunal Pai and Anusheel Nand and Jason Lowe-Power}</span><span class="p">,</span>
      <span class="na">year</span><span class="p">=</span><span class="s">{2024}</span><span class="p">,</span>
      <span class="na">eprint</span><span class="p">=</span><span class="s">{2408.03308}</span><span class="p">,</span>
      <span class="na">archivePrefix</span><span class="p">=</span><span class="s">{arXiv}</span><span class="p">,</span>
      <span class="na">primaryClass</span><span class="p">=</span><span class="s">{cs.AR}</span><span class="p">,</span>
      <span class="na">url</span><span class="p">=</span><span class="s">{https://arxiv.org/abs/2408.03308}</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;simulation&quot;]" /><summary type="html"><![CDATA[Kunal Pai, Anusheel Nand, Jason Lowe-Power. ModSim 2024: Workshop on Modeling & Simulation of Systems and Applications.]]></summary></entry><entry><title type="html">CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems</title><link href="https://arch.cs.ucdavis.edu/memory/2024/05/29/cachedarrays.html" rel="alternate" type="text/html" title="CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems" /><published>2024-05-29T00:00:00-07:00</published><updated>2024-05-29T00:00:00-07:00</updated><id>https://arch.cs.ucdavis.edu/memory/2024/05/29/cachedarrays</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/memory/2024/05/29/cachedarrays.html"><![CDATA[<p><a href="https://doi.org/10.1109/IPDPS57955.2024.00055" class="btn btn--primary btn--large">Paper on IEEEXplore</a>
<a href="/assets/papers/ipdps24-cachedarrays.pdf" class="btn btn--primary btn--large">Local Paper Download</a>
<a href="/assets/papers/ipdps24-cachedarrays-presentation.pdf" class="btn btn--primary btn--large">IPDPS Presentation Download</a>
<a href="https://hmem-workshop.github.io/hmem2023/hmem-sc23/slides-papers/lowe-power-cachedarrays-hmem-workshop-sc23.pdf" class="btn btn--primary btn--large">SC HMEM Workshop Presentation</a>
<a href="https://github.com/darchr/CachedArrays.jl" class="btn btn--primary btn--large">Source Code</a></p>

<p>We propose a new framework called <em>CachedArrays</em>
and a set of APIs to address the data tiering problem in large scale
heterogeneous and disaggregated memory systems. The proposed
framework operates at a variable size object granularity and
allows the programmer to specify semantic hints about future
use of data via a Policy API, which are used by a Data Manager
to choose when and where to place a particular data object
using a data management API, thus bridging the semantic gap
between the programmer and the platform-specific hardware
details, and optimizing overall performance. We evaluate the
proposed framework on a real hardware platform with terabytes
of memory consisting of NVRAM and DRAM on large scale ML
training workloads such as CNNs that exhibit different data access
and usage patterns. We show that <em>CachedArrays</em> outperforms
hardware caches, and can exploit many of the algorithm-specific
optimizations of prior works.</p>

<p><a href="/memory/2023/05/22/cached-embeddings.html">CachedEmbeddings</a> builds on top of <em>CachedArrays</em>.</p>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@inproceedings</span><span class="p">{</span><span class="nl">hildebrand24cachedarrays</span><span class="p">,</span>
<span class="na">author</span> <span class="p">=</span> <span class="s">{Hildebrand, Mark and Lowe-Power, Jason and Akella, Venkatesh}</span><span class="p">,</span>
<span class="na">title</span> <span class="p">=</span> <span class="s">{ {CachedArrays}: Optimizing Data Movement for Heterogeneous Memory Systems}</span><span class="p">,</span>
<span class="na">year</span> <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
<span class="na">url</span> <span class="p">=</span> <span class="s">{https://doi.org/10.1109/IPDPS57955.2024.00055}</span><span class="p">,</span>
<span class="na">doi</span> <span class="p">=</span> <span class="s">{10.1109/IPDPS57955.2024.00055}</span><span class="p">,</span>
<span class="na">booktitle</span> <span class="p">=</span> <span class="s">{Proceedings of the 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)}</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;memory&quot;]" /><summary type="html"><![CDATA[Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella. IPDPS 2024]]></summary></entry><entry><title type="html">TEGRA – Scaling Up Terascale Graph Processing with Disaggregated Computing</title><link href="https://arch.cs.ucdavis.edu/memory/2024/04/28/tegra.html" rel="alternate" type="text/html" title="TEGRA – Scaling Up Terascale Graph Processing with Disaggregated Computing" /><published>2024-04-28T00:00:00-07:00</published><updated>2024-04-28T00:00:00-07:00</updated><id>https://arch.cs.ucdavis.edu/memory/2024/04/28/tegra</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/memory/2024/04/28/tegra.html"><![CDATA[<p><a href="/assets/papers/arxiv24-tegra.pdf" class="btn btn--primary btn--large">Local Download</a>
<a href="https://arxiv.org/abs/2404.03155" class="btn btn--primary btn--large">arXiv Link</a></p>

<p>Graphs are essential for representing relationships in various domains, driving modern AI applications such as graph analytics and neural networks across science, engineering, cybersecurity, transportation, and economics. However, the size of modern graphs is rapidly expanding, posing challenges for traditional CPUs and GPUs in meeting real-time processing demands. As a result, hardware accelerators for graph processing have been proposed. However, the largest graphs that these systems can handle are still modest, with evaluations often targeting the Twitter graph (approximately 1.4B edges). This paper aims to address this limitation by developing a graph accelerator capable of terascale graph processing. Scale-out architectures, in which nodes are replicated to expand to larger datasets, are natural for handling larger graphs. We argue that this approach is not appropriate for very large-scale graphs because it leads to underutilization of both memory and compute resources. Additionally, vertex and edge processing have different access patterns. Communication overheads pose further challenges in designing scalable architectures. To overcome these issues, this paper proposes TEGRA, a scale-up architecture for terascale graph processing. TEGRA leverages a composable computing system with disaggregated resources and a communication architecture inspired by Active Messages. By employing direct communication between cores and optimizing memory interconnect utilization, TEGRA effectively reduces communication overhead and improves resource utilization, thereby enabling efficient processing of terascale graphs.</p>

<h2 id="citation">Citation</h2>

<p>William Shaddix, Mahyar Samani, Marjan Fariborz, S.J. Ben Yoo, Jason Lowe-Power, and Venkatesh Akella, “TEGRA – Scaling Up Terascale Graph Processing with Disaggregated Computing,” arXiv preprint arXiv:2404.03155, 2024. doi: 10.48550/arXiv.2404.03155.</p>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@article</span><span class="p">{</span><span class="nl">shaddix2024tegra</span><span class="p">,</span>
  <span class="na">author</span>       <span class="p">=</span> <span class="s">{William Shaddix and Mahyar Samani and Marjan Fariborz and S.J. Ben Yoo and Jason Lowe-Power and Venkatesh Akella}</span><span class="p">,</span>
  <span class="na">title</span>        <span class="p">=</span> <span class="s">{TEGRA -- Scaling Up Terascale Graph Processing with Disaggregated Computing}</span><span class="p">,</span>
  <span class="na">year</span>         <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
  <span class="na">url</span>          <span class="p">=</span> <span class="s">{https://doi.org/10.48550/arXiv.2404.03155}</span><span class="p">,</span>
  <span class="na">doi</span>          <span class="p">=</span> <span class="s">{10.48550/arXiv.2404.03155}</span><span class="p">,</span>
  <span class="na">eprint</span><span class="p">=</span><span class="s">{2404.03155}</span><span class="p">,</span>
  <span class="na">archivePrefix</span><span class="p">=</span><span class="s">{arXiv}</span><span class="p">,</span>
  <span class="na">primaryClass</span><span class="p">=</span><span class="s">{cs.AR}</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;memory&quot;]" /><summary type="html"><![CDATA[William Shaddix, Mahyar Samani, Marjan Fariborz, S.J. Ben Yoo, Jason Lowe-Power, Venkatesh Akella]]></summary></entry><entry><title type="html">Aragorn: A Privacy-Enhancing System for Mobile Cameras</title><link href="https://arch.cs.ucdavis.edu/security/2024/01/12/aragorn.html" rel="alternate" type="text/html" title="Aragorn: A Privacy-Enhancing System for Mobile Cameras" /><published>2024-01-12T00:00:00-08:00</published><updated>2024-01-12T00:00:00-08:00</updated><id>https://arch.cs.ucdavis.edu/security/2024/01/12/aragorn</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/security/2024/01/12/aragorn.html"><![CDATA[<p><a href="/assets/papers/imwut24-aragorn.pdf" class="btn btn--primary btn--large">Local Download</a>
<a href="https://dl.acm.org/doi/abs/10.1145/3631406" class="btn btn--primary btn--large">ACM DL Link</a></p>

<p>Mobile app developers often rely on cameras to implement rich features. However, giving apps unfettered access to the mobile camera poses a privacy threat when camera frames capture sensitive information that is not needed for the app’s functionality. To mitigate this threat, we present Aragorn, a novel privacy-enhancing mobile camera system that provides fine-grained control over what information can be present in camera frames before apps can access them. Aragorn automatically sanitizes camera frames by detecting regions that are essential to an app’s functionality and blocking out everything else to protect privacy while retaining app utility. Aragorn can cater to a wide range of camera apps and incorporates knowledge distillation and crowdsourcing to extend robust support to previously unsupported apps. In our evaluations, we see that, with no degradation in utility, Aragorn detects credit cards with 89% accuracy and faces with 100% accuracy in the context of credit card scanning and face recognition, respectively. We show that Aragorn’s implementation in the Android camera subsystem only suffers an average drop of 0.01 frames per second in frame rate. Our evaluations show that the overhead incurred by Aragorn to system performance is reasonable.</p>

<h2 id="citation">Citation</h2>

<p>Hari Venugopalan, Zainul Abi Din, Trevor Carpenter, Jason Lowe-Power, Samuel T. King, and Zubair Shafiq. 2024. Aragorn: A Privacy-Enhancing System for Mobile Cameras. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 7, 4, Article 181 (December 2023), 31 pages. https://doi.org/10.1145/3631406</p>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@article</span><span class="p">{</span><span class="nl">Venugopalan2024aragorn</span><span class="p">,</span>
<span class="na">author</span> <span class="p">=</span> <span class="s">{Venugopalan, Hari and Din, Zainul Abi and Carpenter, Trevor and Lowe-Power, Jason and King, Samuel T. and Shafiq, Zubair}</span><span class="p">,</span>
<span class="na">title</span> <span class="p">=</span> <span class="s">{Aragorn: A Privacy-Enhancing System for Mobile Cameras}</span><span class="p">,</span>
<span class="na">year</span> <span class="p">=</span> <span class="s">{2024}</span><span class="p">,</span>
<span class="na">issue_date</span> <span class="p">=</span> <span class="s">{December 2023}</span><span class="p">,</span>
<span class="na">publisher</span> <span class="p">=</span> <span class="s">{Association for Computing Machinery}</span><span class="p">,</span>
<span class="na">address</span> <span class="p">=</span> <span class="s">{New York, NY, USA}</span><span class="p">,</span>
<span class="na">volume</span> <span class="p">=</span> <span class="s">{7}</span><span class="p">,</span>
<span class="na">number</span> <span class="p">=</span> <span class="s">{4}</span><span class="p">,</span>
<span class="na">url</span> <span class="p">=</span> <span class="s">{https://doi.org/10.1145/3631406}</span><span class="p">,</span>
<span class="na">doi</span> <span class="p">=</span> <span class="s">{10.1145/3631406}</span><span class="p">,</span>
<span class="na">journal</span> <span class="p">=</span> <span class="s">{Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.}</span><span class="p">,</span>
<span class="na">month</span> <span class="p">=</span> <span class="s">{jan}</span><span class="p">,</span>
<span class="na">articleno</span> <span class="p">=</span> <span class="s">{181}</span><span class="p">,</span>
<span class="na">numpages</span> <span class="p">=</span> <span class="s">{31}</span><span class="p">,</span>
<span class="na">keywords</span> <span class="p">=</span> <span class="s">{Knowledge Distillation, Object Detection}</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;security&quot;]" /><summary type="html"><![CDATA[Hari Venugopalan, Zainul Abi Din, Trevor Carpenter, Jason Lowe-Power, Samuel T. King, and Zubair Shafiq. IMWUT 2024.]]></summary></entry><entry><title type="html">FP-Rowhammer: DRAM-Based Device Fingerprinting</title><link href="https://arch.cs.ucdavis.edu/security/2023/06/30/centauri.html" rel="alternate" type="text/html" title="FP-Rowhammer: DRAM-Based Device Fingerprinting" /><published>2023-06-30T00:00:00-07:00</published><updated>2023-06-30T00:00:00-07:00</updated><id>https://arch.cs.ucdavis.edu/security/2023/06/30/centauri</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/security/2023/06/30/centauri.html"><![CDATA[<p><a href="https://dl.acm.org/doi/10.1145/3708821.3733880" class="btn btn--primary btn--large">Paper on ACM DL</a>
<a href="/assets/papers/asiaccs-fp-rowhammer-presentation.pdf" class="btn btn--primary btn--large">AsiaCCS Presentation Download</a>
<a href="/assets/papers/asiaccs-fp-rowhammer.pdf" class="btn btn--primary btn--large">Local Download</a>
<a href="https://arxiv.org/abs/2307.00143" class="btn btn--primary btn--large">Older arXiv Version</a></p>

<p>Device fingerprinting leverages attributes that capture heterogeneity in hardware and software configurations to extract unique and stable fingerprints.
Fingerprinting countermeasures attempt to either present a uniform fingerprint across different devices through normalization or present different fingerprints for the same device each time through obfuscation.
We present FP-Rowhammer, a Rowhammer-based device fingerprinting approach that can build unique and stable fingerprints even across devices with normalized or obfuscated hardware and software configurations.
To this end, FP-Rowhammer leverages the DRAM manufacturing process variation that gives rise to unique distributions of Rowhammer-induced bit flips across different DRAM modules.
Our evaluation on a test bed of 98 DRAM modules shows that FP-Rowhammer achieves 99.91% fingerprinting accuracy. 
FP-Rowhammer’s fingerprints are also stable, with no degradation in fingerprinting accuracy over a period of ten days.
We also demonstrate that FP-Rowhammer is efficient, taking less than five seconds to extract a fingerprint.
FP-Rowhammer is the first Rowhammer fingerprinting approach to extract unique and stable fingerprints efficiently and at scale.</p>

<h2 id="citation">Citation</h2>

<p>Hari Venugopalan, Kaustav Goswami, Zainul Abi Din, Jason Lowe-Power, Samuel T. King, and Zubair Shafiq. 2025. FP-Rowhammer: DRAM-Based Device Fingerprinting. In Proceedings of the 20th ACM Asia Conference on Computer and Communications Security (ASIA CCS ’25). Association for Computing Machinery, New York, NY, USA, 1141–1157. https://doi.org/10.1145/3708821.3733880</p>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@inproceedings</span><span class="p">{</span><span class="nl">10.1145/3708821.3733880</span><span class="p">,</span>
<span class="na">author</span> <span class="p">=</span> <span class="s">{Venugopalan, Hari and Goswami, Kaustav and Abi Din, Zainul and Lowe-Power, Jason and King, Samuel T. and Shafiq, Zubair}</span><span class="p">,</span>
<span class="na">title</span> <span class="p">=</span> <span class="s">{FP-Rowhammer: DRAM-Based Device Fingerprinting}</span><span class="p">,</span>
<span class="na">year</span> <span class="p">=</span> <span class="s">{2025}</span><span class="p">,</span>
<span class="na">isbn</span> <span class="p">=</span> <span class="s">{9798400714108}</span><span class="p">,</span>
<span class="na">publisher</span> <span class="p">=</span> <span class="s">{Association for Computing Machinery}</span><span class="p">,</span>
<span class="na">address</span> <span class="p">=</span> <span class="s">{New York, NY, USA}</span><span class="p">,</span>
<span class="na">url</span> <span class="p">=</span> <span class="s">{https://doi.org/10.1145/3708821.3733880}</span><span class="p">,</span>
<span class="na">doi</span> <span class="p">=</span> <span class="s">{10.1145/3708821.3733880}</span><span class="p">,</span>
<span class="na">abstract</span> <span class="p">=</span> <span class="s">{Device fingerprinting leverages attributes that capture heterogeneity in hardware and software configurations to extract unique and stable fingerprints. Fingerprinting countermeasures attempt to either present a uniform fingerprint across different devices through normalization or present different fingerprints for the same device each time through obfuscation. We present FP-Rowhammer, a Rowhammer-based device fingerprinting approach that can build unique and stable fingerprints even across devices with normalized or obfuscated hardware and software configurations. To this end, FP-Rowhammer leverages the DRAM manufacturing process variation that gives rise to unique distributions of Rowhammer-induced bit flips across different DRAM modules. Our evaluation on a test bed of 98 DRAM modules shows that FP-Rowhammer achieves 99.91\% fingerprinting accuracy. FP-Rowhammer’s fingerprints are also stable, with no degradation in fingerprinting accuracy over a period of ten days. We also demonstrate that FP-Rowhammer is efficient, taking less than five seconds to extract a fingerprint. FP-Rowhammer is the first Rowhammer fingerprinting approach to extract unique and stable fingerprints efficiently and at scale.}</span><span class="p">,</span>
<span class="na">booktitle</span> <span class="p">=</span> <span class="s">{Proceedings of the 20th ACM Asia Conference on Computer and Communications Security}</span><span class="p">,</span>
<span class="na">pages</span> <span class="p">=</span> <span class="s">{1141–1157}</span><span class="p">,</span>
<span class="na">numpages</span> <span class="p">=</span> <span class="s">{17}</span><span class="p">,</span>
<span class="na">location</span> <span class="p">=</span> <span class="s">{
}</span><span class="p">,</span>
<span class="na">series</span> <span class="p">=</span> <span class="s">{ASIA CCS '25}</span>
<span class="p">}</span>
</code></pre></div></div>

<h4 id="older-arxiv-version">Older ArXiv Version</h4>

<p>Hari Venugopalan, Zainul Abi Din, Jason Lowe-Power, Samuel T. King, and Zubair Shafiq. 2023. Centauri: Practical Rowhammer Fingerprinting. arXiv preprint arXiv:2307.00143. doi: 10.48550/arXiv.2307.00143.</p>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@misc</span><span class="p">{</span><span class="nl">venugopalan2023centauri</span><span class="p">,</span>
      <span class="na">title</span><span class="p">=</span><span class="s">{Centauri: Practical Rowhammer Fingerprinting}</span><span class="p">,</span> 
      <span class="na">author</span><span class="p">=</span><span class="s">{Hari Venugopalan and Kaustav Goswami and Zainul Abi Din and Jason Lowe-Power and Samuel T. King and Zubair Shafiq}</span><span class="p">,</span>
      <span class="na">year</span><span class="p">=</span><span class="s">{2023}</span><span class="p">,</span>
      <span class="na">eprint</span><span class="p">=</span><span class="s">{2307.00143}</span><span class="p">,</span>
      <span class="na">archivePrefix</span><span class="p">=</span><span class="s">{arXiv}</span><span class="p">,</span>
      <span class="na">primaryClass</span><span class="p">=</span><span class="s">{cs.CR}</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;security&quot;]" /><summary type="html"><![CDATA[Hari Venugopalan, Kaustav Goswami, Zainul Abi Din, Jason Lowe-Power, Samuel T. King, Zubair Shafiq]]></summary></entry><entry><title type="html">Validating Hardware and SimPoints with gem5: A RISC-V Board Case Study</title><link href="https://arch.cs.ucdavis.edu/simulation/2023/06/17/validating-riscvmatched.html" rel="alternate" type="text/html" title="Validating Hardware and SimPoints with gem5: A RISC-V Board Case Study" /><published>2023-06-17T00:00:00-07:00</published><updated>2023-06-17T00:00:00-07:00</updated><id>https://arch.cs.ucdavis.edu/simulation/2023/06/17/validating-riscvmatched</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/simulation/2023/06/17/validating-riscvmatched.html"><![CDATA[<p><a href="/assets/papers/validating-hardware-and-simpoints-with-gem5-poster.pdf" class="btn btn--primary btn--large">Local Download</a></p>

<p>This poster investigates methods for validating hardware and simulation points with the gem5 simulator. We used a RISC-V board as a case study.</p>

<ul>
  <li>SimPoints are used to test configuration changes without running the benchmark to completion. The weighted IPC from SimPoints has the potential to replace full gem5 runs.</li>
  <li>We find that iterative microbenchmark fine-tuning, when optimizing for IPC, systematically refines configurations for larger workloads.</li>
  <li>Future research will extend the methodology to out-of-order processors, LoopPoints, and different ISAs.</li>
</ul>
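
<p>The weighted IPC mentioned above can be sketched as follows. This is a minimal example that assumes the estimate is the weight-normalized sum of per-SimPoint IPC values; some flows weight CPI instead and invert the result. The function names and the numbers are hypothetical and are not results from the poster.</p>

```python
def weighted_ipc(simpoints):
    """Weighted arithmetic mean of per-SimPoint IPC.

    `simpoints` is a list of (weight, ipc) pairs, one pair per SimPoint.
    """
    total_weight = sum(w for w, _ in simpoints)
    if total_weight == 0:
        raise ValueError("SimPoint weights sum to zero")
    return sum(w * ipc for w, ipc in simpoints) / total_weight


def relative_error(estimate, reference):
    """Relative error of the SimPoint estimate against a full-run IPC."""
    return abs(estimate - reference) / reference


# Hypothetical (weight, ipc) pairs, for illustration only.
points = [(0.5, 1.0), (0.25, 2.0), (0.25, 0.5)]
estimate = weighted_ipc(points)
```

<p>Comparing <code>estimate</code> against the IPC of a full run with <code>relative_error</code> gives one way to judge whether the SimPoints are representative enough to stand in for that run.</p>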

<h2 id="citation">Citation</h2>

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@misc</span><span class="p">{</span><span class="nl">kunal2023matchedposter</span><span class="p">,</span>
  <span class="na">author</span> <span class="p">=</span> <span class="s">{Pai, Kunal and Qiu, Zhantong and Lowe-Power, Jason}</span><span class="p">,</span>
  <span class="na">title</span><span class="p">=</span> <span class="s">{Validating Hardware and SimPoints with gem5: A RISC-V Board Case Study}</span><span class="p">,</span>
  <span class="na">year</span><span class="p">=</span> <span class="s">{2023}</span><span class="p">,</span>
  <span class="na">booktitle</span> <span class="p">=</span> <span class="s">{gem5 Workshop, International Symposium on Computer Architecture 2023}</span><span class="p">,</span>
  <span class="na">url</span> <span class="p">=</span> <span class="s">{https://www.gem5.org/assets/files/workshop-isca-2023/posters/validating-hardware-and-simpoints-with-gem5-poster.pdf}</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;simulation&quot;]" /><summary type="html"><![CDATA[Kunal Pai, Zhantong Qiu, Jason Lowe-Power. ISCA 2023: gem5 Workshop.]]></summary></entry><entry><title type="html">Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems</title><link href="https://arch.cs.ucdavis.edu/memory/2023/05/22/cached-embeddings.html" rel="alternate" type="text/html" title="Efficient Large Scale DLRM Implementation on Heterogeneous Memory Systems" /><published>2023-05-22T00:00:00-07:00</published><updated>2023-05-22T00:00:00-07:00</updated><id>https://arch.cs.ucdavis.edu/memory/2023/05/22/cached-embeddings</id><content type="html" xml:base="https://arch.cs.ucdavis.edu/memory/2023/05/22/cached-embeddings.html"><![CDATA[<p><a href="https://doi.org/10.1007/978-3-031-32041-5_3" class="btn btn--primary btn--large">Paper on Springer</a>
<a href="/assets/papers/isc23-cached-embeddings.pdf" class="btn btn--primary btn--large">Local Paper Download</a>
<a href="https://github.com/darchr/CachedEmbeddings.jl" class="btn btn--primary btn--large">Code available on GitHub</a></p>

<p>We propose a new data structure called <em>CachedEmbeddings</em> for efficiently training large-scale deep learning recommendation models (DLRM) on heterogeneous (DRAM + non-volatile) memory platforms. <em>CachedEmbeddings</em> implements an implicit software-managed cache and data movement optimization, integrated with the Julia programming framework, to optimize large-scale DLRM implementations that perform operations on multiple sparse embedding tables. In particular, we show an implementation that is 1.4X to 2X better than the best known Intel CPU-based implementations on state-of-the-art DLRM benchmarks on a real heterogeneous memory platform from Intel, and that achieves a 1.32X to 1.45X improvement over Intel’s 2LM implementation, which treats the DRAM as a hardware-managed cache.</p>

<p><em>CachedEmbeddings</em> builds on top of <a href="/memory/2024/05/29/cachedarrays.html">CachedArrays</a>.</p>
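
<p>To make the idea of a software-managed cache for embedding rows concrete, here is a toy sketch. It uses a plain LRU policy as a stand-in and is not the <em>CachedEmbeddings</em> design or its Julia API; the class <code>RowCache</code>, its methods, and the capacity parameter are all hypothetical. Python is used only to keep the example self-contained, and write-back of updated rows during training is not modeled.</p>

```python
from collections import OrderedDict


class RowCache:
    """Toy software-managed cache: hot embedding rows live in a small fast tier."""

    def __init__(self, slow_table, capacity):
        self.slow_table = slow_table  # stands in for the large, slow memory tier
        self.capacity = capacity      # number of rows the fast tier can hold
        self.fast = OrderedDict()     # row index -> row, kept in LRU order
        self.hits = 0
        self.misses = 0

    def lookup(self, index):
        """Return a row, copying it into the fast tier on a miss."""
        if index in self.fast:
            self.hits += 1
            self.fast.move_to_end(index)   # mark as most recently used
            return self.fast[index]
        self.misses += 1
        row = list(self.slow_table[index])  # copy the row into the fast tier
        self.fast[index] = row
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)   # evict the least recently used row
        return row

    def sum_rows(self, indices):
        """Sparse lookup followed by an element-wise sum over the gathered rows."""
        rows = [self.lookup(i) for i in indices]
        return [sum(col) for col in zip(*rows)]
```

<p>Repeated lookups of the same row are served from the fast tier, so the hit and miss counters give a quick view of how much traffic to the slow tier the cache avoids for a given access pattern.</p>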

<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@inproceedings</span><span class="p">{</span><span class="nl">hildebrand23cachedembeddings</span><span class="p">,</span>
<span class="na">author</span> <span class="p">=</span> <span class="s">{Hildebrand, Mark and Lowe-Power, Jason and Akella, Venkatesh}</span><span class="p">,</span>
<span class="na">title</span> <span class="p">=</span> <span class="s">{Efficient Large Scale DLRM Implementation On Heterogeneous Memory Systems}</span><span class="p">,</span>
<span class="na">year</span> <span class="p">=</span> <span class="s">{2023}</span><span class="p">,</span>
<span class="na">isbn</span> <span class="p">=</span> <span class="s">{978-3-031-32040-8}</span><span class="p">,</span>
<span class="na">publisher</span> <span class="p">=</span> <span class="s">{Springer-Verlag}</span><span class="p">,</span>
<span class="na">address</span> <span class="p">=</span> <span class="s">{Berlin, Heidelberg}</span><span class="p">,</span>
<span class="na">url</span> <span class="p">=</span> <span class="s">{https://doi.org/10.1007/978-3-031-32041-5_3}</span><span class="p">,</span>
<span class="na">doi</span> <span class="p">=</span> <span class="s">{10.1007/978-3-031-32041-5_3}</span><span class="p">,</span>
<span class="na">booktitle</span> <span class="p">=</span> <span class="s">{High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings}</span><span class="p">,</span>
<span class="na">pages</span> <span class="p">=</span> <span class="s">{42–61}</span><span class="p">,</span>
<span class="na">numpages</span> <span class="p">=</span> <span class="s">{20}</span><span class="p">,</span>
<span class="na">location</span> <span class="p">=</span> <span class="s">{Hamburg, Germany}</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Jason Lowe-Power</name><email>jlowepower@ucdavis.edu</email></author><category term="[&quot;memory&quot;]" /><summary type="html"><![CDATA[Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella. High Performance Computing: 38th International Conference, ISC High Performance 2023.]]></summary></entry></feed>