Abstract
Emerging GPU applications exhibit increasingly high computation demands, which has led GPU manufacturers to build GPUs with an increasingly large number of streaming multiprocessors (SMs). Providing data to the SMs at high bandwidth puts significant pressure on the memory hierarchy and the Network-on-Chip (NoC). Current GPUs typically partition the memory-side last-level cache (LLC) into equally-sized slices that are shared by all SMs. Although a shared LLC typically results in a lower miss rate, we find that for workloads with high degrees of data sharing across SMs, a private LLC leads to a significant performance advantage because of increased bandwidth to replicated cache lines across different LLC slices.

In this paper, we propose adaptive memory-side last-level GPU caching to boost performance for sharing-intensive workloads that need high bandwidth to read-only shared data. Adaptive caching leverages a lightweight performance model that balances increased LLC bandwidth against increased miss rate under private caching. In addition to improving performance for sharing-intensive workloads, adaptive caching also saves energy in a (co-designed) hierarchical two-stage crossbar NoC by power-gating and bypassing the second stage if the LLC is configured as a private cache. Our experimental results using 17 GPU workloads show that adaptive caching improves performance by 28.1% on average (up to 38.1%) compared to a shared LLC for sharing-intensive workloads. In addition, adaptive caching reduces NoC energy by 26.6% on average (up to 29.7%) and total system energy by 6.1% on average (up to 27.2%) when configured as a private cache. Finally, we demonstrate through a GPU NoC design space exploration that a hierarchical two-stage crossbar is both more power- and area-efficient than full and concentrated crossbars with the same bisection bandwidth, thus providing a low-cost cooperative solution to exploit workload sharing behavior in memory-side last-level caches.
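The abstract describes the adaptive decision only at a high level; the paper's actual performance model is not reproduced on this page. As a rough illustration of the trade-off it names, the Python sketch below assumes a hypothetical first-order model: a private LLC replicates read-only shared lines so hot data can be streamed from every slice in parallel, while a shared LLC maps each line to a single slice but keeps more unique capacity and thus a lower miss rate. All parameters, formulas, and function names are illustrative assumptions, not the authors' model.

# Hypothetical first-order sketch of a shared-vs-private LLC decision for a GPU.
# This is NOT the paper's performance model; slice counts, bandwidths, and miss
# rates below are illustrative assumptions only.

def estimate_read_bandwidth(num_slices, slice_bw_gbps, miss_rate, dram_bw_gbps,
                            sharing_degree, private_llc):
    """Rough effective read bandwidth (GB/s) for one LLC configuration.

    sharing_degree: average number of SMs reading the same cache line.
    private_llc:    if True, read-only shared lines are replicated in every
                    slice (more hit bandwidth, but less unique capacity and
                    therefore a higher miss rate).
    """
    if private_llc:
        # Replication lets hot shared lines be served from all slices at once.
        hit_bw = num_slices * slice_bw_gbps
    else:
        # A shared LLC maps each address to exactly one slice, so highly shared
        # lines concentrate their traffic on a few slices.
        hit_bw = max(1, num_slices // sharing_degree) * slice_bw_gbps
    # Misses are ultimately bounded by off-chip DRAM bandwidth.
    return (1.0 - miss_rate) * hit_bw + miss_rate * min(hit_bw, dram_bw_gbps)


def choose_llc_mode(shared_miss_rate, private_miss_rate, sharing_degree,
                    num_slices=32, slice_bw_gbps=32.0, dram_bw_gbps=900.0):
    """Pick 'private' only when the bandwidth gain outweighs the extra misses."""
    shared = estimate_read_bandwidth(num_slices, slice_bw_gbps, shared_miss_rate,
                                     dram_bw_gbps, sharing_degree, False)
    private = estimate_read_bandwidth(num_slices, slice_bw_gbps, private_miss_rate,
                                      dram_bw_gbps, sharing_degree, True)
    return "private" if private > shared else "shared"


# Sharing-intensive workload: many SMs read the same lines, so replication's
# parallel hit bandwidth outweighs the higher miss rate.
print(choose_llc_mode(shared_miss_rate=0.10, private_miss_rate=0.25, sharing_degree=16))
# Low-sharing workload: replication only inflates the miss rate, so stay shared.
print(choose_llc_mode(shared_miss_rate=0.10, private_miss_rate=0.40, sharing_degree=1))

Per the abstract, the paper additionally couples the private-mode decision with power-gating and bypassing the second stage of the hierarchical two-stage crossbar NoC; the sketch above covers only the shared-versus-private choice itself.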
Keywords
PERFORMANCE, REPLICATION, CAPACITY, CACHES, ENERGY

Downloads

  • isca2019.pdf: full text (Published version) | open access | PDF | 1.45 MB
  • paper-8625915.pdf: full text | open access | PDF | 1.54 MB

Citation


MLA
Zhao, Xia, et al. “Adaptive Memory-Side Last-Level GPU Caching.” PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA ’19), 2019, pp. 411–23, doi:10.1145/3307650.3322235.
APA
Zhao, X., Adileh, A., Yu, Z., Wang, Z., Jaleel, A., & Eeckhout, L. (2019). Adaptive memory-side last-level GPU caching. In PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA ’19) (pp. 411–423). Phoenix, AZ. https://doi.org/10.1145/3307650.3322235
Chicago author-date
Zhao, Xia, Almutaz Adileh, Zhibin Yu, Zhiying Wang, Aamer Jaleel, and Lieven Eeckhout. 2019. “Adaptive Memory-Side Last-Level GPU Caching.” In PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA ’19), 411–23. https://doi.org/10.1145/3307650.3322235.
Chicago author-date (all authors)
Zhao, Xia, Almutaz Adileh, Zhibin Yu, Zhiying Wang, Aamer Jaleel, and Lieven Eeckhout. 2019. “Adaptive Memory-Side Last-Level GPU Caching.” In PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA ’19), 411–423. doi:10.1145/3307650.3322235.
Vancouver
1. Zhao X, Adileh A, Yu Z, Wang Z, Jaleel A, Eeckhout L. Adaptive memory-side last-level GPU caching. In: PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA ’19). 2019. p. 411–23.
IEEE
[1] X. Zhao, A. Adileh, Z. Yu, Z. Wang, A. Jaleel, and L. Eeckhout, “Adaptive memory-side last-level GPU caching,” in PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA ’19), Phoenix, AZ, 2019, pp. 411–423.
@inproceedings{8625915,
  abstract     = {{Emerging GPU applications exhibit increasingly high computation demands which has led GPU manufacturers to build GPUs with an increasingly large number of streaming multiprocessors (SMs). Providing data to the SMs at high bandwidth puts significant pressure on the memory hierarchy and the Network-on-Chip (NoC). Current GPUs typically partition the memory-side last-level cache (LLC) in equally-sized slices that are shared by all SMs. Although a shared LLC typically results in a lower miss rate, we find that for workloads with high degrees of data sharing across SMs, a private LLC leads to a significant performance advantage because of increased bandwidth to replicated cache lines across different LLC slices.

In this paper, we propose adaptive memory-side last-level GPU caching to boost performance for sharing-intensive workloads that need high bandwidth to read-only shared data. Adaptive caching leverages a lightweight performance model that balances increased LLC bandwidth against increased miss rate under private caching. In addition to improving performance for sharing-intensive workloads, adaptive caching also saves energy in a (co-designed) hierarchical two-stage crossbar NoC by power-gating and bypassing the second stage if the LLC is configured as a private cache. Our experimental results using 17 GPU workloads show that adaptive caching improves performance by 28.1% on average (up to 38.1%) compared to a shared LLC for sharing-intensive workloads. In addition, adaptive caching reduces NoC energy by 26.6% on average (up to 29.7%) and total system energy by 6.1% on average (up to 27.2%) when configured as a private cache. Finally, we demonstrate through a GPU NoC design space exploration that a hierarchical two-stage crossbar is both more power- and area-efficient than full and concentrated crossbars with the same bisection bandwidth, thus providing a low-cost cooperative solution to exploit workload sharing behavior in memory-side last-level caches.}},
  author       = {{Zhao, Xia and Adileh, Almutaz and Yu, Zhibin and Wang, Zhiying and Jaleel, Aamer and Eeckhout, Lieven}},
  booktitle    = {{PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19)}},
  isbn         = {{9781450366694}},
  keywords     = {{PERFORMANCE,REPLICATION,CAPACITY,CACHES,ENERGY}},
  language     = {{eng}},
  location     = {{Phoenix, AZ}},
  pages        = {{411--423}},
  title        = {{Adaptive memory-side last-level GPU caching}},
  url          = {{http://dx.doi.org/10.1145/3307650.3322235}},
  year         = {{2019}},
}
