
Get Out of the Valley: Power-Efficient Address Mapping for GPUs

Abstract
GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support hundreds to thousands of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power-efficiency. The key issue is that the memory address bits exhibiting high variability are highly application-dependent. To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower-order address bits. This indicates that efficient GPU address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme, which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR gates. PAE improves performance by 1.31x and power-efficiency by 1.25x compared to state-of-the-art permutation-based address mapping.
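The two ideas in the abstract, a per-bit entropy measured over windows of requests and an XOR-based mapping that moves entropy into the channel and bank bits, can be illustrated with a small sketch. This is not the paper's definition of the metric or of PAE: the abstract does not give either in detail. The sketch assumes a window is simply a run of consecutive requests, uses the binary Shannon entropy of each bit within a window, and invents the bit positions (`out_bits`) and XOR masks (`masks`) purely for illustration.

```python
import math

def bit_entropy(addresses, bit):
    """Binary Shannon entropy (in bits) of one address bit over a group of requests."""
    if not addresses:
        return 0.0
    p = sum((a >> bit) & 1 for a in addresses) / len(addresses)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def window_entropy(addresses, num_bits, window):
    """Per-bit entropy averaged over consecutive, non-overlapping windows.

    Assumption: a window approximates the requests that co-exist in the
    memory system; the paper's actual windowing may differ.
    """
    windows = [addresses[i:i + window] for i in range(0, len(addresses), window)]
    if not windows:
        return [0.0] * num_bits
    return [sum(bit_entropy(w, b) for w in windows) / len(windows)
            for b in range(num_bits)]

def xor_fold(address, masks, out_bits):
    """Set each bit in out_bits to the XOR (parity) of the input bits chosen by its mask."""
    result = address
    for out_bit, mask in zip(out_bits, masks):
        parity = bin(address & mask).count("1") & 1
        result = (result & ~(1 << out_bit)) | (parity << out_bit)
    return result

# Hypothetical trace: 64 requests with a 256-byte stride, so bits 0-7 never change.
addrs = [i * 256 for i in range(64)]

# Hypothetical layout: bits 6 and 7 select the channel; each is XORed with two
# higher bits. Every mask includes its own output bit, so the mapping is invertible.
out_bits = [6, 7]
masks = [(1 << 6) | (1 << 8) | (1 << 10),
         (1 << 7) | (1 << 9) | (1 << 11)]
mapped = [xor_fold(a, masks, out_bits) for a in addrs]

before = window_entropy(addrs, 14, 16)
after = window_entropy(mapped, 14, 16)
print("channel-bit entropy before:", before[6], before[7])
print("channel-bit entropy after: ", after[6], after[7])
```

In this toy trace the channel bits carry no entropy before the mapping, since the stride keeps them constant, and full entropy afterwards, because the varying higher bits are folded into them. Real mappings would choose the XOR inputs from the row, channel and bank bits of the actual address layout.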

Downloads

  • isca18.pdf | full text | open access | PDF | 797.28 KB

Citation

MLA
Liu, Yuxi et al. “Get Out of the Valley: Power-Efficient Address Mapping for GPUs.” 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 2018. Print.
APA
Liu, Y., Zhao, X., Jahre, M., Wang, Z., Wang, X., Luo, Y., & Eeckhout, L. (2018). Get Out of the Valley: Power-Efficient Address Mapping for GPUs. 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). Presented at the 45th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA).
Chicago author-date
Liu, Yuxi, Xia Zhao, Magnus Jahre, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, and Lieven Eeckhout. 2018. “Get Out of the Valley: Power-Efficient Address Mapping for GPUs.” In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
Vancouver
1. Liu Y, Zhao X, Jahre M, Wang Z, Wang X, Luo Y, et al. Get Out of the Valley: Power-Efficient Address Mapping for GPUs. 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 2018.
IEEE
[1] Y. Liu et al., “Get Out of the Valley: Power-Efficient Address Mapping for GPUs,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, 2018.
@inproceedings{8625904,
  abstract     = {GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support hundreds to thousands of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power-efficiency. The key issue is that the memory address bits exhibiting high variability are highly application-dependent.
To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower-order address bits. This indicates that efficient GPU address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme, which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR gates. PAE improves performance by 1.31x and power-efficiency by 1.25x compared to state-of-the-art permutation-based address mapping.},
  author       = {Liu, Yuxi and Zhao, Xia and Jahre, Magnus and Wang, Zhenlin and Wang, Xiaolin and Luo, Yingwei and Eeckhout, Lieven},
  booktitle    = {2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)},
  isbn         = {9781538659847},
  language     = {eng},
  location     = {Los Angeles, CA},
  title        = {Get Out of the Valley: Power-Efficient Address Mapping for GPUs},
  url          = {http://dx.doi.org/10.1109/isca.2018.00024},
  year         = {2018},
}
