Abstract
GPUs continue to increase the number of streaming multiprocessors (SMs) to provide increasingly higher compute capabilities. To construct a scalable crossbar network-on-chip (NoC) that connects the SMs to the memory controllers, a cluster structure is introduced in modern GPUs in which several SMs are grouped together to share a network port. Because of network port sharing, clustered GPUs face severe NoC congestion, which creates a critical performance bottleneck. In this paper, we target redundant network traffic to mitigate GPU NoC congestion. In particular, we observe that in many GPU-compute applications, different SMs in a cluster access shared data. Issuing redundant requests to access the same memory location wastes valuable NoC bandwidth: we find that on average 19.4% (and up to 48%) of the requests are redundant. To reduce redundant NoC traffic, we propose intra-cluster coalescing (ICC) to merge memory requests from different SMs in a cluster. Our evaluation results show that ICC achieves an average performance improvement of 9.7% (and up to 33%) over a conventional design.
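
The idea of merging requests from different SMs in a cluster can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the hardware design evaluated in the paper: the function name `coalesce_cluster_requests`, the 128-byte line size, and the representation of a request as an `(sm_id, byte_address)` pair are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

LINE_BYTES = 128  # assumed cache-line size; not taken from the paper


def coalesce_cluster_requests(
    requests: Iterable[Tuple[int, int]],
) -> Tuple[Dict[int, List[int]], float]:
    """Merge read requests from the SMs of one cluster that target the same line.

    `requests` holds hypothetical (sm_id, byte_address) pairs issued in the
    same time window. The result maps each line address to the SMs waiting
    on it, together with the fraction of requests that were redundant,
    i.e. that did not need their own trip through the shared network port.
    """
    merged: Dict[int, List[int]] = {}
    total = 0
    for sm_id, address in requests:
        total += 1
        # Align the byte address down to the start of its cache line.
        line = (address // LINE_BYTES) * LINE_BYTES
        # The first request for a line creates the entry; later ones join it.
        merged.setdefault(line, []).append(sm_id)
    redundant = total - len(merged)
    fraction = redundant / total if total else 0.0
    return merged, fraction


if __name__ == "__main__":
    # Four SMs in one cluster; three of them touch the line at 0x1000.
    window = [(0, 0x1000), (1, 0x1040), (2, 0x2000), (3, 0x1000)]
    merged, fraction = coalesce_cluster_requests(window)
    for line, sms in merged.items():
        print(f"line {line:#x}: one NoC request serves SMs {sms}")
    print(f"redundant fraction: {fraction:.2f}")
```

In this toy window, four requests collapse into two NoC requests, so half of them count as redundant. A real design would also have to handle timing, writes, and reply distribution, none of which is modeled here.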

Downloads

  • (...).pdf: full text (Published version) | UGent only | PDF | 1.72 MB
  • paper-8585875.pdf: full text (Published version) | open access | PDF | 1.72 MB

Citation

MLA
Wang, Lu, et al. “Intra-Cluster Coalescing to Reduce GPU NoC Pressure.” 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), IEEE, 2018, pp. 990–99, doi:10.1109/ipdps.2018.00108.
APA
Wang, L., Zhao, X., Kaeli, D., Wang, Z., & Eeckhout, L. (2018). Intra-cluster coalescing to reduce GPU NoC pressure. 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 990–999. https://doi.org/10.1109/ipdps.2018.00108
Chicago author-date
Wang, Lu, Xia Zhao, David Kaeli, Zhiying Wang, and Lieven Eeckhout. 2018. “Intra-Cluster Coalescing to Reduce GPU NoC Pressure.” In 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 990–99. IEEE. https://doi.org/10.1109/ipdps.2018.00108.
Chicago author-date (all authors)
Wang, Lu, Xia Zhao, David Kaeli, Zhiying Wang, and Lieven Eeckhout. 2018. “Intra-Cluster Coalescing to Reduce GPU NoC Pressure.” In 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 990–999. IEEE. doi:10.1109/ipdps.2018.00108.
Vancouver
1. Wang L, Zhao X, Kaeli D, Wang Z, Eeckhout L. Intra-cluster coalescing to reduce GPU NoC pressure. In: 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS). IEEE; 2018. p. 990–9.
IEEE
[1] L. Wang, X. Zhao, D. Kaeli, Z. Wang, and L. Eeckhout, “Intra-cluster coalescing to reduce GPU NoC pressure,” in 2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), Vancouver, CANADA, 2018, pp. 990–999.
BibTeX
@inproceedings{8585875,
  abstract     = {{GPUs continue to increase the number of streaming multiprocessors (SMs) to provide increasingly higher compute capabilities. To construct a scalable crossbar network-on-chip (NoC) that connects the SMs to the memory controllers, a cluster structure is introduced in modern GPUs in which several SMs are grouped together to share a network port. Because of network port sharing, clustered GPUs face severe NoC congestion, which creates a critical performance bottleneck.

In this paper, we target redundant network traffic to mitigate GPU NoC congestion. In particular, we observe that in many GPU-compute applications, different SMs in a cluster access shared data. Issuing redundant requests to access the same memory location wastes valuable NoC bandwidth - we find on average 19.4% (and up to 48%) of the requests to be redundant. To reduce redundant NoC traffic, we propose intracluster coalescing (ICC) to merge memory requests from different SMs in a cluster. Our evaluation results show that ICC achieves an average performance improvement of 9.7% (and up to 33%) over a conventional design.}},
  author       = {{Wang, Lu and Zhao, Xia and Kaeli, David and Wang, Zhiying and Eeckhout, Lieven}},
  booktitle    = {{2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS)}},
  isbn         = {{9781538643686}},
  issn         = {{1530-2075}},
  language     = {{eng}},
  location     = {{Vancouver, CANADA}},
  pages        = {{990--999}},
  publisher    = {{IEEE}},
  title        = {{Intra-cluster coalescing to reduce GPU NoC pressure}},
  url          = {{http://dx.doi.org/10.1109/ipdps.2018.00108}},
  year         = {{2018}},
}
