CD-Xbar : a converge-diverge crossbar network for high-performance GPUs

(2019) IEEE TRANSACTIONS ON COMPUTERS. 68(9). p.1283-1296
Author
Xia Zhao, Sheng Ma, Zhiying Wang, Natalie Enright Jerger and Lieven Eeckhout
Abstract
Modern GPUs feature an increasing number of streaming multiprocessors (SMs) to boost system throughput. How to construct an efficient and scalable network-on-chip (NoC) for future high-performance GPUs is particularly critical. Although a mesh network is a widely used NoC topology in manycore CPUs for scalability and simplicity reasons, it is ill-suited to GPUs because of the many-to-few-to-many traffic pattern observed in GPU-compute workloads. Although a crossbar NoC is a natural fit, it does not scale to large SM counts while operating at high frequency. In this paper, we propose the converge-diverge crossbar (CD-Xbar) network with round-robin routing and topology-aware concurrent thread array (CTA) scheduling. CD-Xbar consists of two types of crossbars, a local crossbar and a global crossbar. A local crossbar converges input ports from the SMs into so-called converged ports; the global crossbar diverges these converged ports to the last-level cache (LLC) slices and memory controllers. CD-Xbar provides routing path diversity through the converged ports. Round-robin routing and topology-aware CTA scheduling balance network traffic among the converged ports within a local crossbar and across crossbars, respectively. Compared to a mesh with the same bisection bandwidth, CD-Xbar reduces NoC active silicon area and power consumption by 52.5 and 48.5 percent, respectively, while at the same time improving performance by 13.9 percent on average. CD-Xbar performs within 2.9 percent of an idealized fully-connected crossbar. We further demonstrate CD-Xbar's scalability, flexibility and improved performance per Watt (by 17.1 percent) over state-of-the-art GPU NoCs which are highly customized and non-scalable.
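The round-robin balancing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `LocalCrossbar`, its fields, and the tuple returned by `route` are hypothetical names chosen for illustration, and the model ignores timing, buffering, and arbitration. It only shows how successive packets entering one local crossbar could be spread evenly over its converged ports.

```python
from itertools import cycle


class LocalCrossbar:
    """Hypothetical model of one local crossbar (names are assumptions).

    It converges packets from several SMs onto a small number of
    converged ports, picking the port in round-robin order.
    """

    def __init__(self, crossbar_id, num_converged_ports):
        self.crossbar_id = crossbar_id
        # Endless iterator 0, 1, ..., n-1, 0, 1, ... over the converged ports.
        self._next_port = cycle(range(num_converged_ports))
        # Packets sent through each converged port so far.
        self.port_load = [0] * num_converged_ports

    def route(self, sm_id, llc_slice):
        """Pick the next converged port for a packet from sm_id to llc_slice."""
        port = next(self._next_port)
        self.port_load[port] += 1
        # The global crossbar would then forward (port -> llc_slice);
        # here we only record the chosen path.
        return (self.crossbar_id, port, llc_slice)


# Four SMs behind one local crossbar with two converged ports,
# all sending to the same (hypothetical) LLC slice 5.
xbar = LocalCrossbar(crossbar_id=0, num_converged_ports=2)
paths = [xbar.route(sm_id, llc_slice=5) for sm_id in range(4)]
```

In this toy setting the four packets alternate between the two converged ports, so each port carries half of the traffic regardless of which SM issued the request.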
Keywords
Theoretical Computer Science, Hardware and Architecture, Computational Theory and Mathematics, Software, Graphics processing unit (GPU), network-on-chip (NoC), crossbar, CHIP, PROCESSORS, BANDWIDTH, NOC

Downloads

  • (...).pdf
    • full text (Published version) | UGent only | PDF | 1.80 MB
  • 8625880 accepted.pdf
    • full text (Accepted manuscript) | open access | PDF | 2.36 MB

Citation

MLA
Zhao, Xia, et al. “CD-Xbar : A Converge-Diverge Crossbar Network for High-Performance GPUs.” IEEE TRANSACTIONS ON COMPUTERS, vol. 68, no. 9, 2019, pp. 1283–96, doi:10.1109/tc.2019.2906869.
APA
Zhao, X., Ma, S., Wang, Z., Jerger, N. E., & Eeckhout, L. (2019). CD-Xbar : a converge-diverge crossbar network for high-performance GPUs. IEEE TRANSACTIONS ON COMPUTERS, 68(9), 1283–1296. https://doi.org/10.1109/tc.2019.2906869
Chicago author-date
Zhao, Xia, Sheng Ma, Zhiying Wang, Natalie Enright Jerger, and Lieven Eeckhout. 2019. “CD-Xbar : A Converge-Diverge Crossbar Network for High-Performance GPUs.” IEEE TRANSACTIONS ON COMPUTERS 68 (9): 1283–96. https://doi.org/10.1109/tc.2019.2906869.
Chicago author-date (all authors)
Zhao, Xia, Sheng Ma, Zhiying Wang, Natalie Enright Jerger, and Lieven Eeckhout. 2019. “CD-Xbar : A Converge-Diverge Crossbar Network for High-Performance GPUs.” IEEE TRANSACTIONS ON COMPUTERS 68 (9): 1283–1296. doi:10.1109/tc.2019.2906869.
Vancouver
1. Zhao X, Ma S, Wang Z, Jerger NE, Eeckhout L. CD-Xbar : a converge-diverge crossbar network for high-performance GPUs. IEEE TRANSACTIONS ON COMPUTERS. 2019;68(9):1283–96.
IEEE
[1] X. Zhao, S. Ma, Z. Wang, N. E. Jerger, and L. Eeckhout, “CD-Xbar : a converge-diverge crossbar network for high-performance GPUs,” IEEE TRANSACTIONS ON COMPUTERS, vol. 68, no. 9, pp. 1283–1296, 2019.
@article{8625880,
  abstract     = {{Modern GPUs feature an increasing number of streaming multiprocessors (SMs) to boost system throughput. How to construct an efficient and scalable network-on-chip (NoC) for future high-performance GPUs is particularly critical. Although a mesh network is a widely used NoC topology in manycore CPUs for scalability and simplicity reasons, it is ill-suited to GPUs because of the many-to-few-to-many traffic pattern observed in GPU-compute workloads. Although a crossbar NoC is a natural fit, it does not scale to large SM counts while operating at high frequency. In this paper, we propose the converge-diverge crossbar (CD-Xbar) network with round-robin routing and topology-aware concurrent thread array (CTA) scheduling. CD-Xbar consists of two types of crossbars, a local crossbar and a global crossbar. A local crossbar converges input ports from the SMs into so-called converged ports; the global crossbar diverges these converged ports to the last-level cache (LLC) slices and memory controllers. CD-Xbar provides routing path diversity through the converged ports. Round-robin routing and topology-aware CTA scheduling balance network traffic among the converged ports within a local crossbar and across crossbars, respectively. Compared to a mesh with the same bisection bandwidth, CD-Xbar reduces NoC active silicon area and power consumption by 52.5 and 48.5 percent, respectively, while at the same time improving performance by 13.9 percent on average. CD-Xbar performs within 2.9 percent of an idealized fully-connected crossbar. We further demonstrate CD-Xbar's scalability, flexibility and improved performance per Watt (by 17.1 percent) over state-of-the-art GPU NoCs which are highly customized and non-scalable.}},
  author       = {{Zhao, Xia and Ma, Sheng and Wang, Zhiying and Jerger, Natalie Enright and Eeckhout, Lieven}},
  issn         = {{0018-9340}},
  journal      = {{IEEE TRANSACTIONS ON COMPUTERS}},
  keywords     = {{Theoretical Computer Science,Hardware and Architecture,Computational Theory and Mathematics,Software,Graphics processing unit (GPU),network-on-chip (NoC),crossbar,CHIP,PROCESSORS,BANDWIDTH,NOC}},
  language     = {{eng}},
  number       = {{9}},
  pages        = {{1283--1296}},
  title        = {{CD-Xbar : a converge-diverge crossbar network for high-performance GPUs}},
  url          = {{http://dx.doi.org/10.1109/tc.2019.2906869}},
  volume       = {{68}},
  year         = {{2019}},
}
