Abstract
To achieve high throughput, core count in compute accelerators such as General-Purpose Graphics Processing Units (GPGPUs) increases continuously. The communication demand of these cores boosts the demand for a low-latency packet switched network. As packet latency is mainly composed of per-hop latency, contention latency and serialization latency, a favorable Network-on-Chip (NoC) design should efficiently decrease these three latency contributors to meet the communication demand while keeping hardware cost low. In this paper, we first make two observations about the NoC differences between CMPs and GPGPUs, and then design a Heterogeneous Ring-Chain network (HRCnet) for the GPGPU reply network. HRCnet eliminates conflicts in the network by proposing a ring-similar topology, using a novel node placement and introducing unidirectional channels. Eliminating conflicts reduces the per-hop latency and removes the contention latency, and exploiting the ring-similar topology reduces the serialization latency. Experimental results show the benefits of the low-cost low-latency design. With the same bisection bandwidth compared to the baseline mesh, our work yields a 45% performance improvement while reducing the area by 42% and reducing energy consumption by 60%. Compared to two state-of-the-art GPGPU NoCs, BENoC and DA2mesh, HRCnet achieves more than 42% performance gain at reduced hardware cost. Our work also achieves the highest power and area efficiency among the designs.

Downloads

  • zhao.pdf (full text, open access, PDF, 507.80 KB)

Citation

Chicago
Zhao, Xia, Sheng Ma, Chen Li, Lieven Eeckhout, and Zhiying Wang. 2016. “A Heterogeneous Low-Cost and Low-Latency Ring-Chain Network for GPGPUs.” In Proceedings of the 34th IEEE International Conference on Computer Design (ICCD), 472–479. New York: IEEE.
APA
Zhao, X., Ma, S., Li, C., Eeckhout, L., & Wang, Z. (2016). A Heterogeneous Low-Cost and Low-Latency Ring-Chain Network for GPGPUs. Proceedings of the 34th IEEE International Conference on Computer Design (ICCD) (pp. 472–479). Presented at the IEEE 34th International Conference on Computer Design (ICCD), New York: IEEE.
Vancouver
1. Zhao X, Ma S, Li C, Eeckhout L, Wang Z. A Heterogeneous Low-Cost and Low-Latency Ring-Chain Network for GPGPUs. Proceedings of the 34th IEEE International Conference on Computer Design (ICCD). New York: IEEE; 2016. p. 472–9.
MLA
Zhao, Xia, Sheng Ma, Chen Li, et al. “A Heterogeneous Low-Cost and Low-Latency Ring-Chain Network for GPGPUs.” Proceedings of the 34th IEEE International Conference on Computer Design (ICCD). New York: IEEE, 2016. 472–479. Print.
@inproceedings{8512588,
  abstract     = {To achieve high throughput, core count in compute accelerators such as General-Purpose Graphics Processing Units (GPGPUs) increases continuously. The communication demand of these cores boosts the demand for a low-latency packet switched network. As packet latency is mainly composed of per-hop latency, contention latency and serialization latency, a favorable Network-on-Chip (NoC) design should efficiently decrease these three latency contributors to meet the communication demand while keeping hardware cost low. In this paper, we first make two observations about the NoC differences between CMPs and GPGPUs, and then design a Heterogeneous Ring-Chain network (HRCnet) for the GPGPU reply network. HRCnet eliminates conflicts in the network by proposing a ring-similar topology, using a novel node placement and introducing unidirectional channels. Eliminating conflicts reduces the per-hop latency and removes the contention latency, and exploiting the ring-similar topology reduces the serialization latency. Experimental results show the benefits of the low-cost low-latency design. With the same bisection bandwidth compared to the baseline mesh, our work yields a 45\% performance improvement while reducing the area by 42\% and reducing energy consumption by 60\%. Compared to two state-of-the-art GPGPU NoCs, BENoC and DA2mesh, HRCnet achieves more than 42\% performance gain at reduced hardware cost. Our work also achieves the highest power and area efficiency among the designs.},
  author       = {Zhao, Xia and Ma, Sheng and Li, Chen and Eeckhout, Lieven and Wang, Zhiying},
  booktitle    = {Proceedings of the 34th IEEE International Conference on Computer Design (ICCD)},
  isbn         = {978-1-5090-5142-7},
  issn         = {1063-6404},
  language     = {eng},
  location     = {Scottsdale, AZ},
  pages        = {472--479},
  publisher    = {IEEE},
  title        = {A Heterogeneous Low-Cost and Low-Latency Ring-Chain Network for GPGPUs},
  year         = {2016},
}
