
Thread similarity matrix: visualizing branch divergence in GPGPU programs

Abstract
Graphics processing units (GPUs) have recently evolved into popular accelerators for general-purpose parallel programs, so-called GPGPU computing. Although programming models such as CUDA and OpenCL significantly improve GPGPU programmability, optimizing GPGPU programs is still far from trivial. Branch divergence is one of the root causes of reduced GPGPU performance. Existing approaches can calculate the branch divergence rate but cannot reveal how the branches diverge in a GPGPU program. In this paper, we propose the Thread Similarity Matrix (TSM) to visualize how branches diverge and, in turn, help find optimization opportunities. TSM contains an element for each pair of threads, representing the difference in the code executed by the two threads. The darker the element, the more similar the threads are; the lighter, the more dissimilar. TSM therefore allows GPGPU programmers to easily understand an application’s branch divergence behavior and pinpoint performance anomalies. We present a case study to demonstrate how TSM can help optimize GPGPU programs: we improve the performance of a highly optimized GPGPU kernel by 35% by restructuring its thread organization to reduce its branch divergence rate.
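The abstract describes TSM only at a high level, so the following Python sketch illustrates the idea under loudly stated assumptions. It assumes each thread's execution has already been summarized as the set of basic-block IDs that thread executed, and it uses Jaccard distance as the pairwise "difference in code" measure. Neither the trace representation nor the metric is taken from the paper, and the names `thread_distance`, `thread_similarity_matrix`, `render` and `divergent_warps` are hypothetical. The 4-thread warp is chosen only to keep the output small (NVIDIA GPUs use 32-thread warps).

```python
from typing import List, Sequence, Set


def thread_distance(a: Set[int], b: Set[int]) -> float:
    """Dissimilarity between two threads' executed basic-block sets.

    Assumed metric (not from the paper): Jaccard distance |a ^ b| / |a | b|.
    0.0 means both threads executed exactly the same blocks; 1.0 means no overlap.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


def thread_similarity_matrix(traces: Sequence[Set[int]]) -> List[List[float]]:
    """Build an N x N matrix with one element per pair of threads."""
    n = len(traces)
    return [[thread_distance(traces[i], traces[j]) for j in range(n)] for i in range(n)]


def render(matrix: List[List[float]]) -> str:
    """Text rendering: a darker glyph means more similar, following the TSM convention."""
    shades = "#+. "  # '#' = identical (dark) ... ' ' = fully different (light)
    return "\n".join(
        "".join(shades[min(int(d * len(shades)), len(shades) - 1)] for d in row)
        for row in matrix
    )


def divergent_warps(traces: Sequence[Set[int]], warp_size: int) -> int:
    """Count warps whose threads did not all execute the same set of blocks."""
    count = 0
    for start in range(0, len(traces), warp_size):
        warp = traces[start:start + warp_size]
        if any(t != warp[0] for t in warp):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical kernel: even threads take one branch (block 1), odd threads the other (block 2).
    even, odd = {0, 1, 3}, {0, 2, 3}
    traces = [even if t % 2 == 0 else odd for t in range(8)]
    print(render(thread_similarity_matrix(traces)))  # checkerboard: every warp mixes both paths
    print(divergent_warps(traces, warp_size=4))  # 2

    # Regroup threads so that each warp follows a single path.
    regrouped = traces[0::2] + traces[1::2]
    print(render(thread_similarity_matrix(regrouped)))  # block-diagonal pattern
    print(divergent_warps(regrouped, warp_size=4))  # 0
```

In this toy example, the checkerboard pattern in the first matrix is the kind of visual cue that signals intra-warp divergence, and the block-diagonal pattern after regrouping shows threads with identical paths sharing a warp. This regrouping is only an illustration of the general idea of reducing divergence through thread organization; it is not presented as the transformation used in the paper's case study.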
Keywords
GPGPU, Workload Characterization, Performance Optimization

Citation

MLA
Yu, Zhibin, et al. “Thread Similarity Matrix: Visualizing Branch Divergence in GPGPU Programs.” Proceedings of the International Conference on Parallel Processing, 2016, pp. 179–84, doi:10.1109/ICPP.2016.27.
APA
Yu, Z., Eeckhout, L., & Xu, C. (2016). Thread similarity matrix: visualizing branch divergence in GPGPU programs. Proceedings of the International Conference on Parallel Processing, 179–184. https://doi.org/10.1109/ICPP.2016.27
Chicago author-date
Yu, Zhibin, Lieven Eeckhout, and Chengzhong Xu. 2016. “Thread Similarity Matrix: Visualizing Branch Divergence in GPGPU Programs.” In Proceedings of the International Conference on Parallel Processing, 179–84. https://doi.org/10.1109/ICPP.2016.27.
Chicago author-date (all authors)
Yu, Zhibin, Lieven Eeckhout, and Chengzhong Xu. 2016. “Thread Similarity Matrix: Visualizing Branch Divergence in GPGPU Programs.” In Proceedings of the International Conference on Parallel Processing, 179–184. doi:10.1109/ICPP.2016.27.
Vancouver
1.
Yu Z, Eeckhout L, Xu C. Thread similarity matrix: visualizing branch divergence in GPGPU programs. In: Proceedings of the International Conference on Parallel Processing. 2016. p. 179–84.
IEEE
[1]
Z. Yu, L. Eeckhout, and C. Xu, “Thread similarity matrix: visualizing branch divergence in GPGPU programs,” in Proceedings of the International Conference on Parallel Processing, Philadelphia, PA, 2016, pp. 179–184.
@inproceedings{8173010,
  abstract     = {{Graphics processing units (GPUs) have recently evolved into popular accelerators for general-purpose parallel programs, so-called GPGPU computing. Although programming models such as CUDA and OpenCL significantly improve GPGPU programmability, optimizing GPGPU programs is still far from trivial. Branch divergence is one of the root causes of reduced GPGPU performance. Existing approaches can calculate the branch divergence rate but cannot reveal how the branches diverge in a GPGPU program. In this paper, we propose the Thread Similarity Matrix (TSM) to visualize how branches diverge and, in turn, help find optimization opportunities. TSM contains an element for each pair of threads, representing the difference in the code executed by the two threads. The darker the element, the more similar the threads are; the lighter, the more dissimilar. TSM therefore allows GPGPU programmers to easily understand an application’s branch divergence behavior and pinpoint performance anomalies. We present a case study to demonstrate how TSM can help optimize GPGPU programs: we improve the performance of a highly optimized GPGPU kernel by 35% by restructuring its thread organization to reduce its branch divergence rate.}},
  author       = {{Yu, Zhibin and Eeckhout, Lieven and Xu, Chengzhong}},
  booktitle    = {{Proceedings of the International Conference on Parallel Processing}},
  isbn         = {{978-1-5090-2823-8}},
  issn         = {{0190-3918}},
  keywords     = {{GPGPU,Workload Characterization,Performance Optimization}},
  language     = {{eng}},
  location     = {{Philadelphia, PA}},
  pages        = {{179--184}},
  title        = {{Thread similarity matrix: visualizing branch divergence in GPGPU programs}},
  url          = {{https://doi.org/10.1109/ICPP.2016.27}},
  year         = {{2016}},
}