Abstract
Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource utilization when a single GPU is used to run a single application. One solution is to use the GPU in a multitasking fashion to improve utilization. Unfortunately, multitasking leads to destructive interference between co-running applications, which causes fairness issues and Quality-of-Service (QoS) violations. We propose the Hybrid Slowdown Model (HSM) to dynamically and accurately predict application slowdown due to interference. HSM overcomes the low accuracy of prior white-box models, and the training and implementation overheads of pure black-box models, with a hybrid approach. More specifically, the white-box component of HSM builds upon the fundamental insight that effective bandwidth utilization is proportional to DRAM row buffer hit rate, and the black-box component of HSM uses linear regression to relate row buffer hit rate to performance. HSM accurately predicts application slowdown with an average error of 6.8%, a significant improvement over the current state-of-the-art. In addition, we use HSM to guide various resource management schemes in multitasking GPUs: HSM-Fair significantly improves fairness (by 1.59x on average) compared to even partitioning, whereas HSM-QoS improves system throughput (by 18.9% on average) compared to proportional SM partitioning while maintaining the QoS target for the high-priority application in challenging mixed memory/compute-bound multi-program workloads.
Keywords
GPU, Multitasking, Slowdown Prediction, Performance Modeling, PERFORMANCE, FAIRNESS

Downloads

  • ASPLOS2020.pdf: full text (Accepted manuscript) | open access | PDF | 1.86 MB
  • (...).pdf: full text (Published version) | UGent only | PDF | 3.82 MB

Citation

MLA
Zhao, Xia, et al. “HSM: A Hybrid Slowdown Model for Multitasking GPUs.” Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV), Association for Computing Machinery (ACM), 2020, pp. 1371–85, doi:10.1145/3373376.3378457.
APA
Zhao, X., Jahre, M., & Eeckhout, L. (2020). HSM: A hybrid slowdown model for multitasking GPUs. In Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV) (pp. 1371–1385). Lausanne, Switzerland: Association for Computing Machinery (ACM). https://doi.org/10.1145/3373376.3378457
Chicago author-date
Zhao, Xia, Magnus Jahre, and Lieven Eeckhout. 2020. “HSM: A Hybrid Slowdown Model for Multitasking GPUs.” In Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV), 1371–85. Association for Computing Machinery (ACM). https://doi.org/10.1145/3373376.3378457.
Chicago author-date (all authors)
Zhao, Xia, Magnus Jahre, and Lieven Eeckhout. 2020. “HSM: A Hybrid Slowdown Model for Multitasking GPUs.” In Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV), 1371–1385. Association for Computing Machinery (ACM). doi:10.1145/3373376.3378457.
Vancouver
1. Zhao X, Jahre M, Eeckhout L. HSM: a hybrid slowdown model for multitasking GPUs. In: Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV). Association for Computing Machinery (ACM); 2020. p. 1371–85.
IEEE
[1] X. Zhao, M. Jahre, and L. Eeckhout, “HSM: a hybrid slowdown model for multitasking GPUs,” in Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV), Lausanne, Switzerland, 2020, pp. 1371–1385.
@inproceedings{8676313,
  abstract     = {Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource utilization when a single GPU is used to run a single application. One solution is to use the GPU in a multitasking fashion to improve utilization. Unfortunately, multitasking leads to destructive interference between co-running applications, which causes fairness issues and Quality-of-Service (QoS) violations.

We propose the Hybrid Slowdown Model (HSM) to dynamically and accurately predict application slowdown due to interference. HSM overcomes the low accuracy of prior white-box models, and the training and implementation overheads of pure black-box models, with a hybrid approach. More specifically, the white-box component of HSM builds upon the fundamental insight that effective bandwidth utilization is proportional to DRAM row buffer hit rate, and the black-box component of HSM uses linear regression to relate row buffer hit rate to performance. HSM accurately predicts application slowdown with an average error of 6.8%, a significant improvement over the current state-of-the-art. In addition, we use HSM to guide various resource management schemes in multitasking GPUs: HSM-Fair significantly improves fairness (by 1.59x on average) compared to even partitioning, whereas HSM-QoS improves system throughput (by 18.9% on average) compared to proportional SM partitioning while maintaining the QoS target for the high-priority application in challenging mixed memory/compute-bound multi-program workloads.},
  author       = {Zhao, Xia and Jahre, Magnus and Eeckhout, Lieven},
  booktitle    = {Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XXV)},
  isbn         = {9781450371025},
  keywords     = {GPU,Multitasking,Slowdown Prediction,Performance Modeling,PERFORMANCE,FAIRNESS},
  language     = {eng},
  location     = {Lausanne, Switzerland},
  pages        = {1371--1385},
  publisher    = {Association for Computing Machinery (ACM)},
  title        = {{HSM}: a hybrid slowdown model for multitasking {GPUs}},
  url          = {http://dx.doi.org/10.1145/3373376.3378457},
  year         = {2020},
}
