Classification-driven search for effective SM partitioning in multitasking GPUs

Xia Zhao (UGent), Zhiying Wang and Lieven Eeckhout (UGent)
Abstract
Graphics processing units (GPUs) feature an increasing number of streaming multiprocessors (SMs) with each successive generation. At the same time, GPUs are increasingly adopted in cloud services and data centers to accelerate general-purpose workloads. Running multiple applications on a GPU in such environments requires effective multitasking support. Spatial multitasking, in which independent applications co-execute on different sets of SMs, is a promising way to share GPU resources. Unfortunately, how to partition SMs effectively is an open problem.

In this paper, we observe that, compared to widely used even partitioning, dynamic SM partitioning based on the characteristics of the co-executing applications can significantly improve performance and power efficiency. However, finding an effective SM partition is challenging because the number of possible combinations increases exponentially with the number of SMs and co-executing applications. Through offline analysis, we find that first classifying workloads and then searching for an effective SM partition based on the workload characteristics can significantly reduce the search space, making dynamic SM partitioning tractable.

Based on these insights, we propose Classification-Driven search (CD-search) for low-overhead dynamic SM partitioning in multitasking GPUs. CD-search first classifies workloads using a novel off-SM bandwidth model, after which it enters the performance mode or the power mode depending on the workload's characteristics. Both modes follow a specific search strategy to quickly determine the optimum SM partition. Our evaluation shows that CD-search improves system throughput by 10.4% on average (and up to 62.9%) over even partitioning for workloads that are classified for the performance mode. For workloads classified for the power mode, CD-search reduces power consumption by 25% on average (and up to 41.2%). CD-search incurs limited runtime overhead.
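
To give a rough sense of the search-space argument in the abstract, the sketch below counts and enumerates the ways to split a GPU's SMs among co-executing applications, and computes the even partition used as the baseline. It is only an illustration and not the CD-search algorithm from the paper: the function names are hypothetical, the SM count of 80 is an assumed example value, and the sketch assumes that only the number of SMs per application matters (not which SMs) and that every application receives at least one SM.

```python
from itertools import combinations
from math import comb


def count_partitions(num_sms: int, num_apps: int) -> int:
    """Count the ways to assign all SMs so that each app gets at least one.

    Assumption: only per-application SM counts matter, so this is the
    classic "stars and bars" count C(num_sms - 1, num_apps - 1).
    """
    if num_apps < 1 or num_sms < num_apps:
        return 0
    return comb(num_sms - 1, num_apps - 1)


def enumerate_partitions(num_sms: int, num_apps: int):
    """Yield every tuple of per-application SM counts that sums to num_sms."""
    if num_apps < 1 or num_sms < num_apps:
        return
    # Pick num_apps - 1 cut points among the num_sms - 1 gaps between SMs.
    for cuts in combinations(range(1, num_sms), num_apps - 1):
        bounds = (0, *cuts, num_sms)
        yield tuple(hi - lo for lo, hi in zip(bounds, bounds[1:]))


def even_partition(num_sms: int, num_apps: int) -> tuple:
    """Baseline: split the SMs as evenly as possible across applications."""
    base, extra = divmod(num_sms, num_apps)
    return tuple(base + (1 if i < extra else 0) for i in range(num_apps))


if __name__ == "__main__":
    NUM_SMS = 80  # assumed example value, not taken from the paper
    for apps in (2, 3, 4):
        print(apps, "apps:", count_partitions(NUM_SMS, apps), "candidates;",
              "even partition =", even_partition(NUM_SMS, apps))
```

Under these assumptions, each candidate partition would have to be evaluated at runtime by an exhaustive search, which is the cost that a classification step followed by a guided search aims to avoid.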
Keywords
GPU, multitasking, SM partitioning

Citation

MLA
Zhao, Xia, et al. “Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs.” International Conference on Supercomputing (ICS 2018), Association for Computing Machinery, 2018, pp. 65–75, doi:10.1145/3205289.3205311.
APA
Zhao, X., Wang, Z., & Eeckhout, L. (2018). Classification-driven search for effective SM partitioning in multitasking GPUs. In International Conference on Supercomputing (ICS 2018) (pp. 65–75). New York: Association for Computing Machinery. https://doi.org/10.1145/3205289.3205311
Chicago author-date
Zhao, Xia, Zhiying Wang, and Lieven Eeckhout. 2018. “Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs.” In International Conference on Supercomputing (ICS 2018), 65–75. New York: Association for Computing Machinery. https://doi.org/10.1145/3205289.3205311.
Chicago author-date (all authors)
Zhao, Xia, Zhiying Wang, and Lieven Eeckhout. 2018. “Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs.” In International Conference on Supercomputing (ICS 2018), 65–75. New York: Association for Computing Machinery. doi:10.1145/3205289.3205311.
Vancouver
1. Zhao X, Wang Z, Eeckhout L. Classification-driven search for effective SM partitioning in multitasking GPUs. In: International Conference on Supercomputing (ICS 2018). New York: Association for Computing Machinery; 2018. p. 65–75.
IEEE
[1] X. Zhao, Z. Wang, and L. Eeckhout, “Classification-driven search for effective SM partitioning in multitasking GPUs,” in International Conference on Supercomputing (ICS 2018), Beijing, China, 2018, pp. 65–75.
BibTeX
@inproceedings{8589946,
  author       = {{Zhao, Xia and Wang, Zhiying and Eeckhout, Lieven}},
  booktitle    = {{INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2018)}},
  isbn         = {{9781450357838}},
  keywords     = {{GPU,multitasking,SM partitioning}},
  language     = {{eng}},
  location     = {{Beijing, China}},
  pages        = {{65--75}},
  publisher    = {{Association for Computing Machinery}},
  title        = {{Classification-driven search for effective SM partitioning in multitasking GPUs}},
  url          = {{http://dx.doi.org/10.1145/3205289.3205311}},
  year         = {{2018}},
}
