Per-thread cycle accounting in multicore processors

Kristof Du Bois (UGent), Stijn Eyerman (UGent) and Lieven Eeckhout (UGent)
Abstract
While multicore processors improve overall chip throughput and hardware utilization, resource sharing among the cores leads to unpredictable performance for the individual threads running on a multicore processor. Unpredictable per-thread performance becomes a problem in the context of multicore scheduling: system software assumes that all threads make equal progress; however, this is not what the hardware provides. This may lead to problems at the system level such as missed deadlines, reduced quality-of-service, unmet service-level agreements, unbalanced parallel performance, priority inversion, unpredictable interactive performance, etc. This article proposes a hardware-efficient per-thread cycle accounting architecture for multicore processors. The counter architecture tracks per-thread progress in a multicore processor, detects how inter-thread interference affects per-thread performance, and predicts the execution time for each thread if run in isolation. The counter architecture captures the effects of additional conflict misses due to cache sharing as well as increased latency for other memory accesses due to resource and bandwidth contention in the memory subsystem. The proposed method accounts for 74.3% of the interference cycles, and estimates per-thread progress within 14.2% on average across a large set of multi-program workloads. Hardware cost is limited to 7.44 KB for an 8-core processor, a reduction of almost 10× compared to prior work while being 63.8% more accurate. Making system software progress-aware improves fairness by 22.5% on average over progress-agnostic scheduling.
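The idea in the abstract of predicting isolated execution time from interference counters can be illustrated with a small sketch. This is not the authors' hardware design; it assumes, for illustration only, that each thread exposes a total cycle count and an interference cycle count, and that progress is the ratio of estimated isolated cycles to multicore cycles. The names `ThreadCounters`, `estimated_isolated_cycles`, `progress`, and `slowest_thread` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ThreadCounters:
    """Hypothetical per-thread counter readings (field names are illustrative)."""
    total_cycles: int         # cycles the thread ran on the multicore processor
    interference_cycles: int  # cycles attributed to inter-thread interference


def estimated_isolated_cycles(c: ThreadCounters) -> int:
    """Estimate isolated execution time by subtracting interference cycles."""
    return c.total_cycles - c.interference_cycles


def progress(c: ThreadCounters) -> float:
    """Assumed progress metric: estimated isolated cycles per multicore cycle."""
    return estimated_isolated_cycles(c) / c.total_cycles


def slowest_thread(threads: Dict[str, ThreadCounters]) -> str:
    """Return the thread with the least progress, e.g. for a scheduler to favor."""
    return min(threads, key=lambda name: progress(threads[name]))
```

A progress-aware scheduler could, under these assumptions, call `slowest_thread` periodically and give the returned thread a larger share of the machine, instead of assuming that all threads advance equally.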
Keywords
PERFORMANCE FAIRNESS, CHIP MULTIPROCESSORS, CACHES, MANAGEMENT, SYSTEMS, Design, Experimentation, Measurement, Performance, Multicore processors, hardware/software interface, scheduling, resource sharing, interference, performance analysis

Citation

MLA
Du Bois, Kristof, Stijn Eyerman, and Lieven Eeckhout. “Per-thread Cycle Accounting in Multicore Processors.” ACM Transactions on Architecture and Code Optimization 9.4 (2013): 1–22. Print.
APA
Du Bois, K., Eyerman, S., & Eeckhout, L. (2013). Per-thread cycle accounting in multicore processors. ACM Transactions on Architecture and Code Optimization, 9(4), 1–22.
Chicago author-date
Du Bois, Kristof, Stijn Eyerman, and Lieven Eeckhout. 2013. “Per-thread Cycle Accounting in Multicore Processors.” ACM Transactions on Architecture and Code Optimization 9 (4): 1–22.
Vancouver
1. Du Bois K, Eyerman S, Eeckhout L. Per-thread cycle accounting in multicore processors. ACM Transactions on Architecture and Code Optimization. 2013;9(4):1–22.
IEEE
[1] K. Du Bois, S. Eyerman, and L. Eeckhout, “Per-thread cycle accounting in multicore processors,” ACM Transactions on Architecture and Code Optimization, vol. 9, no. 4, pp. 1–22, 2013.
BibTeX
@article{3106245,
  abstract     = {While multicore processors improve overall chip throughput and hardware utilization, resource sharing among the cores leads to unpredictable performance for the individual threads running on a multicore processor. Unpredictable per-thread performance becomes a problem when considered in the context of multicore scheduling: system software assumes that all threads make equal progress, however, this is not what the hardware provides. This may lead to problems at the system level such as missed deadlines, reduced quality-of-service, non-satisfied service-level agreements, unbalanced parallel performance, priority inversion, unpredictable interactive performance, etc. This article proposes a hardware-efficient per-thread cycle accounting architecture for multicore processors. The counter architecture tracks per-thread progress in a multicore processor, detects how inter-thread interference affects per-thread performance, and predicts the execution time for each thread if run in isolation. The counter architecture captures the effects of additional conflict misses due to cache sharing as well as increased latency for other memory accesses due to resource and bandwidth contention in the memory subsystem. The proposed method accounts for 74.3% of the interference cycles, and estimates per-thread progress within 14.2% on average across a large set of multi-program workloads. Hardware cost is limited to 7.44KB for an 8-core processor, a reduction by almost 10× compared to prior work while being 63.8% more accurate. Making system software progress aware improves fairness by 22.5% on average over progress-agnostic scheduling.},
  articleno    = {29},
  author       = {Du Bois, Kristof and Eyerman, Stijn and Eeckhout, Lieven},
  issn         = {1544-3566},
  journal      = {ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION},
  keywords     = {PERFORMANCE FAIRNESS,CHIP MULTIPROCESSORS,CACHES,MANAGEMENT,SYSTEMS,Design,Experimentation,Measurement,Performance,Multicore processors,hardware/software interface,scheduling,resource sharing,interference,performance analysis},
  language     = {eng},
  number       = {4},
  pages        = {29:1--29:22},
  title        = {Per-thread cycle accounting in multicore processors},
  url          = {http://dx.doi.org/10.1145/2400682.2400688},
  volume       = {9},
  year         = {2013},
}