Two-level hybrid sampled simulation of multithreaded applications

Abstract
Sampled microarchitectural simulation of single-threaded applications has been a mature technology for over a decade. Sampling multithreaded applications, on the other hand, is much more complicated. Only very recently have researchers proposed solutions for sampled simulation of multithreaded applications. Time-Based Sampling (TBS) samples multithreaded application execution based on time (not instructions, as is typically done for single-threaded applications), yielding estimates of a multithreaded application’s execution time. In this article, we revisit and analyze previously proposed TBS approaches (periodic and Cantor-fractal-based sampling), and we obtain a number of novel and surprising insights, such as (i) accurately estimating fast-forwarding IPC, that is, performance in between sampling units, is more important than accurately estimating sample IPC, that is, performance within the sampling units; (ii) fast-forwarding IPC estimation accuracy is determined by both the sampling unit distribution and how the sampling units are used to predict fast-forwarding IPC; and (iii) Cantor sampling is more accurate at small sampling unit sizes, whereas periodic sampling is more accurate at large sampling unit sizes. These insights lead to the development of Two-level Hybrid Sampling (THS), a novel sampling methodology for multithreaded applications that combines periodic sampling’s accuracy at large time scales (i.e., uniformly selecting coarse-grain sampling units across the entire program execution) with Cantor sampling’s accuracy at small time scales (i.e., the ability to accurately predict fast-forwarding IPC in between small sampling units). The clustered occurrence of small sampling units under Cantor sampling also enables shortened warmup and thus enhanced simulation speed. Overall, THS achieves an average absolute execution time prediction error of 4% while yielding an average simulation speedup of 40× compared to detailed simulation, making it both more accurate and faster than the current state of the art. Case studies illustrate THS’s ability to accurately predict relative performance differences across the design space.
Keywords
microarchitecture simulation, sampling, multicore processor, multithreaded workloads

Citation

Chicago
Jiang, Chuntao, Zhibin Yu, Lieven Eeckhout, Hai Jin, Xiaofei Liao, and Cheng-Zhong Xu. 2016. “Two-level Hybrid Sampled Simulation of Multithreaded Applications.” ACM Transactions on Architecture and Code Optimization 12 (4).
APA
Jiang, C., Yu, Z., Eeckhout, L., Jin, H., Liao, X., & Xu, C.-Z. (2016). Two-level hybrid sampled simulation of multithreaded applications. ACM Transactions on Architecture and Code Optimization, 12(4).
Vancouver
1. Jiang C, Yu Z, Eeckhout L, Jin H, Liao X, Xu C-Z. Two-level hybrid sampled simulation of multithreaded applications. ACM Transactions on Architecture and Code Optimization. 2016;12(4).
MLA
Jiang, Chuntao, Zhibin Yu, Lieven Eeckhout, et al. “Two-level Hybrid Sampled Simulation of Multithreaded Applications.” ACM Transactions on Architecture and Code Optimization 12.4 (2016): n. pag. Print.
@article{7053863,
  abstract     = {Sampled microarchitectural simulation of single-threaded applications has been a mature technology for over a decade. Sampling multithreaded applications, on the other hand, is much more complicated. Only very recently have researchers proposed solutions for sampled simulation of multithreaded applications. Time-Based Sampling (TBS) samples multithreaded application execution based on time (not instructions, as is typically done for single-threaded applications), yielding estimates of a multithreaded application{\textquoteright}s execution time. In this article, we revisit and analyze previously proposed TBS approaches (periodic and Cantor-fractal-based sampling), and we obtain a number of novel and surprising insights, such as (i) accurately estimating fast-forwarding IPC, that is, performance in between sampling units, is more important than accurately estimating sample IPC, that is, performance within the sampling units; (ii) fast-forwarding IPC estimation accuracy is determined by both the sampling unit distribution and how the sampling units are used to predict fast-forwarding IPC; and (iii) Cantor sampling is more accurate at small sampling unit sizes, whereas periodic sampling is more accurate at large sampling unit sizes. These insights lead to the development of Two-level Hybrid Sampling (THS), a novel sampling methodology for multithreaded applications that combines periodic sampling{\textquoteright}s accuracy at large time scales (i.e., uniformly selecting coarse-grain sampling units across the entire program execution) with Cantor sampling{\textquoteright}s accuracy at small time scales (i.e., the ability to accurately predict fast-forwarding IPC in between small sampling units). The clustered occurrence of small sampling units under Cantor sampling also enables shortened warmup and thus enhanced simulation speed. Overall, THS achieves an average absolute execution time prediction error of 4\% while yielding an average simulation speedup of 40{\texttimes} compared to detailed simulation, making it both more accurate and faster than the current state of the art. Case studies illustrate THS{\textquoteright}s ability to accurately predict relative performance differences across the design space.},
  articleno    = {39},
  author       = {Jiang, Chuntao and Yu, Zhibin and Eeckhout, Lieven and Jin, Hai and Liao, Xiaofei and Xu, Cheng-Zhong},
  issn         = {1544-3566},
  journal      = {ACM Transactions on Architecture and Code Optimization},
  keyword      = {microarchitecture simulation,sampling,multicore processor,multithreaded workloads},
  language     = {eng},
  number       = {4},
  pages        = {25},
  title        = {Two-level hybrid sampled simulation of multithreaded applications},
  url          = {http://dx.doi.org/10.1145/2818353},
  volume       = {12},
  year         = {2016},
}
