The forward slice core : a high-performance, yet low-complexity microarchitecture

Abstract
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
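The forward-slice steering idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware mechanism from the article: the `Instr` record, the `steer` function, the register-taint set, and the simplification to a single slice queue are all hypothetical choices made here for illustration. It marks every instruction that transitively consumes a load result as part of a forward slice and places it in a separate in-order queue, so that independent instructions in the main queue are not stuck behind it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (not taken from the article)."""
    name: str
    dst: Optional[str]        # destination register, if any
    srcs: Tuple[str, ...]     # source registers
    is_load: bool = False


def steer(trace: List[Instr]) -> Tuple[List[str], List[str]]:
    """Split a trace into (main_queue, slice_queue).

    An instruction joins the slice queue if any of its sources was
    produced, directly or transitively, by a load. Each queue keeps
    program order, mimicking an in-order FIFO.
    """
    tainted = set()           # registers whose value derives from a load
    main_q: List[str] = []
    slice_q: List[str] = []
    for ins in trace:
        in_slice = any(src in tainted for src in ins.srcs)
        (slice_q if in_slice else main_q).append(ins.name)
        if ins.dst is not None:
            if ins.is_load or in_slice:
                tainted.add(ins.dst)
            else:
                # Overwritten with a load-independent value.
                tainted.discard(ins.dst)
    return main_q, slice_q


TRACE = [
    Instr("ld1", "r1", ("r0",), is_load=True),
    Instr("add1", "r2", ("r1", "r3")),   # consumes ld1
    Instr("add2", "r4", ("r5", "r6")),   # independent of any load
    Instr("ld2", "r7", ("r4",), is_load=True),
    Instr("mul", "r8", ("r7", "r2")),    # consumes ld2 and add1
    Instr("mov", "r2", ("r5",)),         # clears the taint on r2
    Instr("sub", "r9", ("r2", "r5")),    # independent again
]

main_queue, slice_queue = steer(TRACE)
print("main :", main_queue)
print("slice:", slice_queue)
```

In this toy trace, `add1` and `mul` land in the slice queue while `add2`, `ld2`, `mov`, and `sub` stay in the main queue, so the second load is not queued behind the consumer of the first. The model ignores the other mechanisms named in the abstract (setting aside consumers of L1 D-cache misses and replicating store-address instructions across queues).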
Keywords
Superscalar microarchitecture, slice-out-of-order, dynamic instruction scheduling

Downloads

  • taco2022-FSC.pdf
    • full text (Published version) | open access | PDF | 1.63 MB

Citation

MLA
Lakshminarasimhan, Kartik, et al. “The Forward Slice Core : A High-Performance, yet Low-Complexity Microarchitecture.” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, vol. 19, no. 2, 2022, doi:10.1145/3499424.
APA
Lakshminarasimhan, K., Naithani, A., Feliu, J., & Eeckhout, L. (2022). The forward slice core : a high-performance, yet low-complexity microarchitecture. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 19(2). https://doi.org/10.1145/3499424
Chicago author-date
Lakshminarasimhan, Kartik, Ajeya Naithani, Josue Feliu, and Lieven Eeckhout. 2022. “The Forward Slice Core : A High-Performance, yet Low-Complexity Microarchitecture.” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 19 (2). https://doi.org/10.1145/3499424.
Chicago author-date (all authors)
Lakshminarasimhan, Kartik, Ajeya Naithani, Josue Feliu, and Lieven Eeckhout. 2022. “The Forward Slice Core : A High-Performance, yet Low-Complexity Microarchitecture.” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 19 (2). doi:10.1145/3499424.
Vancouver
1. Lakshminarasimhan K, Naithani A, Feliu J, Eeckhout L. The forward slice core : a high-performance, yet low-complexity microarchitecture. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION. 2022;19(2).
IEEE
[1] K. Lakshminarasimhan, A. Naithani, J. Feliu, and L. Eeckhout, “The forward slice core : a high-performance, yet low-complexity microarchitecture,” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, vol. 19, no. 2, 2022.
BibTeX
@article{8764672,
  abstract     = {{Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.}},
  articleno    = {{17}},
  author       = {{Lakshminarasimhan, Kartik and Naithani, Ajeya and Feliu, Josue and Eeckhout, Lieven}},
  issn         = {{1544-3566}},
  journal      = {{ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION}},
  keywords     = {{Superscalar microarchitecture,slice-out-of-order,dynamic instruction scheduling}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{25}},
  title        = {{The forward slice core : a high-performance, yet low-complexity microarchitecture}},
  url          = {{http://doi.org/10.1145/3499424}},
  volume       = {{19}},
  year         = {{2022}},
}
