The forward slice core : a high-performance, yet low-complexity microarchitecture
- Author
- Kartik Lakshminarasimhan (UGent), Ajeya Naithani (UGent), Josue Feliu and Lieven Eeckhout (UGent)
- Project
- Load Slice Core (Load Slice Core: A Power and Cost-Efficient Microarchitecture for the Future)
- Dynamic Power Management in Heterogeneous Multi-Core Processors
- Abstract
- Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
- Keywords
- Superscalar microarchitecture, slice-out-of-order, dynamic instruction scheduling
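The abstract says that FSC identifies forward slices and steers them to dedicated in-order FIFO queues, but it does not spell out the hardware mechanism. As a rough illustration only, the sketch below treats a forward slice as the set of instructions that transitively consume a load's result, and tracks it with a per-register taint set. The `Instr` type, the `steer` function, the two-queue split, and the rule that overwriting a register clears its taint are all assumptions made for this example; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    op: str                      # e.g. "load", "add", "mul"
    dst: Optional[str]           # destination register, or None
    srcs: Tuple[str, ...] = ()   # source registers


def steer(instrs: List[Instr]) -> Tuple[List[Instr], List[Instr]]:
    """Split a program-order stream into (main_queue, slice_queue).

    Assumption: an instruction belongs to a forward slice if any of its
    source registers was produced by a load or by another slice instruction.
    """
    tainted: Set[str] = set()    # registers currently holding load-derived values
    main_q: List[Instr] = []
    slice_q: List[Instr] = []
    for ins in instrs:
        in_slice = any(src in tainted for src in ins.srcs)
        (slice_q if in_slice else main_q).append(ins)
        if ins.dst is not None:
            if ins.op == "load" or in_slice:
                tainted.add(ins.dst)       # value now depends on a load
            else:
                tainted.discard(ins.dst)   # overwritten by a load-independent value
    return main_q, slice_q


program = [
    Instr("load", "r1", ("r0",)),      # load itself: main queue in this sketch
    Instr("add", "r2", ("r1",)),       # consumes the load: forward slice
    Instr("add", "r3", ("r0",)),       # independent of the load: main queue
    Instr("mul", "r4", ("r2", "r3")),  # consumes r2: forward slice
    Instr("add", "r1", ("r0",)),       # overwrites r1, clearing its taint
    Instr("sub", "r5", ("r1",)),       # reads the new r1: main queue
]
main_q, slice_q = steer(program)
print([i.dst for i in main_q])
print([i.dst for i in slice_q])
```

Each queue keeps its instructions in program order, which mirrors the in-order FIFO property mentioned in the abstract. The sketch deliberately leaves out the other two ideas the abstract lists, setting aside load-consumers that depend on L1 D-cache misses and replicating store-address instructions across queues.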
Downloads
- taco2022-FSC.pdf
- full text (Published version), open access, 1.63 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8764672
- MLA
- Lakshminarasimhan, Kartik, et al. “The Forward Slice Core : A High-Performance, yet Low-Complexity Microarchitecture.” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, vol. 19, no. 2, 2022, doi:10.1145/3499424.
- APA
- Lakshminarasimhan, K., Naithani, A., Feliu, J., & Eeckhout, L. (2022). The forward slice core : a high-performance, yet low-complexity microarchitecture. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 19(2). https://doi.org/10.1145/3499424
- Chicago author-date
- Lakshminarasimhan, Kartik, Ajeya Naithani, Josue Feliu, and Lieven Eeckhout. 2022. “The Forward Slice Core : A High-Performance, yet Low-Complexity Microarchitecture.” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 19 (2). https://doi.org/10.1145/3499424.
- Chicago author-date (all authors)
- Lakshminarasimhan, Kartik, Ajeya Naithani, Josue Feliu, and Lieven Eeckhout. 2022. “The Forward Slice Core : A High-Performance, yet Low-Complexity Microarchitecture.” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 19 (2). doi:10.1145/3499424.
- Vancouver
- 1. Lakshminarasimhan K, Naithani A, Feliu J, Eeckhout L. The forward slice core : a high-performance, yet low-complexity microarchitecture. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION. 2022;19(2).
- IEEE
- [1] K. Lakshminarasimhan, A. Naithani, J. Feliu, and L. Eeckhout, “The forward slice core : a high-performance, yet low-complexity microarchitecture,” ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, vol. 19, no. 2, 2022.
@article{8764672,
  abstract = {{Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.}},
  articleno = {{17}},
  author = {{Lakshminarasimhan, Kartik and Naithani, Ajeya and Feliu, Josue and Eeckhout, Lieven}},
  issn = {{1544-3566}},
  journal = {{ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION}},
  keywords = {{Superscalar microarchitecture,slice-out-of-order,dynamic instruction scheduling}},
  language = {{eng}},
  number = {{2}},
  pages = {{25}},
  title = {{The forward slice core : a high-performance, yet low-complexity microarchitecture}},
  url = {{http://doi.org/10.1145/3499424}},
  volume = {{19}},
  year = {{2022}},
}