The forward slice core microarchitecture
- Author
- Kartik Lakshminarasimhan, Ajeya Naithani (UGent), Josué Feliu and Lieven Eeckhout (UGent)
- Abstract
- Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this paper, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. FSC improves performance by 9.7% on average compared to Freeway, the state-of-the-art sOoO core, across the SPEC CPU2017 benchmarks, while incurring reduced hardware complexity and a similar power budget.
Downloads
- pact20.pdf: full text (Accepted manuscript), open access, 1.86 MB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8692318
- MLA
- Lakshminarasimhan, Kartik, et al. “The Forward Slice Core Microarchitecture.” PACT’20 : Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 2020, pp. 361–72, doi:10.1145/3410463.3414629.
- APA
- Lakshminarasimhan, K., Naithani, A., Feliu, J., & Eeckhout, L. (2020). The forward slice core microarchitecture. PACT’20 : Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 361–372. https://doi.org/10.1145/3410463.3414629
- Chicago author-date
- Lakshminarasimhan, Kartik, Ajeya Naithani, Josué Feliu, and Lieven Eeckhout. 2020. “The Forward Slice Core Microarchitecture.” In PACT’20 : Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 361–72. https://doi.org/10.1145/3410463.3414629.
- Chicago author-date (all authors)
- Lakshminarasimhan, Kartik, Ajeya Naithani, Josué Feliu, and Lieven Eeckhout. 2020. “The Forward Slice Core Microarchitecture.” In PACT’20 : Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 361–372. doi:10.1145/3410463.3414629.
- Vancouver
- 1. Lakshminarasimhan K, Naithani A, Feliu J, Eeckhout L. The forward slice core microarchitecture. In: PACT’20 : Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques. 2020. p. 361–72.
- IEEE
- [1] K. Lakshminarasimhan, A. Naithani, J. Feliu, and L. Eeckhout, “The forward slice core microarchitecture,” in PACT’20 : Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Virtual Event GA USA, 2020, pp. 361–372.
@inproceedings{8692318,
abstract = {{Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table.
In this paper, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. FSC improves performance by 9.7% on average compared to Freeway, the state-of-the-art sOoO core, across the SPEC CPU2017 benchmarks, while incurring reduced hardware complexity and a similar power budget.}},
author = {Lakshminarasimhan, Kartik and Naithani, Ajeya and Feliu, Josué and Eeckhout, Lieven},
booktitle = {{PACT'20 : Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques}},
isbn = {{9781450380751}},
issn = {{1089-795X}},
language = {{eng}},
location = {{Virtual Event GA USA}},
pages = {{361--372}},
title = {{The forward slice core microarchitecture}},
url = {{https://doi.org/10.1145/3410463.3414629}},
year = {{2020}},
}