Abstract
Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up and halt the pipeline, the processor enters runahead mode and keeps speculatively executing code to trigger accurate prefetches. A recent improvement tracks the chain of instructions that leads to the long-latency load, stores it in a runahead buffer, and executes only this chain during runahead execution, with the purpose of generating more prefetch requests. Unfortunately, all prior runahead proposals have shortcomings that limit performance and energy efficiency because they release processor state when entering runahead mode and then need to re-fill the pipeline to restart normal operation. Moreover, the runahead buffer limits prefetch coverage by tracking only a single chain of instructions that leads to the same long-latency load. We propose precise runahead execution (PRE), which builds on the key observation that when entering runahead mode, the processor has enough issue queue and physical register file resources to speculatively execute instructions. This mitigates the need to release and re-fill processor state in the ROB, issue queue, and physical register file. In addition, PRE pre-executes only those instructions in runahead mode that lead to full-window stalls, using a novel register renaming mechanism to quickly free physical registers in runahead mode, further improving efficiency and effectiveness. Finally, PRE optionally buffers decoded runahead micro-ops in the front-end to save energy. Our experimental evaluation using a set of memory-intensive applications shows that PRE achieves an additional 18.2% performance improvement over the recent runahead proposals while at the same time reducing energy consumption by 6.8%.
Keywords
PROCESSOR RESOURCES, PERFORMANCE, CACHE

Citation

MLA
Naithani, Ajeya, et al. “Precise Runahead Execution.” 2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), IEEE, 2020, pp. 397–410, doi:10.1109/hpca47549.2020.00040.
APA
Naithani, A., Feliu Pérez, J., Adileh, A., & Eeckhout, L. (2020). Precise runahead execution. In 2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020) (pp. 397–410). San Diego, CA: IEEE. https://doi.org/10.1109/hpca47549.2020.00040
Chicago author-date
Naithani, Ajeya, Josué Feliu Pérez, Almutaz Adileh, and Lieven Eeckhout. 2020. “Precise Runahead Execution.” In 2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 397–410. IEEE. https://doi.org/10.1109/hpca47549.2020.00040.
Chicago author-date (all authors)
Naithani, Ajeya, Josué Feliu Pérez, Almutaz Adileh, and Lieven Eeckhout. 2020. “Precise Runahead Execution.” In 2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 397–410. IEEE. doi:10.1109/hpca47549.2020.00040.
Vancouver
1. Naithani A, Feliu Pérez J, Adileh A, Eeckhout L. Precise runahead execution. In: 2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020). IEEE; 2020. p. 397–410.
IEEE
[1] A. Naithani, J. Feliu Pérez, A. Adileh, and L. Eeckhout, “Precise runahead execution,” in 2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), San Diego, CA, 2020, pp. 397–410.
@inproceedings{8668193,
  abstract     = {Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up and halt the pipeline, the processor enters runahead mode and keeps speculatively executing code to trigger accurate prefetches. A recent improvement tracks the chain of instructions that leads to the long-latency load, stores it in a runahead buffer, and executes only this chain during runahead execution, with the purpose of generating more prefetch requests. Unfortunately, all prior runahead proposals have shortcomings that limit performance and energy efficiency because they release processor state when entering runahead mode and then need to re-fill the pipeline to restart normal operation. Moreover, the runahead buffer limits prefetch coverage by tracking only a single chain of instructions that leads to the same long-latency load.

We propose precise runahead execution (PRE), which builds on the key observation that when entering runahead mode, the processor has enough issue queue and physical register file resources to speculatively execute instructions. This mitigates the need to release and re-fill processor state in the ROB, issue queue, and physical register file. In addition, PRE pre-executes only those instructions in runahead mode that lead to full-window stalls, using a novel register renaming mechanism to quickly free physical registers in runahead mode, further improving efficiency and effectiveness. Finally, PRE optionally buffers decoded runahead micro-ops in the front-end to save energy. Our experimental evaluation using a set of memory-intensive applications shows that PRE achieves an additional 18.2% performance improvement over the recent runahead proposals while at the same time reducing energy consumption by 6.8%.},
  author       = {Naithani, Ajeya and Feliu Pérez, Josué and Adileh, Almutaz and Eeckhout, Lieven},
  booktitle    = {2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020)},
  isbn         = {9781728161495},
  issn         = {1530-0897},
  keywords     = {PROCESSOR RESOURCES,PERFORMANCE,CACHE},
  language     = {eng},
  location     = {San Diego, CA},
  pages        = {397--410},
  publisher    = {IEEE},
  title        = {Precise runahead execution},
  url          = {http://dx.doi.org/10.1109/hpca47549.2020.00040},
  year         = {2020},
}
