Dataflow management, dynamic load balancing, and concurrent processing for real-time embedded vision applications using Quasar

Author
Bart Goossens (UGent)
Abstract
Programming modern embedded vision systems is challenging because of the steep learning curve for programmers and the differing characteristics of the target devices. Quasar, a new high-level programming language and development environment, considerably simplifies development. Quasar has a compiler that detects and optimizes parallel programming patterns and a heterogeneous runtime that distributes the computational load over the available compute devices (CPUs and graphics processing units [GPUs]). In this paper, we focus on the runtime aspects of Quasar. We show that, to a good approximation, the execution time of a GPU kernel function can be factorized into a compile-time-specific component and a runtime-specific component. We show that this approximation leads to a computationally simple runtime load balancing rule. Moreover, the load balancing rule permits efficient implicit concurrency of kernel functions and automatic scaling to multiple compute devices (e.g., multi-CPU/GPU systems). Based on an appropriate mathematical scheduling model, we investigate how the command queue size trades off memory usage against device utilization. The result is a programming environment for embedded vision systems in which automatic parallelization and implicit concurrency detection allow programs to scale efficiently to multi-CPU/GPU systems. Finally, benchmark results are provided to demonstrate the performance of our approach compared with OpenACC and CUDA (Compute Unified Device Architecture).
Keywords
IMAGE-ANALYSIS, PARALLELISM, LANGUAGE, COMPILER, VIDEO, dynamic load balancing, GPGPU, heterogeneous processing, vision, applications

Downloads

  • (...).pdf: full text | UGent only | PDF | 2.73 MB

Citation

Chicago
Goossens, Bart. 2018. “Dataflow Management, Dynamic Load Balancing, and Concurrent Processing for Real-time Embedded Vision Applications Using Quasar.” International Journal of Circuit Theory and Applications 46 (9): 1733–1755.
APA
Goossens, B. (2018). Dataflow management, dynamic load balancing, and concurrent processing for real-time embedded vision applications using Quasar. International Journal of Circuit Theory and Applications, 46(9), 1733–1755.
Vancouver
1. Goossens B. Dataflow management, dynamic load balancing, and concurrent processing for real-time embedded vision applications using Quasar. International Journal of Circuit Theory and Applications. Hoboken: Wiley; 2018;46(9):1733–55.
MLA
Goossens, Bart. “Dataflow Management, Dynamic Load Balancing, and Concurrent Processing for Real-time Embedded Vision Applications Using Quasar.” International Journal of Circuit Theory and Applications 46.9 (2018): 1733–1755. Print.
BibTeX
@article{8575937,
  author       = {Goossens, Bart},
  issn         = {0098-9886},
  journal      = {INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS},
  language     = {eng},
  number       = {9},
  pages        = {1733--1755},
  publisher    = {Wiley},
  title        = {Dataflow management, dynamic load balancing, and concurrent processing for real-time embedded vision applications using Quasar},
  url          = {http://dx.doi.org/10.1002/cta.2494},
  volume       = {46},
  year         = {2018},
}