
Dual regularized policy updating and shiftpoint detection for automated deployment of reinforcement learning controllers on industrial mechatronic systems

Victor Vantilborgh (UGent) , Tom Staessens (UGent) , Wannes De Groote (UGent) and Guillaume Crevecoeur (UGent)
Abstract
We propose an algorithmic pipeline enabling deep reinforcement learning controllers to detect when a significant change in system characteristics has occurred and to update the control policy accordingly to reattain performance. Reinforcement learning algorithms can learn a policy directly from input-output data and thus optimize for system-specific properties. Yet they face difficulties in adapting, after deployment, to varying operating conditions. Real-world industrial mechatronic systems, however, demand further levels of performance through adaptation while remaining safe. So far, methods that detect changes in environments exist but have never been studied and applied as a means to update control policies for time-varying systems. We benchmark several methods that detect significant changes in these systems, i.e. shiftpoint detection methods, and present a novel algorithm with a dual regularization architecture. This architecture exploits the prior policy while allowing sufficient flexibility to update for the safety-critical and time-varying system. We validate the method's performance through benchmarking, studying the effect of its different components via targeted ablation studies on mechatronic systems, both in simulation and experimentally. Results show that our algorithmic pipeline allows for rapid shiftpoint detection, followed by a policy update that reaches expert performance after convergence.
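The paper's detector and regularizers are not reproduced on this record page. Purely as a hedged illustration of the two stages the abstract describes, the sketch below pairs a generic CUSUM-style change detector (one of several possible shiftpoint detection methods) with a gradient step on a toy linear policy penalized both in weight space and in action space toward the frozen pre-shift policy. All names, loss terms, and hyperparameters are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Stage 1: shiftpoint detection on one-step model residuals.
def detect_shiftpoint(residuals, drift=0.05, threshold=3.0):
    """Generic CUSUM-style detector: returns the first index at which the
    cumulative excess of |residual| over a drift allowance crosses a threshold."""
    s = 0.0
    for t, r in enumerate(residuals):
        s = max(0.0, s + abs(r) - drift)
        if s > threshold:
            return t
    return None

# Stage 2: dual regularized policy update (linear policy kept for brevity).
def dual_regularized_step(theta, states, target_actions, theta_prior,
                          lr=0.1, lam_w=0.1, lam_a=0.1):
    """One gradient step on
        L = mean||S theta - a*||^2                     (task term: fit the shifted system)
          + lam_w * ||theta - theta_prior||^2          (stay close to prior weights)
          + lam_a * mean||S (theta - theta_prior)||^2  (stay close to prior actions),
    i.e. two regularizers anchoring the update to the pre-shift policy."""
    n = len(states)
    diff = theta - theta_prior
    grad = ((2.0 / n) * states.T @ (states @ theta - target_actions)
            + 2.0 * lam_w * diff
            + (2.0 * lam_a / n) * states.T @ (states @ diff))
    return theta - lr * grad

# Toy usage: a parameter shift appears after 200 nominal samples.
residuals = np.concatenate([rng.normal(0.0, 0.05, 200),   # nominal operation
                            rng.normal(0.0, 0.5, 50)])    # after the shift
print("shiftpoint index:", detect_shiftpoint(residuals))

theta_prior = rng.normal(size=3)          # frozen pre-shift policy weights
theta = theta_prior.copy()
S = rng.normal(size=(50, 3))              # states gathered after the shiftpoint
a_star = S @ (theta_prior + 0.3)          # actions the shifted system now requires
for _ in range(200):
    theta = dual_regularized_step(theta, S, a_star, theta_prior)
print("update moved weights by:", np.linalg.norm(theta - theta_prior))

The two penalties play the role sketched in the abstract: the weight-space term keeps the update anchored to the prior policy, while the action-space term limits how far commanded actions may drift during retraining. The architecture in the paper operates on deep policies rather than this linear stand-in.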
Keywords
Mechatronics, Motion control, Reinforcement learning, Shiftpoint detection, Policy updating, Uncertain systems, DYNAMIC ENVIRONMENTS, PERFORMANCE, DEGRADATION, SIMILARITY, DESIGN, CRANK

Downloads

  • (...).pdf: full text (Published version) | UGent only | PDF | 4.40 MB

Citation

Please use this url to cite or link to this publication:

MLA
Vantilborgh, Victor, et al. “Dual Regularized Policy Updating and Shiftpoint Detection for Automated Deployment of Reinforcement Learning Controllers on Industrial Mechatronic Systems.” CONTROL ENGINEERING PRACTICE, vol. 142, 2024, doi:10.1016/j.conengprac.2023.105783.
APA
Vantilborgh, V., Staessens, T., De Groote, W., & Crevecoeur, G. (2024). Dual regularized policy updating and shiftpoint detection for automated deployment of reinforcement learning controllers on industrial mechatronic systems. CONTROL ENGINEERING PRACTICE, 142. https://doi.org/10.1016/j.conengprac.2023.105783
Chicago author-date
Vantilborgh, Victor, Tom Staessens, Wannes De Groote, and Guillaume Crevecoeur. 2024. “Dual Regularized Policy Updating and Shiftpoint Detection for Automated Deployment of Reinforcement Learning Controllers on Industrial Mechatronic Systems.” CONTROL ENGINEERING PRACTICE 142. https://doi.org/10.1016/j.conengprac.2023.105783.
Chicago author-date (all authors)
Vantilborgh, Victor, Tom Staessens, Wannes De Groote, and Guillaume Crevecoeur. 2024. “Dual Regularized Policy Updating and Shiftpoint Detection for Automated Deployment of Reinforcement Learning Controllers on Industrial Mechatronic Systems.” CONTROL ENGINEERING PRACTICE 142. doi:10.1016/j.conengprac.2023.105783.
Vancouver
1. Vantilborgh V, Staessens T, De Groote W, Crevecoeur G. Dual regularized policy updating and shiftpoint detection for automated deployment of reinforcement learning controllers on industrial mechatronic systems. CONTROL ENGINEERING PRACTICE. 2024;142.
IEEE
[1] V. Vantilborgh, T. Staessens, W. De Groote, and G. Crevecoeur, “Dual regularized policy updating and shiftpoint detection for automated deployment of reinforcement learning controllers on industrial mechatronic systems,” CONTROL ENGINEERING PRACTICE, vol. 142, 2024.
@article{01HM60GATE5MNWNN4DMMNFXM2J,
  abstract     = {{We propose an algorithmic pipeline enabling deep reinforcement learning controllers to detect when a significant change in system characteristics has occurred and to update the control policy accordingly to reattain performance. Reinforcement learning algorithms can learn a policy directly from input-output data and thus optimize for system-specific properties. Yet they face difficulties in adapting, after deployment, to varying operating conditions. Real-world industrial mechatronic systems, however, demand further levels of performance through adaptation while remaining safe. So far, methods that detect changes in environments exist but have never been studied and applied as a means to update control policies for time-varying systems. We benchmark several methods that detect significant changes in these systems, i.e. shiftpoint detection methods, and present a novel algorithm with a dual regularization architecture. This architecture exploits the prior policy while allowing sufficient flexibility to update for the safety-critical and time-varying system. We validate the method's performance through benchmarking, studying the effect of its different components via targeted ablation studies on mechatronic systems, both in simulation and experimentally. Results show that our algorithmic pipeline allows for rapid shiftpoint detection, followed by a policy update that reaches expert performance after convergence.}},
  articleno    = {{105783}},
  author       = {{Vantilborgh, Victor and Staessens, Tom and De Groote, Wannes and Crevecoeur, Guillaume}},
  issn         = {{0967-0661}},
  journal      = {{CONTROL ENGINEERING PRACTICE}},
  keywords     = {{Mechatronics,Motion control,Reinforcement learning,Shiftpoint detection,Policy updating,Uncertain systems,DYNAMIC ENVIRONMENTS,PERFORMANCE,DEGRADATION,SIMILARITY,DESIGN,CRANK}},
  language     = {{eng}},
  pages        = {{16}},
  title        = {{Dual regularized policy updating and shiftpoint detection for automated deployment of reinforcement learning controllers on industrial mechatronic systems}},
  url          = {{https://doi.org/10.1016/j.conengprac.2023.105783}},
  volume       = {{142}},
  year         = {{2024}},
}
