- Author
- Andreas Verleysen (UGent)
- Promoter
- Francis wyffels (UGent) and Joni Dambre (UGent)
- Abstract
Endowing robots with dexterous manipulation skills could spur economic welfare, create leisure time in society, reduce harmful manual labour, and provide care for an ageing population. However, while robots are producing our cars, we are still left to our own devices for doing the laundry at home. This shortcoming is due to the major difficulties in perceiving and handling the variability the real world presents. Robots in modern manufacturing require engineers to produce a safe and predictable environment: objects arrive at the same location, in the same orientation, and the robot is preprogrammed to perform a specific manipulation. Unfortunately, such predictability cannot be guaranteed in dynamic environments that handle a wide range of objects, often in the presence of human activity. Should a human worker, for example, first unfold all clothes so that a robot can easily find the corner points and perform the folding? Indeed, the high variability in modern production environments and households requires robots to handle objects that can take arbitrary shapes, weights, and configurations. This diversity renders traditional robotic control algorithms and grippers unsuitable for deployment in dynamic environments.

To find methods that can handle the ever-changing nature of human environments, we study the perception and manipulation of objects that present a virtually infinite number of variations: deformable objects. A deformable object changes shape when forces act on it. Deformable objects are omnipresent in industry and society: food, paper, clothes, fruit, cables and sutures, among others. In particular, we study the task of automating the folding of clothes. Folding clothes is a common household task that may in the future be performed by service robots. Handling cloth is also relevant in manufacturing, where technical textiles are processed, and in the fashion industry. Dealing with the deformable nature of textiles requires fundamental improvements in both hardware and software. Mechanical engineering needs to incorporate actuators, links, joints and sensors into the limited space of a hand while using soft materials similar to human skin. In addition to engineering more capable hands, control algorithms need to loosen their assumptions about the environment in which robots operate: it is unrealistic to expect highly deformable objects like cloth to always be in the same configuration before a robot manipulates them.

A solution for dealing with real-world variability can be found in machine learning. In particular, deep reinforcement learning (RL) combines the function approximation capabilities of deep neural networks with the trial-and-error learning formalism of RL. Deep RL has been shown to be capable of driving cars, flying helicopters and manipulating rigid objects. However, the data requirements for training highly parameterized functions, such as neural networks, are considerable. This data hunger causes an incongruity between the representation learning capabilities of deep neural networks and the high cost of generating real robotic trials. Our research therefore focuses on reducing the amount of training data required by systems that perceive and manipulate clothing items. We implement a cloth simulation method to generate synthetic data, utilize smart textiles for state estimation of cloth, crowdsource a dataset of people folding clothing, and propose a method to estimate how well people are folding clothing without providing labels.
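To make the trial-and-error formalism concrete, the following is a minimal, self-contained sketch of a REINFORCE-style policy-gradient loop. The environment, its reward rule and the linear softmax policy are hypothetical stand-ins chosen for brevity, not the setup used in the thesis, where the policy would be a deep neural network and the environment a (simulated or real) cloth folding task.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, N_ACTIONS = 4, 2
hidden_w = rng.normal(size=OBS_DIM)          # hidden rule the agent must discover

def reset():
    """Return a random observation (a stand-in for a perceived cloth state)."""
    return rng.normal(size=OBS_DIM)

def step(obs, action):
    """One-step episode: reward 1 if the action matches the hidden rule, else 0."""
    correct = int(hidden_w @ obs > 0.0)
    return 1.0 if action == correct else 0.0

# Linear softmax policy; in deep RL these weights become a neural network.
theta = np.zeros((N_ACTIONS, OBS_DIM))

def policy(obs):
    logits = theta @ obs
    p = np.exp(logits - logits.max())
    return p / p.sum()

# REINFORCE: sample an action, observe the reward, and nudge the policy
# towards actions that were rewarded -- learning purely by trial and error.
alpha = 0.1
for episode in range(2000):
    obs = reset()
    probs = policy(obs)
    action = int(rng.choice(N_ACTIONS, p=probs))
    reward = step(obs, action)
    grad_log = -probs[:, None] * obs[None, :]   # d log pi(a|s) / d theta
    grad_log[action] += obs
    theta += alpha * reward * grad_log

# Greedy evaluation of the learned policy on fresh observations.
test_obs = [reset() for _ in range(500)]
avg = np.mean([step(o, int(np.argmax(policy(o)))) for o in test_obs])
print(f"average reward of the greedy policy: {avg:.2f}")
```

Because the only learning signal is the scalar reward obtained by acting, the same loop structure carries over when the policy is a deep network and the environment a cloth manipulation task; what changes is the cost of each trial, which is precisely the data-efficiency problem this work addresses.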
Actuating a physical robot is slow, expensive and potentially dangerous. For this reason, roboticists resort to physics simulators that model the dynamics of the robot and its environment. However, no integrated robot and cloth simulator exists for use in learning experiments: cloth simulators are built either for offline render farms in the film industry or for the game industry, which sacrifices fidelity for real-time rendering. Unfortunately, cloth simulation for robotic learning requires the performance characteristics of online rendering combined with the accuracy of offline rendering. For this reason, we implement a custom cloth dynamics simulation on the GPU and integrate it with the robotic simulation functionality of the Unity game engine. We found that we can use deep RL to train an agent in our simulation to fold a rectangular piece of cloth twice within 24 hours of wall-clock time on standard computational hardware.

The developed cloth simulation assumes full access to the state of the cloth. In the real world, however, state estimation of cloth relies on complex vision-based pipelines or expensive sensing technology. We avoid this complexity and cost by integrating inexpensive tactile sensing technology into a cloth. The cloth becomes an active smart cloth by training a classifier that uses the tactile sensing data to estimate its state. We use this smart cloth to train a low-cost robotic platform to fold the cloth using RL. Our results demonstrate that it is possible to develop a smart cloth with off-the-shelf components and use it effectively for training on a real robotic platform. The smart cloth thus bridges the gap between our GPU cloth simulation and state estimation in the real world.

However, acquiring manipulation skills with RL still requires distilling a scalar value that indicates task progress. We believe that learning the reward function from demonstrations may overcome the human bias inherent in reward engineering. Unfortunately, when we started our research, no large dataset of people folding clothing existed. We fill this gap by crowdsourcing such a dataset. It consists of roughly 300,000 multiperspective RGB-D frames, annotated with pose trajectories, quality labels and timestamps indicating substeps, and can be used to benchmark research in action recognition and to bootstrap learning from example demonstrations. Learning from demonstrations is a prevalent approach in the robot learning community. However, using our cloth folding dataset requires mapping the movements of the demonstrators to the embodiment of a robot, and behavioural cloning is prone to blindly imitating trajectories instead of understanding how actions relate to solving the task. For this reason, estimating how well a process is being executed is preferable to learning a policy directly from demonstrations. Unfortunately, existing methods couple the learning of rewards with policy learning, thereby inheriting all problems associated with RL. To decouple reward and policy learning, we propose a method that learns task progression from multiperspective videos of example demonstrations. We avoid incorporating human bias in the labelling process by using time as a self-supervised signal for learning, as sketched below. We demonstrate the first results on expressing the task progression of people folding clothing without labelling any data.
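To illustrate how time can act as a self-supervised signal, the sketch below trains a progress estimator whose only targets are the normalised timestamps of the frames in each demonstration; no human labels are involved. The synthetic per-frame features and the ridge regressor are hypothetical placeholders: the thesis operates on multiperspective RGB-D video and a learned model rather than hand-made feature vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
FEAT_DIM = 16
DRIFT = rng.normal(size=FEAT_DIM)   # fixed direction along which "foldedness" evolves

def demo_features(n_frames):
    """Hypothetical per-frame features of one folding demonstration.

    A real pipeline would extract these from RGB-D frames with a learned
    encoder; here they drift along a fixed direction plus noise so that
    task progress is actually recoverable from them.
    """
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    return t * DRIFT + 0.1 * rng.normal(size=(n_frames, FEAT_DIM))

# Self-supervised targets: the normalised timestamp of every frame
# (0 = start of the demonstration, 1 = fold completed). No human labels.
X, y = [], []
for _ in range(50):                 # 50 demonstration videos
    n = int(rng.integers(80, 120))
    X.append(demo_features(n))
    y.append(np.linspace(0.0, 1.0, n))
X, y = np.vstack(X), np.concatenate(y)

# Ridge regression as a minimal stand-in for the learned progress model.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(FEAT_DIM), X.T @ y)

# On an unseen demonstration, the prediction acts as a dense progress estimate.
unseen = demo_features(100)
progress = unseen @ w
print("estimated progress at frames 0, 50, 99:",
      np.round(progress[[0, 50, 99]], 2))
```

The predicted progress of an unseen demonstration can then be read out as a dense, label-free indication of how far the folding task has advanced, which is the kind of scalar signal that reward learning for RL requires.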
General-purpose robots are not yet among us. Robots that are capable of working in dynamic environments will require a holistic view of software and hardware. We demonstrated the benefits of this approach by outsourcing the intelligence for state estimation to the cloth instead of the robot: by developing a smart cloth, we trained a physical robot to fold cloth within a day. Extrapolating this integrated approach to hardware and software leads to embodied intelligence, in which morphology closes the loop with control: co-optimizing body and brain will allow evolving manipulators tailored to their tasks and using them to build a representation of how the world works. Robots can use that feedback to understand how their actions influence the environment and learn to solve tasks from human examples, instrumented objects and their own experiences. This holistic process will enable future robots to understand human intent and solve a large repertoire of manipulation tasks.
Downloads
- main.pdf: full text (Published version) | open access | 72.35 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8754430
- MLA
- Verleysen, Andreas. Learning Robotic Cloth Manipulation. Ghent University. Faculty of Engineering and Architecture, 2022.
- APA
- Verleysen, A. (2022). Learning robotic cloth manipulation. Ghent University. Faculty of Engineering and Architecture, Ghent, Belgium.
- Chicago author-date
- Verleysen, Andreas. 2022. “Learning Robotic Cloth Manipulation.” Ghent, Belgium: Ghent University. Faculty of Engineering and Architecture.
- Chicago author-date (all authors)
- Verleysen, Andreas. 2022. “Learning Robotic Cloth Manipulation.” Ghent, Belgium: Ghent University. Faculty of Engineering and Architecture.
- Vancouver
- 1. Verleysen A. Learning robotic cloth manipulation. [Ghent, Belgium]: Ghent University. Faculty of Engineering and Architecture; 2022.
- IEEE
- [1] A. Verleysen, “Learning robotic cloth manipulation,” Ghent University. Faculty of Engineering and Architecture, Ghent, Belgium, 2022.
@phdthesis{8754430,
  author    = {{Verleysen, Andreas}},
  isbn      = {{9789463555906}},
  language  = {{eng}},
  pages     = {{XXX, 276}},
  publisher = {{Ghent University. Faculty of Engineering and Architecture}},
  school    = {{Ghent University}},
  title     = {{Learning robotic cloth manipulation}},
  year      = {{2022}},
}