LOCEN Research Focus: Modular Reinforcement-Learning Controllers for Robots to Learn Multiple Skills

Synopsis

 

Authors: Daniele Caligiore, Paolo Tommasino, Annalisa Ciancio, Valentina Meola, Gianluca Baldassarre

Topic and its relevance. Building architectures that allow robots to learn multiple sensorimotor skills, possibly transferring knowledge between them, is a central open challenge for autonomous robotics. In particular, it is paramount for producing autonomous cumulative-learning robots. This research is also important for suggesting possible architectures and processes through which the brain solves the same problems.

Questions and goals. How can we build robot architectures that learn multiple discrete and rhythmic sensorimotor skills? How can they learn multiple skills, both in interleaved and sequential ways, by exploiting knowledge transfer from previously learned skills to new ones while avoiding catastrophic interference?

Methods. We addressed these problems by proposing various reinforcement learning (RL) robot architectures for discrete or rhythmic movements, using either actor-critic methods with linear function approximators, or policy-search methods with dynamic movement primitives (DMPs) for discrete movements and central pattern generators (CPGs) for rhythmic movements.
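To give a concrete flavour of the motor primitives used for discrete movements, below is a minimal sketch (in Python/NumPy) of a one-dimensional DMP rollout; the gains, basis functions, and integration step are generic textbook choices assumed for illustration, not the exact settings of the architectures listed here.

    import numpy as np

    def dmp_rollout(w, x0, g, duration=1.0, dt=0.01, K=100.0, D=20.0, alpha=4.0):
        """Roll out a one-dimensional discrete dynamic movement primitive (DMP).

        w      : weights of the Gaussian basis functions shaping the forcing term
        x0, g  : start and goal positions
        K, D   : spring and damping gains of the transformation system
        alpha  : decay rate of the canonical system
        """
        n_basis = len(w)
        c = np.exp(-alpha * np.linspace(0, 1, n_basis))  # basis centres along the canonical variable s
        h = n_basis / c                                   # basis widths (narrower for small s)
        x, v, s = x0, 0.0, 1.0
        trajectory = [x]
        for _ in range(int(duration / dt)):
            psi = np.exp(-h * (s - c) ** 2)
            f = (psi @ w) / (psi.sum() + 1e-10) * s       # forcing term, vanishes as s -> 0
            a = (K * (g - x) - D * v + (g - x0) * f) / duration  # transformation system
            v += a * dt
            x += v / duration * dt
            s += -alpha * s / duration * dt               # canonical system
            trajectory.append(x)
        return np.array(trajectory)

    # With zero weights the DMP converges smoothly from x0 to g; learning
    # (e.g. policy search) adapts w to shape the transient of the movement.
    traj = dmp_rollout(w=np.zeros(10), x0=0.0, g=1.0)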

Results. We proposed:

  • TERL (Transfer Expert RL), an architecture integrating actor-critic methods and linear approximators with the idea of mixture-of-experts supervised neural networks, capable of exploiting transfer RL while limiting catastrophic interference, used to learn multiple reaching skills with simulated robot arms and the simulated iCub robot;
  • an architecture using actor-critic or policy-search methods (PI^BB) together with central pattern generators (CPGs) to learn rhythmic manipulation skills with simulated robotic hands (iCub, Kuka) and real hands (iCub);
  • an architecture using policy-search methods (POWER) and DMPs to learn throwing-a-ball-at-a-bottle-target skills with the real iCub, capable of generating new first-guess skills on the basis of information about the target position.
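To illustrate how the policy-search components update the primitive parameters, the following is a minimal sketch of a PI^BB-style black-box improvement step over a DMP/CPG weight vector; the cost function, noise level, and eliteness parameter are illustrative assumptions, not the settings used in the experiments.

    import numpy as np

    def pibb_update(theta, cost_fn, n_rollouts=10, sigma=0.05, h=10.0):
        """One PI^BB-style black-box policy-improvement step.

        theta    : current policy parameters (e.g. DMP or CPG weights)
        cost_fn  : maps a parameter vector to a scalar cost (one rollout)
        sigma    : exploration noise standard deviation
        h        : eliteness parameter of the exponential weighting
        """
        # Sample perturbed parameter vectors and evaluate each with one rollout
        eps = sigma * np.random.randn(n_rollouts, theta.size)
        costs = np.array([cost_fn(theta + e) for e in eps])
        # Map costs to probabilities: low-cost rollouts get exponentially more weight
        c_min, c_max = costs.min(), costs.max()
        p = np.exp(-h * (costs - c_min) / (c_max - c_min + 1e-10))
        p /= p.sum()
        # Reward-weighted average of the perturbations updates the parameters
        return theta + p @ eps

    # Hypothetical usage with a toy cost: parameters are pushed towards 1.0.
    theta = np.zeros(10)
    for _ in range(100):
        theta = pibb_update(theta, cost_fn=lambda th: float(np.sum((th - 1.0) ** 2)))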

Conclusions. The group has proposed a number of architectures for learning multiple discrete/rhythmic skills, in simulated and real robots, that can be used as an important foundation for building cumulative learning robots.

Key references

  • Meola, V. C., Caligiore, D., Sperati, V., Zollo, L., Ciancio, A. L., Taffoni, F., Guglielmelli, E. & Baldassarre, G. Interplay of Discrete and Rhythmic Manipulation Movements During Development: A Policy-Search Reinforcement-Learning Robot Model. Accepted for publication in IEEE Transactions on Autonomous Mental Development.
  • Ciancio, A. L., Zollo, L., Baldassarre, G., Caligiore, D. & Guglielmelli, E. (2012). The Role of Thumb Opposition in Cyclic Manipulation: A Study with Two Different Robotic Hands. IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob).
  • Ciancio, A. L., Zollo, L., Baldassarre, G., Caligiore, D. & Guglielmelli, E. (2013). The Role of Learning and Kinematic Features in Dexterous Manipulation: A Comparative Study with Two Robotic Hands. International Journal of Advanced Robotic Systems. DOI: 10.5772/56479.
  • Castro da Silva, B., Baldassarre, G., Konidaris, G. & Barto, A. (2014). Learning Parameterized Motor Skills on a Humanoid Robot. In Xi, N., Hamel, W. R., Tan, J., Krovi, V. N., Buss, M., Elhajj, I. & Fu, L.-C. (eds.), IEEE International Conference on Robotics and Automation (ICRA 2014), pp. 1-6. Piscataway, NJ: IEEE. (Hong Kong, China, May 31 - June 7, 2014.)

 

Work 1: Rhythmic manipulation movements acquired through a policy-search reinforcement-learning robot model

 

 

Work 2: Learning Parameterized Motor Skills on a Humanoid Robot Throwing a Ball to Knock Down a Bottle

 

Video of a presentation of the work: http://www.youtube.com/watch?v=-kULO9yAhSs

Video of the robot throwing the ball at the bottle (http://www.youtube.com/watch?v=BLt3GmjDN1o):

Video of a humanoid robot (iCub) learning to throw a ball at a target based on a hierarchical reinforcement learning system with sophisticated generalisation capabilities (it generates new dynamic movement primitives on the fly based on the similarity of a new goal to previously acquired goals). This work was carried out in collaboration with Bruno Castro da Silva and Andrew Barto, from the University of Massachusetts Amherst.
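To convey the idea of generating first-guess primitives on the fly, the sketch below shows one simple way initial DMP parameters for an unseen goal could be obtained by kernel-weighted regression over previously learned goal-parameter pairs; the variable names, data, and regression scheme are assumptions for illustration and not the actual model used in this work.

    import numpy as np

    # Library of previously learned skills: each entry maps a goal descriptor
    # (here, a 2-D target position) to the DMP weights optimised for that goal.
    # Values are placeholders for illustration only.
    learned_goals = np.array([[0.3, 0.1], [0.5, 0.2], [0.4, -0.1]])
    learned_params = np.random.randn(3, 10)  # one DMP weight vector per learned goal

    def first_guess_params(new_goal, goals, params, bandwidth=0.1):
        """Generate initial DMP parameters for an unseen goal by
        kernel-weighted regression over previously acquired skills."""
        d2 = np.sum((goals - new_goal) ** 2, axis=1)   # squared distances to known goals
        w = np.exp(-d2 / (2 * bandwidth ** 2))         # closer goals contribute more
        w /= w.sum()
        return w @ params                               # blend of neighbouring skills

    # The resulting first guess would then be refined by policy-search RL.
    theta0 = first_guess_params(np.array([0.45, 0.05]), learned_goals, learned_params)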