LOCEN Research Focus: Project IM-CLeVeR -- Bio-constrained models of intrinsically-motivated cumulative learning in organisms and robots

Synopsis

 

Authors: Francesco Mannella, Vincenzo Fiore, Valerio Sperati, Marco Mirolli, Gianluca Baldassarre

Topic and its relevance. We describe here research, funded by the EU project IM-CLeVeR, investigating the brain architecture and mechanisms that allow primates (e.g., monkeys and children) to learn multiple skills in a cumulative fashion on the basis of intrinsic motivations. The overall brain architecture relevant for this topic is the same as the one illustrated above in the research thread on goal-directed behaviour and habits, with the addition of further structures important for intrinsic motivations, such as the superior colliculus (important for detecting changes in the environment caused by the organism), the hippocampus (important for detecting novel patterns and events), and the prefrontal cortex (important for detecting violations of expectations).

Questions and goals. What is the brain architecture that underlies the cumulative acquisition of multiple skills? What are the intrinsic motivation mechanisms that drive primates to acquire goals and skills in a cumulative fashion? How do extrinsic motivations allow primates to exploit the goals and skills acquired with intrinsic motivations?

Methods. We built bio-constrained models reproducing these aspects of the brain: (a) the hierarchical organisation of skills based on basal ganglia-cortical loops and cortico-cortical pathways; (b) the encoding of habitual skills and goal-directed behaviour within such an architecture; (c) the intrinsic motivation (IM) systems guiding cumulative learning, in particular based on `novelty-based IMs' (hippocampus) and `prediction-based IMs' (superior colliculus, cortex) generating dopamine-based learning and motivational signals; (d) the extrinsic motivations, e.g. to get food, relying on the amygdala and again on the dopamine system (ventral tegmental area and substantia nigra pars compacta). The models are bio-constrained in the sense that their macro-architecture and macro-functions reflect those of the brain. The models are tested within simulated humanoid robots (iCub) reproducing the behaviours observed in the `Mechatronic Board Experiment', an experiment carried out with monkeys and children freely interacting with a `mechatronic board' to acquire skills on the basis of IMs (the mechatronic board, built in IM-CLeVeR, has manipulanda, buttons, lights switching on and off, opening boxes, and sounds).
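The distinction between novelty-based and prediction-based IMs can be illustrated with a minimal sketch. This is not the IM-CLeVeR model: the class names, the count-based novelty rule, and the delta-rule predictor are assumptions chosen only to show how both kinds of signal fade as the agent becomes familiar with an event, which is what lets learning move on to new skills.

```python
from collections import defaultdict


class NoveltyIM:
    """Hypothetical novelty signal: high for patterns seen rarely, decaying with repetition."""

    def __init__(self):
        self.counts = defaultdict(int)  # how many times each pattern has been observed

    def signal(self, pattern):
        self.counts[pattern] += 1
        return 1.0 / self.counts[pattern]


class PredictionIM:
    """Hypothetical prediction-based signal: size of the error of a learned outcome predictor."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.predicted = defaultdict(float)  # predicted outcome for each action

    def signal(self, action, outcome):
        error = outcome - self.predicted[action]
        # Delta-rule update: the prediction moves towards the observed outcome.
        self.predicted[action] += self.learning_rate * error
        return abs(error)


if __name__ == "__main__":
    novelty = NoveltyIM()
    prediction = PredictionIM()
    # The agent repeatedly presses a button that switches a light on (outcome 1.0).
    for trial in range(3):
        n = novelty.signal("light_on")
        p = prediction.signal("press_button", 1.0)
        print(f"trial {trial}: novelty={n:.2f} prediction_error={p:.2f}")
```

In this sketch both signals shrink over repeated trials, so an action whose effect has become familiar and predictable stops generating an intrinsic learning signal.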

Results. The research resulted in a proposal of the possible brain components, together with their connections and their learning and functioning mechanisms, that are sufficient to produce IM-based acquisition of skills.

Conclusions. To our knowledge, the proposed models represent the first sufficient hypotheses on how the brain allows primates to acquire skills on the basis of intrinsic motivations and later recall them on the basis of extrinsic motivations.

Videos from the project IM-CLeVeR

 

This page presents videos of the model ''CLEVER-B'' developed within the project ''IM-CLeVeR -- Intrinsically Motivated Cumulative Learning Versatile Robots'' coordinated by LOCEN.

IM-CLeVeR studied how intrinsic motivations and hierarchical sensorimotor architectures allow organisms and robots to autonomously learn repertoires of sensorimotor behaviours by interacting with the environment.

The next video shows the robot iCub learning to interact with a mechatronic board on the basis of intrinsic motivations. The video describes what happens and how the robot learns.

 

 

The next video shows a situation where a piece of coloured cardboard is put in front of a box. The cardboard represents a reward that is given to the robot if the robot opens the box:

 

 

The next video shows a situation where two pieces of coloured cardboard are put in front of two boxes: they represent rewards that are given to the robot if the robot opens the corresponding boxes. At a certain point, one reward is reduced in value (is ``devalued''): the video shows that the robot no longer tries to open the corresponding box.
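The logic of this devaluation test can be sketched in a few lines. The sketch assumes a simple goal-directed rule in which the value of an action is the current value of the outcome it is expected to produce. The function names, the action and outcome labels, and the threshold are hypothetical and are not taken from the CLEVER-B model.

```python
def action_values(action_outcome, outcome_value):
    """Value of each action = current value of the outcome it is expected to produce."""
    return {action: outcome_value[outcome] for action, outcome in action_outcome.items()}


def select_actions(values, threshold=0.0):
    """Keep only the actions whose expected outcome is currently worth pursuing."""
    return [action for action, value in values.items() if value > threshold]


if __name__ == "__main__":
    # Hypothetical labels: each box is associated with one reward.
    action_outcome = {"open_box_1": "reward_1", "open_box_2": "reward_2"}
    outcome_value = {"reward_1": 1.0, "reward_2": 1.0}

    print(select_actions(action_values(action_outcome, outcome_value)))

    # Devaluation: the second reward loses its value, the skill itself is unchanged.
    outcome_value["reward_2"] = 0.0
    print(select_actions(action_values(action_outcome, outcome_value)))
```

Because the action value is recomputed from the current outcome value, devaluing a reward immediately removes the corresponding action from the selected set without any relearning of the skill. This is the behavioural signature that distinguishes goal-directed control from habits.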

 

 

The next video shows a situation that integrates the previous two tests: