NeurIPS 2019 - Robot open-Ended Autonomous Learning
Robots that learn to interact with the environment autonomously
Open-ended learning (also named ‘life-long learning’, ‘autonomous curriculum learning’, or ‘no-task learning’) aims to build learning machines and robots that are able to acquire skills and knowledge in an incremental fashion. The REAL competition, which is part of the NeurIPS 2019 competition track, addresses open-ended learning with a focus on ‘Robot open-Ended Autonomous Learning’ (REAL), that is, on systems that: (a) acquire sensorimotor competence that allows them to interact with objects and physical environments; (b) learn in a fully autonomous way, i.e. with no human intervention, on the basis of mechanisms such as curiosity, intrinsic motivations, task-free reinforcement learning, self-generated goals, and any other mechanism that might support autonomous learning. The competition will have a two-phase structure: during a first ‘intrinsic phase’ the system will have a certain time to freely explore and learn in the environment, and then during an ‘extrinsic phase’ the quality of the autonomously acquired knowledge will be measured with tasks unknown at design time. The objectives of REAL are to: (a) track the state of the art in robot open-ended autonomous learning; (b) foster research and the proposal of new solutions to the many problems posed by open-ended learning; (c) favour the development of benchmarks in the field.
In this challenge, you will have to develop an algorithm to control a multi-link robot arm interacting with a table, a shelf and a few objects. The robot is supposed to interact with the environment and learn in an autonomous manner, i.e. the environment provides no reward to direct its learning. The robot has access to the state of its joint angles and to the output of a fixed camera that views the table from above. By interacting with the environment, the robot should learn how to achieve different states of the environment: e.g. how to push objects around, how to place them on top of the shelf and how to stack them one on top of the other.
The evaluation of the algorithm is split into two phases: the intrinsic phase and the extrinsic phase.
- In the first phase, the algorithm will be able to interact with the environment without being provided any reward. In this intrinsic phase, the algorithm is supposed to learn the dynamics of the environment and how to interact with it.
- In the second phase, the algorithm will be given a goal that it needs to achieve within a strict time limit. The goal will be provided to the robot as an image of the state of the environment it has to reach. This goal might require, for example, pushing an object to a certain position or moving one object on top of another.
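The two-phase protocol described above can be sketched as a pair of loops. This is only a toy illustration under stated assumptions, not the competition's starter kit: `ToyEnv`, `RandomAgent`, and every method name are hypothetical, and the observation is reduced to a dictionary holding fake joint angles and a one-pixel "image" of an object's position.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the simulator: one object on a 1-D line."""

    def __init__(self):
        self.object_pos = 0

    def reset(self, object_pos=0):
        self.object_pos = object_pos
        return self.observe()

    def observe(self):
        # 'joints' mimics proprioception, 'image' mimics the top-down camera.
        return {"joints": [0.0], "image": [self.object_pos]}

    def step(self, action):
        # action is -1, 0 or +1; note that no reward is ever returned.
        self.object_pos += action
        return self.observe()


class RandomAgent:
    """Hypothetical controller: explores randomly, then acts greedily on a goal image."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.effect = {}  # learned model: action -> observed change in the image

    def explore(self, obs):
        return self.rng.choice([-1, 0, 1])

    def learn(self, obs, action, next_obs):
        self.effect[action] = next_obs["image"][0] - obs["image"][0]

    def act_towards(self, obs, goal_image):
        # Choose the action whose learned effect best reduces the distance to the goal.
        error = goal_image[0] - obs["image"][0]
        return min(self.effect, key=lambda a: abs(error - self.effect[a]))


def run_intrinsic_phase(env, agent, steps):
    obs = env.reset()
    for _ in range(steps):
        action = agent.explore(obs)
        next_obs = env.step(action)
        agent.learn(obs, action, next_obs)  # the learning signal comes from the agent itself
        obs = next_obs


def run_extrinsic_task(env, agent, start_pos, goal_image, max_steps):
    obs = env.reset(object_pos=start_pos)
    for _ in range(max_steps):
        if obs["image"] == goal_image:
            return True
        obs = env.step(agent.act_towards(obs, goal_image))
    return obs["image"] == goal_image


env, agent = ToyEnv(), RandomAgent(seed=1)
run_intrinsic_phase(env, agent, steps=200)
solved = run_extrinsic_task(env, agent, start_pos=-3, goal_image=[4], max_steps=20)
print("goal reached:", solved)
```

The point of the sketch is the separation of concerns: the goal image only appears in `run_extrinsic_task`, so nothing learned in the intrinsic phase can depend on it.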
How to do it
While the robot is given no reward from the environment, it is perfectly reasonable (and expected) that the algorithm controlling the robot will use some kind of “intrinsic” motivation derived from its interaction with the environment. Below, we list some of the approaches to this problem found in the current literature. On the other hand, it would be “easy” for a human who knows the environment (and the final tasks) as described on this page to develop a reward function tailored to this challenge, so that the robot specifically learns to grasp objects and move them around. This last approach is discouraged and is not eligible to win the competition (see the rules below). The spirit of the challenge is that the robot initially knows nothing about the environment or what it will be asked to do, so the approach should be as general as possible.
- Carlos Florensa, David Held, Xinyang Geng, Pieter Abbeel: Automatic Goal Generation for Reinforcement Learning Agents
- Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn: Unsupervised Visuomotor Control through Distributional Planning Networks
- Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell: Curiosity-driven Exploration by Self-supervised Prediction
- Ashvin Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine: Visual Reinforcement Learning with Imagined Goals
more to follow
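As an illustration of what an “intrinsic” motivation signal can look like, the sketch below uses the prediction error of a forward model as a curiosity reward. It is a heavily simplified take on the idea behind curiosity-driven exploration, not the method of any of the papers listed above: the error is computed on a raw one-dimensional state rather than in a learned feature space, and `ForwardModel`, `intrinsic_reward` and `true_dynamics` are hypothetical names introduced only for this example.

```python
class ForwardModel:
    """Tiny linear model predicting next_state = state + w * action."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.learning_rate = learning_rate

    def predict(self, state, action):
        return state + self.w * action

    def update(self, state, action, next_state):
        """Take one gradient step on the squared error; return the error before the step."""
        error = next_state - self.predict(state, action)
        self.w += self.learning_rate * error * action
        return error ** 2


def intrinsic_reward(model, state, action, next_state):
    # Curiosity signal: transitions the model predicts badly are 'interesting'.
    return model.update(state, action, next_state)


def true_dynamics(state, action):
    # Hypothetical dynamics, unknown to the agent: each action moves the state by 2 * action.
    return state + 2.0 * action


model = ForwardModel()
state = 0.0
rewards = []
for step in range(50):
    action = 1.0 if step % 2 == 0 else -1.0
    next_state = true_dynamics(state, action)
    rewards.append(intrinsic_reward(model, state, action, next_state))
    state = next_state

print(f"first reward: {rewards[0]:.3f}, last reward: {rewards[-1]:.6f}")
```

The reward shrinks as the model improves, so an agent maximising it is pushed towards parts of the environment it cannot yet predict. Crucially, the signal depends only on the agent's own experience and never on the extrinsic tasks.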
Note: the rules might be updated while the competition is in “Beta”.
The rules of the competition will be as follows:
- Overview The competition focuses on autonomous open-ended learning with a simulated robot. The setup features a simulated robot that in a first intrinsic phase interacts autonomously with a partially unknown environment and learns how to interact with it, and then in a second extrinsic phase has to solve a number of tasks on the basis of the knowledge acquired in the first phase. Importantly, in the intrinsic phase the system does not know the tasks it will have to solve in the extrinsic phase.
- Simulator To this purpose, the competitors will be given a software kit with which they will be able to install the simulator of the robot and environment on their machines (see below).
- Robot The robot will be formed by: an arm; a gripper; one camera.
- Environment The environment used will be a simplified kitchen-like scenario formed by: a table; a shelf; some kitchen objects.
- Training and testing phases During the development of the system, and during its evaluation (performance scoring), the competitor systems will have to undergo two phases: an intrinsic phase of training; an extrinsic phase of testing.
- Intrinsic phase During the intrinsic phase, the robot will have to autonomously interact with an environment for a certain period of time during which it should acquire as much knowledge and skills as possible, to best solve the tasks in the extrinsic phase. Importantly, during the intrinsic phase the robot will not be aware of the tasks it will have to solve in the extrinsic phase.
- Extrinsic phase During the extrinsic phase the system will be tested for the quality of the knowledge acquired during the intrinsic phase.
During the extrinsic phase, the robot will undergo 3 challenges (see below) to be solved on the basis of the knowledge acquired during the intrinsic phase.
For every task the robot is given during these challenges, the environment will be put in a different starting state and the robot will be given a camera image of what the environment has to look like once the goal of the task has been achieved.
- Learning time budget The time available for learning in the intrinsic phase is limited to Dint minutes of simulated time. Learning in the extrinsic phase will be possible, but its utility will be strongly limited by the short time available: Dext seconds of simulated time for solving each task.
- Three challenges During the extrinsic phase, there will be three kinds of challenges. The three challenges involve tasks drawn from the following classes of possible problems, defined on the basis of the nature of the goal to accomplish. For each task, the agent is given an image of the configuration of the objects it has to reach. Each time the agent is given a task, the objects are placed in a different starting position. The challenges are as follows:
- 2D challenge: goal defined in terms of the configuration of 3 objects on the table plane; objects will not be placed on the shelf.
- 2.5D challenge: goal defined in terms of the configuration of 3 objects on the table plane and on the shelf; one or more objects will have to be moved from the table to the shelf and vice-versa.
For both 2D and 2.5D challenges, the objects will start in different positions for each task but they will have a fixed orientation, both in the initial positions and in the final configuration they are required to reach.
- 3D challenge: goal defined in terms of a configuration of 3 objects with no restrictions (objects can assume any orientation and be in any part of the table and on the shelf).
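For local self-testing, the constraints of the three challenge classes can be captured in a small task sampler. The sketch below is an assumption-laden illustration, not the organisers' task generator: the coordinate ranges, the `ObjectPose` fields, the 50% shelf probability and the reduction of orientation to a single yaw angle are all hypothetical choices made for brevity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectPose:
    x: float            # position on the supporting surface (hypothetical units)
    y: float
    on_shelf: bool
    yaw_degrees: float  # 0.0 stands for the fixed default orientation


def sample_pose(rng, allow_shelf, free_orientation):
    return ObjectPose(
        x=rng.uniform(-1.0, 1.0),
        y=rng.uniform(-1.0, 1.0),
        on_shelf=allow_shelf and rng.random() < 0.5,
        yaw_degrees=rng.uniform(0.0, 360.0) if free_orientation else 0.0,
    )


def sample_task(challenge, rng, n_objects=3):
    """Return (start, goal) object configurations for '2D', '2.5D' or '3D'."""
    allow_shelf = challenge in ("2.5D", "3D")
    free_orientation = challenge == "3D"
    while True:
        start = [sample_pose(rng, allow_shelf, free_orientation) for _ in range(n_objects)]
        goal = [sample_pose(rng, allow_shelf, free_orientation) for _ in range(n_objects)]
        changed_level = any(s.on_shelf != g.on_shelf for s, g in zip(start, goal))
        # 2.5D tasks require at least one object to move between table and shelf.
        if challenge != "2.5D" or changed_level:
            return start, goal


rng = random.Random(0)
for challenge in ("2D", "2.5D", "3D"):
    start, goal = sample_task(challenge, rng)
    print(challenge, [pose.on_shelf for pose in goal])
```

A sampler like this is only meant for evaluating a controller after training. Using it to shape the robot's reward during the intrinsic phase would run against the spirit of the rules described below.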
- Repetitions of the challenges Each challenge will be repeated multiple times with different goals.
- Knowledge transfer The only regularities (‘structure’) that are shared between the intrinsic and the extrinsic phase are related to the environment and objects; in particular, in the intrinsic phase the robot has no knowledge about which tasks it will be called to solve in the extrinsic phase. Therefore, in the intrinsic phase the robot should undergo an autonomous open-ended learning process that leads it to acquire, in the available time, as much knowledge and as many skills as possible, so as to be ready to face the unknown tasks of the following extrinsic phase.
- Competition structure The competition will be divided into two rounds.
- Round 1: During the first round, submissions will be evaluated by running only the extrinsic phase. Participants will have to pre-train their robot controllers on their machines before submission. The top 20 ranked participants whose submissions follow the spirit of the rules will be able to participate in Round 2 (see also Spirit of the rules and Code inspection below).
- Round 2: During the second round, submissions will be evaluated by running both the intrinsic and the extrinsic phase. All final submissions will be checked for coherence with the spirit of the rules.
- Spirit of the rules As also explained above, the spirit of the rules is that during the intrinsic phase the robot is not explicitly given any task to learn and it does not know of the future extrinsic tasks, but it rather learns in a fully autonomous way.
As such, the Golden Rule is that it is explicitly forbidden to use the scoring function of the extrinsic phase, or variants of it, as a reward function to train the agent. Participants should give as little information as possible to the robot; rather, the system should learn from scratch to interact with the objects using curiosity, intrinsic motivations, self-generated goals, etc.
However, given the difficulty of the competition and the many challenges that it contains, and to encourage wide participation, in Round 1 it will be possible to partially depart from the spirit of the competition, as long as the Golden Rule above is respected. For example, it will be possible to use hardwired or pre-trained models for recognising the identity of objects and their position in space.
All submissions, except those violating the Golden Rule, will be considered valid and ranked for Round 1. However, only submissions fully complying with the spirit of the rules will be admitted to Round 2 and take part in the final ranking.
- Code inspection To be eligible for ranking, participants are required to open the source code of their submissions to the competition monitoring check. During the competition, submitted systems will be sampled to check their compliance with the competition rules and spirit. The top-ranked submissions of Round 1 will be checked for admission to Round 2. All final submissions of Round 2 will be checked before announcing the final ranking and winners.
- Eligibility Participants belonging to the GOAL-Robots project, AIcrowd, or other parts of the Organization Team may participate in the competition to provide baselines for other participants, but are ineligible for the final Round 1 and Round 2 competition rankings.
- 3rd June, 2019 - Competition starts as “Beta” - Environment and rules may be updated.
- ~~8th July, 2019~~ extended to 31st July 2019 - End of Beta. Leaderboards are reset. Rules and environment finalized.
- ~~30th September~~ extended to 25th October 2019 - End of Round 1
- ~~31st October, 2019~~ extended to 25th November 2019 - End of Round 2.
- 6th December, 2019 - Competition results are announced.
- 8-14th December - Competition results are presented at NeurIPS.
This competition is organized by the GOAL-Robots Consortium (www.goal-robots.eu) with the support of AIcrowd.
Computational resources for online evaluations are offered by Google through the GCP research credits program.
If you have any questions, please feel free to contact us:
Emilio Cartoni, Gianluca Baldassarre