NeurIPS 2019 - Robot open-Ended Autonomous Learning
Robots that learn to interact with the environment autonomously. Starting June 3rd, 2019
Open-ended learning (also named ‘life-long learning’, ‘autonomous curriculum learning’, or ‘no-task learning’) aims to build learning machines and robots that are able to acquire skills and knowledge in an incremental fashion. The REAL competition, which is part of the NeurIPS 2019 competition track, addresses open-ended learning with a focus on ‘Robot open-Ended Autonomous Learning’ (REAL), that is, on systems that: (a) acquire sensorimotor competence that allows them to interact with objects and physical environments; (b) learn in a fully autonomous way, i.e. with no human intervention, on the basis of mechanisms such as curiosity, intrinsic motivations, task-free reinforcement learning, self-generated goals, and any other mechanism that might support autonomous learning. The competition will have a two-phase structure: during a first ‘intrinsic phase’ the system will have a certain amount of time to freely explore and learn in the environment, and then during an ‘extrinsic phase’ the quality of the autonomously acquired knowledge will be measured with tasks unknown at design time. The objectives of REAL are to: (a) track the state of the art in robot open-ended autonomous learning; (b) foster research and the proposal of new solutions to the many problems posed by open-ended learning; (c) favour the development of benchmarks in the field.
In this challenge, you will have to develop an algorithm to control a multi-link robot arm interacting with a table, a shelf and a few objects. The robot is supposed to interact with the environment and learn in an autonomous manner, i.e. no reward is provided by the environment to direct its learning. The robot has access to the state of its joint angles and to the output of a fixed camera that views the table from above. By interacting with the environment, the robot should learn how to achieve different states of the environment: e.g. how to push objects around, how to bring them on top of the shelf and how to place them one on top of the other.
The evaluation of the algorithm is split into two phases: the intrinsic phase and the extrinsic phase.
- In the first phase, the algorithm will be able to interact with the environment, without being provided any reward. In this intrinsic phase, the algorithm is supposed to learn the dynamics of the environment and how to interact with it.
- In the second phase, the algorithm will be given a goal that it needs to achieve within a strict time limit. The goal will be provided to the robot as an image of the state of the environment it has to reach. This goal might require, for example, pushing an object to a certain position or moving one object on top of another.
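The two-phase protocol above can be sketched with a toy stand-in for the simulator. Everything in this block is a hypothetical illustration and not the competition kit: `ToyEnv` is a ten-cell line world instead of the robot and camera, `TableAgent` is a placeholder controller, and the goal is a target cell instead of a goal image. The point is only the control flow: no reward is returned during exploration, and the goal is revealed only in the second phase.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the simulator: a line of 10 cells."""

    def __init__(self):
        self.pos = 0

    def reset(self, start=0):
        self.pos = start
        return self.pos

    def step(self, action):
        # action is -1, 0 or +1; note that no reward is returned
        self.pos = max(0, min(9, self.pos + action))
        return self.pos


class TableAgent:
    """Placeholder controller that memorises observed transitions."""

    ACTIONS = (-1, 0, 1)

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.model = {}  # (state, action) -> next state

    def explore_action(self, obs):
        # intrinsic phase: plain random exploration in this sketch
        return self.rng.choice(self.ACTIONS)

    def observe(self, obs, action, next_obs):
        self.model[(obs, action)] = next_obs

    def goal_action(self, obs, goal):
        # extrinsic phase: pick the action predicted to end closest to the goal;
        # unknown transitions are assumed to leave the state unchanged
        return min(self.ACTIONS,
                   key=lambda a: abs(self.model.get((obs, a), obs) - goal))


def run_intrinsic(env, agent, n_steps):
    """Free exploration: the agent sees transitions but never a reward or goal."""
    obs = env.reset()
    for _ in range(n_steps):
        action = agent.explore_action(obs)
        next_obs = env.step(action)
        agent.observe(obs, action, next_obs)
        obs = next_obs


def run_extrinsic(env, agent, start, goal, max_steps):
    """Goal-directed test under a strict step limit; returns True on success."""
    obs = env.reset(start)
    for _ in range(max_steps):
        if obs == goal:
            return True
        obs = env.step(agent.goal_action(obs, goal))
    return obs == goal


if __name__ == "__main__":
    env, agent = ToyEnv(), TableAgent()
    run_intrinsic(env, agent, n_steps=2000)
    print(run_extrinsic(env, agent, start=0, goal=7, max_steps=20))
```

In a real submission the exploration policy, the learned model and the goal representation would all be far richer; the sketch only fixes the order of events.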
How to do it
While the robot is given no reward from the environment, it is perfectly reasonable (and expected) that the algorithm controlling the robot will use some kind of “intrinsic” motivation derived from its interaction with the environment. Below, we list some of the approaches to this problem found in the current literature. On the other hand, it would be “easy” for a human who knows the environment (and the final tasks) as described on this page to develop a reward function tailored to this challenge so that the robot specifically learns to grasp objects and move them around. This last approach is discouraged and is not eligible to win the competition (see the rules below). The spirit of the challenge is that the robot initially does not know anything about the environment or what it will be asked to do, so the approach should be as general as possible. Indeed, some of the objects on which the robot will be tested will be kept secret until the final evaluation round (see below).
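As one illustration of an “intrinsic” signal, the sketch below computes a curiosity reward as the prediction error of a learned forward model. It is a simplified take on the idea behind curiosity-driven exploration, assuming a linear model on raw state vectors rather than learned features; the class `ForwardModelCuriosity` and its parameters are hypothetical and not part of the competition software.

```python
class ForwardModelCuriosity:
    """Intrinsic reward = squared prediction error of a linear forward model."""

    def __init__(self, state_dim, action_dim, lr=0.1):
        n_in = state_dim + action_dim
        # one weight row per predicted state component, initialised to zero
        self.w = [[0.0] * n_in for _ in range(state_dim)]
        self.lr = lr

    def predict(self, state, action):
        x = list(state) + list(action)
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def reward_and_update(self, state, action, next_state):
        x = list(state) + list(action)
        pred = self.predict(state, action)
        err = [p - t for p, t in zip(pred, next_state)]
        # surprising transitions (large error) yield a large intrinsic reward
        reward = sum(e * e for e in err)
        # one gradient step on 0.5 * squared error, so that the same
        # transition becomes less rewarding the next time it is seen
        for row, e in zip(self.w, err):
            for j, xj in enumerate(x):
                row[j] -= self.lr * e * xj
        return reward


if __name__ == "__main__":
    curiosity = ForwardModelCuriosity(state_dim=2, action_dim=1)
    for _ in range(5):
        # repeating one transition: the printed reward shrinks each time
        print(curiosity.reward_and_update([1.0, 0.0], [0.5], [1.5, 0.0]))
```

Because the reward depends only on how well the robot predicts its own experience, it contains no knowledge of the final tasks, which is the property the challenge asks for.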
The participants will also be given one or more “baseline” algorithms that they can modify to get started or from which they can draw inspiration.
Baseline algorithms and the necessary software to participate in the competition will be available on GitHub.
- Carlos Florensa, David Held, Xinyang Geng, Pieter Abbeel. Automatic Goal Generation for Reinforcement Learning Agents.
- Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn. Unsupervised Visuomotor Control through Distributional Planning Networks.
- Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell. Curiosity-driven Exploration by Self-supervised Prediction.
more to follow
Note: the rules might be updated while the competition runs in “Beta”, up to 8th July.
The rules of the competition will be as follows:
- Overview The competition focuses on autonomous open-ended learning with a simulated robot. The setup features a simulated robot that in a first intrinsic phase interacts autonomously with a partially unknown environment and learns how to interact with it, and then in a second extrinsic phase has to solve a number of tasks on the basis of the knowledge acquired in the first phase. Importantly, in the intrinsic phase the system does not know the tasks it will have to solve in the extrinsic phase.
- Simulator For this purpose, the competitors will be given a software kit with which they will be able to install the simulator of the robot and environment on their machines (see below).
- Robot The robot will be formed by: an arm; a gripper; one camera.
- Environment The environment used will be a simplified kitchen-like scenario formed by: a table; a shelf; some kitchen objects.
- Training and testing phases During the development of the system, and during its evaluation (performance scoring), the competitor systems will have to undergo two phases: an intrinsic phase of training; an extrinsic phase of testing.
- Intrinsic phase During the intrinsic phase, the robot will have to autonomously interact with an environment for a certain period of time during which it should acquire as much knowledge and skills as possible, to best solve the tasks in the extrinsic phase. Importantly, during the intrinsic phase the robot will not be aware of the tasks it will have to solve in the extrinsic phase.
- Extrinsic phase During the extrinsic phase the system will be tested for the quality of the knowledge acquired during the intrinsic phase.
During the extrinsic phase, the robot will undergo 3 challenges (see below) to be solved on the basis of the knowledge acquired during the intrinsic phase.
For every task the robot is given during these challenges, the environment will be put in a different starting state and the robot will be given a camera image of what the environment has to look like when it has achieved the goal of the task.
- Learning time budget The time available for learning in the intrinsic phase is limited to D_int minutes of simulated time. Learning in the extrinsic phase will be possible, but its utility will be strongly limited by the short time available, which consists of D_ext seconds of simulated time for solving each task.
- Three challenges During the extrinsic phase, there will be three kinds of challenges.
The three challenges involve tasks drawn from the following classes of possible problems defined on the basis of the nature of the goal to accomplish:
- 2D challenge: goal defined in terms of the configuration of 3 objects on the table plane;
- 2.5D challenge: goal defined in terms of the configuration of 3 objects on the table plane and on the shelf;
- 3D challenge: goal defined in terms of 3 objects forming a certain 3D configuration (e.g., a tower or another structure).
- Repetitions of the challenges Each challenge will be repeated multiple times with different objects.
- Knowledge transfer The only regularities (‘structure’) that are shared between the intrinsic and the extrinsic phase are related to the environment and objects; in particular, in the intrinsic phase the robot has no knowledge about which tasks it will be called to solve in the extrinsic phase. Therefore, in the intrinsic phase the robot should undergo an autonomous open-ended learning process that should lead it to acquire, in the available time, as much knowledge and as many skills as possible to be ready to best face the unknown tasks of the following extrinsic phase.
- Eligibility Participants belonging to the GOAL-Robots project, AIcrowd, or other parts of the Organization Team may participate in the competition but are ineligible for the final competition ranking.
- Code inspection To be eligible for scoring, participants are required to open source the code of their submissions for the competition monitoring check. During the competition, submitted systems will be sampled to check their compliance with the competition rules and spirit. The top 10 systems of the final ranking will be checked for compliance with the competition rules and spirit before the competition winners are declared.
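The learning time budget in the rules is stated in simulated time, so it can be read as a fixed number of simulator steps. The sketch below shows that conversion under stated assumptions: the rules do not give values for D_int, D_ext or the simulator step rate, so the three constants are placeholders chosen only to make the arithmetic concrete, and `budget_in_steps` is a hypothetical helper.

```python
import math

# Placeholder values only: the rules leave D_int and D_ext unspecified,
# and the simulator step rate is an assumption of this sketch.
D_INT_MINUTES = 60.0
D_EXT_SECONDS = 10.0
STEPS_PER_SECOND = 200


def budget_in_steps(duration_seconds, steps_per_second):
    """Number of whole simulator steps that fit in a simulated-time budget."""
    if duration_seconds < 0 or steps_per_second <= 0:
        raise ValueError("duration must be >= 0 and step rate must be > 0")
    return math.floor(duration_seconds * steps_per_second)


if __name__ == "__main__":
    intrinsic_steps = budget_in_steps(D_INT_MINUTES * 60, STEPS_PER_SECOND)
    extrinsic_steps = budget_in_steps(D_EXT_SECONDS, STEPS_PER_SECOND)
    print(intrinsic_steps, extrinsic_steps)
```

Whatever the final values are, the ratio between the two budgets is what makes learning in the extrinsic phase of limited use: each task gets only a small fraction of the steps available for free exploration.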
- 3rd June, 2019 - Competition start as “Beta” - Environment and Rules may be updated.
- 8th July, 2019 - End of Beta. Leaderboards are reset. Rules and environment finalized.
- 15th October, 2019 - Deadline for final submissions.
- 15th October - End of October, 2019 - Organizers evaluate submissions.
- End of October, 2019 - Competition results are announced.
- TBD - Competition results are presented at NeurIPS.
This competition is organized by the GOAL-Robots Consortium: www.goal-robots.eu.
If you have any questions, please feel free to contact us: