AWS DeepRacer Sim2Real Challenge #2
AWS DeepRacer provides hardware and a cloud-based service for end-to-end experimentation with reinforcement learning (RL), and it can be used to systematically investigate the key challenges in building intelligent control systems in the real world. As an educational and experimental tool for RL, it is the first successful customizable large-scale deployment of deep reinforcement learning (DRL) on a vision-based robotic control agent. So far there have been 20 AWS Summit races with 1,600 participants on the public leaderboard, and more than 4,500 participants in the console, whose results are also visible to all. The robocar uses only raw camera images as observations and a model-free learning method to perform robust path planning.
One of the major challenges in deploying RL agents in the real world is simulation-to-real-world transfer (sim2real). In the context of DeepRacer, we can divide the sim2real problem into two parts: the visual (perception) gap and the physics (dynamics) gap.
Simulators may not match the visual fidelity of the real world, and they may not capture its physics. Both factors limit how faithfully the simulator represents the real environment. As a result, a model that achieves the desired objective in simulation may still fail when it is run in the real world.
In DeepRacer, we recognize that the simulation is not an exact match for the real world and that the physics engine uses approximations of the friction coefficients and other inertial parameters. But the beauty of DRL is that we do not require everything to be perfect.
To mitigate large perceptual changes affecting the car, we make two major design choices:
- instead of using an RGB image, we convert the image to grayscale to narrow the perceptual differences between the simulator and the real world
- we intentionally use a shallow image feature embedder, i.e. only a few CNN layers. This keeps the network from fitting the simulation environment entirely while still letting it learn enough to adjust to small variations in the environment.
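The grayscale step can be illustrated with a minimal sketch. The source does not say which conversion DeepRacer uses; the ITU-R BT.601 luma weights below are an assumption, and `rgb_to_grayscale` is a hypothetical helper name that operates on plain nested lists rather than on the actual camera pipeline.

```python
def rgb_to_grayscale(image):
    """Convert an H x W image of (r, g, b) tuples (0-255) into H x W grayscale ints.

    Assumption: BT.601 luma weights; the real DeepRacer preprocessing may differ.
    """
    gray = []
    for row in image:
        gray.append(
            [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        )
    return gray


# A 1 x 3 image: white, black, and a mid-gray pixel.
frame = [[(255, 255, 255), (0, 0, 0), (10, 10, 10)]]
gray_frame = rgb_to_grayscale(frame)
```

Collapsing three channels into one removes color differences between simulated and real track surfaces, which is the narrowing effect the first design choice relies on.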
By default, the DeepRacer model uses a clipped PPO algorithm that runs in asynchronous mode. Using these settings, over 5000 developers around the world have built models in the simulator and successfully navigated a racing track in the real world.
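For readers unfamiliar with the "clipped" part of PPO, here is a minimal sketch of the clipped surrogate objective from the original PPO formulation. It is not taken from the DeepRacer code base: the function name, the list-based inputs, and the clipping value `epsilon=0.2` are assumptions chosen for illustration.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    epsilon:    clipping range (0.2 is an assumed, commonly used value)
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Taking the minimum makes the objective a pessimistic bound,
        # so large policy changes are not rewarded.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)
```

With a positive advantage, a ratio of 1.5 is clipped to 1.2, so the update gains nothing from moving the policy further than the clipping range allows.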
In this challenge, we’ll take robustness in the real world to a new level. While the models so far have been trained in the simulator and tested on visually similar tracks in the real world, we introduce the following modifications:
- Visual changes to the track, with part of the track made with tape
- Track dimensions to vary
- Variations in lighting
- (Maybe we can add snow-like objects on the track to distract the cars?)
In this challenge, you will train a model that will be evaluated on an undisclosed test track, which will be significantly different from the tracks available in the simulator. Each model will be evaluated on the test track, where it will have five attempts to complete a lap, and the fastest completed lap will be recorded as the submission.
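The scoring rule described above (the fastest completed lap among the attempts) can be sketched as follows. The data layout, with `None` marking an attempt that did not complete a lap, and the function name `best_lap_time` are assumptions for illustration; the official evaluation procedure is not specified here.

```python
def best_lap_time(attempts):
    """Return the fastest completed lap time in seconds, or None if no lap was completed.

    attempts: list of lap times in seconds, with None for an incomplete lap
              (an assumed representation, not an official format).
    """
    completed = [t for t in attempts if t is not None]
    return min(completed) if completed else None


# Five attempts, two of which did not finish the lap.
score = best_lap_time([None, 12.4, 11.9, None, 13.0])
```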
We’ll focus on both perception and dynamics. We’ll introduce lighting variations in the track environment, such as directed headlights and an LED light array. In addition, to test model robustness in dynamics, we’ll inject random actions at inference time on the DeepRacer car. A good model is expected to overcome advanced perceptual changes and random perturbations in the action space, and to navigate the track successfully. Each model will have three attempts to clock successful lap times, with the fastest lap being considered for the submission.
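One way to picture random action injection, and to rehearse it during your own testing, is the sketch below. The challenge text does not say how the random actions will be injected; replacing the policy's action with a uniformly sampled one with probability `noise_prob` is an assumption, as are the function name and the discrete action indices.

```python
import random


def perturbed_action(policy_action, action_space, noise_prob, rng):
    """With probability noise_prob, replace the policy's action with a random one.

    policy_action: action chosen by the trained model
    action_space:  list of all available discrete actions (assumed layout)
    noise_prob:    probability of injecting a random action (hypothetical parameter)
    rng:           random.Random instance, passed in for reproducibility
    """
    if rng.random() < noise_prob:
        return rng.choice(action_space)
    return policy_action


rng = random.Random(0)
actions = [0, 1, 2, 3, 4]  # hypothetical discrete steering/speed indices
executed = [perturbed_action(2, actions, 0.1, rng) for _ in range(100)]
```

A model that still completes laps in simulation under this kind of disturbance is more likely to recover from the perturbations described above, although the exact disturbance used at evaluation time may differ.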
- GitHub code for Amazon SageMaker notebook environment
- Documentation on Amazon SageMaker DeepRacer notebook
- DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning
- Tutorial and documentation on DeepRacer
- GitHub code for AWS RoboMaker DeepRacer simulation application and documentation
- Track Image Dataset
Potential avenues to explore
- Optimize PPO algorithm
- Explore other RL algorithms
- Data augmentation outside of the simulation
- Modifying the assets in simulation
- Regularization and other hyperparameter optimization
- Split the learning process (learn to see; learn to act)
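As an illustration of the data augmentation avenue, the sketch below applies a random brightness shift and contrast gain to a grayscale frame, which loosely mimics lighting variation. It is only one possible approach: the function name, the parameter ranges, and the nested-list image format are assumptions, not part of the DeepRacer tooling.

```python
import random


def jitter_brightness_contrast(gray, rng, max_shift=40, gain_range=(0.7, 1.3)):
    """Randomly change brightness and contrast of an H x W grayscale image (0-255 ints).

    max_shift and gain_range are hypothetical values picked for illustration.
    """
    shift = rng.uniform(-max_shift, max_shift)  # brightness offset
    gain = rng.uniform(*gain_range)             # contrast scaling around mid-gray
    out = []
    for row in gray:
        out.append(
            [min(255, max(0, int(round((p - 128) * gain + 128 + shift)))) for p in row]
        )
    return out


rng = random.Random(42)
frame = [[0, 128, 255], [10, 200, 90]]
augmented = jitter_brightness_contrast(frame, rng)
```

Applying a fresh jitter to each training frame exposes the model to a wider range of lighting conditions than the simulator renders by default.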
The competition starts in the first week of March 2020, with the finals in May 2020.
To be announced soon
How to submit models and artifacts
You will need to prepare a list of model and log files for your submission in order to participate in the challenge. Complete instructions for this challenge will be added soon.
For questions, please check the AIcrowd AWS DeepRacer Discourse page.