AIcrowd

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Apr 2022

NeurIPS 2021: MineRL BASALT Competition

We held the first-ever MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (MineRL BASALT) Competition at the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks. Rather than mandating the use of LfHF techniques, we described four tasks in natural language to be accomplished in the video game Minecraft, and allowed participants to use any approach they wanted to build agents that could accomplish the tasks. Teams developed a diverse range of LfHF algorithms across a variety of possible human feedback types. The three winning teams implemented significantly different approaches while achieving similar performance. Interestingly, their approaches performed well on different tasks, validating our choice of tasks to include in the competition. While the outcomes validated the design of our competition, we did not get as many participants and submissions as our sister competition, MineRL Diamond. We speculate about the causes of this problem and suggest improvements for future iterations of the competition.

Insights From the NeurIPS 2021 NetHack Challenge

Sharada Mohanty

Dipam Chakraborty

+24 more

Mar 2022

NeurIPS 2021 - The NetHack Challenge

In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI with many diverse approaches significantly beating the previously best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems, demonstrating that on NetHack symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack's suitability as a long-term benchmark for AI research.

Flatland-RL : Multi-Agent Reinforcement Learning on Trains

Sharada Mohanty

Florian Laurent

Manuel Schneider

Christian Scheller

+9 more

Dec 2020

Flatland Challenge

Efficient automated scheduling of trains remains a major challenge for modern railway systems. The underlying vehicle rescheduling problem (VRSP) has been a major focus of Operations Research (OR) for decades. Traditional approaches use complex simulators to study VRSP, where experimenting with a broad range of novel ideas is time-consuming and has a huge computational overhead. In this paper, we introduce a two-dimensional simplified grid environment called "Flatland" that allows for faster experimentation. Flatland not only reduces the complexity of the full physical simulation, but also provides an easy-to-use interface to test novel approaches for the VRSP, such as Reinforcement Learning (RL) and Imitation Learning (IL). In order to probe the potential of Machine Learning (ML) research on Flatland, we (1) ran a first series of RL and IL experiments and (2) designed and executed a public Benchmark at NeurIPS 2020 to engage a large community of researchers to work on this problem. Our own experimental results, on the one hand, demonstrate that ML has potential in solving the VRSP on Flatland. On the other hand, we identify key topics that need further research. Overall, the Flatland environment has proven to be a robust and valuable framework to investigate the VRSP for railway networks. Our experiments provide a good starting point for further research and for the participants of the NeurIPS 2020 Flatland Benchmark. All of these efforts together have the potential to have a substantial impact on shaping the mobility of the future.

The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

William H. Guss

Mario Ynocente Castro

Sam Devlin

Brandon Houghton

Noboru Sean Kuno

+10 more

Jan 2021

NeurIPS 2020: MineRL Competition

Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples, affording only a shrinking segment of the AI community access to their development. Resolution of these limitations requires new, sample-efficient methods. To facilitate research in this direction, we propose this second iteration of the MineRL Competition. The primary goal of the competition is to foster the development of algorithms which can efficiently leverage human demonstrations to drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments. To that end, participants compete under a limited environment sample-complexity budget to develop systems which solve the MineRL ObtainDiamond task in Minecraft, a sequential decision-making environment requiring long-term planning, hierarchical control, and efficient exploration methods. The competition is structured into two rounds in which competitors are provided several paired versions of the dataset and environment with different game textures and shaders. At the end of each round, competitors submit containerized versions of their learning algorithms to the AIcrowd platform where they are trained from scratch on a hold-out dataset-environment pair for a total of 4 days on a pre-specified hardware platform. In this follow-up iteration to the NeurIPS 2019 MineRL Competition, we implement new features to expand the scale and reach of the competition. In response to the feedback of the previous participants, we introduce a second minor track focusing on solutions without access to environment interactions of any kind except during test-time. Further, we aim to prompt domain-agnostic submissions by implementing several novel competition mechanics including action-space randomization and desemantization of observations and actions.

NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through Learned Aggregation of Convolutional Feature Maps

Maximilian Seitzer

Andreas Foltyn

Felix P. Kemeth

Nov 2020

33rd Conference on Neural Information Processing Systems (NeurIPS) - NeurIPS 2019

NeurIPS 2019 : Disentanglement Challenge

This report on our stage 2 submission to the NeurIPS 2019 disentanglement challenge presents a simple image preprocessing method for learning disentangled latent factors. We propose to train a variational autoencoder on regionally aggregated feature maps obtained from networks pretrained on the ImageNet database, utilizing the implicit inductive bias contained in those features for disentanglement. This bias can be further enhanced by explicitly fine-tuning the feature maps on auxiliary tasks useful for the challenge, such as angle, position estimation, or color classification. Our approach achieved 2nd place in stage 2 of the challenge.

MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

Feb 2022

Proceedings of Machine Learning Research

NeurIPS 2021: MineRL Diamond Competition

Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restrictions come at a cost: increased difficulty. If the barrier for entry is too high, many potential participants are demoralized. With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers. With this track and more extensive tutorials and support, we saw an increased number of submissions. The participants of this easier track were able to obtain a diamond, and the participants of the harder track advanced generalizable solutions to the same task.

REAL-2019: Robot open-Ended Autonomous Learning competition

Francesco Mannella

Vieri Giuliano Santucci

Jochen Triesch

Elmar Rueckert

+1 more

Dec 2019

Proceedings of the NeurIPS 2019 Competition and Demonstration Track

REAL 2020 - Robot open-Ended Autonomous Learning

Open-ended learning, also called life-long learning or autonomous curriculum learning, aims to program machines and robots that autonomously acquire knowledge and skills in a cumulative fashion. We illustrate the first edition of the REAL-2019 – Robot open-Ended Autonomous Learning competition, prompted by the EU project GOAL-Robots – Goal-based Open-ended Autonomous Learning Robots. The competition was based on a simulated robot that: (a) acquires sensorimotor competence to interact with objects on a table; (b) learns autonomously based on mechanisms such as curiosity, intrinsic motivations, and self-generated goals. The competition featured a first intrinsic phase, where the robots learned to interact with the objects in a fully autonomous way (no rewards, predefined tasks or human guidance), and a second extrinsic phase, where the acquired knowledge was evaluated with tasks unknown during the first phase. The competition ran online on AIcrowd for six months, involved 75 subscribers and 6 finalists, and was presented at NeurIPS-2019. The competition proved very hard as it involved difficult machine learning challenges usually tackled in isolation, such as exploration, sparse rewards, object learning, generalisation, catastrophic interference, and autonomous skill learning. Following the participants' positive feedback, the preparation of a second REAL-2020 competition is underway, improving on the formulation of a relevant benchmark for open-ended learning.

Impact of Pretrained Networks for Snake Species Classification

Moorthy Gokula Krishnan

Eloop Mobility Solutions

Jan 2020

Eloop Mobility Solutions

SnakeCLEF2021 - Snake Species Identification Challenge

A robust snake species classifier could aid in the treatment of snake bites. In this report, the technique of transfer learning is revisited to understand the significance of the underlying pre-trained network and the supervised datasets used for pre-training. In the low-data regime, the methodology of transfer learning has been instrumental in building reliable image classifiers. Comparisons are made between the pre-trained networks trained on datasets of different sizes and classes. Performance improves significantly when the pre-trained network is trained on a much larger supervised dataset. Using country metadata improves the performance considerably. In the SnakeCLEF2020 challenge, an F1-score of 0.625 was achieved.

Adversarial Vision Challenge

Wieland Brendel

Jonas Rauber

Alexey Kurakin

Nicolas Papernot

Behar Veliqi

+3 more

Aug 2018

The NIPS 2018 Adversarial Vision Challenge is a competition to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks. This document is an updated version of our competition proposal that was accepted in the competition track of the 32nd Conference on Neural Information Processing Systems (NIPS 2018).

Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

Łukasz Kidziński

Sharada Prasanna Mohanty

Carmichael Ong

Zhewei Huang

Shuchang Zhou

+24 more

Apr 2018

Synthesizing physiologically-accurate human movement in a variety of conditions can help practitioners plan surgeries, design experiments, or prototype assistive devices in simulated environments, reducing time and costs and improving treatment outcomes. Because of the large and complex solution spaces of biomechanical models, current methods are constrained to specific movements and models, requiring careful design of a controller and hindering many possible applications. We sought to discover if modern optimization methods efficiently explore these complex spaces. To do this, we posed the problem as a competition in which participants were tasked with developing a controller to enable a physiologically-based human model to navigate a complex obstacle course as quickly as possible, without using any experimental data. They were provided with a human musculoskeletal model and a physics-based simulation environment. In this paper, we discuss the design of the competition, technical difficulties, results, and analysis of the top controllers. The challenge proved that deep reinforcement learning techniques, despite their high computational cost, can be successfully employed as an optimization method for synthesizing physiologically feasible motion in high-dimensional biomechanical systems.

Overview of LifeCLEF location-based species prediction task 2020 (GeoLifeCLEF)

Benjamin Deneu

Titouan Lorieul

Elijah Cole

Maximilien Servajean

Christophe Botella

+2 more

Jan 2020

Proceedings of the CLEF 2020 - Conference and labs of the evaluation forum

ImageCLEF 2020 DrawnUI

Understanding the geographic distribution of species is a key concern in conservation. By pairing species occurrences with environmental features, researchers can model the relationship between an environment and the species which may be found there. To advance the state-of-the-art in this area, a large-scale machine learning competition called GeoLifeCLEF 2020 was organized. It relied on a dataset of 1.9 million species observations paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. This paper presents an overview of the competition, synthesizes the approaches used by the participating groups, and analyzes the main results. In particular, we highlight the ability of remote sensing imagery and convolutional neural networks to improve predictive performance, complementary to traditional approaches.

Variational Learning with Disentanglement-PyTorch

Amir H. Abdi

Purang Abolmaesumi

Sidney Fels

Dec 2019

NeurIPS 2019 : Disentanglement Challenge

Unsupervised learning of disentangled representations is an open problem in machine learning. The Disentanglement-PyTorch library is developed to facilitate research, implementation, and testing of new variational algorithms. In this modular library, neural architectures, dimensionality of the latent space, and the training algorithms are fully decoupled, allowing for independent and consistent experiments across variational methods. The library handles the training scheduling, logging, and visualizations of reconstructions and latent space traversals. It also evaluates the encodings based on various disentanglement metrics. The library, so far, includes implementations of the following unsupervised algorithms VAE, Beta-VAE, Factor-VAE, DIP-I-VAE, DIP-II-VAE, Info-VAE, and Beta-TCVAE, as well as conditional approaches such as CVAE and IFCVAE. The library is compatible with the Disentanglement Challenge of NeurIPS 2019, hosted on AICrowd, and achieved the 3rd rank in both the first and second stages of the challenge.

Results of SemTab 2020

Ernesto Jimenez-Ruiz

Oktie Hassanzadeh

Vasilis Efthymiou

Jiaoyan Chen

Kavitha Srinivas

+1 more

Jan 2021

SemTab 2020 was the second edition of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, successfully collocated with the 19th International Semantic Web Conference (ISWC) and the 15th Ontology Matching (OM) Workshop. SemTab provides a common framework to conduct a systematic evaluation of state-of-the-art systems.

Overview of the VQA-Med Task at ImageCLEF 2020: Visual Question Answering and Generation in the Medical Domain

Asma Ben Abacha

Vivek V. Datla

Sadid A. Hasan

Dina Demner-Fushman

Henning Müller

Jan 2020

Proceedings of the CLEF 2020 - Conference and labs of the evaluation forum

ImageCLEF 2020 VQA-Med - VQA

This paper presents an overview of the Medical Visual Question Answering (VQA-Med) task at ImageCLEF 2020. This third edition of VQA-Med included two tasks: (i) Visual Question Answering (VQA), where participants were tasked with answering abnormality questions from the visual content of radiology images and (ii) Visual Question Generation (VQG), consisting of generating relevant questions about radiology images based on their visual content. In VQA-Med 2020, 11 teams participated in at least one of the two tasks and submitted a total of 62 runs. The best team achieved a BLEU score of 0.542 in the VQA task and 0.348 in the VQG task.

Deep Learning for Understanding Satellite Imagery: An Experimental Survey

Sharada Prasanna Mohanty

Jakub Czakon

Kamil A. Kaczmarek

Andrzej Pyskir

Piotr Tarasiewicz

+11 more

Nov 2020

Frontiers in Artificial Intelligence

Translating satellite imagery into maps requires intensive effort and time, which often leads to inaccurate maps of affected regions during disasters and conflicts. The combination of recently available datasets and advances in computer vision made through deep learning has paved the way toward automated satellite image translation. To facilitate research in this direction, we introduce the Satellite Imagery Competition using a modified SpaceNet dataset. Participants had to come up with different segmentation models to detect positions of buildings on satellite images. In this work, we present five approaches based on improvements of U-Net and Mask R-CNN (Mask Region-based Convolutional Neural Network) models, coupled with unique training adaptations using boosting algorithms, morphological filters, Conditional Random Fields and custom losses. The good results from these models (as high as AP=0.937 and AR=0.959) demonstrate the feasibility of Deep Learning in automated satellite image annotation.

Learning to Run challenge: Synthesizing physiologically accurate motion using deep reinforcement learning

Łukasz Kidziński

Sharada P. Mohanty

Carmichael Ong

Jennifer L. Hicks

Sean F. Carroll

+3 more

Mar 2018

Synthesizing physiologically-accurate human movement in a variety of conditions can help practitioners plan surgeries, design experiments, or prototype assistive devices in simulated environments, reducing time and costs and improving treatment outcomes. Because of the large and complex solution spaces of biomechanical models, current methods are constrained to specific movements and models, requiring careful design of a controller and hindering many possible applications. We sought to discover if modern optimization methods efficiently explore these complex spaces. To do this, we posed the problem as a competition in which participants were tasked with developing a controller to enable a physiologically-based human model to navigate a complex obstacle course as quickly as possible, without using any experimental data. They were provided with a human musculoskeletal model and a physics-based simulation environment. In this paper, we discuss the design of the competition, technical difficulties, results, and analysis of the top controllers. The challenge proved that deep reinforcement learning techniques, despite their high computational cost, can be successfully employed as an optimization method for synthesizing physiologically feasible motion in high-dimensional biomechanical systems.

Learn to Move Through a Combination of Policy Gradient Algorithms: DDPG, D4PG, and TD3

Nicolas Bach

Andrew Melnik

Malte Schilling

Timo Korthals

Helge Ritter

Jan 2021

NeurIPS 2019: Learn to Move - Walk Around

Deep Reinforcement Learning has recently seen progress for continuous control tasks, driven by yearly challenges such as the NeurIPS Competition Track. This work combines complementary characteristics of two current state-of-the-art methods, Twin-Delayed Deep Deterministic Policy Gradient and Distributed Distributional Deep Deterministic Policy Gradient, and applies the combination to the state-of-the-art Learn to Move - Walk Around locomotion control challenge, which was part of the NeurIPS 2019 Competition Track. The combined approach showed improved results and achieved 4th place in this competition. The article presents this combination and evaluates its performance.

Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering

Asma Ben Abacha

Chaitanya Shivade

Dina Demner-Fushman

Aug 2019

Proceedings of the BioNLP 2019 workshop, Florence, Italy, August 1, 2019.

MEDIQA 2019 - Natural Language Inference (NLI)

This paper presents the MEDIQA 2019 shared task organized at the ACL-BioNLP workshop. The shared task is motivated by a need to develop relevant methods, techniques and gold standards for inference and entailment in the medical domain, and their application to improve domain specific information retrieval and question answering systems. MEDIQA 2019 includes three tasks: Natural Language Inference (NLI), Recognizing Question Entailment (RQE), and Question Answering (QA) in the medical domain. 72 teams participated in the challenge, achieving an accuracy of 98% in the NLI task, 74.9% in the RQE task, and 78.3% in the QA task. In this paper, we describe the tasks, the datasets, and the participants’ approaches and results. We hope that this shared task will attract further research efforts in textual inference, question entailment, and question answering in the medical domain.
