
dipam_chakraborty 0

Name: Dipam Chakraborty
Location: IN
Badges: Gold 0, Silver 0, Bronze 1

Activity

[Contribution activity heatmap, October to October]


Challenges Entered

NeurIPS 2020: Procgen Competition

Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments

Latest submissions

  • graded #93732
  • graded #93693
  • graded #93520

Seismic Facies Identification Challenge

3D seismic image interpretation by machine learning

Latest submissions

No submissions made in this challenge.
Badges

  • Trustable (May 16, 2020)
  • Has filled their profile page (May 16, 2020)
Team

  • Gamma (NeurIPS 2020: Procgen Competition)
    Participants: priteshgohil (rating 0)

NeurIPS 2020: Procgen Competition

Questions about the competition timeline

5 days ago

Hello @jyotish … Will there be a generalization track? If yes, can we select which submissions to use for generalization?

Can we change the x-axis on the Grafana viewer?

9 days ago

Hi @shivam, ok, thanks for the info. Can it also be changed to values other than clock time, for example environment steps?

Can we change the x-axis on the Grafana viewer?

9 days ago

Would be helpful for some analysis.

Anyone else getting MaxRuntimeExceeded in training?

20 days ago

I too received it again on #87296, in gemjourney and hovercraft.

Spot instances issues

26 days ago

Hi @xiaocheng_tang, I worked around it by saving the replay buffer as part of the checkpoint (with compression), but it seems like this is causing a new MaxRuntimeExceeded error.

Waiting for organizers for a proper fix.
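For anyone hitting the same spot-instance problem, this is roughly the shape of the workaround (a simplified sketch, not my actual competition code; the helper names and the pickle-based serialization are placeholders):

    import gzip
    import os
    import pickle


    def save_buffer_with_checkpoint(replay_buffer, checkpoint_dir):
        """Serialize the replay buffer next to an existing checkpoint, gzip-compressed."""
        path = os.path.join(checkpoint_dir, "replay_buffer.pkl.gz")
        with gzip.open(path, "wb") as f:
            pickle.dump(replay_buffer, f, protocol=pickle.HIGHEST_PROTOCOL)
        return path


    def load_buffer_from_checkpoint(checkpoint_dir):
        """Restore the buffer if a previous (possibly preempted) run saved one."""
        path = os.path.join(checkpoint_dir, "replay_buffer.pkl.gz")
        if not os.path.exists(path):
            return None  # fresh run, nothing to restore
        with gzip.open(path, "rb") as f:
            return pickle.load(f)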

Anyone else getting MaxRuntimeExceeded in training?

26 days ago

Hi @jyotish, I too received this error in #86264. I’d like to clarify that I’ve set up some code that saves the experience replay buffer along with the checkpoints, because the reset of spot instances is messing up the replay buffer and really hurts performance. But the checkpoint saving and loading time doesn’t seem to be counted in ray’s “Time elapsed”. If this is the issue, please suggest a proper solution to get replay buffers working correctly with spot instances.

Spot instances issues

28 days ago

Hi @jyotish

Any suggestions on how to save the replay buffer properly with ray?

Anyone else having trouble with Plunder rollout timeout?

About 1 month ago

In my testing, there is actually high variance in the results at 1000 runs, but it reduces at 5000 … and I still think rollouts could be much faster with more optimized code for parallel envs.

Anyone else having trouble with Plunder rollout timeout?

About 1 month ago

I think the rollout setup is pretty suboptimal as well. I guess it’s using the same number of parallel envs and workers as training, but typically a lot of memory goes to training and is pretty much unused for inference with the current setup. The inference code should be allowed to be optimized for 8 workers and as many parallel envs as maximizes throughput.

Round 2 is open for submissions 🚀

About 1 month ago

@Paseul I think the normalization ranges for the private envs can be obtained if you log the “return max” and “return blind” as custom metrics. I haven’t tried it though.
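For reference, a rough sketch of how that custom-metric logging could look with RLlib’s callbacks API (untested; the return_max / return_blind attribute names and how they are read off the env are assumptions, and older 0.8.x releases used a dict of callback functions instead of the class shown here):

    from ray.rllib.agents.callbacks import DefaultCallbacks


    class ProcgenMetricCallbacks(DefaultCallbacks):
        """Sketch: record env-reported reward bounds as RLlib custom metrics."""

        def on_episode_end(self, worker=None, base_env=None, policies=None,
                           episode=None, **kwargs):
            # get_unwrapped() returns the underlying gym envs for this worker;
            # the attribute names below are guesses about what the competition
            # env exposes, so adapt them to the real wrapper.
            env = base_env.get_unwrapped()[0]
            for attr in ("return_max", "return_blind"):
                if hasattr(env, attr):
                    episode.custom_metrics[attr] = float(getattr(env, attr))


    # Enabled by pointing the trainer config at the callbacks class:
    # config["callbacks"] = ProcgenMetricCallbacks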

Number of environments in round 2

About 1 month ago

(post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

Open the round 1 testing in round 2?

About 2 months ago

I suspect sample efficiency will play a significant role in generalization in this 8M-timestep setting. According to the results files released by OpenAI, the test performance keeps increasing until the 200 training levels nearly reach the maximum average reward.

Training Results

How to find subtle implementation details

About 2 months ago

Hello @jyotish

Yeah, you guys did a really awesome job of matching the baseline, though you left out some details (intentionally? :sweat_smile:). I was actually surprised by how many small one-line optimizations are hidden in openai baselines’ version … though not all of them helped in my case.

This one I believe is a bug, but I haven’t got any response about it yet. Deep neural nets are so awesome that everything works fine even with this bug present.

How to find subtle implementation details

About 2 months ago

Haha, sorry if I left you hanging there. These are the differences between torch and tf that I found can make a difference if used the correct way: Xavier init vs Kaiming init, zero vs non-zero bias init, and the epsilon parameter in Adam. Interestingly, I noticed the differences after moving from torch to tensorflow, because ray still uses placeholders for tensorflow, which made it a pain to work with; but my torch code is slower than tensorflow, so I had to give up some compute time.
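To make that concrete, here is an illustrative snippet (not my competition model) that pins those defaults explicitly in PyTorch: torch’s nn.Linear defaults to Kaiming-uniform weights with a non-zero bias init and Adam eps=1e-8, while tf.keras Dense defaults to Glorot/Xavier-uniform weights, zero bias, and Adam epsilon=1e-7.

    import torch
    import torch.nn as nn


    def make_linear_tf_style(in_features, out_features):
        """Linear layer initialized like tf.keras defaults: Xavier/Glorot uniform
        weights and zero bias (PyTorch's default is Kaiming uniform with a
        fan-in-scaled uniform bias)."""
        layer = nn.Linear(in_features, out_features)
        nn.init.xavier_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)
        return layer


    model = nn.Sequential(make_linear_tf_style(256, 64), nn.ReLU(),
                          make_linear_tf_style(64, 15))

    # Adam epsilon also differs by default: torch uses 1e-8, tf.keras uses 1e-7;
    # openai baselines' PPO sets epsilon=1e-5 explicitly rather than relying on either.
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, eps=1e-5)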

How to find subtle implementation details

About 2 months ago

A very fun (also annoying) part of deep learning is that subtle differences in implementations are very easy to miss. For example, there are differences in default values between torch and tensorflow, or between rllib and openai baselines … some of these differences actually helped boost performance significantly in Round 1.

Is there any good way to find these subtle details effectively, apart from thoroughly reading code and small unit tests?

About Ray trainer and workers

About 2 months ago

If I understood correctly, the RLlib trainer process waits while the worker processes collect the rollouts, but if we set num_workers to 0, the trainer process collects the rollouts itself. Is it possible for the trainer process to also collect rollouts along with the workers? I’m guessing that would increase throughput somewhat.
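For context, the two modes I mean correspond to these RLlib config keys (placeholder values, not my actual settings):

    # Sampling handled entirely by the trainer (driver) process:
    config_driver_only = {
        "num_workers": 0,           # trainer collects rollouts itself
        "num_envs_per_worker": 64,  # vectorized envs on the driver
    }

    # Sampling handled by separate rollout worker processes while the
    # trainer only learns; in this mode the driver sits idle during sampling:
    config_with_workers = {
        "num_workers": 4,
        "num_envs_per_worker": 16,
    }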

Running the evaluation worker during evaluations is now optional

About 2 months ago

Hello @joao_schapke

I’m a complete rllib noob; can you please share a code snippet or a link on how to output the custom metrics?

Min and Max rewards for an environment

2 months ago

So do we need to use the provided attributes, or can we compute our own? For example, for distributional RL we need the actual minimum instead of the given “blind observation training” minimum.

📢 [Announcement] Timesteps budget when using frameskip

2 months ago

I think it’s because the video renders only the frames output by the environment, and since a frameskip wrapper… skips frames :smiley: … it will look faster. It would probably be easier to just go and check the code, though… but I’m being lazy about it.
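For anyone unfamiliar, this is the generic gym frame-skip pattern I mean (a sketch, not the competition’s wrapper): each action is repeated skip times and only the last frame comes back, so a video built from the returned frames plays faster.

    import gym


    class FrameSkip(gym.Wrapper):
        """Repeat each action `skip` times; only the last observation is returned."""

        def __init__(self, env, skip=4):
            super().__init__(env)
            self.skip = skip

        def step(self, action):
            total_reward = 0.0
            obs, done, info = None, False, {}
            for _ in range(self.skip):
                obs, reward, done, info = self.env.step(action)
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info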

Submission issues

2 months ago

Hello @jyotish

Submission #78454 failed after all steps completed; is it due to a network issue?

Running the evaluation worker during evaluations is now optional

2 months ago

Hello @jyotish

I’m getting this error on my local machine; is it a ray version issue or something else? The installed version is ray[rllib]==0.8.5.

File "<...>/python3.7/site-packages/ray/tune/experiment.py", line 170, in from_json
    exp = cls(name, run_value, **spec)
TypeError: __init__() got an unexpected keyword argument 'disable_evaluation_worker'

📢 [Announcement] Timesteps budget when using frameskip

2 months ago

I was just thinking that the “fast” submission videos were using frameskip, was that the reason?

Submission issues

2 months ago

Hello @jyotish

Can you please clarify what happened in submission #77475? Three environments ran fine but a lot of the log data is missing, and coinrun failed immediately; the log file for coinrun is empty.

📢 [Announcement] Environment specific logic is not allowed

2 months ago

One more thing I’d like to point out is the file size limit set for scraping: 30 MB is actually quite high, given the Impala baseline is only ~7 MB even with the optimizer state, and without the optimizer it’s ~3 MB, I guess.

How do checkpoints work in rllib

2 months ago

Ok, I was confusing the get_weights function in TorchPolicy with get_state. Thanks for the clarification; I guess for a custom policy, extra training-related state needs to go in get_state then.

How do checkpoints work in rllib

2 months ago

I was going through the basic TorchPolicy code for rllib, and it isn’t clear to me whether the optimizer state and other training-related state are saved as part of the checkpoints. It seems PPO with Adam isn’t affected much by losing the optimizer state, but I’d like to know out of curiosity from people who are more familiar with rllib. I do think this may be relevant for custom policies, because the runs sometimes seem to stop midway and get resumed from the last checkpoint.
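For anyone else curious, this is roughly what I would expect a custom policy to need (a sketch only; whether TorchPolicy already stores optimizer state, and the exact attribute holding the torch optimizers, differ between rllib versions):

    import torch
    from ray.rllib.policy.torch_policy import TorchPolicy


    class StatefulTorchPolicy(TorchPolicy):
        """Sketch: include optimizer state in the checkpointed policy state."""

        def get_state(self):
            state = super().get_state()
            # `self._optimizers` is assumed here; the actual attribute that
            # holds the torch optimizers may differ between rllib versions.
            state["optimizer_state"] = [opt.state_dict() for opt in self._optimizers]
            return state

        def set_state(self, state):
            optimizer_state = state.pop("optimizer_state", None)
            super().set_state(state)
            if optimizer_state is not None:
                for opt, opt_state in zip(self._optimizers, optimizer_state):
                    opt.load_state_dict(opt_state)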

Submission issues

3 months ago

Hello @jyotish

On my local machine I have a conda environment with Python 3.7, and I only install the requirements.txt mentioned below with pip.

ray[rllib]==0.8.5
procgen==0.10.1
torch==1.3.1
torchvision==0.4.2
mlflow==1.8.0
boto3==1.13.10

I haven’t tried pip freeze / conda export … I will surely try it on the next submission. If you have any best-practice advice regarding that, please let me know.

Submission issues

3 months ago

Hello @jyotish

Sorry for bothering you about this issue so many times, but I’m certain #76348 and #76311 should not have hit OOM; local runs take up only 14.3 GB (including the evaluation worker). Just like #76311, in #76348 coinrun and miner hit OOM after exactly 14 iters while bigfish didn’t; then the pod got reset, the new machine resumed from 75 iters, and it hit OOM after 75 + 14 = 89 iters. Is the memory usage really so machine-dependent? Was something new introduced that is causing OOM errors after exactly 14 iters?

Submission issues

3 months ago

Hello @jyotish

Thanks for re-queuing #76265.

Can you please also help me with #76311? I believe if #76265 runs fine, so should #76311, but it failed with OOM after exactly 14 iters for coinrun, bigfish and miner, and after 24 for caterpillar. In the pod info GPU memory section there are 4 items for #76265 and 5 items for #76311. I think this is somehow related to the evaluation run, as the difference between these two shouldn’t cause OOM; if anything, #76311 should take slightly less memory.

Rllib custom env

3 months ago

This kind of points out a loophole though: the env config can be modified from a wrapper (for example, setting num_levels or changing other keys), which should be against competition rules. I think the organizers should explicitly mention this in the rules; otherwise we could probably use “paint_vel_info” like last round, though I don’t want to waste a submission to try it.

Rllib custom env

3 months ago

@tim_whitaker Copy the config instead of popping the value directly; Python implicitly passes the config dict by reference, so popping mutates it for everyone.
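A minimal illustration of what I mean (generic sketch; the key name and the print are only for demonstration):

    import copy


    def env_creator(env_config):
        """Sketch of an env factory that avoids mutating the shared config."""
        # Bad: popping directly mutates the dict RLlib passes in, so the key
        # disappears for every env created after the first one:
        #   num_levels = env_config.pop("num_levels")

        # Better: copy first, then modify the copy.
        config = copy.deepcopy(dict(env_config))
        num_levels = config.pop("num_levels", 0)  # hypothetical key for illustration
        print("creating env with num_levels =", num_levels, "remaining config:", config)
        return config  # in real code this would construct and return the env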

Submission issues

3 months ago

Hello @jyotish

Submission #76265 seems to be stuck at some initial stage; it has been on “evaluation initiated” for 8 hours but never started.

"No slot" submissions getting counted among 5 daily submissions

3 months ago

If I make a submission after the time mentioned by a “The participant has no submission slots remaining for today” message, it moves the time to the next submission’s time. I think it’s counting the failed no-slot submissions toward the daily 5-submission limit. I’m not sure if that is the intended behavior, but it kinda seems like a bug. :no_mouth:

How to install external libraries?

3 months ago

I merged the starter kit changes and submitted #75275, but it’s still failing at the build image stage. Is this one also due to network issues, or to the environment setup? I’ve just added scikit-image to requirements.txt; the rest of the environment-related files are the same as in the current starter kit repo.

How to install external libraries?

3 months ago

Thanks for the info regarding setting up custom environment.

I tried to make submission #74986 with docker_build set to true; I didn’t make any changes to the provided Dockerfile and only added scikit-image. The submission failed immediately and no error is mentioned. Did I need to make any changes to the Dockerfile, or was it something else?

How to install external libraries?

3 months ago

I want to use skimage in my code; how do I install it? I added scikit-image to the requirements.txt but it doesn’t seem to work.

Submission failed after training and rollouts completed

3 months ago

Organizers, can anyone please clarify why submission #74986 failed? I had a few runs with OOM issues that failed immediately, but this one completed fully and failed at the aggregating scores stage. I checked the logs, but since those are training logs they show no errors.


Do evaluation levels appear in the training phase?

3 months ago

Round 1 is on the complete distribution of levels; the held-out levels you’re describing will be in the next round. In that context, there are 2^32 levels for each game, which makes repeating levels unlikely, but training encounters many more levels than if the number of levels were explicitly limited to 200. This means it generalizes much better when trained on the full distribution of levels. Round 1 is all about sample efficiency: get a higher score on all 4 environments with the same algorithm.
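For concreteness, the two setups correspond to the num_levels argument in the standard procgen gym interface (a sketch, not the competition’s own env config):

    import gym

    # Round-1-style training on the full distribution of levels
    # (num_levels=0 means levels are sampled from all ~2^32 seeds).
    env_full = gym.make("procgen:procgen-coinrun-v0",
                        num_levels=0, start_level=0, distribution_mode="easy")

    # Generalization-style training restricted to 200 fixed levels;
    # held-out levels are then anything outside [start_level, start_level + 200).
    env_200 = gym.make("procgen:procgen-coinrun-v0",
                       num_levels=200, start_level=0, distribution_mode="easy")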

Is the private environment score on public leaderboard

3 months ago

I need a clarification on whether the score from the private environment is part of the public leaderboard score. On Kaggle it’s generally observed that giving away the private score on the public leaderboard results in too many submissions doing metric hacking by tuning hyperparameters.

Problems of using rllib for a research competition

4 months ago

Right now I’ve planned to test everything on my own machine and then write the final code on rllib after refining it. But of course it will be much harder to check properly on 16 environments in the final round. I only have a single machine; anyone without a local machine would be at an even higher disadvantage. Maybe one strategy is to team up with someone who is already an rllib expert.

Problems of using rllib for a research competition

4 months ago

I’ve read through FAQ: Implementing a custom random agent and used most of the concepts from it. I’ll still say rllib makes development quite slow from a research perspective. I still don’t see the value of using a distributed framework and all its overhead for a single-machine training system. I understand a lot of development must have gone in on the organizers’ side in terms of providing logs, tracking metrics, etc. using rllib, but it’s a big burden from a research perspective.

Problems of using rllib for a research competition

4 months ago

The following might be somewhat of a rant; my point of view is a research one, and I understand there may be other considerations to make.

The rllib constraint of the competition seems really problematic from a research point of view. The framework has a very steep learning curve and lots of constraints on memory management and algorithmic flow; it’s great for large-scale training, but unnecessary overhead for single-machine training.

I’m sure many others must have faced the following situation: you think of an idea and try it in the framework you know well, for example baselines or dopamine, then you spend much more time hacking it into rllib only to hit unnecessary memory management issues, even though the same method works on baselines under the same constraints. This really eats into research time. In my opinion, research like this should have more flexibility.

I understand no solution is perfect, but it would be better to be given an environment with a wrapper that limits the timesteps, plus a simple API and model filename to call for the final evaluation. The reasoning the organizers gave for rllib is easy future integration of the top solutions, but I doubt you could join RL algorithms so easily, for example an off-policy method with an on-policy method. Given this is a NeurIPS competition on a benchmark on which very little research has been done, I would imagine making the research easier would be an important aspect.

So I request the organizers to consider relaxing the rllib constraint.

Competition metric seems to favor sample efficiency over generalization gap

5 months ago

Thank you @Miffyli and @kcobbe for the insightful and helpful explanations. Looking forward to the competition.

Competition metric seems to favor sample efficiency over generalization gap

5 months ago

As far as I understand, the Procgen benchmark was made to address the generalization gap. If mean normalized reward is used as the metric over a small number of timesteps, is it not more likely to favor sample efficiency than the generalization gap? If current algorithms cannot achieve the maximum training reward in 8M timesteps, how do we disentangle sample efficiency and generalization gap?

I understand there must have been practical considerations in selecting the 8M timestep threshold. I would like to know the community’s and organizers’ thoughts on this.
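For reference, my understanding of the metric, based on the per-environment normalization in the Procgen benchmark paper (the clipping here is just a safeguard, not necessarily part of the official scoring):

    def mean_normalized_reward(raw_returns, r_min, r_max):
        """Average per-environment normalized return:
        (R - R_min) / (R_max - R_min), averaged over environments."""
        scores = []
        for env_name, r in raw_returns.items():
            score = (r - r_min[env_name]) / (r_max[env_name] - r_min[env_name])
            scores.append(min(max(score, 0.0), 1.0))  # clip to [0, 1] as a safeguard
        return sum(scores) / len(scores)


    # Example with made-up numbers:
    # mean_normalized_reward({"coinrun": 8.0}, {"coinrun": 5.0}, {"coinrun": 10.0}) -> 0.6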

Seismic Facies Identification Challenge

📝 Explained by the Community | Win 4 x DJI Mavic Drones

24 days ago

EDA on facies variation by geographic axis

Here’s my EDA notebook on how the seismic data varies by geographic axis, along with some ideas for training.

A peek at some of the stuff in the notebook

How the patterns look per label:

How the facies vary by z-axis


Splitting the data for training based on the EDA results

Do share your feedback. :blush: I’ll try to add training code soon.

The full colab notebook

Seismic-Facies-EDA - Colab

[Explainer] - EDA of Seismic data by geographic axis

24 days ago

EDA on variation by geographic axis

Here’s my EDA notebook on how the seismic data varies by geographic axis, along with some ideas for training.

A peek at some of the stuff in the notebook

How the patterns look per label:

How the facies vary by z-axis


Splitting the data for training based on the EDA results

Do share your feedback. :blush:

Seismic-Facies-EDA - Colab
