If I were an MDX organizer, I would set these rules:
Winners must provide the organizers with training scripts that use only the training part of MUSDB18(HQ) and early stopping. The winning submission must contain only models trained with these scripts, and training must be reproducible.
Winners owe nothing to anyone )
Is a participant obligated to use early stopping? Can a participant arbitrarily choose the number of epochs? If so, does a participant have to justify their choice?
I don’t have much experience with machine learning. Using the test part seems like a tempting idea to me. Compare these two ways:
Training on 86 tracks with validation on the other 14 and early stopping, then re-training on all 100 with the optimal number of epochs (or fine-tuning on the 14?)
Training on all 100 tracks with validation on the 50 test tracks and early stopping
The second way is faster and can probably lead to a slightly better model.
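The early-stopping part of both schemes can be sketched roughly like this. Everything here is illustrative: the `patience` value and the toy validation-loss curve stand in for real training, which the post does not specify.

```python
# Minimal sketch of early stopping on a held-out validation split.
# Real code would train one epoch and compute validation loss each
# iteration; here a toy loss curve (bottoming out at epoch 6, then
# overfitting) stands in for that.

def early_stop_training(max_epochs=50, patience=5):
    # Toy validation losses: decreasing, then slowly increasing.
    val_losses = [1.0 / (e + 1) + 0.02 * max(0, e - 6) for e in range(max_epochs)]

    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs -> stop
    return best_epoch

print(early_stop_training())  # -> 6 on this toy curve
```

The returned epoch count is what the first scheme would then reuse for re-training on all 100 tracks.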
Cheating scenario: a participant trains a model with validation on the test part, then removes the validation step from the training script, submits the model to Leaderboard A, and says the number of epochs was chosen based on intuition / experience / the leaderboard.
This is probably not a very important issue. I'm just sharing my thoughts.
I train a model on the training part of MUSDB18, but evaluate it on the test part after each epoch to select the optimal number of epochs. So the training script uses both parts of MUSDB18. Can I submit a model trained in this way to Leaderboard A?
My submission failed with message:
Submission failed : Unable to find a valid `aicrowd.json` file at the root of the repository.
But this file is present at the root and looks valid.
What happens if a participant does something like this in the submission code?
```python
import os
os.environ['INFERENCE_PER_MUSIC_TIMEOUT_SECONDS'] = '250'  # 240
```
Participants make dozens of submissions during the competition. Which ones will be re-evaluated on the new songs? Only those that are visible on the leaderboards?
Will this be the longest song in the entire test dataset (28 songs)?
It would be great if the organizers reproduced the training of the winning models from Leaderboard A at the end of the competition. Otherwise, participants can hide the use of extra data.
What is the maximum song duration in the full test dataset (28 songs)?
Dear organizers, could you tell us why MUSDB18 includes all tracks from DSD100, but not all tracks from MedleyDB and only 2 tracks from Native Instruments? Is there anything wrong with the rest of the tracks from these datasets?
Can I train a model on a non-free dataset, or use a model pretrained on one?
Should the winners provide the training code for their models at the end of the competition? Will the organizers reproduce the training? If so, with what hardware and time resources?
What resources (CPU, GPU, memory, internet) are available to the container during evaluation?
Can I train a model locally and unpickle it in the container?
Can I use external datasets?
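On the unpickling question above: a minimal sketch of the usual pattern, assuming the model object is pickle-serializable and the same library versions are installed locally and in the container. A plain dict stands in for a real model here.

```python
import pickle

# Hypothetical example: serialize a model trained locally, then load it
# back inside the evaluation container. Any pickle-serializable object
# (e.g. a scikit-learn estimator) follows the same pattern.

model = {"weights": [0.1, 0.2, 0.3], "epochs": 42}

# Locally: save the trained model next to the submission code.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# In the container: load it back at inference time.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # -> True
```

Note that pickle files are sensitive to library version mismatches, so pinning the same versions in the submission environment matters.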
My submission was evaluated successfully in debug mode. Then I set “debug: false” and got “submission failed”, because the function rdkit.Chem.MolFromSmiles returned None for some input strings. My rdkit version is 2020.09.2.
Could you please check SMILES in test data?
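Until the test data is fixed, a defensive pattern like the following avoids crashing on inputs the parser rejects. The `parse_smiles` stand-in below mimics `rdkit.Chem.MolFromSmiles` returning None on invalid input (so the sketch runs without RDKit installed); `predict_all` and the fallback label are hypothetical names, not the challenge API.

```python
# Sketch: skip (or fall back on) molecules the parser rejects instead of
# letting one bad SMILES string kill the whole evaluation run.

def parse_smiles(smiles):
    # Stand-in for rdkit.Chem.MolFromSmiles: returns None on invalid input.
    return smiles if smiles and smiles.strip("()") else None

def predict_all(smiles_list, fallback="unknown"):
    results = []
    for s in smiles_list:
        mol = parse_smiles(s)
        if mol is None:
            results.append(fallback)  # degrade gracefully on bad input
        else:
            results.append(f"prediction for {mol}")
    return results

print(predict_all(["CCO", "", "c1ccccc1"]))
# -> ['prediction for CCO', 'unknown', 'prediction for c1ccccc1']
```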
Can I make new submission before container of last submission stops?
I tried to predict `[['rose'], ...]` for each molecule in debug mode and got the error:
Submission Vocabulary contains Unknown smell words : .Are you sure you are using the correct vocabulary for this round ?
This forces me to predict at least one word in each sentence. Do all input molecules have smells from the round-3 vocabulary?
Could you make this data available from the container?