
Challenges Entered
A benchmark for image-based food recognition
Latest submissions
Using AI For Building's Energy Management
What data should you label to get the most value for your money?
Latest submissions
| Status | Submission ID |
|---|---|
| failed | 184202 |
| graded | 179185 |
| graded | 179000 |
Behavioral Representation Learning from Animal Poses.
Classify images of snake species from around the world
Data Purchasing Challenge 2022
Code for End of Competition Training pipelines
Over 3 years ago
Each of the 5 training pipelines will run with its own budget, right?
[Update] Round 2 of Data Purchasing Challenge is now live!
Over 3 years ago
Hi, it seems there's a bug in local_evaluation.py.
I think you should change

```python
time_available = COMPUTE_BUDGET - (time_started - time.time())
```

to

```python
time_available = COMPUTE_BUDGET - (time.time() - time_started)
```
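To illustrate why the order of subtraction matters, here is a minimal self-contained sketch (the `COMPUTE_BUDGET` value is made up for the example):

```python
import time

COMPUTE_BUDGET = 3600.0  # hypothetical budget in seconds

time_started = time.time()
time.sleep(0.01)  # stand-in for actual training work

# Buggy: (time_started - time.time()) is negative, so the "remaining"
# time is reported as MORE than the whole budget.
buggy_available = COMPUTE_BUDGET - (time_started - time.time())

# Fixed: subtract the elapsed time from the budget.
time_available = COMPUTE_BUDGET - (time.time() - time_started)
```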
0.9+ Baseline Solution for Part 1 of Challenge
Almost 4 years ago
Thanks for publishing your solution!
Do you know how much of an accuracy boost "pseudolabel remaining dataset" gives?
I didn't use it.
Experiments with "unlabelled" data
Almost 4 years ago
I've checked it locally.
Using all 10K images is better than my 3K selection by 0.006. Maybe I can recover some of that gap by changing the purchasing algorithm, but I still feel I need to tune my model.
Experiments with "unlabelled" data
Almost 4 years ago
I wrote the scores from the leaderboard; I can't check 10K there.
Local scores are a little higher than the LB, but correlated with it.
Yeah, maybe I'll check it locally.
Experiments with "unlabelled" data
Almost 4 years ago
Here are my results. I used the same model but different purchase modes.
- Train with the initial 5000 images only: LB 0.869
- Add 3000 random images from the unlabelled dataset: 0.881
- "Smart" purchasing (at least non-random): 0.888
So "smart" purchasing helps, but not by much, maybe ~0.01.
Probably tuning models would be more helpful for pushing further.
First round doesn't matter?
Almost 4 years ago
If I understood correctly, the first round matters little and is preliminary, and the second round is decisive, right?
Size of Datasets
Almost 4 years ago
Ahh… I see, so AIcrowd runs the whole pipeline twice, and I can only see logs from the debug version.
Great, thanks!
Size of Datasets
Almost 4 years ago
Hello!
During submission, the dataset sizes are only 100 (for both the training dataset and the unlabelled dataset).
Probably it is the debug version.
Is that intentional?
Potential loop hole in purchasing phase
Almost 4 years ago
I think the local evaluation can be modified somehow.
Maybe in the ZEWDPCProtectedDataset class, so that it doesn't give you the label in a sample.
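A sketch of that idea, assuming nothing about the real ZEWDPCProtectedDataset API (the dict-style samples and the "label" key here are illustrative): wrap the dataset so the purchasing phase never sees labels.

```python
class LabelHidingDataset:
    """Wraps any indexable dataset and strips the label from each sample.

    Illustrative only: the real ZEWDPCProtectedDataset in the starter kit
    may store samples differently.
    """

    def __init__(self, base_dataset):
        self._base = base_dataset

    def __len__(self):
        return len(self._base)

    def __getitem__(self, idx):
        sample = dict(self._base[idx])  # shallow copy, original untouched
        sample.pop("label", None)       # hide the label during purchasing
        return sample


# Usage with a toy in-memory dataset.
raw = [{"image": f"img_{i}.png", "label": i % 6} for i in range(4)]
hidden = LabelHidingDataset(raw)
```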
Allowance of Pre-trained Model
Almost 4 years ago
Sorry, what's the right way to use a pre-trained model?
I've tried `models.resnet18(pretrained=True)` but it failed with
`urllib.error.URLError: <urlopen error [Errno 99] Cannot assign requested address>`
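That URLError usually means the evaluation environment blocks outbound network access, so the weights cannot be downloaded at runtime. A common workaround (a sketch, not the official submission procedure) is to download the checkpoint in advance, ship it with the repository, and load it from disk; shown here with a tiny stand-in module instead of resnet18:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Offline, ahead of time: save the pretrained weights to a file that is
# committed alongside the submission. Here a tiny Linear stands in for
# models.resnet18; the pattern is identical.
pretrained = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "weights.pth")
torch.save(pretrained.state_dict(), ckpt_path)

# Inside the (network-less) evaluator: build the architecture WITHOUT
# downloading anything, then load the local checkpoint.
model = nn.Linear(4, 2)  # e.g. models.resnet18(pretrained=False)
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
```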
Share your solutions!
Over 3 years ago
Hello, I want to share my solution.
The competition was very interesting and unusual. It was my first competition on the AIcrowd platform, and the guides/pages/discussions were very helpful, so thanks to the organizers!
Actually my solution is very similar to xiaozhou_wang's.
I have two strategies. The first is based on the idea of collecting samples with "hard" classes (it carried over from Round 1). Suppose we have a trained model and we know the F1-measure for all six classes from validation. We sum the class predictions with weights equal to 1 - f1_validation, then choose the samples with the maximum weighted sum.
The second strategy is to collect samples with higher uncertainty. I consider a prediction of 0.5 the most uncertain, so I sum the absolute values of (prediction - 0.5) over all classes and pick the samples with the smallest sums.
I also considered a third strategy from the hosts, "match labels to target distribution", but it was worse than without it. P.S. to the organizers: this code is still in my solution since I experimented with it, but it purchases very few samples and I think it doesn't affect the score.
I tried several ratios between the two strategies but didn't see an obvious advantage for any of them, so in the end I used both strategies with equal budgets.
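The two scoring rules above can be sketched as follows; `probs` (per-class sigmoid outputs over the unlabelled pool) and `f1_val` (per-class validation F1) are assumed names, and the random values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random((1000, 6))                             # placeholder predictions
f1_val = np.array([0.95, 0.90, 0.70, 0.85, 0.60, 0.88])   # placeholder F1 scores

# Strategy 1: weight class predictions by (1 - validation F1), so samples
# likely to contain "hard" classes score highest.
hard_class_score = probs @ (1.0 - f1_val)

# Strategy 2: 0.5 is the most uncertain prediction, so a SMALL sum of
# |p - 0.5| over classes means HIGH uncertainty.
certainty = np.abs(probs - 0.5).sum(axis=1)

# Split the purchase budget equally between the two strategies.
budget = 300
pick_hard = np.argsort(-hard_class_score)[: budget // 2]  # highest scores
pick_uncertain = np.argsort(certainty)[: budget // 2]     # lowest sums
purchase = np.union1d(pick_hard, pick_uncertain)
```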
I saw the idea of "active learning" in one of the papers and decided to make several iterations (let's say, L).
The problem was choosing the number of iterations L. My way is not as clever as xiaozhou_wang's. I noticed that ~300 samples are enough for one iteration; moreover, in my experiments more iterations sometimes worsened the result. I looked at the submissions table to estimate training time and inference time. So I came to the formula (I have a Pretraining Phase, so the first iteration doesn't need training)
For training I used efficientnet_b3, 5 epochs with
and the following augmentations