
xiaozhou_wang
Challenges Entered
Using AI For Building's Energy Management
Latest submissions
What data should you label to get the most value for your money?
Latest submissions
graded | 179187
graded | 179186
graded | 179182
Latest submissions
graded | 189718
graded | 189313
graded | 189283
5 Puzzles 21 Days. Can you solve it all?
Latest submissions
The first, open autonomous racing challenge.
Latest submissions
xiaozhou_wang has not joined any teams yet...
Data Purchasing Challenge 2022

[Announcement] Leaderboard Winners
10 months ago
Thanks Camaro! Would love to share my approach. I'm just not sure what the usual way of sharing solutions at aicrowd is (e.g. do we just write a post, do we make our code public, or is there some other place where they ask us to put everything?)

Which submission is used for private LB scoring?
11 months ago
I couldn't find this stated anywhere. Which of the following statements is true?
- all submissions to round 2 will be evaluated on private LB and the best score is picked automatically for each participant.
- only the best submission on public LB of each participant will be selected for private LB scoring
- each participant needs to specify which submission to use for private LB scoring.
Thank you in advance for clarification!

xiaozhou_wang has not provided any information yet.
Share your solutions!
9 months ago
First of all, I would like to say a huge thank you to aicrowd for this unique and super fun challenge! Also, congrats to the other winners! I consider myself very lucky to have landed first place and would love to share my solution and learnings here!
My solution is very simple and straightforward. It is basically "iteratively purchase the next batch of data with the best possible model until the purchase budget runs out". One of the biggest challenges in this competition, imho, is that you cannot get very reliable performance scores locally or on the public leaderboard. So if we can filter more noise out of the weak signal in the scores, the chance of overfitting may be much lower. During my experiments I focused on simple strategies, mainly because more complex strategies require more tuning, which means more decisions to make and a higher risk of overfitting (since every time we make a decision, we refer to the same local and public scores, over and over again).
OK, enough hypothesis and high-level talk! Here are the details (code):
Most importantly, the purchase strategy:
Basically, select the most uncertain samples, ranked by the entropy of the model's predictions.
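A minimal sketch of entropy-based selection, not the original code. It assumes the model outputs one independent probability per label (multi-label sigmoid outputs), so a sample's uncertainty is taken as the sum of the binary entropies of its labels. The names `binary_entropy`, `select_most_uncertain`, and the `img_*` ids are hypothetical.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a single Bernoulli probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_most_uncertain(probs_by_id, n_to_buy):
    """Return the ids of the n_to_buy samples with the highest total entropy.

    probs_by_id maps a sample id to a list of per-label probabilities
    (assumed format: one sigmoid output per label).
    """
    scores = {
        sample_id: sum(binary_entropy(p) for p in probs)
        for sample_id, probs in probs_by_id.items()
    }
    # Highest entropy first; ties are broken by id so the order is stable.
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return ranked[:n_to_buy]


# Toy predictions for three unlabelled images and two labels.
predictions = {
    "img_a": [0.99, 0.01],  # confident on both labels
    "img_b": [0.50, 0.45],  # very uncertain on both labels
    "img_c": [0.90, 0.60],  # somewhere in between
}
chosen = select_most_uncertain(predictions, n_to_buy=2)
print(chosen)
```

With these toy numbers the two least confident images, `img_b` and `img_c`, are picked and the confident `img_a` is left unlabelled.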
And for each iteration, the number of labels to purchase is decided on the fly given the compute and purchase budgets:
Basically, we check whether we can purchase 300 images per iteration and still exhaust the purchase budget before we run out of time. If not, we increase it to 350 images (so fewer iterations) and see if that works, then to 400 images, and so on. We redo this at every iteration and only use the first element of the purchase list generated by the strategy. That is, we may have decided to purchase 300 images per iteration in the last round and then increase that to 400 images in this one. This is mainly because we couldn't accurately estimate how long the next iteration's model would take to train, so we re-estimate each time whether we can still finish in time. In fact, I used a moving average (with an extra 0.1 time buffer) to estimate how long the next iteration would take to train.
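The planning step can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: the starting size of 300, the step of 50, and the 0.1 buffer come from the description, while the function names, the moving-average window of 3, and the time units are hypothetical.

```python
import math


def estimate_iteration_time(past_times, window=3, buffer=0.1):
    """Moving average of the most recent iteration times plus a safety buffer."""
    recent = past_times[-window:]  # window size is an assumption
    return (sum(recent) / len(recent)) * (1.0 + buffer)


def plan_purchases(budget_left, time_left, past_times, start=300, step=50):
    """Return a list of batch sizes that spends budget_left within time_left.

    Tries `start` images per iteration first, then grows the batch by `step`
    (so fewer iterations) until the estimated total time fits.
    """
    if budget_left <= 0:
        return []
    per_iter = estimate_iteration_time(past_times)
    batch = start
    while True:
        batch = min(batch, budget_left)
        n_iters = math.ceil(budget_left / batch)
        # Stop when the plan fits, or when everything is bought in one go.
        if n_iters * per_iter <= time_left or batch == budget_left:
            break
        batch += step
    plan = [batch] * (n_iters - 1)
    plan.append(budget_left - batch * (n_iters - 1))  # last batch takes the rest
    return plan


# 1500 labels left, 4000 s left, the last two iterations took 1000 s each.
plan = plan_purchases(budget_left=1500, time_left=4000, past_times=[1000, 1000])
next_batch = plan[0]  # only the first element is used; the plan is redone next round
print(plan, next_batch)
```

In this toy case 300, 350, 400, and 450 images per iteration all need more time than is left, so the plan settles on three iterations of 500. Recomputing the plan every round lets the batch size adapt as the time estimate changes.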
Now within each iteration, we need to train the model.
The model I ended up with is basically the most complex model that is still reasonable to train.
As with any other computer vision problem, data augmentation is key:
I also tried test-time augmentation, but I found that prediction takes too much time, so it might not be worth it.
And the optimizer and training scheduler:
NEPOCH was a tuning parameter; I tried 5, 7, 10, and 15. 5 or 7 didn't seem to be enough, 15 seemed to be a bit too much, and 10 seemed to be pretty good.
So basically, the flow works like this:
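A rough sketch of such a loop, not the original code. The `train`, `score`, and `plan_batch` callables are hypothetical placeholders for the real model training, uncertainty scoring, and batch-size planning, and the final retraining step after the last purchase is an assumption.

```python
def run_purchase_loop(labelled, unlabelled, budget, train, score, plan_batch):
    """Train, score the unlabelled pool, buy the top-scoring batch, repeat.

    labelled:   dict id -> label for the data already owned
    unlabelled: dict id -> true label, standing in for the purchase API
    train:      callable(labelled) -> model
    score:      callable(model, id) -> uncertainty (higher = buy first)
    plan_batch: callable(budget_left) -> number of labels to buy this round
    """
    labelled, unlabelled = dict(labelled), dict(unlabelled)
    while budget > 0 and unlabelled:
        model = train(labelled)  # best possible model on the data so far
        n = min(plan_batch(budget), budget, len(unlabelled))
        if n <= 0:
            break
        ranked = sorted(unlabelled, key=lambda i: (-score(model, i), i))
        for sample_id in ranked[:n]:
            labelled[sample_id] = unlabelled.pop(sample_id)  # "purchase" the label
        budget -= n
    final_model = train(labelled)  # assumption: one last fit on everything bought
    return final_model, labelled


# Toy stand-ins: the "model" is the number of labels owned, and the
# uncertainty of a sample is simply its numeric id.
model, owned = run_purchase_loop(
    labelled={0: "a"},
    unlabelled={1: "b", 2: "c", 3: "d", 4: "e"},
    budget=3,
    train=lambda data: len(data),
    score=lambda model, sample_id: float(sample_id),
    plan_batch=lambda budget_left: 2,
)
print(model, sorted(owned))
```

In the toy run the loop buys samples 4 and 3 in the first round and sample 2 in the second, then stops because the budget of 3 is spent.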
Hopefully this is helpful! Please ask if you have any questions!