#### Challenges Entered

Using AI For Building's Energy Management

What data should you label to get the most value for your money?

#### Latest submissions

failed | 184202 |
graded | 179185 |
graded | 179000 |

A benchmark for image-based food recognition

Behavioral Representation Learning from Animal Poses.

Classify images of snake species from around the world

#### Data Purchasing Challenge 2022

### Code for End of Competition Training pipelines

10 months ago

Each of the 5 training pipelines will get its own budget, right?

### [Update] Round 2 of Data Purchasing Challenge is now live!

11 months ago

Hi, it seems there's a bug in local_evaluation.py.

I think you should change

time_available = COMPUTE_BUDGET - (time_started - time.time())

to

time_available = COMPUTE_BUDGET - **(time.time() - time_started)**
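To see why the sign matters, here is a minimal sketch comparing the two expressions. The function names, the `now` parameter and the 60-second budget are assumptions made for illustration; they are not taken from local_evaluation.py.

```python
import time

# Illustrative value only; not the challenge's real compute budget.
COMPUTE_BUDGET = 60.0  # seconds


def time_available(time_started, now=None, budget=COMPUTE_BUDGET):
    """Seconds left in the budget (the corrected expression)."""
    if now is None:
        now = time.time()
    elapsed = now - time_started  # positive once time has passed
    return budget - elapsed


def time_available_buggy(time_started, now, budget=COMPUTE_BUDGET):
    """The original expression: elapsed time has the wrong sign."""
    elapsed = time_started - now  # negative, so the budget appears to grow
    return budget - elapsed


start = 1000.0
print(time_available(start, now=1010.0))        # 10 s used, 50.0 left
print(time_available_buggy(start, now=1010.0))  # reports 70.0 instead
```

With the original expression the reported time available grows as the run goes on, so a time limit based on it would never trigger.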

### 0.9+ Baseline Solution for Part 1 of Challenge

11 months ago

Thanks for publishing your solution!

Do you know how much "pseudolabel remaining dataset" gives in terms of accuracy? (a boost)

I didn't use it.

### Experiments with "unlabelled" data

12 months ago

I've checked it locally.

Using all 10K images is better than my chosen 3K by 0.006. Maybe I can recover some of that gap by changing the purchasing algorithm. But I still feel I need to tune my model.

### Experiments with "unlabelled" data

12 months ago

I wrote scores from the leaderboard. I can't check 10K there…

Local scores are a little bit higher than LB, but correlated with LB.

Yeah, maybe I'll check it locally.

### Experiments with "unlabelled" data

12 months ago

Here are just my results. I used the same model, but different purchase modes.

- Train with initial 5000 images only: LB 0.869
- Add 3000 random images from unlabelled dataset: 0.881
- "smart" purchasing (at least non-random): 0.888

So we see that some "smart" purchasing is helpful, but not by much, maybe ~0.01.

Probably tuning models would be more helpful to push further.

### First round doesn't matter?

12 months ago

If I understood correctly, the first round matters little and is only preliminary. The second round is decisive, right?

### Size of Datasets

12 months ago

Ahh… I see, so AIcrowd runs the whole pipeline twice, and I can see logs only from the debug version.

Great, thanks!

### Size of Datasets

12 months ago

Hello!

During submission, the datasets contain only 100 samples each (both the training dataset and the unlabelled dataset).

Probably it is the debug version.

Is this intentional?

### Potential loop hole in purchasing phase

12 months ago

I think local evaluation can be modified somehow.

Maybe in the ZEWDPCProtectedDataset class, so that it doesn't give you the label in a sample.
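A rough sketch of that idea, assuming a wrapper that hands out samples without labels until they are purchased. `LabelHidingDataset` and its methods are hypothetical names for illustration; this is not the starter kit's actual ZEWDPCProtectedDataset implementation.

```python
class LabelHidingDataset:
    """Hypothetical wrapper: a sample exposes its label only after purchase."""

    def __init__(self, images, labels, budget):
        self._images = list(images)
        self._labels = list(labels)
        self._budget = budget      # number of labels that may still be bought
        self._purchased = set()    # indices whose labels were already bought

    def __len__(self):
        return len(self._images)

    def __getitem__(self, idx):
        # The label stays hidden (None) unless it has been purchased.
        label = self._labels[idx] if idx in self._purchased else None
        return {"idx": idx, "image": self._images[idx], "label": label}

    def purchase_label(self, idx):
        # Buying the same label twice costs nothing extra.
        if idx not in self._purchased:
            if self._budget <= 0:
                raise RuntimeError("purchase budget exhausted")
            self._budget -= 1
            self._purchased.add(idx)
        return self._labels[idx]


ds = LabelHidingDataset(images=["img0", "img1"], labels=[0, 1], budget=1)
print(ds[0]["label"])        # hidden before purchase
print(ds.purchase_label(0))  # revealed and counted against the budget
```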

### Allowance of Pre-trained Model

12 months ago

Sorry, what's the right way to use a pre-trained model?

I've tried "models.resnet18(pretrained=True)" but it failed with

urllib.error.URLError: <urlopen error [Errno 99] Cannot assign requested address>

## Share your solutions!

9 months ago

Hello, I want to share my solution.

The competition was very interesting and unusual. It was my first competition on the AIcrowd platform, and the guides, pages and discussions were very helpful for me. So thanks to the organizers!!!

Actually my solution is very similar to xiaozhou_wang's.

I have two strategies. The first strategy is based on the idea of collecting samples with "hard" classes (it came from Round 1). Suppose we have a trained model and we know the F1-measure for all six classes from validation. Let us sum the class predictions with weights equal to 1 - f1_validation, and then choose the samples with the maximum weighted predictions.
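A minimal sketch of this first strategy in plain Python. The function names and the toy numbers are assumptions for illustration (three classes instead of six, made-up F1 values); this is not the author's actual code.

```python
def hard_class_scores(predictions, f1_validation):
    """Weight each class prediction by (1 - F1) and sum per sample."""
    weights = [1.0 - f1 for f1 in f1_validation]
    return [sum(w * p for w, p in zip(weights, row)) for row in predictions]


def top_k(scores, k):
    """Indices of the k samples with the largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: class 1 has the lowest validation F1, so it is the "hardest".
f1_validation = [0.9, 0.5, 0.8]
predictions = [
    [0.9, 0.1, 0.1],  # mostly the easy class 0
    [0.1, 0.9, 0.1],  # mostly the hard class 1
    [0.1, 0.1, 0.9],  # mostly class 2
]

scores = hard_class_scores(predictions, f1_validation)
print(top_k(scores, 2))  # the sample dominated by the hard class ranks first
```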

The second strategy is to collect samples with higher uncertainty. I consider a prediction of 0.5 to be the most uncertain, so I just sum the absolute value of (0.5 − prediction) over all classes.
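A small sketch of this uncertainty score. The post does not state the selection direction, so taking the samples with the smallest summed distance from 0.5 is an assumption here, as are the function names.

```python
def uncertainty_distance(predictions):
    """Sum of |0.5 - p| over classes; smaller means closer to 0.5."""
    return [sum(abs(0.5 - p) for p in row) for row in predictions]


def most_uncertain(predictions, k):
    """Indices of the k samples whose predictions sit closest to 0.5."""
    distance = uncertainty_distance(predictions)
    order = sorted(range(len(distance)), key=lambda i: distance[i])
    return order[:k]


predictions = [
    [0.5, 0.5],  # maximally uncertain
    [0.9, 0.1],  # confident
    [0.6, 0.4],  # somewhat uncertain
]
print(most_uncertain(predictions, 2))
```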

I also considered the third strategy from the hosts, "match labels to target distribution", but results were worse with it than without it. P.S. to the organizers: this code is still in my solution because I experimented with it, but it selects very few samples and I don't think it affects the score.

I tried several ratios between the first two strategies, but I didn't see an obvious advantage for either of them. So in the end I used both strategies with equal budgets.
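One way to split a purchase budget equally between two rankings can be sketched as follows. How overlapping picks were actually handled is not described in the post, so the de-duplication below and the name `split_budget` are assumptions.

```python
def split_budget(ranking_a, ranking_b, budget):
    """Take half the budget from ranking_a, fill the rest from ranking_b.

    Each ranking is a list of sample indices, best first. Samples already
    chosen from ranking_a are skipped when reading ranking_b.
    """
    half = budget // 2
    chosen = list(ranking_a[:half])
    taken = set(chosen)
    for idx in ranking_b:
        if len(chosen) >= budget:
            break
        if idx not in taken:
            chosen.append(idx)
            taken.add(idx)
    return chosen


hard_class_ranking = [3, 1, 4, 0, 2]   # e.g. from the "hard classes" score
uncertainty_ranking = [1, 3, 2, 0, 4]  # e.g. from the uncertainty score
print(split_budget(hard_class_ranking, uncertainty_ranking, budget=4))
```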

I saw the idea of "Active Learning" in one of the papers and decided to make several iterations (let's say, L).

The problem was how to calculate the number of iterations L. My way is not as clever as xiaozhou_wang's. I noticed that ~300 samples are enough for one iteration. What is more, in my experiments more iterations sometimes worsened the result. I looked at the submissions table to estimate training time and inference time. So I came to the formula (I have a Pretraining Phase, so the first iteration doesn't need training)

For training I used

efficientnet_b3, 5 epochs with … and the following augmentations