Hello, I want to share my solution.
The competition was very interesting and unusual. It was my first competition on the AIcrowd platform, and the guides/pages/discussions were very helpful for me. So thanks to the organizers!!!
Actually my solution is very similar to xiaozhou_wang’s.
I have two strategies. The first strategy is based on the idea of collecting samples with "hard" classes (it carried over from Round 1). Suppose we have a trained model and we know the F1-measure for all six classes from validation. We sum the class predictions with weights equal to 1 - f1_validation, and then choose the samples with the maximum weighted prediction.
```python
def choose_unlabelled_by_sum_probs(self, unlabelled_indices, unlabelled_preds, choose_size):
    assert len(unlabelled_indices) == len(unlabelled_preds)
    if len(unlabelled_indices) <= choose_size:
        return unlabelled_indices
    # per-class F1 scores from validation at the best thresholds
    _, best_f1s = self.best_states['best_thrs_0']
    # weight each class prediction by (1 - validation F1), so "hard" classes count more
    choose_scores = unlabelled_preds[:, 0] * (1 - best_f1s[0])
    for x in range(1, n_classes):
        choose_scores += unlabelled_preds[:, x] * (1 - best_f1s[x])
    # pick the samples with the highest weighted scores
    sorted_indices = np.argsort(-choose_scores)
    return [unlabelled_indices[x] for x in sorted_indices[:choose_size]]
```
The second strategy is to collect samples with higher uncertainty. I consider a prediction of 0.5 the most uncertain, so I sum 0.5 minus the absolute distance of each prediction from 0.5 over all classes.
```python
def choose_unlabelled_by_uncertainty(self, unlabelled_indices, unlabelled_preds, choose_size):
    assert len(unlabelled_indices) == len(unlabelled_preds)
    if len(unlabelled_indices) <= choose_size:
        return unlabelled_indices
    # per-class uncertainty: 0.5 - |p - 0.5|, maximal when p == 0.5
    choose_scores = np.sum(0.5 - np.abs(unlabelled_preds - 0.5), axis=1)
    # pick the samples with the highest total uncertainty
    sorted_indices = np.argsort(-choose_scores)
    return [unlabelled_indices[x] for x in sorted_indices[:choose_size]]
```
I also considered the third strategy from the hosts, "match labels to target distribution", but it was worse than without it. P.S. to the organizers: this code is still in my solution since I experimented with it, but it takes very few samples, and I think it doesn't affect the score.
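For reference, the "match labels to target distribution" idea could be sketched roughly like this (this is only my illustration, not the hosts' code; the `target_dist` argument and the greedy selection are assumptions):

```python
import numpy as np

def choose_unlabelled_by_target_dist(unlabelled_indices, unlabelled_preds, choose_size, target_dist):
    """Greedily pick samples whose predicted labels move the purchased
    label distribution towards target_dist (illustrative sketch only)."""
    hard_preds = (unlabelled_preds > 0.5).astype(float)   # predicted labels per class
    counts = np.zeros(unlabelled_preds.shape[1])          # labels purchased so far
    chosen = []
    remaining = list(range(len(unlabelled_indices)))
    for _ in range(min(choose_size, len(remaining))):
        deficit = target_dist * (len(chosen) + 1) - counts  # classes we are still short of
        # score each candidate by how much it covers the current deficit
        scores = hard_preds[remaining] @ np.maximum(deficit, 0)
        best = remaining.pop(int(np.argmax(scores)))
        counts += hard_preds[best]
        chosen.append(unlabelled_indices[best])
    return chosen
```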
I tried several ratios of the two strategies, but I didn't see an obvious advantage for either of them. So finally I used both strategies with an equal budget.
I saw the idea of "Active Learning" in one of the papers and decided to make several iterations (let's say, L):
- Train a model on the currently known samples
- Take ~purchase_budget // L samples with the two strategies (the last batch can be bigger by 1).
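A very rough sketch of this loop could look like the following (train_model, predict, chooser, and purchase_labels are placeholder names I made up for illustration, not the actual competition API):

```python
def active_learning_loop(labelled_ds, unlabelled_ds, purchase_budget, n_loops,
                         train_model, predict, chooser, purchase_labels):
    """Illustrative sketch: train, score the unlabelled pool, buy a slice
    of the budget, repeat. All callables are placeholders."""
    remaining = purchase_budget
    for it in range(n_loops):
        model = train_model(labelled_ds)                    # retrain on everything labelled so far
        batch = remaining // (n_loops - it)                 # spread budget over remaining iterations
        indices = list(range(len(unlabelled_ds)))
        preds = predict(model, unlabelled_ds)               # probabilities, shape (N, n_classes)
        picked = chooser(indices, preds, batch)             # e.g. one of the two strategies above
        labelled_ds, unlabelled_ds = purchase_labels(labelled_ds, unlabelled_ds, picked)
        remaining -= len(picked)
    return labelled_ds
```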
The problem was how to calculate the number of iterations L. My approach is not as clever as xiaozhou_wang's. I noticed that ~300 samples are enough for one iteration; moreover, in my experiments more iterations sometimes worsened the result. I looked at the submissions table to estimate training time and inference time, and came to the following formula (I have a Pretraining Phase, so the first iteration doesn't need training):
```python
max_choose_size = min(len(unlabelled_dataset), purchase_budget)
n_loops = max(1, min(1 + (compute_budget - 50) // 220,
                     int_ceil(max_choose_size, 290)))
```
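Here int_ceil is just ceiling division; its definition isn't shown above, but a minimal version would be:

```python
def int_ceil(a, b):
    # ceiling of a / b using integer arithmetic
    return (a + b - 1) // b
```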
For training I used efficientnet_b3, 5 epochs with
```python
CosineAnnealingLR(optimizer, T_max=5, eta_min=1e-5)
```
and the following augmentations
```python
return A.Compose([
    A.OneOf([A.GaussianBlur(), A.MotionBlur()], p=0.5),
    A.ToGray(p=0.01),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
])
```
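For completeness, the model/optimizer/scheduler setup could be wired up roughly like this (only efficientnet_b3, 5 epochs, and the CosineAnnealingLR settings come from the post; the use of timm, the optimizer, and the learning rate are my assumptions):

```python
import timm
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# efficientnet_b3 backbone with a 6-class multi-label head (timm is an assumption)
model = timm.create_model('efficientnet_b3', pretrained=False, num_classes=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)        # optimizer/lr are assumptions
scheduler = CosineAnnealingLR(optimizer, T_max=5, eta_min=1e-5)  # as in the post

for epoch in range(5):
    # ... one epoch of training with a BCE-with-logits loss ...
    scheduler.step()
```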
Each of the 5 training pipelines will go with its own budget, right?
Hi, it seems there's a bug in local_evaluation.py.
I think you should change

```python
time_available = COMPUTE_BUDGET - (time_started - time.time())
```

to

```python
time_available = COMPUTE_BUDGET - (time.time() - time_started)
```
Thanks for publishing your solution!
Do you know how much of a boost "pseudolabel remaining dataset" gives in terms of accuracy?
I didn’t use it.
I’ve checked it locally.
Using all 10K images is better than my 3K chosen ones by 0.006. Maybe I can recover some of that by changing the purchasing algorithm. But still, I feel I need to tune my model.
The scores I wrote are from the leaderboard. I can't check 10K there…
Local scores are a little bit higher than LB, but correlated with LB.
Yeah maybe I’ll check it locally.
Here are just my results. I used the same model, but different purchase modes.
- Train with initial 5000 images only: LB 0.869
- Add 3000 random images from unlabelled dataset: 0.881
- "smart" purchasing (at least non-random): 0.888
So we see that using some "smart" purchasing is helpful, but not by much, maybe ~0.01.
Probably tuning models would be more helpful to push further.
If I understood correctly, the first round matters little and is preliminary. The second round is decisive, right?
Ahh… I see, so AIcrowd runs the whole pipeline twice, and I can see logs only from the debug version.
During submission, the dataset sizes are only 100 (both the training dataset and the unlabelled dataset).
Probably it is the debug version.
Is it intentional?
I think local evaluation can be modified somehow.
Maybe in the ZEWDPCProtectedDataset class, so that it doesn't give you the label in a sample.
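For example, something along these lines could work (a rough sketch only; I'm assuming the underlying dataset returns a dict with a 'label' key, which may not match the actual class):

```python
from torch.utils.data import Dataset

class HideLabelsDataset(Dataset):
    """Wraps a dataset and strips the label from each sample, so local
    purchasing code cannot accidentally peek at it."""
    def __init__(self, base_dataset):
        self.base_dataset = base_dataset

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, idx):
        sample = dict(self.base_dataset[idx])
        sample.pop('label', None)   # drop the label if present (key name is an assumption)
        return sample
```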
Sorry, what's the right way to use a pre-trained model?
I've tried `models.resnet18(pretrained=True)`, but it failed with
urllib.error.URLError: <urlopen error [Errno 99] Cannot assign requested address>
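If this is caused by the evaluation environment having no internet access (my assumption, not confirmed here), a common workaround is to bundle the weights with the submission and load them from a local file, roughly:

```python
import torch
from torchvision import models

# Build the architecture without downloading weights, then load a checkpoint
# that was downloaded beforehand and shipped with the submission.
model = models.resnet18(pretrained=False)
state_dict = torch.load('models/resnet18.pth', map_location='cpu')  # hypothetical bundled path
model.load_state_dict(state_dict)
```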