#### Data Purchasing Challenge 2022

# Representation Learning

Representation Learning: A Stepping Stone for an Improved Data Label Purchase aka Active Learning

In this notebook, I will introduce Representation Learning (RL) and show how RL can be utilised for data label purchase, aka Active Learning (AL).

# Representation Learning (or Feature Engineering): A Stepping Stone for an Improved Active Learning

Representation Learning (RL) is concerned with discovering hidden patterns in the data and encoding/compressing this information into a feature vector. In the context of Neural Networks (NN), RL can also be viewed as automated Feature Engineering (FE): it removes the need for manual FE, so you end up with a system that learns end-to-end.

## Data Purchasing Challenge

As I already mentioned in my previous notebook (explainable AI), the key point in this challenge is to learn a good representation of the given training images. The literature offers a range of approaches, but the difficulty lies in the small number of training images and the given time constraint. Therefore, the choice of loss function for RL is of paramount importance, especially when little data is available. I will introduce RL, including possible loss functions. Of course, RL is only half of the battle; the other important part is what to do with the learned feature vectors. Here, I will make a few suggestions.

### Load Dependencies

```
import math
import os
import random
import time
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms
from evaluator.dataset import ZEWDPCBaseDataset
from evaluator.evaluation_metrics import get_zew_dpc_metrics
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
```

### Helper Functions

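As an exemplary RL approach, I will introduce CosFace, also known as the Large Margin Cosine Loss (LMCL). It is defined as follows:

$$
L_{lmc} = \frac{1}{N}\sum_{i=1}^{N} -\log \frac{e^{s\,(\cos(\theta_{y_i,i}) - m)}}{e^{s\,(\cos(\theta_{y_i,i}) - m)} + \sum_{j \neq y_i} e^{s\,\cos(\theta_{j,i})}}
$$

subject to $W = \frac{W^*}{\lVert W^* \rVert}$, $x = \frac{x^*}{\lVert x^* \rVert}$ and $\cos(\theta_{j,i}) = W_j^T x_i$, where $N$ is the number of training samples, $x_i$ is the feature vector of sample $i$ with class label $y_i$, $W_j$ is the weight vector of class $j$, $s$ is the scale and $m$ is the cosine margin.

The goal is to separate examples of different classes and to bring examples of the same class closer together in the feature space. CosFace clusters the examples of each class and pushes these class clusters away from each other for a better separation. The following image depicts this geometrically:

Here, $\theta_{j,i}$ is the angle between the class weight vector $W_j$ and the feature vector $x_i$. The following is a possible implementation of the LMCL: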

```
class CosFace(nn.Module):
    '''
    Paper: CosFace: Large Margin Cosine Loss for Deep Face Recognition
    arxiv: https://arxiv.org/pdf/1801.09414.pdf
    '''
    def __init__(self, in_features, out_features, margin, scale) -> None:
        super().__init__()
        self.scale = scale
        self.margin = margin
        # one weight vector W_j per class
        self.weight = nn.Parameter(torch.randn((out_features, in_features)))

    def forward(self, features, label):
        # cos(theta_j) = features/||features|| * W_j/||W_j|| for every class j
        cosine_similarity = F.linear(F.normalize(features), F.normalize(self.weight))
        # subtract the margin m only where label == 1, then scale by s;
        # the softmax of the LMCL is applied later by nn.CrossEntropyLoss
        output = (label * (cosine_similarity - self.margin)) + ((1.0 - label) * cosine_similarity)
        return output * self.scale
```
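To see what the margin does numerically, here is a small NumPy sketch of the LMCL formula above for single-label (integer) targets. It normalises features and class weights, subtracts $m$ from the target-class cosine, scales by $s$ and applies softmax cross-entropy, i.e. the same steps that `CosFace` plus `nn.CrossEntropyLoss` perform. The function name `lmcl_loss` and the random toy inputs are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def lmcl_loss(features, weights, labels, margin=0.35, scale=64.0):
    """Large Margin Cosine Loss (CosFace) for integer class labels.

    features: (N, D) array, weights: (C, D) array, labels: (N,) int array.
    """
    # L2-normalise the feature vectors x_i and the class weight vectors W_j
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ w.T                                  # (N, C) cos(theta_{j,i})
    one_hot = np.eye(w.shape[0])[labels]           # (N, C)
    logits = scale * (cos - margin * one_hot)      # margin only on the target class
    # numerically stable log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))   # toy "feature vectors"
W = rng.normal(size=(3, 16))       # toy class weights
y = rng.integers(0, 3, size=8)     # toy labels
print(lmcl_loss(feats, W, y, margin=0.0), lmcl_loss(feats, W, y, margin=0.35))
```

With `margin=0.0` this reduces to a plain softmax over scaled cosines; a positive margin always yields a larger loss for the same inputs, which forces the network to push each sample's cosine to its own class at least $m$ above the others.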

```
def train(training_dataset: ZEWDPCBaseDataset, train_idx, val_idx, model, optimizer, batch_size, epochs, criterion, device):
    start_time = time.time()
    n_train_it, n_val_it = math.ceil(train_idx.shape[0] / batch_size), math.ceil(val_idx.shape[0] / batch_size)
    all_labels = training_dataset._get_all_labels()

    def load_batch(indices):
        # stack the images and multi-hot labels of the given dataset indices
        data, y_true = [], []
        for i in indices:
            sample = training_dataset[i]
            data.append(sample['image'])
            y_true.append(sample['label'])
        return torch.stack(data, dim=0).to(device), torch.tensor(y_true).float().to(device)

    def report(split, epoch, predictions, idx):
        # loss on the raw outputs, challenge metrics on outputs thresholded at .5
        loss = criterion(predictions, torch.from_numpy(all_labels[idx]).float())
        scores = get_zew_dpc_metrics(all_labels[idx], (predictions > .5).float())
        print(f"=> [{split}] Epoch {epoch+1}/{epochs} | Loss {loss:.6f} | F1 {scores['F1_score_macro']:.3f} | "
              f"Acc {scores['accuracy_score']:.3f} | HamL {scores['hamming_loss']:.3f} | Passed time {(time.time() - start_time)/60:.2f} min.")

    for epoch in range(epochs):
        model.train()
        train_predictions, val_predictions = [], []
        # reshuffle the training indices every epoch
        train_idx = train_idx[np.random.permutation(train_idx.shape[0])]
        for batch in range(n_train_it):
            optimizer.zero_grad()
            data, y_true = load_batch(train_idx[batch*batch_size:(batch+1)*batch_size])
            # labels are passed in so that CosFace can subtract the margin from the target classes
            output = model(data, y_true, True)
            train_predictions.append(output.cpu().detach())
            loss = criterion(output, y_true)
            loss.backward()
            optimizer.step()
        with torch.no_grad():
            report("Train", epoch, torch.cat(train_predictions, dim=0), train_idx)
            model.eval()
            for batch in range(n_val_it):
                data, y_true = load_batch(val_idx[batch*batch_size:(batch+1)*batch_size])
                # all-zero labels: CosFace adds no margin, so validation does not use the true labels
                output = model(data, torch.zeros_like(y_true), True)
                val_predictions.append(output.cpu())
            report("Val", epoch, torch.cat(val_predictions, dim=0), val_idx)
```

### Set Parameters

```
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
device = torch.device(device)
```

### Load Dataset

```
mean, std = torch.tensor([0.485, 0.456, 0.406]), torch.tensor([0.229, 0.224, 0.225])
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize(mean=mean, std=std)
])
training_dataset = ZEWDPCBaseDataset(
images_dir="./data/public_training/images",
labels_path="./data/public_training/labels.csv",
shuffle_seed=seed,
transform=transform
)
n_samples = len(training_dataset)
random_idx = np.random.permutation(n_samples)
train_idx, val_idx = random_idx[:math.floor(n_samples*.9)], random_idx[math.floor(n_samples*.9):]
```

### Load Torchvision's Pre-Trained Model: ResNet18

```
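class Encoder(nn.Module):
    def __init__(self, in_features, out_features, margin, scale) -> None:
        super().__init__()
        # ImageNet-pretrained ResNet18 (torchvision >= 0.13: weights=models.ResNet18_Weights.DEFAULT)
        backbone = models.resnet18(pretrained=True, progress=False)
        # drop the final fully connected layer and keep the 512-d pooled feature vector
        self.model = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())
        self.cos_face = CosFace(in_features, out_features, margin, scale)

    def forward(self, img, label=None, training=None):
        feature_vector = self.model(img)
        if training:
            # scaled cosine logits for the loss
            return self.cos_face(feature_vector, label)
        # otherwise return the learned representation itself
        return feature_vector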
```

```
model = Encoder(in_features=512, out_features=6, margin=.5, scale=64).to(device)
```

### Train ResNet18

```
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
batch_size, epochs = 32, 12
# CosFace returns scaled cosine logits; softmax cross-entropy on top of them gives the LMCL
criterion = nn.CrossEntropyLoss()
train(training_dataset, train_idx, val_idx, model, optimizer, batch_size, epochs, criterion, device)
```

So, why are we doing this? The following figure depicts the result so far:

The encoded representation is a compact representation of the input image. These representations, rather than the raw images, can now be used for further processing, e.g. as input to a classifier, as depicted in the following figure:

Of course, the classifier can be a neural network, but it can also be any other type of classifier. You can also do something else with the learned representations; this is entirely up to your creativity.
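As one illustration of the classifier-on-representations idea, here is a NumPy sketch of a linear probe: one logistic-regression output per label (the challenge labels are multi-label), trained with plain gradient descent on frozen feature vectors. It assumes the features were already extracted, e.g. by calling the trained `Encoder` in eval mode without `training=True`; the function names, learning rate and synthetic data are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_linear_probe(features, labels, lr=0.5, epochs=500, l2=1e-4):
    """Fit one logistic-regression head per label on frozen features.

    features: (N, D) float array, labels: (N, L) array of 0/1.
    Returns weights (D, L) and bias (L,).
    """
    n, d = features.shape
    w = np.zeros((d, labels.shape[1]))
    b = np.zeros(labels.shape[1])
    for _ in range(epochs):
        probs = sigmoid(features @ w + b)       # (N, L) per-label probabilities
        grad = probs - labels                   # gradient of binary cross-entropy w.r.t. logits
        w -= lr * (features.T @ grad / n + l2 * w)
        b -= lr * grad.mean(axis=0)
    return w, b

def predict_probe(features, w, b, threshold=0.5):
    return (sigmoid(features @ w + b) > threshold).astype(int)

# synthetic, linearly separable toy data standing in for encoder outputs
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=(8, 3))
Y = (X @ true_w > 0).astype(int)
w, b = fit_linear_probe(X, Y)
acc = (predict_probe(X, w, b) == Y).mean()
```

A linear head is a deliberately cheap choice given the time constraint; any other classifier (a small MLP, k-nearest neighbours in cosine space, ...) could take its place.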

Above, I introduced one possible loss function. It is one among many. Other possible loss functions are:

- Angular Loss
- SphereFace Loss
- Contrastive Loss
- ...
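For comparison with CosFace, here is a NumPy sketch of one common formulation of the pairwise contrastive loss: same-class pairs are penalised by their squared distance, and different-class pairs are penalised only when they are closer than a margin. Conventions differ between papers (which pair type gets label 1, whether the 1/2 factor is used); the choice here, the function name and the `margin` default are assumptions.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pairwise contrastive loss.

    emb_a, emb_b: (N, D) embeddings of the two elements of each pair.
    same_class: (N,) array, 1 if the pair shares a class, else 0.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)                         # Euclidean distance per pair
    positive = same_class * d ** 2                                     # pull same-class pairs together
    negative = (1 - same_class) * np.maximum(0.0, margin - d) ** 2     # push other pairs beyond the margin
    return 0.5 * (positive + negative).mean()
```

Unlike CosFace, which compares each sample to learned class weights, this loss only compares samples to each other, so it needs a pair (or batch) sampling strategy during training.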

The list goes on. You can have a look here; it contains many more loss functions.

### Data (Label) Purchase (aka Active Learning (AL))

Now that we have covered RL, let's continue with the data (label) purchase. The question is how to use RL in the context of AL. Possible approaches are:

- Random Sampling
- Least Confidence
- Margin Sampling
- Entropy Sampling
- Ensemble of Active Learners
- ...
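Least confidence, margin and entropy sampling all rank the unlabelled images by how uncertain the current classifier is about them. Because the challenge labels are multi-label, the NumPy sketch below treats each label as an independent binary prediction (e.g. sigmoid outputs of a classifier trained on the representations) and averages the per-label uncertainty. That aggregation, the function names and the `budget` parameter are assumptions, not part of the notebook.

```python
import numpy as np

def uncertainty_scores(probs, strategy="entropy", eps=1e-12):
    """Per-sample uncertainty for multi-label predictions.

    probs: (N, L) array of per-label positive-class probabilities.
    Higher score = more uncertain = better candidate to purchase.
    """
    p = np.clip(probs, eps, 1 - eps)
    if strategy == "least_confidence":
        per_label = 1.0 - np.maximum(p, 1 - p)       # 0 when certain, 0.5 at p = 0.5
    elif strategy == "margin":
        per_label = 1.0 - np.abs(2 * p - 1)          # small gap between both outcomes -> high score
    elif strategy == "entropy":
        per_label = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return per_label.mean(axis=1)

def select_for_purchase(probs, budget, strategy="entropy"):
    """Return the indices of the `budget` most uncertain samples."""
    scores = uncertainty_scores(probs, strategy)
    return np.argsort(-scores)[:budget]
```

Note that with binary per-label probabilities the margin score is exactly twice the least-confidence score, so both give the same ranking; they only differ in a multi-class (softmax) setting.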

For example, you could train a classifier on the learned representations and use its predicted probabilities for least confidence or entropy sampling, or even train multiple classifiers and employ them as a committee of active learners.
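The committee idea can be sketched as query by committee with vote entropy: each of K classifiers (for instance trained with different seeds or on different subsets of the representations) casts a 0/1 vote per label, and the images on which the committee disagrees most are purchased first. The NumPy code below is a minimal sketch; averaging over labels, the 0.5 vote threshold and the function names are assumptions.

```python
import numpy as np

def vote_entropy(committee_probs, threshold=0.5):
    """committee_probs: (K, N, L) per-label probabilities from K models.

    Returns an (N,) disagreement score: binary vote entropy averaged over labels.
    """
    votes = (committee_probs > threshold).astype(float)   # (K, N, L) hard votes
    frac_pos = votes.mean(axis=0)                          # share of models voting "label present"
    frac_neg = 1.0 - frac_pos
    with np.errstate(divide="ignore", invalid="ignore"):
        # 0 * log(0) is treated as 0
        ent = -(np.where(frac_pos > 0, frac_pos * np.log(frac_pos), 0.0)
                + np.where(frac_neg > 0, frac_neg * np.log(frac_neg), 0.0))
    return ent.mean(axis=1)

def select_by_disagreement(committee_probs, budget):
    """Return the indices of the `budget` samples with the highest vote entropy."""
    return np.argsort(-vote_entropy(committee_probs))[:budget]
```

Unanimous votes give a score of 0, while an even split on every label gives the maximum of log(2).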

The list of possible combinations of RL and AL approaches is extensive, so there should be something for everyone.

Hopefully, this notebook will be helpful to you. Best of luck in the challenge!

#### MIT LICENSE

Copyright 2022 AO

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
