#### Multi-Agent Behavior: Representation, Modeling, Measurement, and Applications

# Clustering of learned annotator embeddings

In this notebook, we use the winning model for Tasks 1 and 2 to cluster the annotation styles of the different annotators from Task 2.

You can learn more about the model in the paper preprint and in the solution outline in the GitHub repository.

# Install dependencies

First of all, we need to install the dependencies, which in this case are simply PyTorch and the module containing all the code from my solution.

```
!pip install torch
!pip install git+https://github.com/nebw/mabe.git
```

# Download parameters and predictions from pretrained model

Training the model requires a lot of time and compute, and the training code assumes that the training data has already been preprocessed.

Fortunately, we can completely skip the model training for our purposes and load a file with the pretrained model parameters and predictions for all test videos.

The training results are quite large, and we will load them from a shared Google Drive:

```
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
file_id = '1X7RGaoBtFwIvedfoAioZ8Ft1d7PvxbP0'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('pretrained.pt')
```

# Load training results

Here we load the `TrainingResult` object that we just downloaded and instantiate a `MultiAnnotatorLogisticRegressionHead`, the final component in the model architecture.

The `MultiAnnotatorLogisticRegressionHead` consists of a LayerNorm layer, a residual block, and a final linear layer for each classification task from Tasks 1 and 3 (i.e., one multinomial classification layer for Task 1 and 7 binary classification layers for Task 3). The residual block has one additional input: a learned embedding of the annotator for Task 2. This embedding is initialized as a diagonal matrix (the identity), i.e., the embedding for annotator 0 is initially $[1, 0, 0, 0, 0, 0]$. The model can then learn similar embeddings for annotators with a similar style, and the residual block can modify the inputs to the classification head to match the style of each annotator. The annotator embeddings are kept small to avoid overfitting, but I did not experiment with larger or different kinds of embeddings.
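To make the role of the embedding more concrete, here is a minimal NumPy sketch of the idea. It is not the `mabe` implementation: the feature width, the random weights, and the `residual_block` helper are hypothetical and only illustrate how an identity-initialized embedding can condition a residual correction on the annotator.

```python
import numpy as np

num_annotators = 6  # same value the notebook uses
num_features = 8    # hypothetical feature width, chosen only for this sketch

# Identity initialization: row i is the one-hot embedding of annotator i.
embedding = np.eye(num_annotators)

rng = np.random.default_rng(0)
# Hypothetical weights of one layer that sees features and embedding together.
weights = rng.normal(scale=0.1, size=(num_features + num_annotators, num_features))


def residual_block(features, annotator_id):
    """Add an annotator-dependent correction to the input features."""
    annotator_vec = np.broadcast_to(
        embedding[annotator_id], (features.shape[0], num_annotators)
    )
    block_input = np.concatenate([features, annotator_vec], axis=1)
    correction = np.maximum(block_input @ weights, 0.0)  # ReLU nonlinearity
    return features + correction


features = rng.normal(size=(4, num_features))  # 4 frames of toy features
out_a0 = residual_block(features, annotator_id=0)
out_a3 = residual_block(features, annotator_id=3)
```

The same input frames produce different outputs for annotator 0 and annotator 3, which is the mechanism that lets the classification layers see annotator-specific features.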

```
import torch
import numpy as np
import pandas as pd
import mabe
import mabe.model
```

```
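# the file contains pickled Python objects rather than only tensors, so recent
# PyTorch versions need weights_only=False. only do this for files you trust.
training_result = torch.load('pretrained.pt', weights_only=False)[0]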
```

```
# usually we would load these hyperparameters directly from the data, but here
# we define them manually to avoid having to download and preprocess the
# training data. they can be ignored in the context of this notebook.
num_extra_features = 4
num_extra_clf_tasks = 7
num_annotators = 6
annotators = np.arange(num_annotators)
```

```
# instantiate a new MultiAnnotatorLogisticRegressionHead object
logreg = mabe.model.MultiAnnotatorLogisticRegressionHead(
    training_result.config.num_context,
    num_annotators,
    num_extra_features,
    num_extra_clf_tasks,
)
```

```
def load_annotator_embeddings(logreg):
    # load pretrained annotator embeddings from model and store them in a pandas
    # dataframe for convenient analysis
    annotator_embedding = logreg.embedding.cpu().data.numpy().T
    annotator_embedding = pd.DataFrame(
        annotator_embedding, columns=[f"Annotator {a}" for a in range(6)]
    )
    return annotator_embedding
```

```
load_annotator_embeddings(logreg)
```

We can see that the annotator embeddings for a new instance of the logistic regression head are simply a diagonal matrix.

```
# load parameters from epoch with best validation loss during training
cpc_state_dict, logreg_state_dict = training_result.best_params[0]
logreg.load_state_dict(logreg_state_dict)
_ = logreg.eval()
```

We can now load the learned annotator embeddings from the pretrained model and cluster them.

```
annotator_embedding = load_annotator_embeddings(logreg)
annotator_embedding
```

The annotator embeddings have changed significantly during training, but what does that tell us about the annotation styles of the different annotators?

# Clustering of annotator embeddings

The annotator embeddings are initialized as a diagonal matrix, but we expect the model to learn similar embeddings for two annotators if they share a similar annotation style.

We can quantify this by computing the pairwise Euclidean distances between the embeddings of all annotators and then performing hierarchical clustering on these distances.

Fortunately, we can use the `clustermap` function provided by `seaborn` to perform the distance calculation, clustering, and visualization automatically.
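For readers who want to see the individual steps, the following sketch performs roughly the same column clustering by hand with SciPy. The toy embedding values are invented for illustration and are not the learned ones; the flat cluster labels are extracted with `fcluster` only to make the result easy to inspect.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy embedding table (rows = embedding dimensions, columns = annotators).
# These values are made up for the sketch.
toy_embedding = np.array([
    [1.0, 0.9, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
])

# Columns are clustered, so each annotator becomes one observation (row).
observations = toy_embedding.T

# Step 1: pairwise Euclidean distances in condensed form.
condensed = pdist(observations, metric="euclidean")

# Step 2: agglomerative clustering with Ward's criterion.
Z = linkage(condensed, method="ward")

# Step 3: cut the tree into two flat clusters for inspection.
labels = fcluster(Z, t=2, criterion="maxclust")
```

In this toy table, annotators 0 and 1 end up in one cluster and annotators 2 and 3 in the other, because their columns are nearly identical.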

```
import io
import seaborn as sns
import scipy
import scipy.special
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc
import matplotlib.pyplot as plt
```

```
# perform and plot hierarchical clustering on the annotator embeddings. similar
# embeddings (with a low euclidean distance) will be assigned to the same
# cluster
cm = sns.clustermap(
    annotator_embedding, row_cluster=False, method="ward",
    cbar_kws={"label": "Learned value"}
)
cm.ax_heatmap.set_ylabel("Embedding dimension")
plt.setp(cm.ax_heatmap.get_xticklabels(), rotation=90)
```

We can see that the hierarchical clustering has split the annotator embeddings into two high-level clusters. Annotators 3 and 4 appear to share a similar style. The other annotators are assigned to the second cluster, but there also appears to be some structure here, with Annotators 0 and 1 and Annotators 2 and 5 forming two subclusters.

# Clustering of model predictions

We have to be careful when interpreting these results, though. We can't safely assume that low Euclidean distances between annotator embeddings are equivalent to similar annotation styles.

The annotator embeddings are used in the final component of the model, but there are still several fully connected layers with nonlinearities in this final component.
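A small toy example shows why the nonlinearities matter. The `toy_head` function below is hypothetical and unrelated to the real model; it only demonstrates that two embeddings that are close in Euclidean distance can still produce very different outputs once a nonlinearity is involved.

```python
import numpy as np


def toy_head(embedding, features):
    """Hypothetical head: a steep tanh gate driven by one embedding dimension."""
    gate = np.tanh(20.0 * embedding[0])
    return gate * features


features = np.ones(3)
emb_a = np.array([0.05, 0.0])
emb_b = np.array([-0.05, 0.0])  # very close to emb_a
emb_c = np.array([1.0, 0.0])    # far away from emb_a

out_a, out_b, out_c = (toy_head(e, features) for e in (emb_a, emb_b, emb_c))

# Distances between embeddings and between the resulting outputs.
emb_dist_ab = np.linalg.norm(emb_a - emb_b)
emb_dist_ac = np.linalg.norm(emb_a - emb_c)
out_dist_ab = np.linalg.norm(out_a - out_b)
out_dist_ac = np.linalg.norm(out_a - out_c)
```

Here `emb_a` is much closer to `emb_b` than to `emb_c`, yet its output is closer to that of `emb_c`, so the ordering of distances flips between embedding space and output space.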

We can verify our previous results by clustering the predictions of the model for the different annotators.

The `TrainingResult` object we've loaded above already contains the predictions of the model for all test videos and all annotators.

```
# sequence IDs from all test videos
sequences = list(training_result.test_predictions.keys())
```

Due to the size of the predictions, we now have to calculate the distance matrix manually.

The model predictions are stored as compressed numpy arrays, so we need a bit of boilerplate code to load them:
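The round trip below illustrates the storage pattern on a toy array. It assumes the results were written with `np.savez_compressed`, which is consistent with the `'arr_0'` key used in this notebook but is not confirmed by it.

```python
import io

import numpy as np

# Toy logits: 2 frames, 3 classes (values made up for the sketch).
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])

# Writing: arrays passed positionally are stored under the keys arr_0, arr_1, ...
buffer = io.BytesIO()
np.savez_compressed(buffer, logits)
compressed_bytes = buffer.getvalue()  # plain bytes, easy to keep in a dict

# Reading: wrap the bytes in a file-like object so np.load can read them.
restored = np.load(io.BytesIO(compressed_bytes))["arr_0"]
```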

```
distances = []
# iterate over all test video sequence IDs
for sequence_id in sequences:
    # this will store the probabilities the model assigns to each class for
    # each frame, for each annotator
    predictions_by_annotator = []
    # iterate over all annotators
    for annotator_id in annotators:
        # the results are stored as compressed numpy arrays, so we need to load
        # them into a BytesIO object first s.t. numpy.load can read them like a
        # normal file
        compressed_results = io.BytesIO(
            training_result.test_logits[sequence_id][annotator_id]
        )
        # we can now load the model predictions for the current video and
        # annotator. We apply the softmax function to turn the model outputs
        # (logits) into probabilities
        logits = np.load(compressed_results)['arr_0']
        predictions = scipy.special.softmax(logits, -1)
        predictions_by_annotator.append(predictions)
    # we can now stack and flatten the predictions for each annotator
    predictions_by_annotator = np.stack(predictions_by_annotator)
    predictions_flat = predictions_by_annotator.reshape(num_annotators, -1)
    # loading all predictions from all videos at once and calculating the
    # distance matrix in one step would be quite memory intensive, so we simply
    # calculate the squared distances for each video separately and sum them up
    # at the end. note that pdist returns a `condensed distance matrix`, which
    # works out fine in this case, because that's the format that scipy's
    # linkage function expects anyway.
    sequence_distances = scipy.spatial.distance.pdist(
        predictions_flat,
        metric="sqeuclidean"  # squared euclidean distance, no sqrt yet
    )
    distances.append(sequence_distances)
distances = np.stack(distances)
# sum up the precomputed squared distances and apply sqrt to get the final
# euclidean distance matrix.
distances = np.sqrt(np.sum(distances, axis=0))
```
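The per-video accumulation relies on the fact that squared Euclidean distances add up across disjoint blocks of features. The following sketch checks this on random toy data; the two chunks stand in for the flattened predictions of two videos and are not real model output.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Two toy "videos": 6 annotators each, with 10 and 15 flattened values.
chunk_a = rng.random((6, 10))
chunk_b = rng.random((6, 15))

# Reference: Euclidean distances on the concatenated vectors.
full = pdist(np.concatenate([chunk_a, chunk_b], axis=1), metric="euclidean")

# Chunked: sum the squared distances per chunk, then take the square root once.
chunked = np.sqrt(pdist(chunk_a, "sqeuclidean") + pdist(chunk_b, "sqeuclidean"))
```

Both arrays hold the same 15 pairwise distances (up to floating-point error), so processing the videos one at a time loses nothing compared to building one large matrix.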

```
# finally, we can use scipy to perform and visualize the hierarchical clustering.
Z = scipy.cluster.hierarchy.linkage(distances, 'ward')
dn = scipy.cluster.hierarchy.dendrogram(Z)
plt.xlabel('Annotator')
```

We can see that there are indeed differences between clustering the annotator embeddings directly and clustering the model predictions.

While the high-level clusters are identical (one cluster for Annotators 3 and 4, and another cluster for the remaining annotators), we now have one subcluster for Annotators 0 and 5 and another for Annotators 1 and 2.

It would be interesting to see how these two clustering approaches compare to a manual (human) inspection of the different annotation styles. Would a domain expert also identify the annotation styles of Annotators 3 and 4 as particularly similar?
