
Emotion Detection

Solution for submission 147035

A detailed solution for submission 147035 submitted for challenge Emotion Detection

sean_benhur

Downloading Dataset

AIcrowd recently added support for downloading the dataset of any challenge directly through the AIcrowd CLI.

So we will first install the Python package from AIcrowd that lets us download the dataset after supplying our API key.

In [1]:
!pip install aicrowd-cli simpletransformers -q
In [2]:
API_KEY = "YOUR_API_KEY"  # key scrubbed on upload; paste your own AIcrowd API key here
!aicrowd login --api-key $API_KEY
API Key valid
Saved API Key successfully!
In [3]:
# Downloading the dataset (the challenge slug matches the submission links below)
!mkdir data
!aicrowd dataset download --challenge emotion-detection -o data
val.csv: 100% 262k/262k [00:00<00:00, 1.11MB/s]
train.csv: 100% 2.30M/2.30M [00:00<00:00, 4.51MB/s]
test.csv: 100% 642k/642k [00:00<00:00, 1.57MB/s]

Train

In [4]:
import os
In [5]:
import numpy as np 
import pandas as pd 


from simpletransformers.classification import ClassificationModel
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
In [14]:
custom_args = {'fp16': True, # use mixed-precision training
               'train_batch_size': 16, # default is 8
               'gradient_accumulation_steps': 2,
               'do_lower_case': True,
               'learning_rate': 1e-05, # using lower learning rate
               'overwrite_output_dir': True, # important for CV
               'num_train_epochs': 3} # default is 1
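As a quick sanity check on the settings above: with gradient accumulation, the optimizer steps only after several small batches, so the effective batch size is `train_batch_size × gradient_accumulation_steps` (a sketch, not part of the original notebook):

```python
# Gradient accumulation sums gradients over several small batches before
# each optimizer step, so the effective batch size is the product of the
# per-step batch size and the number of accumulation steps.
train_batch_size = 16
gradient_accumulation_steps = 2
effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 32
```

This is why the config can mimic a batch size of 32 while only ever holding 16 examples in GPU memory at a time.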
In [7]:
train_data = pd.read_csv("/content/data/train.csv")
val_data = pd.read_csv("/content/data/val.csv")
test_data = pd.read_csv("/content/data/test.csv")
In [8]:
combined_data = pd.concat([train_data,val_data],axis=0)
combined_data
Out[8]:
text label
0 takes no time to copy/paste a press release 0
1 You're delusional 1
2 Jazz fan here. I completely feel. Lindsay Mann... 0
3 ah i was also confused but i think they mean f... 0
4 Thank you so much. ♥️ that means a lot. 0
... ... ...
3468 I remember saying Kristin Wade messed up the t... 1
3469 Wayakum, Julia Burns make it easy for you insh... 0
3470 That’s pretty cool. Bet your dad is *vewy pwou... 0
3471 That is a valid question, I hope it can change... 0
3472 I feel personally attacked on this one. We're ... 0

34728 rows × 2 columns
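One thing to watch with `pd.concat` along axis 0: the original row indices are kept, so `combined_data` ends up with duplicate index values (notice the display above restarts at 0 for the validation rows). A minimal sketch of the fix, using toy data rather than the actual CSVs:

```python
import pandas as pd

train = pd.DataFrame({"text": ["a", "b"], "label": [0, 1]})
val = pd.DataFrame({"text": ["c"], "label": [0]})

# Without ignore_index=True, the val rows keep their old index (0 again),
# which can make positional lookups on the combined frame ambiguous.
combined = pd.concat([train, val], axis=0, ignore_index=True)
print(list(combined.index))  # → [0, 1, 2]
```

simpletransformers is index-agnostic here, so training still works either way, but resetting the index avoids surprises in later slicing.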

In [15]:
model = ClassificationModel("roberta", "cardiffnlp/twitter-roberta-base-emotion", args=custom_args) 
model.train_model(combined_data)
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-emotion and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/usr/local/lib/python3.7/dist-packages/simpletransformers/classification/classification_model.py:602: UserWarning: Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels.
  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Out[15]:
(3255, 0.30974922824090206)
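`KFold` is imported above but unused in this run, and `overwrite_output_dir` is flagged as "important for CV" because each fold would retrain into the same output directory. A hedged sketch of how such a cross-validation loop could look; only the fold handling runs here, and the actual `train_model` call is left as a comment since it needs the real data and a GPU:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for combined_data, just to exercise the fold logic.
texts = np.array([f"example {i}" for i in range(10)])

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kf.split(texts)):
    fold_sizes.append((len(train_idx), len(val_idx)))
    # In the real loop: build a fresh ClassificationModel with custom_args
    # (overwrite_output_dir=True lets each fold reuse outputs/), then call
    # model.train_model(combined_data.iloc[train_idx]) and
    # model.eval_model(combined_data.iloc[val_idx]).

print(fold_sizes)  # → [(8, 2), (8, 2), (8, 2), (8, 2), (8, 2)]
```

Averaging (or ensembling) the per-fold predictions usually gives a more robust score than a single train/val split.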

Submitting Results 📄

Okay, this is the last section 😌 , let's get our test-set predictions from the model real quick and submit them directly using the AIcrowd CLI

In [ ]:
predictions, raw_outputs = model.predict(list(test_data['text'].values))
In [11]:
# Applying the predictions to the labels column of the sample submission 
test_data['label'] = predictions
test_data
Out[11]:
text label
0 I was already over the edge with Cassie Zamora... 1
1 I think you're right. She has oodles of cash a... 0
2 Haha I love this. I used to give mine phone bo... 0
3 Probably out of desperation as they going no a... 0
4 Sorry !! You’re real good at that!! 0
... ... ...
8677 Yeah no...I would find it very demeaning 1
8678 This is how mafia works 0
8679 Ah thanks 👍🏻 0
8680 I ask them straight why they don't respect my ... 0
8681 Annette Acosta also tends to out vote Annette ... 0

8682 rows × 2 columns

Note: Please make sure there is a file named submission.csv in the assets folder before submitting.

In [12]:
!mkdir assets

# Saving the sample submission in assets directory
test_data.to_csv(os.path.join("assets", "submission.csv"), index=False)
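Before submitting, it can be worth verifying that the saved file has the column layout the grader expects. A small hedged check using a toy stand-in frame (the real one has 8682 rows, per the display above):

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for test_data with predictions applied; a temp dir plays
# the role of the real "assets" folder.
outdir = tempfile.mkdtemp()
toy = pd.DataFrame({"text": ["hi", "bye"], "label": [0, 1]})
toy.to_csv(os.path.join(outdir, "submission.csv"), index=False)

# Reload and verify the expected submission format.
reloaded = pd.read_csv(os.path.join(outdir, "submission.csv"))
assert list(reloaded.columns) == ["text", "label"]
assert reloaded["label"].isin([0, 1]).all()
```

Writing with `index=False` matters: otherwise pandas adds an unnamed index column that would break the two-column format.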

Uploading the Results

Note : Please save the notebook before submitting it (Ctrl + S)

In [13]:
!aicrowd notebook submit -c emotion-detection -a assets --no-verify
Mounting Google Drive 💾
Your Google Drive will be mounted to access the colab notebook
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.activity.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fexperimentsandconfigs%20https%3a%2f%2fwww.googleapis.com%2fauth%2fphotos.native&response_type=code

Enter your authorization code:
4/1AY0e-g52V5knG4JI5QqpcnZMQw9-0xl45TUTRhdmJ3bhc0qsRLcyMqZf4LI
Mounted at /content/drive
Using notebook: /content/drive/MyDrive/Colab Notebooks/AI CROWD Blitz -9 #1 for submission...
Scrubbing API keys from the notebook...
Collecting notebook...
submission.zip ━━━━━━━━━━━━━━━━━━ 100.0%321.5/319.9 KB656.6 kB/s0:00:00
                                                  ╭─────────────────────────╮                                                  
                                                  │ Successfully submitted! │                                                  
                                                  ╰─────────────────────────╯                                                  
                                                        Important links                                                        
┌──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/submissions/147026              │
│                  │                                                                                                          │
│  All submissions │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/submissions?my_submissions=true │
│                  │                                                                                                          │
│      Leaderboard │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/leaderboards                    │
│                  │                                                                                                          │
│ Discussion forum │ https://discourse.aicrowd.com/c/ai-blitz-9                                                               │
│                  │                                                                                                          │
│   Challenge page │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection                                 │
└──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Congratulations 🎉 you did it! But there is still plenty of room for improvement: data exploration is one of the most important steps in machine learning, especially in competitions. Check whether the labels are imbalanced and how to mitigate that, look at the first few rows of each dataset, or try to push the score higher. Have fun!

And btw -

Don't be shy to ask questions about any errors you hit, or doubts about any part of this notebook, in the discussion forum or on the AIcrowd Discord server; the AIcrew will be happy to help you :)

Also, wanna give us your valuable feedback for the next blitz, or wanna work with us on creating blitz challenges? Let us know!

In [ ]:

