Loading

Emotion Detection

Bert Using Ktrain:- LB score:-0.844

In this notebook, You will learn how to use BERT using Ktrain.

aman_patkar

Please Explore more using this notebook. You can change The Bert to FastText or any other Algorithm. You can also introduce the methods to handle the imbalance. Like, if you feel this is helpful.

Sentiment Classification Using BERT

We will use IMDB Movie Reviews Dataset

What is BERT

Bidirectional Encoder Representations from Transformers (BERT) is a technique for NLP (Natural Language Processing) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google. Google is leveraging BERT to better understand user searches.

BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.

image.png

There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning.

The feature-based approach, such as ELMo (Peters et al., 2018a), uses task-specific architectures that include the pre-trained representations as additional features. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pretrained parameters.

The two approaches share the same objective function during pre-training, where they use unidirectional language models to learn general language representations.

Why BERT

  • Accurate
  • Can be used for wide variety of task
  • Easy to use
  • It is game changer in NLP

Additional Reading

Video Lecture: BERT NLP Tutorial 1- Introduction | BERT Machine Learning | KGP Talkie

https://www.youtube.com/watch?v=h_U27jBNYI4

Ref BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

https://arxiv.org/abs/1810.04805

Understanding searches better than ever before:

https://www.blog.google/products/search/search-language-understanding-bert/

Good Resource to Read More About the BERT:

http://jalammar.github.io/illustrated-bert/


What is ktrain

ktrain is a library to help build, train, debug, and deploy neural networks in the deep learning software framework, Keras.

ktrain uses tf.keras in TensorFlow instead of standalone Keras.) Inspired by the fastai library, with only a few lines of code, ktrain allows you to easily:

  • estimate an optimal learning rate for your model given your data using a learning rate finder
  • employ learning rate schedules such as the triangular learning rate policy, 1cycle policy, and SGDR to more effectively train your model
  • employ fast and easy-to-use pre-canned models for both text classification (e.g., NBSVM, fastText, GRU with pretrained word embeddings) and image classification (e.g., ResNet, Wide Residual Networks, Inception)
  • load and preprocess text and image data from a variety of formats

  • inspect data points that were misclassified to help improve your model

  • leverage a simple prediction API for saving and deploying both models and data-preprocessing steps to make predictions on new raw data

Notebook Setup

In [ ]:
!pip install ktrain
Collecting ktrain
  Downloading https://files.pythonhosted.org/packages/bb/41/d36714e51bf4e1d304f2ba80eb3c30c7eed69d72310d7f34fab86ed10b58/ktrain-0.26.4.tar.gz (25.3MB)
     |████████████████████████████████| 25.3MB 120kB/s 
Collecting scikit-learn==0.23.2
  Downloading https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8MB)
     |████████████████████████████████| 6.8MB 24.1MB/s 
Requirement already satisfied: matplotlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from ktrain) (3.2.2)
Requirement already satisfied: pandas>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from ktrain) (1.1.5)
Requirement already satisfied: fastprogress>=0.1.21 in /usr/local/lib/python3.7/dist-packages (from ktrain) (1.0.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from ktrain) (2.23.0)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from ktrain) (1.0.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from ktrain) (20.9)
Requirement already satisfied: ipython in /usr/local/lib/python3.7/dist-packages (from ktrain) (5.5.0)
Collecting langdetect
  Downloading https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz (981kB)
     |████████████████████████████████| 983kB 38.6MB/s 
Requirement already satisfied: jieba in /usr/local/lib/python3.7/dist-packages (from ktrain) (0.42.1)
Collecting cchardet
  Downloading https://files.pythonhosted.org/packages/80/72/a4fba7559978de00cf44081c548c5d294bf00ac7dcda2db405d2baa8c67a/cchardet-2.1.7-cp37-cp37m-manylinux2010_x86_64.whl (263kB)
     |████████████████████████████████| 266kB 48.4MB/s 
Collecting syntok
  Downloading https://files.pythonhosted.org/packages/8c/76/a49e73a04b3e3a14ce232e8e28a1587f8108baa665644fe8c40e307e792e/syntok-1.3.1.tar.gz
Collecting seqeval==0.0.19
  Downloading https://files.pythonhosted.org/packages/93/e5/b7705156a77f742cfe4fc6f22d0c71591edb2d243328dff2f8fc0f933ab6/seqeval-0.0.19.tar.gz
Collecting transformers<=4.3.3,>=4.0.0
  Downloading https://files.pythonhosted.org/packages/f9/54/5ca07ec9569d2f232f3166de5457b63943882f7950ddfcc887732fc7fb23/transformers-4.3.3-py3-none-any.whl (1.9MB)
     |████████████████████████████████| 1.9MB 36.3MB/s 
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/ac/aa/1437691b0c7c83086ebb79ce2da16e00bef024f24fec2a5161c35476f499/sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 35.3MB/s 
Collecting keras_bert>=0.86.0
  Downloading https://files.pythonhosted.org/packages/6a/e4/3b2e2927c15c22f44005cb0ab0eaf2f7e623ea2b6488e4b7c5aca6c162c2/keras-bert-0.88.0.tar.gz
Requirement already satisfied: networkx>=2.3 in /usr/local/lib/python3.7/dist-packages (from ktrain) (2.5.1)
Collecting whoosh
  Downloading https://files.pythonhosted.org/packages/ba/19/24d0f1f454a2c1eb689ca28d2f178db81e5024f42d82729a4ff6771155cf/Whoosh-2.7.4-py2.py3-none-any.whl (468kB)
     |████████████████████████████████| 471kB 43.0MB/s 
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from scikit-learn==0.23.2->ktrain) (1.4.1)
Collecting threadpoolctl>=2.0.0
  Downloading https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.7/dist-packages (from scikit-learn==0.23.2->ktrain) (1.19.5)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (2.4.7)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.0.1->ktrain) (2018.9)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (2021.5.30)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (2.10)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (4.8.0)
Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (0.8.1)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (57.0.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (4.4.2)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (0.7.5)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (5.0.5)
Requirement already satisfied: pygments in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (2.6.1)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (1.0.18)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from langdetect->ktrain) (1.15.0)
Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from syntok->ktrain) (2019.12.20)
Requirement already satisfied: Keras>=2.2.4 in /usr/local/lib/python3.7/dist-packages (from seqeval==0.0.19->ktrain) (2.4.3)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/d4/e2/df3543e8ffdab68f5acc73f613de9c2b155ac47f162e725dcac87c521c11/tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3MB)
     |████████████████████████████████| 3.3MB 35.8MB/s 
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)
     |████████████████████████████████| 901kB 34.3MB/s 
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers<=4.3.3,>=4.0.0->ktrain) (3.0.12)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers<=4.3.3,>=4.0.0->ktrain) (4.41.1)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from transformers<=4.3.3,>=4.0.0->ktrain) (4.5.0)
Collecting keras-transformer>=0.39.0
  Downloading https://files.pythonhosted.org/packages/8a/35/6b079e920fe09a9349028bc2f209447e5636d90e29c5cf060bcc3177803a/keras-transformer-0.39.0.tar.gz
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.7/dist-packages (from pexpect; sys_platform != "win32"->ipython->ktrain) (0.7.0)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.7/dist-packages (from traitlets>=4.2->ipython->ktrain) (0.2.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython->ktrain) (0.2.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from Keras>=2.2.4->seqeval==0.0.19->ktrain) (3.13)
Requirement already satisfied: h5py in /usr/local/lib/python3.7/dist-packages (from Keras>=2.2.4->seqeval==0.0.19->ktrain) (3.1.0)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers<=4.3.3,>=4.0.0->ktrain) (7.1.2)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers<=4.3.3,>=4.0.0->ktrain) (3.4.1)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers<=4.3.3,>=4.0.0->ktrain) (3.7.4.3)
Collecting keras-pos-embd>=0.12.0
  Downloading https://files.pythonhosted.org/packages/d8/d2/1cc072ea68b573f366e08936177a33e237e66fa7d5338289d4bee64696cf/keras-pos-embd-0.12.0.tar.gz
Collecting keras-multi-head>=0.28.0
  Downloading https://files.pythonhosted.org/packages/a5/e6/a83f26b2e1582de237b125f595874d808e40698f31d44d5903e872d5b64d/keras-multi-head-0.28.0.tar.gz
Collecting keras-layer-normalization>=0.15.0
  Downloading https://files.pythonhosted.org/packages/33/e1/0da586d544a0940a56a2f4aa704b7dbd95eaa8ceda6168b48f5ac95e6608/keras-layer-normalization-0.15.0.tar.gz
Collecting keras-position-wise-feed-forward>=0.7.0
  Downloading https://files.pythonhosted.org/packages/58/02/cd3e7e51cf45d3825818384a2f7d9c340b60c9bf55a5682b7318e1c16eab/keras-position-wise-feed-forward-0.7.0.tar.gz
Collecting keras-embed-sim>=0.9.0
  Downloading https://files.pythonhosted.org/packages/2d/48/78f6d134f1ede597d91186819c9e428ada51cd8d9ea28e5faf37ed2ee602/keras-embed-sim-0.9.0.tar.gz
Requirement already satisfied: cached-property; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from h5py->Keras>=2.2.4->seqeval==0.0.19->ktrain) (1.5.2)
Collecting keras-self-attention>=0.50.0
  Downloading https://files.pythonhosted.org/packages/ea/75/e6bc5b43ee968fef714f2f10a2a1674639ec85d2428cc47b2fe1f9af0115/keras-self-attention-0.50.0.tar.gz
Building wheels for collected packages: ktrain, langdetect, syntok, seqeval, keras-bert, keras-transformer, keras-pos-embd, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-self-attention
  Building wheel for ktrain (setup.py) ... done
  Created wheel for ktrain: filename=ktrain-0.26.4-cp37-none-any.whl size=25282461 sha256=d036903fc25a735b0e7e28c2b3bd3612876c9da3ee259a24557a878f10093de0
  Stored in directory: /root/.cache/pip/wheels/95/cc/1b/84c3799864cd864b99d94a8aeac2ee2d928fb3d0ec3d3f1a13
  Building wheel for langdetect (setup.py) ... done
  Created wheel for langdetect: filename=langdetect-1.0.9-cp37-none-any.whl size=993242 sha256=fc6e1528fac16a994be2a7fd3095e55a1d093f9f0701e047bf69d94a2b37a2e7
  Stored in directory: /root/.cache/pip/wheels/7e/18/13/038c34057808931c7ddc6c92d3aa015cf1a498df5a70268996
  Building wheel for syntok (setup.py) ... done
  Created wheel for syntok: filename=syntok-1.3.1-cp37-none-any.whl size=20919 sha256=a2cfa0914239b8bb4e1d2337876abf310336491a8558942337d18c8a1fe26777
  Stored in directory: /root/.cache/pip/wheels/51/c6/a4/be1920586c49469846bcd2888200bdecfe109ec421dab9be2d
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-0.0.19-cp37-none-any.whl size=9932 sha256=2634c75fce9148d9b9e645f003ebd9b285f7487d72aaaa575e5371fdfa6f4061
  Stored in directory: /root/.cache/pip/wheels/8d/1f/bf/1198beceed805a2099060975f6281d1b01046dd279e19c97be
  Building wheel for keras-bert (setup.py) ... done
  Created wheel for keras-bert: filename=keras_bert-0.88.0-cp37-none-any.whl size=34206 sha256=7ed781dec5bb24a6836bbd97b14aa96a29acacaafa1b79c2cce064cdaca1c088
  Stored in directory: /root/.cache/pip/wheels/7f/d8/86/b4d91b941f6f3256c487b258d5e4268a3301203b717dd11f11
  Building wheel for keras-transformer (setup.py) ... done
  Created wheel for keras-transformer: filename=keras_transformer-0.39.0-cp37-none-any.whl size=12841 sha256=09cbd4a73155b326fc2965fb9148fd6c15e272cc2385675f6f9d0b430b6ec763
  Stored in directory: /root/.cache/pip/wheels/77/42/35/d33c5907bca04ac5742e9eceefb644b680286de26728506a70
  Building wheel for keras-pos-embd (setup.py) ... done
  Created wheel for keras-pos-embd: filename=keras_pos_embd-0.12.0-cp37-none-any.whl size=7471 sha256=1c861957562f256a49916689365e921dd441dfa20ba90f32f0c7fd7094927540
  Stored in directory: /root/.cache/pip/wheels/36/d8/36/06ed09215806dca9ff504d8c0dda5da68d7f2c67d34a231d82
  Building wheel for keras-multi-head (setup.py) ... done
  Created wheel for keras-multi-head: filename=keras_multi_head-0.28.0-cp37-none-any.whl size=15559 sha256=3ae0255b9126e669ea82e5a23b05fc55bec8068c86093bf96dc3eb7edc417476
  Stored in directory: /root/.cache/pip/wheels/ec/92/bd/b3407bc29501f7e28eb970a6c425a9a375485c5d8197df6a8f
  Building wheel for keras-layer-normalization (setup.py) ... done
  Created wheel for keras-layer-normalization: filename=keras_layer_normalization-0.15.0-cp37-none-any.whl size=5224 sha256=99b091e7bf5e1d35a7634ff2db0a551b1f6923fc141cf42d9729db7fd9d734e1
  Stored in directory: /root/.cache/pip/wheels/de/ea/db/833c8a9b8326e703e9f8a78c0d4153294e6a1b1f97a1836397
  Building wheel for keras-position-wise-feed-forward (setup.py) ... done
  Created wheel for keras-position-wise-feed-forward: filename=keras_position_wise_feed_forward-0.7.0-cp37-none-any.whl size=5542 sha256=5e62642493714cf6a433f5d19991d1f20960f771c70ea8e779b46f9946894b6f
  Stored in directory: /root/.cache/pip/wheels/d2/d2/f6/58ce0aae0055dbccba8b40e62a6c22ab997105ad8c431a9e80
  Building wheel for keras-embed-sim (setup.py) ... done
  Created wheel for keras-embed-sim: filename=keras_embed_sim-0.9.0-cp37-none-any.whl size=4505 sha256=9bde778bd1e30b3f63600cffc9cd7215488673ee1c5f9e7cb5501f83ae5a2543
  Stored in directory: /root/.cache/pip/wheels/c1/d5/7d/bef5ee93c88bc6150294cc74cbb081647c505bf816918dd7ff
  Building wheel for keras-self-attention (setup.py) ... done
  Created wheel for keras-self-attention: filename=keras_self_attention-0.50.0-cp37-none-any.whl size=19416 sha256=49d3c7baa04f2e5aa129f72ee8c9963c1f0c001fb741c9cb799f0c1f57d6ee81
  Stored in directory: /root/.cache/pip/wheels/29/93/0d/891573db60f74d0e43bd7db1496c3ef898f8b5946a4c24cbda
Successfully built ktrain langdetect syntok seqeval keras-bert keras-transformer keras-pos-embd keras-multi-head keras-layer-normalization keras-position-wise-feed-forward keras-embed-sim keras-self-attention
Installing collected packages: threadpoolctl, scikit-learn, langdetect, cchardet, syntok, seqeval, tokenizers, sacremoses, transformers, sentencepiece, keras-pos-embd, keras-self-attention, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-transformer, keras-bert, whoosh, ktrain
  Found existing installation: scikit-learn 0.22.2.post1
    Uninstalling scikit-learn-0.22.2.post1:
      Successfully uninstalled scikit-learn-0.22.2.post1
Successfully installed cchardet-2.1.7 keras-bert-0.88.0 keras-embed-sim-0.9.0 keras-layer-normalization-0.15.0 keras-multi-head-0.28.0 keras-pos-embd-0.12.0 keras-position-wise-feed-forward-0.7.0 keras-self-attention-0.50.0 keras-transformer-0.39.0 ktrain-0.26.4 langdetect-1.0.9 sacremoses-0.0.45 scikit-learn-0.23.2 sentencepiece-0.1.96 seqeval-0.0.19 syntok-1.3.1 threadpoolctl-2.1.0 tokenizers-0.10.3 transformers-4.3.3 whoosh-2.7.4
In [ ]:
import tensorflow as tf
In [ ]:
tf.__version__
Out[ ]:
'2.5.0'
In [ ]:
import pandas as pd
import numpy as np
import ktrain
from ktrain import text
import tensorflow as tf
from sklearn.metrics import accuracy_score, confusion_matrix,classification_report,f1_score,roc_curve
In [ ]:
!pip install aicrowd-cli
Collecting aicrowd-cli
  Downloading https://files.pythonhosted.org/packages/1f/57/59b5a00c6e90c9cc028b3da9dff90e242ad2847e735b1a0e81a21c616e27/aicrowd_cli-0.1.7-py3-none-any.whl (49kB)
     |████████████████████████████████| 51kB 4.0MB/s 
Requirement already satisfied: click<8,>=7.1.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: toml<1,>=0.10.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (0.10.2)
Collecting requests<3,>=2.25.1
  Downloading https://files.pythonhosted.org/packages/29/c1/24814557f1d22c56d50280771a17307e6bf87b70727d975fd6b2ce6b014a/requests-2.25.1-py2.py3-none-any.whl (61kB)
     |████████████████████████████████| 61kB 6.5MB/s 
Collecting rich<11,>=10.0.0
  Downloading https://files.pythonhosted.org/packages/69/a1/660d718e61d4c64fb8f1ef7b4aaf6db7a48a2b720cfac2991f06561d9a6c/rich-10.4.0-py3-none-any.whl (206kB)
     |████████████████████████████████| 215kB 12.3MB/s 
Collecting gitpython<4,>=3.1.12
  Downloading https://files.pythonhosted.org/packages/bc/91/b38c4fabb6e5092ab23492ded4f318ab7299b19263272b703478038c0fbc/GitPython-3.1.18-py3-none-any.whl (170kB)
     |████████████████████████████████| 174kB 13.6MB/s 
Collecting requests-toolbelt<1,>=0.9.1
  Downloading https://files.pythonhosted.org/packages/60/ef/7681134338fc097acef8d9b2f8abe0458e4d87559c689a8c306d0957ece5/requests_toolbelt-0.9.1-py2.py3-none-any.whl (54kB)
     |████████████████████████████████| 61kB 8.2MB/s 
Collecting tqdm<5,>=4.56.0
  Downloading https://files.pythonhosted.org/packages/b4/20/9f1e974bb4761128fc0d0a32813eaa92827309b1756c4b892d28adfb4415/tqdm-4.61.1-py2.py3-none-any.whl (75kB)
     |████████████████████████████████| 81kB 8.7MB/s 
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.24.3)
Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.5.30)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.10)
Collecting commonmark<0.10.0,>=0.9.0
  Downloading https://files.pythonhosted.org/packages/b1/92/dfd892312d822f36c55366118b95d914e5f16de11044a27cf10a7d71bbbf/commonmark-0.9.1-py2.py3-none-any.whl (51kB)
     |████████████████████████████████| 51kB 7.0MB/s 
Collecting colorama<0.5.0,>=0.4.0
  Downloading https://files.pythonhosted.org/packages/44/98/5b86278fbbf250d239ae0ecb724f8572af1c91f4a11edf4d36a206189440/colorama-0.4.4-py2.py3-none-any.whl
Requirement already satisfied: typing-extensions<4.0.0,>=3.7.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from rich<11,>=10.0.0->aicrowd-cli) (3.7.4.3)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /usr/local/lib/python3.7/dist-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.6.1)
Collecting gitdb<5,>=4.0.1
  Downloading https://files.pythonhosted.org/packages/ea/e8/f414d1a4f0bbc668ed441f74f44c116d9816833a48bf81d22b697090dba8/gitdb-4.0.7-py3-none-any.whl (63kB)
     |████████████████████████████████| 71kB 8.7MB/s 
Collecting smmap<5,>=3.0.1
  Downloading https://files.pythonhosted.org/packages/68/ee/d540eb5e5996eb81c26ceffac6ee49041d473bc5125f2aa995cf51ec1cf1/smmap-4.0.0-py2.py3-none-any.whl
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.25.1 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
Installing collected packages: requests, commonmark, colorama, rich, smmap, gitdb, gitpython, requests-toolbelt, tqdm, aicrowd-cli
  Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Found existing installation: tqdm 4.41.1
    Uninstalling tqdm-4.41.1:
      Successfully uninstalled tqdm-4.41.1
Successfully installed aicrowd-cli-0.1.7 colorama-0.4.4 commonmark-0.9.1 gitdb-4.0.7 gitpython-3.1.18 requests-2.25.1 requests-toolbelt-0.9.1 rich-10.4.0 smmap-4.0.0 tqdm-4.61.1
In [ ]:
API_KEY = '' # Please get your your API Key from [https://www.aicrowd.com/participants/me]
!aicrowd login --api-key $API_KEY
API Key valid
Saved API Key successfully!
In [ ]:
# Downloading the Dataset
!mkdir data
!aicrowd dataset download --challenge emotion-detection -j 3 -o data
val.csv:   0% 0.00/262k [00:00<?, ?B/s]
val.csv: 100% 262k/262k [00:00<00:00, 3.31MB/s]

train.csv: 100% 2.30M/2.30M [00:00<00:00, 16.1MB/s]
test.csv: 100% 642k/642k [00:00<00:00, 6.35MB/s]
In [ ]:
train_dataset = pd.read_csv("data/train.csv")
validation_dataset = pd.read_csv("data/val.csv")
test_dataset = pd.read_csv("data/test.csv")
train_dataset.head()
Out[ ]:
text label
0 takes no time to copy/paste a press release 0
1 You're delusional 1
2 Jazz fan here. I completely feel. Lindsay Mann... 0
3 ah i was also confused but i think they mean f... 0
4 Thank you so much. ♥️ that means a lot. 0
In [ ]:
trainplusval=pd.concat([train_dataset,validation_dataset])
In [ ]:
word_count=test_dataset['text'].apply(lambda x:len(x.split()))
In [ ]:
max(word_count)
Out[ ]:
32

Balancing the dataset

In [ ]:

In [ ]:
text.print_text_classifiers()
fasttext: a fastText-like model [http://arxiv.org/pdf/1607.01759.pdf]
logreg: logistic regression using a trainable Embedding layer
nbsvm: NBSVM model [http://www.aclweb.org/anthology/P12-2018]
bigru: Bidirectional GRU with pretrained fasttext word vectors [https://fasttext.cc/docs/en/crawl-vectors.html]
standard_gru: simple 2-layer GRU with randomly initialized embeddings
bert: Bidirectional Encoder Representations from Transformers (BERT) from keras_bert [https://arxiv.org/abs/1810.04805]
distilbert: distilled, smaller, and faster BERT from Hugging Face transformers [https://arxiv.org/abs/1910.01108]
In [ ]:
(X_train, y_train), (X_test, y_test), preproc = text.texts_from_df(train_df=train_dataset,
                                                                   text_column ='text',
                                                                   label_columns ='label',
                                                                   val_df = validation_dataset,
                                                                   maxlen = 91,
                                                                   preprocess_mode = 'bert')
['not_label', 'label']
   not_label  label
0        1.0    0.0
1        0.0    1.0
2        1.0    0.0
3        1.0    0.0
4        1.0    0.0
['not_label', 'label']
   not_label  label
0        1.0    0.0
1        0.0    1.0
2        1.0    0.0
3        1.0    0.0
4        1.0    0.0
downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
[██████████████████████████████████████████████████]
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en
done.
Is Multi-Label? False
preprocessing test...
language: en
done.
In [ ]:
model = text.text_classifier(name = 'bert',
                             train_data = (X_train, y_train),
                             preproc = preproc)
Is Multi-Label? False
maxlen is 91
done.
In [ ]:
learner = ktrain.get_learner(model=model, train_data=(X_train, y_train),
                   val_data = (X_test, y_test),
                   batch_size = 6)
In [ ]:

In [ ]:
# find out best learning rate?
#learner.lr_find()
#learner.lr_plot()

# it may take days or many days to find out.
In [ ]:
def f1_scores(y_true, y_pred):
  return f1_score(y_true, y_pred)
In [ ]:
learner.fit_onecycle(lr = 3e-5, epochs = 3)

begin training using onecycle policy with max lr of 3e-05...
406/977 [===========>..................] - ETA: 7:38 - loss: 0.3931 - accuracy: 0.8214
In [ ]:
predictor = ktrain.get_predictor(learner.model, preproc)
In [ ]:
predictor.get_classes()
Out[ ]:
['not_label', 'label']
In [ ]:
def BestThreshold(y_test,pred_prob):
    fpr, tpr, thresholds = roc_curve(y_test, pred_prob)
    roc_auc_score_ = []
    for thres in thresholds:
        y_pred = np.where(pred_prob>thres,1,0)
        roc_auc_score_.append(f1_score(y_test, y_pred))

    roc_auc_score_ = pd.concat([pd.Series(thresholds), pd.Series(roc_auc_score_)],
                            axis=1)
    roc_auc_score_.columns = ['thresholds', 'f1_score']
    roc_auc_score_.sort_values(by='f1_score', ascending=False, inplace=True)
    return roc_auc_score_.head()
In [ ]:
pred_prob = predictor.predict_proba(validation_dataset['text'].values)[:,1]
In [ ]:
y_val=validation_dataset['label']
In [ ]:
topthres=BestThreshold(y_val,pred_prob)
topthres
Out[ ]:
thresholds f1_score
341 0.299314 0.664047
344 0.290478 0.663625
342 0.299148 0.663613
339 0.300950 0.663607
343 0.295589 0.663185
In [ ]:
y_pred = np.where(pred_prob>0.299314,1,0)
In [ ]:
#0.383457 0.855 only train 
#0.375026 0.850 only train
In [ ]:
# Getting F1 & Accuracy score of validation predictions
f1 = f1_score(y_val, y_pred)
accuracy = accuracy_score(y_val, y_pred)

print(f"Validation F1 Score  : {f1} and Accuracy Score {accuracy}")
Validation F1 Score  : 0.6636125654450261 and Accuracy Score 0.8520011517420097

Submitting Results 📄

Okay, this is the last section 😌 , let's get out testing results from the model real quick and submit our prediction directly using AIcrowd CLI

In [ ]:
# By settings is_train=False, the create_data function will only output the features as setuped in the function
pred_prob_test = predictor.predict_proba(test_dataset['text'].values)[:,1]
test_predictions=np.where(pred_prob_test>0.299314,1,0)
In [ ]:
len(test_predictions)
Out[ ]:
8682
In [ ]:
# Applying the predictions to the labels column of the sample submission 
test_dataset['label'] = test_predictions
test_dataset
Out[ ]:
text label
0 I was already over the edge with Cassie Zamora... 1
1 I think you're right. She has oodles of cash a... 0
2 Haha I love this. I used to give mine phone bo... 0
3 Probably out of desperation as they going no a... 0
4 Sorry !! You’re real good at that!! 0
... ... ...
8677 Yeah no...I would find it very demeaning 1
8678 This is how mafia works 0
8679 Ah thanks 👍🏻 0
8680 I ask them straight why they don't respect my ... 0
8681 Annette Acosta also tends to out vote Annette ... 0

8682 rows × 2 columns

Note : Please make sure that there should be filename submission.csv in assets folder before submitting it

In [ ]:
import os
In [ ]:
# Saving the sample submission in assets directory
!mkdir assets
test_dataset.to_csv(os.path.join("assets", "submission.csv"), index=False)

Uploading the Results

Note : Please save the notebook before submitting it (Ctrl + S)

In [ ]:
!aicrowd notebook submit -c emotion-detection -a assets --no-verify
Mounting Google Drive 💾
Your Google Drive will be mounted to access the colab notebook
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.activity.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fexperimentsandconfigs%20https%3a%2f%2fwww.googleapis.com%2fauth%2fphotos.native&response_type=code

Enter your authorization code:
4/1AX4XfWjX1KJbVO-ezD88t1_IXx5DkN5kLrro5QM0kYudQ448vOySl4tJKck
Mounted at /content/drive
Using notebook: /content/drive/MyDrive/Colab Notebooks/Emotional_Classification_using_BERT.ipynb for submission...
Scrubbing API keys from the notebook...
Collecting notebook...
submission.zip ━━━━━━━━━━━━━━━━━━━━━━━━ 100.0%1.3/1.3 MB1.4 MB/s0:00:00
                                                  ╭─────────────────────────╮                                                  
                                                  │ Successfully submitted! │                                                  
                                                  ╰─────────────────────────╯                                                  
                                                        Important links                                                        
┌──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/submissions/148298              │
│                  │                                                                                                          │
│  All submissions │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/submissions?my_submissions=true │
│                  │                                                                                                          │
│      Leaderboard │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/leaderboards                    │
│                  │                                                                                                          │
│ Discussion forum │ https://discourse.aicrowd.com/c/ai-blitz-9                                                               │
│                  │                                                                                                          │
│   Challenge page │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection                                 │
└──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Congratulations 🎉 you did it, but there still a lot of improvement that can be made, data exploration is one of the most import pipelines in machine learning, especially in competitions, so maybe see if there is data imbalance (Have discussed in this notebook that how can you do that although i have commented that portion, you can try with different model), how minimize it's effects, maybe looking first few rows to each dataset. Or maybe improving the score, have fun!

In [ ]:


Comments

You must login before you can post a comment.

Execute