
Emotion Detection

A detailed solution for submission 147640 submitted for challenge Emotion Detection

aman_patkar

Sentiment Classification Using BERT

This notebook adapts a BERT sentiment-classification tutorial, originally built on the IMDB Movie Reviews dataset, to the Emotion Detection challenge data.

What is BERT

Bidirectional Encoder Representations from Transformers (BERT) is a technique for NLP (Natural Language Processing) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google. Google is leveraging BERT to better understand user searches.

BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
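The "masked language model" objective behind this bidirectionality can be illustrated with a toy sketch in plain Python. The real BERT masks WordPiece tokens with a more elaborate 80/10/10 replacement scheme; the tokens, 15% rate, and seed here are purely illustrative:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Toy masked-LM objective: hide ~15% of tokens for the model to predict."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append('[MASK]')   # model must predict this from both sides
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)      # unmasked positions carry no target
    return masked, targets

masked, targets = mask_tokens("the movie was surprisingly good".split())
print(masked)   # → ['[MASK]', 'movie', 'was', 'surprisingly', 'good']
```

Because the model sees context on both sides of each `[MASK]`, it learns bidirectional representations rather than the left-to-right ones of a conventional language model.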


There are two existing strategies for applying pre-trained language representations to downstream tasks: feature-based and fine-tuning.

The feature-based approach, such as ELMo (Peters et al., 2018a), uses task-specific architectures that include the pre-trained representations as additional features. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pretrained parameters.

The two approaches share the same objective function during pre-training, where they use unidirectional language models to learn general language representations.

Why BERT

  • Accurate
  • Can be used for a wide variety of tasks
  • Easy to use
  • A game changer in NLP

Additional Reading

Video Lecture: BERT NLP Tutorial 1- Introduction | BERT Machine Learning | KGP Talkie

https://www.youtube.com/watch?v=h_U27jBNYI4

Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

https://arxiv.org/abs/1810.04805

Understanding searches better than ever before:

https://www.blog.google/products/search/search-language-understanding-bert/

A good resource to read more about BERT:

http://jalammar.github.io/illustrated-bert/


What is ktrain

ktrain is a library to help build, train, debug, and deploy neural networks in the deep learning software framework, Keras.

ktrain uses tf.keras in TensorFlow rather than standalone Keras. Inspired by the fastai library, ktrain allows you, with only a few lines of code, to easily:

  • estimate an optimal learning rate for your model given your data using a learning rate finder
  • employ learning rate schedules such as the triangular learning rate policy, 1cycle policy, and SGDR to more effectively train your model
  • employ fast and easy-to-use pre-canned models for both text classification (e.g., NBSVM, fastText, GRU with pretrained word embeddings) and image classification (e.g., ResNet, Wide Residual Networks, Inception)
  • load and preprocess text and image data from a variety of formats

  • inspect data points that were misclassified to help improve your model

  • leverage a simple prediction API for saving and deploying both models and data-preprocessing steps to make predictions on new raw data

Notebook Setup

In [1]:
!pip install ktrain
Collecting ktrain
  Downloading ktrain-0.26.3.tar.gz (25.3MB)
  ...
Successfully built ktrain langdetect syntok seqeval keras-bert keras-transformer keras-pos-embd keras-multi-head keras-layer-normalization keras-position-wise-feed-forward keras-embed-sim keras-self-attention
Successfully installed cchardet-2.1.7 keras-bert-0.88.0 keras-embed-sim-0.9.0 keras-layer-normalization-0.15.0 keras-multi-head-0.28.0 keras-pos-embd-0.12.0 keras-position-wise-feed-forward-0.7.0 keras-self-attention-0.50.0 keras-transformer-0.39.0 ktrain-0.26.3 langdetect-1.0.9 sacremoses-0.0.45 scikit-learn-0.23.2 sentencepiece-0.1.96 seqeval-0.0.19 syntok-1.3.1 threadpoolctl-2.1.0 tokenizers-0.10.3 transformers-4.3.3 whoosh-2.7.4
In [2]:
import tensorflow as tf
In [3]:
tf.__version__
Out[3]:
'2.5.0'
In [5]:
import pandas as pd
import numpy as np
import ktrain
from ktrain import text
import tensorflow as tf
In [6]:
!pip install aicrowd-cli
Collecting aicrowd-cli
  Downloading aicrowd_cli-0.1.7-py3-none-any.whl (49kB)
  ...
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.25.1 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
Successfully installed aicrowd-cli-0.1.7 colorama-0.4.4 commonmark-0.9.1 gitdb-4.0.7 gitpython-3.1.18 requests-2.25.1 requests-toolbelt-0.9.1 rich-10.4.0 smmap-4.0.0 tqdm-4.61.1
In [7]:

API Key valid
Saved API Key successfully!
In [8]:
# Downloading the Dataset
!mkdir data
train.csv:   0% 0.00/2.30M [00:00<?, ?B/s]
test.csv:   0% 0.00/642k [00:00<?, ?B/s]

val.csv:   0% 0.00/262k [00:00<?, ?B/s]

val.csv: 100% 262k/262k [00:00<00:00, 1.63MB/s]

test.csv: 100% 642k/642k [00:00<00:00, 2.81MB/s]
train.csv: 100% 2.30M/2.30M [00:00<00:00, 6.75MB/s]
In [9]:
train_dataset = pd.read_csv("data/train.csv")
validation_dataset = pd.read_csv("data/val.csv")
test_dataset = pd.read_csv("data/test.csv")
train_dataset.head()
Out[9]:
text label
0 takes no time to copy/paste a press release 0
1 You're delusional 1
2 Jazz fan here. I completely feel. Lindsay Mann... 0
3 ah i was also confused but i think they mean f... 0
4 Thank you so much. ♥️ that means a lot. 0
In [ ]:
# The cells below are leftovers from the original IMDB tutorial and are not used for this challenge
data_train = pd.read_excel('/content/IMDB-Movie-Reviews-Large-Dataset-50k/train.xlsx', dtype = str)
In [ ]:
data_test = pd.read_excel('/content/IMDB-Movie-Reviews-Large-Dataset-50k/test.xlsx', dtype = str)
In [ ]:
data_train.tail()
Out[ ]:
Reviews Sentiment
24995 Everyone plays their part pretty well in this ... pos
24996 It happened with Assault on Prescient 13 in 20... neg
24997 My God. This movie was awful. I can't complain... neg
24998 When I first popped in Happy Birthday to Me, I... neg
24999 So why does this show suck? Unfortunately, tha... neg
In [ ]:
data_test.head()
Out[ ]:
Reviews Sentiment
0 Who would have thought that a movie about a ma... pos
1 After realizing what is going on around us ...... pos
2 I grew up watching the original Disney Cindere... neg
3 David Mamet wrote the screenplay and made his ... pos
4 Admittedly, I didn't have high expectations of... neg
In [ ]:
data_test.shape, data_train.shape
Out[ ]:
((25000, 2), (25000, 2))
In [ ]:

In [12]:
(X_train, y_train), (X_test, y_test), preproc = text.texts_from_df(train_df=train_dataset,
                                                                   text_column = 'text',
                                                                   label_columns = 'label',
                                                                   val_df = validation_dataset,
                                                                   maxlen = 40,
                                                                   preprocess_mode = 'bert')
['not_label', 'label']
   not_label  label
0        1.0    0.0
1        0.0    1.0
2        1.0    0.0
3        1.0    0.0
4        1.0    0.0
['not_label', 'label']
   not_label  label
0        1.0    0.0
1        0.0    1.0
2        1.0    0.0
3        1.0    0.0
4        1.0    0.0
downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
[██████████████████████████████████████████████████]
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en
done.
Is Multi-Label? False
preprocessing test...
language: en
done.
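Note how ktrain one-hot encodes the single binary label column into `['not_label', 'label']` pairs, as the printed preview shows. For a binary label the mapping is simply (a plain-Python illustration of the preview above):

```python
labels = [0, 1, 0, 0, 0]                 # original label column values
one_hot = [(1 - y, y) for y in labels]   # (not_label, label) columns
print(one_hot)  # → [(1, 0), (0, 1), (1, 0), (1, 0), (1, 0)]
```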
In [13]:
model = text.text_classifier(name = 'bert',
                             train_data = (X_train, y_train),
                             preproc = preproc)
Is Multi-Label? False
maxlen is 40
done.
In [14]:
learner = ktrain.get_learner(model=model, train_data=(X_train, y_train),
                   val_data = (X_test, y_test),
                   batch_size = 6)
In [ ]:
# find the best learning rate (optional)
# learner.lr_find()
# learner.lr_plot()

# note: lr_find repeatedly trains the model, so it can take a very long time.
In [15]:
learner.fit_onecycle(lr = 2e-5, epochs = 1)

begin training using onecycle policy with max lr of 2e-05...
5210/5210 [==============================] - 1744s 331ms/step - loss: 0.3377 - accuracy: 0.8501 - val_loss: 0.2919 - val_accuracy: 0.8707
Out[15]:
<tensorflow.python.keras.callbacks.History at 0x7f92c68b6a10>
In [16]:
predictor = ktrain.get_predictor(learner.model, preproc)
In [ ]:
#data = ['this movie was horrible, the plot was really boring. acting was okay',
#        'the fild is really sucked. there is not plot and acting was bad',
#        'what a beautiful movie. great plot. acting was good. will see it again']
In [24]:
test_dataset['text'].values
Out[24]:
array(['I was already over the edge with Cassie Zamora. Just showing my disdain for two terrible individuals.',
       "I think you're right. She has oodles of cash and young grandchildren to enjoy. Going through that hideous gauntlet again probably isn't that appealing.",
       'Haha I love this. I used to give mine phone books and my room would look just like this in a matter of hours. Crazy.',
       ..., 'Ah thanks 👍🏻',
       "I ask them straight why they don't respect my decision. That usually fixes it.",
       'Annette Acosta also tends to out vote Annette Acosta in polls in my experience'],
      dtype=object)
In [25]:
#predictor.predict(data)
pred=predictor.predict(test_dataset['text'].values)
In [31]:
prediction=[]
for preds in pred:
  if preds=='label':
    prediction.append(1)
  else:
    prediction.append(0)
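Since `predictor.predict` returns class-name strings, the loop above just maps `'label'` to 1 and everything else to 0. The same mapping as a one-liner (with a stand-in `pred` list for illustration):

```python
pred = ['label', 'not_label', 'label']   # stand-in for predictor output
prediction = [1 if p == 'label' else 0 for p in pred]
print(prediction)  # → [1, 0, 1]
```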
In [ ]:
# requires the sample `data` list defined (commented out) above
predictor.predict(data, return_proba=True)
Out[ ]:
array([[0.99797565, 0.00202436],
       [0.99606663, 0.00393336],
       [0.00292433, 0.9970757 ]], dtype=float32)
In [33]:
predictor.get_classes()
Out[33]:
['not_label', 'label']
In [ ]:
#predictor.save('/content/bert')
In [ ]:
#!zip -r /content/bert.zip /content/bert
  adding: content/bert/ (stored 0%)
  adding: content/bert/tf_model.h5 (deflated 11%)
  adding: content/bert/tf_model.preproc (deflated 52%)
In [ ]:
#predictor_load = ktrain.load_predictor('/content/bert')
In [ ]:
#predictor_load.get_classes()
Out[ ]:
['neg', 'pos']
In [ ]:
#predictor_load.predict(data)
Out[ ]:
['neg', 'neg', 'pos']
In [40]:
def BestThreshold(y_test, pred_prob):
    """Sweep the ROC thresholds and return the five cutoffs with the best F1."""
    fpr, tpr, thresholds = roc_curve(y_test, pred_prob)
    f1_scores = []
    for thres in thresholds:
        y_pred = np.where(pred_prob > thres, 1, 0)
        f1_scores.append(f1_score(y_test, y_pred))

    scores = pd.concat([pd.Series(thresholds), pd.Series(f1_scores)], axis=1)
    scores.columns = ['thresholds', 'f1_score']
    scores.sort_values(by='f1_score', ascending=False, inplace=True)
    return scores.head()
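BestThreshold relies on sklearn and pandas, but the underlying idea, sweeping candidate cutoffs and keeping the one that maximizes F1, fits in a few lines of dependency-free Python (toy data, illustrative only):

```python
def best_threshold(y_true, probs, candidates):
    """Return the cutoff among `candidates` that maximizes F1 (plain Python)."""
    def f1(thres):
        preds = [1 if p > thres else 0 for p in probs]
        tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
        # F1 = 2TP / (2TP + FP + FN); define as 0 when there are no true positives
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return max(candidates, key=f1)

# Toy example: the lower cutoff also catches the positive at probability 0.4
print(best_threshold([1, 0, 1], [0.9, 0.2, 0.4], [0.3, 0.5]))  # → 0.3
```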
In [43]:
pred_prob = predictor.predict_proba(validation_dataset['text'].values)[:,1]
In [44]:
from sklearn.metrics import accuracy_score, confusion_matrix,classification_report,f1_score,roc_curve
In [45]:
y_val=validation_dataset['label']
In [46]:
topthres=BestThreshold(y_val,pred_prob)
topthres
Out[46]:
thresholds f1_score
299 0.383457 0.681698
301 0.380097 0.681668
300 0.382613 0.681246
302 0.379608 0.681217
303 0.377057 0.681188
In [47]:
y_pred = np.where(pred_prob>0.383457,1,0)
In [48]:
# Getting F1 & Accuracy score of validation predictions
f1 = f1_score(y_val, y_pred)
accuracy = accuracy_score(y_val, y_pred)

print(f"Validation F1 Score  : {f1} and Accuracy Score {accuracy}")
Validation F1 Score  : 0.681245858184228 and Accuracy Score 0.8615030233227757
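As a sanity check on the two numbers above: F1 is the harmonic mean of precision and recall, so it can differ substantially from accuracy on an imbalanced dataset like this one. A plain-Python reminder of the formula (toy counts, not the actual validation confusion matrix):

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy confusion-matrix counts, for illustration only
print(round(f1_from_counts(tp=80, fp=30, fn=45), 4))  # → 0.6809
```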

Submitting Results 📄

Okay, this is the last section 😌 , let's get our test predictions from the model real quick and submit them directly using the AIcrowd CLI

In [49]:
# Predict probabilities on the test set and apply the threshold tuned on validation
pred_prob_test = predictor.predict_proba(test_dataset['text'].values)[:,1]
test_predictions = np.where(pred_prob_test > 0.383457, 1, 0)
In [50]:
len(test_predictions)
Out[50]:
8682
In [51]:
# Applying the predictions to the labels column of the sample submission 
test_dataset['label'] = test_predictions
test_dataset
Out[51]:
text label
0 I was already over the edge with Cassie Zamora... 1
1 I think you're right. She has oodles of cash a... 0
2 Haha I love this. I used to give mine phone bo... 0
3 Probably out of desperation as they going no a... 0
4 Sorry !! You’re real good at that!! 0
... ... ...
8677 Yeah no...I would find it very demeaning 1
8678 This is how mafia works 0
8679 Ah thanks 👍🏻 0
8680 I ask them straight why they don't respect my ... 0
8681 Annette Acosta also tends to out vote Annette ... 0

8682 rows × 2 columns

Note : Please make sure the file is named submission.csv and placed in the assets folder before submitting

In [ ]:
import os
In [38]:
# Saving the sample submission in assets directory
!mkdir assets
test_dataset.to_csv(os.path.join("assets", "submission.csv"), index=False)
mkdir: cannot create directory ‘assets’: File exists

Uploading the Results

Note : Please save the notebook before submitting it (Ctrl + S)

In [39]:

Mounting Google Drive 💾
Your Google Drive will be mounted to access the colab notebook
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.activity.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fexperimentsandconfigs%20https%3a%2f%2fwww.googleapis.com%2fauth%2fphotos.native&response_type=code

Enter your authorization code:
4/1AY0e-g7b_SL0N99TbwUG4ZYg-4ySeGcVMfbQhps2KUJ6zT2xJmNt8-Yowmo
Mounted at /content/drive
Using notebook: /content/drive/MyDrive/Colab Notebooks/Sentiment_Classification_using_BERT.ipynb for submission...
Scrubbing API keys from the notebook...
Collecting notebook...
submission.zip ━━━━━━━━━━━━━━━━━━━━━━━━ 100.0%1.3/1.3 MB1.7 MB/s0:00:00
                                                  ╭─────────────────────────╮                                                  
                                                  │ Successfully submitted! │                                                  
                                                  ╰─────────────────────────╯                                                  
                                                        Important links                                                        
┌──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/submissions/147639              │
│                  │                                                                                                          │
│  All submissions │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/submissions?my_submissions=true │
│                  │                                                                                                          │
│      Leaderboard │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection/leaderboards                    │
│                  │                                                                                                          │
│ Discussion forum │ https://discourse.aicrowd.com/c/ai-blitz-9                                                               │
│                  │                                                                                                          │
│   Challenge page │ https://www.aicrowd.com/challenges/ai-blitz-9/problems/emotion-detection                                 │
└──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Congratulations 🎉 you did it! There is still a lot of room for improvement, though: data exploration is one of the most important steps in machine learning, especially in competitions. Check whether the data is imbalanced (one way to handle that is discussed in this notebook, although that portion is commented out) and how to minimize its effects, look at the first few rows of each dataset, or try a different model to improve the score. Have fun!

In [ ]:

In [ ]:

In [ ]:

