
Research Paper Classification

Solution for submission 147910

A detailed solution for submission 147910, submitted for the challenge Research Paper Classification

aman_patkar

Setup AIcrowd Utilities 🛠

Here we install the AIcrowd CLI, which we will use to download the challenge dataset.

In [1]:
!nvidia-smi
Tue Jun 22 15:26:45 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
In [2]:
!pip install -q -U aicrowd-cli
     |████████████████████████████████| 51kB 8.1MB/s 
     |████████████████████████████████| 61kB 10.3MB/s 
     |████████████████████████████████| 81kB 11.2MB/s 
     |████████████████████████████████| 61kB 9.7MB/s 
     |████████████████████████████████| 215kB 51.0MB/s 
     |████████████████████████████████| 174kB 54.0MB/s 
     |████████████████████████████████| 51kB 8.9MB/s 
     |████████████████████████████████| 71kB 11.7MB/s 
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.25.1 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.

Downloading Dataset

First, as in the previous challenge, we use the AIcrowd Python library, which lets us download the dataset by simply entering our API key.

In [3]:

API Key valid
Saved API Key successfully!
In [4]:
!pip install ktrain
Collecting ktrain
  Downloading https://files.pythonhosted.org/packages/4c/88/10d29578f47d0d140bf669d5598e9f5a50465ddc423b32031c65e840d003/ktrain-0.26.3.tar.gz (25.3MB)
     |████████████████████████████████| 25.3MB 121kB/s 
Collecting scikit-learn==0.23.2
  Downloading https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8MB)
     |████████████████████████████████| 6.8MB 41.7MB/s 
Requirement already satisfied: matplotlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from ktrain) (3.2.2)
Requirement already satisfied: pandas>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from ktrain) (1.1.5)
Requirement already satisfied: fastprogress>=0.1.21 in /usr/local/lib/python3.7/dist-packages (from ktrain) (1.0.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from ktrain) (2.25.1)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from ktrain) (1.0.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from ktrain) (20.9)
Requirement already satisfied: ipython in /usr/local/lib/python3.7/dist-packages (from ktrain) (5.5.0)
Collecting langdetect
  Downloading https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz (981kB)
     |████████████████████████████████| 983kB 43.6MB/s 
Requirement already satisfied: jieba in /usr/local/lib/python3.7/dist-packages (from ktrain) (0.42.1)
Collecting cchardet
  Downloading https://files.pythonhosted.org/packages/80/72/a4fba7559978de00cf44081c548c5d294bf00ac7dcda2db405d2baa8c67a/cchardet-2.1.7-cp37-cp37m-manylinux2010_x86_64.whl (263kB)
     |████████████████████████████████| 266kB 48.4MB/s 
Collecting syntok
  Downloading https://files.pythonhosted.org/packages/8c/76/a49e73a04b3e3a14ce232e8e28a1587f8108baa665644fe8c40e307e792e/syntok-1.3.1.tar.gz
Collecting seqeval==0.0.19
  Downloading https://files.pythonhosted.org/packages/93/e5/b7705156a77f742cfe4fc6f22d0c71591edb2d243328dff2f8fc0f933ab6/seqeval-0.0.19.tar.gz
Collecting transformers<=4.3.3,>=4.0.0
  Downloading https://files.pythonhosted.org/packages/f9/54/5ca07ec9569d2f232f3166de5457b63943882f7950ddfcc887732fc7fb23/transformers-4.3.3-py3-none-any.whl (1.9MB)
     |████████████████████████████████| 1.9MB 52.1MB/s 
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/ac/aa/1437691b0c7c83086ebb79ce2da16e00bef024f24fec2a5161c35476f499/sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 55.5MB/s 
Collecting keras_bert>=0.86.0
  Downloading https://files.pythonhosted.org/packages/6a/e4/3b2e2927c15c22f44005cb0ab0eaf2f7e623ea2b6488e4b7c5aca6c162c2/keras-bert-0.88.0.tar.gz
Requirement already satisfied: networkx>=2.3 in /usr/local/lib/python3.7/dist-packages (from ktrain) (2.5.1)
Collecting whoosh
  Downloading https://files.pythonhosted.org/packages/ba/19/24d0f1f454a2c1eb689ca28d2f178db81e5024f42d82729a4ff6771155cf/Whoosh-2.7.4-py2.py3-none-any.whl (468kB)
     |████████████████████████████████| 471kB 48.7MB/s 
Collecting threadpoolctl>=2.0.0
  Downloading https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from scikit-learn==0.23.2->ktrain) (1.4.1)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.7/dist-packages (from scikit-learn==0.23.2->ktrain) (1.19.5)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.0->ktrain) (1.3.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.0.1->ktrain) (2018.9)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (2021.5.30)
Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->ktrain) (3.0.4)
Requirement already satisfied: pygments in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (2.6.1)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (57.0.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (4.4.2)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (5.0.5)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (0.7.5)
Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (0.8.1)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (1.0.18)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/local/lib/python3.7/dist-packages (from ipython->ktrain) (4.8.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from langdetect->ktrain) (1.15.0)
Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from syntok->ktrain) (2019.12.20)
Requirement already satisfied: Keras>=2.2.4 in /usr/local/lib/python3.7/dist-packages (from seqeval==0.0.19->ktrain) (2.4.3)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers<=4.3.3,>=4.0.0->ktrain) (4.61.1)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from transformers<=4.3.3,>=4.0.0->ktrain) (4.5.0)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/d4/e2/df3543e8ffdab68f5acc73f613de9c2b155ac47f162e725dcac87c521c11/tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3MB)
     |████████████████████████████████| 3.3MB 55.8MB/s 
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers<=4.3.3,>=4.0.0->ktrain) (3.0.12)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)
     |████████████████████████████████| 901kB 37.4MB/s 
Collecting keras-transformer>=0.39.0
  Downloading https://files.pythonhosted.org/packages/8a/35/6b079e920fe09a9349028bc2f209447e5636d90e29c5cf060bcc3177803a/keras-transformer-0.39.0.tar.gz
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.7/dist-packages (from traitlets>=4.2->ipython->ktrain) (0.2.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython->ktrain) (0.2.5)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.7/dist-packages (from pexpect; sys_platform != "win32"->ipython->ktrain) (0.7.0)
Requirement already satisfied: h5py in /usr/local/lib/python3.7/dist-packages (from Keras>=2.2.4->seqeval==0.0.19->ktrain) (3.1.0)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from Keras>=2.2.4->seqeval==0.0.19->ktrain) (3.13)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers<=4.3.3,>=4.0.0->ktrain) (3.4.1)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers<=4.3.3,>=4.0.0->ktrain) (3.7.4.3)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers<=4.3.3,>=4.0.0->ktrain) (7.1.2)
Collecting keras-pos-embd>=0.12.0
  Downloading https://files.pythonhosted.org/packages/d8/d2/1cc072ea68b573f366e08936177a33e237e66fa7d5338289d4bee64696cf/keras-pos-embd-0.12.0.tar.gz
Collecting keras-multi-head>=0.28.0
  Downloading https://files.pythonhosted.org/packages/a5/e6/a83f26b2e1582de237b125f595874d808e40698f31d44d5903e872d5b64d/keras-multi-head-0.28.0.tar.gz
Collecting keras-layer-normalization>=0.15.0
  Downloading https://files.pythonhosted.org/packages/33/e1/0da586d544a0940a56a2f4aa704b7dbd95eaa8ceda6168b48f5ac95e6608/keras-layer-normalization-0.15.0.tar.gz
Collecting keras-position-wise-feed-forward>=0.7.0
  Downloading https://files.pythonhosted.org/packages/58/02/cd3e7e51cf45d3825818384a2f7d9c340b60c9bf55a5682b7318e1c16eab/keras-position-wise-feed-forward-0.7.0.tar.gz
Collecting keras-embed-sim>=0.9.0
  Downloading https://files.pythonhosted.org/packages/2d/48/78f6d134f1ede597d91186819c9e428ada51cd8d9ea28e5faf37ed2ee602/keras-embed-sim-0.9.0.tar.gz
Requirement already satisfied: cached-property; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from h5py->Keras>=2.2.4->seqeval==0.0.19->ktrain) (1.5.2)
Collecting keras-self-attention>=0.50.0
  Downloading https://files.pythonhosted.org/packages/ea/75/e6bc5b43ee968fef714f2f10a2a1674639ec85d2428cc47b2fe1f9af0115/keras-self-attention-0.50.0.tar.gz
Building wheels for collected packages: ktrain, langdetect, syntok, seqeval, keras-bert, keras-transformer, keras-pos-embd, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-self-attention
  Building wheel for ktrain (setup.py) ... done
  Created wheel for ktrain: filename=ktrain-0.26.3-cp37-none-any.whl size=25282390 sha256=db9c40e5eeb81b5cdeac83a185dda1692e6acc2bcc1f99a7b19da68ecb1bbbd3
  Stored in directory: /root/.cache/pip/wheels/16/05/be/d6e659b3349016b1059e19fa028f165af4eeae2c196f329112
  Building wheel for langdetect (setup.py) ... done
  Created wheel for langdetect: filename=langdetect-1.0.9-cp37-none-any.whl size=993242 sha256=1301ee25c233af2664f367b17e615a2995805c33f52a651252efb32a98507cfc
  Stored in directory: /root/.cache/pip/wheels/7e/18/13/038c34057808931c7ddc6c92d3aa015cf1a498df5a70268996
  Building wheel for syntok (setup.py) ... done
  Created wheel for syntok: filename=syntok-1.3.1-cp37-none-any.whl size=20919 sha256=b9231a2fa3b176913f4b3d839ac330c88af1fb5ce054b5813cda91b63b1e23ce
  Stored in directory: /root/.cache/pip/wheels/51/c6/a4/be1920586c49469846bcd2888200bdecfe109ec421dab9be2d
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-0.0.19-cp37-none-any.whl size=9932 sha256=924d70a54744311031a5c4111d96f6fde247674cc136dff31ecf9d9a7b52892d
  Stored in directory: /root/.cache/pip/wheels/8d/1f/bf/1198beceed805a2099060975f6281d1b01046dd279e19c97be
  Building wheel for keras-bert (setup.py) ... done
  Created wheel for keras-bert: filename=keras_bert-0.88.0-cp37-none-any.whl size=34206 sha256=5a7b7cf104d2bd2b13aff79f062deaeb6c6b095f58485ddf544383447ed39121
  Stored in directory: /root/.cache/pip/wheels/7f/d8/86/b4d91b941f6f3256c487b258d5e4268a3301203b717dd11f11
  Building wheel for keras-transformer (setup.py) ... done
  Created wheel for keras-transformer: filename=keras_transformer-0.39.0-cp37-none-any.whl size=12841 sha256=8650b553d3499a26ea9af41dc020b4a71479eccb379fc090d6a5c7f76bba66b2
  Stored in directory: /root/.cache/pip/wheels/77/42/35/d33c5907bca04ac5742e9eceefb644b680286de26728506a70
  Building wheel for keras-pos-embd (setup.py) ... done
  Created wheel for keras-pos-embd: filename=keras_pos_embd-0.12.0-cp37-none-any.whl size=7471 sha256=a4886bc14822d2b87e518c1bf3d5b01458f270611945710f9353028a04cca39f
  Stored in directory: /root/.cache/pip/wheels/36/d8/36/06ed09215806dca9ff504d8c0dda5da68d7f2c67d34a231d82
  Building wheel for keras-multi-head (setup.py) ... done
  Created wheel for keras-multi-head: filename=keras_multi_head-0.28.0-cp37-none-any.whl size=15559 sha256=8db93c3ac4311c54b35c7cbb7fefbe713a7b3fc8cea60c182e96965537d3ba5b
  Stored in directory: /root/.cache/pip/wheels/ec/92/bd/b3407bc29501f7e28eb970a6c425a9a375485c5d8197df6a8f
  Building wheel for keras-layer-normalization (setup.py) ... done
  Created wheel for keras-layer-normalization: filename=keras_layer_normalization-0.15.0-cp37-none-any.whl size=5224 sha256=5d894eb6427045fa323cb63c8440df6e6b836213800e30fb88b14e9cfa839ef1
  Stored in directory: /root/.cache/pip/wheels/de/ea/db/833c8a9b8326e703e9f8a78c0d4153294e6a1b1f97a1836397
  Building wheel for keras-position-wise-feed-forward (setup.py) ... done
  Created wheel for keras-position-wise-feed-forward: filename=keras_position_wise_feed_forward-0.7.0-cp37-none-any.whl size=5542 sha256=58a416bee0b0d9f01d6237f1500f852179d0d64f1d9ad586dc213d2caf3e5f59
  Stored in directory: /root/.cache/pip/wheels/d2/d2/f6/58ce0aae0055dbccba8b40e62a6c22ab997105ad8c431a9e80
  Building wheel for keras-embed-sim (setup.py) ... done
  Created wheel for keras-embed-sim: filename=keras_embed_sim-0.9.0-cp37-none-any.whl size=4505 sha256=0dcabdb2aad88a3468cad06e689e1034ce5853cf34cd21a532403bd2dc800280
  Stored in directory: /root/.cache/pip/wheels/c1/d5/7d/bef5ee93c88bc6150294cc74cbb081647c505bf816918dd7ff
  Building wheel for keras-self-attention (setup.py) ... done
  Created wheel for keras-self-attention: filename=keras_self_attention-0.50.0-cp37-none-any.whl size=19416 sha256=db932d7b3b12e4abe6433cd7e452ad3b3469ac83423f1d1e286014ff65bfa8b3
  Stored in directory: /root/.cache/pip/wheels/29/93/0d/891573db60f74d0e43bd7db1496c3ef898f8b5946a4c24cbda
Successfully built ktrain langdetect syntok seqeval keras-bert keras-transformer keras-pos-embd keras-multi-head keras-layer-normalization keras-position-wise-feed-forward keras-embed-sim keras-self-attention
Installing collected packages: threadpoolctl, scikit-learn, langdetect, cchardet, syntok, seqeval, tokenizers, sacremoses, transformers, sentencepiece, keras-pos-embd, keras-self-attention, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-transformer, keras-bert, whoosh, ktrain
  Found existing installation: scikit-learn 0.22.2.post1
    Uninstalling scikit-learn-0.22.2.post1:
      Successfully uninstalled scikit-learn-0.22.2.post1
Successfully installed cchardet-2.1.7 keras-bert-0.88.0 keras-embed-sim-0.9.0 keras-layer-normalization-0.15.0 keras-multi-head-0.28.0 keras-pos-embd-0.12.0 keras-position-wise-feed-forward-0.7.0 keras-self-attention-0.50.0 keras-transformer-0.39.0 ktrain-0.26.3 langdetect-1.0.9 sacremoses-0.0.45 scikit-learn-0.23.2 sentencepiece-0.1.96 seqeval-0.0.19 syntok-1.3.1 threadpoolctl-2.1.0 tokenizers-0.10.3 transformers-4.3.3 whoosh-2.7.4
In [5]:
import tensorflow as tf
In [6]:
tf.__version__
Out[6]:
'2.5.0'
In [8]:
import pandas as pd
import numpy as np
import ktrain
from ktrain import text
import tensorflow as tf
from sklearn.metrics import accuracy_score, confusion_matrix,classification_report,f1_score,roc_curve
In [9]:
# Download the dataset (remove the data and assets folders if they already exist, then recreate them)
!rm -rf data
!mkdir data
!rm -rf assets
!mkdir assets
test.csv:   0% 0.00/3.01M [00:00<?, ?B/s]
val.csv:   0% 0.00/883k [00:00<?, ?B/s]

test.csv: 100% 3.01M/3.01M [00:00<00:00, 6.28MB/s]

val.csv: 100% 883k/883k [00:00<00:00, 2.41MB/s]


train.csv: 100% 8.77M/8.77M [00:00<00:00, 9.92MB/s]
In [10]:
train_df = pd.read_csv("data/train.csv")
val_df = pd.read_csv("data/val.csv")
test_df = pd.read_csv("data/test.csv")
train_df.head()
Out[10]:
id text label
0 0 we propose deep network models and learning al... 3
1 1 multi-distance information computed by the MDL... 3
2 2 traditional solutions consider dense pedestria... 2
3 3 in this paper, is used the lagrangian classica... 2
4 4 the aim of this work is to determine how vulne... 3
In [11]:
word_count=train_df['text'].apply(lambda x:len(x.split()))
In [13]:
max(word_count)
Out[13]:
90
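The longest abstract in the training set is 90 words, so setting `maxlen = 90` in the next cell truncates nothing. A minimal sketch of this kind of length check, using hypothetical word counts in place of the real `train_df['text']` column:

```python
import numpy as np

# Hypothetical word counts standing in for
# train_df['text'].apply(lambda x: len(x.split())).
word_count = np.array([12, 45, 88, 90, 30, 67, 90, 55])

print(word_count.max())               # longest text; drives the choice of maxlen
print(np.percentile(word_count, 95))  # how much a smaller maxlen would truncate
```

If the maximum were far above the typical length, a percentile-based `maxlen` would trade a little truncation for much less padding.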
In [14]:
(X_train, y_train), (X_test, y_test), preproc = text.texts_from_df(train_df=train_df,
                                                                   text_column ='text',
                                                                   label_columns ='label',
                                                                   val_df = val_df,
                                                                   maxlen = 90,
                                                                   preprocess_mode = 'bert')
['label_0', 'label_1', 'label_2', 'label_3']
   label_0  label_1  label_2  label_3
0      0.0      0.0      0.0      1.0
1      0.0      0.0      0.0      1.0
2      0.0      0.0      1.0      0.0
3      0.0      0.0      1.0      0.0
4      0.0      0.0      0.0      1.0
['label_0', 'label_1', 'label_2', 'label_3']
   label_0  label_1  label_2  label_3
0      0.0      1.0      0.0      0.0
1      0.0      0.0      0.0      1.0
2      0.0      0.0      1.0      0.0
3      0.0      0.0      0.0      1.0
4      0.0      0.0      0.0      1.0
downloading pretrained BERT model (uncased_L-12_H-768_A-12.zip)...
[██████████████████████████████████████████████████]
extracting pretrained BERT model...
done.

cleanup downloaded zip...
done.

preprocessing train...
language: en
done.
Is Multi-Label? False
preprocessing test...
language: en
done.
In [15]:
model = text.text_classifier(name = 'bert',
                             train_data = (X_train, y_train),
                             preproc = preproc)
Is Multi-Label? False
maxlen is 90
done.
In [16]:
learner = ktrain.get_learner(model=model, train_data=(X_train, y_train),
                   val_data = (X_test, y_test),
                   batch_size = 6)
In [32]:
learner.fit_onecycle(lr = 2e-5, epochs = 1)

begin training using onecycle policy with max lr of 2e-05...
5250/5250 [==============================] - 1658s 316ms/step - loss: 0.2879 - accuracy: 0.8983 - val_loss: 0.3152 - val_accuracy: 0.8900
Out[32]:
<tensorflow.python.keras.callbacks.History at 0x7fa78e3e1490>
In [33]:
predictor = ktrain.get_predictor(learner.model, preproc)
In [34]:
predictor.get_classes()
Out[34]:
['label_0', 'label_1', 'label_2', 'label_3']
In [35]:
pred_val = predictor.predict(val_df['text'].values)
In [36]:
prediction_val=[]
for pred in pred_val:
  if pred=='label_0':
    prediction_val.append(0)
  elif pred=='label_1':
    prediction_val.append(1)
  elif pred=='label_2':
    prediction_val.append(2)
  else:
    prediction_val.append(3)
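The if/elif chain above works, but since the class names follow the pattern `label_<k>`, the integer label can be recovered directly. A sketch of the equivalent one-liner, with toy predictions standing in for `pred_val`:

```python
# Toy predictions standing in for the output of predictor.predict(...).
pred_val = ['label_0', 'label_3', 'label_2', 'label_1']

# 'label_3' -> 3, etc.; equivalent to the if/elif chain for these class names.
prediction_val = [int(p.split('_')[1]) for p in pred_val]
print(prediction_val)  # → [0, 3, 2, 1]
```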
In [37]:
y_val=val_df['label']
In [38]:
# Compute the weighted F1 and accuracy of the validation predictions
f1 = f1_score(y_val, prediction_val,average='weighted')
accuracy = accuracy_score(y_val, prediction_val)
f1,accuracy
Out[38]:
(0.8888655097387193, 0.89)
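The same metrics can be reproduced on any pair of label lists; here is a small self-contained sketch with toy data standing in for `y_val` and `prediction_val`. The notebook also imports `classification_report`, which adds a per-class breakdown:

```python
from sklearn.metrics import accuracy_score, f1_score, classification_report

# Toy ground truth and predictions standing in for y_val / prediction_val.
y_true = [0, 1, 2, 3, 3, 2]
y_pred = [0, 1, 2, 3, 2, 2]

print(accuracy_score(y_true, y_pred))                # fraction of exact matches
print(f1_score(y_true, y_pred, average='weighted'))  # F1 weighted by class support
print(classification_report(y_true, y_pred))         # per-class precision/recall/F1
```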

Making Submission

In [ ]:
pred_test = predictor.predict(test_df['text'].values)
In [ ]:
prediction_test=[]
for pred in pred_test:
  if pred=='label_0':
    prediction_test.append(0)
  elif pred=='label_1':
    prediction_test.append(1)
  elif pred=='label_2':
    prediction_test.append(2)
  else:
    prediction_test.append(3)
In [ ]:
# Put the predictions into the label column of the test dataset
test_df['label'] = prediction_test
test_df
In [ ]:

Submitting to AIcrowd

In [ ]:
import os
In [ ]:
# Save the results as assets/submission.csv
test_df.to_csv(os.path.join("assets", 'submission.csv'), index=False)
