Starter Code for ELEPH Practice Challenge

Author: Gauransh Kumar¶

Note: Create a copy of this notebook and use the copy for your submission. Go to File > Save a Copy in Drive to create one.

Downloading Dataset¶

Installing aicrowd-cli

In [1]:

!pip install aicrowd-cli
%load_ext aicrowd.magic

Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)

In [2]:

%aicrowd login

Please login here: https://api.aicrowd.com/auth/XAfXKIO4zGaHi_QjVYxYwNOx8DGCDE8Zw4-9xIvW7zc
Opening in existing browser session.
API Key valid
Saved API Key successfully!

In [3]:

!rm -rf data
!mkdir data
%aicrowd ds dl -c eleph -o data

Importing Libraries¶

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

In [5]:

import warnings
warnings.simplefilter('ignore')

Reading the dataset¶

Here, we read train.csv, which contains both the training samples and their labels, and test.csv, which contains the test samples. Neither file has a header row, so we pass header=None.

In [6]:

# Reading the CSV
train_data_df = pd.read_csv("data/train.csv", header=None)
test_data_df = pd.read_csv("data/test.csv", header=None)

# Preview the first rows of the training and test DataFrames
display(train_data_df.head())
display(test_data_df.head())

	0	1	2	3	4	5	6	7	8	9	...	222	223	224	225	228	229	230
0	-1.028440	-0.817687	-0.934378	0.179815	-0.801838	2.448520	-0.955764	0.173104	-0.834751	1.468270	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097	0
1	1.351240	1.609290	0.484712	-0.220993	-0.120600	0.682554	0.482247	-0.299439	-0.296004	-0.359694	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097	1
2	-0.346995	-0.389204	-0.597216	1.984550	1.651640	1.647410	-0.642140	1.627030	2.193470	-0.339238	...	-0.049855	-0.108641	-0.078862	-0.021452	-0.014952	-0.021097	0
3	0.196729	0.490943	0.903638	-0.480368	-0.050549	-0.039294	0.950828	-0.574630	-0.093726	-0.582012	...	-0.049855	-0.114025	0.861594	-0.021452	-0.014952	-0.021097	1
4	-0.012461	-0.534009	1.769360	-1.086040	-0.415996	0.137351	1.769340	-0.943016	-0.424576	0.725233	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097	1

5 rows × 231 columns

	0	1	2	3	4	5	6	7	8	9	...	222	223	224	225	228	229
0	0.252040	-0.062914	-0.106032	-1.282030	-0.118691	0.376921	-0.106347	-1.253890	-0.249070	-0.383122	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097
1	0.839039	0.832301	-0.532765	-0.705758	0.154705	-0.389010	-0.499860	-0.713806	0.419571	-0.569427	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097
2	-1.202330	-1.109760	-0.997163	-0.040994	-0.773383	0.371334	-1.030180	-0.045524	-0.803352	-1.959980	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097
3	0.266426	0.346442	0.873791	-1.449160	-0.482545	-1.454720	0.761362	-1.432490	-0.271545	1.138050	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097
4	-1.086360	-0.995242	-1.153850	-1.609160	-0.770178	-1.396840	-1.164490	-1.638440	-0.774266	1.702590	...	-0.049855	-0.114025	-0.078862	-0.021452	-0.014952	-0.021097

5 rows × 230 columns
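To make the effect of header=None concrete, here is a minimal sketch on a tiny in-memory CSV. The three-row toy data and the names toy_csv and toy_df are made up for illustration and are not part of the challenge dataset.

```python
import io

import pandas as pd

# Tiny stand-in for data/train.csv: two feature columns plus a label column.
toy_csv = io.StringIO("0.5,-1.2,0\n1.1,0.3,1\n-0.7,0.9,1\n")

# header=None tells pandas that the first row is data, not column names,
# so the columns get integer labels 0, 1, 2, ...
toy_df = pd.read_csv(toy_csv, header=None)

print(toy_df.shape)               # (rows, columns)
print(list(toy_df.columns))       # integer column labels
print(toy_df.isna().sum().sum())  # total number of missing values
```

The same isna() check can be run on train_data_df and test_data_df to confirm that no values are missing before training.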

Data Preprocessing¶

In [7]:

# Separating data from the dataframe for final training
X = train_data_df.drop([230], axis=1).to_numpy()
y = train_data_df[230].to_numpy()
print(X.shape, y.shape)

(1112, 230) (1112,)
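The cell above hard-codes the label column as 230. A small sketch of the same separation that looks up the last column instead is shown below; the toy DataFrame and the names label_col, X_toy and y_toy are illustrative assumptions.

```python
import pandas as pd

# Toy frame with the same layout as the training data: features first, label last.
toy_df = pd.DataFrame([[0.5, -1.2, 0], [1.1, 0.3, 1], [-0.7, 0.9, 1]])

# Take the last column as the label instead of hard-coding its index.
label_col = toy_df.columns[-1]

X_toy = toy_df.drop(columns=[label_col]).to_numpy()  # feature matrix
y_toy = toy_df[label_col].to_numpy()                 # label vector

print(X_toy.shape, y_toy.shape)
```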

In [8]:

# Visualising the class distribution of the training labels
sns.countplot(x=y)

Out[8]:

<AxesSubplot:ylabel='count'>
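The plot shows the class balance visually. If exact numbers are useful, one way to get them is np.unique with return_counts=True. The sketch below uses a made-up label array y_toy rather than the real labels.

```python
import numpy as np

# Made-up labels standing in for the real training labels.
y_toy = np.array([0, 1, 1, 0, 1, 1, 1, 0])

# Unique class values and how often each one occurs.
classes, counts = np.unique(y_toy, return_counts=True)
fractions = counts / counts.sum()

for cls, n, frac in zip(classes, counts, fractions):
    print(f"class {cls}: {n} samples ({frac:.1%})")
```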

Splitting the data¶

In [9]:

# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)

(889, 230)
(889,)
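The split above is random, so the validation score changes from run to run. A possible variation, not used in this baseline, is to fix random_state and pass stratify so that both sets keep the same class ratio. The sketch below uses synthetic data; the seed 42 and the 30/70 class ratio are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 5 features, 30% class 0 and 70% class 1.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 5))
y_toy = np.array([0] * 30 + [1] * 70)

# random_state makes the split repeatable; stratify keeps the class ratio
# the same in the training and validation sets.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_va.shape)
print("class 1 share in train:", y_tr.mean())
print("class 1 share in validation:", y_va.mean())
```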

In [10]:

X_train[0], y_train[0]

Out[10]:

(array([ 0.861814,  1.12245 ,  0.046207, -0.455047,  0.63451 , -1.05154 ,
         0.014337, -0.514984,  0.768931, -0.431034,  0.19529 ,  1.00914 ,
        -0.054535, -0.297912, -0.261335, -0.302051, -0.225084, -0.014952,
        -0.08711 , -0.167402, -0.200329, -0.248836, -0.021302, -0.028883,
        -0.038542, -0.023332, -0.023412,  0.      ,  0.      , -0.02702 ,
        -0.198246, -0.20685 , -0.080912, -0.327191, -0.347995, -0.26545 ,
        -0.080866, -0.224814, -0.529019, -0.430795, -0.06608 , -0.025787,
        -0.073582, -0.114571, -0.18949 , -0.160794, -0.201557,  0.      ,
        -0.021708, -0.021658, -0.041491, -0.032279, -0.071887, -0.095154,
        -0.014952,  0.      ,  0.      ,  0.      ,  0.      , -0.014952,
        -0.017265, -0.024647, -0.021215,  0.      ,  0.      ,  0.      ,
         0.      ,  0.      ,  0.      , -0.065785, -0.049974, -0.014952,
        -0.048732, -0.173228, -0.177573, -0.091775, -0.050171, -0.114846,
        -0.3317  , -0.408478, -0.242065, -0.139878,  0.      , -0.06989 ,
        -0.263975, -0.785427, -0.525182, -0.112881, -0.102253,  0.      ,
         0.      , -0.040796, -0.056023, -0.168633, -0.188284, -0.115671,
        -0.074486,  0.      ,  0.      ,  0.      , -0.014952, -0.014982,
        -0.040764, -0.042342, -0.027653, -0.014952,  0.      ,  0.      ,
         0.      ,  0.      ,  0.      , -0.021732, -0.021365, -0.018   ,
         0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,
        -0.017556, -0.021126,  0.      ,  0.      ,  0.      ,  0.      ,
         0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,
         0.      , -0.018485, -0.01594 ,  0.      ,  0.      , -0.043134,
        -0.029996, -0.09331 , -0.098605, -0.071579, -0.017934, -0.063725,
        -0.182197, -0.255619, -0.391115, -0.186669, -0.077681, -0.03269 ,
         0.      , -0.053085, -0.364504, -1.06519 ,  1.61023 ,  1.89205 ,
        -0.072849, -0.032187,  0.      ,  0.      , -0.061848, -0.174962,
        -0.192699, -0.054582, -0.032023,  0.      ,  0.      ,  0.      ,
        -0.015029, -0.021431, -0.018157,  0.      ,  0.      ,  0.      ,
         0.      ,  0.      ,  0.      , -0.019294, -0.017308,  0.      ,
         0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,
         0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,
         0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,
         0.      ,  0.      , -0.032096, -0.043388, -0.027179,  0.      ,
         0.      ,  0.      , -0.099876, -0.135405, -0.183154, -0.106346,
        -0.037709,  0.      ,  0.      ,  0.      , -0.246219,  1.45299 ,
         0.71941 , -0.088099, -0.017565,  0.      ,  0.      ,  0.      ,
        -0.049855, -0.114025, -0.078862, -0.021452,  0.      ,  0.      ,
        -0.014952, -0.021097]),
 1)

Training the Model¶

In [11]:

model = MLPClassifier()
model.fit(X_train, y_train)

Out[11]:

MLPClassifier()
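MLPClassifier() with no arguments uses scikit-learn's defaults, including one hidden layer of 100 units and at most 200 training iterations. The sketch below spells those two defaults out and adds a random_state so that repeated runs give the same weights. The synthetic data from make_classification and the name toy_model are assumptions for illustration, not part of the baseline.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data standing in for the real features.
X_toy, y_toy = make_classification(n_samples=200, n_features=20, random_state=0)

# Same architecture as the default MLPClassifier(), written out explicitly,
# plus a fixed seed for reproducible weight initialisation.
toy_model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=200, random_state=0)
toy_model.fit(X_toy, y_toy)

print(toy_model.n_layers_)  # input layer + hidden layer + output layer
print(toy_model.classes_)   # class labels seen during fit
```

If training stops at max_iter before converging, scikit-learn emits a ConvergenceWarning; raising max_iter is the usual first thing to try. Note that the warnings filter set earlier in this notebook would hide that warning.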

Validation¶

In [12]:

model.score(X_val, y_val)

Out[12]:

0.8654708520179372
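For a classifier, model.score returns the mean accuracy on the given data. Accuracy alone can hide which class the errors fall on, so a confusion matrix and F1 score are sometimes checked as well. The sketch below uses small hand-written label arrays rather than the real validation set, and it does not assume which metric the challenge itself uses.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hand-written true labels and predictions, for illustration only.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 1])

acc = accuracy_score(y_true, y_pred)   # same quantity model.score reports
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
f1 = f1_score(y_true, y_pred)          # F1 for the positive class (label 1)

print("accuracy:", acc)
print("confusion matrix:\n", cm)
print("F1:", f1)
```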

The baseline is now trained. Next, we generate predictions on the real test data and see how to submit them to the challenge.

Predictions¶

In [13]:

# Separating data from the dataframe for final testing
X_test = test_data_df.to_numpy()
print(X_test.shape)

(279, 230)

In [14]:

# Predicting the labels
predictions = model.predict(X_test)
predictions.shape

Out[14]:

(279,)
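predict returns hard class labels only. When a confidence value per sample is helpful, for example to inspect borderline cases, MLPClassifier also offers predict_proba. The sketch below shows its output shape on synthetic data; the data and the name toy_model are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data standing in for the real features.
X_toy, y_toy = make_classification(n_samples=200, n_features=20, random_state=0)

toy_model = MLPClassifier(random_state=0)
toy_model.fit(X_toy, y_toy)

# One row per sample, one column per class, in the order of toy_model.classes_.
proba = toy_model.predict_proba(X_toy[:5])
labels = toy_model.predict(X_toy[:5])

print(proba.shape)
print(np.round(proba, 3))
print(labels)
```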

In [15]:

# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"Output":predictions})
submission

Out[15]:

	Output
0	0
1	0
2	1
3	1
4	1
...	...
274	1
275	1
276	0
277	0
278	0

279 rows × 1 columns

In [16]:

# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
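The shell commands above work in Colab and on Linux. A portable alternative is to create the folder from Python with os.makedirs and then read the file back as a quick sanity check. The sketch below writes to a temporary directory so it does not touch the real assets folder; toy_predictions is a made-up array.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up predictions standing in for the model output.
toy_predictions = np.array([0, 0, 1, 1, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = os.path.join(tmp_dir, "assets")
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists

    out_path = os.path.join(out_dir, "submission.csv")
    pd.DataFrame({"Output": toy_predictions}).to_csv(out_path, index=False)

    # Read the file back to confirm the column name and the row count.
    check_df = pd.read_csv(out_path)

print(list(check_df.columns), len(check_df))
```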

Submitting our Predictions¶

Note : Please save the notebook before submitting it (Ctrl + S)

In [17]:

!!aicrowd submission create -c eleph -f assets/submission.csv

Out[17]:

['submission.csv ━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2,210/565 bytes • ? • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/eleph/submissions/172218              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/eleph/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/eleph/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/eleph                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/eleph                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 172218, 'created_at': '2022-01-17T21:15:49.414Z'}"]
