Starter Code for DCRCL Practice Challenge

Author: Gauransh Kumar

Note: Create a copy of the notebook and use that copy for your submission. Go to File > Save a Copy in Drive to create a new copy.

Downloading the Dataset

Installing aicrowd-cli

In [1]:

!pip install aicrowd-cli
%load_ext aicrowd.magic

Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)

In [2]:

%aicrowd login

Please login here: https://api.aicrowd.com/auth/qv_SPcDtZK-3QOTSpQ8YHAtbs2O-YQWItafTopccKjc
Opening in existing browser session.
API Key valid
Saved API Key successfully!

In [3]:

!rm -rf data
!mkdir data
%aicrowd ds dl -c dcrcl -o data
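Before reading the data, it can help to confirm that the download actually produced the files the notebook reads later, so an incomplete download fails with a clear message instead of a pandas error further down. This is a minimal sketch that assumes only the two files used in this notebook (data/train.csv and data/test.csv); the helper name missing_files is hypothetical.

```python
import os

def missing_files(directory, filenames=("train.csv", "test.csv")):
    """Return the expected files that are NOT present in `directory`."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(directory, name))]

# Usage (hypothetical), right after the download cell:
# missing = missing_files("data")
# if missing:
#     raise FileNotFoundError(f"Download incomplete, missing: {missing}")
```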

Importing Libraries

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here, we read train.csv, which contains both the training samples and their labels, and test.csv, which contains only the test samples (no label column).

In [5]:

# Reading the CSV
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")

# Preview the first five rows of each dataframe
display(train_data_df.head())
display(test_data_df.head())

	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	PAY_0	PAY_2	PAY_3	PAY_4	PAY_5	...	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6	default payment next month
0	30000	2	2	1	38	0	0	0	0	0	...	22810	25772	26360	1650	1700	1400	3355	1146	0	0
1	170000	1	4	1	28	0	0	0	-1	-1	...	11760	0	4902	14000	5695	11760	0	4902	6000	0
2	340000	1	1	2	38	0	0	0	-1	-1	...	1680	1920	9151	5000	7785	1699	1920	9151	187000	0
3	140000	2	2	2	29	0	0	0	2	0	...	65861	64848	64936	3000	8600	6	2500	2500	2500	0
4	130000	2	2	1	42	2	2	2	0	0	...	126792	103497	96991	6400	0	4535	3900	4300	3700	1

5 rows × 24 columns

	LIMIT_BAL	SEX	EDUCATION	MARRIAGE	AGE	...	BILL_AMT3	BILL_AMT4	BILL_AMT5	BILL_AMT6	PAY_AMT1	PAY_AMT2	PAY_AMT3	PAY_AMT4	PAY_AMT5	PAY_AMT6
0	30000	1	2	2	25	...	11581	12580	13716	14828	1500	2000	1500	1500	1500	2000
1	150000	2	1	2	26	...	116684	101581	77741	77264	4486	4235	3161	2647	2669	2669
2	70000	2	3	1	32	...	68530	69753	70111	70212	2431	3112	3000	2438	2500	2554
3	130000	1	3	2	49	...	16172	16898	11236	6944	1610	1808	7014	27	7011	4408
4	50000	2	2	2	36	...	42361	19574	20295	19439	2000	1500	1000	1800	0	1000

5 rows × 23 columns
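The two previews show that test.csv has the same 23 feature columns as train.csv, minus the "default payment next month" label. Because the data is later converted with .to_numpy(), which drops column names, the model relies purely on column order, so it is worth checking that the order matches. A minimal sketch; the helper name check_columns is hypothetical.

```python
import pandas as pd

LABEL_COL = "default payment next month"

def check_columns(train_df, test_df, label_col=LABEL_COL):
    """Verify test_df has exactly train_df's feature columns, in the same order."""
    if label_col not in train_df.columns:
        raise KeyError(f"{label_col!r} missing from the training data")
    feature_cols = [c for c in train_df.columns if c != label_col]
    if list(test_df.columns) != feature_cols:
        raise ValueError("test columns do not match the training feature columns")
    return feature_cols

# Usage (hypothetical): feature_cols = check_columns(train_data_df, test_data_df)
```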

Data Preprocessing

In [6]:

# Separating data from the dataframe for final training
X = train_data_df.drop(['default payment next month'], axis=1).to_numpy()
y = train_data_df["default payment next month"].to_numpy()
print(X.shape, y.shape)

(25500, 23) (25500,)

In [7]:

# Visualising the distribution of the label classes in the training data
# (pass the array as x=; the positional form is deprecated in seaborn)
sns.countplot(x=y)

Out[7]:

<AxesSubplot:ylabel='count'>
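The count plot is an image, so in a text export only its axes object survives. The same class balance can be printed numerically, which makes it easy to see how imbalanced the labels are. A minimal sketch using NumPy; the helper name class_proportions is hypothetical.

```python
import numpy as np

def class_proportions(labels):
    """Return {class: fraction of samples} for an array of integer labels."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): n / labels.size for c, n in zip(classes, counts)}

# Usage (hypothetical): class_proportions(y)
```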

Splitting the data

In [8]:

# Split the labelled data into training (80%) and validation (20%) sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)

(20400, 23)
(20400,)
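The split above uses no random_state, so each run produces a different split and a slightly different validation score. A common variation, shown here as an alternative rather than what this notebook does, is to fix random_state and pass stratify so both splits keep the same class ratio. The toy data below is illustrative only and stands in for X and y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels standing in for y: 80 samples of class 0, 20 of class 1
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 80 + [1] * 20)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    stratify=y_demo,   # keep the class ratio the same in both splits
    random_state=42,   # make the split reproducible across runs
)
print(y_tr.mean(), y_va.mean())
```

With stratification, both splits contain the same fraction of class-1 samples as the full toy dataset.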

In [9]:

X_train[0], y_train[0]

Out[9]:

(array([160000,      2,      3,      1,     56,     -1,     -1,     -1,
            -1,      0,      0,   2992,   4562,   -928,   1619,    928,
             0,   4562,      0,   2547,      0,      0,      0]),
 0)
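Since the features were converted to a NumPy array, a row like X_train[0] is just 23 unlabeled numbers. Pairing it with the column names from the original dataframe makes it readable. A minimal sketch; the helper name describe_row is hypothetical.

```python
import numpy as np
import pandas as pd

def describe_row(row, feature_names):
    """Pair each value of a NumPy feature row with its column name."""
    row = np.asarray(row)
    if row.shape[0] != len(feature_names):
        raise ValueError("row length does not match the number of feature names")
    return pd.Series(row, index=list(feature_names))

# Usage (hypothetical):
# feature_names = train_data_df.columns.drop("default payment next month")
# describe_row(X_train[0], feature_names)
```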

Training the Model

In [10]:

model = AdaBoostClassifier()
model.fit(X_train, y_train)

Out[10]:

AdaBoostClassifier()
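AdaBoostClassifier() is used here with scikit-learn's defaults: 50 boosting rounds (n_estimators=50) of depth-1 decision trees with learning_rate=1.0. The core idea of boosting is that each round gives the new weak learner a vote weight based on its weighted error and then increases the weight of the samples it got wrong. The sketch below shows one round of the classic discrete AdaBoost update for labels in {-1, +1}; it is illustrative only, and scikit-learn's SAMME-based implementation differs in details such as the scaling of the vote weight. The helper name adaboost_round is hypothetical.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One round of classic (discrete) AdaBoost for labels in {-1, +1}.

    Returns the weak learner's vote weight alpha and the renormalised sample weights.
    """
    weights = np.asarray(weights, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    miss = y_true != y_pred
    err = weights[miss].sum() / weights.sum()           # weighted error rate
    err = np.clip(err, 1e-10, 1 - 1e-10)                # avoid log(0) and division by zero
    alpha = 0.5 * np.log((1 - err) / err)               # vote weight of this learner
    new_w = weights * np.exp(-alpha * y_true * y_pred)  # up-weight mistakes, down-weight hits
    return alpha, new_w / new_w.sum()
```

After one round, the misclassified samples together carry half of the total weight, which forces the next weak learner to focus on them.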

Validation

In [11]:

model.score(X_val, y_val)

Out[11]:

0.8186274509803921
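model.score returns mean accuracy. When one class dominates the labels, accuracy alone can look better than it is, so a useful sanity check is to compare it with a trivial baseline that always predicts the most common class, and to look at the F1 score for the positive class. This is a sketch for diagnosis only and does not assume anything about the metric the challenge uses; the helper name compare_to_majority is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compare_to_majority(y_true, y_pred):
    """Report model accuracy alongside a predict-the-most-common-class baseline."""
    y_true = np.asarray(y_true)
    majority = np.bincount(y_true).argmax()   # most frequent (non-negative int) label
    baseline = np.full_like(y_true, majority)
    return {
        "model_accuracy": accuracy_score(y_true, y_pred),
        "majority_accuracy": accuracy_score(y_true, baseline),
        "model_f1": f1_score(y_true, y_pred),  # F1 for the positive class (label 1)
    }

# Usage (hypothetical): compare_to_majority(y_val, model.predict(X_val))
```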

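That completes the baseline: on the validation split, model.score reports a mean accuracy of about 0.819. Next, let's generate predictions for the real test data and see how to submit them to the challenge.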

Predictions

In [12]:

# Separating data from the dataframe for final testing
X_test = test_data_df.to_numpy()
print(X_test.shape)

(4500, 23)

In [13]:

# Predicting the labels
predictions = model.predict(X_test)
predictions.shape

Out[13]:

(4500,)

In [14]:

# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"default payment next month":predictions})
submission

Out[14]:

	default payment next month
0	0
1	0
2	0
3	0
4	0
...	...
4495	0
4496	1
4497	0
4498	0
4499	0

4500 rows × 1 columns

In [15]:

# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
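Before uploading, it can be worth re-reading the saved file and checking its shape and values, since a malformed CSV wastes a submission. This sketch assumes the format produced in this notebook: a single "default payment next month" column, one row per test sample, and the binary 0/1 labels seen in train.csv. The helper name validate_submission is hypothetical.

```python
import pandas as pd

LABEL_COL = "default payment next month"

def validate_submission(sub, expected_rows, label_col=LABEL_COL):
    """Raise ValueError if a submission DataFrame looks malformed; otherwise return it."""
    if list(sub.columns) != [label_col]:
        raise ValueError(f"unexpected columns: {list(sub.columns)}")
    if len(sub) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(sub)}")
    if not sub[label_col].isin([0, 1]).all():
        raise ValueError("labels must be 0 or 1")
    return sub

# Usage (hypothetical): re-read the written file so the check covers what will be uploaded
# validate_submission(pd.read_csv("assets/submission.csv"), expected_rows=len(test_data_df))
```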

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [16]:

!!aicrowd submission create -c dcrcl -f assets/submission.csv

Out[16]:

['submission.csv ━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 10.7/9.0 KB • 6.3 MB/s • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/dcrcl/submissions/172193              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/dcrcl/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/dcrcl/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/dcrcl                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/dcrcl                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 172193, 'created_at': '2022-01-16T09:35:17.834Z'}"]


[Getting Started Notebook] DCRCL Challenge