DCRCL

[Getting Started Notebook] DCRCL Challenge

This is baseline code to get you started with the challenge.

gauransh_k

You can use this code to start understanding the data and to create a baseline model for further improvements.

Starter Code for DCRCL Practice Challenge

Author: Gauransh Kumar

Note : Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/qv_SPcDtZK-3QOTSpQ8YHAtbs2O-YQWItafTopccKjc
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c dcrcl -o data

Importing Libraries

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here, we will read the train.csv which contains both training samples & labels, and test.csv which contains testing samples.

In [5]:
# Reading the CSV
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")

# train_data_df.shape, test_data_df.shape
display(train_data_df.head())
display(test_data_df.head())
LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 PAY_5 ... BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 default payment next month
0 30000 2 2 1 38 0 0 0 0 0 ... 22810 25772 26360 1650 1700 1400 3355 1146 0 0
1 170000 1 4 1 28 0 0 0 -1 -1 ... 11760 0 4902 14000 5695 11760 0 4902 6000 0
2 340000 1 1 2 38 0 0 0 -1 -1 ... 1680 1920 9151 5000 7785 1699 1920 9151 187000 0
3 140000 2 2 2 29 0 0 0 2 0 ... 65861 64848 64936 3000 8600 6 2500 2500 2500 0
4 130000 2 2 1 42 2 2 2 0 0 ... 126792 103497 96991 6400 0 4535 3900 4300 3700 1

5 rows × 24 columns

LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 PAY_5 ... BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6
0 30000 1 2 2 25 0 0 0 0 0 ... 11581 12580 13716 14828 1500 2000 1500 1500 1500 2000
1 150000 2 1 2 26 0 0 0 0 0 ... 116684 101581 77741 77264 4486 4235 3161 2647 2669 2669
2 70000 2 3 1 32 0 0 0 0 0 ... 68530 69753 70111 70212 2431 3112 3000 2438 2500 2554
3 130000 1 3 2 49 0 0 0 0 0 ... 16172 16898 11236 6944 1610 1808 7014 27 7011 4408
4 50000 2 2 2 36 0 0 0 0 0 ... 42361 19574 20295 19439 2000 1500 1000 1800 0 1000

5 rows × 23 columns
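
Before preprocessing, it can help to confirm that the two files line up the way the previews suggest: the training frame should have the same feature columns as the test frame plus the label column. The following is a minimal sketch of such a check. The helper name `summarize` and the toy frames are hypothetical stand-ins, not part of the original notebook; in practice you would pass `train_data_df` and `test_data_df`.

```python
import pandas as pd

TARGET = "default payment next month"


def summarize(train_df, test_df, target=TARGET):
    """Collect basic sanity-check facts about a train/test pair (hypothetical helper)."""
    feature_cols = [c for c in train_df.columns if c != target]
    return {
        "train_shape": train_df.shape,
        "test_shape": test_df.shape,
        # Total number of missing cells in each frame
        "missing_train": int(train_df.isna().sum().sum()),
        "missing_test": int(test_df.isna().sum().sum()),
        # True when the test columns match the train features, in the same order
        "same_features": feature_cols == list(test_df.columns),
    }


# Tiny stand-in frames; replace with train_data_df and test_data_df
toy_train = pd.DataFrame({"LIMIT_BAL": [30000, 170000], "AGE": [38, 28], TARGET: [0, 1]})
toy_test = pd.DataFrame({"LIMIT_BAL": [50000], "AGE": [36]})

summary = summarize(toy_train, toy_test)
print(summary)
```

If `same_features` comes back `False`, the arrays built later with `to_numpy()` would not have matching column order, so it is worth checking before training.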

Data Preprocessing

In [6]:
# Separating the features (X) and labels (y) from the dataframe for training
X = train_data_df.drop(['default payment next month'], axis=1).to_numpy()
y = train_data_df["default payment next month"].to_numpy()
print(X.shape, y.shape)
(25500, 23) (25500,)
In [7]:
# Visualising the label classes for training
sns.countplot(x=y)
Out[7]:
<AxesSubplot:ylabel='count'>
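
The count plot shows how many samples fall into each class. The same information can be read off numerically, which is handy when the plot is not available. This is a small sketch using NumPy; `class_balance` and `toy_y` are hypothetical names, and in the notebook you would pass `y` instead.

```python
import numpy as np


def class_balance(labels):
    """Return the fraction of samples in each class as a dict (hypothetical helper)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): float(n) / labels.size for c, n in zip(classes, counts)}


# Toy labels for illustration; replace with y from the cell above
toy_y = np.array([0, 0, 0, 1])
balance = class_balance(toy_y)
print(balance)  # {0: 0.75, 1: 0.25}
```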

Splitting the data

In [8]:
# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(20400, 23)
(20400,)
In [9]:
X_train[0], y_train[0]
Out[9]:
(array([160000,      2,      3,      1,     56,     -1,     -1,     -1,
            -1,      0,      0,   2992,   4562,   -928,   1619,    928,
             0,   4562,      0,   2547,      0,      0,      0]),
 0)
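
The split above is random, so the validation score can change from run to run. One possible variation, shown here as a sketch rather than as the notebook's method, is to fix `random_state` and pass `stratify` so that both parts keep the same class proportions. The toy arrays `X_toy` and `y_toy` and the seed value 42 are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 3 features, 20% positive labels
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = np.array([0] * 80 + [1] * 20)

# random_state makes the split repeatable; stratify keeps the class ratio in both parts
X_tr, X_va, y_tr, y_va = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_va.shape)
print(y_tr.mean(), y_va.mean())
```

With the notebook's data, the equivalent call would be `train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)`.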

Training the Model

In [10]:
model = AdaBoostClassifier()
model.fit(X_train, y_train)
Out[10]:
AdaBoostClassifier()

Validation

In [11]:
model.score(X_val, y_val)
Out[11]:
0.8186274509803921
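
For a classifier, `model.score` returns accuracy. If the classes turn out to be imbalanced, a model that always predicts the most frequent class can already reach a high accuracy, so it is useful to compare against that reference and to look at a second metric such as F1. The sketch below uses scikit-learn's `DummyClassifier` on toy data; the arrays are hypothetical, and the choice of F1 is an assumption, not necessarily the metric the challenge uses.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Toy data with 80% of samples in class 0
X_toy = np.zeros((10, 1))
y_toy = np.array([0] * 8 + [1] * 2)

# A reference model that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_toy, y_toy)
pred = baseline.predict(X_toy)

baseline_acc = accuracy_score(y_toy, pred)
# zero_division=0 avoids a warning when no positive class is predicted
baseline_f1 = f1_score(y_toy, pred, zero_division=0)

print(baseline_acc, baseline_f1)
```

In the notebook, fitting the dummy model on `X_train, y_train` and scoring it on `X_val, y_val` gives the accuracy that the AdaBoost model has to beat.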

We are now done with the baseline. Let's run it on the real test data and see how to submit the predictions to the challenge.

Predictions

In [12]:
# Separating data from the dataframe for final testing
X_test = test_data_df.to_numpy()
print(X_test.shape)
(4500, 23)
In [13]:
# Predicting the labels
predictions = model.predict(X_test)
predictions.shape
Out[13]:
(4500,)
In [14]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"default payment next month":predictions})
submission
Out[14]:
default payment next month
0 0
1 0
2 0
3 0
4 0
... ...
4495 0
4496 1
4497 0
4498 0
4499 0

4500 rows × 1 columns

In [15]:
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
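
Before uploading, a quick check of the saved file can catch simple mistakes such as a wrong row count or an extra index column. The sketch below assumes the format used in this notebook (a single column named `default payment next month` holding 0/1 values, one row per test sample); that format is inferred from the cells above, not from an official specification. `check_submission` is a hypothetical helper, and the example writes to a temporary directory so it does not touch the real `assets` folder.

```python
import os
import tempfile

import numpy as np
import pandas as pd

TARGET = "default payment next month"


def check_submission(path, n_expected, column=TARGET):
    """Return True if the CSV has one expected column, n_expected rows, and only 0/1 values."""
    df = pd.read_csv(path)
    return (
        list(df.columns) == [column]
        and len(df) == n_expected
        and bool(df[column].isin([0, 1]).all())
    )


# Toy predictions; in the notebook this would be the `predictions` array
toy_predictions = np.array([0, 1, 0, 0])

# os.makedirs(..., exist_ok=True) is a portable alternative to `!rm -rf` / `!mkdir`
out_dir = os.path.join(tempfile.mkdtemp(), "assets")
os.makedirs(out_dir, exist_ok=True)
out_path = os.path.join(out_dir, "submission.csv")

pd.DataFrame({TARGET: toy_predictions}).to_csv(out_path, index=False)
submission_ok = check_submission(out_path, n_expected=len(toy_predictions))
print(submission_ok)
```

For the real file, the call would be `check_submission("assets/submission.csv", n_expected=len(X_test))`.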

Submitting our Predictions

Note : Please save the notebook before submitting it (Ctrl + S)

In [16]:
!!aicrowd submission create -c dcrcl -f assets/submission.csv
Out[16]:
['submission.csv ━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 10.7/9.0 KB • 6.3 MB/s • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/dcrcl/submissions/172193              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/dcrcl/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/dcrcl/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/dcrcl                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/dcrcl                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 172193, 'created_at': '2022-01-16T09:35:17.834Z'}"]