[Getting Started Notebook] CRIME Challenge

This is a Baseline Code to get you started with the challenge.


You can use this code to start understanding the data and to build a baseline model that you can then improve on.

Starter Code for the CRIME Practice Challenge

Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.

Author: Gauransh Kumar

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/VA0UkUPcqou4AaNOLOyrH2qVAstlDmAu6ylAG29c4iI
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c crime -o data

Importing Libraries

In this baseline, we will use the scikit-learn library to train the model and generate the predictions.

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here, we read train.csv, which contains both the training samples and their labels, and test.csv, which contains only the testing samples.

In [3]:
# Reading the CSVs; test.csv has a trailing non-data line, hence skipfooter=1
train_data_df = pd.read_csv("data/train.csv", header=None)
test_data_df = pd.read_csv("data/test.csv", header=None, skipfooter=1, engine='python')

# train_data.shape, test_data.shape
display(train_data_df.head())
display(test_data_df.head())
0 1 2 3 4 5 6 7 8 9 ... 118 119 120 121 122 123 124 125 126 127
0 53 -1 -1.1 Tukwilacity 1 0.00 0.16 0.12 0.74 0.45 ... 0.02 0.12 0.45 -1.19 -1.20 -1.21 -1.22 0.62 -1.23 0.67
1 46 13 100.0 Aberdeencity 2 0.02 0.35 0.00 0.94 0.03 ... 0.02 0.25 0.00 -1.00 -1.00 -1.00 -1.00 0.00 -1.00 0.08
2 25 9 34550.0 Lawrencecity 10 0.10 0.56 0.12 0.47 0.12 ... 0.02 0.84 0.15 0.00 0.02 0.54 0.00 0.93 0.14 0.88
3 51 -1 -1.0 Blacksburgtown 1 0.04 0.67 0.08 0.81 0.47 ... 0.05 0.15 0.36 -1.00 -1.00 -1.00 -1.00 0.00 -1.00 0.05
4 12 -1 -1.0 SouthDaytonacity 10 0.00 0.24 0.07 0.93 0.05 ... 0.01 0.29 0.03 -1.00 -1.00 -1.00 -1.00 0.00 -1.00 0.19

5 rows × 128 columns

0 1 2 3 4 5 6 7 8 9 ... 117 118 119 120 121 122 123 124 125 126
0 45 -1 -1.1 Tukwilacity 1 0.00 0.16 0.12 0.74 0.45 ... -1.18 0.02 0.12 0.45 -1.19 -1.2 -1.21 -1.22 0.60 -1.23
1 44 5 49960.0 Newportcity 6 0.03 0.37 0.16 0.83 0.09 ... 0.44 0.02 0.30 0.12 0.00 0.0 0.80 0.00 0.79 0.21
2 34 21 80240.0 WestWindsortownship 1 0.01 0.61 0.05 0.73 0.91 ... -1.00 0.07 0.05 1.00 -1.00 -1.0 -1.00 -1.00 0.00 -1.00
3 18 -1 -1.0 Bedfordcity 5 0.01 0.31 0.01 0.99 0.02 ... -1.00 0.03 0.10 0.02 -1.00 -1.0 -1.00 -1.00 0.00 -1.00
4 6 -1 -1.0 Fillmorecity 2 0.00 0.90 0.00 0.63 0.05 ... -1.00 0.01 0.38 0.02 -1.00 -1.0 -1.00 -1.00 0.00 -1.00

5 rows × 127 columns
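Many entries in the previews above are -1. If those are missing-value sentinels (an assumption worth confirming against the challenge's data description), one optional preprocessing step is to convert them to NaN and impute, as in this minimal sketch on a hypothetical mini-frame:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mimicking the dataset's apparent use of -1
# as a "missing" sentinel (column labels are illustrative integers).
df = pd.DataFrame({5: [0.1, -1.0, 0.3], 6: [0.2, 0.4, -1.0]})

# Replace the sentinel with NaN, then impute with each column's median.
cleaned = df.replace(-1.0, np.nan)
imputed = cleaned.fillna(cleaned.median())

print(int(imputed.isna().sum().sum()))  # → 0 remaining missing values
```

Whether imputation helps here depends on whether -1 truly encodes "missing" rather than a meaningful value; the baseline below simply feeds the raw values to the model.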

Data Preprocessing

In [4]:
train_data_df.columns
Out[4]:
Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            118, 119, 120, 121, 122, 123, 124, 125, 126, 127],
           dtype='int64', length=128)
In [5]:
# Separating data from the dataframe for final training
X = train_data_df.drop([0,1,2,3,4, 127], axis=1).to_numpy()
y = train_data_df[127].to_numpy()
print(X.shape, y.shape)
(1594, 122) (1594,)

Splitting the data

In [6]:
# Splitting the data into training & validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(1275, 122)
(1275,)
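Note that `train_test_split` shuffles randomly on each run, so the validation score above will vary between runs. Passing `random_state` (not done in the baseline) makes the split reproducible, as this small sketch on toy arrays shows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the real feature matrix and labels.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Fixing random_state yields the same split on every call.
a, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
c, _, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

print(np.array_equal(a, c))  # → True
```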
In [7]:
X_train[0], y_train[0]
Out[7]:
(array([ 0.01,  0.68,  0.03,  0.96,  0.08,  0.01,  0.55,  0.35,  0.23,
         0.16,  0.03,  1.  ,  0.97,  0.81,  0.55,  0.83,  0.17,  0.06,
         0.31,  0.89,  0.77,  0.75,  0.61,  0.  ,  0.48,  0.  ,  0.34,
         0.  ,  0.02,  0.02,  0.05,  0.9 ,  0.14,  0.61,  0.3 ,  0.55,
         0.05,  0.91,  0.08,  0.22,  0.15,  0.12,  0.6 ,  0.97,  0.96,
         0.94,  0.93,  0.27,  0.19,  0.  ,  0.04,  0.  ,  0.54,  0.48,
         0.49,  0.5 ,  0.11,  0.09,  0.08,  0.08,  0.96,  0.03,  0.15,
         0.19,  0.73,  0.71,  0.59,  0.96,  0.01,  0.  ,  1.  ,  0.01,
         0.85,  0.95,  0.03,  0.31,  0.83,  0.  ,  0.  ,  0.45,  0.46,
         0.5 ,  0.62,  0.72,  1.  ,  0.76,  0.6 ,  0.44,  0.24,  0.  ,
         0.  ,  0.08,  0.39,  0.47,  0.21,  0.25, -1.  , -1.  , -1.  ,
        -1.  , -1.  , -1.  , -1.  , -1.  , -1.  , -1.  , -1.  , -1.  ,
        -1.  , -1.  , -1.  , -1.  , -1.  ,  0.08,  0.05,  0.03, -1.  ,
        -1.  , -1.  , -1.  ,  0.  , -1.  ]),
 0.03)

Training the Model

In [8]:
model = MLPRegressor()
model.fit(X_train, y_train)
Out[8]:
MLPRegressor()
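The baseline uses `MLPRegressor` with all defaults. MLPs are sensitive to feature scale, so one common refinement is to standardize inputs inside a pipeline; the hyperparameters below are illustrative guesses, not tuned for this challenge, and the data is synthetic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data; replace with X_train / y_train from above.
rng = np.random.default_rng(0)
X_train = rng.random((100, 5))
y_train = X_train.sum(axis=1)

# Standardizing inputs before the MLP usually stabilizes training.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
model.fit(X_train, y_train)
print(model.score(X_train, y_train))
```

Because the scaler lives inside the pipeline, its statistics are fit only on the training data, avoiding leakage into validation.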

Validation

In [9]:
model.score(X_val, y_val)
Out[9]:
0.6343007561391997
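`model.score` for a regressor returns R², so the 0.63 above is the coefficient of determination on the validation split. It can be useful to also look at error in the target's own units; a sketch with hypothetical labels and predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical validation labels and predictions for illustration;
# substitute y_val and model.predict(X_val) from the cells above.
y_val = np.array([0.1, 0.4, 0.3])
preds = np.array([0.2, 0.3, 0.3])

mae = mean_absolute_error(y_val, preds)
rmse = np.sqrt(mean_squared_error(y_val, preds))
print(mae, rmse)
```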

With the baseline done, let's predict on the real test data and see how to submit the results to the challenge.

Predictions

In [10]:
# Separating data from the dataframe for final testing
X_test = test_data_df.drop([0,1,2,3,4], axis=1).to_numpy()
print(X_test.shape)
(399, 122)
In [11]:
# Predicting the labels
predictions = model.predict(X_test)
predictions.shape
Out[11]:
(399,)
In [12]:
# Converting the predictions array into a pandas dataframe
submission = pd.DataFrame({"crime_rate":predictions})
submission
Out[12]:
crime_rate
0 0.617492
1 0.285325
2 -0.070311
3 0.177777
4 0.382927
... ...
394 0.243872
395 0.148659
396 0.047643
397 0.752610
398 0.815016

399 rows × 1 columns

In [13]:
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
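Before submitting, it is cheap to re-read the file and confirm the header and row count match what the grader expects (stand-in predictions below; with the real data there should be 399 rows):

```python
import os
import pandas as pd

os.makedirs("assets", exist_ok=True)
submission = pd.DataFrame({"crime_rate": [0.1, 0.2, 0.3]})  # stand-in predictions
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)

# Re-read to confirm the column name and row count survived the round trip.
check = pd.read_csv(os.path.join("assets", "submission.csv"))
print(list(check.columns), len(check))  # → ['crime_rate'] 3
```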

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [14]:
!!aicrowd submission create -c crime -f assets/submission.csv
Out[14]:
['submission.csv ━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 9.5/7.8 KB • 4.7 MB/s • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/crime/submissions/172184              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/crime/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/crime/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/crime                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/crime                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 172184, 'created_at': '2022-01-15T15:49:47.981Z'}"]
In [ ]:

