Obstacle Prediction

Solution for submission 154978

A detailed solution for submission 154978 submitted for challenge Obstacle Prediction


Starter Code for Obstacle Prediction

What we are going to Learn

Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Collecting aicrowd-cli
  Downloading aicrowd_cli-0.1.9-py3-none-any.whl (43 kB)
     |████████████████████████████████| 43 kB 1.2 MB/s 
Requirement already satisfied: toml<1,>=0.10.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (0.10.2)
Collecting requests<3,>=2.25.1
  Downloading requests-2.26.0-py2.py3-none-any.whl (62 kB)
     |████████████████████████████████| 62 kB 918 kB/s 
Collecting GitPython==3.1.18
  Downloading GitPython-3.1.18-py3-none-any.whl (170 kB)
     |████████████████████████████████| 170 kB 49.2 MB/s 
Requirement already satisfied: click<8,>=7.1.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (7.1.2)
Collecting requests-toolbelt<1,>=0.9.1
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
     |████████████████████████████████| 54 kB 3.0 MB/s 
Collecting rich<11,>=10.0.0
  Downloading rich-10.9.0-py3-none-any.whl (211 kB)
     |████████████████████████████████| 211 kB 56.0 MB/s 
Requirement already satisfied: tqdm<5,>=4.56.0 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (4.62.0)
Requirement already satisfied: typing-extensions>= in /usr/local/lib/python3.7/dist-packages (from GitPython==3.1.18->aicrowd-cli) (
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.7-py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 2.0 MB/s 
Collecting smmap<5,>=3.0.1
  Downloading smmap-4.0.0-py2.py3-none-any.whl (24 kB)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.5.30)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.24.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.4)
Collecting commonmark<0.10.0,>=0.9.0
  Downloading commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
     |████████████████████████████████| 51 kB 7.6 MB/s 
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /usr/local/lib/python3.7/dist-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.6.1)
Collecting colorama<0.5.0,>=0.4.0
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Installing collected packages: smmap, requests, gitdb, commonmark, colorama, rich, requests-toolbelt, GitPython, aicrowd-cli
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.26.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed GitPython-3.1.18 aicrowd-cli-0.1.9 colorama-0.4.4 commonmark-0.9.1 gitdb-4.0.7 requests-2.26.0 requests-toolbelt-0.9.1 rich-10.9.0 smmap-4.0.0
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/Yr79bQ8iichjQzq0mD-LjOxsRoMF_l9Yumbd75g1gyk
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c obstacle-prediction -o data

Importing Libraries

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import os
import matplotlib.pyplot as plt
import seaborn as sns

Reading the dataset

Here we read data.npz, which contains the training samples with their labels as well as the testing samples.

In [6]:
data = np.load("/content/data/data.npz", allow_pickle=True)

train_data = data['train']
test_data = data['test']

train_data.shape, test_data.shape
((5000, 2), (3000,))
In [9]:
# Convert each training sample to a 1D array so that we can pass it to a sklearn model
X = [sample.flatten() for sample in train_data[:, 0].tolist()]
y = train_data[:, 1].tolist()
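As a minimal sketch of what the flattening step does, the following builds a small synthetic object array in the same (sample, label) layout as train_data. The sample shape (4, 3) and the names `fake_train`, `X_demo` and `y_demo` are assumptions for illustration only; the real samples in data.npz may have a different shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for train_data: an object array of (sample, label) rows.
n_samples = 5
fake_train = np.empty((n_samples, 2), dtype=object)
for i in range(n_samples):
    fake_train[i, 0] = rng.normal(size=(4, 3))  # assumed sample shape
    fake_train[i, 1] = i % 2                    # binary label

# Flatten every sample to 1D and stack them into one 2D feature matrix.
X_demo = np.stack([sample.flatten() for sample in fake_train[:, 0]])
y_demo = fake_train[:, 1].astype(int)

print(X_demo.shape, y_demo.shape)  # (5, 12) (5,)
```

Stacking into a single 2D array is optional, since sklearn also accepts a list of equal-length 1D arrays, but it makes the feature dimension easy to inspect.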
In [10]:
# Checking for any class imbalance
/usr/local/lib/python3.7/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass the following variable as a keyword arg: x. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
<matplotlib.axes._subplots.AxesSubplot at 0x7f8d56ff3190>
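The class balance can also be checked numerically, without a plot. The sketch below counts labels with NumPy; `y_demo` is a hypothetical label list standing in for `y`, and the counts shown are not those of the real dataset.

```python
import numpy as np

# Hypothetical labels; in the notebook these would come from train_data[:, 1].
y_demo = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

# Count how often each class occurs and convert the counts to shares.
classes, counts = np.unique(y_demo, return_counts=True)
class_counts = {int(c): int(n) for c, n in zip(classes, counts)}
class_share = {c: n / len(y_demo) for c, n in class_counts.items()}

print(class_counts)  # {0: 7, 1: 3}
print(class_share)   # {0: 0.7, 1: 0.3}
```

If a seaborn count plot is used for the same check, passing the labels by keyword, as in `sns.countplot(x=y)`, avoids the FutureWarning shown above.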

Splitting the data

In [11]:
# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
In [14]:
X_train[0], y_train[0]
(array([-2.57025456,  0.05781218, -0.14870299, ..., -1.        ,
        -1.        , -1.        ]), 0)
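If the classes turn out to be imbalanced, a stratified split keeps the class ratio roughly equal in the training and validation sets, and a fixed seed makes the split repeatable. This is a sketch on synthetic data, not the split used for the submission; the 30% positive rate, the seed 42 and all `_demo` names are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 8 features, 30% positive labels.
X_demo = rng.normal(size=(100, 8))
y_demo = np.array([1] * 30 + [0] * 70)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    stratify=y_demo,   # keep the class ratio similar in both splits
    random_state=42,   # fixed seed so the split is repeatable
)

print(len(X_tr), len(X_va), y_tr.mean(), y_va.mean())
```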

Training the Model

In [15]:
model = RandomForestClassifier(max_depth=7, n_estimators=300)
model.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=7, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=300,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
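The output above shows `random_state=None`, so each run of the notebook can produce a slightly different forest. A minimal sketch of how a fixed seed makes training repeatable is given below; the synthetic dataset from `make_classification`, the helper `fit_forest` and the smaller `n_estimators=50` are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data standing in for the flattened training samples.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

def fit_forest(seed):
    """Train a forest with a fixed seed so the result is repeatable."""
    clf = RandomForestClassifier(max_depth=7, n_estimators=50, random_state=seed)
    return clf.fit(X_demo, y_demo)

# Two forests trained with the same seed make identical predictions.
pred_a = fit_forest(0).predict(X_demo)
pred_b = fit_forest(0).predict(X_demo)
same = bool((pred_a == pred_b).all())
print(same)  # True
```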


In [26]:
from sklearn.metrics import f1_score, accuracy_score

# Evaluate on the validation set; sklearn metrics expect (y_true, y_pred)
y_hat = model.predict(X_val)
f1 = f1_score(y_val, y_hat)
sms = model.score(X_val, y_val)  # mean accuracy, so it equals accuracy_score
acc = accuracy_score(y_val, y_hat)

print('f1_score = {} \nanother score = {} \naccuracy = {}'.format(f1, sms, acc))
f1_score = 0.9755244755244756 
another score = 0.986 
accuracy = 0.986
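The F1 score is lower than the accuracy because true negatives count toward accuracy but do not enter F1. The sketch below computes binary F1 by hand on a small hypothetical example (the arrays and the helper `binary_f1` are illustrative, not part of the original notebook). It also shows that binary F1 does not change when the two arguments are swapped, although the sklearn convention is `(y_true, y_pred)`.

```python
import numpy as np

def binary_f1(y_true, y_pred):
    """F1 for the positive class (label 1); assumes at least one positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical labels and predictions.
y_true_demo = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
y_pred_demo = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 1])

f1_demo = binary_f1(y_true_demo, y_pred_demo)
f1_swapped = binary_f1(y_pred_demo, y_true_demo)
acc_demo = np.mean(y_true_demo == y_pred_demo)

print(f1_demo, f1_swapped, acc_demo)  # 0.75 0.75 0.8
```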

The validation accuracy is good, but let's see how well the model does on the testing set.


In [27]:
# Converting each testing sample into 1D array
X_test = [sample.flatten() for sample in test_data.tolist()]
In [28]:
# Predicting the labels
predictions = model.predict(X_test)
In [29]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"label": predictions})
submission
      label
0         0
1         1
2         0
3         1
4         0
...     ...
2995      0
2996      1
2997      0
2998      0
2999      0

3000 rows × 1 columns

In [30]:
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
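Before submitting, it can help to read the CSV back and confirm its layout. The sketch below writes a small hypothetical predictions array to a temporary directory and checks the column name, row count and label values. Which checks the AIcrowd evaluator actually performs is not stated here, so these are assumptions based on the output shown above (one `label` column, one row per test sample).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical predictions standing in for model.predict(X_test).
predictions_demo = np.array([0, 1, 0, 1, 0])
expected_rows = len(predictions_demo)

# Write the file the same way as the notebook, but into a temp directory.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "submission.csv")
pd.DataFrame({"label": predictions_demo}).to_csv(csv_path, index=False)

# Read it back and run basic sanity checks.
check = pd.read_csv(csv_path)
columns_ok = list(check.columns) == ["label"]
rows_ok = len(check) == expected_rows
labels_ok = bool(check["label"].isin([0, 1]).all())

print(columns_ok, rows_ok, labels_ok)  # True True True
```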

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [32]:

Using notebook: /content/drive/MyDrive/Colab Notebooks/Obstacle Prediction for submission...
Removing existing files from submission directory...
Scrubbing API keys from the notebook...
An unexpected error occured!
[Errno 2] No such file or directory: '/content/drive/MyDrive/Colab Notebooks/Obstacle Prediction'
To get more information, you can run this command with -v.
To increase level of verbosity, you can go upto -vvvvv