This notebook helps you understand what images from the different classes look like. In particular, it covers 'stray_particle' and 'discoloration', the new classes introduced in the 2nd round of this challenge.

We use the deepml Python library to quickly visualize these images.

In [ ]:

!pip install deepml

In [1]:

import pandas as pd
import deepml
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

#mpl.rcParams['text.color'] = 'white'

In [2]:

train_df = pd.read_csv("data-purchasing-challenge-2022-starter-kit/data/training/labels.csv")
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   filename        1000 non-null   object
 1   scratch_small   1000 non-null   int64 
 2   scratch_large   1000 non-null   int64 
 3   dent_small      1000 non-null   int64 
 4   dent_large      1000 non-null   int64 
 5   stray_particle  1000 non-null   int64 
 6   discoloration   1000 non-null   int64 
dtypes: int64(6), object(1)
memory usage: 54.8+ KB

In [3]:

train_df.head()

Out[3]:

	filename	scratch_small	dent_small
0	np7x98vV9L.png	0	1
1	eJL9eBxtwi.png	1	0
2	Mm0wzMknhT.png	0	0
3	UJhpQVf8LP.png	0	0
4	5vpsw4NX6n.png	0	0

Create an additional class called 'no_defect' for image samples that contain no damage.

In [4]:

train_df['no_defect'] = (~train_df.iloc[:, 1:].any(axis=1)).astype(int)
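To make the `no_defect` step concrete, here is a minimal sketch on a toy table. The rows, the `toy_df` name, and the reduced set of label columns are illustrative assumptions, not data from labels.csv. A row gets `no_defect = 1` only when every defect column is 0.

```python
import pandas as pd

# Toy label table (hypothetical rows, not taken from labels.csv)
toy_df = pd.DataFrame({
    "filename": ["a.png", "b.png", "c.png"],
    "scratch_small": [0, 1, 0],
    "dent_small": [1, 0, 0],
})

label_cols = ["scratch_small", "dent_small"]

# any(axis=1) is True when at least one defect is flagged in the row;
# negating it marks the rows with no defect at all.
toy_df["no_defect"] = (~toy_df[label_cols].any(axis=1)).astype(int)
print(toy_df)
```

Only the last row has all defect columns set to 0, so it is the only one flagged as `no_defect`.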

In [5]:

classes = train_df.columns[1:].tolist()
classes

Out[5]:

['scratch_small',
 'scratch_large',
 'dent_small',
 'dent_large',
 'stray_particle',
 'discoloration',
 'no_defect']

Since this is a multi-label classification challenge (one image can carry several defect labels), let's look at the distribution of joined class labels.

In [6]:

train_df['joined_label'] = train_df[classes].apply(
    lambda row: " ".join(c for c in classes if row[c]), axis=1)
train_df.head()

Out[6]:

	filename	scratch_small	dent_small	no_defect	joined_label
0	np7x98vV9L.png	0	1	0	dent_small
1	eJL9eBxtwi.png	1	0	0	scratch_small
2	Mm0wzMknhT.png	0	0	1	no_defect
3	UJhpQVf8LP.png	0	0	1	no_defect
4	5vpsw4NX6n.png	0	0	1	no_defect

In [7]:

train_df['joined_label'].value_counts()

Out[7]:

stray_particle                                                                    524
no_defect                                                                         202
scratch_small dent_small stray_particle discoloration                              34
dent_small                                                                         30
scratch_small dent_large stray_particle discoloration                              27
dent_large stray_particle                                                          23
scratch_small scratch_large dent_small stray_particle                              23
scratch_small dent_small stray_particle                                            21
scratch_small dent_small                                                           19
scratch_small scratch_large                                                        16
scratch_small                                                                      14
scratch_small scratch_large dent_small dent_large stray_particle discoloration     12
scratch_large                                                                       6
dent_large                                                                          6
dent_small discoloration                                                            5
dent_small stray_particle                                                           5
scratch_small scratch_large dent_large stray_particle discoloration                 5
scratch_large dent_small                                                            4
dent_large stray_particle discoloration                                             4
scratch_small scratch_large dent_large                                              3
scratch_small dent_large                                                            3
scratch_large dent_small discoloration                                              2
scratch_small scratch_large dent_small stray_particle discoloration                 2
scratch_small scratch_large dent_small dent_large discoloration                     2
scratch_small stray_particle                                                        2
scratch_small scratch_large dent_small dent_large stray_particle                    2
scratch_small scratch_large dent_small discoloration                                1
dent_small dent_large discoloration                                                 1
discoloration                                                                       1
scratch_small dent_small dent_large stray_particle                                  1
Name: joined_label, dtype: int64
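The joined labels count each combination separately, so it can also help to look at how often each class occurs on its own and how many labels each image carries. The sketch below shows one way to do that on a toy table; the data and the names `toy_df`, `per_class_counts` and `labels_per_image` are illustrative assumptions.

```python
import pandas as pd

# Toy multi-label table (hypothetical values, not taken from labels.csv)
toy_df = pd.DataFrame({
    "scratch_small": [0, 1, 1, 0],
    "dent_small": [1, 0, 0, 0],
    "stray_particle": [1, 1, 1, 0],
})

# Column-wise sum: number of images flagged with each class
per_class_counts = toy_df.sum().sort_values(ascending=False)

# Row-wise sum: number of defect labels attached to each image
labels_per_image = toy_df.sum(axis=1)

print(per_class_counts)
print(labels_per_image.value_counts().sort_index())
```

On the real data, the same two sums could be applied to the six defect columns of `train_df` (excluding `no_defect`).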

In [8]:

plt.figure(figsize=(10,15))
sns.countplot(y='joined_label', data=train_df)

Out[8]:

<AxesSubplot:xlabel='count', ylabel='joined_label'>

In [9]:

from deepml.visualize import show_images_from_dataframe

Random samples from the training CSV file

In [10]:

train_image_dir = "data-purchasing-challenge-2022-starter-kit/data/training/images"
show_images_from_dataframe(train_df, img_dir = train_image_dir, image_file_name_column='filename', 
                           label_column='joined_label', samples=10, cols=2, figsize=(10, 30))

In [12]:

from deepml.visualize import show_images_from_folder

Image samples showing only large scratches (scratch_large)

In [13]:

show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'scratch_large']['filename'].tolist())

Image samples showing only small scratches (scratch_small)

In [14]:

show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'scratch_small']['filename'].tolist(), 
                        figsize=(15,20))

Image samples showing only small dents (dent_small)

In [15]:

show_images_from_folder(train_image_dir, images=train_df[train_df['joined_label'] == 'dent_small']['filename'].tolist()[:12], 
                        figsize=(15, 20))

Watch out for noisy samples in the dataset. For example, the image file j1NNKMd2ho.png may not actually contain any damage.
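If you decide to set such samples aside, one option is to keep a list of suspect filenames and filter them out before training. This is only a sketch: the `suspect_files` set, the toy rows, and the choice to drop rather than relabel are assumptions, not part of the starter kit.

```python
import pandas as pd

# Filenames judged to be mislabeled after visual inspection (manual list)
suspect_files = {"j1NNKMd2ho.png"}

# Toy stand-in for train_df (hypothetical label values)
toy_df = pd.DataFrame({
    "filename": ["j1NNKMd2ho.png", "np7x98vV9L.png"],
    "dent_small": [1, 1],
})

# Keep only the rows whose filename is not in the suspect list
clean_df = toy_df[~toy_df["filename"].isin(suspect_files)].reset_index(drop=True)
print(clean_df)
```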

Image samples showing only large dents (dent_large)

In [16]:

show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'dent_large']['filename'].tolist(), 
                        figsize=(15, 10))

Image samples showing only discoloration (discoloration)

In [17]:

show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'discoloration']['filename'].tolist(), figsize=(15, 20))

Only one sample in the training set shows discoloration alone.

Image samples showing only stray particles (stray_particle)

In [18]:

show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'stray_particle']['filename'].tolist()[:12], 
                        figsize=(15, 20))

Image samples showing no damage (no_defect)

In [19]:

show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'no_defect']['filename'].tolist()[:12], 
                        figsize=(15, 20))

Similarly, we can look at image samples containing different combinations of class labels.
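Typing out long joined-label strings by hand is error-prone, so a small helper can build the string from a list of class names. The function `filenames_for_combination` below is a hypothetical helper, not part of deepml, and it assumes `joined_label` was built by joining class names in column order, as done earlier in this notebook.

```python
import pandas as pd

def filenames_for_combination(df, classes, wanted):
    """Return filenames whose joined_label is exactly the `wanted` classes."""
    wanted = set(wanted)
    # Rebuild the label string in the same column order used for joined_label
    target = " ".join(c for c in classes if c in wanted)
    return df.loc[df["joined_label"] == target, "filename"].tolist()

# Toy stand-ins for `classes` and `train_df` (hypothetical values)
toy_classes = ["scratch_small", "dent_small", "stray_particle"]
toy_df = pd.DataFrame({
    "filename": ["a.png", "b.png", "c.png"],
    "joined_label": ["scratch_small dent_small", "dent_small",
                     "scratch_small dent_small"],
})

# The order of the requested classes does not matter
selected = filenames_for_combination(toy_df, toy_classes,
                                     ["dent_small", "scratch_small"])
print(selected)
```

The returned list could then be passed as the `images` argument of `show_images_from_folder`, in the same way as the cells in this notebook do.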

Image samples showing all damage types (scratch_small, scratch_large, dent_small, dent_large, stray_particle, discoloration)

In [20]:

show_images_from_folder(train_image_dir, images= train_df[train_df['joined_label'] == 'scratch_small scratch_large dent_small dent_large stray_particle discoloration']['filename'].tolist()[:12], 
                        figsize=(15, 20))