
Tree Segmentation

Exploring Augmentations For Beginners

A notebook for beginners to get started with data augmentation

jinoooooooooo

This notebook walks through the Albumentations library for data augmentation and shows how to get started with it to improve the quality of the models you build.



Have you ever been at a point where you have an amazing idea you could implement with a model, so you put on those headphones and start browsing the internet for relevant images... but uhhh, there isn't much data that fits your needs to start with 😔

But but... the state-of-the-art networks were all trained on thousands and thousands of images... What am I gonna do now? 😢

Data augmentation to the rescue 🥏

So what exactly is data augmentation and why is it so popular?

Data augmentation is a very powerful way of increasing your data: NOT by copy-pasting it, but by generating new images through special operations that trick the neural network into thinking they are different images.

Refer to this link for more examples and explanations: https://albumentations.ai/docs/

In this notebook, we will use the Albumentations library, one of the most flexible libraries out there, with plenty of techniques for generating more data.
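To make the idea concrete, here is a toy sketch (not part of the challenge pipeline; the array below is just a stand-in for a real photo) showing that a simple horizontal flip produces different pixels while keeping the same content:

import numpy as np

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
flipped = img[:, ::-1, :]  # horizontal flip
print(np.array_equal(img, flipped))  # False: different pixels, same scene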

Y'all ready? 😋



In [1]:
!pip install aicrowd-cli

%load_ext aicrowd.magic
Collecting aicrowd-cli
  Downloading aicrowd_cli-0.1.8-py3-none-any.whl (43 kB)
     |████████████████████████████████| 43 kB 1.3 MB/s 
Collecting requests-toolbelt<1,>=0.9.1
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
     |████████████████████████████████| 54 kB 2.0 MB/s 
Requirement already satisfied: toml<1,>=0.10.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (0.10.2)
Collecting tqdm<5,>=4.56.0
  Downloading tqdm-4.62.0-py2.py3-none-any.whl (76 kB)
     |████████████████████████████████| 76 kB 5.0 MB/s 
Collecting requests<3,>=2.25.1
  Downloading requests-2.26.0-py2.py3-none-any.whl (62 kB)
     |████████████████████████████████| 62 kB 1.0 MB/s 
Collecting rich<11,>=10.0.0
  Downloading rich-10.6.0-py3-none-any.whl (208 kB)
     |████████████████████████████████| 208 kB 15.1 MB/s 
Collecting GitPython==3.1.18
  Downloading GitPython-3.1.18-py3-none-any.whl (170 kB)
     |████████████████████████████████| 170 kB 17.4 MB/s 
Requirement already satisfied: click<8,>=7.1.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (7.1.2)
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.7-py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 2.2 MB/s 
Requirement already satisfied: typing-extensions>=3.7.4.0 in /usr/local/lib/python3.7/dist-packages (from GitPython==3.1.18->aicrowd-cli) (3.7.4.3)
Collecting smmap<5,>=3.0.1
  Downloading smmap-4.0.0-py2.py3-none-any.whl (24 kB)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.10)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.2)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.5.30)
Collecting commonmark<0.10.0,>=0.9.0
  Downloading commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
     |████████████████████████████████| 51 kB 7.8 MB/s 
Collecting colorama<0.5.0,>=0.4.0
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /usr/local/lib/python3.7/dist-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.6.1)
Installing collected packages: smmap, requests, gitdb, commonmark, colorama, tqdm, rich, requests-toolbelt, GitPython, aicrowd-cli
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Attempting uninstall: tqdm
    Found existing installation: tqdm 4.41.1
    Uninstalling tqdm-4.41.1:
      Successfully uninstalled tqdm-4.41.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.26.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed GitPython-3.1.18 aicrowd-cli-0.1.8 colorama-0.4.4 commonmark-0.9.1 gitdb-4.0.7 requests-2.26.0 requests-toolbelt-0.9.1 rich-10.6.0 smmap-4.0.0 tqdm-4.62.0
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/KEkP77oL1fq_RzUHqtGAXGSS81nmQdS4P8nwk6_Epus
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c tree-segmentation -o data
In [4]:
!unzip data/train.zip -d data/train > /dev/null
!unzip data/test.zip -d data/test > /dev/null
In [5]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import numpy as np
import cv2
import matplotlib.pyplot as plt
from tqdm import tqdm
import shutil
import copy
import random
In [6]:
TRAIN_DIR = '/content/data/train'
os.listdir(TRAIN_DIR)
Out[6]:
['image', 'segmentation']
In [7]:
len(os.listdir('/content/data/train/image')), len(os.listdir('/content/data/train/segmentation'))
Out[7]:
(5000, 5000)
In [8]:
x_train_dir = os.path.join(TRAIN_DIR, 'image')
y_train_dir = os.path.join(TRAIN_DIR, 'segmentation')
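Since each image has to line up with its mask, an optional sanity check like the one below can help. This is only a sketch and assumes the image and mask files share matching base filenames, which is worth confirming for this dataset:

imgs = sorted(os.listdir(x_train_dir))
masks = sorted(os.listdir(y_train_dir))
print(len(imgs) == len(masks))   # True -> one mask per image
print(imgs[:3], masks[:3])       # eyeball that the names line up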

THE VISUALIZE FUNCTION 📷

This function will help us visualize the before and after effects of using augmentations

In [9]:
def visualize(**images):
  # plot every image passed as a keyword argument side by side, using its name as the title
  n = len(images)
  plt.figure(figsize = (10,10))
  for i, (name, image) in enumerate(images.items()):
    plt.subplot(1, n, i + 1)
    plt.title(' '.join(name.split('_')).title())
    plt.imshow(image)
  plt.show()

LOADING THE DATA 🖨️

In [10]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
import cv2
from natsort import natsorted
from PIL import Image
In [11]:
class TreeSegmentationDataset(Dataset):
    def __init__(self, img_directory=None, label_directory=None, train=True, augmentation=None, preprocessing=None):

        self.img_directory = img_directory
        self.label_directory = label_directory

        self.augmentation = augmentation

        # If the image/label directories are valid, list their files in natural
        # sorted order so that each image lines up with its mask
        if img_directory is not None:
          self.img_list = natsorted(os.listdir(img_directory))
        if label_directory is not None:
          self.label_list = natsorted(os.listdir(label_directory))

        self.train = train

    def __len__(self):
        return len(self.img_list)

    def __getitem__(self, idx):

        # read the image and convert OpenCV's BGR channel order to RGB
        image = cv2.imread(os.path.join(self.img_directory, self.img_list[idx]))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        if self.train:

          mask = cv2.imread(os.path.join(self.label_directory, self.label_list[idx]))

          # apply the same augmentation pipeline to the image and its mask
          if self.augmentation:
            sample = self.augmentation(image=image, mask=mask)
            image, mask = sample['image'], sample['mask']

          return image, mask

        else:
          return image
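
Since the class also supports train=False, the same code can later serve the test split, which has no masks. A rough sketch (the exact folder name under data/test depends on how test.zip unpacks, so treat the path as an assumption):

test_dataset = TreeSegmentationDataset(img_directory='/content/data/test/image',  # assumed path
                                       train=False)
test_image = test_dataset[0]  # returns only the image, no mask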

This is what our images and masks currently look like 😮

In [17]:
dataset = TreeSegmentationDataset(x_train_dir, y_train_dir)

image, mask = dataset[10]
visualize(
    image = image, 
    tree_mask = mask,
)

DATA AUGMENTATION

In [13]:
! pip install albumentations==0.4.6
Collecting albumentations==0.4.6
  Downloading albumentations-0.4.6.tar.gz (117 kB)
     |████████████████████████████████| 117 kB 9.1 MB/s 
Requirement already satisfied: numpy>=1.11.1 in /usr/local/lib/python3.7/dist-packages (from albumentations==0.4.6) (1.19.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from albumentations==0.4.6) (1.4.1)
Collecting imgaug>=0.4.0
  Downloading imgaug-0.4.0-py2.py3-none-any.whl (948 kB)
     |████████████████████████████████| 948 kB 14.7 MB/s 
Requirement already satisfied: PyYAML in /usr/local/lib/python3.7/dist-packages (from albumentations==0.4.6) (3.13)
Requirement already satisfied: opencv-python>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from albumentations==0.4.6) (4.1.2.30)
Requirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from imgaug>=0.4.0->albumentations==0.4.6) (7.1.2)
Requirement already satisfied: scikit-image>=0.14.2 in /usr/local/lib/python3.7/dist-packages (from imgaug>=0.4.0->albumentations==0.4.6) (0.16.2)
Requirement already satisfied: imageio in /usr/local/lib/python3.7/dist-packages (from imgaug>=0.4.0->albumentations==0.4.6) (2.4.1)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from imgaug>=0.4.0->albumentations==0.4.6) (3.2.2)
Requirement already satisfied: Shapely in /usr/local/lib/python3.7/dist-packages (from imgaug>=0.4.0->albumentations==0.4.6) (1.7.1)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from imgaug>=0.4.0->albumentations==0.4.6) (1.15.0)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.14.2->imgaug>=0.4.0->albumentations==0.4.6) (2.5.1)
Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.14.2->imgaug>=0.4.0->albumentations==0.4.6) (1.1.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug>=0.4.0->albumentations==0.4.6) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug>=0.4.0->albumentations==0.4.6) (1.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug>=0.4.0->albumentations==0.4.6) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->imgaug>=0.4.0->albumentations==0.4.6) (2.8.1)
Requirement already satisfied: decorator<5,>=4.3 in /usr/local/lib/python3.7/dist-packages (from networkx>=2.0->scikit-image>=0.14.2->imgaug>=0.4.0->albumentations==0.4.6) (4.4.2)
Building wheels for collected packages: albumentations
  Building wheel for albumentations (setup.py) ... done
  Created wheel for albumentations: filename=albumentations-0.4.6-py3-none-any.whl size=65173 sha256=4a825acce363d9b4b1dec125df7c77d5a1f4589127165dd1a9c4d5b3d61850bf
  Stored in directory: /root/.cache/pip/wheels/cf/34/0f/cb2a5f93561a181a4bcc84847ad6aaceea8b5a3127469616cc
Successfully built albumentations
Installing collected packages: imgaug, albumentations
  Attempting uninstall: imgaug
    Found existing installation: imgaug 0.2.9
    Uninstalling imgaug-0.2.9:
      Successfully uninstalled imgaug-0.2.9
  Attempting uninstall: albumentations
    Found existing installation: albumentations 0.1.12
    Uninstalling albumentations-0.1.12:
      Successfully uninstalled albumentations-0.1.12
Successfully installed albumentations-0.4.6 imgaug-0.4.0
In [14]:
import albumentations as albu

This is where we start defining our augmentations 🤩

In [15]:
def get_training_augmentation():
    train_transform = [

        albu.HorizontalFlip(p=0.5), # horizontal flips with 50% probability

        albu.ShiftScaleRotate(scale_limit=0.5, rotate_limit=0, shift_limit=0.1, p=1, border_mode=0), # random shift and scale (rotation is disabled here since rotate_limit=0)

        albu.PadIfNeeded(min_height=320, min_width=320, always_apply=True, border_mode=0), #padding the images
        albu.RandomCrop(height=320, width=320, always_apply=True), #randomly cropping the images

        albu.IAAAdditiveGaussianNoise(p=0.2), # adding some gaussian noise with 20% probability

        # the OneOf block applies exactly one of the transforms inside it, with the given probability
        albu.OneOf(
            [
                #some color transformations
                albu.CLAHE(p=1),
                albu.RandomBrightness(p=1),
                albu.RandomGamma(p=1),
            ],
            p=0.9,
        ),

        albu.OneOf(
            [
                # sharpening or blurring
                albu.IAASharpen(p=1),
                albu.Blur(blur_limit=3, p=1),
                albu.MotionBlur(blur_limit=3, p=1),
            ],
            p=0.9,
        ),

        albu.OneOf(
            [
                # contrast and hue/saturation adjustments
                albu.RandomContrast(p=1),
                albu.HueSaturationValue(p=1),
            ],
            p=0.9,
        ),
    ]
    return albu.Compose(train_transform)
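
Before wiring this into the dataset, the composed pipeline can be tried on a single image/mask pair. Calling it with image= and mask= keyword arguments applies the same spatial transforms to both and returns a dict, which mirrors exactly what the Dataset class does internally (image and mask here are the ones loaded in the earlier cell):

aug = get_training_augmentation()
sample = aug(image=image, mask=mask)
aug_image, aug_mask = sample['image'], sample['mask']
print(aug_image.shape, aug_mask.shape)  # both cropped to 320 x 320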

OUR NEW IMAGES 👻

In [18]:
augmented_dataset = TreeSegmentationDataset(x_train_dir, y_train_dir, augmentation = get_training_augmentation())

for i in range(5): # apply the random augmentations to the same image 5 times to see 5 different results
  image, mask = augmented_dataset[1]
  visualize(image = image, mask = mask)
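
From here, the augmented dataset can be dropped straight into a DataLoader for training. A minimal sketch (the batch size of 8 is an arbitrary choice; PyTorch's default collate converts the numpy arrays into tensors):

train_loader = DataLoader(augmented_dataset, batch_size=8, shuffle=True)

images, masks = next(iter(train_loader))
print(images.shape, masks.shape)  # e.g. torch.Size([8, 320, 320, 3]) for both

# most PyTorch models expect float CHW input, so a conversion such as
# images.permute(0, 3, 1, 2).float() / 255.0 would typically follow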