Image Classification of the CIFAR100 Dataset in PyTorch

Image classification is the task of assigning a class label to a digital image based on its pixels. In this story, we will classify the images from the CIFAR100 dataset using convolutional neural networks.

Before going further into the story, I would like to thank Jovian AI for giving everyone who wants to learn something new the opportunity to do so at no cost. You can visit their site here.

Now let’s get into it.

Introduction to Convolutional Neural Networks

The idea behind convolutions is the use of an image kernel. A 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights.

This kernel “slides” over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel.
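To make this concrete, here is a minimal sketch of a single 2D convolution in PyTorch (the input and kernel values are purely illustrative):

```python
import torch
import torch.nn.functional as F

# A 5x5 single-channel "image" and a 3x3 kernel of weights
image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)
kernel = torch.tensor([[-1., -1., -1.],
                       [-1.,  8., -1.],
                       [-1., -1., -1.]]).reshape(1, 1, 3, 3)

# Each output pixel is the elementwise product of the kernel and the
# image patch under it, summed into a single number
output = F.conv2d(image, kernel)
print(output.shape)  # torch.Size([1, 1, 3, 3])
```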

Exploring the CIFAR100 Dataset

CIFAR100 Dataset has 100 classes with 600 images in each. There are 500 training images and 100 testing images per class. The 100 classes are further grouped into 20 superclasses.

Downloading the dataset

First, we import the functions and libraries we'll need throughout.
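A representative set of imports (the exact list in the original notebook may differ slightly):

```python
import os
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
```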

There are different ways to download the dataset.

To download it through a URL,
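we can use torchvision's download_url helper. Here is a sketch; the URL below points to a publicly hosted, ImageFolder-style copy of CIFAR100 on the fast.ai datasets mirror, so substitute whichever source you actually use:

```python
import tarfile
from torchvision.datasets.utils import download_url

# Download the archive to the current directory
dataset_url = "https://s3.amazonaws.com/fast-ai-imageclas/cifar100.tgz"
download_url(dataset_url, '.')

# Extract it into a ./data directory
with tarfile.open('./cifar100.tgz', 'r:gz') as tar:
    tar.extractall(path='./data')
```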

But if you want to download a dataset from Kaggle, you first need the opendatasets library.

For example,
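Here is a sketch using opendatasets (the Kaggle URL is illustrative; replace it with the dataset you actually want):

```python
import opendatasets as od

dataset_url = 'https://www.kaggle.com/datasets/fedesoriano/cifar100'
od.download(dataset_url)
```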

While downloading the dataset, you will be asked to provide your Kaggle username and credentials, which you can obtain using the “Create New API Token” button on your account page on Kaggle. Upload the kaggle.json file using the files tab, or enter the username and key manually when prompted.

We can create training and validation datasets using the ImageFolder class from torchvision. In addition to the ToTensor transform, we'll also apply some other transforms to the images.

There are a few important changes we'll make while creating PyTorch datasets for training and validation:

  1. Use test set for validation: Instead of setting aside a fraction (e.g. 10%) of the data from the training set for validation, we’ll simply use the test set as our validation set. In general, once you have picked the best model architecture & hyperparameters using a fixed validation set, it is a good idea to retrain the same model on the entire dataset, just to give it a small final boost in performance.
  2. Channel-wise data normalization: We will normalize the image tensors by subtracting the mean and dividing by the standard deviation across each channel. As a result, the mean of the data across each channel is 0, and the standard deviation is 1. Normalizing the data prevents the values from any one channel from disproportionately affecting the losses and gradients while training, simply by having a higher or wider range of values than the others.
  3. Randomized data augmentations: We will apply randomly chosen transformations while loading images from the training dataset. Specifically, we will pad each image by 4 pixels, then take a random crop of size 32 x 32 pixels, and then flip the image horizontally with a 50% probability. Since the transformations are applied randomly and dynamically each time a particular image is loaded, the model sees slightly different images in each epoch of training, which allows it to generalize better. (See the code sketch right after this list.)
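Here is a sketch of these transforms and of the ImageFolder datasets. The normalization statistics below are commonly quoted channel-wise values for CIFAR100, and the directory layout is an assumption; compute the statistics from your own training set for exact values:

```python
# Channel-wise means and standard deviations (illustrative values)
stats = ((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))

train_tfms = tt.Compose([
    tt.RandomCrop(32, padding=4, padding_mode='reflect'),  # pad by 4, crop 32x32
    tt.RandomHorizontalFlip(),                             # flip with 50% probability
    tt.ToTensor(),
    tt.Normalize(*stats)
])
valid_tfms = tt.Compose([tt.ToTensor(), tt.Normalize(*stats)])

# Assumes the extracted data has train/ and test/ folders,
# with one subfolder per class
data_dir = './data/cifar100'
train_ds = ImageFolder(data_dir + '/train', train_tfms)
valid_ds = ImageFolder(data_dir + '/test', valid_tfms)
```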

We are using a batch size of 64.

But I would suggest using a relatively large batch size (say 400 or 500). You can try reducing the batch size and restarting the kernel if you face an “out of memory” error.

Next, we can create data loaders for retrieving images in batches.
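A sketch, using the batch size chosen above:

```python
batch_size = 64

# num_workers parallelizes loading; pin_memory speeds up transfer to the GPU
train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                      num_workers=2, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size * 2,
                      num_workers=2, pin_memory=True)
```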

Let’s take a look at some sample images from the training data loader. To display the images, we’ll need to denormalize the pixel values to bring them back into the range (0, 1).
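A sketch of a denormalize helper and a show_batch function built on torchvision's make_grid:

```python
def denormalize(images, means, stds):
    # Invert the Normalize transform: x * std + mean, channel-wise
    means = torch.tensor(means).reshape(1, 3, 1, 1)
    stds = torch.tensor(stds).reshape(1, 3, 1, 1)
    return images * stds + means

def show_batch(dl):
    # Display the first batch of a data loader as an 8x8 image grid
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 12))
        ax.set_xticks([]); ax.set_yticks([])
        denorm = denormalize(images, *stats)
        ax.imshow(make_grid(denorm[:64], nrow=8).permute(1, 2, 0).clamp(0, 1))
        break
```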

Let’s see the batch result.
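```python
show_batch(train_dl)
```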

Without the denormalization, the colors would seem out of place because of the normalization. Note that normalization is also applied during inference.

Using a GPU

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required. These are described in more detail in a previous tutorial.
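Here are those helpers, as used throughout the Jovian tutorials this notebook follows:

```python
def get_default_device():
    """Pick the GPU if one is available, else the CPU."""
    if torch.cuda.is_available():
        return torch.device('cuda')
    return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a data loader to move batches to a device on the fly."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        return len(self.dl)
```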

Based on where you’re running this notebook, your default device could be a CPU (torch.device('cpu')) or a GPU (torch.device('cuda')).
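Let’s check which one we got:

```python
device = get_default_device()
device
```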

We can now wrap our training and validation data loaders using DeviceDataLoader for automatically transferring batches of data to the GPU (if available).
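```python
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(valid_dl, device)
```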

Model with Residual Blocks and Batch Normalization

One of the key changes to our CNN model this time is the addition of the residual block, which adds the original input back to the output feature map obtained by passing the input through one or more convolutional layers.

Here is a very simple Residual block:
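A minimal sketch (the skip connection is the `+ x` in forward):

```python
class SimpleResidualBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 3, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(3, 3, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.conv2(out)
        return self.relu2(out) + x  # add the original input back

simple_block = to_device(SimpleResidualBlock(), device)

# Shapes are preserved, so the block's output can be added to its input
for images, labels in train_dl:
    out = simple_block(images)
    print(images.shape, out.shape)
    break
```

which prints: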

torch.Size([64, 3, 32, 32]) torch.Size([64, 3, 32, 32])

We can now define the full model, applying the same residual idea at a larger scale, and move it to the GPU.
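The original notebook cells are not shown here, so what follows is a sketch of a ResNet9-style architecture in the spirit of this tutorial: convolutional blocks with batch normalization, two residual stages, and a linear classifier. The exact channel widths and pooling choices are assumptions:

```python
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # generate predictions
        return F.cross_entropy(out, labels)  # calculate loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        return {'val_loss': F.cross_entropy(out, labels).detach(),
                'val_acc': accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        return {'val_loss': torch.stack([x['val_loss'] for x in outputs]).mean().item(),
                'val_acc': torch.stack([x['val_acc'] for x in outputs]).mean().item()}

def conv_block(in_channels, out_channels, pool=False):
    # Convolution -> batch norm -> ReLU, with optional 2x2 max pooling
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.classifier = nn.Sequential(nn.MaxPool2d(4), nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out   # residual connection
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out   # residual connection
        return self.classifier(out)

model = to_device(ResNet9(3, len(train_ds.classes)), device)
```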

Training the model

Before we train the model, we’re going to make a bunch of small but important improvements to our fit function:

  • Learning rate scheduling: Instead of using a fixed learning rate, we will use a learning rate scheduler, which will change the learning rate after every batch of training. There are many strategies for varying the learning rate during training, and the one we’ll use is called the “One Cycle Learning Rate Policy”, which involves starting with a low learning rate, gradually increasing it batch-by-batch to a high learning rate for about 30% of epochs, then gradually decreasing it to a very low value for the remaining epochs.
  • Weight decay: We also use weight decay, which is yet another regularization technique which prevents the weights from becoming too large by adding an additional term to the loss function.
  • Gradient clipping: Apart from the layer weights and outputs, it is also helpful to limit the values of gradients to a small range to prevent undesirable changes in parameters due to large gradient values. This simple yet effective technique is called gradient clipping.

Let’s define a fit_one_cycle function to incorporate these changes. We'll also record the learning rate used for each batch.
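A sketch of fit_one_cycle, along with the evaluate and get_lr helpers it relies on, following the structure described above (one-cycle scheduling stepped per batch, weight decay in the optimizer, optional gradient clipping):

```python
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []

    # Optimizer with weight decay built in
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # One-cycle scheduler: warm up for ~30% of the steps, then anneal
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()

            # Gradient clipping
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)

            optimizer.step()
            optimizer.zero_grad()

            # Record the learning rate, then step the scheduler (per batch)
            lrs.append(get_lr(optimizer))
            sched.step()

        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, "
              "val_loss: {:.4f}, val_acc: {:.4f}".format(
                  epoch, lrs[-1], result['train_loss'],
                  result['val_loss'], result['val_acc']))
        history.append(result)
    return history
```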

We’re now ready to train our model. Instead of SGD (stochastic gradient descent), we’ll use the Adam optimizer which uses techniques like momentum and adaptive learning rates for faster training. You can learn more about optimizers here: https://ruder.io/optimizing-gradient-descent/index.html
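Let’s first evaluate the untrained model, then train. The number of epochs and the maximum learning rate of 0.01 match the training log and schedule shown below; the weight decay and gradient clipping values are illustrative:

```python
history = [evaluate(model, valid_dl)]  # untrained baseline

epochs = 8
max_lr = 0.01
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam

history += fit_one_cycle(epochs, max_lr, model, train_dl, valid_dl,
                         grad_clip=grad_clip,
                         weight_decay=weight_decay,
                         opt_func=opt_func)
```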

Epoch [0], last_lr: 0.00395, train_loss: 2.4573, val_loss: 2.5939, val_acc: 0.2671

Epoch [1], last_lr: 0.00936, train_loss: 2.0291, val_loss: 1.9433, val_acc: 0.3953

Epoch [2], last_lr: 0.00972, train_loss: 1.7109, val_loss: 1.8736, val_acc: 0.4415

Epoch [3], last_lr: 0.00812, train_loss: 1.5905, val_loss: 1.6120, val_acc: 0.5064

Epoch [4], last_lr: 0.00556, train_loss: 1.4769, val_loss: 1.4013, val_acc: 0.5531

Epoch [5], last_lr: 0.00283, train_loss: 1.3077, val_loss: 1.1090, val_acc: 0.6483

Epoch [6], last_lr: 0.00077, train_loss: 1.0841, val_loss: 0.9400, val_acc: 0.6950

Epoch [7], last_lr: 0.00000, train_loss: 0.8953, val_loss: 0.8766, val_acc: 0.7175

CPU times: user 5h 50min 20s, sys: 1min 59s, total: 5h 52min 19s

Wall time: 5h 54min 53s

Our model trained to over 71% accuracy.

Let’s plot the validation set accuracies to study how the model improves over time.
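A sketch of the plotting helper:

```python
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

plot_accuracies(history)
```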

We can also plot the training and validation losses to study the trend.
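Along the same lines:

```python
def plot_losses(history):
    # history[0] comes from the initial evaluation and has no train_loss,
    # so use .get() to allow the missing value
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')

plot_losses(history)
```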

It’s clear from the trend that our model isn’t overfitting to the training data just yet. Try removing batch normalization, data augmentation and residual layers one by one to study their effect on overfitting.

Finally, let’s visualize how the learning rate changed over time, batch-by-batch over all the epochs.
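Since fit_one_cycle recorded the learning rate used for every batch, we can flatten and plot those records:

```python
def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('batch no.')
    plt.ylabel('learning rate')
    plt.title('Learning Rate vs. Batch no.')

plot_lrs(history)
```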

As expected, the learning rate starts at a low value, and gradually increases for 30% of the iterations to a maximum value of 0.01, and then gradually decreases to a very small value.

Testing with individual images

While we have been tracking the overall accuracy of the model so far, it’s also a good idea to look at the model’s results on some sample images. Let’s test our model with some images from the predefined test dataset of 10,000 images.
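A sketch of a predict_image helper that runs a single image through the model and maps the predicted index back to a class name:

```python
def predict_image(img, model):
    xb = to_device(img.unsqueeze(0), device)  # batch of 1, on the right device
    yb = model(xb)
    _, preds = torch.max(yb, dim=1)
    return valid_ds.classes[preds[0].item()]

img, label = valid_ds[0]
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', valid_ds.classes[label], ', Predicted:', predict_image(img, model))
```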

Running this on a few different test images gave:

Label: aquatic_mammals , Predicted: aquatic_mammals

Label: flowers , Predicted: flowers

Label: medium_mammals , Predicted: medium_mammals

Identifying where our model performs poorly can help us improve the model, by collecting more training data, increasing/decreasing the complexity of the model, and changing the hyperparameters.

Save and Commit

Let’s save the weights of the model, record the hyperparameters, and commit our experiment to Jovian.
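A sketch using torch.save and the jovian library (the file name, project name, and logged fields are illustrative):

```python
import jovian

# Save the trained weights to disk
torch.save(model.state_dict(), 'cifar100-resnet9.pth')

jovian.reset()
jovian.log_hyperparams(arch='resnet9', epochs=epochs, lr=max_lr,
                       scheduler='one-cycle', weight_decay=weight_decay,
                       grad_clip=grad_clip, opt=opt_func.__name__)
jovian.log_metrics(val_loss=history[-1]['val_loss'],
                   val_acc=history[-1]['val_acc'],
                   train_loss=history[-1]['train_loss'])

# Commit the notebook, the saved weights and the logged records to Jovian
jovian.commit(project='cifar100-cnn', outputs=['cifar100-resnet9.pth'],
              environment=None)
```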

[jovian] Hyperparams logged.

[jovian] Metrics logged.

Conclusion

Image classification is a major building block for future data science projects. Using a deep CNN with residual connections and batch normalization improved our image classification results, reaching about 71% accuracy. That said, the model is quite complex, and it could arguably be simplified to reach a similar result in less time.

Further Reading

This story is part of a deep learning final project with Jovian AI.

  1. Intuitively understanding Convolutions for Deep Learning by Irhum Shafkat
