Image Classification of CIFAR100 dataset in PyTorch
Image classification assigns a class label to a digital image based on its pixels. In this story, we are going to classify images from the CIFAR100 dataset using Convolutional Neural Networks.
Before going further, I would like to thank Jovian AI for giving everyone who wants to learn something new the opportunity to do so at no cost. You can visit their site here.
Now let’s get into it.
Introduction to Convolutional Neural Networks
The idea behind convolutions is the use of an image kernel. The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights.

This kernel “slides” over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel.
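To make the sliding-window description concrete, here is a minimal sketch that computes a 2D convolution by hand and compares it with PyTorch's built-in F.conv2d. The helper name conv2d_manual, the random 5 x 5 "image" and the kernel values are made up for illustration; no padding and a stride of 1 are assumed.

```python
import torch
import torch.nn.functional as F

def conv2d_manual(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]      # part of the input under the kernel
            out[i, j] = (patch * kernel).sum()     # one output pixel
    return out

torch.manual_seed(0)
image = torch.rand(5, 5)                           # hypothetical single-channel image
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])             # arbitrary example weights

manual = conv2d_manual(image, kernel)
# F.conv2d expects (batch, channels, height, width), so add two leading dimensions
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual.shape)                                # torch.Size([3, 3])
print(torch.allclose(manual, builtin, atol=1e-5))
```

A 3 x 3 kernel on a 5 x 5 input without padding leaves a 3 x 3 output, which is why the convolution layers later in this story use padding=1 to keep the image size unchanged.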
Exploring the CIFAR100 Dataset

The CIFAR100 dataset has 100 classes with 600 images each: 500 training images and 100 testing images per class. The 100 classes are further grouped into 20 superclasses.
Downloading the dataset
First, we import the functions and libraries we will need.
import os
import torch
import torchvision
import tarfile
import torch.nn as nn
import numpy as np
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
import torchvision.transforms as tt
from torch.utils.data import random_split
from torchvision.utils import make_grid
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.rcParams['figure.facecolor'] = '#ffffff'
There are different ways to download a dataset.
To download it from a URL:
# Download the dataset
dataset_url = "https://s3.amazonaws.com/fast-ai-imageclas/cifar100.tgz"
download_url(dataset_url, '.')
But if you want to download a dataset from Kaggle, you first need to install and import the opendatasets library.
!pip install opendatasets --upgrade --quiet
import opendatasets as od
For example,
od.download('https://www.kaggle.com/asdasdasasdas/garbage-classification')
While downloading the dataset, you will be asked to provide your Kaggle username and API key, which you can obtain using the “Create New API Token” button on your account page on Kaggle. Upload the kaggle.json file to the notebook using the Files tab, or enter the username and key manually when prompted.
# Extract from archive
with tarfile.open('./cifar100.tgz', 'r:gz') as tar:
    tar.extractall(path='./data')

data_dir = './data/cifar100'
print(os.listdir(data_dir))
classes = os.listdir(data_dir + "/train")
print(classes)
We can create training and validation datasets using the ImageFolder class from torchvision. In addition to the ToTensor transform, we'll also apply some other transforms to the images.
# PyTorch datasets (train_tfms and valid_tfms are defined in the data transforms code further below; run that first)
train_ds = ImageFolder(data_dir+'/train', train_tfms)
valid_ds = ImageFolder(data_dir+'/test', valid_tfms)
There are a few important changes we'll make while creating PyTorch datasets for training and validation:
- Use test set for validation: Instead of setting aside a fraction (e.g. 10%) of the data from the training set for validation, we’ll simply use the test set as our validation set. In general, once you have picked the best model architecture & hyperparameters using a fixed validation set, it is a good idea to retrain the same model on the entire dataset just to give it a small final boost in performance.
- Channel-wise data normalization: We will normalize the image tensors by subtracting the mean and dividing by the standard deviation across each channel. As a result, the mean of the data across each channel is approximately 0, and the standard deviation approximately 1. Normalizing the data prevents the values from any one channel from disproportionately affecting the losses and gradients while training, simply by having a higher or wider range of values than others.
- Randomized data augmentations: We will apply randomly chosen transformations while loading images from the training dataset. Specifically, we will pad each image by 4 pixels, then take a random crop of size 32 x 32 pixels, and then flip the image horizontally with a 50% probability. Since the transformation is applied randomly and dynamically each time a particular image is loaded, the model sees slightly different images in each epoch of training, which allows it to generalize better.
# Data transforms (normalization & data augmentation)
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_tfms = tt.Compose([tt.RandomCrop(32, padding=4, padding_mode='reflect'),
                         tt.RandomHorizontalFlip(),
                         # tt.RandomRotate
                         # tt.RandomResizedCrop(256, scale=(0.5,0.9), ratio=(1, 1)),
                         # tt.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
                         tt.ToTensor(),
                         tt.Normalize(*stats,inplace=True)])
valid_tfms = tt.Compose([tt.ToTensor(), tt.Normalize(*stats)])
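To see what channel-wise normalization does, here is a small sketch that computes per-channel statistics on a random batch and applies the same arithmetic as tt.Normalize by hand. The random batch is only a hypothetical stand-in for real images, so the statistics it prints are not those of CIFAR100.

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in for a batch of RGB images: (batch, channels, height, width)
images = torch.rand(256, 3, 32, 32)

# Per-channel statistics: reduce over batch, height and width, keep the channel axis
means = images.mean(dim=(0, 2, 3))
stds = images.std(dim=(0, 2, 3))

# Same arithmetic as tt.Normalize: (x - mean) / std, channel by channel
normalized = (images - means.reshape(1, 3, 1, 1)) / stds.reshape(1, 3, 1, 1)

print(normalized.mean(dim=(0, 2, 3)))  # close to 0 for every channel
print(normalized.std(dim=(0, 2, 3)))   # close to 1 for every channel
```

The mean and standard deviation only come out at 0 and 1 when the statistics passed to the transform were computed on the same data, so it can be worth recomputing them for your own dataset.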
We use a batch size of 64 here.
If your hardware allows it, I would suggest a relatively large batch size (say 400 or 500). You can try reducing the batch size & restarting the kernel if you face an “out of memory” error.
batch_size=64
Next, we can create data loaders for retrieving images in batches.
# PyTorch data loaders
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=3, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size*2, num_workers=3, pin_memory=True)
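As a rough illustration of how the batch size determines the number of batches per epoch, the sketch below wraps a small fake dataset in a DataLoader. The dataset of 1000 random tensors and the names fake_ds and fake_dl are assumptions made for this example only.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 1000 fake 3x32x32 "images" with labels in 0..19
fake_ds = TensorDataset(torch.rand(1000, 3, 32, 32), torch.randint(0, 20, (1000,)))
batch_size = 64
fake_dl = DataLoader(fake_ds, batch_size, shuffle=True)

num_batches = len(fake_dl)                        # batches per epoch
batch_sizes = [images.shape[0] for images, _ in fake_dl]

print(num_batches, math.ceil(1000 / batch_size))  # 16 16
print(batch_sizes[0], batch_sizes[-1])            # 64 40 (the last batch is smaller)
```

The same arithmetic applies to the real loaders: a larger batch size means fewer, bigger steps per epoch and more memory used per step.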
Let’s take a look at some sample images from the training dataloader. To display the images, we’ll need to denormalize the pixel values to bring them back into the range (0,1).
def denormalize(images, means, stds):
    means = torch.tensor(means).reshape(1, 3, 1, 1)
    stds = torch.tensor(stds).reshape(1, 3, 1, 1)
    return images * stds + means

def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 12))
        ax.set_xticks([]); ax.set_yticks([])
        denorm_images = denormalize(images, *stats)
        ax.imshow(make_grid(denorm_images[:64], nrow=8).permute(1, 2, 0).clamp(0,1))
        break
Let’s see the batch result.
show_batch(train_dl)

The colors seem out of place because of the normalization. Note that normalization is also applied during inference.
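As a quick check that denormalize really undoes the normalization, here is a small round-trip sketch. The normalize helper is written out by hand for illustration (it mirrors the arithmetic of tt.Normalize), and the random batch is a hypothetical stand-in for real images.

```python
import torch

stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  # values used in this story

def normalize(images, means, stds):
    # Hand-written equivalent of tt.Normalize, for illustration only
    means = torch.tensor(means).reshape(1, 3, 1, 1)
    stds = torch.tensor(stds).reshape(1, 3, 1, 1)
    return (images - means) / stds

def denormalize(images, means, stds):
    means = torch.tensor(means).reshape(1, 3, 1, 1)
    stds = torch.tensor(stds).reshape(1, 3, 1, 1)
    return images * stds + means

torch.manual_seed(0)
original = torch.rand(8, 3, 32, 32)            # pixel values in [0, 1)
normalized = normalize(original, *stats)
restored = denormalize(normalized, *stats)

print(normalized.min().item() < 0)             # True: normalized pixels leave the [0, 1] range
print(torch.allclose(original, restored, atol=1e-6))  # True: the round trip recovers the input
```

This is also why passing a normalized tensor straight to plt.imshow produces a clipping warning: the values no longer lie in the range that imshow expects for float RGB data.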
Using a GPU
To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required. These are described in more detail in a previous tutorial.
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
Based on where you’re running this notebook, your default device could be a CPU (torch.device('cpu')) or a GPU (torch.device('cuda')).
device = get_default_device()
device
device(type='cpu')
We can now wrap our training and validation data loaders using DeviceDataLoader for automatically transferring batches of data to the GPU (if available).
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(valid_dl, device)
Model with Residual Blocks and Batch Normalization
One of the key changes to our CNN model this time is the addition of the residual block, which adds the original input back to the output feature map obtained by passing the input through one or more convolutional layers.

Here is a very simple Residual block:
class SimpleResidualBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.conv2(out)
        return self.relu2(out) + x # ReLU can be applied before or after adding the input
simple_resnet = to_device(SimpleResidualBlock(), device)

for images, labels in train_dl:
    print(images.shape)
    out = simple_resnet(images)
    print(out.shape)
    break

del simple_resnet, images, labels
torch.cuda.empty_cache()
torch.Size([64, 3, 32, 32])
torch.Size([64, 3, 32, 32])
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))
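To illustrate what the accuracy helper computes, here is a tiny worked example. The scores and labels are made up; the function body is the same as the one used in this story.

```python
import torch

def accuracy(outputs, labels):
    # The index of the largest score in each row is the predicted class
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

# Hypothetical scores for 4 images and 3 classes
outputs = torch.tensor([[2.0, 1.0, 0.1],
                        [0.1, 3.0, 0.2],
                        [0.5, 0.2, 0.1],
                        [0.1, 0.2, 0.9]])
labels = torch.tensor([0, 1, 2, 2])

acc = accuracy(outputs, labels)
print(acc)  # tensor(0.7500): the predictions are [0, 1, 0, 2], so three of four match
```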
def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool: layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()

        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))

        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))

        self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                        nn.Flatten(),
                                        nn.Dropout(0.2),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
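It may help to trace how the image size changes through ResNet9, since that explains why the final Linear layer takes 512 input features. The following is only a sketch using dummy tensors and standalone layers, not the model itself.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 3, 32, 32)  # one dummy CIFAR-sized image

# 3x3 convolutions with padding=1 keep height and width unchanged
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
conv_shape = conv(x).shape
print(conv_shape)  # torch.Size([1, 64, 32, 32])

# Only the pooling layers shrink the feature map:
# three MaxPool2d(2) in conv2, conv3, conv4, then MaxPool2d(4) in the classifier
size = 32
for pool in (2, 2, 2, 4):
    size //= pool
    print(pool, size)  # 32 -> 16 -> 8 -> 4 -> 1

# After the last pool the map is 512 x 1 x 1, so Flatten yields 512 features
features = nn.Flatten()(nn.MaxPool2d(4)(torch.zeros(1, 512, 4, 4)))
print(features.shape)  # torch.Size([1, 512])
```

If you feed the network images of a different size, the flattened feature count changes and the Linear layer's input size would need to be adjusted accordingly.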
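Let's create the model with 3 input channels and 20 output classes. Note that 20 is the number of CIFAR100 superclasses, not the 100 fine-grained classes: the labels predicted later in this story (for example aquatic_mammals and flowers) are superclass names.
model = to_device(ResNet9(3, 20), device)
model
ResNet9(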
  (conv1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (conv2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res1): Sequential(
    (0): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (res2): Sequential(
    (0): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (classifier): Sequential(
    (0): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
    (1): Flatten(start_dim=1, end_dim=-1)
    (2): Dropout(p=0.2, inplace=False)
    (3): Linear(in_features=512, out_features=20, bias=True)
  )
)
Training the model
Before we train the model, we’re going to make a bunch of small but important improvements to our fit function:
- Learning rate scheduling: Instead of using a fixed learning rate, we will use a learning rate scheduler, which will change the learning rate after every batch of training. There are many strategies for varying the learning rate during training, and the one we’ll use is called the “One Cycle Learning Rate Policy”, which involves starting with a low learning rate, gradually increasing it batch-by-batch to a high learning rate for about 30% of epochs, then gradually decreasing it to a very low value for the remaining epochs.
- Weight decay: We also use weight decay, another regularization technique, which prevents the weights from becoming too large by adding an additional term to the loss function.
- Gradient clipping: Apart from the layer weights and outputs, it is also helpful to limit the values of gradients to a small range to prevent undesirable changes in parameters due to large gradient values. This simple yet effective technique is called gradient clipping.
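To get a feel for gradient clipping, here is a minimal sketch on a tiny stand-in model. The Linear layer, the deliberately large inputs and the variable names are assumptions made for illustration; nn.utils.clip_grad_value_ is the same call used in the training loop of this story.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 1)                 # tiny stand-in for a real network
inputs = torch.randn(8, 4) * 100        # large inputs produce large gradients
loss = layer(inputs).pow(2).mean()
loss.backward()

before = layer.weight.grad.abs().max().item()
# Clamp every gradient element into the range [-0.1, 0.1], in place
nn.utils.clip_grad_value_(layer.parameters(), clip_value=0.1)
after = layer.weight.grad.abs().max().item()

print(before)   # a large value
print(after)    # at most 0.1 (up to float rounding)
```

Weight decay, by contrast, needs no extra code in the loop: it is passed to the optimizer through its weight_decay argument.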
Let’s define a fit_one_cycle function to incorporate these changes. We'll also record the learning rate used for each batch.
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []

    # Set up custom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs,
                                                steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        # Training Phase
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()

            # Gradient clipping
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)

            optimizer.step()
            optimizer.zero_grad()

            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()

        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history
history = [evaluate(model, valid_dl)]
history
[{'val_acc': 0.061016615480184555, 'val_loss': 2.994988203048706}]
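These starting numbers are what we would expect from an untrained model that guesses among 20 classes. The sketch below shows this, assuming a model with no preference at all (identical scores for every class); the tensors are made up for illustration.

```python
import math
import torch
import torch.nn.functional as F

num_classes = 20  # number of outputs the model was created with in this story

# Identical logits for every class mean the model has no preference at all
logits = torch.zeros(8, num_classes)
labels = torch.randint(0, num_classes, (8,))

chance_loss = F.cross_entropy(logits, labels).item()
print(chance_loss, math.log(num_classes))  # both are about 2.9957
print(1 / num_classes)                     # chance accuracy: 0.05
```

A validation loss of about 2.99 and an accuracy of about 6% before training are therefore a sign that the setup is sane rather than a cause for concern.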
We’re now ready to train our model. Instead of SGD (stochastic gradient descent), we’ll use the Adam optimizer which uses techniques like momentum and adaptive learning rates for faster training. You can learn more about optimizers here: https://ruder.io/optimizing-gradient-descent/index.html
epochs = 8
max_lr = 0.01
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam

%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, valid_dl,
                         grad_clip=grad_clip,
                         weight_decay=weight_decay,
                         opt_func=opt_func)
Epoch [0], last_lr: 0.00395, train_loss: 2.4573, val_loss: 2.5939, val_acc: 0.2671
Epoch [1], last_lr: 0.00936, train_loss: 2.0291, val_loss: 1.9433, val_acc: 0.3953
Epoch [2], last_lr: 0.00972, train_loss: 1.7109, val_loss: 1.8736, val_acc: 0.4415
Epoch [3], last_lr: 0.00812, train_loss: 1.5905, val_loss: 1.6120, val_acc: 0.5064
Epoch [4], last_lr: 0.00556, train_loss: 1.4769, val_loss: 1.4013, val_acc: 0.5531
Epoch [5], last_lr: 0.00283, train_loss: 1.3077, val_loss: 1.1090, val_acc: 0.6483
Epoch [6], last_lr: 0.00077, train_loss: 1.0841, val_loss: 0.9400, val_acc: 0.6950
Epoch [7], last_lr: 0.00000, train_loss: 0.8953, val_loss: 0.8766, val_acc: 0.7175
CPU times: user 5h 50min 20s, sys: 1min 59s, total: 5h 52min 19s
Wall time: 5h 54min 53s
train_time='352:19'
Our model trained to over 71% validation accuracy (measured on the 20 superclass labels).
Let’s plot the validation set accuracies to study how the model improves over time.
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');

plot_accuracies(history)

We can also plot the training and validation losses to study the trend.
def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');

plot_losses(history)

It’s clear from the trend that our model isn’t overfitting to the training data just yet. Try removing batch normalization, data augmentation and residual layers one by one to study their effect on overfitting.
Finally, let’s visualize how the learning rate changed over time, batch-by-batch over all the epochs.
def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('Batch no.')
    plt.ylabel('Learning rate')
    plt.title('Learning Rate vs. Batch no.');

plot_lrs(history)

As expected, the learning rate starts at a low value, and gradually increases for 30% of the iterations to a maximum value of 0.01, and then gradually decreases to a very small value.
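The shape of this schedule can be reproduced without training anything. The sketch below drives OneCycleLR with a single dummy parameter and records the learning rate at every step; steps_per_epoch=100 is a made-up value, and the scheduler's default settings (such as a warm-up covering 30% of the steps) are assumed.

```python
import torch

# A single dummy parameter is enough to build an optimizer for this illustration
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.01)

epochs, steps_per_epoch, max_lr = 8, 100, 0.01   # steps_per_epoch is a made-up value
sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs,
                                            steps_per_epoch=steps_per_epoch)

lrs = []
for _ in range(epochs * steps_per_epoch):
    lrs.append(optimizer.param_groups[0]['lr'])  # record first, as fit_one_cycle does
    optimizer.step()                             # no gradients here, so nothing changes
    sched.step()

peak_step = max(range(len(lrs)), key=lrs.__getitem__)
print(lrs[0])                  # starting learning rate, well below max_lr
print(max(lrs))                # peak learning rate, about max_lr
print(peak_step / len(lrs))    # fraction of training at which the peak occurs, about 0.3
print(lrs[-1])                 # final learning rate, far below the starting one
```

Plotting lrs with plt.plot gives the same rise-then-fall curve as the chart above, just with fewer steps.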
Testing with individual images
While we have been tracking the overall accuracy of the model so far, it’s also a good idea to look at the model’s results on some sample images. Let’s test out our model with some images from the predefined test dataset of 10000 images.
def predict_image(img, model):
    # Convert to a batch of 1
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds = torch.max(yb, dim=1)
    # Retrieve the class label
    return train_ds.classes[preds[0].item()]
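Strictly speaking, the model returns raw scores (logits) rather than probabilities; picking the largest score still gives the most probable class because softmax preserves the ordering. If you want actual probabilities or the top few guesses, something like the following sketch could be used. The logits and class names here are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical model output (logits) for one image and five made-up classes
class_names = ['cat', 'dog', 'fish', 'bird', 'tree']
logits = torch.tensor([[1.2, 0.3, 3.1, -0.5, 2.0]])

probs = F.softmax(logits, dim=1)              # probabilities that sum to 1
top_probs, top_idxs = probs.topk(3, dim=1)    # three most likely classes

for p, i in zip(top_probs[0], top_idxs[0]):
    print(class_names[i.item()], round(p.item(), 3))
```

Looking at the runner-up classes is often a quick way to see which categories the model tends to confuse.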
img, label = valid_ds[10]
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', train_ds.classes[label], ', Predicted:', predict_image(img, model))

Label: aquatic_mammals , Predicted: aquatic_mammals
img, label = valid_ds[1002]
# clamp keeps the normalized pixel values inside the range that imshow accepts
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', valid_ds.classes[label], ', Predicted:', predict_image(img, model))

Label: flowers , Predicted: flowers
img, label = valid_ds[6154]
# clamp keeps the normalized pixel values inside the range that imshow accepts
plt.imshow(img.permute(1, 2, 0).clamp(0, 1))
print('Label:', train_ds.classes[label], ', Predicted:', predict_image(img, model))

Label: medium_mammals , Predicted: medium_mammals
Identifying where our model performs poorly can help us improve the model, by collecting more training data, increasing/decreasing the complexity of the model, and changing the hyperparameters.
Save and Commit
Let’s save the weights of the model, record the hyperparameters, and commit our experiment to Jovian.
torch.save(model.state_dict(), 'cifar100-resnet9.pth')
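The saved weights can later be loaded into a fresh model with the same architecture. Here is a minimal round-trip sketch; it uses a small Linear layer as a stand-in for the trained network and a temporary file with a made-up name, so that it runs on its own.

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                                  # small stand-in for the trained network
path = os.path.join(tempfile.mkdtemp(), 'weights.pth')   # hypothetical file name
torch.save(model.state_dict(), path)

# A fresh model with the same architecture receives the saved weights
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location='cpu'))
restored.eval()                                          # switch to inference mode before predicting

same = all(torch.equal(a, b) for a, b in zip(model.state_dict().values(),
                                             restored.state_dict().values()))
print(same)  # True
```

For the model in this story, the equivalent would be to create ResNet9(3, 20) again and call load_state_dict on it with the contents of 'cifar100-resnet9.pth'.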
!pip install jovian --upgrade --quiet

import jovian

jovian.reset()
jovian.log_hyperparams(arch='resnet9',
                       epochs=epochs,
                       lr=max_lr,
                       scheduler='one-cycle',
                       weight_decay=weight_decay,
                       grad_clip=grad_clip,
                       opt=opt_func.__name__)
[jovian] Hyperparams logged.

jovian.log_metrics(val_loss=history[-1]['val_loss'],
                   val_acc=history[-1]['val_acc'],
                   train_loss=history[-1]['train_loss'],
                   time=train_time)
[jovian] Metrics logged.

# project_name is not defined anywhere in this story; set it to your own Jovian project name first
jovian.commit(project=project_name, environment=None, outputs=['cifar100-resnet9.pth'])
Conclusion
Image classification is a core building block for many data science projects. Our convolutional neural network reached about 71% validation accuracy after eight epochs. That said, training took almost six hours on a CPU, and the model is arguably more complex than it needs to be; it could probably be simplified to reach a comparable or better result in less time.
Further Reading
This story is part of a deep learning final project at Jovian AI.
- Here is the link to the project: priyansh213/cifar100-images-classification-project on Jovian.
- You can get the CIFAR100 dataset here.
- Convolutions in Depth by Sylvain Gugger
- Intuitively understanding Convolutions for Deep Learning by Irhum Shafkat