Convolutional Neural Networks #
Applications Include#
Image Classification:
Object Recognition: Identifying objects within an image (e.g., recognizing dogs, cats, cars).
Scene Classification: Categorizing the scene depicted in an image (e.g., beach, forest, city).
Object Detection:
Face Detection: Detecting human faces in images for applications like security, photo tagging, and more.
Pedestrian Detection: Used in autonomous driving and surveillance systems.
Image Segmentation:
Semantic Segmentation: Classifying each pixel in an image into a category (e.g., separating roads, buildings, cars in a cityscape).
Instance Segmentation: Distinguishing between different instances of objects in an image (e.g., identifying and separating multiple people).
Medical Imaging:
Disease Diagnosis: Analyzing medical images (e.g., X-rays, MRIs) to detect diseases like cancer, Alzheimer’s, and retinal diseases.
Organ Segmentation: Identifying and segmenting organs in medical scans for surgical planning and diagnosis.
Self-driving Cars:
Road Sign Recognition: Detecting and classifying traffic signs.
Lane Detection: Identifying lane boundaries on the road.
Obstacle Detection: Detecting pedestrians, other vehicles, and obstacles.
Facial Recognition and Authentication:
Face Recognition: Identifying or verifying a person from an image or video frame.
Emotion Recognition: Analyzing facial expressions to determine emotions.
Robotics and Automation:
Object Grasping: Identifying objects and determining how to grasp them in robotic manipulation.
Navigation: Helping robots understand their environment for autonomous navigation.
Augmented Reality (AR) and Virtual Reality (VR):
Object Tracking: Tracking objects in real time to overlay digital information on the real world.
Environment Understanding: Understanding and mapping the environment for immersive VR experiences.
Natural Language Processing (NLP):
Text Classification: Classifying text into categories (e.g., spam detection, sentiment analysis).
Character Recognition: Recognizing handwritten or printed text from images (e.g., OCR).
Art and Creativity:
Style Transfer: Applying the style of one image to another (e.g., making a photo look like a painting).
Image Generation: Creating new images from scratch (e.g., GANs for generating realistic photos).
Super-Resolution and Image Enhancement:
Image Denoising: Removing noise from images to enhance quality.
Image Super-Resolution: Increasing the resolution of images for better clarity and detail.
Code Applications#
Example 1 - Fashion MNIST#
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
import torch.nn.functional as F
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Define a transform that converts images to tensors and normalizes them to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
# Download and load the training and test datasets
train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=60, shuffle=True)
# Batching the test set is useful if you plan to evaluate after every training epoch
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
# Inspect the shape of a single batch
for X, y in train_loader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
# Define the class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Shape of X [N, C, H, W]: torch.Size([60, 1, 28, 28])
Shape of y: torch.Size([60]) torch.int64
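The `Normalize((0.5,), (0.5,))` transform used above subtracts a mean of 0.5 and divides by a standard deviation of 0.5, so pixel values in [0, 1] end up in [-1, 1]. The plotting helper on this page undoes this with `img / 2 + 0.5`. A minimal plain-Python sketch of that arithmetic follows; the helper names `normalize` and `unnormalize` are illustrative only and are not torchvision functions.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Apply (pixel - mean) / std, the per-channel formula used by Normalize."""
    return (pixel - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Invert normalize; with mean = std = 0.5 this equals value / 2 + 0.5."""
    return value * std + mean


# Walk a few pixel intensities through the round trip.
for p in (0.0, 0.25, 0.5, 1.0):
    z = normalize(p)
    print(p, "->", z, "->", unnormalize(z))
```

Black pixels (0.0) map to -1.0 and white pixels (1.0) map to 1.0, which keeps the inputs centered around zero for training.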
# look at some sample images
i = 34
# Function to display images
def imshow(img, label):
    img = img / 2 + 0.5  # unnormalize from [-1, 1] back to [0, 1]
    npimg = img.numpy()
    # Matplotlib expects (H, W, C); PyTorch tensors are (C, H, W)
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title('Sample of Fashion/Apparel Items')
    #plt.title(f'First Item: {class_names[label]}')
    plt.show()
# Get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
# Show images
imshow(torchvision.utils.make_grid(images), labels[0])
# Display the name of the label for the first image in the batch
print(f'First Label: {class_names[labels[0]]}')
First Label: Shirt
What happens if we do not use convolutions/feature extractions#
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        # Fully connected layers with ReLU activations in between
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 10)  # output: one logit per class
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
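To get a feel for the size of this fully connected baseline, the number of trainable parameters can be worked out by hand: each `Linear` layer with a bias holds `in_features * out_features` weights plus `out_features` biases. The sketch below does that arithmetic for the three layers listed above; `linear_params` is a hypothetical helper written for this illustration.

```python
def linear_params(in_features, out_features, bias=True):
    """Count the weights (and optional biases) of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# (in_features, out_features) of the three Linear layers in the stack
layers = [(28 * 28, 256), (256, 512), (512, 10)]
counts = [linear_params(i, o) for i, o in layers]
total = sum(counts)

print(counts)  # per-layer parameter counts
print(total)   # total trainable parameters
```

Most of the parameters sit in the first layer, because every one of the 784 input pixels is connected to every one of the 256 hidden units.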
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()  # put the model in training mode
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch % 50 == 0:
            loss, current = loss.item(), (batch + 1)*len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()  # put the model in evaluation mode
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            # Count predictions whose highest-scoring class matches the label
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
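The accuracy reported by `test` comes from comparing the index of the largest logit in each row (`pred.argmax(1)`) with the true label. The same idea can be sketched in plain Python on a tiny, made-up batch; the `accuracy` helper and the numbers below are invented for illustration and are not taken from the Fashion MNIST run.

```python
def accuracy(logits, labels):
    """Fraction of rows whose largest logit sits at the labelled index."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


# Four hypothetical samples with three classes each
logits = [
    [2.0, 0.1, -1.0],  # argmax 0
    [0.3, 0.2, 1.5],   # argmax 2
    [0.0, 3.0, 0.5],   # argmax 1
    [1.0, 0.9, 0.8],   # argmax 0
]
labels = [0, 2, 0, 0]

print(accuracy(logits, labels))
```

Three of the four predictions match their labels here, so the sketch reports an accuracy of 0.75.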
The benefit of convolutions:#
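# Define the convolutional model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # 1x28x28 -> 50x28x28 (padding=1 keeps the size) -> 50x14x14 after pooling
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 50, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # 50x14x14 -> 100x12x12 (no padding) -> 100x6x6 after pooling;
        # the second pooling step is required for the 100*6*6 input size of fc1
        self.layer2 = nn.Sequential(
            nn.Conv2d(50, 100, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.fc1 = nn.Linear(100*6*6, 1000)
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)  # flatten to (N, 100*6*6)
        out = self.fc1(out)
        out = self.fc2(out)
        return out

model = CNN().to(device)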
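The input size `100*6*6` of `fc1` follows from the spatial size of the feature maps. For a square input, a convolution or pooling layer with kernel size k, stride s and padding p produces an output of size floor((n + 2p - k) / s) + 1. The sketch below traces a 28x28 image through the layers, assuming a 2x2 max pooling with stride 2 follows the second convolution (which is what the `100*6*6` figure implies); `conv_out` is a hypothetical helper, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
size = conv_out(size, kernel=3, padding=1)  # Conv2d(1, 50, kernel_size=3, padding=1)
after_conv1 = size
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(kernel_size=2, stride=2)
after_pool1 = size
size = conv_out(size, kernel=3)             # Conv2d(50, 100, kernel_size=3)
after_conv2 = size
size = conv_out(size, kernel=2, stride=2)   # assumed second MaxPool2d(2, 2)
after_pool2 = size

flattened = 100 * after_pool2 * after_pool2  # channels * height * width
print(after_conv1, after_pool1, after_conv2, after_pool2, flattened)
```

The side length goes 28, 14, 12, 6, so the flattened feature vector has 100 * 6 * 6 = 3600 entries.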
# Define the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Train the model
num_epochs = 5
model.train()  # make sure the model is in training mode
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        # Forward pass
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')
# Evaluate the model
model.eval()  # Set the model to evaluation mode
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        # The index of the largest logit is the predicted class
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy of the model on the test images: {100 * correct / total:.2f}%')
Accuracy of the model on the test images: 90.53%
# Continue training the CNN with the train/test helpers defined earlier
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
Epoch 1
loss: 0.317343 [ 60/60000]
loss: 0.231959 [ 3060/60000]
loss: 0.244312 [ 6060/60000]
loss: 0.281115 [ 9060/60000]
loss: 0.236354 [12060/60000]
loss: 0.261460 [15060/60000]
loss: 0.214243 [18060/60000]
loss: 0.136579 [21060/60000]
loss: 0.212843 [24060/60000]
loss: 0.381205 [27060/60000]
loss: 0.274324 [30060/60000]
loss: 0.379989 [33060/60000]
loss: 0.127020 [36060/60000]
loss: 0.334019 [39060/60000]
loss: 0.230370 [42060/60000]
loss: 0.228620 [45060/60000]
loss: 0.198073 [48060/60000]
loss: 0.182649 [51060/60000]
loss: 0.311502 [54060/60000]
loss: 0.184986 [57060/60000]
Test Error:
Accuracy: 90.8%, Avg loss: 0.259057
Epoch 2
loss: 0.206381 [ 60/60000]
loss: 0.182129 [ 3060/60000]
loss: 0.270273 [ 6060/60000]
loss: 0.239762 [ 9060/60000]
loss: 0.195614 [12060/60000]
loss: 0.351690 [15060/60000]
loss: 0.316848 [18060/60000]
loss: 0.195195 [21060/60000]
loss: 0.115826 [24060/60000]
loss: 0.208101 [27060/60000]
loss: 0.118409 [30060/60000]
loss: 0.148166 [33060/60000]
loss: 0.105474 [36060/60000]
loss: 0.166691 [39060/60000]
loss: 0.071779 [42060/60000]
loss: 0.194438 [45060/60000]
loss: 0.293481 [48060/60000]
loss: 0.247820 [51060/60000]
loss: 0.106475 [54060/60000]
loss: 0.321602 [57060/60000]
Test Error:
Accuracy: 91.8%, Avg loss: 0.236095
Epoch 3
loss: 0.158711 [ 60/60000]
loss: 0.215797 [ 3060/60000]
loss: 0.102631 [ 6060/60000]
loss: 0.183772 [ 9060/60000]
loss: 0.226461 [12060/60000]
loss: 0.219134 [15060/60000]
loss: 0.286669 [18060/60000]
loss: 0.180499 [21060/60000]
loss: 0.189974 [24060/60000]
loss: 0.151949 [27060/60000]
loss: 0.197595 [30060/60000]
loss: 0.252205 [33060/60000]
loss: 0.147336 [36060/60000]
loss: 0.170789 [39060/60000]
loss: 0.269710 [42060/60000]
loss: 0.163103 [45060/60000]
loss: 0.171579 [48060/60000]
loss: 0.203208 [51060/60000]
loss: 0.194353 [54060/60000]
loss: 0.089797 [57060/60000]
Test Error:
Accuracy: 90.7%, Avg loss: 0.249229
Epoch 4
loss: 0.160929 [ 60/60000]
loss: 0.082582 [ 3060/60000]
loss: 0.185329 [ 6060/60000]
loss: 0.239383 [ 9060/60000]
loss: 0.131882 [12060/60000]
loss: 0.116314 [15060/60000]
loss: 0.131452 [18060/60000]
loss: 0.117466 [21060/60000]
loss: 0.230492 [24060/60000]
loss: 0.141239 [27060/60000]
loss: 0.331365 [30060/60000]
loss: 0.078809 [33060/60000]
loss: 0.058115 [36060/60000]
loss: 0.189591 [39060/60000]
loss: 0.359033 [42060/60000]
loss: 0.184165 [45060/60000]
loss: 0.183354 [48060/60000]
loss: 0.203182 [51060/60000]
loss: 0.230083 [54060/60000]
loss: 0.255975 [57060/60000]
Test Error:
Accuracy: 91.9%, Avg loss: 0.229430
Epoch 5
loss: 0.149480 [ 60/60000]
loss: 0.143310 [ 3060/60000]
loss: 0.381732 [ 6060/60000]
loss: 0.154710 [ 9060/60000]
loss: 0.152550 [12060/60000]
loss: 0.115276 [15060/60000]
loss: 0.159885 [18060/60000]
loss: 0.395862 [21060/60000]
loss: 0.184195 [24060/60000]
loss: 0.171736 [27060/60000]
loss: 0.119582 [30060/60000]
loss: 0.332809 [33060/60000]
loss: 0.277288 [36060/60000]
loss: 0.139747 [39060/60000]
loss: 0.184602 [42060/60000]
loss: 0.194151 [45060/60000]
loss: 0.247530 [48060/60000]
loss: 0.154591 [51060/60000]
loss: 0.178196 [54060/60000]
loss: 0.312619 [57060/60000]
Test Error:
Accuracy: 91.7%, Avg loss: 0.229329
Epoch 6
loss: 0.182013 [ 60/60000]
loss: 0.252040 [ 3060/60000]
loss: 0.149042 [ 6060/60000]
loss: 0.328746 [ 9060/60000]
loss: 0.144892 [12060/60000]
loss: 0.244946 [15060/60000]
loss: 0.160196 [18060/60000]
loss: 0.186767 [21060/60000]
loss: 0.117370 [24060/60000]
loss: 0.142045 [27060/60000]
loss: 0.157884 [30060/60000]
loss: 0.161956 [33060/60000]
loss: 0.270296 [36060/60000]
loss: 0.096609 [39060/60000]
loss: 0.249100 [42060/60000]
loss: 0.117343 [45060/60000]
loss: 0.071875 [48060/60000]
loss: 0.102485 [51060/60000]
loss: 0.213012 [54060/60000]
loss: 0.227817 [57060/60000]
Test Error:
Accuracy: 92.0%, Avg loss: 0.229110
Epoch 7
loss: 0.093213 [ 60/60000]
loss: 0.321472 [ 3060/60000]
loss: 0.194250 [ 6060/60000]
loss: 0.331985 [ 9060/60000]
loss: 0.183514 [12060/60000]
loss: 0.173314 [15060/60000]
loss: 0.244925 [18060/60000]
loss: 0.278094 [21060/60000]
loss: 0.287473 [24060/60000]
loss: 0.148316 [27060/60000]
loss: 0.141388 [30060/60000]
loss: 0.152074 [33060/60000]
loss: 0.198113 [36060/60000]
loss: 0.159151 [39060/60000]
loss: 0.196160 [42060/60000]
loss: 0.193496 [45060/60000]
loss: 0.220635 [48060/60000]
loss: 0.122122 [51060/60000]
loss: 0.165055 [54060/60000]
loss: 0.272954 [57060/60000]
Test Error:
Accuracy: 92.1%, Avg loss: 0.228313
Epoch 8
loss: 0.149210 [ 60/60000]
loss: 0.233706 [ 3060/60000]
loss: 0.092144 [ 6060/60000]
loss: 0.312386 [ 9060/60000]
loss: 0.149677 [12060/60000]
loss: 0.345841 [15060/60000]
loss: 0.162732 [18060/60000]
loss: 0.171203 [21060/60000]
loss: 0.222052 [24060/60000]
loss: 0.128415 [27060/60000]
loss: 0.156282 [30060/60000]
loss: 0.282069 [33060/60000]
loss: 0.291207 [36060/60000]
loss: 0.342439 [39060/60000]
loss: 0.071771 [42060/60000]
loss: 0.194775 [45060/60000]
loss: 0.267220 [48060/60000]
loss: 0.212857 [51060/60000]
loss: 0.357989 [54060/60000]
loss: 0.178691 [57060/60000]
Test Error:
Accuracy: 92.1%, Avg loss: 0.233381
Epoch 9
loss: 0.097582 [ 60/60000]
loss: 0.261138 [ 3060/60000]
loss: 0.194356 [ 6060/60000]
loss: 0.157411 [ 9060/60000]
loss: 0.179442 [12060/60000]
loss: 0.143187 [15060/60000]
loss: 0.229545 [18060/60000]
loss: 0.345801 [21060/60000]
loss: 0.294135 [24060/60000]
loss: 0.108510 [27060/60000]
loss: 0.097589 [30060/60000]
loss: 0.089959 [33060/60000]
loss: 0.166250 [36060/60000]
loss: 0.150127 [39060/60000]
loss: 0.193146 [42060/60000]
loss: 0.071435 [45060/60000]
loss: 0.249247 [48060/60000]
loss: 0.113742 [51060/60000]
loss: 0.113938 [54060/60000]
loss: 0.324047 [57060/60000]
Test Error:
Accuracy: 91.8%, Avg loss: 0.241098
Epoch 10
loss: 0.175804 [ 60/60000]
loss: 0.075551 [ 3060/60000]
loss: 0.301739 [ 6060/60000]
loss: 0.112284 [ 9060/60000]
loss: 0.225511 [12060/60000]
loss: 0.211033 [15060/60000]
loss: 0.051665 [18060/60000]
loss: 0.072662 [21060/60000]
loss: 0.121633 [24060/60000]
loss: 0.237526 [27060/60000]
loss: 0.146027 [30060/60000]
loss: 0.200525 [33060/60000]
loss: 0.239444 [36060/60000]
loss: 0.141680 [39060/60000]
loss: 0.026968 [42060/60000]
loss: 0.306713 [45060/60000]
loss: 0.062141 [48060/60000]
loss: 0.169438 [51060/60000]
loss: 0.193799 [54060/60000]
loss: 0.250434 [57060/60000]
Test Error:
Accuracy: 91.7%, Avg loss: 0.233575
Exercise: Apply similar designs to the MNIST handwritten digits data and to the handwritten letters in the EMNIST data.#
# Download and load the training and test datasets
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
# Batching the test set is useful if you plan to evaluate after every training epoch
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
# Function to display images
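def imshow(img, label):
    img = img / 2 + 0.5  # unnormalize from [-1, 1] back to [0, 1]
    npimg = img.numpy()
    # Matplotlib expects (H, W, C); PyTorch tensors are (C, H, W)
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title('Sample of handwritten digits')
    #plt.title(f'First Item: {label}')
    plt.show()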
# Get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
# Show images
imshow(torchvision.utils.make_grid(images), labels[0])
# Display the name of the label for the first image in the batch
print(f'First Label: {labels[0]}')
First Label: 1
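For the EMNIST part of the exercise, one possible starting point is sketched below. It assumes the `letters` split of `torchvision.datasets.EMNIST`, whose labels are commonly reported to run from 1 to 26 rather than 0 to 25; this is worth checking on the downloaded data (for example with `train_dataset.targets.min()`) before relying on it. The helper names `shift_letter_label`, `label_to_letter` and `make_emnist_letters` are hypothetical and were written for this illustration.

```python
EMNIST_LETTERS_CLASSES = 26  # the final Linear layer would need 26 outputs instead of 10


def shift_letter_label(label):
    """Map an assumed 1..26 label to the 0..25 range that CrossEntropyLoss expects."""
    return label - 1


def label_to_letter(label):
    """Turn an assumed 1..26 label into a lowercase letter for display."""
    return chr(ord('a') + shift_letter_label(label))


def make_emnist_letters(root='./data', transform=None):
    """Build train and test datasets for the EMNIST 'letters' split (downloads data)."""
    # Imported here so the label helpers above can be used without torchvision installed.
    import torchvision
    kwargs = dict(root=root, split='letters', transform=transform,
                  target_transform=shift_letter_label, download=True)
    train = torchvision.datasets.EMNIST(train=True, **kwargs)
    test = torchvision.datasets.EMNIST(train=False, **kwargs)
    return train, test


print(label_to_letter(1), label_to_letter(26))
```

The returned datasets can be wrapped in `DataLoader` objects exactly as for MNIST above. The images are still 1x28x28, so under these assumptions only the output size of the last layer needs to change.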