Convolutional Neural Networks #
Introduction#
Source: https://www.superannotate.com/blog/guide-to-convolutional-neural-networks
Applications Include#
Image Classification:
    Object Recognition: Identifying objects within an image (e.g., recognizing dogs, cats, cars).
    Scene Classification: Categorizing the scene depicted in an image (e.g., beach, forest, city).
Object Detection:
    Face Detection: Detecting human faces in images for applications like security, photo tagging, and more.
    Pedestrian Detection: Used in autonomous driving and surveillance systems.
Image Segmentation:
    Semantic Segmentation: Classifying each pixel in an image into a category (e.g., separating roads, buildings, cars in a cityscape).
    Instance Segmentation: Distinguishing between different instances of objects in an image (e.g., identifying and separating multiple people).
Medical Imaging:
    Disease Diagnosis: Analyzing medical images (e.g., X-rays, MRIs) to detect diseases like cancer, Alzheimer’s, and retinal diseases.
    Organ Segmentation: Identifying and segmenting organs in medical scans for surgical planning and diagnosis.
Self-driving Cars:
    Road Sign Recognition: Detecting and classifying traffic signs.
    Lane Detection: Identifying lane boundaries on the road.
    Obstacle Detection: Detecting pedestrians, other vehicles, and obstacles.
Facial Recognition and Authentication:
    Face Recognition: Identifying or verifying a person from an image or video frame.
    Emotion Recognition: Analyzing facial expressions to determine emotions.
Robotics and Automation:
    Object Grasping: Identifying objects and determining how to grasp them in robotic manipulation.
    Navigation: Helping robots understand their environment for autonomous navigation.
Augmented Reality (AR) and Virtual Reality (VR):
    Object Tracking: Tracking objects in real time to overlay digital information on the real world.
    Environment Understanding: Understanding and mapping the environment for immersive VR experiences.
Natural Language Processing (NLP):
    Text Classification: Classifying text into categories (e.g., spam detection, sentiment analysis).
    Character Recognition: Recognizing handwritten or printed text from images (e.g., OCR).
Art and Creativity:
    Style Transfer: Applying the style of one image to another (e.g., making a photo look like a painting).
    Image Generation: Creating new images from scratch (e.g., GANs for generating realistic photos).
Super-Resolution and Image Enhancement:
    Image Denoising: Removing noise from images to enhance quality.
    Image Super-Resolution: Increasing the resolution of images for better clarity and detail.
Code Applications#
Example 1 - Fashion MNIST#
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
import torch.nn.functional as F
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Define a transform to normalize the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
# Download and load the training and test datasets
train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)
# batch_size=60 matches the batch shape and the progress counters shown in the outputs
train_loader = DataLoader(dataset=train_dataset, batch_size=60, shuffle=True)
# You can also set a batch size for the test set if you plan to evaluate during training
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
for X, y in train_loader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
# Define the class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Shape of X [N, C, H, W]: torch.Size([60, 1, 28, 28])
Shape of y: torch.Size([60]) torch.int64
# look at some sample images
i = 34
plt.imshow(X[i,0,:],cmap='gray')
plt.show()
print(class_names[y[i].item()])
Dress
# Function to display images
def imshow(img, label):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title('Sample of Fashion/Apparel Items')
    #plt.title(f'First Item: {class_names[label]}')
    plt.show()
# Get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
# Show images
imshow(torchvision.utils.make_grid(images), labels[0])
# Display the name of the label for the first image in the batch
print(f'First Label: {class_names[labels[0]]}')
First Label: Shirt
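The `transforms.Normalize((0.5,), (0.5,))` step and the `img / 2 + 0.5` line in `imshow` above are inverses of each other. As a minimal sketch in plain Python, assuming `ToTensor()` has already scaled pixel values to the range [0, 1] (the helper names `normalize` and `unnormalize` are illustrative and not part of torchvision):

```python
# Illustrative helpers (hypothetical names); they mirror the per-pixel
# arithmetic of transforms.Normalize with mean = 0.5 and std = 0.5.
def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel in [0, 1] to [-1, 1] via (pixel - mean) / std."""
    return (pixel - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Invert normalize(); with mean = std = 0.5 this equals value / 2 + 0.5."""
    return value * std + mean


for pixel in (0.0, 0.25, 1.0):
    z = normalize(pixel)
    print(f"pixel={pixel:.2f} -> normalized={z:+.2f} -> restored={unnormalize(z):.2f}")
```

This is why the display helper has to undo the normalization before plotting: without it, values below zero would fall outside the range `plt.imshow` expects for float images.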
What happens if we do not use convolutions/feature extractions#
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 10)  # output
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork().to(device)
print(model)
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=256, bias=True)
(1): ReLU()
(2): Linear(in_features=256, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
)
)
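To get a feel for the size of this fully connected baseline, the parameter count can be worked out by hand from the layer sizes printed above: each `Linear` layer holds `in_features * out_features` weights plus `out_features` biases. A small sketch (`linear_params` is a hypothetical helper, not a PyTorch function):

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# Layer sizes taken from the printed model: 784 -> 256 -> 512 -> 10
layer_sizes = [(28 * 28, 256), (256, 512), (512, 10)]
per_layer = [linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)

for (i, o), n in zip(layer_sizes, per_layer):
    print(f"Linear({i}, {o}): {n} parameters")
print(f"Total: {total} parameters")
```

With PyTorch available, `sum(p.numel() for p in model.parameters())` should report the same total.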
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if batch % 50 == 0:
            loss, current = loss.item(), (batch + 1)*len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")
The benefit of convolutions:#
# Define the convolutional model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Block 1: 1x28x28 -> 50x28x28 (conv, padding=1) -> 50x14x14 (max-pool)
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 50, kernel_size=3, padding=1),
            nn.BatchNorm2d(50),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(p=0.25)
        )
        # Block 2: 50x14x14 -> 100x12x12 (conv, no padding) -> 100x6x6 (max-pool)
        self.layer2 = nn.Sequential(
            nn.Conv2d(50, 100, kernel_size=3),
            nn.BatchNorm2d(100),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=0.25)
        )
        # Classifier head on the flattened 100*6*6 feature map
        self.fc1 = nn.Linear(100*6*6, 1000)
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)  # flatten to (batch, 3600)
        out = self.fc1(out)
        out = self.fc2(out)
        return out
model = CNN().to(device)
# Define the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Train the model
num_epochs = 5
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        # Forward pass
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')
# Evaluate the model
model.eval()  # Set the model to evaluation mode
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print(f'Accuracy of the model on the test images: {100 * correct / total:.2f}%')
Accuracy of the model on the test images: 90.53%
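The `100*6*6` input size of `fc1` follows from how each layer changes the spatial size of the 28x28 input. For a square input of side n, a convolution or pooling window of size k with stride s and padding p produces an output of side floor((n + 2p - k) / s) + 1. A short sketch that traces the sizes through the two blocks of the CNN above (`out_size` is a hypothetical helper, not a PyTorch function):

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer applied to a square n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


n = 28                                  # Fashion MNIST images are 28x28
n = out_size(n, kernel=3, padding=1)    # layer1 conv: 28 -> 28
n = out_size(n, kernel=2, stride=2)     # layer1 max-pool: 28 -> 14
n = out_size(n, kernel=3)               # layer2 conv (no padding): 14 -> 12
n = out_size(n, kernel=2, stride=2)     # layer2 max-pool: 12 -> 6

channels = 100                          # output channels of layer2
flattened = channels * n * n
print(f"Feature map: {channels} x {n} x {n} -> {flattened} inputs to fc1")
```

If the kernel size, padding, or input resolution changes, rerunning this arithmetic gives the new `in_features` value for `fc1`.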
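# Continue training the CNN, now evaluating on the test set after every epoch
epochs = 10  # the log below covers 10 epochs
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")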
Epoch 1
-------------------------------
loss: 0.317343 [ 60/60000]
loss: 0.231959 [ 3060/60000]
loss: 0.244312 [ 6060/60000]
loss: 0.281115 [ 9060/60000]
loss: 0.236354 [12060/60000]
loss: 0.261460 [15060/60000]
loss: 0.214243 [18060/60000]
loss: 0.136579 [21060/60000]
loss: 0.212843 [24060/60000]
loss: 0.381205 [27060/60000]
loss: 0.274324 [30060/60000]
loss: 0.379989 [33060/60000]
loss: 0.127020 [36060/60000]
loss: 0.334019 [39060/60000]
loss: 0.230370 [42060/60000]
loss: 0.228620 [45060/60000]
loss: 0.198073 [48060/60000]
loss: 0.182649 [51060/60000]
loss: 0.311502 [54060/60000]
loss: 0.184986 [57060/60000]
Test Error:
Accuracy: 90.8%, Avg loss: 0.259057
Epoch 2
-------------------------------
loss: 0.206381 [ 60/60000]
loss: 0.182129 [ 3060/60000]
loss: 0.270273 [ 6060/60000]
loss: 0.239762 [ 9060/60000]
loss: 0.195614 [12060/60000]
loss: 0.351690 [15060/60000]
loss: 0.316848 [18060/60000]
loss: 0.195195 [21060/60000]
loss: 0.115826 [24060/60000]
loss: 0.208101 [27060/60000]
loss: 0.118409 [30060/60000]
loss: 0.148166 [33060/60000]
loss: 0.105474 [36060/60000]
loss: 0.166691 [39060/60000]
loss: 0.071779 [42060/60000]
loss: 0.194438 [45060/60000]
loss: 0.293481 [48060/60000]
loss: 0.247820 [51060/60000]
loss: 0.106475 [54060/60000]
loss: 0.321602 [57060/60000]
Test Error:
Accuracy: 91.8%, Avg loss: 0.236095
Epoch 3
-------------------------------
loss: 0.158711 [ 60/60000]
loss: 0.215797 [ 3060/60000]
loss: 0.102631 [ 6060/60000]
loss: 0.183772 [ 9060/60000]
loss: 0.226461 [12060/60000]
loss: 0.219134 [15060/60000]
loss: 0.286669 [18060/60000]
loss: 0.180499 [21060/60000]
loss: 0.189974 [24060/60000]
loss: 0.151949 [27060/60000]
loss: 0.197595 [30060/60000]
loss: 0.252205 [33060/60000]
loss: 0.147336 [36060/60000]
loss: 0.170789 [39060/60000]
loss: 0.269710 [42060/60000]
loss: 0.163103 [45060/60000]
loss: 0.171579 [48060/60000]
loss: 0.203208 [51060/60000]
loss: 0.194353 [54060/60000]
loss: 0.089797 [57060/60000]
Test Error:
Accuracy: 90.7%, Avg loss: 0.249229
Epoch 4
-------------------------------
loss: 0.160929 [ 60/60000]
loss: 0.082582 [ 3060/60000]
loss: 0.185329 [ 6060/60000]
loss: 0.239383 [ 9060/60000]
loss: 0.131882 [12060/60000]
loss: 0.116314 [15060/60000]
loss: 0.131452 [18060/60000]
loss: 0.117466 [21060/60000]
loss: 0.230492 [24060/60000]
loss: 0.141239 [27060/60000]
loss: 0.331365 [30060/60000]
loss: 0.078809 [33060/60000]
loss: 0.058115 [36060/60000]
loss: 0.189591 [39060/60000]
loss: 0.359033 [42060/60000]
loss: 0.184165 [45060/60000]
loss: 0.183354 [48060/60000]
loss: 0.203182 [51060/60000]
loss: 0.230083 [54060/60000]
loss: 0.255975 [57060/60000]
Test Error:
Accuracy: 91.9%, Avg loss: 0.229430
Epoch 5
-------------------------------
loss: 0.149480 [ 60/60000]
loss: 0.143310 [ 3060/60000]
loss: 0.381732 [ 6060/60000]
loss: 0.154710 [ 9060/60000]
loss: 0.152550 [12060/60000]
loss: 0.115276 [15060/60000]
loss: 0.159885 [18060/60000]
loss: 0.395862 [21060/60000]
loss: 0.184195 [24060/60000]
loss: 0.171736 [27060/60000]
loss: 0.119582 [30060/60000]
loss: 0.332809 [33060/60000]
loss: 0.277288 [36060/60000]
loss: 0.139747 [39060/60000]
loss: 0.184602 [42060/60000]
loss: 0.194151 [45060/60000]
loss: 0.247530 [48060/60000]
loss: 0.154591 [51060/60000]
loss: 0.178196 [54060/60000]
loss: 0.312619 [57060/60000]
Test Error:
Accuracy: 91.7%, Avg loss: 0.229329
Epoch 6
-------------------------------
loss: 0.182013 [ 60/60000]
loss: 0.252040 [ 3060/60000]
loss: 0.149042 [ 6060/60000]
loss: 0.328746 [ 9060/60000]
loss: 0.144892 [12060/60000]
loss: 0.244946 [15060/60000]
loss: 0.160196 [18060/60000]
loss: 0.186767 [21060/60000]
loss: 0.117370 [24060/60000]
loss: 0.142045 [27060/60000]
loss: 0.157884 [30060/60000]
loss: 0.161956 [33060/60000]
loss: 0.270296 [36060/60000]
loss: 0.096609 [39060/60000]
loss: 0.249100 [42060/60000]
loss: 0.117343 [45060/60000]
loss: 0.071875 [48060/60000]
loss: 0.102485 [51060/60000]
loss: 0.213012 [54060/60000]
loss: 0.227817 [57060/60000]
Test Error:
Accuracy: 92.0%, Avg loss: 0.229110
Epoch 7
-------------------------------
loss: 0.093213 [ 60/60000]
loss: 0.321472 [ 3060/60000]
loss: 0.194250 [ 6060/60000]
loss: 0.331985 [ 9060/60000]
loss: 0.183514 [12060/60000]
loss: 0.173314 [15060/60000]
loss: 0.244925 [18060/60000]
loss: 0.278094 [21060/60000]
loss: 0.287473 [24060/60000]
loss: 0.148316 [27060/60000]
loss: 0.141388 [30060/60000]
loss: 0.152074 [33060/60000]
loss: 0.198113 [36060/60000]
loss: 0.159151 [39060/60000]
loss: 0.196160 [42060/60000]
loss: 0.193496 [45060/60000]
loss: 0.220635 [48060/60000]
loss: 0.122122 [51060/60000]
loss: 0.165055 [54060/60000]
loss: 0.272954 [57060/60000]
Test Error:
Accuracy: 92.1%, Avg loss: 0.228313
Epoch 8
-------------------------------
loss: 0.149210 [ 60/60000]
loss: 0.233706 [ 3060/60000]
loss: 0.092144 [ 6060/60000]
loss: 0.312386 [ 9060/60000]
loss: 0.149677 [12060/60000]
loss: 0.345841 [15060/60000]
loss: 0.162732 [18060/60000]
loss: 0.171203 [21060/60000]
loss: 0.222052 [24060/60000]
loss: 0.128415 [27060/60000]
loss: 0.156282 [30060/60000]
loss: 0.282069 [33060/60000]
loss: 0.291207 [36060/60000]
loss: 0.342439 [39060/60000]
loss: 0.071771 [42060/60000]
loss: 0.194775 [45060/60000]
loss: 0.267220 [48060/60000]
loss: 0.212857 [51060/60000]
loss: 0.357989 [54060/60000]
loss: 0.178691 [57060/60000]
Test Error:
Accuracy: 92.1%, Avg loss: 0.233381
Epoch 9
-------------------------------
loss: 0.097582 [ 60/60000]
loss: 0.261138 [ 3060/60000]
loss: 0.194356 [ 6060/60000]
loss: 0.157411 [ 9060/60000]
loss: 0.179442 [12060/60000]
loss: 0.143187 [15060/60000]
loss: 0.229545 [18060/60000]
loss: 0.345801 [21060/60000]
loss: 0.294135 [24060/60000]
loss: 0.108510 [27060/60000]
loss: 0.097589 [30060/60000]
loss: 0.089959 [33060/60000]
loss: 0.166250 [36060/60000]
loss: 0.150127 [39060/60000]
loss: 0.193146 [42060/60000]
loss: 0.071435 [45060/60000]
loss: 0.249247 [48060/60000]
loss: 0.113742 [51060/60000]
loss: 0.113938 [54060/60000]
loss: 0.324047 [57060/60000]
Test Error:
Accuracy: 91.8%, Avg loss: 0.241098
Epoch 10
-------------------------------
loss: 0.175804 [ 60/60000]
loss: 0.075551 [ 3060/60000]
loss: 0.301739 [ 6060/60000]
loss: 0.112284 [ 9060/60000]
loss: 0.225511 [12060/60000]
loss: 0.211033 [15060/60000]
loss: 0.051665 [18060/60000]
loss: 0.072662 [21060/60000]
loss: 0.121633 [24060/60000]
loss: 0.237526 [27060/60000]
loss: 0.146027 [30060/60000]
loss: 0.200525 [33060/60000]
loss: 0.239444 [36060/60000]
loss: 0.141680 [39060/60000]
loss: 0.026968 [42060/60000]
loss: 0.306713 [45060/60000]
loss: 0.062141 [48060/60000]
loss: 0.169438 [51060/60000]
loss: 0.193799 [54060/60000]
loss: 0.250434 [57060/60000]
Test Error:
Accuracy: 91.7%, Avg loss: 0.233575
Done!
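One way to read the log above is to pick the epoch with the lowest average test loss. The sketch below uses the `Accuracy` and `Avg loss` values copied from the ten `Test Error` entries; the selection rule itself is only an illustration and is not something the notebook does.

```python
# (accuracy %, average test loss) per epoch, copied from the log above
results = [
    (90.8, 0.259057), (91.8, 0.236095), (90.7, 0.249229), (91.9, 0.229430),
    (91.7, 0.229329), (92.0, 0.229110), (92.1, 0.228313), (92.1, 0.233381),
    (91.8, 0.241098), (91.7, 0.233575),
]

# Index of the epoch with the smallest average test loss (epochs are 1-based)
best_index = min(range(len(results)), key=lambda i: results[i][1])
best_epoch = best_index + 1
best_acc, best_loss = results[best_index]

print(f"Lowest test loss at epoch {best_epoch}: "
      f"accuracy {best_acc}%, avg loss {best_loss:.6f}")
```

Test accuracy only moves between 90.7% and 92.1% across these epochs, which suggests the model has largely plateaued on the test set.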
Exercise: Apply similar designs to the MNIST handwritten digits data and to the handwritten letters in the EMNIST data.#
# Download and load the training and test datasets
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
# You can also set a batch size for the test set if you plan to evaluate during training
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
# Function to display images
def imshow(img, label):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title('Sample of handwritten digits')
    #plt.title(f'First Item: {label}')
    plt.show()
# Get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
# Show images
imshow(torchvision.utils.make_grid(images), labels[0])
# Display the name of the label for the first image in the batch
print(f'First Label: {labels[0]}')
First Label: 1
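For the EMNIST part of the exercise, one detail to watch is the label range. The sketch below assumes that the `letters` split of `torchvision.datasets.EMNIST` labels its 26 classes 1 to 26 rather than 0 to 25. This is worth verifying on your installed version, for example by checking the minimum and maximum of the dataset targets. If it holds, the labels need shifting before `nn.CrossEntropyLoss` can be used with a 26-output final layer. The helper names are hypothetical.

```python
import string

NUM_LETTER_CLASSES = 26  # assumed class count for the EMNIST 'letters' split


def shift_letter_label(label):
    """Map an assumed 1..26 label to the 0..25 range expected by the loss."""
    return label - 1


def label_to_letter(zero_based_label):
    """Human-readable name for a shifted label (0 -> 'a', 25 -> 'z')."""
    return string.ascii_lowercase[zero_based_label]


for raw in (1, 13, 26):
    shifted = shift_letter_label(raw)
    print(f"raw label {raw:2d} -> class index {shifted:2d} -> '{label_to_letter(shifted)}'")
```

A function like this could be passed as `target_transform=shift_letter_label` when constructing the dataset with `split='letters'`, and the last layer of the network would then become `nn.Linear(1000, 26)` instead of `nn.Linear(1000, 10)`.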