Convolutional Neural Networks#

Introduction#

Slides

Source: https://www.superannotate.com/blog/guide-to-convolutional-neural-networks

Applications Include#

  1. Image Classification:

    • Object Recognition: Identifying objects within an image (e.g., recognizing dogs, cats, cars).

    • Scene Classification: Categorizing the scene depicted in an image (e.g., beach, forest, city).

  2. Object Detection:

    • Face Detection: Detecting human faces in images for applications like security, photo tagging, and more.

    • Pedestrian Detection: Used in autonomous driving and surveillance systems.

  3. Image Segmentation:

    • Semantic Segmentation: Classifying each pixel in an image into a category (e.g., separating roads, buildings, cars in a cityscape).

    • Instance Segmentation: Distinguishing between different instances of objects in an image (e.g., identifying and separating multiple people).

  4. Medical Imaging:

    • Disease Diagnosis: Analyzing medical images (e.g., X-rays, MRIs) to detect diseases like cancer, Alzheimer’s, and retinal diseases.

    • Organ Segmentation: Identifying and segmenting organs in medical scans for surgical planning and diagnosis.

  5. Self-driving Cars:

    • Road Sign Recognition: Detecting and classifying traffic signs.

    • Lane Detection: Identifying lane boundaries on the road.

    • Obstacle Detection: Detecting pedestrians, other vehicles, and obstacles.

  6. Facial Recognition and Authentication:

    • Face Recognition: Identifying or verifying a person from an image or video frame.

    • Emotion Recognition: Analyzing facial expressions to determine emotions.

  7. Robotics and Automation:

    • Object Grasping: Identifying objects and determining how to grasp them in robotic manipulation.

    • Navigation: Helping robots understand their environment for autonomous navigation.

  8. Augmented Reality (AR) and Virtual Reality (VR):

    • Object Tracking: Tracking objects in real-time to augment digital information onto the real world.

    • Environment Understanding: Understanding and mapping the environment for immersive VR experiences.

  9. Natural Language Processing (NLP):

    • Text Classification: Classifying text into categories (e.g., spam detection, sentiment analysis).

    • Character Recognition: Recognizing handwritten or printed text from images (e.g., OCR).

  10. Art and Creativity:

    • Style Transfer: Applying the style of one image to another (e.g., making a photo look like a painting).

    • Image Generation: Creating new images from scratch (e.g., GANs for generating realistic photos).

  11. Super-Resolution and Image Enhancement:

    • Image Denoising: Removing noise from images to enhance quality.

    • Image Super-Resolution: Increasing the resolution of images for better clarity and detail.

Code Applications#


Example 1 - Fashion MNIST#

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
import torch.nn.functional as F
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Define a transform to normalize the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
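ToTensor scales pixel values into [0, 1]; Normalize((0.5,), (0.5,)) then applies (x - 0.5) / 0.5 per channel, mapping the data into [-1, 1]. A quick check of the arithmetic (a sketch):

# Normalize computes (x - mean) / std elementwise
x = torch.tensor([0.0, 0.5, 1.0])
print((x - 0.5) / 0.5)  # tensor([-1., 0., 1.])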
# Download and load the training and test datasets
train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=60, shuffle=True)
# the test loader can use a different batch size if you plan to check results during training
test_loader = DataLoader(dataset=test_dataset, batch_size=60, shuffle=False)
for X, y in train_loader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
# Define the class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Shape of X [N, C, H, W]: torch.Size([60, 1, 28, 28])
Shape of y: torch.Size([60]) torch.int64
# look at some sample images
i = 34
plt.imshow(X[i, 0], cmap='gray')
plt.show()
print(class_names[y[i].item()])
[figure: the sample item displayed as a 28x28 grayscale image]
Dress
# Function to display images
def imshow(img, label):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title('Sample of Fashion/Apparel Items')
    #plt.title(f'First Item: {class_names[label]}')
    plt.show()

# Get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

# Show images
imshow(torchvision.utils.make_grid(images), labels[0])

# Display the name of the label for the first image in the batch
print(f'First Label: {class_names[labels[0]]}')
[figure: a grid of sample Fashion-MNIST images]
First Label: Shirt

What happens if we do not use convolutions for feature extraction?#

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 10) # output
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
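One way to see the cost of going fully connected is to count trainable parameters (a quick sketch using the model above):

# Count trainable parameters in the MLP defined above
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params:,}")
# 784*256 + 256  +  256*512 + 512  +  512*10 + 10  =  337,674

Every one of these weights ties a specific input pixel to a specific hidden unit, so the model has no built-in notion of spatial locality or translation invariance; convolutions address exactly that.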
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()  # zeroing after the step also works, as long as it happens before the next backward()

        if batch % 50 == 0:
            loss, current = loss.item(), (batch + 1)*len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")

The benefit of convolutions:#

# Define the convolutional model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 50, kernel_size=3, padding=1),
            nn.BatchNorm2d(50),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(p=0.25)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(50, 100, kernel_size=3),
            nn.BatchNorm2d(100),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=0.25)
        )
        self.fc1 = nn.Linear(100*6*6, 1000)
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)  # flatten to (batch, 100*6*6)
        out = self.fc1(out)  # note: no nonlinearity between fc1 and fc2, so they compose to a single linear map; adding a ReLU here is a common refinement
        out = self.fc2(out)
        return out
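
Before training, it helps to verify where the 100*6*6 flattened size comes from. With output size floor((n + 2p - k)/s) + 1: conv1 (3x3, padding 1) keeps 28x28 and pooling halves it to 14x14; conv2 (3x3, no padding) gives 12x12 and pooling halves it to 6x6. A minimal shape trace with a dummy batch (a sketch):

# Trace the spatial dimensions through both blocks
probe = CNN()
x = torch.zeros(2, 1, 28, 28)               # dummy batch of two grayscale images
print(probe.layer1(x).shape)                # torch.Size([2, 50, 14, 14])
print(probe.layer2(probe.layer1(x)).shape)  # torch.Size([2, 100, 6, 6])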

model = CNN().to(device)

# Define the loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 5

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = loss_fn(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')
# Evaluate the model
model.eval()  # Set the model to evaluation mode
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy of the model on the test images: {100 * correct / total:.2f}%')
Accuracy of the model on the test images: 90.53%
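As an optional check (a sketch, reusing the model, test_loader, and class_names from above), per-class accuracy often reveals which apparel categories the network confuses:

# Per-class accuracy on the test set
class_correct = [0] * 10
class_total = [0] * 10
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(1)
        for label, pred in zip(labels, preds):
            class_total[label.item()] += 1
            class_correct[label.item()] += int(pred == label)

for name, c, t in zip(class_names, class_correct, class_total):
    print(f'{name:>12s}: {100 * c / t:.1f}%')

Below, training continues for ten more epochs using the train and test helpers defined earlier.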
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")
Epoch 1
-------------------------------
loss: 0.317343  [   60/60000]
loss: 0.231959  [ 3060/60000]
loss: 0.244312  [ 6060/60000]
loss: 0.281115  [ 9060/60000]
loss: 0.236354  [12060/60000]
loss: 0.261460  [15060/60000]
loss: 0.214243  [18060/60000]
loss: 0.136579  [21060/60000]
loss: 0.212843  [24060/60000]
loss: 0.381205  [27060/60000]
loss: 0.274324  [30060/60000]
loss: 0.379989  [33060/60000]
loss: 0.127020  [36060/60000]
loss: 0.334019  [39060/60000]
loss: 0.230370  [42060/60000]
loss: 0.228620  [45060/60000]
loss: 0.198073  [48060/60000]
loss: 0.182649  [51060/60000]
loss: 0.311502  [54060/60000]
loss: 0.184986  [57060/60000]
Test Error: 
 Accuracy: 90.8%, Avg loss: 0.259057 

Epoch 2
-------------------------------
loss: 0.206381  [   60/60000]
loss: 0.182129  [ 3060/60000]
loss: 0.270273  [ 6060/60000]
loss: 0.239762  [ 9060/60000]
loss: 0.195614  [12060/60000]
loss: 0.351690  [15060/60000]
loss: 0.316848  [18060/60000]
loss: 0.195195  [21060/60000]
loss: 0.115826  [24060/60000]
loss: 0.208101  [27060/60000]
loss: 0.118409  [30060/60000]
loss: 0.148166  [33060/60000]
loss: 0.105474  [36060/60000]
loss: 0.166691  [39060/60000]
loss: 0.071779  [42060/60000]
loss: 0.194438  [45060/60000]
loss: 0.293481  [48060/60000]
loss: 0.247820  [51060/60000]
loss: 0.106475  [54060/60000]
loss: 0.321602  [57060/60000]
Test Error: 
 Accuracy: 91.8%, Avg loss: 0.236095 

Epoch 3
-------------------------------
loss: 0.158711  [   60/60000]
loss: 0.215797  [ 3060/60000]
loss: 0.102631  [ 6060/60000]
loss: 0.183772  [ 9060/60000]
loss: 0.226461  [12060/60000]
loss: 0.219134  [15060/60000]
loss: 0.286669  [18060/60000]
loss: 0.180499  [21060/60000]
loss: 0.189974  [24060/60000]
loss: 0.151949  [27060/60000]
loss: 0.197595  [30060/60000]
loss: 0.252205  [33060/60000]
loss: 0.147336  [36060/60000]
loss: 0.170789  [39060/60000]
loss: 0.269710  [42060/60000]
loss: 0.163103  [45060/60000]
loss: 0.171579  [48060/60000]
loss: 0.203208  [51060/60000]
loss: 0.194353  [54060/60000]
loss: 0.089797  [57060/60000]
Test Error: 
 Accuracy: 90.7%, Avg loss: 0.249229 

Epoch 4
-------------------------------
loss: 0.160929  [   60/60000]
loss: 0.082582  [ 3060/60000]
loss: 0.185329  [ 6060/60000]
loss: 0.239383  [ 9060/60000]
loss: 0.131882  [12060/60000]
loss: 0.116314  [15060/60000]
loss: 0.131452  [18060/60000]
loss: 0.117466  [21060/60000]
loss: 0.230492  [24060/60000]
loss: 0.141239  [27060/60000]
loss: 0.331365  [30060/60000]
loss: 0.078809  [33060/60000]
loss: 0.058115  [36060/60000]
loss: 0.189591  [39060/60000]
loss: 0.359033  [42060/60000]
loss: 0.184165  [45060/60000]
loss: 0.183354  [48060/60000]
loss: 0.203182  [51060/60000]
loss: 0.230083  [54060/60000]
loss: 0.255975  [57060/60000]
Test Error: 
 Accuracy: 91.9%, Avg loss: 0.229430 

Epoch 5
-------------------------------
loss: 0.149480  [   60/60000]
loss: 0.143310  [ 3060/60000]
loss: 0.381732  [ 6060/60000]
loss: 0.154710  [ 9060/60000]
loss: 0.152550  [12060/60000]
loss: 0.115276  [15060/60000]
loss: 0.159885  [18060/60000]
loss: 0.395862  [21060/60000]
loss: 0.184195  [24060/60000]
loss: 0.171736  [27060/60000]
loss: 0.119582  [30060/60000]
loss: 0.332809  [33060/60000]
loss: 0.277288  [36060/60000]
loss: 0.139747  [39060/60000]
loss: 0.184602  [42060/60000]
loss: 0.194151  [45060/60000]
loss: 0.247530  [48060/60000]
loss: 0.154591  [51060/60000]
loss: 0.178196  [54060/60000]
loss: 0.312619  [57060/60000]
Test Error: 
 Accuracy: 91.7%, Avg loss: 0.229329 

Epoch 6
-------------------------------
loss: 0.182013  [   60/60000]
loss: 0.252040  [ 3060/60000]
loss: 0.149042  [ 6060/60000]
loss: 0.328746  [ 9060/60000]
loss: 0.144892  [12060/60000]
loss: 0.244946  [15060/60000]
loss: 0.160196  [18060/60000]
loss: 0.186767  [21060/60000]
loss: 0.117370  [24060/60000]
loss: 0.142045  [27060/60000]
loss: 0.157884  [30060/60000]
loss: 0.161956  [33060/60000]
loss: 0.270296  [36060/60000]
loss: 0.096609  [39060/60000]
loss: 0.249100  [42060/60000]
loss: 0.117343  [45060/60000]
loss: 0.071875  [48060/60000]
loss: 0.102485  [51060/60000]
loss: 0.213012  [54060/60000]
loss: 0.227817  [57060/60000]
Test Error: 
 Accuracy: 92.0%, Avg loss: 0.229110 

Epoch 7
-------------------------------
loss: 0.093213  [   60/60000]
loss: 0.321472  [ 3060/60000]
loss: 0.194250  [ 6060/60000]
loss: 0.331985  [ 9060/60000]
loss: 0.183514  [12060/60000]
loss: 0.173314  [15060/60000]
loss: 0.244925  [18060/60000]
loss: 0.278094  [21060/60000]
loss: 0.287473  [24060/60000]
loss: 0.148316  [27060/60000]
loss: 0.141388  [30060/60000]
loss: 0.152074  [33060/60000]
loss: 0.198113  [36060/60000]
loss: 0.159151  [39060/60000]
loss: 0.196160  [42060/60000]
loss: 0.193496  [45060/60000]
loss: 0.220635  [48060/60000]
loss: 0.122122  [51060/60000]
loss: 0.165055  [54060/60000]
loss: 0.272954  [57060/60000]
Test Error: 
 Accuracy: 92.1%, Avg loss: 0.228313 

Epoch 8
-------------------------------
loss: 0.149210  [   60/60000]
loss: 0.233706  [ 3060/60000]
loss: 0.092144  [ 6060/60000]
loss: 0.312386  [ 9060/60000]
loss: 0.149677  [12060/60000]
loss: 0.345841  [15060/60000]
loss: 0.162732  [18060/60000]
loss: 0.171203  [21060/60000]
loss: 0.222052  [24060/60000]
loss: 0.128415  [27060/60000]
loss: 0.156282  [30060/60000]
loss: 0.282069  [33060/60000]
loss: 0.291207  [36060/60000]
loss: 0.342439  [39060/60000]
loss: 0.071771  [42060/60000]
loss: 0.194775  [45060/60000]
loss: 0.267220  [48060/60000]
loss: 0.212857  [51060/60000]
loss: 0.357989  [54060/60000]
loss: 0.178691  [57060/60000]
Test Error: 
 Accuracy: 92.1%, Avg loss: 0.233381 

Epoch 9
-------------------------------
loss: 0.097582  [   60/60000]
loss: 0.261138  [ 3060/60000]
loss: 0.194356  [ 6060/60000]
loss: 0.157411  [ 9060/60000]
loss: 0.179442  [12060/60000]
loss: 0.143187  [15060/60000]
loss: 0.229545  [18060/60000]
loss: 0.345801  [21060/60000]
loss: 0.294135  [24060/60000]
loss: 0.108510  [27060/60000]
loss: 0.097589  [30060/60000]
loss: 0.089959  [33060/60000]
loss: 0.166250  [36060/60000]
loss: 0.150127  [39060/60000]
loss: 0.193146  [42060/60000]
loss: 0.071435  [45060/60000]
loss: 0.249247  [48060/60000]
loss: 0.113742  [51060/60000]
loss: 0.113938  [54060/60000]
loss: 0.324047  [57060/60000]
Test Error: 
 Accuracy: 91.8%, Avg loss: 0.241098 

Epoch 10
-------------------------------
loss: 0.175804  [   60/60000]
loss: 0.075551  [ 3060/60000]
loss: 0.301739  [ 6060/60000]
loss: 0.112284  [ 9060/60000]
loss: 0.225511  [12060/60000]
loss: 0.211033  [15060/60000]
loss: 0.051665  [18060/60000]
loss: 0.072662  [21060/60000]
loss: 0.121633  [24060/60000]
loss: 0.237526  [27060/60000]
loss: 0.146027  [30060/60000]
loss: 0.200525  [33060/60000]
loss: 0.239444  [36060/60000]
loss: 0.141680  [39060/60000]
loss: 0.026968  [42060/60000]
loss: 0.306713  [45060/60000]
loss: 0.062141  [48060/60000]
loss: 0.169438  [51060/60000]
loss: 0.193799  [54060/60000]
loss: 0.250434  [57060/60000]
Test Error: 
 Accuracy: 91.7%, Avg loss: 0.233575 

Done!
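
Once training finishes, it is common to checkpoint the learned weights so the model can be reloaded later without retraining (a minimal sketch; the filename is arbitrary):

# Save only the state_dict, then restore it into a fresh instance
torch.save(model.state_dict(), 'fashion_cnn.pth')

restored = CNN().to(device)
restored.load_state_dict(torch.load('fashion_cnn.pth', map_location=device))
restored.eval()  # switch to inference mode before evaluating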

Exercise: Apply similar designs to the handwritten digits data (MNIST) and the handwritten letters data (EMNIST).#

# Download and load the training and test datasets
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
# the test loader can use a different batch size if you plan to check results during training
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:01<00:00, 5079224.31it/s]
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 134439.42it/s]
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:01<00:00, 1257690.00it/s]
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 5486903.45it/s]
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

# Function to display images
def imshow(img, label):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.title('Sample of handwritten digits')
    #plt.title(f'First Item: {label}')
    plt.show()

# Get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

# Show images
imshow(torchvision.utils.make_grid(images), labels[0])

# Display the name of the label for the first image in the batch
print(f'First Label: {labels[0]}')
[figure: a grid of sample handwritten digit images]
First Label: 1
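
Since MNIST images are also 1x28x28, the CNN class and the train/test helpers defined above can be reused unchanged; a sketch of the exercise setup:

# Train the same architecture on MNIST with the loaders defined above
mnist_model = CNN().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(mnist_model.parameters(), lr=0.001)

for t in range(5):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, mnist_model, loss_fn, optimizer)
    test(test_loader, mnist_model, loss_fn)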

Exercise: Develop a CNN for the CIFAR-10 data.#
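
A starting point (a sketch, assuming the same training utilities as above; the class name is illustrative): CIFAR-10 images are 3x32x32 RGB, so the first convolution must accept 3 input channels, the Normalize transform needs three (mean, std) pairs, and the flattened size must be recomputed (32 -> 16 after the first pool, then 14 -> 7 after the unpadded 3x3 conv and second pool).

# Hypothetical CIFAR-10 variant of the CNN above
cifar_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # one (mean, std) pair per channel
])
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, transform=cifar_transform, download=True)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, transform=cifar_transform, download=True)

class CIFARCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 50, kernel_size=3, padding=1),  # 3 input channels for RGB
            nn.BatchNorm2d(50), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(p=0.25)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(50, 100, kernel_size=3),
            nn.BatchNorm2d(100), nn.ReLU(),
            nn.MaxPool2d(2), nn.Dropout(p=0.25)
        )
        self.fc1 = nn.Linear(100*7*7, 1000)  # 32 -> 16 -> 7 spatially
        self.fc2 = nn.Linear(1000, 10)

    def forward(self, x):
        out = self.layer2(self.layer1(x))
        out = out.view(out.size(0), -1)
        return self.fc2(self.fc1(out))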