{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "## An Introduction to the Multilayer Perceptron \n", "\n", "### Introduction \n", "\n", "The related research started in late 1950s ([F. Rosenblatt](https://news.cornell.edu/stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon)) with the “Mark I Perceptron”: the goal was to automatically detect capital letters.\n", "\n", "\n", "\n", "\n", " \n", "
\n", " \"Image \n", " \"Image
\n", "\n", "Image source: https://jennysmoore.wordpress.com/2014/03/31/march-31-network-society-readings/\n", "\n", "\n", "### The Artificial Neuron \n", "\n", "\n", "\n", "\n", "\n", " \n", "
\n", " \"Image \n", " \"Image
\n", "\n", "\n", "Example of a neural network with five neurons:\n", "\n", "\n", "\n", " \n", "
\n", " \"Image \n", " \"Image
\n", "\n", "It is a nature-inspired design. Check out the [playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle®Dataset=reg-plane&learningRate=0.03®ularizationRate=0&noise=0&networkShape=4,2&seed=0.61321&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false).\n", "\n", "**Fact:** A frog has enough neurons to learn how to drive your car, but if it did, it would occupy its entire memory and it would not know how to feed itself.\n", "\n", "\n", "\n", "\n", "\n", " \n", "
\n", " \"Image \n", " \"Image
\n", "\n", "### Brief History of Development \n", "\n", "Neural Networks have been a success for computer vision, image analysis and classification problems; however we can use the method for regression, as well.\n", "\n", " - It was able to identify capital letters.\n", "\n", " - From 1960 - 1986, things progressed relatively slowly.\n", "\n", " - D. Rumelhart, [G. Hinton](https://torontolife.com/tech/ai-superstars-google-facebook-apple-studied-guy/) and R. Williams were able to achieve the first back-propagation network (Seminal paper “Learning Representations by back-propagating errors” Nature Vol. 323, 1986 ).\n", "\n", " - 1990, P. Werbos, “Backpropagation Through Time: What It Does and How to Do It”\n", "\n", " - G. Hinton and R. Salakhutdinov examined the capability of neural nets for image recognition in 2006. Explosive research into neural nets from 2006 - today.\n", "\n", "\n", "\n", "### Activation Functions \n", "\n", "Examples of activation functions:\n", "\n", "\n", "\n", "\n", "|Name |Plot | Equation | Derivative |\n", "|----------------------------------------|---------------------------------------------|------------------------------------------------------|--------------------------------------------|\n", "|Identity | ![Identity](https://i.imgur.com/OZZutkF.png) | $$ f(x) = x $$ | $$ f'(x) = 1 $$ |\n", "| Binary step | ![Binary step](https://i.imgur.com/zRVpQIU.png) | $$ f(x) = \\begin{cases} 0 & \\text{for } x < 0 \\\\ 1 & \\text{for } x \\ge 0 \\end{cases} $$ | $$ f'(x) = \\begin{cases} 0 & \\text{for } x \\ne 0 \\\\ ? & \\text{for } x = 0 \\end{cases} $$ |\n", "| Logistic (a.k.a Soft step) | ![Logistic](https://i.imgur.com/VtjWCF6.png) | $$ f(x) = \\frac{1}{1 + e^{-x}} $$ | $$ f'(x) = f(x)(1 - f(x)) $$ |\n", "| TanH | ![TanH](https://i.imgur.com/F7dSp0r.png) | $$ f(x) = \\tanh(x) = \\frac{2}{1 + e^{-2x}} - 1 $$ | $$ f'(x) = 1 - f(x)^2 $$ |\n", "| ArcTan | ![ArcTan](https://i.imgur.com/gY6mIAO.png) | $$ f(x) = \\tan^{-1}(x) $$ | $$ f'(x) = \\frac{1}{x^2 + 1} $$ |\n", "| Rectified Linear Unit (ReLU) | ![ReLU](https://i.imgur.com/GFNymDd.png) | $$ f(x) = \\begin{cases} 0 & \\text{for } x < 0 \\\\ x & \\text{for } x \\ge 0 \\end{cases} $$ | $$ f'(x) = \\begin{cases} 0 & \\text{for } x < 0 \\\\ 1 & \\text{for } x \\ge 0 \\end{cases} $$ |\n", "| Parametric Rectified Linear Unit (PReLU) | ![PReLU](https://i.imgur.com/UkIefvc.png) | $$ f(x) = \\begin{cases} \\alpha x & \\text{for } x < 0 \\\\ x & \\text{for } x \\ge 0 \\end{cases} $$ | $$ f'(x) = \\begin{cases} \\alpha & \\text{for } x < 0 \\\\ 1 & \\text{for } x \\ge 0 \\end{cases} $$ |\n", "| Exponential Linear Unit (ELU) | ![ELU](https://i.imgur.com/C5Qbkak.png) | $$ f(x) = \\begin{cases} \\alpha (e^x - 1) & \\text{for } x < 0 \\\\ x & \\text{for } x \\ge 0 \\end{cases} $$ | $$ f'(x) = \\begin{cases} f(x) + \\alpha & \\text{for } x < 0 \\\\ 1 & \\text{for } x \\ge 0 \\end{cases} $$ |\n", "| SoftPlus | ![SoftPlus](https://i.imgur.com/nzGthI4.png) | $$ f(x) = \\log(1 + e^x) $$ | $$ f'(x) = \\frac{1}{1 + e^{-x}} $$ |\n", "| Swish | ![Swish](https://i.imgur.com/55GMfys.png) | $$f(x) = \\frac{x}{1+e^-{\\beta x}}$$ | $$f'(x)=f(x)\\cdot\\left[1+\\beta - \\beta\\cdot\\sigma(\\beta x)\\right]$$ where $\\sigma$ is the logistic sigmoid. |\n", "\n", "\n", "### Backpropagation \n", "\n", "Backpropagation is the process of updating the weights in a direction that is based on the calculation of the gradient of the loss function.\n", "\n", "\n", "\n", "\n", " \n", "
\n", " \"Image \n", " \"Image
\n", "\n", "\n", "### Gradient Descent Methods for Optimization \n", "\n", "All the current optimization algorithms are based on a variant of gradient descent.\n", "\n", "Let's denote the vector of weights at step $t$ by $w_t$ and the gradient of the objective function with respect to the weights by $g_t$. The idea is that the gradient descent algorithm updates the weights under the following principle:\n", "\n", "$$\\large w_t = w_{t-1} - \\eta\\cdot g_{t,t-1}$$\n", "\n", "When the objective function (whose gradient with respect to the weights) is represented by $g_t$ and has multiple local minima, or it has a very shallow region containing the minima, the plain gradient descent algorithm may not converge to the position sought for. To remediate this deficiency research proposed alternatives by varying the way we evaluate the learning rate each step or how we compute the \"velocity\" for updating the weights:\n", "\n", "$$\\Large w_t = w_{t-1} - \\eta_t\\cdot v_t$$ \n", "\n", "In the equation above, $\\eta_t$ is an adaptive learning rate and $v_t$ a modified gradient.\n", "\n", "### Dynamic Learning Rates \n", "\n", "We can consider an exponential decay, such as\n", "\n", "$$\\large \\eta_t = \\eta_0 e^{-\\lambda\\cdot t}$$\n", "\n", "or a polynomial decay\n", "\n", "$$\\large \\eta_t = \\eta_0 (\\beta t+1)^{-\\alpha}$$\n", "\n", "### Momentum Gradient Descent \n", "\n", "$$\\large g_{t,t-1} = \\partial_w \\frac{1}{|\\text{B}_t|}\\sum_{i\\in \\text{B}_t}f(x_i,w_{t-1})=\\frac{1}{|\\text{B}_t|}\\sum_{i\\in \\text{B}_t}h_{i,t-1}$$\n", "\n", "where\n", "\n", "$$\\large v_t = \\beta v_{t-1} + g_{t,t-1}$$\n", "\n", "and $\\beta\\in (0,1).$\n", "\n", "For an explicit formula, we have\n", "\n", "$$\\large v_t = \\sum_{\\tau=0}^{t-1} \\beta^{\\tau}g_{t-\\tau,t-\\tau-1}$$\n", "\n", "\n", "and\n", "\n", "$$\\large w_t = w_{t-1} - \\alpha v_t$$\n", "\n", "where\n", "\n", "$$\\large \\alpha = \\frac{\\eta}{1-\\beta}$$\n", "\n", "\n", "### AdaGrad (Adaptive Gradient Descent) \n", "\n", "$$\\large s_t = s_{t-1} + g_{t}^2$$\n", "\n", "and\n", "\n", "$$\\large w_t= w_{t-1} - \\frac{\\eta}{\\sqrt{s_t+\\epsilon}}\\cdot g_t$$\n", "\n", "### RMSProp (Root Mean Square Propagation) \n", "\n", "$$\\large s_t = \\gamma\\cdot s_{t-1} + (1-\\gamma)\\cdot g_{t}^2$$\n", "\n", "and\n", "\n", "$$\\large w_t= w_{t-1} - \\frac{\\eta}{\\sqrt{s_t+\\epsilon}}\\cdot g_t$$\n", "\n", "Thus, we have\n", "\n", "$$\\large s_t = (1-\\gamma)\\cdot (g_t^2+\\gamma g_{t-1}^2+\\gamma^2 g_{t-2} +\\gamma^3 g_{t-2} + ... + \\gamma^{\\tau} g_0)$$\n", "\n", "### ADAM (Adaptive Momentum Gradient Descent) \n", "\n", "$$\\large v_t = \\beta_1 v_{t-1} +(1-\\beta_1) g_t$$\n", "\n", "and\n", "\n", "$$\\large s_t = \\beta_2 s_{t-1} + (1-\\beta_2) g_t^2 $$\n", "\n", "We further consider\n", "\n", "$$\\large \\hat{v}_t = \\frac{v_t}{1-\\beta_1}, \\text{and } \\hat{s}_t = \\frac{s_t}{1-\\beta_2} $$\n", "\n", "\n", "and\n", "\n", "$$\\large \\hat{g}_{t} = \\frac{\\eta\\cdot \\hat{v}_t}{\\sqrt{\\hat{s}_t}+\\epsilon}$$\n", "\n", "The updates to the weights are implemented as follows\n", "\n", "$$\\large w_t = w_{t-1} - \\hat{g}_t$$\n", "\n", "\n", "\n", "### Multilayer Perceptron Instructional Videos \n", "\n", "The first 2 are required. The last 2 are optional but highly recommended, even if you have not had any calculus or linear algebra!\n", "\n", "1. [But what is a Neural Network?](https://www.youtube.com/watch?v=aircAruvnKk)\n", "\n", "2. [Gradient descent, how neural networks learn](https://www.youtube.com/watch?v=IHZwWFHWa-w)\n", "\n", "3. 
[What is backpropagation really doing?](https://www.youtube.com/watch?v=Ilg3gGewQ5U)\n", "\n", "4. [Backpropagation calculus](https://www.youtube.com/watch?v=tIeHLnjs5U8)\n", "\n", "\n", "### Test Your Understanding \n", "\n", "1. What is a multilayer perceptron?\n", "2. What is a hidden layer?\n", "3. The network in the video was on the small-ish side, having only 2 hidden layers with 16 neurons each. How many total parameters (i.e. weights and biases) have to be determined during the training process for this network?\n", "4. Without reference to the calculus involved, do you understand the concept of gradient descent?\n", "\n", "\n", "\n", "\n" ], "metadata": { "id": "4D_g8i1kdovH" } }, { "cell_type": "markdown", "source": [ "### Code Applications \n", "\n", "---\n", "\n", "#### References:\n", "\n", "1. [Programming PyTorch for Deep Learning](https://www.amazon.com/Programming-PyTorch-Deep-Learning-Applications/dp/1492045357?asc_campaign=343db101986d123592617b298c3e663c&asc_source=01HPSC557H8MR8GX011W62TB1D&tag=snx192-20)\n", "2. [Getting Things Done with PyTorch](https://github.com/curiousily/Getting-Things-Done-with-Pytorch)\n" ], "metadata": { "id": "3zRyus4-un7J" } }, { "cell_type": "markdown", "source": [ "### Setup" ], "metadata": { "id": "Aunr_fxE2ABj" } }, { "cell_type": "code", "source": [ "import os\n", "if 'google.colab' in str(get_ipython()):\n", " print('Running on CoLab')\n", " from google.colab import drive\n", " drive.mount('/content/drive')\n", " os.chdir('/content/drive/My Drive/Data Sets')\n", "else:\n", " print('Running locally')\n", " os.chdir('../Data')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "DhFZT38M6s9_", "outputId": "5a33a15f-ec0a-46a5-940d-5003c2fa9798" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Running on CoLab\n", "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" ] } ] }, { "cell_type": "code", "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "\n", "from torch.utils.data import DataLoader, TensorDataset\n", "from sklearn.datasets import load_wine\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.metrics import accuracy_score" ], "metadata": { "id": "Cpnj3zsDunPv" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Example of a Neural Net in PyTorch for Classification" ], "metadata": { "id": "y6EUDXKQ2GnM" } }, { "cell_type": "code", "source": [ "# Load and preprocess the data\n", "wine_data = load_wine()\n", "X = wine_data.data\n", "y = wine_data.target\n", "\n", "# Standardize the features\n", "scaler = StandardScaler()\n", "X = scaler.fit_transform(X)\n", "\n", "# Split the dataset into training and test sets\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n", "\n", "# Convert the data to PyTorch tensors\n", "X_train_tensor = torch.tensor(X_train, dtype=torch.float64)\n", "X_test_tensor = torch.tensor(X_test, dtype=torch.float64)\n", "y_train_tensor = torch.tensor(y_train, dtype=torch.long)\n", "y_test_tensor = torch.tensor(y_test, dtype=torch.long)\n", "\n", "# Create DataLoader for training and testing\n", "train_dataset = TensorDataset(X_train_tensor, y_train_tensor)\n", "test_dataset = 
TensorDataset(X_test_tensor, y_test_tensor)\n", "train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)\n", "test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)\n", "\n", "# Define a simple neural network\n", "class WineNet(nn.Module):\n", " def __init__(self,n_features):\n", " super(WineNet, self).__init__()\n", " self.fc1 = nn.Linear(n_features, 50).double()\n", " self.fc2 = nn.Linear(50, 30).double()\n", " self.fc3 = nn.Linear(30, 3).double()\n", "\n", " def forward(self, x):\n", " x = torch.relu(self.fc1(x))\n", " x = torch.relu(self.fc2(x))\n", " x = self.fc3(x)\n", " return x\n", "\n", "# Initialize the model, loss function, and optimizer\n", "model = WineNet(X_train.shape[1])\n", "criterion = nn.CrossEntropyLoss()\n", "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", "\n", "# Train the model\n", "num_epochs = 100\n", "for epoch in range(num_epochs):\n", " model.train()\n", " for X_batch, y_batch in train_loader:\n", " optimizer.zero_grad()\n", " outputs = model(X_batch)\n", " loss = criterion(outputs, y_batch)\n", " loss.backward()\n", " optimizer.step()\n", "\n", " if (epoch+1) % 10 == 0:\n", " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')\n", "\n", "# Evaluate the model on the test set\n", "model.eval()\n", "with torch.no_grad():\n", " y_pred_list = []\n", " y_true_list = []\n", " for X_batch, y_batch in test_loader:\n", " outputs = model(X_batch)\n", " _, y_pred = torch.max(outputs, 1)\n", " y_pred_list.append(y_pred)\n", " y_true_list.append(y_batch)\n", "\n", " y_pred = torch.cat(y_pred_list)\n", " y_true = torch.cat(y_true_list)\n", " accuracy = accuracy_score(y_true.numpy(), y_pred.numpy())\n", " print(f'Accuracy on test set: {accuracy:.4f}')\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "F5N9m-fsmSYD", "outputId": "5e4557b0-ff36-493d-d3ee-60636a593718" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Epoch [10/100], Loss: 0.5738\n", "Epoch [20/100], Loss: 0.1444\n", "Epoch [30/100], Loss: 0.0459\n", "Epoch [40/100], Loss: 0.0263\n", "Epoch [50/100], Loss: 0.0081\n", "Epoch [60/100], Loss: 0.0122\n", "Epoch [70/100], Loss: 0.0090\n", "Epoch [80/100], Loss: 0.0048\n", "Epoch [90/100], Loss: 0.0050\n", "Epoch [100/100], Loss: 0.0043\n", "Accuracy on test set: 0.9815\n" ] } ] }, { "cell_type": "markdown", "source": [ "### Example of Classification w/ Class Imbalance\n", "\n", "Reference: https://github.com/curiousily/Getting-Things-Done-with-Pytorch/blob/master/04.first-neural-network.ipynb" ], "metadata": { "id": "abSQUWCW4zsi" } }, { "cell_type": "code", "source": [], "metadata": { "id": "DXlOgFMM2jWN" }, "execution_count": null, "outputs": [] } ] }