Regression With Fusion Data#

The aim of this notebook is to regress the time until a disruption occurs, given the current state at a single point in time.

In reality, our dataset is composed of multiple shots, where each shot has its own individual time readings.

| Shot Number | Time Step | Feature 1 | Feature 2 |
|-------------|-----------|-----------|-----------|
| 1 | \(t_0\) | - | - |
| 1 | \(t_1\) | - | - |
| 1 | \(t_2\) | - | - |
| 1 | ... | - | - |
| 1 | \(t_m\) | - | - |

| Shot Number | Time Step | Feature 1 | Feature 2 |
|-------------|-----------|-----------|-----------|
| N | \(t_0\) | - | - |
| N | \(t_1\) | - | - |
| N | \(t_2\) | - | - |
| N | ... | - | - |
| N | \(t_m\) | - | - |

To a good approximation, we can treat the individual readings in time as independent measurements and regress directly on them.

You can think of this as a sort of state estimation:

Given state \(S_i\), independent of \(S_{j \neq i}\), we want to regress how long the stability of our fusion process will last.

Let's load our data and take a look at the features. We also need to drop some columns, as these would bias the regression.

We will drop:

  • disruptive - a binary label indicating whether or not a disruption has occurred.

  • shot - an indexing variable for individual shots.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("full_db-complete_classified_0.1s.csv",sep=',',index_col=None)
df_regression = df.drop(columns=['disruptive','shot']) # Drop disruptive (binary label) and shot (index)
df_regression = df_regression[df.time_until_disrupt > 0] # Keep only rows with a positive time to disruption
df_regression = df_regression.sample(frac=1.0) # Randomly shuffle the rows
df_regression.head()
z_error radiated_fraction beta_p lower_gap n_e ssep Wmhd p_icrf upper_gap beta_n ... time_until_disrupt n_equal_1_normalized Greenwald_fraction li v_loop time q95 p_oh p_rad ip_error
148403 -0.000308 0.348539 0.294560 0.049893 2.203347e+20 -0.025936 21782.460938 3.840209e+01 0.093674 0.742089 ... 0.0160 0.001137 0.604698 1.259797 -2.511963 1.0899 2.940394 1.125851e+06 392415.960065 271289.0625
169792 0.000219 0.018243 0.257212 0.078944 1.575422e+20 -0.013643 44130.542969 3.537796e+06 0.103161 0.439488 ... 0.2849 0.001021 0.309502 1.414968 -0.851196 1.4200 4.553412 4.030026e+05 71890.617371 104532.6250
275818 0.000992 0.415253 0.135165 0.044478 7.909377e+19 -0.018565 23948.123047 3.787166e+01 0.097886 0.223583 ... 1.2241 0.000308 0.148950 1.505030 -0.808014 0.5200 4.437774 8.496335e+05 352828.804445 -13796.6875
44013 0.000420 -0.329507 0.092925 0.052791 7.200679e+19 0.010766 26898.298828 1.013809e+02 0.087245 0.192169 ... 0.1745 0.000827 0.106209 1.382588 0.796661 0.6400 3.791784 -1.451691e+06 478308.292191 -234166.7500
22454 0.001223 0.350599 0.189295 0.055474 1.547137e+20 -0.007201 53719.265625 7.274901e+05 0.108699 0.408583 ... 0.8579 0.000525 0.233198 1.368598 -1.063354 0.6800 3.532992 1.178322e+06 668175.674777 -209515.4375

5 rows × 29 columns
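Before visualizing, a quick sanity check on the remaining columns never hurts; a minimal sketch using standard pandas calls:

# Sketch: quick sanity checks on the regression dataframe.
print(df_regression.shape)                              # rows x columns after dropping/filtering
print(df_regression['time_until_disrupt'].describe())   # summary statistics of the target
print(df_regression.isna().sum())                       # missing values per column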

Let's visualize our data.

df_regression.hist(figsize=(15,15),bins=50,density=True)
array([[<AxesSubplot:title={'center':'z_error'}>,
        <AxesSubplot:title={'center':'radiated_fraction'}>,
        <AxesSubplot:title={'center':'beta_p'}>,
        <AxesSubplot:title={'center':'lower_gap'}>,
        <AxesSubplot:title={'center':'n_e'}>],
       [<AxesSubplot:title={'center':'ssep'}>,
        <AxesSubplot:title={'center':'Wmhd'}>,
        <AxesSubplot:title={'center':'p_icrf'}>,
        <AxesSubplot:title={'center':'upper_gap'}>,
        <AxesSubplot:title={'center':'beta_n'}>],
       [<AxesSubplot:title={'center':'zcur'}>,
        <AxesSubplot:title={'center':'q0'}>,
        <AxesSubplot:title={'center':'qstar'}>,
        <AxesSubplot:title={'center':'n_over_ncrit'}>,
        <AxesSubplot:title={'center':'ip'}>],
       [<AxesSubplot:title={'center':'kappa'}>,
        <AxesSubplot:title={'center':'dipprog_dt'}>,
        <AxesSubplot:title={'center':'p_lh'}>,
        <AxesSubplot:title={'center':'n_equal_1_mode'}>,
        <AxesSubplot:title={'center':'time_until_disrupt'}>],
       [<AxesSubplot:title={'center':'n_equal_1_normalized'}>,
        <AxesSubplot:title={'center':'Greenwald_fraction'}>,
        <AxesSubplot:title={'center':'li'}>,
        <AxesSubplot:title={'center':'v_loop'}>,
        <AxesSubplot:title={'center':'time'}>],
       [<AxesSubplot:title={'center':'q95'}>,
        <AxesSubplot:title={'center':'p_oh'}>,
        <AxesSubplot:title={'center':'p_rad'}>,
        <AxesSubplot:title={'center':'ip_error'}>, <AxesSubplot:>]],
      dtype=object)
(Figure: histograms of each feature in df_regression.)

Preprocessing#

We will create a train/validation/test split for this dataset, using a traditional 70/15/15% split.

We are also going to add a flag for taking the logarithm of time.

from sklearn.preprocessing import MinMaxScaler

############# Important Flag ##############
log_time = False
###########################################

def create_train_test_split(dataset,log_time=False):
    
    def return_x_y(dataframe,log_time=False):
        x = dataframe.drop(columns=['time_until_disrupt']).to_numpy()
        if log_time:
            y = np.log(dataframe['time_until_disrupt'].values)
        else:
            y = dataframe['time_until_disrupt'].values
        
        return x,y 
    
    Nt = int(0.7 * len(dataset))
    train = dataset[:Nt]
    test_val = dataset[Nt:]
    
    Nv = int(0.5 *len(test_val))
    val = test_val[:Nv]
    test = test_val[Nv:]
    print("Total dataset size: ",len(dataset))
    print("Number of Training data points: ",len(train))
    print("Number of Validation data points: ",len(val))
    print("Number of Testing data points: ",len(test))
    
    train_x,train_y = return_x_y(train,log_time)
    val_x,val_y = return_x_y(val,log_time)
    test_x,test_y = return_x_y(test,log_time)
    
    return train_x,train_y,val_x,val_y,test_x,test_y

train_x,train_y,val_x,val_y,test_x,test_y = create_train_test_split(df_regression,log_time=log_time)
Total dataset size:  63501
Number of Training data points:  44450
Number of Validation data points:  9525
Number of Testing data points:  9526
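For reference, a similar 70/15/15 split could also be produced with scikit-learn's train_test_split; a sketch (not used in the rest of the notebook):

# Sketch: an equivalent 70/15/15 split via scikit-learn (not used below).
from sklearn.model_selection import train_test_split

train_df, holdout_df = train_test_split(df_regression, test_size=0.30, random_state=8)
val_df, test_df = train_test_split(holdout_df, test_size=0.50, random_state=8)
print(len(train_df), len(val_df), len(test_df))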

Let's scale our data. Here we will use MinMax scaling on the interval (-1,1). You can experiment with other scalings, such as a StandardScaler (z-score normalization), if you wish.

feature_scaler = MinMaxScaler((-1,1))
feature_scaler.fit(df_regression.drop(columns=['time_until_disrupt']).to_numpy()) # Lets use global statistics for simplicity here.

x_train_scaled = feature_scaler.transform(train_x)
x_val_scaled = feature_scaler.transform(val_x)
x_test_scaled = feature_scaler.transform(test_x)

print("Features")
print("Training:",x_train_scaled.max(),x_train_scaled.min())
print("Validation:",x_val_scaled.max(),x_val_scaled.min())
print("Testing:",x_test_scaled.max(),x_test_scaled.min())

if log_time: # Declared above
    target_scaler = MinMaxScaler((0,1))
    target_scaler.fit(np.log(df_regression['time_until_disrupt'].values.reshape(-1,1))) # Using global statistics for simplicity here
    
    y_train_scaled = target_scaler.transform(train_y.reshape(-1,1)).flatten()
    y_val_scaled = target_scaler.transform(val_y.reshape(-1,1)).flatten()
    y_test_scaled = target_scaler.transform(test_y.reshape(-1,1)).flatten()
else:
    y_train_scaled = train_y.copy()
    y_val_scaled = val_y.copy()
    y_test_scaled = test_y.copy()
    
print(" ")
print("Targets")
print("Training:",y_train_scaled.max(),y_train_scaled.min())
print("Validation:",y_val_scaled.max(),y_val_scaled.min())
print("Testing:",y_test_scaled.max(),y_test_scaled.min())
Features
Training: 1.0000000000000004 -1.0000000000000002
Validation: 1.0 -1.0
Testing: 1.0000000000000002 -1.0
 
Targets
Training: 1.7724999487400055 0.0004999637603759
Validation: 1.7924999594688416 0.0001000165939331
Testing: 1.7524999380111694 0.0009999275207519
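As mentioned above, a StandardScaler (z-score normalization) could be used in place of MinMax scaling; a minimal sketch, here fitting on the training features only (not used in the rest of the notebook):

# Sketch: z-score normalization as an alternative to MinMax scaling.
from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()
std_scaler.fit(train_x)                      # fit on the training split only
x_train_std = std_scaler.transform(train_x)  # roughly zero mean, unit variance per feature
x_val_std = std_scaler.transform(val_x)
x_test_std = std_scaler.transform(test_x)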

Model Definition#

We will use a simple Deep Neural Network (DNN), commonly referred to as a Multi-Layer Perceptron (MLP).

We are going to make this as explicit as possible for clarity. We could use nn.Sequential to package this more neatly if we so choose (a sketch is shown after the forward-pass check below), and also be more dynamic in our model creation.

import torch
from torch import nn
from torch.nn import BatchNorm1d

class MLP(nn.Module):
    def __init__(self,input_shape,output_shape):
        super(MLP, self).__init__()

        self.init_layer = nn.Linear(input_shape,64)
        self.B0 = nn.BatchNorm1d(64)
        self.L1 = nn.Linear(64,256)
        self.B1 = nn.BatchNorm1d(256)
        self.L2 = nn.Linear(256,512)
        self.B2 = nn.BatchNorm1d(512)
        self.L3 = nn.Linear(512,256)
        self.B3 = nn.BatchNorm1d(256)
        self.L4 = nn.Linear(256,128)
        self.B4 = nn.BatchNorm1d(128)
        self.output = nn.Linear(128,output_shape)
        self.activation = nn.SELU()
        self.output_activation = nn.ReLU()
    def forward(self,x):

        x = self.activation(self.B0(self.init_layer(x))) # SELU(BNorm(Linear(x)))
        x = self.activation(self.B1(self.L1(x)))
        x = self.activation(self.B2(self.L2(x)))
        x = self.activation(self.B3(self.L3(x)))
        x = self.activation(self.B4(self.L4(x)))
        x = self.output_activation(self.output(x)).squeeze(1) # Time must be > 0, ReLU is good choice here.

        return x
     
regressor = MLP(input_shape=x_train_scaled.shape[1],output_shape=1) # Singular output
print(regressor)
MLP(
  (init_layer): Linear(in_features=28, out_features=64, bias=True)
  (B0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (L1): Linear(in_features=64, out_features=256, bias=True)
  (B1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (L2): Linear(in_features=256, out_features=512, bias=True)
  (B2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (L3): Linear(in_features=512, out_features=256, bias=True)
  (B3): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (L4): Linear(in_features=256, out_features=128, bias=True)
  (B4): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (output): Linear(in_features=128, out_features=1, bias=True)
  (activation): SELU()
  (output_activation): ReLU()
)

Let's see if we get output.

regressor(torch.tensor(x_train_scaled[:3]).float())
tensor([1.1473, 0.0000, 0.0000], grad_fn=<SqueezeBackward1>)
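As mentioned earlier, the same architecture could be packaged with nn.Sequential. A sketch of that alternative (the function name build_mlp is ours; note the output then has shape (batch, 1) and must be squeezed outside the module):

# Sketch: the same layer stack expressed with nn.Sequential.
def build_mlp(input_shape, output_shape):
    return nn.Sequential(
        nn.Linear(input_shape, 64),   nn.BatchNorm1d(64),  nn.SELU(),
        nn.Linear(64, 256),           nn.BatchNorm1d(256), nn.SELU(),
        nn.Linear(256, 512),          nn.BatchNorm1d(512), nn.SELU(),
        nn.Linear(512, 256),          nn.BatchNorm1d(256), nn.SELU(),
        nn.Linear(256, 128),          nn.BatchNorm1d(128), nn.SELU(),
        nn.Linear(128, output_shape), nn.ReLU(),
    )

sequential_regressor = build_mlp(x_train_scaled.shape[1], 1)
print(sequential_regressor(torch.tensor(x_train_scaled[:3]).float()).squeeze(1).shape)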

We are going to use built-in PyTorch utilities for creating our data loading pipeline. Since we have simple tabular data, we can use TensorDataset(x,y).

Note that we are going to need to convert these numpy arrays to torch.tensor() objects first.

from torch.utils.data import TensorDataset, DataLoader

train_dataset = TensorDataset(torch.tensor(x_train_scaled),torch.tensor(y_train_scaled))
val_dataset = TensorDataset(torch.tensor(x_val_scaled),torch.tensor(y_val_scaled))
test_dataset = TensorDataset(torch.tensor(x_test_scaled),torch.tensor(y_test_scaled))

We can also build custom datasets, which you will see in next week's lectures on normalizing flows. We will explain in more detail how those work then, but for now you should be familiar with the basic method of getting data from these objects.

train_dataset.__getitem__(0)
(tensor([ 0.1039, -0.6623,  0.1553, -0.8241, -0.2839,  0.0161, -0.5669, -0.9937,
         -0.7154,  0.4758,  0.1006, -0.9606, -0.5235, -0.7937, -0.3066,  0.1707,
          0.5250, -0.9998, -0.8750, -0.7934, -0.3216, -0.0954,  0.0335, -0.0196,
         -0.6106, -0.5459, -0.9736,  0.4466], dtype=torch.float64),
 tensor(0.0160, dtype=torch.float64))

All PyTorch datasets must have a getitem() function. This will be important when we want to pass these objects to a DataLoader(). Again, we will go into more detail on all of this next week, but let's get familiar with the basic usage.
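For reference, a minimal custom dataset only needs __len__() and __getitem__(); a sketch (the class name TabularDataset is ours, and TensorDataset already covers our tabular case):

# Sketch: a minimal custom Dataset for tabular data.
from torch.utils.data import Dataset

class TabularDataset(Dataset):
    def __init__(self, x, y):
        self.x = torch.tensor(x).float()
        self.y = torch.tensor(y).float()

    def __len__(self):
        return len(self.x)                # number of samples

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]   # one (features, target) pair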

We know that when we train deep learning models we will use batches of inputs to compute gradients. An efficient way to batch your data is to use the DataLoader() class from PyTorch. It will pass a series of indices (random, if shuffle=True) to your TensorDataset's getitem() function and form batches.

train_loader = DataLoader(train_dataset,batch_size=10)

for i,data in enumerate(train_loader):
    x = data[0]
    y = data[1]
    print("Batch {0}".format(i),"-> x shape: ",x.shape," y shape: ",y.shape)
    break
Batch 0 -> x shape:  torch.Size([10, 28])  y shape:  torch.Size([10])

Now let's create a function to return dataloaders for the three datasets we have created.

# Create dataloaders to iterate.
# We take as input the three TensorDatasets along with the batch sizes we want to use.
def CreateLoaders(train_dataset,val_dataset,test_dataset,train_batch,val_batch,test_batch):
    train_loader = DataLoader(train_dataset,
                            batch_size=train_batch,
                            shuffle=True)
    val_loader =  DataLoader(val_dataset,
                            batch_size=val_batch,
                            shuffle=False)
    test_loader =  DataLoader(test_dataset,
                            batch_size=test_batch,
                            shuffle=False)
    return train_loader,val_loader,test_loader

train_loader,val_loader,test_loader = CreateLoaders(train_dataset,val_dataset,test_dataset,10,10,10)

Training Function#

We almost have everything we need to start training our model.

  1. Datasets in a pytorch format.

  2. DataLoaders for batch creation.

  3. A training function - let's build this.

Let's write the training function in such a way that we can use it for any network.

There are two main components:

Training loop

Validation loop

We also need a clever way to store our parameters. Let's use a dictionary to do this.

Note that we are going to assume GPU usage here, so use your notebook on sciclone!

You can see the information regarding your GPU with the command below:

!nvidia-smi
Wed Jun  5 10:27:28 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.68                 Driver Version: 551.68         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090      WDDM  |   00000000:01:00.0  On |                  Off |
|  0%   34C    P8             19W /  450W |    2156MiB /  24564MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      5640    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A      5756    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A      6936    C+G   ...ems\FX\SubAgent\AlienFXSubAgent.exe      N/A      |
|    0   N/A  N/A      8076    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A      8160      C   ...James\.conda\envs\ptorch\python.exe      N/A      |
|    0   N/A  N/A      9568    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A      9592    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     12132    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A      |
|    0   N/A  N/A     17240    C+G   ...Programs\Microsoft VS Code\Code.exe      N/A      |
|    0   N/A  N/A     23292    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
|    0   N/A  N/A     31464    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A     31864    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     33840    C+G   ...on\125.0.2535.85\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     37792    C+G   ...wekyb3d8bbwe\XboxGameBarWidgets.exe      N/A      |
|    0   N/A  N/A     42028    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
+-----------------------------------------------------------------------------------------+

If you don’t see an output with the above command please let me know and we will sort it.

config = {"seed":8,
          "name": "MyRegressionModel",
          "run_val":1,
          "model": {
               "input_shape":28,
               "output_shape": 1,
           },
          "optimizer": {
                "lr": 7e-4,
            },
          "num_epochs": 100,
          "dataloader": {
            "train": {
                "batch_size": 64
            },
            "val": {
                "batch_size": 64
            },
            "test": {
                "batch_size": 64
            },
          },
            "output": {
                "dir":"./"
            }
}

Below is our training function. It will operate as follows:

  1. We set seeds for commonly used packages for reproducibility.

  2. Create a directory with a name corresponding to the name field in the above dictionary.

  3. We create our dataloaders - we only need train/val, but test will be created anyway and not used.

  4. We create an instance of our MLP (the class defined above).

  5. We create an instance of an optimizer - Adam.

  6. Define a learning rate scheduler - CosineAnnealing. We will see what this looks like graphically.

  7. Define our loss function - SmoothL1Loss (see the formula after this list) - https://pytorch.org/docs/stable/generated/torch.nn.SmoothL1Loss.html

  8. For each epoch iterate over all batches in the training loader and train.

  9. At the end of each epoch, iterate over the validation loader - DO NOT apply gradients.
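For reference (step 7), SmoothL1Loss with its default \(\beta = 1\) computes, element-wise,

\(\ell(\hat{y}, y) = \begin{cases} \frac{1}{2}(\hat{y} - y)^2 / \beta, & |\hat{y} - y| < \beta \\ |\hat{y} - y| - \frac{1}{2}\beta, & \text{otherwise} \end{cases}\)

so it is quadratic (MSE-like) for small residuals and linear (MAE-like) for large ones, which makes it less sensitive to outliers.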

import os
import json
import random
import pkbar
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.nn as nn
import datetime
import shutil
import time

def trainer(config,train_dataset,val_dataset,test_dataset):
    # Setup random seed
    torch.manual_seed(config['seed'])
    np.random.seed(config['seed'])
    random.seed(config['seed'])
    torch.cuda.manual_seed(config['seed'])

    # Create experiment name
    exp_name = config['name'] 
    print(exp_name)

    # Create directory structure
    output_folder = config['output']['dir']
    output_path = os.path.join(output_folder,exp_name)
    if os.path.exists(output_path):
        timestamp = time.time()
        dt_object = datetime.datetime.fromtimestamp(timestamp)
        formatted_time = dt_object.strftime('%H_%M_%S')
        output_path = output_path +"_"+ str(formatted_time)
        
    os.mkdir(output_path)
    
    with open(os.path.join(output_path,'config.json'),'w') as outfile:
        json.dump(config, outfile)


    # Load the dataset
    print('Creating Loaders.')
    train_batch_size = config['dataloader']['train']['batch_size']
    val_batch_size = config['dataloader']['val']['batch_size']
    test_batch_size = config['dataloader']['test']['batch_size']

    train_loader,val_loader,test_loader = CreateLoaders(train_dataset,val_dataset,test_dataset,train_batch_size,val_batch_size,test_batch_size)
    
    history = {'train_loss':[],'val_loss':[],'lr':[]}


    print("Training Size: {0}".format(len(train_loader.dataset)))
    print("Validation Size: {0}".format(len(val_loader.dataset)))

    # Create the model
    input_shape = config['model']['input_shape']
    output_shape = config['model']['output_shape']
    net = MLP(input_shape=input_shape,output_shape=output_shape)
    t_params = sum(p.numel() for p in net.parameters())
    print("Network Parameters: ",t_params)
    device = torch.device('cuda')
    net.to('cuda')

    # Optimizer
    num_epochs=int(config['num_epochs'])
    lr = float(config['optimizer']['lr'])

    optimizer = optim.Adam(list(filter(lambda p: p.requires_grad, net.parameters())), lr=lr)
    num_steps = len(train_loader) * num_epochs
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=num_steps, last_epoch=-1,
                                                           eta_min=0)

    startEpoch = 0
    global_step = 0

    print('===========  Optimizer  ==================:')
    print('      LR:', lr)
    print('      num_epochs:', num_epochs)
    print('')
    
    
    # Loss Function 
    loss_function = nn.SmoothL1Loss()
    for epoch in range(startEpoch,num_epochs):

        kbar = pkbar.Kbar(target=len(train_loader), epoch=epoch, num_epochs=num_epochs, width=20, always_stateful=False)

        net.train()
        running_loss = 0.0

        for i, data in enumerate(train_loader):
            input  = data[0].to('cuda').float()
            y = data[1].to('cuda').float()

            optimizer.zero_grad()

            with torch.set_grad_enabled(True):
                y_hat = net(input)
                
            loss = loss_function(y,y_hat)

            loss.backward()
            #torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=0.5,error_if_nonfinite=True)
            optimizer.step()
            scheduler.step()

            running_loss += loss.item() * input.shape[0]
            kbar.update(i, values=[("loss", loss.item())])
            global_step += 1


        history['train_loss'].append(running_loss / len(train_loader.dataset))
        history['lr'].append(scheduler.get_last_lr()[0])


        ######################
        ## validation phase ##
        ######################
        if bool(config['run_val']):
            net.eval()
            val_loss = 0.0
            with torch.no_grad():
                for i, data in enumerate(val_loader):
                    input  = data[0].to('cuda').float()
                    y = data[1].to('cuda').float()
                    y_hat = net(input)
                    loss = loss_function(y,y_hat)

                    val_loss += loss

            val_loss = val_loss.cpu().numpy() / len(val_loader)

            history['val_loss'].append(val_loss)

            kbar.add(1, values=[("val_loss", val_loss.item())])

            name_output_file = config['name']+'_epoch{:02d}_val_loss_{:.6f}.pth'.format(epoch, val_loss)

        else:
            kbar.add(1,values=[('val_loss',0.)])
            name_output_file = config['name']+'_epoch{:02d}_train_loss_{:.6f}.pth'.format(epoch, running_loss / len(train_loader.dataset))

        filename = os.path.join(output_path, name_output_file)

        checkpoint={}
        checkpoint['net_state_dict'] = net.state_dict()
        checkpoint['optimizer'] = optimizer.state_dict()
        checkpoint['scheduler'] = scheduler.state_dict()
        checkpoint['epoch'] = epoch
        checkpoint['history'] = history
        checkpoint['global_step'] = global_step

        torch.save(checkpoint,filename)

        print('')
trainer(config,train_dataset,val_dataset,test_dataset)
MyRegressionModel
Creating Loaders.
Training Size: 44450
Validation Size: 9525
Network Parameters:  316865
===========  Optimizer  ==================:
      LR: 0.0007
      num_epochs: 100

Epoch: 1/100
695/695 [====================] - 5s 7ms/step - loss: 0.0512 - val_loss: 0.0361

Epoch: 2/100
695/695 [====================] - 8s 11ms/step - loss: 0.0374 - val_loss: 0.0366

Epoch: 3/100
695/695 [====================] - 8s 11ms/step - loss: 0.0353 - val_loss: 0.0324

Epoch: 4/100
695/695 [====================] - 8s 11ms/step - loss: 0.0336 - val_loss: 0.0309

Epoch: 5/100
695/695 [====================] - 8s 11ms/step - loss: 0.0318 - val_loss: 0.0278

Epoch: 6/100
695/695 [====================] - 8s 11ms/step - loss: 0.0311 - val_loss: 0.0306

Epoch: 7/100
695/695 [====================] - 8s 11ms/step - loss: 0.0296 - val_loss: 0.0296

Epoch: 8/100
695/695 [====================] - 7s 11ms/step - loss: 0.0295 - val_loss: 0.0273

Epoch: 9/100
695/695 [====================] - 8s 11ms/step - loss: 0.0284 - val_loss: 0.0298

Epoch: 10/100
695/695 [====================] - 8s 11ms/step - loss: 0.0276 - val_loss: 0.0254

Epoch: 11/100
695/695 [====================] - 8s 11ms/step - loss: 0.0269 - val_loss: 0.0241

Epoch: 12/100
695/695 [====================] - 8s 11ms/step - loss: 0.0263 - val_loss: 0.0258

Epoch: 13/100
695/695 [====================] - 8s 11ms/step - loss: 0.0255 - val_loss: 0.0235

Epoch: 14/100
695/695 [====================] - 8s 11ms/step - loss: 0.0248 - val_loss: 0.0254

Epoch: 15/100
695/695 [====================] - 8s 11ms/step - loss: 0.0243 - val_loss: 0.0220

Epoch: 16/100
695/695 [====================] - 7s 11ms/step - loss: 0.0240 - val_loss: 0.0247

Epoch: 17/100
695/695 [====================] - 8s 11ms/step - loss: 0.0234 - val_loss: 0.0240

Epoch: 18/100
695/695 [====================] - 8s 11ms/step - loss: 0.0230 - val_loss: 0.0222

Epoch: 19/100
695/695 [====================] - 8s 11ms/step - loss: 0.0226 - val_loss: 0.0206

Epoch: 20/100
695/695 [====================] - 8s 11ms/step - loss: 0.0219 - val_loss: 0.0214

Epoch: 21/100
695/695 [====================] - 8s 11ms/step - loss: 0.0216 - val_loss: 0.0211

Epoch: 22/100
695/695 [====================] - 7s 11ms/step - loss: 0.0214 - val_loss: 0.0203

Epoch: 23/100
695/695 [====================] - 8s 11ms/step - loss: 0.0208 - val_loss: 0.0213

Epoch: 24/100
695/695 [====================] - 8s 11ms/step - loss: 0.0204 - val_loss: 0.0194

Epoch: 25/100
695/695 [====================] - 8s 11ms/step - loss: 0.0200 - val_loss: 0.0212

Epoch: 26/100
695/695 [====================] - 7s 11ms/step - loss: 0.0196 - val_loss: 0.0180

Epoch: 27/100
695/695 [====================] - 8s 11ms/step - loss: 0.0193 - val_loss: 0.0208

Epoch: 28/100
695/695 [====================] - 7s 11ms/step - loss: 0.0188 - val_loss: 0.0198

Epoch: 29/100
695/695 [====================] - 8s 11ms/step - loss: 0.0190 - val_loss: 0.0201

Epoch: 30/100
695/695 [====================] - 8s 11ms/step - loss: 0.0185 - val_loss: 0.0169

Epoch: 31/100
695/695 [====================] - 8s 11ms/step - loss: 0.0182 - val_loss: 0.0168

Epoch: 32/100
695/695 [====================] - 7s 11ms/step - loss: 0.0179 - val_loss: 0.0175

Epoch: 33/100
695/695 [====================] - 8s 11ms/step - loss: 0.0179 - val_loss: 0.0169

Epoch: 34/100
695/695 [====================] - 8s 11ms/step - loss: 0.0175 - val_loss: 0.0162

Epoch: 35/100
695/695 [====================] - 8s 11ms/step - loss: 0.0173 - val_loss: 0.0149

Epoch: 36/100
695/695 [====================] - 8s 11ms/step - loss: 0.0169 - val_loss: 0.0162

Epoch: 37/100
695/695 [====================] - 8s 11ms/step - loss: 0.0168 - val_loss: 0.0168

Epoch: 38/100
695/695 [====================] - 8s 11ms/step - loss: 0.0166 - val_loss: 0.0163

Epoch: 39/100
695/695 [====================] - 7s 11ms/step - loss: 0.0165 - val_loss: 0.0148

Epoch: 40/100
695/695 [====================] - 8s 11ms/step - loss: 0.0161 - val_loss: 0.0154

Epoch: 41/100
695/695 [====================] - 8s 11ms/step - loss: 0.0159 - val_loss: 0.0148

Epoch: 42/100
695/695 [====================] - 7s 11ms/step - loss: 0.0157 - val_loss: 0.0142

Epoch: 43/100
695/695 [====================] - 8s 11ms/step - loss: 0.0156 - val_loss: 0.0147

Epoch: 44/100
695/695 [====================] - 8s 11ms/step - loss: 0.0151 - val_loss: 0.0144

Epoch: 45/100
695/695 [====================] - 8s 11ms/step - loss: 0.0151 - val_loss: 0.0140

Epoch: 46/100
695/695 [====================] - 8s 11ms/step - loss: 0.0149 - val_loss: 0.0147

Epoch: 47/100
695/695 [====================] - 8s 11ms/step - loss: 0.0148 - val_loss: 0.0141

Epoch: 48/100
695/695 [====================] - 8s 11ms/step - loss: 0.0145 - val_loss: 0.0142

Epoch: 49/100
695/695 [====================] - 8s 11ms/step - loss: 0.0143 - val_loss: 0.0130

Epoch: 50/100
695/695 [====================] - 8s 11ms/step - loss: 0.0142 - val_loss: 0.0139

Epoch: 51/100
695/695 [====================] - 8s 11ms/step - loss: 0.0141 - val_loss: 0.0134

Epoch: 52/100
695/695 [====================] - 8s 11ms/step - loss: 0.0139 - val_loss: 0.0141

Epoch: 53/100
695/695 [====================] - 8s 11ms/step - loss: 0.0139 - val_loss: 0.0133

Epoch: 54/100
695/695 [====================] - 8s 11ms/step - loss: 0.0136 - val_loss: 0.0133

Epoch: 55/100
695/695 [====================] - 8s 11ms/step - loss: 0.0133 - val_loss: 0.0120

Epoch: 56/100
695/695 [====================] - 8s 11ms/step - loss: 0.0131 - val_loss: 0.0122

Epoch: 57/100
695/695 [====================] - 8s 11ms/step - loss: 0.0132 - val_loss: 0.0128

Epoch: 58/100
695/695 [====================] - 8s 11ms/step - loss: 0.0130 - val_loss: 0.0124

Epoch: 59/100
695/695 [====================] - 7s 11ms/step - loss: 0.0128 - val_loss: 0.0121

Epoch: 60/100
695/695 [====================] - 8s 11ms/step - loss: 0.0125 - val_loss: 0.0122

Epoch: 61/100
695/695 [====================] - 8s 11ms/step - loss: 0.0126 - val_loss: 0.0119

Epoch: 62/100
695/695 [====================] - 7s 11ms/step - loss: 0.0125 - val_loss: 0.0117

Epoch: 63/100
695/695 [====================] - 8s 11ms/step - loss: 0.0122 - val_loss: 0.0119

Epoch: 64/100
695/695 [====================] - 8s 11ms/step - loss: 0.0120 - val_loss: 0.0119

Epoch: 65/100
695/695 [====================] - 8s 11ms/step - loss: 0.0123 - val_loss: 0.0112

Epoch: 66/100
695/695 [====================] - 8s 11ms/step - loss: 0.0120 - val_loss: 0.0113

Epoch: 67/100
695/695 [====================] - 8s 11ms/step - loss: 0.0118 - val_loss: 0.0121

Epoch: 68/100
695/695 [====================] - 8s 11ms/step - loss: 0.0116 - val_loss: 0.0108

Epoch: 69/100
695/695 [====================] - 8s 11ms/step - loss: 0.0115 - val_loss: 0.0115

Epoch: 70/100
695/695 [====================] - 8s 11ms/step - loss: 0.0115 - val_loss: 0.0117

Epoch: 71/100
695/695 [====================] - 8s 11ms/step - loss: 0.0114 - val_loss: 0.0111

Epoch: 72/100
695/695 [====================] - 8s 11ms/step - loss: 0.0112 - val_loss: 0.0104

Epoch: 73/100
695/695 [====================] - 8s 11ms/step - loss: 0.0112 - val_loss: 0.0106

Epoch: 74/100
695/695 [====================] - 7s 11ms/step - loss: 0.0110 - val_loss: 0.0108

Epoch: 75/100
695/695 [====================] - 8s 11ms/step - loss: 0.0112 - val_loss: 0.0105

Epoch: 76/100
695/695 [====================] - 8s 11ms/step - loss: 0.0109 - val_loss: 0.0104

Epoch: 77/100
695/695 [====================] - 8s 11ms/step - loss: 0.0108 - val_loss: 0.0109

Epoch: 78/100
695/695 [====================] - 8s 11ms/step - loss: 0.0108 - val_loss: 0.0104

Epoch: 79/100
695/695 [====================] - 8s 11ms/step - loss: 0.0107 - val_loss: 0.0104

Epoch: 80/100
695/695 [====================] - 8s 11ms/step - loss: 0.0106 - val_loss: 0.0104

Epoch: 81/100
695/695 [====================] - 8s 11ms/step - loss: 0.0104 - val_loss: 0.0102

Epoch: 82/100
695/695 [====================] - 8s 11ms/step - loss: 0.0105 - val_loss: 0.0100

Epoch: 83/100
695/695 [====================] - 8s 11ms/step - loss: 0.0104 - val_loss: 0.0098

Epoch: 84/100
695/695 [====================] - 8s 11ms/step - loss: 0.0102 - val_loss: 0.0100

Epoch: 85/100
695/695 [====================] - 8s 11ms/step - loss: 0.0103 - val_loss: 0.0099

Epoch: 86/100
695/695 [====================] - 8s 11ms/step - loss: 0.0101 - val_loss: 0.0099

Epoch: 87/100
695/695 [====================] - 7s 11ms/step - loss: 0.0102 - val_loss: 0.0099

Epoch: 88/100
695/695 [====================] - 8s 11ms/step - loss: 0.0101 - val_loss: 0.0100

Epoch: 89/100
695/695 [====================] - 8s 11ms/step - loss: 0.0101 - val_loss: 0.0099

Epoch: 90/100
695/695 [====================] - 7s 11ms/step - loss: 0.0100 - val_loss: 0.0099

Epoch: 91/100
695/695 [====================] - 8s 11ms/step - loss: 0.0099 - val_loss: 0.0103

Epoch: 92/100
695/695 [====================] - 8s 11ms/step - loss: 0.0101 - val_loss: 0.0098

Epoch: 93/100
695/695 [====================] - 7s 11ms/step - loss: 0.0099 - val_loss: 0.0108

Epoch: 94/100
695/695 [====================] - 8s 11ms/step - loss: 0.0100 - val_loss: 0.0104

Epoch: 95/100
695/695 [====================] - 8s 11ms/step - loss: 0.0098 - val_loss: 0.0109

Epoch: 96/100
695/695 [====================] - 8s 11ms/step - loss: 0.0099 - val_loss: 0.0098

Epoch: 97/100
695/695 [====================] - 8s 11ms/step - loss: 0.0099 - val_loss: 0.0097

Epoch: 98/100
695/695 [====================] - 8s 11ms/step - loss: 0.0098 - val_loss: 0.0095

Epoch: 99/100
695/695 [====================] - 8s 11ms/step - loss: 0.0098 - val_loss: 0.0095

Epoch: 100/100
695/695 [====================] - 8s 11ms/step - loss: 0.0098 - val_loss: 0.0096

Let's take a look at what the learning rate scheduler has done for us:

dicte = torch.load(os.path.join(config['name'],sorted(os.listdir(config['name']))[-1])) # Load the final-epoch checkpoint
print(dicte.keys())
dict_keys(['net_state_dict', 'optimizer', 'scheduler', 'epoch', 'history', 'global_step'])
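Because the optimizer and scheduler state dicts are stored as well, a run could in principle be resumed from a checkpoint. A rough sketch, assuming net, optimizer and scheduler have been re-created with the same configuration used in trainer():

# Sketch: resume from a checkpoint (net, optimizer and scheduler are assumed
# to be re-instantiated exactly as in trainer() before loading).
net.load_state_dict(dicte['net_state_dict'])
optimizer.load_state_dict(dicte['optimizer'])
scheduler.load_state_dict(dicte['scheduler'])
start_epoch = dicte['epoch'] + 1
global_step = dicte['global_step']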

Within each .pth file, we have stored the weights at a specific epoch, along with a variety of other useful information. First, let's plot the training and validation losses, along with the learning rate.

train_loss = dicte['history']['train_loss']
val_loss = dicte['history']['val_loss']
learning_rate = dicte['history']['lr']

plt.plot(train_loss,'r-',label="Training Loss")
plt.plot(val_loss,'b-',label='Validation Loss')
plt.legend(fontsize=20)
plt.xlabel('Epoch',fontsize=20)
plt.ylabel('Loss',fontsize=20)
plt.tick_params(axis='both', labelsize=18)
plt.show()

plt.plot(learning_rate,'k-',label=r"Learning Rate - $\eta$")
plt.legend(fontsize=20)
plt.xlabel('Epoch',fontsize=20)
plt.ylabel(r'$\eta$',fontsize=20)
plt.tick_params(axis='both', labelsize=18)
plt.show()
(Figures: training and validation loss per epoch, and learning rate per epoch.)

Notice how the learning rate has decreased at each step following a cosine curve. Specifically, the scheduler follows the formulas below:

\(\eta_{t} = \eta_{\text{min}} + \frac{1}{2} (\eta_{\text{max}} - \eta_{\text{min}}) \left(1 + \cos\left(\frac{T_{\text{cur}}}{T_{\text{max}}} \pi\right)\right), \quad T_{\text{cur}} \neq (2k+1)T_{\text{max}}\)

\(\eta_{t+1} = \eta_{t} + \frac{1}{2} (\eta_{\text{max}} - \eta_{\text{min}}) \left(1 - \cos\left(\frac{1}{T_{\text{max}}} \pi\right)\right), \quad T_{\text{cur}} = (2k+1)T_{\text{max}}\)

where:

  • \(\eta_{t}\) is the learning rate at step \(t\),

  • \(\eta_{t+1}\) is the learning rate at step \(t+1\),

  • \(\eta_{\text{min}}\) is the minimum learning rate,

  • \(\eta_{\text{max}}\) is the maximum (initial) learning rate,

  • \(T_{\text{cur}}\) is the number of scheduler steps taken so far,

  • \(T_{\text{max}}\) is the maximum number of steps,

  • \(k\) is any non-negative integer.

The first expression applies between restarts; the second applies only at a restart, when \(T_{\text{cur}} = (2k+1)T_{\text{max}}\). Since we set \(T_{\text{max}}\) to the total number of optimizer steps, no restart occurs here and the learning rate simply decays from \(\eta_{\text{max}} = 7\times10^{-4}\) down to \(\eta_{\text{min}} = 0\).
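As a quick sanity check, we can evaluate the first expression directly and compare it with the recorded learning rates; a sketch, assuming \(\eta_{\text{max}} = 7\times10^{-4}\), \(\eta_{\text{min}} = 0\), and 695 batches per epoch as reported in the training log:

# Sketch: evaluate the cosine annealing formula and plot it.
# Assumes 695 optimizer steps per epoch (taken from the training log above).
eta_max, eta_min = 7e-4, 0.0
T_max = 695 * config['num_epochs']

steps = np.arange(T_max)
eta = eta_min + 0.5 * (eta_max - eta_min) * (1 + np.cos(np.pi * steps / T_max))

plt.plot(steps, eta, 'k-', label=r'Formula for $\eta_t$')
plt.xlabel('Optimizer step')
plt.ylabel(r'$\eta_t$')
plt.legend()
plt.show()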

Testing the Performance#

We have trained our model, and at each epoch we have saved the weights of the model to a .pth file. These files are located in a folder corresponding to the "name" field of the config dictionary.

Let's load the model and look at some metrics.

input_shape = config['model']['input_shape']
output_shape = config['model']['output_shape']
net = MLP(input_shape=input_shape,output_shape=output_shape)
t_params = sum(p.numel() for p in net.parameters())
print("Network Parameters: ",t_params)
device = torch.device('cuda')
net.to('cuda')

dicte = torch.load(os.path.join(config['name'],sorted(os.listdir(config['name']))[-1])) # Load the final-epoch checkpoint
net.load_state_dict(dicte["net_state_dict"])
Network Parameters:  316865
<All keys matched successfully>
test_loader = DataLoader(test_dataset,batch_size=config['dataloader']['test']['batch_size'])
net.eval() # Eval mode 

predictions = []
y_true = []
kbar = pkbar.Kbar(target=len(test_loader), width=20, always_stateful=False)
for i,data in enumerate(test_loader):
    x = data[0].to('cuda').float()
    
    
    with torch.set_grad_enabled(False): # Same as with torch.no_grad():
        y_hat = net(x).detach().cpu().numpy()
 
        if log_time: # Declared above
            y_hat = np.exp(target_scaler.inverse_transform(y_hat.reshape(-1,1))).flatten()
            y_true.append(np.exp(target_scaler.inverse_transform(data[1].numpy().reshape(-1,1))).flatten())
        else:
            y_true.append(data[1].numpy())
            
    predictions.append(y_hat)
    
    kbar.update(i)
    
predictions = np.concatenate(predictions)
y_true = np.concatenate(y_true)
127/149 [================>...] - ETA: 0s
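Before looking at the residual distribution, a couple of simple summary metrics; a sketch using plain numpy:

# Sketch: simple error metrics on the test set.
mae = np.abs(predictions - y_true).mean()
rmse = np.sqrt(((predictions - y_true) ** 2).mean())
print("MAE: ", mae)
print("RMSE:", rmse)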

Let's look at the relative residuals.

relative_residuals = (predictions - y_true) / (y_true + 1e-7)

plt.hist(relative_residuals,density=True,color='blue',histtype='step',label='Relative Residuals',range=[-1,1],bins=100)
plt.title('Relative Residuals - Testing Data',fontsize=25,pad=10)
plt.xlabel(r'$\frac{\hat{y} - y_{true.}}{y_{true.}}$',fontsize=25,labelpad=10)
plt.ylabel('Density',fontsize=25,labelpad=10)
plt.tick_params(axis='both', labelsize=18)
(Figure: histogram of the relative residuals on the testing data.)

Uh oh. We have a large spike at -1. Can anyone hypothesize why this could be the case?

Can you look into the predicted and true values to get some intuition?
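One way to start digging; a sketch that pulls out the points in the bin near -1 and checks how often the network outputs exactly zero:

# Sketch: inspect the points producing a relative residual near -1.
mask = relative_residuals < -0.99
print("Points with relative residual < -0.99:", mask.sum())
print("Example predictions:", predictions[mask][:10])
print("Example true values:", y_true[mask][:10])
print("Fraction of predictions exactly zero:", (predictions == 0).mean())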