Another Neural Net Regression

If you search the web for neural net regression, you will find a number of examples. Most begin by generating some data from a linear or sinusoidal function and adding a bit of noise. They then fit the resulting data with a neural net created with a tool such as PyTorch.

In this post, I want to accomplish two things: learn a bit about neural nets and explore PyTorch. Neural nets are often discussed in relation to deep learning. Deep learning typically involves large amounts of data and multiple network layers with complex architectures, and it is used in areas such as computer vision and speech recognition. The example we are considering isn't particularly deep, but I hope it illustrates a few simple principles of neural nets. PyTorch is a Python machine learning toolkit developed by Facebook. Its main features are tensor computing and automatic differentiation.

Rather than using generated data, I want to use real data and see what happens when I use a net for regression on it. The data I will use are temperature anomaly estimates for the period 1850 to 2021 from the NASA Global Climate Change site. We have seen this data before: here, here, and here.


A Simple Model

We will create a simple neural net with three linear layers connected by LeakyReLU activation functions, which gives two hidden layers. Each linear layer applies the transformation $y = xA^T + b$ to its input. A LeakyReLU function is the identity for positive values and has a small positive slope (0.01 by default in PyTorch) for negative values, so negative inputs are shrunk rather than set to zero.
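
To make these two building blocks concrete, here is a minimal plain-Python sketch (no PyTorch needed) of a leaky ReLU and of the linear map for a single scalar input. The slope of 0.01 matches the default negative_slope of torch.nn.LeakyReLU. The function names, weights, and biases below are made up for illustration and are not taken from the trained model.

```python
def leaky_relu(v, negative_slope=0.01):
    """Identity for v >= 0; a small positive slope for v < 0."""
    return v if v >= 0 else negative_slope * v


def linear(x, weights, biases):
    """y = x A^T + b for a single scalar input x.

    With one input feature, A has a single column, so each output is
    simply weights[i] * x + biases[i].
    """
    return [w * x + b for w, b in zip(weights, biases)]


# Hypothetical weights and biases for a layer with 1 input and 3 outputs.
weights = [0.5, -1.0, 2.0]
biases = [0.0, 0.25, -3.0]

# One linear layer followed by the activation, applied to the input x = 1.0.
hidden = [leaky_relu(v) for v in linear(1.0, weights, biases)]
print(hidden)
```

Stacking several such layers gives a function that is piecewise linear in the input, which is worth keeping in mind when looking at the fit later on.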

import torch

class Net(torch.nn.Module):
    def __init__(self, layer1_out = 200, layer2_out = 100):
        super(Net, self).__init__()
        self.linear1 = torch.nn.Linear(1, layer1_out)
        self.linear2 = torch.nn.Linear(layer1_out, layer2_out)
        self.linear3 = torch.nn.Linear(layer2_out, 1)

        self.leaky_rlu = torch.nn.LeakyReLU()

    def forward(self, x):
        x = self.linear1(x)
        x = self.leaky_rlu(x)
        x = self.linear2(x)
        x = self.leaky_rlu(x)
        x = self.linear3(x)
        return x

layer1_out and layer2_out are the output sizes of the first and second linear layers, and therefore also the input sizes of the second and third.

Alternatively, we could have used PyTorch's convenient Sequential container to create the net.

net = torch.nn.Sequential(
        torch.nn.Linear(1, 200),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(200, 100),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(100, 1),
    )

The other necessary pieces are the loss function, the optimizer, and the data loader. The loss function compares the output of the net to the desired output, in our case the observed data values, and returns a number measuring how well the net matches them. We will use a mean squared error loss. Once the loss is calculated, backpropagation computes its gradient with respect to the weights of the layers (A and b in the linear equation above), and the optimizer uses those gradients to adjust the weights. We use the Adam optimizer, a variant of stochastic gradient descent. The data loader does what it sounds like: it feeds chunks of data to the net. We will use a loader that shuffles the data, splits it into batches, and runs a number of worker processes to load them.

    # construct the model
    net = Net(layer1_out = layer1_out, layer2_out = layer2_out)

    optimizer = torch.optim.Adam(net.parameters(), lr = lr)
    loss_func = torch.nn.MSELoss()  # mean squared loss

    # set up data for loading
    # (Data is presumably torch.utils.data, e.g. import torch.utils.data as Data)
    dataset = Data.TensorDataset(x, y)
    loader = Data.DataLoader(
        dataset = dataset, 
        batch_size = batch_size, 
        shuffle = True, 
        num_workers = workers,)
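
As a rough illustration of what the loss function and the loader are doing, here is a plain-Python sketch of a mean squared error and of shuffled mini-batches. It is not how PyTorch implements them. The helper names and the toy data are invented for this example, and worker processes are left out entirely.

```python
import random


def mse(predictions, targets):
    """Mean of the squared differences, which is what MSELoss reports by default."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def shuffled_batches(xs, ys, batch_size, rng):
    """Yield (x, y) mini-batches in a fresh random order, similar to shuffle=True."""
    order = list(range(len(xs)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [xs[i] for i in idx], [ys[i] for i in idx]


# Toy data, invented for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

# Loss of the trivial prediction y = x on the toy data.
print(mse(xs, ys))

# One pass (epoch) over the data in shuffled batches of two.
for batch_x, batch_y in shuffled_batches(xs, ys, batch_size=2, rng=random.Random(0)):
    print(batch_x, batch_y)
```

Note that the last batch is smaller when the number of samples is not a multiple of the batch size, which is also the default behavior of DataLoader.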

Once the components are set up, we loop over the batches: run the net, calculate the loss, backpropagate, and let the optimizer update the weights. We do this many times and, in our case, keep the model that produced the minimum batch loss. We could also just accept the last model from the loop.

    # start training
    best_model = None
    min_loss = float('inf')
    for epoch in range(epochs):
        for step, (batch_x, batch_y) in enumerate(loader):

            prediction = net(batch_x)

            loss = loss_func(prediction, batch_y)
            loss_value = loss.item()   # batch loss as a Python float
            print('epoch:', epoch, 'step:', step, 'loss:', loss_value)
            if loss_value < min_loss:
                # net has not been updated yet, so this copy is the model that produced loss_value
                min_loss = loss_value
                best_model = copy.deepcopy(net)

            optimizer.zero_grad()   # clear gradients left over from the previous step
            loss.backward()         # backpropagation, compute gradients
            optimizer.step()        # apply gradients
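
The loop above keeps the model with the lowest loss on a single batch, which can reward a lucky batch rather than a good overall fit. One possible alternative, sketched below in plain Python, is to compare the mean loss over each epoch instead. BestEpochTracker is a hypothetical helper written for this example, not part of PyTorch or of the original program, and the loss values are made up.

```python
import math


class BestEpochTracker:
    """Track the epoch with the lowest sample-weighted mean loss."""

    def __init__(self):
        self.best_loss = math.inf
        self.best_epoch = None
        self._total = 0.0
        self._samples = 0

    def add_batch(self, batch_loss, batch_size):
        # batch_loss is a mean over the batch, so weight it by the batch size.
        self._total += batch_loss * batch_size
        self._samples += batch_size

    def end_epoch(self, epoch):
        """Close the epoch (at least one batch required); return True if it is the best so far."""
        epoch_loss = self._total / self._samples
        self._total, self._samples = 0.0, 0
        if epoch_loss < self.best_loss:
            self.best_loss, self.best_epoch = epoch_loss, epoch
            return True
        return False


# Made-up per-batch losses for three epochs, each with batches of 20 and 12 samples.
fake_losses = [[0.50, 0.40], [0.02, 0.60], [0.20, 0.10]]
tracker = BestEpochTracker()
for epoch, losses in enumerate(fake_losses):
    for loss_value, size in zip(losses, [20, 12]):
        tracker.add_batch(loss_value, size)
    improved = tracker.end_epoch(epoch)  # in training: copy the net when this is True
print(tracker.best_epoch, tracker.best_loss)
```

In this toy run the single lowest batch loss (0.02) occurs in epoch 1, but epoch 2 has the lowest mean loss, so the two selection rules pick different models. The epoch mean is still a training loss measured while the weights are changing, so it is only a rough guide.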

The other parts of the program simply read the data and plot the output.

And the results...

After running for 2000 epochs, the model with the lowest loss produced this fit.



The fit is somewhat choppy. I think this is to be expected because it's essentially a piecewise linear model. It follows the trends in the data reasonably well. A more sophisticated model might do better.

These are the settings for the model:

input_file: /mnt/d/Documents/analytic_garden/Torch/data/test2.csv
output_path: /mnt/d/Documents/analytic_garden/Torch/output
batch_size: 20
epochs: 2000
learning_rate: 0.01
layer1_out: 200
layer2_out: 100
workers: 10
rand_seed: 1645299534
R^2: 0.944360090530363
Best model loss: 0.007875178

Computationally, achieving this fit is expensive. We're running 10 worker processes and using a relatively large number of units in the layers.
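
To put a number on the size of the model, the sketch below counts the weights and biases of a stack of fully connected layers from the layer sizes alone. The function name is my own; the sizes 1, 200, 100, 1 are the ones used in this post.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected (linear) layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 1 input -> 200 units -> 100 units -> 1 output
print(count_parameters([1, 200, 100, 1]))  # 20601
```

With a PyTorch model in hand, sum(p.numel() for p in net.parameters()) should report the same count.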

An R-squared of 0.94 is a reasonable fit. However, if we look at the residuals, they are far from normally distributed. If the residuals were normally distributed, the points in the plot would lie along the line.
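
For reference, here is a small standard-library sketch of both diagnostics: one common definition of R-squared, and the points of a normal quantile plot of the residuals. The post does not show how its R^2 was computed, so treat this as an assumption about the method. The function names and the toy values are invented for illustration.

```python
from statistics import NormalDist, mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - y_bar) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def normal_qq_points(residuals):
    """Pair sorted residuals with standard normal quantiles.

    Uses the plotting positions (i + 0.5) / n; other conventions exist.
    """
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(residuals)))


# Toy values, invented for illustration only.
observed = [0.1, 0.9, 2.2, 2.8, 4.1]
predicted = [0.0, 1.0, 2.0, 3.0, 4.0]
residuals = [o - p for o, p in zip(observed, predicted)]

print(r_squared(observed, predicted))
for q, r in normal_qq_points(residuals):
    print(round(q, 3), round(r, 3))
```

Plotting the second column against the first gives the quantile plot. Roughly normal residuals fall close to a straight line, while systematic curvature or heavy tails show up as departures from it.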


The code can be found on GitHub.
