TimeGPT-1

I didn't think this would work. I was right. TimeGPT-1 from Nixtla is a pretrained transformer model for time series forecasting and anomaly detection. It was trained on thousands of time series. The idea is that, just like text, time series have recognizable patterns, and a GPT can leverage these patterns to predict the next number in the series just as an LLM predicts the next token in a sequence of tokens.

I was interested in seeing how it performs on recent temperature anomaly data. Temperature anomaly is the difference of a temperature from a reference value. Here's an example.

The monthly temperature anomaly data for the plot was downloaded from https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/download.html. The plot shows the temperature anomaly for June of each year. The values are temperature anomalies (deg C) relative to the average global temperature from 1961-1990.

Notice the jump in the temperature anomaly at the last two points (2023 and 2024). 
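To make the definition of an anomaly concrete, here is a minimal sketch of the arithmetic in R. The temperatures are made-up toy values, not HadCRUT data, and the real dataset is built from gridded observations rather than a single global series, so this only illustrates the subtraction against a 1961-1990 reference value.

```r
# Toy illustration: made-up June temperatures (deg C), NOT HadCRUT values.
temps <- data.frame(
  year   = c(1961, 1975, 1990, 2020),
  temp_c = c(14.0, 14.1, 14.2, 14.9)
)

# Reference value: mean over the baseline period (1961-1990).
in_baseline <- temps$year >= 1961 & temps$year <= 1990
baseline <- mean(temps$temp_c[in_baseline])

# Anomaly: each temperature minus the reference value.
temps$anomaly <- temps$temp_c - baseline
print(temps)
```

A positive anomaly means the value is warmer than the reference period, and a negative one means it is cooler.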

The reason I thought that TimeGPT-1 would not work well on this data is that global temperature is driven by complex forcing functions. This isn't a flaw in TimeGPT. Global temperature can't easily be analyzed using only the statistical properties of the series; accurate predictions require a fairly complex physical model.

Forecasting

Nixtla provides an API for accessing TimeGPT-1. The first step is to get an API key from https://dashboard.nixtla.io/sign_in. If the API key is stored in the OS environment, TimeGPT will read it; otherwise, you can pass it to Nixtla via the API.
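The environment lookup can be sketched with base R alone. The variable name NIXTLA_API_KEY and the helper get_api_key() are assumptions for illustration; check the documentation for your installed client version for the exact name it reads.

```r
# Assumption: the key is stored in an environment variable named
# NIXTLA_API_KEY (for example, via a line in ~/.Renviron).
# get_api_key() is a hypothetical helper, not part of any Nixtla package.
get_api_key <- function(var = "NIXTLA_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in ", var, "; set it or pass the key explicitly.")
  }
  key
}
```

Failing early with a clear message is easier to debug than an authentication error returned by the first forecast call. Recent versions of the R client appear to document a nixtla_set_api_key() function for passing the key explicitly, but treat that name as something to verify rather than a given.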

Most of the examples on the Nixtla site are in Python, but there is also an API for R. I decided to try the R version because I haven't written any R code in a while.

TimeGPT-1 expects at least three columns in the data frame passed to the API calls: ds for time values, y for the time series values, and unique_id to indicate the data group. In our case, we will use a single identifier for the whole set. If your columns have different names, you can specify them in the API calls instead of renaming them.
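As a rough illustration of that layout, here is a tiny data frame with the three columns and a sanity check. The helper check_timegpt_input() is hypothetical (it is not part of nixtlar), and the y values are placeholders.

```r
# Hypothetical helper (not part of nixtlar): verify the expected columns.
check_timegpt_input <- function(df) {
  required <- c("unique_id", "ds", "y")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df$y)) {
    stop("Column y must be numeric.")
  }
  invisible(TRUE)
}

# Two rows in the expected shape; the y values are placeholders.
toy <- data.frame(
  unique_id = "1",
  ds = as.POSIXct(c("2020-06-15 12:00:00", "2021-06-15 12:00:00"), tz = "UTC"),
  y = c(0.1, 0.2)
)
check_timegpt_input(toy)
```

Running a check like this before calling the API catches a misnamed column locally, without spending an API call.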

First, we read the HadCRUT data into a data frame.

> library(tidyverse)
> df_temp_anomaly <- read_csv("data/HadCRUT.5.0.2.0.analysis.summary_series.global.monthly.csv", name_repair = "universal")
> head(df_temp_anomaly)
# A tibble: 6 × 4
  Time    Anomaly..deg.C. Lower.confidence.limit..2.5.. Upper.confidence.limit..97.5..
  <chr>             <dbl>                         <dbl>                          <dbl>
1 1850-01          -0.675                        -0.982                        -0.367 
2 1850-02          -0.333                        -0.701                         0.0341
3 1850-03          -0.591                        -0.934                        -0.249 
4 1850-04          -0.589                        -0.898                        -0.279 
5 1850-05          -0.509                        -0.762                        -0.256 
6 1850-06          -0.344                        -0.609                        -0.0790
> 

Next, we will do some reformatting to get the data into a format compatible with TimeGPT-1.

> df3 <- df_temp_anomaly %>% 
      select(c(Time, Anomaly..deg.C.)) %>% 
      filter(grepl("-06", Time)) %>% 
      add_column(unique_id = "1") %>% 
      rename(ds = Time) %>% 
      mutate(ds = paste(ds, "-15 12:00:00", sep = '')) %>% 
      rename(y = Anomaly..deg.C.) %>% 
      mutate(ds = as.POSIXct(ds))
> head(df3)
# A tibble: 6 × 3
  ds                        y unique_id
  <dttm>                <dbl> <chr>    
1 1850-06-15 12:00:00 -0.344  1        
2 1851-06-15 12:00:00 -0.137  1        
3 1852-06-15 12:00:00 -0.0837 1        
4 1853-06-15 12:00:00 -0.142  1        
5 1854-06-15 12:00:00 -0.299  1        
6 1855-06-15 12:00:00 -0.333  1        
> 

Finally, we call the API functions to forecast the temperature anomaly for 2023 and 2024. The training data, df3[1:173, ], covers June 1850 through June 2022, and h is the number of steps to forecast.
  
> library(nixtlar)
> nixtla_client_fcst <- nixtla_client_forecast(df3[1:173, ], h = 2, level = c(80,95), freq ="Y")
> nixtla_client_plot(df3, nixtla_client_fcst, max_insample_length = 200)
> 

nixtla_client_fcst contains the predictions.

> head(nixtla_client_fcst)
  unique_id                  ds   TimeGPT TimeGPT-lo-95 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-95
1         1 2023-06-15 12:00:00 0.8353620     0.5658436     0.6342132      1.036511      1.104880
2         1 2024-06-15 12:00:00 0.8588773     0.5448156     0.5791281      1.138626      1.172939
> 

The plot reveals how far off the predictions were.


The jump during the 2023-2024 period couldn't be accurately predicted. That's not surprising. I don't think any algorithm that relies only on past data could forecast how rapidly the global temperature is increasing.
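To put a number on "how far off", the forecasts can be compared with the held-out observations. The sketch below assumes the forecast column names shown in the output above (TimeGPT, TimeGPT-lo-95, TimeGPT-hi-95); score_forecast() is a hypothetical helper, and the toy values are made up. It matches rows by calendar date, which is enough for an annual series like this one.

```r
# Hypothetical helper: compare forecasts with held-out observations.
# Rows are matched on the calendar date of `ds`, whether `ds` is
# POSIXct or a character timestamp.
date_key <- function(x) {
  if (inherits(x, "POSIXct")) format(x, "%Y-%m-%d") else substr(as.character(x), 1, 10)
}

score_forecast <- function(actual, fcst) {
  idx <- match(date_key(fcst$ds), date_key(actual$ds))
  y <- actual$y[idx]
  data.frame(
    ds        = date_key(fcst$ds),
    forecast  = fcst[["TimeGPT"]],
    observed  = y,
    error     = y - fcst[["TimeGPT"]],
    inside_95 = y >= fcst[["TimeGPT-lo-95"]] & y <= fcst[["TimeGPT-hi-95"]]
  )
}

# Toy data with made-up numbers, only to show the mechanics.
toy_actual <- data.frame(
  ds = as.POSIXct(c("2001-06-15 12:00:00", "2002-06-15 12:00:00"), tz = "UTC"),
  y  = c(1.0, 2.0)
)
toy_fcst <- data.frame(
  ds = c("2001-06-15 12:00:00", "2002-06-15 12:00:00"),
  TimeGPT = c(0.5, 2.1),
  `TimeGPT-lo-95` = c(0.0, 1.5),
  `TimeGPT-hi-95` = c(0.9, 2.5),
  check.names = FALSE
)
scores <- score_forecast(toy_actual, toy_fcst)
print(scores)
```

With the real objects, the call would be score_forecast(df3, nixtla_client_fcst), and mean(abs(scores$error)) gives the mean absolute error over the forecast horizon.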

We can also check how well the forecasts would have matched the past. Here are the results of training on data through 2000 and forecasting the following six years (2001-2006).
 
> nixtla_client_fcst <- nixtla_client_forecast(df3[1:151, ], h = 6, level = c(80,95), freq ="Y")
> nixtla_client_plot(df3, nixtla_client_fcst, max_insample_length = 200)


This is a bit better, but the forecast still can't keep up with the global climate forcing.
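Both forecast calls select the training rows by position (1:173 and 1:151), which is easy to get wrong by one. A possible alternative is to select by cutoff year. The helper train_through() below is hypothetical, and the toy data frame only stands in for df3, assuming one June row per year from 1850 through 2024.

```r
# Hypothetical helper: keep rows whose year is at or before `last_year`.
train_through <- function(df, last_year) {
  yr <- as.integer(format(df$ds, "%Y"))
  df[yr <= last_year, , drop = FALSE]
}

# Stand-in for df3: one June timestamp per year, placeholder y values.
toy <- data.frame(
  unique_id = "1",
  ds = as.POSIXct(paste0(1850:2024, "-06-15 12:00:00"), tz = "UTC"),
  y = 0
)

nrow(train_through(toy, 2022))  # 173 rows, same as df3[1:173, ]
nrow(train_through(toy, 2000))  # 151 rows, same as df3[1:151, ]
```

Selecting by year also keeps working if more years are appended to the series or rows are dropped from the start.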

Despite not doing well on the HadCRUT data, I think this approach has potential for series that aren't driven by complex dynamic forces.

Privacy

When you use the TimeGPT-1 API, your data is sent to Nixtla. If you are concerned about third parties using your proprietary data, read the Nixtla privacy notice carefully. In particular,

"As noted above, we may use Content you provide us to improve our Services, for example to train the models that power TimeGPT. Fill this form to opt out of our use of your Content to train our models."

Pricing

Nixtla has an interesting pricing structure for TimeGPT. There is an initial 30-day free trial that allows a generous 10,000 API calls. After the trial period, users on the free tier are allowed 50 API calls. Otherwise, the price is $99 per 10,000 API calls. There are increased rates for more API calls per month.
