The Analytic Garden

Bayesian Optimization

Suppose you have a box that generates numbers. Inside the box is some mechanism that produces a number whenever you ask for one: you give the box an X value and it spits out a y. You can't open the box, and you don't have a description of the method it uses. You would like to model what is going on inside the box. In particular, you would like to find the optimum (maximum or minimum) of its output and get an idea of the internal generating function.

If generating a number from the box is cheap, finding the optimum and an approximation to the function isn't hard. You could generate 10,000 or a million X values in some range and plot the y values. Pick the maximum as the optimum, and the plot gives you a picture of the generating function.

What if generating those y values is expensive? For example, instead of a box, we might have to go into a lab and run an experiment to get each value. We would like to find an approximate optimum and function plot at as small a cost as possible. This is where Bayesian optimization shines.

Bayesian optimization is a probabilistic global optimization strategy. It doesn't assume convexity or any particular functional form. It can be used to optimize an unknown (or at least minimally known) function with multiple local optima. It is widely used in machine learning to tune model hyperparameters.

Bayesian optimization requires two components: a surrogate model and an acquisition function. The surrogate is an approximate model of the black box function that is cheap to evaluate. The acquisition function uses the surrogate to select the next point to sample.

To make this more concrete, suppose we sample a small number of X values at random from a uniform distribution and get the corresponding y values from the box. Our goal is to find an approximate probability distribution over the black box function f.

In other words, we want

\[P(f|D) = \frac{{P(D|f)P(f)}}{{P(D)}}\]

D is the data we have seen so far.

Since the normalizing constant $P(D)$ does not depend on f, we can drop it.

\[P(f|D) \propto P(D|f)P(f)\]

$P(f|D)$ is our current estimate of the distribution of f given the data observed so far.

Objective Function

There are a number of articles and packages for Bayesian optimization in R. For example, here, here, and here. The code that follows was inspired by this excellent blog post, which is written in Python. What follows recasts its approach in R.

Here is our black box function. noise_level is the standard deviation of the added Gaussian noise. We will restrict x to the interval [0, 1].

# Black box function with noise
#' objective - generate data for optimization
#'
#' @param x - an n x 1 matrix (or vector) of floats
#' @param noise_level - sd of Gaussian noise 
#'
#' @returns a numeric vector of noisy function values
objective <- function(x, noise_level = 0.1) {
  x <- as.numeric(x)
  noise <- rnorm(length(x), mean = 0, sd = noise_level)
  return(sin(12 * pi * x) / (1 + x) + 2 * cos(7 * pi * x) * x^5 + noise)
}
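For readers following along in Python, here is a minimal sketch of the same objective. The function name and the `rng` parameter are assumptions for illustration; only the formula and the default noise level come from the R code above.

```python
import math
import random


def objective(x, noise_level=0.1, rng=random):
    """Noisy test function: sin(12*pi*x)/(1+x) + 2*cos(7*pi*x)*x**5 + N(0, noise_level).

    Accepts a single number or an iterable of numbers and returns a float
    or a list of floats, respectively.
    """
    def f(v):
        # Skip the random draw entirely when no noise is requested.
        noise = rng.gauss(0.0, noise_level) if noise_level > 0 else 0.0
        return (math.sin(12 * math.pi * v) / (1 + v)
                + 2 * math.cos(7 * math.pi * v) * v ** 5
                + noise)

    if isinstance(x, (int, float)):
        return f(float(x))
    return [f(float(v)) for v in x]
```

Note that [0, 1] is the interval of x values explored; the outputs themselves fall outside it, e.g. the noise-free value at x = 1 is -2.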

We can plot the objective function along with some mildly noisy data.

> library(ggplot2)
> X <- seq(0, 1, length.out = 100)
> y <- objective(X)
> y_exact <- objective(X, noise_level = 0)
> df_opt <- data.frame(x = X, y = y, y_exact = y_exact)
> ggplot(df_opt) + geom_line(aes(x = x, y = y_exact)) + geom_point(aes(x = x, y = y)) + ggtitle('Objective Function')
>

We see that there are multiple local optima, with the global maximum somewhere near x = 0.9.
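Because this particular objective is cheap to evaluate, the brute-force approach from the start of the post works as a check: evaluate a dense grid over [0, 1] and keep the best point. This is a minimal sketch using the noise-free formula; the grid size of 100,001 points is an arbitrary choice.

```python
import math


def f(x):
    """Noise-free version of the objective."""
    return math.sin(12 * math.pi * x) / (1 + x) + 2 * math.cos(7 * math.pi * x) * x ** 5


# Brute force: evaluate every grid point and keep the index of the largest value.
n = 100_001
grid = [i / (n - 1) for i in range(n)]
values = [f(x) for x in grid]
best_i = max(range(n), key=values.__getitem__)
x_best, y_best = grid[best_i], values[best_i]
print(f"x_best={x_best:.3f}, f(x_best)={y_best:.3f}")
```

This costs 100,001 evaluations; the point of Bayesian optimization is to get close with a small fraction of that when each evaluation is expensive.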

Surrogate Function

The surrogate is an approximation to the objective function; it summarizes $P(f|D)$. The two most common choices for the surrogate are a Gaussian process and a random forest. For our example we will use a Gaussian process from the R package GPfit.

Given a fitted Gaussian process model and some inputs X, the surrogate is simple: it uses the model to predict the mean and standard deviation of y at those inputs.
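To show what a GP prediction computes, here is a minimal numpy sketch of a zero-mean Gaussian process posterior. It is not GPfit: it assumes a squared-exponential kernel with fixed, hand-picked hyperparameters (`length_scale`, `variance`, `noise` are illustrative), whereas GPfit estimates its correlation parameters by maximum likelihood.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.1, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_surrogate(X_train, y_train, X_new, length_scale=0.1, variance=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at X_new."""
    X_train = np.asarray(X_train, dtype=float).ravel()
    y_train = np.asarray(y_train, dtype=float).ravel()
    X_new = np.asarray(X_new, dtype=float).ravel()

    # Training covariance plus a small diagonal term for numerical stability / noise.
    K = rbf_kernel(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_new, length_scale, variance)   # (n_train, n_new)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)                                # L^{-1} K_s
    var = variance - np.sum(v ** 2, axis=0)                    # diagonal of posterior covariance
    return mean, np.sqrt(np.clip(var, 0.0, None))
```

Near the observed points the standard deviation shrinks toward zero; far from them the mean reverts to the prior mean (0) and the standard deviation to the prior value. That uncertainty is what the acquisition function exploits.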

# Surrogate prediction function
#' surrogate - approximate the objective, P(f | data)
#'
#' @param model - a GP model
#' @param X - an n x 1 matrix
#'
#' @returns a list of the predicted y values and standard deviation
surrogate <- function(model, X) {
  pred <- predict(model, X)   # predict from GPfit
  return(list(mean=pred$Y_hat, sd=sqrt(pred$MSE)))
}

The Acquisition Function

The acquisition function uses the surrogate to score candidate X samples and pick the most promising one to evaluate next. This can be done in a number of ways; for example, we could simply pick a random sample. In practice there are three common methods: probability of improvement (PI), expected improvement (EI), and upper confidence bound (UCB). PI is generally not as efficient as the other two, but it's easy to calculate. We will use PI.

\[PI = cdf\left(\frac{\mu - \mu_{best}}{\sigma}\right)\]

${cdf}$ is the standard normal cumulative distribution function, $\mu$ is the mean of the surrogate at the candidate sample, $\sigma$ is the surrogate's standard deviation there, and ${\mu_{best}}$ is the largest surrogate mean among the samples evaluated so far.
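The PI formula is short enough to write out directly. This is a minimal Python sketch, with the normal CDF built from `math.erf` and the same `1e-9` guard against division by zero used in the R code; the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, mu_best, eps=1e-9):
    """PI = cdf((mu - mu_best) / sigma) for one candidate (maximization)."""
    return normal_cdf((mu - mu_best) / (sigma + eps))


def pick_next(candidates, means, sds, mu_best):
    """Return the candidate with the highest probability of improvement, and its PI."""
    scores = [probability_of_improvement(m, s, mu_best) for m, s in zip(means, sds)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


x, pi = pick_next([0.2, 0.5, 0.8], means=[0.9, 1.0, 0.95], sds=[0.05, 0.01, 0.2], mu_best=1.0)
```

In this example the candidate at 0.5, whose mean ties the incumbent, wins with PI = 0.5 even though 0.8 is far more uncertain. A mean equal to $\mu_{best}$ always gives PI = 0.5 regardless of $\sigma$, which is one reason PI tends to exploit rather than explore.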

# Acquisition function: Probability of Improvement
#' acquisition - choose the next sample to evaluate given the current data.
#' Draws 100 random candidates in [0, 1] and returns the one with the
#' highest probability of improvement (PI) over the best surrogate mean so far.
#'
#' @param X - an n x 1 matrix of the points sampled so far
#' @param model - the current GP model
#'
#' @returns the candidate x value with the largest PI
acquisition <- function(X, model) {
  Xsamples <- matrix(runif(100), ncol=1)
  
  yhat <- surrogate(model, X)$mean
  best <- max(yhat)  # best so far
  
  pred <- surrogate(model, Xsamples) # get the mean y and sd of the predictions
  mu <- pred$mean
  std <- pred$sd
  probs <- pnorm((mu - best) / (std + 1e-9))  # 1e-9 to avoid divide by 0
  
  ix <- which.max(probs)
  return(Xsamples[ix, 1])
}

Putting It All Together

To perform the optimization, we generate a small set of samples at random. We then loop for a fixed number of iterations, using the acquisition function to choose a new point, adding it to the data, and updating the model. At each step we use the surrogate to estimate the function mean at the new point.

In addition, we evaluate the noise-free objective for comparison and periodically plot the fit so far.

Here's the whole thing.

# Load libraries
library(GPfit)
library(ggplot2)

# Black box function with noise
#' objective - generate data for optimization
#'
#' @param x - an n x 1 matrix (or vector) of floats
#' @param noise_level - sd of Gaussian noise 
#'
#' @returns a numeric vector of noisy function values
objective <- function(x, noise_level = 0.1) {
  x <- as.numeric(x)
  noise <- rnorm(length(x), mean = 0, sd = noise_level)
  return(sin(12 * pi * x) / (1 + x) + 2 * cos(7 * pi * x) * x^5 + noise)
}

# Surrogate prediction function
#' surrogate - approximate the objective, P(f | data)
#'
#' @param model - a GP model
#' @param X - an n x 1 matrix
#'
#' @returns a list of the predicted y values and standard deviation
surrogate <- function(model, X) {
  pred <- predict(model, X)
  return(list(mean=pred$Y_hat, sd=sqrt(pred$MSE)))
}

# Acquisition function: Probability of Improvement
#' acquisition - choose the next sample to evaluate given the current data.
#' Draws 100 random candidates in [0, 1] and returns the one with the
#' highest probability of improvement (PI) over the best surrogate mean so far.
#'
#' @param X - an n x 1 matrix of the points sampled so far
#' @param model - the current GP model
#'
#' @returns the candidate x value with the largest PI
acquisition <- function(X, model) {
  Xsamples <- matrix(runif(100), ncol=1)
  
  yhat <- surrogate(model, X)$mean
  best <- max(yhat)  # best so far
  
  pred <- surrogate(model, Xsamples) # get the mean y and sd of the predictions
  mu <- pred$mean
  std <- pred$sd
  probs <- pnorm((mu - best) / (std + 1e-9))  # 1e-9 to avoid divide by 0
  
  ix <- which.max(probs)
  return(Xsamples[ix, 1])
}

# Plotting function
#' plot_model - plot the current model
#'  The objective is plotted without noise for comparison
#'
#' @param X - an n x 1 matrix of the points sampled so far
#' @param y - the observed y values so far
#' @param model - GP model
#' @param title - optional title for plot
#'
#' @returns a ggplot object for printing
plot_model <- function(X, y, model, title = NULL) {
  # get the current surrogate model
  Xsamples <- seq(0, 1, length.out=1000)
  Xsamples_matrix <- matrix(Xsamples, ncol=1)
  preds <- surrogate(model, Xsamples_matrix)$mean
  
  df <- data.frame(x=as.vector(X), y=as.vector(y)) # sampled points so far
  pred_df <- data.frame(x=Xsamples, y=preds)       # predicted model
  y_obj <- objective(Xsamples, noise_level = 0)    # objective function without noise
  df_obj <- data.frame(x = Xsamples, y = y_obj)
  
  p <- ggplot() +
        geom_point(data=df, aes(x=x, y=y, color='Samples')) +
        geom_line(data=pred_df, aes(x=x, y=y, color='Prediction')) +
        geom_line(data = df_obj, aes(x = x, y =y, color = 'Objective')) +
        scale_color_manual(name = 'Bayesian Optimization',
                           breaks = c('Samples', 'Prediction', 'Objective'),
                           values = c('Samples' = 'blue', 'Prediction' = 'red', 'Objective' = 'green'))
  if(! is.null(title)) {
    p <- p + ggtitle(title)
  }
  
  return(p)
}

# Initial data
# set.seed(42)
X <- matrix(runif(5), ncol=1)
# y <- apply(X, 1, objective)
y <- objective(X)

# Fit initial GP model
model <- GP_fit(X, y)

# Initial plot
print(plot_model(X, y, model))

# Optimization loop
# plot the model every 10 iterations
for (i in 1:100) {
  x_next <- acquisition(X, model)
  y_next <- objective(x_next)
  X <- rbind(X, x_next)
  y <- c(y, y_next)
  model <- GP_fit(X, y)
  est <- surrogate(model, matrix(x_next, ncol=1))$mean
  cat(sprintf(">x=%.3f, f()=%.6f, actual=%.3f\n", x_next, est, y_next))
  
  if(i %% 10 == 0) {
    p <- plot_model(X, y, model, title = paste('Iteration:', i))
    # png(paste('/mnt/c/Users/bbth/OneDrive/analytic_garden/Bayes_opt/plots/plot_', i, '.png'))
    print(p)
    # dev.off()
  }
}

# Final plot
print(plot_model(X, y, model, title ='Final'))

# Best result
ix <- which.max(y)
cat(sprintf("Best Result: x=%.3f, y=%.3f\n", X[ix], y[ix]))

Here are the plots. As the acquisition function adds more points, the surrogate quickly fits the function.

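Here's the final plot. The predictions fit the function well, and there are a large number of samples around the optimal peak. The best sample found is x=0.864, y=1.654. Because y is a noisy observation, the best observed value can exceed the noise-free value of the function at that point.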
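For comparison, here is a compact Python sketch of the same loop. It is not a translation of GPfit: it swaps in a zero-mean Gaussian process with a fixed squared-exponential kernel (the length scale 0.05 and noise variance 0.01 are assumptions, not fitted values), then follows the R acquisition() in drawing 100 random candidates per iteration and picking the one with the highest PI.

```python
import math

import numpy as np

rng = np.random.default_rng(42)


def objective(x, noise_level=0.1):
    x = np.asarray(x, dtype=float)
    noise = rng.normal(0.0, noise_level, size=x.shape)
    return np.sin(12 * np.pi * x) / (1 + x) + 2 * np.cos(7 * np.pi * x) * x ** 5 + noise


def kernel(a, b, length_scale=0.05):
    # Squared-exponential kernel with unit prior variance (assumed, not fitted).
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)


def surrogate(X, y, X_new, noise_var=0.01):
    """GP posterior mean and sd at X_new given observations (X, y)."""
    K = kernel(X, X) + noise_var * np.eye(len(X))
    K_s = kernel(X, X_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    sd = np.sqrt(np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None))
    return mean, sd


def normal_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def acquisition(X, y, n_candidates=100):
    best = surrogate(X, y, X)[0].max()            # best surrogate mean so far
    Xs = rng.uniform(0.0, 1.0, n_candidates)      # random candidate points
    mu, sd = surrogate(X, y, Xs)
    probs = normal_cdf((mu - best) / (sd + 1e-9))  # probability of improvement
    return Xs[np.argmax(probs)]


# Initial random design, then 100 rounds of: choose, evaluate, append.
X = rng.uniform(0.0, 1.0, 5)
y = objective(X)
for i in range(100):
    x_next = acquisition(X, y)
    y_next = objective(x_next)
    X = np.append(X, x_next)
    y = np.append(y, y_next)

ix = int(np.argmax(y))
print(f"Best result: x={X[ix]:.3f}, y={y[ix]:.3f}")
```

Because the kernel hyperparameters are fixed and PI leans toward exploitation, a given run may settle on a local peak rather than the one near x = 0.87. Estimating the hyperparameters, as GP_fit does, or switching to EI or UCB are common adjustments.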

H5N1 Update

By Ahmed Mostafa, Elsayed M. Abdelwhab, Thomas C. Mettenleiter, and Stephan Pleschka - mdpi.com/1999-4915/10/9/497/htm, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=92987475

H5N1 (bird flu) continues to infect wild and domestic birds, cattle, cats, and humans. So far, luck has been with us, and H5N1 hasn't become a serious threat to humans. That is, unless you count the price of eggs and the contribution that issue made to the current chaos and incompetence in Washington, DC.

On Feb. 28, 2025, I downloaded 6,623 H5N1 HA sequences in FASTA format from GISAID to look at the current state of mutations in the virus. The analysis below is similar to earlier posts here and here.

I read the sequences into a data frame, with the FASTA header fields becoming its columns.

> df_2025_02_28 <- fasta2dataframe("data/gisaid_epiflu_sequence_HA_2025_02_28.fasta")

The fasta2dataframe function is described in this post.

fasta2dataframe filters out all sequences shorter than 1700 bases. Next, I wrote the filtered sequences to a new FASTA file.

> library(LocaTT)
> write.fasta(names = df_2025_02_28$seq.name, sequences = df_2025_02_28$seq.text, file = "data/gisaid_epiflu_sequence_HA_2025_02_28_len.fasta")
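fasta2dataframe is the author's own R function, and its header-to-column parsing is not reproduced here. As a rough Python sketch of just the length filter and write-back step, assuming plain FASTA input and the 1700-base threshold from the text (the helper names are hypothetical):

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=1700):
    """Keep sequences with at least min_len characters; shorter ones are dropped."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def to_fasta(records, width=70):
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    out = []
    for h, s in records:
        out.append(">" + h)
        out.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(out) + "\n"
```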

The sequences were aligned with MAFFT.

time mafft --6merpair --maxambiguous 0.05 --preservecase --thread -1 --addfragments data/gisaid_epiflu_sequence_HA_2025_02_28_len.fasta data/HA_reference.fasta > data/gisaid_epiflu_sequence_HA_2025_02_28_aln.fasta
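The ref_freq and alt_freq columns in the mutation table are per-position base fractions across the aligned sequences. The sketch below is a hypothetical illustration of how such fractions can be tallied from one alignment column; it is not the code in gisaid_H5N1_mutations.py. It assumes gaps and ambiguity codes count toward the total but are not reported as bases, which is one way ref and alt fractions can fail to sum to 1. Mapping alignment columns back to reference positions is not shown.

```python
from collections import Counter


def column_frequencies(aligned_seqs, col):
    """Fraction of sequences carrying each of A/C/G/T at a 0-based alignment column.

    Gaps ('-') and ambiguity codes (e.g. 'N') are included in the denominator
    but not reported, so the returned fractions need not sum to 1.
    """
    bases = Counter(seq[col].upper() for seq in aligned_seqs)
    n = len(aligned_seqs)
    return {b: bases[b] / n for b in "ACGT" if bases[b]}


def ref_alt(aligned_seqs, col, ref_base):
    """Return (ref_freq, most common alternative base, its frequency)."""
    freqs = column_frequencies(aligned_seqs, col)
    ref_freq = freqs.get(ref_base, 0.0)
    alts = {b: f for b, f in freqs.items() if b != ref_base}
    if not alts:
        return ref_freq, None, 0.0
    alt = max(alts, key=alts.get)
    return ref_freq, alt, alts[alt]
```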

I made some modifications to simplify the Python program gisaid_H5N1_mutations.py. The updated program is available on GitHub. The first 10 rows of the output CSV file are shown below.

reference_position	alignment_pos	ref_freq	alt_freq	ref_nucleotide	codon_pos	ref_aa	aa_name	ref_codon	alt_nucleotide	alt_aa	alt_aa_name	alt_codon	mutation	synonomous
4	37	0.8040201005025126	0.1564070351758794	G	4	E	Glu	GAG	A	K	Lys	AAG	E4K	non_syn
6	39	0.7986809045226131	0.1727386934673367	G	4	E	Glu	GAG	A	E	Glu	GAA	E4E	syn
18	51	0.6105527638190955	0.37170226130653267	A	16	L	Leu	CTA	T	L	Leu	CTT	L16L	syn
31	64	0.7842336683417085	0.198178392	G	31	V	Val	GTT	A	I	Ile	ATT	V31I	non_syn
93	126	0.6526381909547738	0.3473618090452261	A	91	Q	Gln	CAA	G	Q	Gln	CAG	Q91Q	syn
123	156	0.7956972361809045	0.20100502512562815	T	121	T	Thr	ACT	C	T	Thr	ACC	T121T	syn
154	187	0.8285175879396985	0.17132537688442212	A	154	T	Thr	ACA	G	A	Ala	GCA	T154A	non_syn
171	206	0.5855841708542714	0.39777010050251255	A	169	L	Leu	CTA	C	L	Leu	CTC	L169L	syn
174	210	0.5860552763819096	0.41363065326633164	C	172	C	Cys	TGC	T	C	Cys	TGT	C172C	syn
177	213	0.5910804020100503	0.40891959798994976	C	175	D	Asp	GAC	T	D	Asp	GAT	D175D	syn
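The ref_codon, alt_codon, and synonomous columns follow from translating the codon before and after a single-base change. Here is a minimal sketch of that classification using the standard genetic code. It is an illustration, not the author's program, and `classify_snv` is a hypothetical name.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(ref_codon, codon_offset, alt_base):
    """Apply a single-base change at codon_offset (0, 1, or 2) and classify it.

    Returns (alt_codon, ref_aa, alt_aa, 'syn' or 'non_syn').
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:codon_offset] + alt_base.upper() + ref_codon[codon_offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return alt_codon, ref_aa, alt_aa, "syn" if ref_aa == alt_aa else "non_syn"
```

For example, `classify_snv("GAG", 0, "A")` reproduces the first row of the table (GAG to AAG, Glu to Lys, non-synonymous, reported as E4K), and a change at the third base of the same codon gives the synonymous E4E row.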

Luckily, again there are no newly reported mutations that seem likely to increase the ability of the virus to invade human cells.

The code can be found on GitHub.

TimeGPT-1

I didn't think this would work, and I was right. TimeGPT-1 from Nixtla is a pretrained transformer model for time series forecasting and anomaly detection. It was trained on thousands of time series. The idea is that, just like text, time series have recognizable patterns, and a GPT can leverage these patterns to predict the next number in a series just as an LLM predicts the next token in a sequence of tokens.

I was interested in seeing how it performs on recent temperature anomaly data. A temperature anomaly is the difference between a temperature and a reference value. Here's an example.

The monthly temperature anomaly data for the plot was downloaded from https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/download.html. The plot shows the temperature anomaly for June of each year. The data represents temperature anomalies (deg C) relative to the average global temperature from 1961-1990.

Notice the jump in the temperature anomaly at the last two points (2023 and 2024).
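As a toy illustration of the anomaly definition, the sketch below subtracts the 1961-1990 mean from a single series of absolute temperatures. The input values are hypothetical, and the real HadCRUT5 product builds its anomalies from gridded measurements rather than from one global series like this.

```python
def anomalies(values_by_year, base_start=1961, base_end=1990):
    """Subtract the mean over the baseline years [base_start, base_end] from every value."""
    base = [v for yr, v in values_by_year.items() if base_start <= yr <= base_end]
    if not base:
        raise ValueError("no data in the baseline period")
    ref = sum(base) / len(base)
    return {yr: v - ref for yr, v in values_by_year.items()}


# Hypothetical absolute temperatures (deg C): a flat baseline and one warm year.
temps = {yr: 14.0 for yr in range(1961, 1991)}
temps[2024] = 15.2
print(anomalies(temps)[2024])
```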

I thought TimeGPT-1 would not work well on this data because global temperature is driven by complex forcing functions. This isn't a flaw in TimeGPT: global temperature can't easily be analyzed using only the statistical properties of the series. Accurate predictions require a fairly complex physical model.

Forecasting

The nixtlar package provides an R interface to the TimeGPT-1 API. The first step is to get an API key from https://dashboard.nixtla.io/sign_in. If the API key is stored in an environment variable, nixtlar will read it; otherwise you can pass it to the API directly.

Most of the examples on the Nixtla site are in Python, but there is also an API for R. I decided to try the R version because I haven't written any R code in a while.

TimeGPT-1 expects at least three columns in the data frame passed to the API calls: ds for time values, y for the time series values, and unique_id to identify each series. In our case, we use a single identifier for the whole set. If your columns have different names, you can map them through arguments to the API calls.

First, we read the HadCRUT data into a data frame.

> library(tidyverse)

> df_temp_anomaly <- read_csv("data/HadCRUT.5.0.2.0.analysis.summary_series.global.monthly.csv", name_repair = "universal")
> head(df_temp_anomaly)
# A tibble: 6 × 4
  Time    Anomaly..deg.C. Lower.confidence.limit..2.5.. Upper.confidence.limit..97.5..
  <chr>             <dbl>                         <dbl>                          <dbl>
1 1850-01          -0.675                        -0.982                        -0.367 
2 1850-02          -0.333                        -0.701                         0.0341
3 1850-03          -0.591                        -0.934                        -0.249 
4 1850-04          -0.589                        -0.898                        -0.279 
5 1850-05          -0.509                        -0.762                        -0.256 
6 1850-06          -0.344                        -0.609                        -0.0790
>

Next, we will do some reformatting to get the data into a format compatible with TimeGPT-1.

> df3 <- df_temp_anomaly %>% 
      select(c(Time, Anomaly..deg.C.)) %>% 
      filter(grepl("-06", Time)) %>% 
      add_column(unique_id = "1") %>% 
      rename(ds = Time) %>% 
      mutate(ds = paste(ds, "-15 12:00:00", sep = '')) %>% 
      rename(y = Anomaly..deg.C.) %>% 
      mutate(ds = as.POSIXct(ds))
> head(df3)
# A tibble: 6 × 3
  ds                        y unique_id
  <dttm>                <dbl> <chr>    
1 1850-06-15 12:00:00 -0.344  1        
2 1851-06-15 12:00:00 -0.137  1        
3 1852-06-15 12:00:00 -0.0837 1        
4 1853-06-15 12:00:00 -0.142  1        
5 1854-06-15 12:00:00 -0.299  1        
6 1855-06-15 12:00:00 -0.333  1        
>
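The same reshaping can be sketched in Python with only the standard library. This assumes the HadCRUT summary CSV layout shown above (a Time column such as "1850-06" followed by the anomaly), and `june_series` is a hypothetical helper. Like the R pipeline, it keeps the June rows and stamps each one at noon on the 15th.

```python
import csv
import io
from datetime import datetime


def june_series(csv_text, unique_id="1"):
    """Keep June rows and convert them to TimeGPT-style records (unique_id, ds, y)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        time, anomaly = row[0], float(row[1])
        if time.endswith("-06"):  # stricter than grepl("-06"): only the month field
            ds = datetime.strptime(time + "-15 12:00:00", "%Y-%m-%d %H:%M:%S")
            rows.append({"unique_id": unique_id, "ds": ds, "y": anomaly})
    return rows
```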

Finally, call the API functions to forecast the temperature anomaly for 2023 and 2024, using rows 1 to 173 (1850-2022) as input. h is the number of steps to forecast.

> library(nixtlar)
> nixtla_client_fcst <- nixtla_client_forecast(df3[1:173, ], h = 2, level = c(80,95), freq ="Y")
> nixtla_client_plot(df3, nixtla_client_fcst, max_insample_length = 200)
>

nixtla_client_fcst contains the predictions.

> head(nixtla_client_fcst)
  unique_id                  ds   TimeGPT TimeGPT-lo-95 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-95
1         1 2023-06-15 12:00:00 0.8353620     0.5658436     0.6342132      1.036511      1.104880
2         1 2024-06-15 12:00:00 0.8588773     0.5448156     0.5791281      1.138626      1.172939
>

The plot reveals how far off the predictions were.

The jump during the 2023-2024 period couldn't be accurately predicted. That's not surprising. I don't think any algorithm that relies only on past data could forecast how rapidly the global temperature is increasing.

We can also see how well the forecasts fit the past. Here are the results of using data through 2000 (rows 1 to 151) to forecast the following six years.

> nixtla_client_fcst <- nixtla_client_forecast(df3[1:151, ], h = 6, level = c(80,95), freq ="Y")
> nixtla_client_plot(df3, nixtla_client_fcst, max_insample_length = 200)

This is a bit better, but it still can't keep up with the global climate forcing.

Despite not doing well on the HadCRUT data, I think this approach has potential for series that aren't driven by complex dynamic forces.

Privacy

When you use the TimeGPT-1 API, your data is sent to Nixtla. If you are concerned about third parties using your proprietary data, read the Nixtla privacy notice carefully. In particular,

As noted above, we may use Content you provide us to improve our Services, for example to train the models that power TimeGPT. Fill this form to opt out of our use of your Content to train our models.

Pricing

Nixtla has an interesting pricing structure for TimeGPT. There is an initial 30-day free trial that allows a generous 10,000 API calls. After the trial period, users on the free tier are allowed 50 API calls. Otherwise, the price is $99 per 10,000 API calls, with increased rates for more API calls per month.

Test Driving DeepSeek-R1 Part 3

Generated by Janus Pro 7B https://www.yeschat.ai/features/janus-pro

I have trouble with PyMC, a probabilistic programming library for building Bayesian models in Python. Whenever I try to use it, I end up searching the web or asking an LLM for help with the current syntax. There seem to be two versions in use, PyMC3 and PyMC5, and their APIs differ. PyMC5 is still under development. My trouble is that LLMs confuse the two and mix up their syntax. Searching the web leads to more confusion because many posts don't indicate which version they refer to.

ChatGPT o3-mini has just been released. I wanted to see how DeepSeek-R1 stacked up against it on a simple programming task using PyMC, so I gave both the same simple prompt.

generate python code to locate change points in a time series using pymc5

ChatGPT o3-mini did a reasonable job. The code worked without modification and generated some nice graphs using ArviZ to illustrate the solution.

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az

# ---------------------------
# Generate synthetic time series data
# ---------------------------
np.random.seed(42)

n = 100  # total number of time points
true_cp = 60  # true change point position (index)
x = np.arange(n)

# Create data: first segment with mean 0, second segment with mean 3
y = np.concatenate([
    np.random.normal(loc=0, scale=1, size=true_cp),
    np.random.normal(loc=3, scale=1, size=n - true_cp)
])

# ---------------------------
# Define the PyMC model
# ---------------------------
with pm.Model() as model:
    # Change point: a discrete random variable between 0 and n
    cp = pm.DiscreteUniform("cp", lower=0, upper=n)

    # Parameters for the two segments:
    mu1 = pm.Normal("mu1", mu=0, sigma=10)
    mu2 = pm.Normal("mu2", mu=0, sigma=10)
    sigma1 = pm.HalfNormal("sigma1", sigma=1)
    sigma2 = pm.HalfNormal("sigma2", sigma=1)
    
    # Use pm.math.switch to select parameters based on whether the time index is
    # before or after the change point. Note: the comparison (x < cp) works because
    # PyMC will broadcast x (a NumPy array) to the appropriate tensor.
    mu = pm.math.switch(x < cp, mu1, mu2)
    sigma = pm.math.switch(x < cp, sigma1, sigma2)
    
    # Likelihood: assume normally distributed observations with piecewise mean/sigma.
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    
    # Sample from the posterior
    trace = pm.sample(2000, tune=1000, return_inferencedata=True, target_accept=0.95)

# ---------------------------
# Examine and visualize the results
# ---------------------------
# Print a summary for the change point
print(az.summary(trace, var_names=["cp"]))

# Plot trace and posterior distributions for key parameters
az.plot_trace(trace, var_names=["cp", "mu1", "mu2", "sigma1", "sigma2"])
plt.tight_layout()
plt.show()

# Plot the observed data along with the inferred change point
cp_posterior = trace.posterior["cp"].values.flatten()
# Use the median of the posterior for the change point estimate
cp_estimate = int(np.median(cp_posterior))
print(f"Estimated change point: {cp_estimate}")

plt.figure(figsize=(10, 6))
plt.plot(x, y, "o-", label="Data")
plt.axvline(cp_estimate, color="red", linestyle="--", label=f"Estimated CP (index {cp_estimate})")
plt.axvline(true_cp, color="green", linestyle=":", label=f"True CP (index {true_cp})")
plt.xlabel("Time Index")
plt.ylabel("Observed Value")
plt.legend()
plt.title("Change Point Detection using PyMC 5")
plt.show()

DeepSeek generated working code. It didn't supply the plots that ChatGPT produced, but the code was adequate, and the plots could have been added easily.

import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt

# Generate synthetic time series data with a change point
np.random.seed(42)
n_points = 100
change_point = 50
time = np.arange(n_points)

# Create a time series with a change in mean
mu1, mu2 = 0, 5  # Means before and after the change point
sigma = 1  # Standard deviation
data = np.concatenate([
    np.random.normal(mu1, sigma, change_point),
    np.random.normal(mu2, sigma, n_points - change_point)
])

# Plot the time series
plt.figure(figsize=(10, 4))
plt.plot(time, data, label="Time Series Data")
plt.axvline(change_point, color="red", linestyle="--", label="True Change Point")
plt.title("Time Series with Change Point")
plt.xlabel("Time")
plt.ylabel("Value")
plt.legend()
plt.show()

# Bayesian model for change point detection
with pm.Model() as model:
    # Prior for the change point (uniform over the time range)
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_points - 1)

    # Priors for the means before and after the change point
    mu1 = pm.Normal("mu1", mu=0, sigma=10)
    mu2 = pm.Normal("mu2", mu=0, sigma=10)

    # Standard deviation (assumed constant)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Mean of the time series (changes at the change point)
    mean = pm.math.switch(time >= tau, mu2, mu1)

    # Likelihood (observed data)
    likelihood = pm.Normal("likelihood", mu=mean, sigma=sigma, observed=data)

    # Sample from the posterior
    trace = pm.sample(2000, tune=1000, chains=2, return_inferencedata=True)

# Plot the posterior distribution of the change point
az.plot_posterior(trace, var_names=["tau"], hdi_prob=0.95)
plt.axvline(change_point, color="red", linestyle="--", label="True Change Point")
plt.title("Posterior Distribution of Change Point")
plt.xlabel("Time")
plt.legend()
plt.show()

# Summary of the posterior distribution
summary = az.summary(trace, var_names=["tau", "mu1", "mu2", "sigma"])
print(summary)

This was a very simple test, but DeepSeek continues to amaze.
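As a sampler-free sanity check on synthetic data like this, a single change in the mean can also be located by brute force: for every candidate split, fit one mean to each segment and keep the split with the smallest total squared error. With Gaussian noise and a shared variance, that split is the maximum-likelihood change point. This is a minimal numpy sketch with a hypothetical helper name; it uses the DeepSeek convention that tau is the first index after the change.

```python
import numpy as np


def best_split(y):
    """Least-squares single change point in the mean (tau = first index of the second segment)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    sse = np.full(n, np.inf)  # tau = 0 would leave the first segment empty
    for tau in range(1, n):
        left, right = y[:tau], y[tau:]
        sse[tau] = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    return int(np.argmin(sse))


# Synthetic series in the style of the DeepSeek example: mean 0, then mean 5.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
print("estimated change point:", best_split(data))
```

Unlike the PyMC models, this returns only a point estimate with no posterior uncertainty, but it is a quick way to check that a sampler's answer is in the right place.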

Test Driving DeepSeek-R1 Part 2

In a previous post, I looked at DeepSeek-R1's code-generation capabilities. In this post, I will examine its ability to answer a more general question.

Willy Vlautin


Photo Larry D. Moore, CC BY 4.0, Wikimedia Commons.

Willy Vlautin is my favorite contemporary American author. Vlautin is the author of several novels including The Motel Life, Northline, Lean on Pete, The Free, Don't Skip Out On Me, The Night Always Comes, and The Horse. He is also a songwriter and musician, recording and writing with the bands Richmond Fontaine and The Delines.

I have read all of Vlautin's novels. I decided to ask ChatGPT and DeepSeek for authors similar to Willy Vlautin.

recommend some authors similar to Willy Vlautin

First, I tried ChatGPT o1-mini. Here was the response.

recommend some authors similar to Willy Vlautin

If you appreciate Willy Vlautin's gritty, character-driven narratives that delve into the struggles of everyday life, you might enjoy the following authors who explore similar themes and styles:

1. **Richard Ford**
   - *Notable Works:* *The Sportswriter*, *Independence Day*
   - *Why Similar:* Ford's novels often focus on the complexities of middle-class American life with deep psychological insight, much like Vlautin's exploration of working-class struggles.

2. **Denis Johnson**
   - *Notable Works:* *Tree of Smoke*, *Jesus' Son*
   - *Why Similar:* Johnson's writing combines lyrical prose with raw, gritty subject matter, capturing the depths of human experience in a manner reminiscent of Vlautin.

3. **Charles Bukowski**
   - *Notable Works:* *Post Office*, *Factotum*
   - *Why Similar:* Bukowski's semi-autobiographical novels depict the bleak and often harsh realities of life, echoing Vlautin's unflinching portrayal of his characters' lives.

4. **Raymond Carver**
   - *Notable Works:* *What We Talk About When We Talk About Love*, *Cathedral*
   - *Why Similar:* Carver's minimalist style and focus on ordinary, struggling individuals parallel Vlautin's narrative approach and themes.

5. **Hubert Selby Jr.**
   - *Notable Works:* *Last Exit to Brooklyn*, *Requiem for a Dream*
   - *Why Similar:* Selby Jr.'s intense, raw storytelling and exploration of marginalized lives align closely with Vlautin's thematic interests.

6. **John Steinbeck**
   - *Notable Works:* *The Grapes of Wrath*, *East of Eden*
   - *Why Similar:* Steinbeck's profound empathy for the working class and his ability to depict societal struggles resonate with Vlautin's narratives.

7. **Jonathan Tropper**
   - *Notable Works:* *This Book Will Save Your Life*, *The Married Man*
   - *Why Similar:* Tropper blends humor with poignant character studies, exploring personal and socioeconomic challenges in a way that complements Vlautin's storytelling.

8. **Cormac McCarthy**
   - *Notable Works:* *The Road*, *No Country for Old Men*
   - *Why Similar:* McCarthy's stark, powerful prose and focus on survival and morality mirror the intense, character-focused narratives found in Vlautin's work.

9. **Steve Jones**
   - *Notable Works:* *Fool's Gold*, *Women’s Prison*
   - *Why Similar:* Jones writes about marginalized individuals and subcultures with authenticity and depth, similar to Vlautin's focus on the fringes of society.

10. **Emma Donoghue**
    - *Notable Works:* *Room*, *Frog Music*
    - *Why Similar:* Donoghue's ability to delve deep into her characters' psyches and present their struggles with empathy and realism aligns well with Vlautin's literary style.

These authors offer a range of perspectives and styles that complement Willy Vlautin's focus on the human condition, societal challenges, and richly drawn characters. Exploring their works should provide a satisfying continuation of the themes and emotional depth you enjoy in Vlautin's novels.

It's a good selection of authors and they do have similar styles and themes.

Here's what the DeepSeek-R1 web version returned for the same prompt.

Willy Vlautin is known for his gritty, heartfelt novels that often explore themes of working-class struggles, loneliness, and resilience. If you enjoy his writing, you might appreciate authors with similar styles or themes. Here are a few recommendations:
1. **Larry Brown** - Known for his raw and unflinching portrayals of rural life and working-class characters, Brown's works like *Joe* and *Fay* resonate with the same emotional depth as Vlautin's.
2. **Daniel Woodrell** - Often called the "poet of the Ozarks," Woodrell's novels, such as *Winter's Bone* and *The Outlaw Album*, delve into the lives of marginalized people in rural America, much like Vlautin's stories.
3. **Kent Haruf** - Haruf's novels, including *Plainsong* and *Eventide*, focus on small-town life and the quiet struggles of ordinary people, offering a similar sense of empathy and humanity.
4. **Raymond Carver** - A master of minimalist fiction, Carver's short stories, like those in *What We Talk About When We Talk About Love*, capture the quiet desperation and resilience of everyday people.
5. **Denis Johnson** - Johnson's works, such as *Jesus' Son* and *Train Dreams*, often explore themes of addiction, redemption, and the lives of the downtrodden, with a poetic and gritty edge.
6. **Richard Ford** - Known for his *Bascombe* series and *Canada*, Ford's writing often examines the complexities of ordinary lives and the landscapes of the American West.
7. **Annie Proulx** - Proulx's novels, like *The Shipping News* and *Brokeback Mountain*, focus on the struggles of individuals in harsh environments, with a strong sense of place and character.
8. **Donald Ray Pollock** - His works, such as *The Devil All the Time* and *Knockemstiff*, are dark, gritty, and deeply rooted in the lives of working-class characters in rural America.
These authors share a commitment to exploring the lives of ordinary people with compassion and authenticity, much like Willy Vlautin. If you enjoy his storytelling, you'll likely find something to appreciate in their works as well.

This seems like a good answer and is similar to ChatGPT's.

I tried the same prompt on the 7B version of DeepSeek-R1.

# Default 7B model (4.7GB - ideal for consumer GPUs)
$ ollama run deepseek-r1

======================================================================


>>> recommend some authors similar to Willy Vlautin
<think>
Okay, so I need to find some authors similar to Willy Vlautin. Let me start by understanding what
makes him unique or popular. From the previous response, it seems he's known for his dark fantasy
and gothic themes with strong female leads. His writing has emotional depth and world-building that
feels expansive.

First, I'll think about other authors in the same genre—dark fantasy and gothic fiction. Authors
like Brandon Sanderson come to mind because he's well-known for his dark fantasy, but maybe not as
much for the emotional aspects. Then there's Anne Bishop; she writes urban fantasy with strong
female characters, which is similar.

I should also consider authors who blend fantasy with mystery or psychological elements since those
could be related to Willy Vlautin's style. Maybe someone like Joe Abercrombie? He's known for dark
fantasy and has strong, complex female protagonists, so that might fit.

Looking at the previous suggestions, I see names like Alafarea Baumeister and TamsynIIDalglan. Those
seem to be in similar sub-genres within gothic or urban fantasy. I should check if these authors are
frequently compared with Vlautin.

I wonder about authors who focus on dark themes but have lighter elements, so perhaps someone like
Sarah Reeswig? She's written some steampunk and has strong female leads, which might be comparable.

Also, there's the author "V," who wrote "The House of Hounds." That sounds like it fits into a
similar speculative fiction sub-genre with strong characters and world-building.

I should also think about newer authors or those who have gained popularity recently in these
genres. Maybe someone like C.J. Box if they're into psychological suspense, though he's more on the
thriller side.

Overall, I need to compile a list of authors known for dark fantasy, gothic fiction, strong female
characters, emotional depth, and well-crafted worlds. Making sure each author has elements that
match Willy Vlautin's style will be key.
</think>

Here is a curated list of authors who share similarities with Willy Vlautin, focusing on dark
fantasy, gothic fiction, strong female leads, emotional depth, and expansive world-building:

1. **Brandon Sanderson** - Renowned for his dark fantasy, though more action-oriented, his intricate
worlds resonate similarly.

2. **Anne Bishop** - Known for urban fantasy with compelling female protagonists, her themes align
with Vlautin's dark and atmospheric style.

3. **Joe Abercrombie** - Celebrated for dark fantasy and complex female characters, his work shares
emotional and world-building depth.

4. **Alafarea Baumeister** - Specializes in gothic horror with strong narratives that evoke the same
rich, mysterious atmospheres.

5. **TamsynIIDalglan** - Focuses on gothic fantasy with dark elements and complex female
protagonists, fitting Willy Vlautin's style.

6. **Sarah Reeswig** - Combines steampunk with dark themes and strong female leads, offering a
unique blend within speculative fiction.

7. **V** (Author of "The House of Hounds") - Known for speculative fiction with intricate worlds and
compelling characters.

8. **C.J. Box** - While primarily in psychological suspense, his work delves into thriller elements
suitable for Vlautin's genre overlap.

Each author offers a unique perspective within their respective genres, contributing to a rich
tapestry of storytelling that resonates with fans of Willy Vlautin's style.

Not even close. I would never describe Vlautin's work as dark fantasy or gothic!

I tried the same prompt with the 32B-parameter model. It was slow, but the results weren't bad.

$ ollama run deepseek-r1:32b
>>> recommend authors similar to Willy Vlautin
<think>
Okay, so I need to find authors similar to Willy Vlautin. Let me think about what makes his writing
unique. He's known for gritty settings and flawed characters, often dealing with themes of isolation
and the American experience. His stories have a raw, emotional intensity.

First, I should consider other authors who write in a similar genre—maybe literary fiction or
hardboiled crime. People like Denis Johnson come to mind because he also writes about gritty,
morally complex characters. Then there's Jim Thompson; his noir style might align well with
Vlautin's work.

I should think about themes as well. Vlautin often explores addiction and the struggles of
working-class people. Authors like Tom DePastino or George Pelecanos focus on similar issues. Oh,
and Donald Ray Pollock has that gritty rural setting which could be a good match.

The pacing is another aspect—Vlautin's stories move quickly with a lot of tension. Bret Easton Ellis
writes in a way that's fast-paced and intense too. Maybe Raymond Chandler? He's more classic noir
but still captures that edgy vibe.

I should also consider the emotional depth. Richard Price has that ability to dig deep into
characters' psyches, which is something Vlautin does well. Denis Johnson again fits here with his
focus on damaged individuals.

Some newer authors might be similar too. Jason Mott or C.E. Morgan come to mind because they deal
with tough subjects and have a strong narrative voice.

I need to make sure I cover a range of recommendations, not just the obvious ones. Maybe include
some international authors as well, like Jo Nesbø, who has that dark, emotional storytelling style.

Wait, did I miss any? Let me see—Ellison Cobb writes about addiction and isolation, which fits. And
Megan Abbott's noir with psychological depth could resonate with Vlautin fans too.

I think that covers a good range of authors across different sub-genres but all share elements like
gritty settings, complex characters, and emotional intensity.
</think>

To find authors similar to Willy Vlautin, consider the following recommendations, each capturing
elements of his unique style:

1. **Denis Johnson** - Known for gritty, morally complex characters and themes of isolation.

2. **Jim Thompson** - A master of noir with a focus on flawed characters in intense situations.

3. **Tom DePastino** - Explores addiction and the struggles of working-class individuals.

4. **George Pelecanos** - Focuses on urban settings and moral ambiguity, blending crime with social
commentary.

5. **Donald Ray Pollock** - Offers a gritty rural perspective with dark humor and complex
characters.

6. **Bret Easton Ellis** - Known for fast-paced, intense narratives exploring societal issues.

7. **Raymond Chandler** - Classic noir style with a focus on edgy, atmospheric storytelling.

8. **Richard Price** - Delves deep into characters' psyches, capturing their struggles and
complexities.

9. **Jason Mott** - Deals with tough subjects like addiction and poverty with emotional depth.

10. **C.E. Morgan** - Combines lyrical prose with exploration of social issues in rural settings.

11. **Jo Nesbø** (Norway) - Offers dark, psychological crime stories with a focus on family
dynamics.

12. **Ellison Cobb** - Writes about addiction and isolation, capturing the rawness of life's
struggles.

13. **Megan Abbott** - Known for noir fiction with psychological depth and strong narrative tension.

These authors share Vlautin's themes of gritty settings, flawed characters, emotional intensity, and
exploration of complex social issues.

Not a bad list. I don't know if I would have included Jim Thompson. His novels tend to lack the sympathy for their characters that Vlautin's have. I'm not familiar with some of these authors, so this response gives me some new ones to check out.
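Both DeepSeek-R1 runs print their reasoning inside a `<think>...</think>` block before the final answer. If you want to rerun the same prompt against several model sizes and compare only the answers, a small script can do it. What follows is a minimal sketch, not a definitive tool. `split_reasoning` and `ask` are hypothetical helper names. The script assumes a local Ollama server on its default port, 11434, and uses the non-streaming `/api/generate` endpoint. It also assumes the reasoning arrives inline in the `response` text, as it does in the transcripts above. Depending on your Ollama version, the reasoning may be returned differently.

```python
import json
import re
import urllib.request

# Matches an R1-style reasoning block: everything between <think> and </think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    If no <think>...</think> block is present, the reasoning is empty
    and the whole output is treated as the answer.
    """
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    # Keep any text before and after the block as the answer.
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer


def ask(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one prompt to a local Ollama server and return the raw text.

    Assumption: Ollama is running on its default port and the reasoning
    trace is included inline in the "response" field.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example usage (requires a running Ollama server with both models pulled):
# prompt = "recommend authors similar to Willy Vlautin"
# for model in ("deepseek-r1", "deepseek-r1:32b"):
#     reasoning, answer = split_reasoning(ask(model, prompt))
#     print(f"== {model} ==\n{answer}\n")
```

Keeping the parsing separate from the network call makes it easy to check the 7B and 32B answers side by side. You can also look at the reasoning traces on their own, which is where the 7B model's misreading of Vlautin as a dark fantasy writer first shows up.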