Test Driving DeepSeek -R1 Part 3

 


I have trouble with PyMC. PyMC is a probabilistic programming library for Python for building Bayesian models. Whenever I try to use it, I end up searching the web or asking an LLM to help with current syntax. It seems that there are two active versions, PyMC3 and PyMC5. The apis for the versions differ. PyMC5 is still under development. My trouble is that LLMs confuse the two and mix up the syntax of the versions. Searching the web leads to more confusion because many posts don't indicate which version they are referring to.  

ChatGPT o3-mini has just been released. I wanted to see how DeepSeek-R1 stacked against it for a simple programming task using PyMC. I gave both the same simple prompt.

generate python code to locate change points in a time series using pymc5

ChatGPT o3-mini did a reasonable job. The code worked without modifications. It generated some nice graphs using ArviZ to illustrate the solution.

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az

# ---------------------------
# Generate synthetic time series data
# ---------------------------
np.random.seed(42)

n = 100  # total number of time points
true_cp = 60  # true change point position (index)
x = np.arange(n)

# Create data: first segment with mean 0, second segment with mean 3
y = np.concatenate([
    np.random.normal(loc=0, scale=1, size=true_cp),
    np.random.normal(loc=3, scale=1, size=n - true_cp)
])

# ---------------------------
# Define the PyMC model
# ---------------------------
with pm.Model() as model:
    # Change point: a discrete random variable between 0 and n
    cp = pm.DiscreteUniform("cp", lower=0, upper=n)

    # Parameters for the two segments:
    mu1 = pm.Normal("mu1", mu=0, sigma=10)
    mu2 = pm.Normal("mu2", mu=0, sigma=10)
    sigma1 = pm.HalfNormal("sigma1", sigma=1)
    sigma2 = pm.HalfNormal("sigma2", sigma=1)
    
    # Use pm.math.switch to select parameters based on whether the time index is
    # before or after the change point. Note: the comparison (x < cp) works because
    # PyMC will broadcast x (a NumPy array) to the appropriate tensor.
    mu = pm.math.switch(x < cp, mu1, mu2)
    sigma = pm.math.switch(x < cp, sigma1, sigma2)
    
    # Likelihood: assume normally distributed observations with piecewise mean/sigma.
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    
    # Sample from the posterior
    trace = pm.sample(2000, tune=1000, return_inferencedata=True, target_accept=0.95)

# ---------------------------
# Examine and visualize the results
# ---------------------------
# Print a summary for the change point
print(az.summary(trace, var_names=["cp"]))

# Plot trace and posterior distributions for key parameters
az.plot_trace(trace, var_names=["cp", "mu1", "mu2", "sigma1", "sigma2"])
plt.tight_layout()
plt.show()

# Plot the observed data along with the inferred change point
cp_posterior = trace.posterior["cp"].values.flatten()
# Use the median of the posterior for the change point estimate
cp_estimate = int(np.median(cp_posterior))
print(f"Estimated change point: {cp_estimate}")

plt.figure(figsize=(10, 6))
plt.plot(x, y, "o-", label="Data")
plt.axvline(cp_estimate, color="red", linestyle="--", label=f"Estimated CP (index {cp_estimate})")
plt.axvline(true_cp, color="green", linestyle=":", label=f"True CP (index {true_cp})")
plt.xlabel("Time Index")
plt.ylabel("Observed Value")
plt.legend()
plt.title("Change Point Detection using PyMC 5")
plt.show()

DeeepSeek generated successful code. It didn't supply the plots that ChatGT produced, but the code was adequate and the plots could have been added easily.

import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt

# Generate synthetic time series data with a change point
np.random.seed(42)
n_points = 100
change_point = 50
time = np.arange(n_points)

# Create a time series with a change in mean
mu1, mu2 = 0, 5  # Means before and after the change point
sigma = 1  # Standard deviation
data = np.concatenate([
    np.random.normal(mu1, sigma, change_point),
    np.random.normal(mu2, sigma, n_points - change_point)
])

# Plot the time series
plt.figure(figsize=(10, 4))
plt.plot(time, data, label="Time Series Data")
plt.axvline(change_point, color="red", linestyle="--", label="True Change Point")
plt.title("Time Series with Change Point")
plt.xlabel("Time")
plt.ylabel("Value")
plt.legend()
plt.show()

# Bayesian model for change point detection
with pm.Model() as model:
    # Prior for the change point (uniform over the time range)
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_points - 1)

    # Priors for the means before and after the change point
    mu1 = pm.Normal("mu1", mu=0, sigma=10)
    mu2 = pm.Normal("mu2", mu=0, sigma=10)

    # Standard deviation (assumed constant)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Mean of the time series (changes at the change point)
    mean = pm.math.switch(time >= tau, mu2, mu1)

    # Likelihood (observed data)
    likelihood = pm.Normal("likelihood", mu=mean, sigma=sigma, observed=data)

    # Sample from the posterior
    trace = pm.sample(2000, tune=1000, chains=2, return_inferencedata=True)

# Plot the posterior distribution of the change point
az.plot_posterior(trace, var_names=["tau"], hdi_prob=0.95)
plt.axvline(change_point, color="red", linestyle="--", label="True Change Point")
plt.title("Posterior Distribution of Change Point")
plt.xlabel("Time")
plt.legend()
plt.show()

# Summary of the posterior distribution
summary = az.summary(trace, var_names=["tau", "mu1", "mu2", "sigma"])
print(summary)


This was a very simple test, but DeepSeek continues to amaze.

No comments:

Post a Comment