Generated by Janus Pro 7B https://www.yeschat.ai/features/janus-pro
I have trouble with PyMC. PyMC is a probabilistic programming library for Python for building Bayesian models. Whenever I try to use it, I end up searching the web or asking an LLM to help with current syntax. It seems that there are two active versions, PyMC3 and PyMC5. The apis for the versions differ. PyMC5 is still under development. My trouble is that LLMs confuse the two and mix up the syntax of the versions. Searching the web leads to more confusion because many posts don't indicate which version they are referring to.
ChatGPT o3-mini has just been released. I wanted to see how DeepSeek-R1 stacked against it for a simple programming task using PyMC. I gave both the same simple prompt.
generate python code to locate change points in a time series using pymc5
ChatGPT o3-mini did a reasonable job. The code worked without modifications. It generated some nice graphs using ArviZ to illustrate the solution.
import numpy as np import matplotlib.pyplot as plt import pymc as pm import arviz as az # --------------------------- # Generate synthetic time series data # --------------------------- np.random.seed(42) n = 100 # total number of time points true_cp = 60 # true change point position (index) x = np.arange(n) # Create data: first segment with mean 0, second segment with mean 3 y = np.concatenate([ np.random.normal(loc=0, scale=1, size=true_cp), np.random.normal(loc=3, scale=1, size=n - true_cp) ]) # --------------------------- # Define the PyMC model # --------------------------- with pm.Model() as model: # Change point: a discrete random variable between 0 and n cp = pm.DiscreteUniform("cp", lower=0, upper=n) # Parameters for the two segments: mu1 = pm.Normal("mu1", mu=0, sigma=10) mu2 = pm.Normal("mu2", mu=0, sigma=10) sigma1 = pm.HalfNormal("sigma1", sigma=1) sigma2 = pm.HalfNormal("sigma2", sigma=1) # Use pm.math.switch to select parameters based on whether the time index is # before or after the change point. Note: the comparison (x < cp) works because # PyMC will broadcast x (a NumPy array) to the appropriate tensor. mu = pm.math.switch(x < cp, mu1, mu2) sigma = pm.math.switch(x < cp, sigma1, sigma2) # Likelihood: assume normally distributed observations with piecewise mean/sigma. y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y) # Sample from the posterior trace = pm.sample(2000, tune=1000, return_inferencedata=True, target_accept=0.95) # --------------------------- # Examine and visualize the results # --------------------------- # Print a summary for the change point print(az.summary(trace, var_names=["cp"])) # Plot trace and posterior distributions for key parameters az.plot_trace(trace, var_names=["cp", "mu1", "mu2", "sigma1", "sigma2"]) plt.tight_layout() plt.show() # Plot the observed data along with the inferred change point cp_posterior = trace.posterior["cp"].values.flatten() # Use the median of the posterior for the change point estimate cp_estimate = int(np.median(cp_posterior)) print(f"Estimated change point: {cp_estimate}") plt.figure(figsize=(10, 6)) plt.plot(x, y, "o-", label="Data") plt.axvline(cp_estimate, color="red", linestyle="--", label=f"Estimated CP (index {cp_estimate})") plt.axvline(true_cp, color="green", linestyle=":", label=f"True CP (index {true_cp})") plt.xlabel("Time Index") plt.ylabel("Observed Value") plt.legend() plt.title("Change Point Detection using PyMC 5") plt.show()
DeeepSeek generated successful code. It didn't supply the plots that ChatGT produced, but the code was adequate and the plots could have been added easily.
import numpy as np import pymc as pm import arviz as az import matplotlib.pyplot as plt # Generate synthetic time series data with a change point np.random.seed(42) n_points = 100 change_point = 50 time = np.arange(n_points) # Create a time series with a change in mean mu1, mu2 = 0, 5 # Means before and after the change point sigma = 1 # Standard deviation data = np.concatenate([ np.random.normal(mu1, sigma, change_point), np.random.normal(mu2, sigma, n_points - change_point) ]) # Plot the time series plt.figure(figsize=(10, 4)) plt.plot(time, data, label="Time Series Data") plt.axvline(change_point, color="red", linestyle="--", label="True Change Point") plt.title("Time Series with Change Point") plt.xlabel("Time") plt.ylabel("Value") plt.legend() plt.show() # Bayesian model for change point detection with pm.Model() as model: # Prior for the change point (uniform over the time range) tau = pm.DiscreteUniform("tau", lower=0, upper=n_points - 1) # Priors for the means before and after the change point mu1 = pm.Normal("mu1", mu=0, sigma=10) mu2 = pm.Normal("mu2", mu=0, sigma=10) # Standard deviation (assumed constant) sigma = pm.HalfNormal("sigma", sigma=1) # Mean of the time series (changes at the change point) mean = pm.math.switch(time >= tau, mu2, mu1) # Likelihood (observed data) likelihood = pm.Normal("likelihood", mu=mean, sigma=sigma, observed=data) # Sample from the posterior trace = pm.sample(2000, tune=1000, chains=2, return_inferencedata=True) # Plot the posterior distribution of the change point az.plot_posterior(trace, var_names=["tau"], hdi_prob=0.95) plt.axvline(change_point, color="red", linestyle="--", label="True Change Point") plt.title("Posterior Distribution of Change Point") plt.xlabel("Time") plt.legend() plt.show() # Summary of the posterior distribution summary = az.summary(trace, var_names=["tau", "mu1", "mu2", "sigma"]) print(summary)
This was a very simple test, but DeepSeek continues to amaze.
No comments:
Post a Comment