https://en.wikipedia.org/wiki/Chromosome#/media/File:Chromosome_structure.png
I have been interested in evolutionary computation (EC) for quite a while. Evolutionary computation is a family of algorithms for global optimization inspired by biological evolution. Genetic algorithms (GA) are a form of EC commonly used for the solution of optimization problems. The method is inspired by natural selection.
In a genetic algorithm, a population of candidate solutions to the optimization problem are encoded as chromosomes, an analogy to biological chromosomes. Each chromosome is an encoding of the parameters needed to generate a possible value of the function being optimized. The encoding might be a bitmap, a vector, program code, or any representation that can be used to calculate the objective function. For each chromosome, a fitness or score is calculated. The score is the value of the objective function for that chromosome. The full evaluation of the set of chromosome is called a generation. A selection method chooses candidate chromosomes to form the next generations. A crossover operation is applied to pairs of chromosomes to form new child chromosomes. A mutation operator can be applied to the new population to provide more variation in the solutions generated. The process is repeated for a fixed number of generations or until the objective function stops changing.
Genetic algorithms are not algorithms. They are heuristics. For any given problem, a representation must be chosen and mutations and crossover operations defined. Defining these can be the difficult part of applying a GA to a specific problem. In addition, there are hyperparameters that must be defined including the population size, the number of generations for the process to run, and depending on the encoding method other possible parameters.
GAs can be useful for problems that have a complex fitness landscape. This is because mutation and crossover, can move the population away from local optima where a traditional algorithm might get stuck.
Change Point
The change point problem is finding the positions where the underlying probability distribution changes occur in a timeseries. I have looked at the change point problem previously, for example here. A genetic algorithm is not a good method for finding change points. If you really want to find the change points in series, use something like the ruptures library for python or one of the many R packages for change point detection. GAs tend to be more computationally expensive than methods specifically designed specifically for change point detection.
Nevertheless, I want to try two things: play with a GA and see if an LLM could generate satisfactory genetic algorithm code. I needed a test problem and since I have had experience with change point detection, it seemed like a relatively easy problem to tackle.
In what follows, we will use the terms change point and breakpoint interchangeably.
Vibe Coding
Vibe coding is a development method that uses AI to generate code from natural language prompts. It usually involves attacking a problem with less planning than traditional software development methods. It can be very effective for smaller self-contained projects. It may be a an inappropriate approach for complex mult-developer projects.
I asked ChatGPT 5 to generate python code to find the change points in a timeseries using a GA. To my surprise, it generated working code. I made extensive changes to the original code, but the framework came from ChatGPT.
GA
initialize population gen = 0 best = None
fitness = -Inf while gen < total_generations calculate fitness for each chromosome new_pop = [] while size new_pop < size previous generation select 2 chromosomes apply crossover operator apply mutation operator add 2 new chromosomes to new_pop population = new_pop if best fitness in population > best; best = best in population gen = gen + 1 report best
Data
def generate_piecewise_linear(n: int = 200, K: int = 3, x_range: tuple[float, float] = (0, 10), slopes: np.ndarray | None = None, intercept: float = 0.0, noise_std: float = 0.1, random_state: int | None = None) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]: """ Generate synthetic piecewise linear data with K segments. Partially generated by ChatGPT 5. Parameters ---------- n : int Number of x points. K : int Number of breakpoints (=> K+1 segments). x_range : tuple Range of x values (min, max). slopes : list or None Slopes for each segment. If None, random slopes are drawn. intercept : float Starting intercept (at x = x_range[0]). noise_std : float Standard deviation of Gaussian noise. random_state : int or None Random seed. Returns ------- x : ndarray Sorted x values. y_true : ndarray True (noise-free) piecewise linear values. y : ndarray Noisy observations. breakpoints : ndarray Breakpoint x-locations (excluding endpoints). """ seed = random_state if seed is None: seed = int(time.time()) rng = np.random.default_rng(seed) rng = np.random.default_rng(random_state) x = np.linspace(x_range[0], x_range[1], n) # choose K breakpoints randomly within range breaks = np.sort(rng.uniform(x_range[0], x_range[1], size=K)) # slopes if slopes is None: slopes = rng.uniform(-1.5, 1.5, size=K+1) # compute true y values segment by segment y_true = np.empty_like(x) current_intercept = intercept seg_start = x_range[0] for j in range(K+1): if j < K: seg_end = breaks[j] else: seg_end = x_range[1] idx = (x >= seg_start) & (x <= seg_end) y_true[idx] = slopes[j] * (x[idx] - seg_start) + current_intercept # update intercept for continuity current_intercept = slopes[j] * (seg_end - seg_start) + current_intercept seg_start = seg_end # add Gaussian noise y = y_true + rng.normal(0, noise_std, size=n) print("Breakpoints:", breaks) print("Slopes:", slopes) print("c0:", intercept) return x, y_true, y, breaks
The Problem
The Population
We will represent the chromosome, i.e. possible solutions to the optimization problem, as an array of real numbers. For a solution with K changepoints (K+1 segments), the chromosome would be:
\[\left[ {brea{k_0},brea{k_1}...brea{k_{K - 1}},slop{e_0},slop{e_1}...slop{e_K},{c_0}} \right]\]
To start the GA process we generate an array of n chromosomes.
Evolution
The evolutionary process follows the pseudocode show above. A model, stored as a dictionary is returned at the end of the evolution.
def evolve(x: np.ndarray, y: np.ndarray, rng: np.random._generator.Generator, K: int = 3, lam: float = 0.0, pop_size: int = 200, generations: int = 400, elite: int = 5, eta = 100) -> dict: """Evolve a population of chromosomes Parameters ---------- x : np.ndarray sequence of x values y : np.ndarray the seuences of y values rng : np.random._generator.Generator the random number generator K : int, optional the number of change points (breaks), by default 3 lam : float, optional _description_, by default 0.0 pop_size : int, optional population size, by default 200 generations : int, optional number of generations to evolve, by default 400 elite : int, optional the number of top chromosomes to carry to next generation, by default 5 eta : int, optional parameter for the SBX crossover, by default 100 Returns ------- dictionary with key-value pairs "breakpoints", "slopes", "c0", "predict": a function for calculating the objective function, "loss": mse loss (best fitness), "BIC": Bayesian Information Criterion """ slope_scale = float(np.std(y)) / max(float(np.std(x)), 1e-8) pop = init_population(rng, pop_size = pop_size, K = K, slope_scale = slope_scale) fitness = np.array([loss(g, x, y, K, lam) for g in pop]) for _ in range(generations): # Elitism elite_idx = np.argsort(fitness)[:elite] new_pop = [pop[i].copy() for i in elite_idx] # Tournament selection def tournament(): i, j = rng.integers(0, pop_size, 2) return pop[i] if fitness[i] < fitness[j] else pop[j] # Create offspring while len(new_pop) < pop_size: p1, p2 = tournament(), tournament() c1, c2 = crossover(p1, p2, rng, eta = eta) c1 = mutate(c1, rng, K=K) c2 = mutate(c2, rng, K=K) new_pop.extend([c1, c2]) pop = np.array(new_pop[:pop_size]) fitness = np.array([loss(g, x, y, K, lam) for g in pop]) # print(fitness[np.argmin(fitness)], pop[np.argmin(fitness)]) best = pop[np.argmin(fitness)] x_min, x_max = x.min(), x.max() b, m, c0 = decode_genome(best, K, x_min, x_max) mse = float(np.min(fitness)) rss = mse * len(y) bic = len(y) * np.log(float(rss)/len(y)) + (2 * K + 2) * np.log(len(y)) return {"breakpoints": b, "slopes": m, "c0": c0, "predict": lambda X: piecewise_predict(np.asarray(X), b, m, c0), "loss": mse, "BIC": bic}
Crossover
def crossover(p1: np.ndarray, p2: np.ndarray, rng: np.random._generator.Generator, eta: float = 10.0, prob: float = 0.8) -> tuple[np.ndarray, np.ndarray]: """SBX crossover Parameters ---------- p1 : np.ndarray a chromosome p2 : np.ndarray a chromosome rng : np.random._generator.Generator random number generatoe eta : float, optional distribution index by default 10.0 prob : float, optional crossover probability, by default 0.8 Returns ------- tuple[np.ndarray, np.ndarray]
a pair of chromosomes,eith the original pair or the new children """ if rng.random() > prob: return p1.copy(), p2.copy() c1, c2 = p1.copy(), p2.copy() for i in range(len(p1)): u = rng.random() if u <= 0.5: beta = (2*u)**(1/(eta+1)) else: beta = (1/(2*(1-u)))**(1/(eta+1)) c1[i] = 0.5*((1+beta)*p1[i] + (1-beta)*p2[i]) c2[i] = 0.5*((1-beta)*p1[i] + (1+beta)*p2[i]) return c1, c2
Mutation
def mutate(g: np.ndarray, rng: np.random._generator.Generator, sigma_knots: float = 0.05, sigma_slopes: float = 0.1, sigma_c0: float = 0.1, p: float = 0.2, K: int =1 ) -> np.ndarray: """Gaussian mutation. Parameters ---------- g : np.ndarray chromosome rng : np.random._generator.Generator random number generator sigma_knots : float, optional st dev of Gaussian probability for joints, by default 0.05 sigma_slopes : float, optional st dev of Gaussian for slopes, by default 0.1 sigma_c0 : float, optional st dev of Gaussian for c0, by default 0.1 p : float, optional probability of a mutation, by default 0.2 K : int, optional number of breakpoints, by default 1 Returns ------- np.ndarray a new chromosome """ out = g.copy() for i in range(len(g)): if rng.random() < p: if i < K: # breakpoint gene val = out[i] + rng.normal(0, sigma_knots) # reflect to keep in (0,1) if val < 0: val = -val if val > 1: val = 2 - val out[i] = np.clip(val, 0, 1) elif i < K + (K+1): # slope gene out[i] += rng.normal(0, sigma_slopes) else: # c0 out[i] += rng.normal(0, sigma_c0) return out
Getting Stuck
How Many Change Points?
Results
$ python src/ga_cp_bic.py /mnt/c/Users/bbth/OneDrive/analytic_garden/GA/data/test2.csv -R 5 -m 7 Num Breakpoints: 0 Restart: 1 MSE: 0.768200131276766 BIC: -93.49906777840822 Restart: 2 MSE: 0.768200131276766 BIC: -93.49906777840822 Restart: 3 MSE: 0.768200131276766 BIC: -93.49906777840822 Restart: 4 MSE: 0.768200131276766 BIC: -93.49906777840822 Restart: 5 MSE: 0.768200131276766 BIC: -93.49906777840822 Train MSE: 0.768200131276766 Random number seed: 1757530319 Breaks: [] Slopes: [0.87498511] Initial slope: -1.062104184879057 BIC: -93.49906777840822 Num Breakpoints: 1 Restart: 1 MSE: 0.26713128422814697 BIC: -504.0401578908922 Restart: 2 MSE: 0.26713135999662774 BIC: -504.0400444358617 Restart: 3 MSE: 0.2671315303845627 BIC: -504.0397892986428 Restart: 4 MSE: 0.26713128688267707 BIC: -504.0401539160226 Restart: 5 MSE: 0.2671313067687818 BIC: -504.04012413874835 Train MSE: 0.26713128422814697 Random number seed: 1757530319 Breaks: [3.03255148] Slopes: [-0.10502121 1.15256772] Initial slope: 0.7821082655597978 BIC: -504.0401578908922 Num Breakpoints: 2 Restart: 1 MSE: 0.07598078790615184 BIC: -994.9611172618687 Restart: 2 MSE: 0.07603601788638055 BIC: -994.6704652783811 Restart: 3 MSE: 0.07604884651200713 BIC: -994.6029838720873 Restart: 4 MSE: 0.07597941263212685 BIC: -994.9683574419693 Restart: 5 MSE: 0.07601333506921201 BIC: -994.7898097774975 Train MSE: 0.07597941263212685 Random number seed: 1757530319 Breaks: [3.34024974 7.51526035] Slopes: [-0.18064767 1.49689334 0.30740931] Initial slope: 0.861732960505384 BIC: -994.9683574419693 Num Breakpoints: 3 Restart: 1 MSE: 0.010960306302913442 BIC: -1757.4583038258043 Restart: 2 MSE: 0.07604734720171356 BIC: -982.6279408939248 Restart: 3 MSE: 0.07596282888136476 BIC: -983.0727444260882 Restart: 4 MSE: 0.010562970908833194 BIC: -1772.2285653062725 Restart: 5 MSE: 0.010432261329197197 BIC: -1777.209172976474 Train MSE: 0.010432261329197197 Random number seed: 1757530319 Breaks: [1.33971739 3.00878892 7.51557945] Slopes: [ 1.12909401 -1.00764137 1.49937372 0.30578179] Initial slope: -0.07847355995515726 BIC: -1777.209172976474 Num Breakpoints: 4 Restart: 1 MSE: 0.07622839759249472 BIC: -969.6938400876858 Restart: 2 MSE: 0.011545398555128799 BIC: -1724.6726804846996 Restart: 3 MSE: 0.010440330102803444 BIC: -1764.9169857229363 Restart: 4 MSE: 0.07604291515918285 BIC: -970.6683244955417 Restart: 5 MSE: 0.07601372148631955 BIC: -970.82191817677 Train MSE: 0.010440330102803444 Random number seed: 1757530319 Breaks: [0.80667405 1.33855579 3.05499137 7.48528168] Slopes: [ 0.93920416 1.18604746 -0.94027751 1.51162942 0.31542384] Initial slope: 0.013945172373571815 BIC: -1764.9169857229363 Num Breakpoints: 5 Restart: 1 MSE: 0.010644732063791634 BIC: -1745.178485874649 Restart: 2 MSE: 0.012666979023629087 BIC: -1675.6051248217907 Restart: 3 MSE: 0.011837283458410761 BIC: -1702.702870828842 Restart: 4 MSE: 0.010654131471211147 BIC: -1744.8254376281354 Restart: 5 MSE: 0.01087133533109335 BIC: -1736.7527213735639 Train MSE: 0.010644732063791634 Random number seed: 1757530319 Breaks: [1.33809471 2.99997902 5.39033556 7.10730355 7.50918906] Slopes: [ 1.15367754 -1.01715291 1.50646275 1.46105877 1.61661661 0.3051271 ] Initial slope: -0.11025318081226192 BIC: -1745.178485874649 Num Breakpoints: 6 Restart: 1 MSE: 0.011239317804984892 BIC: -1711.4543482865681 Restart: 2 MSE: 0.013798197447378584 BIC: -1629.4064223827509 Restart: 3 MSE: 0.0134995356186553 BIC: -1638.1594934394984 Restart: 4 MSE: 0.019203054716150737 BIC: -1497.193861462054 Restart: 5 MSE: 0.011973759948817077 BIC: -1686.1345740932345 Train MSE: 0.011239317804984892 Random number seed: 1757530319 Breaks: [1.1595025 1.46699623 3.0337865 7.62772597 8.77150888 9.86875214] Slopes: [ 0.97415186 0.58236226 -0.97489684 1.49773889 0.13340346 0.42822295 -0.27696669] Initial slope: -0.005901498216313833 BIC: -1711.4543482865681 Num Breakpoints: 7 Restart: 1 MSE: 0.012618147112265589 BIC: -1653.1842689184527 Restart: 2 MSE: 0.015023512816917873 BIC: -1583.3920808586547 Restart: 3 MSE: 0.01117985689526189 BIC: -1701.5932118075225 Restart: 4 MSE: 0.016076479643951244 BIC: -1556.2957539720474 Restart: 5 MSE: 0.015254660080210364 BIC: -1577.2846646775888 Train MSE: 0.01117985689526189 Random number seed: 1757530319 Breaks: [1.31181149 2.40743245 2.67010934 3.0885786 3.33149654 7.12396452 7.68387703] Slopes: [ 1.14648308 -0.93700508 -0.90226644 -0.61927955 1.38019676 1.51134006 1.14414582 0.2976773 ] Initial slope: -0.11001194593707966 BIC: -1701.5932118075225 ==================================== Best Model Train MSE: 0.010432261329197197 Random number seed: 1757530319 Breaks: [1.33971739 3.00878892 7.51557945] Slopes: [ 1.12909401 -1.00764137 1.49937372 0.30578179] Initial slope: -0.07847355995515726 BIC: -1777.209172976474 $
No comments:
Post a Comment