Surrogate model
A low-fidelity model that approximates a high-fidelity (expensive) objective function, so that the costly objective can be explored at a fraction of the computational cost.
Definition
Fidelity: The degree to which a model accurately represents the real-world process it is intended to simulate.
- $f(x)$: upper level objective (the actual objective, expensive to evaluate)
- $s(x)$: surrogate model (cheap approximation)
- $\epsilon$: noise (randomness in each evaluation of $f$)
Optimization steps
If some hyperparameter $x$ is selected, the model can be trained with it to obtain the test objective function value $f(x)$.
However, this process is time-consuming and subject to noise $\epsilon$ due to training randomness.
Therefore, instead of directly optimizing $f$, the goal is to create a surrogate model $s$ based on a few computed results and use it for more efficient optimization.
- Approximate the objective function with a surrogate $s$
- Use $s$ to select the next hyperparameter to evaluate
Radial Basis Functions (RBF) + Stochastic Sampling
- the Euclidean distance is used
- radial = distance from a center point

The surrogate has the form

$$s(x) = \sum_{i=1}^{m} \lambda_i\, \phi(\lVert x - x_i \rVert) + \beta^\top x + \beta_0$$

- $x_1, \dots, x_m$: center points (previously evaluated hyperparameters)
- $\bar{f}(x_1), \dots, \bar{f}(x_m)$: average (expected) test objective function values at those points
- $\phi(r)$: function that depends only on the distance from a center (e.g. the Gaussian kernel $\phi(r) = e^{-r^2}$, the cubic kernel $\phi(r) = r^3$, etc.)
- $\lambda_i$: coefficients that determine the influence of each center
- $\beta^\top x + \beta_0$: linear polynomial tail (representing the overall linear trend)
Conditions
For the surrogate model to work properly, the following two conditions must be satisfied:
- $s(x_i) = \bar{f}(x_i)$ for $i = 1, \dots, m$
Interpolation Condition: The surrogate model must match the actual (averaged) experimental values at all evaluated points.
- $P^\top \lambda = 0$, i.e. $\sum_{i=1}^{m} \lambda_i = 0$ and $\sum_{i=1}^{m} \lambda_i x_i = 0$
Trend Separation Condition: The RBF coefficients are orthogonal to the linear tail, which prevents the RBF part from mixing with (absorbing) the linear trend.
Solving for coefficients
The parameters $\lambda = (\lambda_1, \dots, \lambda_m)$ and $\beta = (\beta_1, \dots, \beta_d, \beta_0)$ are determined by solving a linear system of equations:

$$\begin{pmatrix} \Phi & P \\ P^\top & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ \beta \end{pmatrix} = \begin{pmatrix} \bar{f} \\ 0 \end{pmatrix}$$

- To solve for $(\lambda, \beta)$, the matrix on the left needs to be invertible.
- This happens if and only if $\operatorname{rank}(P) = d + 1$, where $d$ is the number of hyperparameters (the dimension of $x$).
where:
- $\Phi$: $m \times m$ matrix of RBF values $\Phi_{ij} = \phi(\lVert x_i - x_j \rVert)$, based on the distances between evaluated points
- $P$: $m \times (d+1)$ matrix formed by appending a column of ones to the coordinates of each point
- $\bar{f}$: upper level evaluations; vector of average objective function values obtained at each evaluated point
Solving this system yields the coefficients $\lambda$ and $\beta$ needed to construct the surrogate model $s(x)$ (see the sketch below).
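To make the construction concrete, here is a minimal NumPy sketch (my own illustration, not from the notes) that assembles the block system for a few hypothetical 1-D points with a cubic kernel, solves it, and checks the interpolation condition:

```python
import numpy as np

# hypothetical 1-D evaluated points and their averaged objective values
X = np.array([[-1.5], [-0.3], [0.4], [1.2]])   # shape (m, d)
y = np.array([2.1, 0.2, 0.3, 1.6])             # averaged f(x_i)
m, d = X.shape

phi = lambda r: r**3                            # cubic RBF

# Phi[i, j] = phi(||x_i - x_j||),  P = [X | 1]
Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))
P = np.hstack([X, np.ones((m, 1))])

# saddle-point system: [[Phi, P], [P^T, 0]] [lambda; beta] = [y; 0]
A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
b = np.concatenate([y, np.zeros(d + 1)])
sol = np.linalg.solve(A, b)
lam, beta = sol[:m], sol[m:]

def s(x):
    """Surrogate s(x) = sum_i lambda_i phi(||x - x_i||) + beta^T x + beta_0."""
    r = np.linalg.norm(x - X, axis=1)
    return lam @ phi(r) + beta[:-1] @ x + beta[-1]

# interpolation condition: the surrogate reproduces the data at the centers
print([round(float(s(xi)), 4) for xi in X])     # matches y up to floating-point error
```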
Optimization Loop
- Choose a few initial hyperparameters and compute the actual objective function values $f(x_i)$.
- Solve the above system to build the surrogate model $s(x)$.
- Based on this model, select a new candidate $x_{\text{next}}$.
- Evaluate that $x_{\text{next}}$ in practice and return to step 2.
How to select the next to evaluate?
The next evaluation point is selected by balancing exploitation (choosing points with low predicted values) and exploration (choosing points in less explored areas).
- Local Search: Generate $M$ candidates of the form $x_{\text{best}} + \delta$ (small random $\delta$) around the best point found so far.
- Global Search: Randomly sample $M$ candidates from the entire search space.
- For each of the $2M$ candidates, compute two scores:
- RBF Score: the predicted value $s(x)$ from the surrogate model.
- Distance Score: The minimum distance to any previously evaluated point.
- Scale both scores to the range [0, 1].
- Compute the weighted score $w \cdot (\text{scaled distance score}) + (1 - w) \cdot (\text{scaled RBF score})$ with a weight $w \in [0, 1]$.
- Select the candidate with the lowest weighted score as the next evaluation point $x_{\text{next}}$.
This method ensures a balance between exploring new areas and exploiting known good areas in the hyperparameter space.
- RBF score focuses on areas predicted to be good (exploitation).
- Distance score encourages sampling in less explored areas (exploration).
Note (Stochastic Sampling)
The vector $\bar{f}$ inside the linear system contains expected (average) objective values.
In practice, since the expectation cannot be known directly, the same hyperparameter is run multiple times and the results are averaged.
- This helps reduce the noise caused by randomness.
- For important candidate points, the number of repetitions can be increased to estimate the average more accurately.
- If the noise is severe, a “regression” approach can be used instead of “interpolation” to allow for some deviation (e.g., ridge regression), as sketched below.
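As a hedged sketch of that regression variant (the data and the ridge parameter below are illustrative assumptions, not part of the notes), one common trick is to add a small ridge term to the diagonal of $\Phi$, so the surrogate may deviate slightly from the noisy averages:

```python
import numpy as np

def fit_rbf(X, y, ridge=0.0, phi=lambda r: r**3):
    """Fit an RBF surrogate; ridge > 0 relaxes exact interpolation (regression-style fit).

    ridge=0 reproduces the interpolation conditions; a small positive value
    (a hypothetical choice, e.g. 1e-2) tolerates noise in the averaged y values.
    """
    X = np.atleast_2d(X)
    m, d = X.shape
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))
    Phi = Phi + ridge * np.eye(m)              # ridge term: allows s(x_i) != y_i
    P = np.hstack([X, np.ones((m, 1))])
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    b = np.concatenate([y, np.zeros(d + 1)])
    sol = np.linalg.solve(A, b)
    return sol[:m], sol[m:]                    # lambda, beta

# usage sketch: noisy repeated evaluations averaged per point, then a tolerant fit
X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([1.1, 0.05, 0.95])                # noisy averages of a quadratic
lam, beta = fit_rbf(X, y, ridge=1e-2)
```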
Code Examples
```python
import numpy as np

# -------------------------
# Example Objective Function
# -------------------------
def objective(x):
    # quadratic function with noise
    return np.sum(np.array(x)**2) + np.random.randn()*0.1

# -------------------------
# RBF Surrogate
# -------------------------
class RBF_Surrogate:
    def __init__(self, phi="cubic"):
        if phi == "gaussian":
            self.phi = lambda r: np.exp(-(r**2))
        elif phi == "linear":
            self.phi = lambda r: r
        else:  # cubic
            self.phi = lambda r: r**3

    def fit(self, X, y):
        self.X = np.array(X)
        self.y = np.array(y)
        N, d = self.X.shape

        Phi = np.zeros((N, N))
        for i in range(N):
            for j in range(N):
                Phi[i, j] = self.phi(np.linalg.norm(self.X[i] - self.X[j]))

        P = np.hstack([self.X, np.ones((N, 1))])
        A = np.block([
            [Phi, P],
            [P.T, np.zeros((d+1, d+1))]
        ])
        b = np.concatenate([self.y, np.zeros(d+1)])

        sol = np.linalg.solve(A, b)
        self.lmbda = sol[:N]
        self.beta = sol[N:]

    def predict(self, X_new):
        X_new = np.atleast_2d(X_new)
        y_pred = np.zeros(len(X_new))

        for k, xk in enumerate(X_new):
            rbf_sum = 0
            for i in range(len(self.X)):
                r = np.linalg.norm(xk - self.X[i])
                rbf_sum += self.lmbda[i] * self.phi(r)
            trend = np.dot(self.beta[:-1], xk) + self.beta[-1]
            y_pred[k] = rbf_sum + trend
        return y_pred

# -------------------------
# Data Storage
# -------------------------
class DataStore:
    def __init__(self, dim, m_init=5, bounds=(-2, 2)):
        self.dim = dim
        self.bounds = bounds
        self.m = m_init

        self.S = np.random.uniform(bounds[0], bounds[1], size=(m_init, dim))
        self.Y = np.array([objective(s) for s in self.S])
        self.rbf = RBF_Surrogate()
        self.update_rbf()

    def update_rbf(self):
        self.rbf.fit(self.S, self.Y)

    def add_point(self, x):
        y = objective(x)
        self.S = np.vstack([self.S, x])
        self.Y = np.append(self.Y, y)
        self.m += 1
        self.update_rbf()

# -------------------------
# Candidate Selection (RBF Score + Distance Score)
# -------------------------
def RBF_score(candidates, data, omega=0.5):
    s = data.rbf.predict(candidates)
    dist = np.array([np.min(np.linalg.norm(data.S - c, axis=1)) for c in candidates])

    # 0~1 scaling
    s_scaled = (s - s.min()) / (s.max() - s.min() + 1e-8)
    d_scaled = (dist.max() - dist) / (dist.max() - dist.min() + 1e-8)

    score = omega*d_scaled + (1-omega)*s_scaled
    return score

# -------------------------
# Optimization Loop
# -------------------------
def run_rbf(dim=2, n_iter=10):
    np.random.seed(0)
    data = DataStore(dim=dim)

    for it in range(n_iter):
        best_idx = np.argmin(data.Y)
        best_x = data.S[best_idx]

        # generate candidates
        local_candidates = best_x + np.random.uniform(-0.5, 0.5, size=(20, dim))
        global_candidates = np.random.uniform(data.bounds[0], data.bounds[1], size=(20, dim))
        candidates = np.vstack([local_candidates, global_candidates])

        # calculate scores
        score = RBF_score(candidates, data, omega=0.5)

        # select next point
        next_x = candidates[np.argmin(score)]
        data.add_point(next_x)

        print(f"Iter {it}: best_y={data.Y.min():.4f}, next_x={next_x}")

    return data

# execute
if __name__ == "__main__":
    result = run_rbf(dim=2, n_iter=10)
```
Gaussian Process (GP) + Expected Improvement (EI)
Gaussian Process (GP) regression is another popular surrogate model. It models the objective probabilistically through a mean and a covariance structure:

$$f(x) = \mu + Z(x)$$

- $\mu$: mean of the stochastic process (constant)
- $Z(x)$: the GP’s random part, $Z(x) \sim \mathcal{N}(0, \sigma^2)$
- values of $Z(x)$ at different locations are correlated based on their distance
Note
- Mean $\hat{f}(x)$: predicted value of $f(x)$ based on the observed data (most likely value, expectation).
- Variance $\hat{\sigma}^2(x)$: indicates how uncertain the prediction is.
- Near observed data points, the variance is very small (almost 0), meaning the prediction is very certain.
- In unexplored regions far from observed data points, the variance is large, meaning the prediction is very uncertain (illustrated in the sketch below).
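A minimal scikit-learn sketch of this behaviour (toy 1-D data of my own choosing; the same library is used in the code example later): the predictive standard deviation is tiny at an observed point and grows far from the data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# toy 1-D data: a few observed points of a hypothetical objective
X_obs = np.array([[-1.0], [0.0], [1.0]])
y_obs = (X_obs ** 2).ravel()

gpr = GaussianProcessRegressor(kernel=RBF(), random_state=0).fit(X_obs, y_obs)

# predictive std: ~0 at an observed point, large far from the data
_, std_at_obs = gpr.predict(np.array([[0.0]]), return_std=True)
_, std_far = gpr.predict(np.array([[3.0]]), return_std=True)
print(std_at_obs, std_far)   # near zero vs. close to the prior std (exact values depend on the fitted kernel)
```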
Correlation Structure
Correlation is defined using a kernel function, based on the assumption that “close inputs will have similar outputs”:

$$\operatorname{Corr}\big(Z(x), Z(x')\big) = \exp\Big(-\sum_{j=1}^{d} \theta_j\, \lvert x_j - x'_j \rvert^{p_j}\Big)$$

- $d$: dimension of the hyperparameters
- $\theta_j$: scaling parameter for each dimension (estimated via maximum likelihood estimation)
- $p_j$: smoothness parameter (determines the shape of the kernel, usually 1 or 2)
Given data $(x_1, f(x_1)), \dots, (x_m, f(x_m))$, the mean and variance at a new point $x$ can be estimated as follows:

$$\hat{f}(x) = \hat{\mu} + r(x)^\top R^{-1}\big(f - \hat{\mu}\mathbf{1}\big), \qquad \hat{\sigma}^2(x) = \sigma^2\big(1 - r(x)^\top R^{-1} r(x)\big)$$

- $R$: correlation matrix ($m \times m$)
- $f$: upper level objective vector (actual observed values)
Predictions at a new point
The prediction is a weighted average of the existing observations (their residuals from the mean $\hat{\mu}$).
- $r(x)^\top R^{-1}$: the weight part.
- $r(x)$: correlation vector between $x$ and the existing points
- $R$: correlation matrix among the existing points.
Hence, points that are closer to (more correlated with) $x$ have a higher weight in determining the prediction.
Uncertainty estimation at a new point
- $r(x)^\top R^{-1} r(x)$: measures how well $x$ is explained by the existing points.
- If $x$ is very close to existing points (a well-explored area), this value approaches 1, making the overall variance close to 0 (low uncertainty).
- Conversely, if $x$ is far from the existing points (an unexplored area), this value approaches 0, leading to higher variance (high uncertainty); a small numeric sketch follows.
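To tie the mean and variance formulas together, here is a small NumPy sketch (hypothetical points and parameter values; the notes estimate $\mu$, $\sigma^2$, and $\theta_j$ by maximum likelihood, while the sketch simply fixes them) that computes the predictive mean and variance at a new point:

```python
import numpy as np

# hypothetical 1-D observed points and values, with assumed constant mean and variance
X = np.array([[-1.0], [0.0], [1.0]])
f = np.array([1.0, 0.1, 0.9])
mu, sigma2 = f.mean(), 0.5          # illustrative values (normally estimated by MLE)
theta, p = 1.0, 2                   # kernel parameters (also normally fit by MLE)

def corr(a, b):
    # Corr(Z(a), Z(b)) = exp(-sum_j theta_j |a_j - b_j|^p_j)
    return np.exp(-np.sum(theta * np.abs(a - b) ** p))

m = len(X)
R = np.array([[corr(X[i], X[j]) for j in range(m)] for i in range(m)])

def predict(x_new):
    r = np.array([corr(x_new, X[i]) for i in range(m)])   # correlation vector r(x)
    w = np.linalg.solve(R, r)                              # R^{-1} r(x)
    mean = mu + w @ (f - mu)                               # weighted average of residuals
    var = sigma2 * (1.0 - r @ w)                           # shrinks to 0 near observed points
    return mean, var

print(predict(np.array([0.1])))    # close to an observed point -> small variance
print(predict(np.array([2.5])))    # far from the data -> variance approaches sigma2
```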
Summary (Final optimization loop)
- Compute $f(x)$ at a few initial points.
- Train the GP model to estimate $\hat{f}(x)$ and $\hat{\sigma}^2(x)$.
- Compute the Expected Improvement $\mathrm{EI}(x)$ for candidate points $x$.
- Select $x_{\text{next}} = \arg\max_x \mathrm{EI}(x)$.
- Evaluate $f(x_{\text{next}})$ in practice, add it to the dataset, and go back to step 2.
Expected Improvement (EI)
To select the next evaluation point, we consider the expected improvement over the current best; the next point is the $x$ that maximizes $\mathrm{EI}(x)$, which can be found with genetic algorithms or another inner optimizer.
- Current best objective function value: $f_{\min} = \min_{i=1,\dots,m} f(x_i)$
- Improvement at a candidate point $x$: $I(x) = \max\big(f_{\min} - Y(x),\, 0\big)$
- $Y(x)$: a random variable reflecting the prediction uncertainty at $x$
- it follows a normal distribution with mean $\hat{f}(x)$ and variance $\hat{\sigma}^2(x)$
- Expected Improvement:

$$\mathrm{EI}(x) = \mathbb{E}\big[I(x)\big] = \big(f_{\min} - \hat{f}(x)\big)\,\Phi(Z) + \hat{\sigma}(x)\,\phi(Z), \qquad Z = \frac{f_{\min} - \hat{f}(x)}{\hat{\sigma}(x)}$$

- $\Phi$: Cumulative Distribution Function (CDF) of the standard normal
- $\phi$: Probability Density Function (PDF) of the standard normal (a worked numeric example follows)
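As a minimal numeric illustration of the closed form (the values of mu, sigma, and f_min below are made up), assuming a minimization problem:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_min):
    """Closed-form EI for minimization: E[max(f_min - Y, 0)], Y ~ N(mu, sigma^2)."""
    if sigma == 0.0:
        return 0.0                    # no uncertainty -> no expected improvement
    z = (f_min - mu) / sigma
    return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# illustrative numbers: a candidate predicted slightly worse than the best so far,
# but with enough uncertainty to still have a positive expected improvement
print(expected_improvement(mu=0.30, sigma=0.20, f_min=0.25))   # ~0.06
```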
Code Example
```python
import numpy as np
import array
from deap import base, creator, tools, algorithms
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from scipy.stats import norm

# -------------------------
# Black-box Objective Function
# -------------------------
def objective(x):
    # quadratic function with noise
    return np.sum(np.array(x) ** 2) + np.random.randn() * 0.1

# -------------------------
# Data Storage Class
# -------------------------
class DataStore:
    def __init__(self, dim, m_init=5, bounds=(-2, 2)):
        self.dim = dim
        self.bounds = bounds
        self.m = m_init  # initial number of samples

        self.S = np.random.uniform(bounds[0], bounds[1], size=(m_init, dim))
        self.Y = np.array([objective(s) for s in self.S])

        # GP surrogate
        self.gpr = GaussianProcessRegressor(kernel=RBF(), random_state=0)
        self.update_gp()

    def update_gp(self):
        # fit with a 1-D target so predict(..., return_std=True) gives 1-D mean/std
        self.gpr.fit(self.S, self.Y)

    def add_point(self, x):
        y = objective(x)
        self.S = np.vstack([self.S, x])
        self.Y = np.append(self.Y, y)
        self.m += 1
        self.update_gp()

# -------------------------
# EI function
# -------------------------
def Expected_improvement(x, data):
    x_to_predict = np.array(x).reshape(1, -1)
    mu, sigma = data.gpr.predict(x_to_predict, return_std=True)

    greater_is_better = False
    if greater_is_better:
        loss_optimum = np.max(data.Y[:data.m])
    else:
        loss_optimum = np.min(data.Y[:data.m])

    scaling_factor = (-1) ** (not greater_is_better)

    with np.errstate(divide='ignore'):
        Z = scaling_factor * (mu - loss_optimum) / sigma
        expected_improvement = scaling_factor * (mu - loss_optimum) * norm.cdf(Z) + sigma * norm.pdf(Z)
        expected_improvement[sigma == 0.0] = 0.0
    # DEAP expects a fitness tuple; FitnessMin minimizes, so return -EI to maximize EI
    return (-float(expected_improvement[0]),)

# -------------------------
# DEAP Setup
# -------------------------
def run_gp_ei(dim=2, n_iter=10):
    data = DataStore(dim=dim)

    # Fitness definition
    creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    creator.create("Individual", array.array, typecode='d', fitness=creator.FitnessMin)

    toolbox = base.Toolbox()

    # For each dimension, create attribute generator
    for i in range(dim):
        INT_MIN, INT_MAX = data.bounds
        toolbox.register(f"attr_float_{i}", np.random.uniform, INT_MIN, INT_MAX)

    # Generate individuals (use a tuple, not a one-shot generator, so it can be reused)
    toolbox.register("individual", tools.initCycle, creator.Individual,
                     tuple(getattr(toolbox, f"attr_float_{i}") for i in range(dim)), n=1)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    # Evaluate = EI
    toolbox.register("evaluate", Expected_improvement, data=data)
    toolbox.register("mate", tools.cxTwoPoint)
    toolbox.register("mutate", tools.mutUniformInt,
                     low=[data.bounds[0]]*dim, up=[data.bounds[1]]*dim, indpb=0.2)
    toolbox.register("select", tools.selTournament, tournsize=3)

    # -------------------------
    # Optimization Loop
    # -------------------------
    for it in range(n_iter):
        pop = toolbox.population(n=20)
        hof = tools.HallOfFame(1)

        algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.3, ngen=15,
                            halloffame=hof, verbose=False)

        next_x = np.array(hof[0])
        data.add_point(next_x)

        print(f"Iter {it}: best_y={data.Y.min():.4f}, next_x={next_x}")

    return data

# execute
if __name__ == "__main__":
    result = run_gp_ei(dim=2, n_iter=10)
```