Learning by building: A convex solver
Theory
We are interested in solving the following convex optimisation problem:

$$\begin{aligned}
\text{minimise} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}$$

where $x \in \mathbb{R}^n$ is our (primal) optimisation variable, $f_0$ is our objective function, $f_1, \dots, f_m$ are our inequality constraints, and $Ax = b$ is an affine equality constraint.
For this to be a convex optimisation problem all of the functions $f_i$ must be convex, which means that each function's domain is a convex set, and for all $x, y \in \operatorname{dom} f_i$ and $\theta \in [0, 1]$ the following holds:

$$f_i(\theta x + (1 - \theta) y) \le \theta f_i(x) + (1 - \theta) f_i(y)$$

We further require that all $f_i$ are twice continuously differentiable.
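As a quick numerical sanity check, the inequality can be spot-checked for a known convex function; the sketch below uses $f(x) = x^2$ as an arbitrary example:

```python
import numpy as np

# Spot-check the convexity inequality
#   f(theta*x + (1 - theta)*y) <= theta*f(x) + (1 - theta)*f(y)
# for a known convex function, here the example f(x) = x^2.
f = lambda x: x**2

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-10.0, 10.0, size=2)
    theta = rng.uniform(0.0, 1.0)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-12  # holds up to floating-point error
```

Of course a finite sample proves nothing, but a single counterexample would immediately disprove convexity.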
Convex optimisation problems are well studied because
- They are significantly easier to solve than non-convex problems
- Many problems can be recast or approximated as convex problems
A consequence of the convexity of the objective function is that if we can find any local minimum of $f_0$, then it must also be a global minimum, and if $f_0$ is strongly convex, then there exists a unique global minimum.
We will build up to how to solve a fully constrained problem, starting from a significantly easier task, solving an unconstrained convex optimisation problem.
An unconstrained problem
Let's first tackle the simpler problem of

$$\text{minimise} \quad f(x)$$

For simple forms of $f$ this may be analytically solvable, but for now we'll assume it isn't. We can use the fact that the function is convex and simply try to find a local minimum; if we do, we will know that this is also a global minimum, and we have succeeded.
To find a local minima we can use a descent method:
given x  # Starting point
while True:
    choose dx  # Descent direction
    choose t   # Step size, chosen to minimise f(x + t*dx)
    update x = x + t * dx
    if (stopping criterion):
        break
This is pretty straightforward to understand: starting from an initial guess $x$, we first pick a direction in which to search ($\Delta x$), then choose the amount $t$ we move in that direction, then update our current point $x \leftarrow x + t\,\Delta x$, and start again, until we reach some pre-determined stopping threshold.
The Newton step
We want to pick as good a $\Delta x$ as possible, so that our method converges as quickly as possible. Using the fact that $f$ is twice differentiable we can write the second-order approximation of $f$ near our current point $x$ as

$$\hat{f}(x + v) = f(x) + \nabla f(x)^T v + \tfrac{1}{2} v^T \nabla^2 f(x)\, v$$

which for small $v$ should be a pretty good approximation. The choice of $v$ which minimises this approximation is called the Newton step:

$$\Delta x_{\text{nt}} = -\nabla^2 f(x)^{-1} \nabla f(x)$$

As an alternative we could use the first-order approximation of $f$, which decreases fastest along $\Delta x = -\nabla f(x)$. This choice of step direction is just the negative gradient, and the resulting method is called gradient descent. Newton's method generally converges in significantly fewer steps, but requires computing and inverting the Hessian $\nabla^2 f(x)$.
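To see why the Newton step is so effective, consider a quadratic objective: there the second-order approximation is exact, so a single Newton step lands on the minimiser. A small sketch (the matrix `P` and vector `q` are arbitrary example data):

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T P x + q^T x the second-order
# approximation is exact, so a single Newton step reaches the minimiser.
P = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite Hessian
q = np.array([1.0, -1.0])

grad = lambda x: P @ x + q
x = np.array([10.0, -10.0])              # arbitrary starting point

dx_nt = -np.linalg.solve(P, grad(x))     # Newton step: -(Hessian)^-1 grad
x_new = x + dx_nt

x_star = np.linalg.solve(P, -q)          # exact minimiser solves P x* = -q
assert np.allclose(x_new, x_star)        # one step suffices
```

Gradient descent on the same problem would take many iterations, with the count growing as `P` becomes more ill-conditioned.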
Line search
The process of choosing $t$, the amount by which we move along the search direction to update $x$, is called a line search, as we are searching along the line $\{x + t\,\Delta x \mid t \ge 0\}$. To be explicit, we find $t$ from:

$$t = \operatorname{argmin}_{s \ge 0} f(x + s\,\Delta x)$$

Again, for some simple forms of $f$ it may be possible to find the $t$ that exactly minimises this; however, for our purposes an approximate method of finding a reasonably good choice of $t$ is enough. One such method is backtracking line search:
given x, Dx, alpha, beta  # 0 < alpha < 0.5, 0 < beta < 1
t = 1
while f(x + t*Dx) > f(x) + alpha * t * grad(f).T @ Dx:
    t = beta * t
This is called backtracking line search because we start with $t = 1$ and then "backtrack" until we find a suitable value of $t$.
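The pseudocode above translates almost directly into Python; the following is a minimal sketch (function and variable names are my own, not part of any library):

```python
import numpy as np

def backtracking(f, grad_fx, x, dx, alpha=0.2, beta=0.5):
    """Backtracking line search: shrink t until sufficient decrease holds."""
    t = 1.0
    fx = f(x)
    descent = np.dot(grad_fx, dx)  # directional derivative; negative for a descent direction
    while f(x + t * dx) > fx + alpha * t * descent:
        t *= beta
    return t

# Example on f(x) = ||x||^2 starting at x = (1, 1), stepping along -grad
f = lambda x: float(np.dot(x, x))
x = np.array([1.0, 1.0])
g = 2 * x                      # gradient of ||x||^2
t = backtracking(f, g, x, -g)  # here t = 0.5, which happens to hit the exact minimum
```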
Stopping criterion
The quantity

$$\lambda^2 = \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x) = -\nabla f(x)^T \Delta x_{\text{nt}}$$

is called the Newton decrement. It has several interpretations: firstly, $-\lambda^2$ is the directional derivative of $f$ in the direction of the Newton step, and secondly it provides us with an estimate of the difference between our current value of $f$ and the optimal value:

$$f(x) - \inf_y \hat{f}(y) = \frac{\lambda^2}{2}$$

where this uses the second-order approximation $\hat{f}$ of $f$ near $x$ and the fact that the Newton step is the minimiser of this approximation.

This means that we can use $\lambda^2 / 2$ as our stopping criterion: once this drops below some preset threshold $\epsilon$, we can assume that our estimate of the optimal point of $f$ is "good enough".
Newton's method in full
Putting this all together we get Newton's method for minimising an unconstrained convex objective function:
given:
    x                # Starting point
    0 < alpha < 0.5  # Backtracking line search parameters
    0 < beta < 1.0
    epsilon > 0      # Stopping tolerance

while True:
    Compute the Newton step Dx
    Determine t using backtracking line search
    Step: x += t * Dx
    Compute the Newton decrement lambda^2 = -grad(f).T @ Dx
    if 0.5 * lambda^2 < epsilon:
        break
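Here is a minimal self-contained sketch of the whole method (distinct from the full implementation later on), applied to the smooth, strictly convex function $f(x) = \sum_i e^{x_i} + e^{-x_i}$, whose minimiser is $x = 0$:

```python
import numpy as np

def newtons_method(f, grad, hess, x0, alpha=0.2, beta=0.5, eps=1e-8, max_iters=100):
    """Unconstrained Newton's method with backtracking line search."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iters):
        g = grad(x)
        dx = -np.linalg.solve(hess(x), g)  # Newton step
        lam2 = -np.dot(g, dx)              # Newton decrement squared
        if lam2 / 2 <= eps:
            break
        t = 1.0                            # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * np.dot(g, dx):
            t *= beta
        x = x + t * dx
    return x

# f(x) = sum_i exp(x_i) + exp(-x_i): smooth, strictly convex, minimiser x = 0
f = lambda x: float(np.sum(np.exp(x) + np.exp(-x)))
grad = lambda x: np.exp(x) - np.exp(-x)
hess = lambda x: np.diag(np.exp(x) + np.exp(-x))

x_star = newtons_method(f, grad, hess, [2.0, -1.5])  # converges to ~0 in a handful of steps
```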
Newton's method has several advantages over first order methods like gradient descent:
- It is invariant to an affine coordinate transformation
- It is fast to converge (the error reduces quadratically in log-space)
- It scales well with problem size
- It is fairly insensitive to parameter choice ($\alpha$, $\beta$)
The only disadvantage is the cost and complexity of computing and inverting the Hessian, although in practice we never actually compute the inverse of the Hessian, but instead solve the system $\nabla^2 f(x)\, \Delta x = -\nabla f(x)$ using a matrix factorisation, e.g. Cholesky.
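For example, with NumPy the Newton system can be solved via a Cholesky factorisation of the (positive definite) Hessian rather than by forming an inverse; a sketch with made-up numbers:

```python
import numpy as np

# Solve H @ dx = -g via the Cholesky factorisation H = L @ L.T
# instead of forming H^{-1} explicitly.
H = np.array([[4.0, 1.0], [1.0, 3.0]])  # a positive definite "Hessian"
g = np.array([1.0, 2.0])                # a "gradient"

L = np.linalg.cholesky(H)
y = np.linalg.solve(L, -g)    # forward solve:  L @ y = -g
dx = np.linalg.solve(L.T, y)  # backward solve: L.T @ dx = y
# (np.linalg.solve uses a general LU here; scipy.linalg.solve_triangular
#  would additionally exploit the triangular structure)

assert np.allclose(H @ dx, -g)
```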
Equality constrained problems
Now we will look at problems of the form:

$$\begin{aligned}
\text{minimise} \quad & f(x) \\
\text{subject to} \quad & Ax = b
\end{aligned}$$

We can solve this using the method of Lagrange multipliers, i.e. first we form the Lagrangian:

$$L(x, \nu) = f(x) + \nu^T (Ax - b)$$

and then the optimality conditions are found by setting the derivatives of the Lagrangian to zero:

$$\nabla f(x^*) + A^T \nu^* = 0, \qquad Ax^* = b$$

These are also known as the KKT conditions, or the dual and primal feasibility conditions respectively. The variable $\nu$ is the dual variable, also known as the Lagrange multiplier.
As always, for some simpler problems these equations may be directly solvable, leading immediately to an optimal primal-dual pair $(x^*, \nu^*)$. But more generally we have to proceed as before, repeatedly solving an approximation to the full problem and iterating closer and closer to the true optimal point.

As before we will use a descent method to optimise the above problem, which means that we assume we are at some current point $x$ and first need to compute $\Delta x$, the direction to step in.
Firstly we assume that we are starting from a feasible point $x$, so $Ax = b$ (i.e. the current point satisfies the constraints), and as before we can work with a second-order approximation to $f$ and write

$$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \tfrac{1}{2} \Delta x^T \nabla^2 f(x)\, \Delta x$$

If we drop this into the KKT conditions we end up with the following system of equations:

$$\nabla^2 f(x)\, \Delta x + A^T w = -\nabla f(x), \qquad A\, \Delta x = 0$$

or we can write this as

$$\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}$$

which we can solve to recover $\Delta x$, the search direction, and $w$, the dual variables.
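For a concrete example, the KKT system for a small equality-constrained quadratic can be assembled and solved directly with NumPy (the problem data here are arbitrary):

```python
import numpy as np

# Minimise 0.5 x^T P x subject to the single affine constraint x1 + x2 = 1.
# At a feasible x, the Newton step solves the KKT system directly.
P = np.array([[2.0, 0.0], [0.0, 4.0]])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x = np.array([0.5, 0.5])                  # feasible starting point: A @ x == b
g = P @ x                                 # gradient of the quadratic at x
n, k = P.shape[0], A.shape[0]

KKT = np.block([[P, A.T], [A, np.zeros((k, k))]])
rhs = np.concatenate([-g, np.zeros(k)])
sol = np.linalg.solve(KKT, rhs)
dx, w = sol[:n], sol[n:]

x_new = x + dx                              # exact for a quadratic objective
assert np.allclose(A @ x_new, b)            # A @ dx = 0 keeps us feasible
assert np.allclose(P @ x_new + A.T @ w, 0)  # stationarity of the Lagrangian
```

Because the objective is quadratic, one Newton step solves the problem exactly; for a general convex $f$ this becomes one iteration of the descent loop.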
Otherwise we can continue as before, i.e. use a backtracking line search to choose $t$, the step size, and terminate the iterations once we reach the stopping criterion.
Inequality constrained problems
Finally we can proceed to the full problem:

$$\begin{aligned}
\text{minimise} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}$$
We follow a similar playbook to before - rather than directly solving the full problem we solve an approximation to the full problem that is easier to handle, and then refine our solution iteratively until we reach the desired accuracy.
To proceed we replace the "hard" inequality constraints with "soft" logarithmic barrier functions:

$$\hat{I}_-(u) = -\frac{1}{t} \log(-u)$$

This has several desirable properties:
- It grows to $\infty$ as $u$ approaches zero from below
- It is continuous
- It is twice differentiable
- The "steepness" of the barrier is controllable with the barrier parameter $t$
This indicates how we can proceed: we construct and solve

$$\begin{aligned}
\text{minimise} \quad & t f_0(x) - \sum_{i=1}^{m} \log(-f_i(x)) \\
\text{subject to} \quad & Ax = b
\end{aligned}$$

for a given value of $t$. Then we increase $t$ and re-solve, again and again, until we've reached the stopping criterion (the suboptimality at the minimiser for a given $t$ is bounded by $m/t$).
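A one-dimensional example makes this concrete: minimising $f_0(x) = x$ subject to $1 - x \le 0$ has optimum $x^* = 1$, and the barrier objective $t x - \log(x - 1)$ is minimised at $x(t) = 1 + 1/t$. The sketch below follows this central path with a guarded 1-d Newton iteration (my own toy construction, not the implementation below):

```python
# Minimise f0(x) = x subject to f1(x) = 1 - x <= 0; the optimum is x* = 1.
# The barrier objective phi(x) = t*x - log(x - 1) is minimised at x(t) = 1 + 1/t,
# so the central path approaches x* as t grows.
for t in [1.0, 10.0, 100.0, 1000.0]:
    x = 2.0                                # strictly feasible start
    for _ in range(50):                    # 1-d Newton with a feasibility guard
        g = t - 1.0 / (x - 1.0)            # phi'(x)
        h = 1.0 / (x - 1.0) ** 2           # phi''(x) > 0
        step = g / h
        while x - step <= 1.0:             # backtrack to stay strictly feasible
            step *= 0.5
        x -= step
    assert abs(x - (1.0 + 1.0 / t)) < 1e-8
    # the suboptimality x(t) - x* = 1/t matches the m/t bound (m = 1 here)
```

Note how the gap to the true optimum shrinks as $1/t$, which is exactly why the outer loop can stop once $m/t$ drops below the desired tolerance.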
Phase 1 methods
The barrier method requires a strictly feasible starting point, as $-\log(-f_i(x))$ blows up as $f_i(x) \to 0$ from below and is undefined for $f_i(x) \ge 0$. In general we will not know (a priori) a strictly feasible starting point, but we can use our existing methodology to determine one. Methods such as these (to determine strictly feasible starting points) are called "Phase 1 methods".
A simple phase 1 method is to first solve the modified problem:

$$\begin{aligned}
\text{minimise} \quad & s \\
\text{subject to} \quad & f_i(x) \le s, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}$$

where we can choose the starting value $s_0 = \max_i f_i(x_0)$ (plus a small margin), where $x_0$ is any point which satisfies $Ax_0 = b$. In simple terms:
- We pick a point which satisfies our affine equality constraints
- We add a "tolerance" variable $s$ to our constraints such that this starting point satisfies all of the relaxed constraints
- We minimise this tolerance variable
If after this procedure we have $s^* < 0$, then we have a strictly feasible starting point (the corresponding $x^*$), because by construction we will have all $f_i(x^*) \le s^* < 0$. If on the other hand we have $s^* > 0$, then we know that the problem is primal infeasible.
Satisfyingly this reuses the existing implementation of the barrier method!
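To illustrate the phase 1 idea on two tiny one-dimensional instances, the sketch below minimises $s(x) = \max_i f_i(x)$ by brute-force grid search (a stand-in for the real solver):

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 100001)

def min_max_violation(constraints):
    """min over x of s(x) = max_i f_i(x), by brute-force grid search."""
    s = np.max([f(xs) for f in constraints], axis=0)
    return float(s.min())

# Feasible: f1 = x - 1 <= 0, f2 = -x <= 0 (satisfied strictly on (0, 1))
s_feas = min_max_violation([lambda x: x - 1.0, lambda x: -x])
assert s_feas < 0    # strictly feasible points exist

# Infeasible: f1 = x - 1 <= 0, f2 = 2 - x <= 0 (needs x <= 1 and x >= 2)
s_infeas = min_max_violation([lambda x: x - 1.0, lambda x: 2.0 - x])
assert s_infeas > 0  # best achievable "tolerance" is 0.5 > 0
```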
Implementation
The following is a ~350 line Python implementation of this method, with minimal dependencies and clear, readable code. Obviously it is completely impractical for real-world use, but writing it was an interesting and satisfying learning exercise.
For more details and better explanations of everything see the brilliant book "Convex Optimization" by Stephen Boyd and Lieven Vandenberghe, available entirely for free here.
"""
ConvexiToy - A toy convex solver
A very simple pure-python implementation of an interior point convex solver.
"""
from typing import Callable
import numpy as np
from dataclasses import dataclass
from enum import Enum
from numpy.typing import NDArray
type FloatArray = NDArray[np.floating]
@dataclass
class Objective:
"""Optimization objective with function, gradient and hessian."""
n: int
func: Callable[[FloatArray], float]
grad: Callable[[FloatArray], FloatArray]
hess: Callable[[FloatArray], FloatArray]
@classmethod
def quadratic(cls, P: FloatArray, q: FloatArray | None = None) -> "Objective":
n = P.shape[0]
assert P.shape == (n, n), "P must be square"
if q is None:
q = np.zeros(n)
def func(x: FloatArray) -> float:
return 0.5 * x.T @ P @ x + q.T @ x
def grad(x: FloatArray) -> FloatArray:
return P @ x + q
def hess(_: FloatArray) -> FloatArray:
return P
return cls(n, func, grad, hess)
@dataclass
class InEqConstraint:
"""Inequality constraint f(x) <= 0."""
func: Callable[[FloatArray], float]
grad: Callable[[FloatArray], FloatArray]
hess: Callable[[FloatArray], FloatArray] | None = None
@classmethod
def affine(cls, a: FloatArray, b: float) -> "InEqConstraint":
def func(x: FloatArray) -> float:
return np.dot(a, x) - b
def grad(_: FloatArray) -> FloatArray:
return a
return cls(func, grad)
@dataclass
class AffineEqConstraint:
"""Affine equality constraint Ax = b."""
A: FloatArray
b: FloatArray
class Status(Enum):
"""Solution status."""
OPTIMAL = 1
FAILED = 2
INFEASIBLE = 3
@dataclass
class Solution:
"""Optimization solution."""
status: Status
solution: FloatArray | None = None
def backtracking_ls(
func: Callable[[FloatArray], float],
g: FloatArray,
x: FloatArray,
Dx: FloatArray,
alpha: float,
beta: float,
t0: float = 1.0,
) -> float:
"""Backtracking line search."""
t = t0
f = func(x)
agDx = alpha * np.dot(g, Dx)
while func(x + t * Dx) > f + t * agDx:
t *= beta
return t
def constrained_newton_step(H: FloatArray, g: FloatArray, A: FloatArray) -> FloatArray:
"""Compute constrained Newton step."""
n = H.shape[0]
k = A.shape[0]
if k == 0:
# No equality constraints - just solve H * dx = -g
# Add regularization if needed
coefs = H
targets = -g
else:
coefs = np.block([[H, A.T], [A, np.zeros((k, k))]])
targets = np.concatenate([-g, np.zeros(k)])
try:
sol = np.linalg.solve(coefs, targets)[:n]
except np.linalg.LinAlgError:
# Add regularization to H block only
coefs = coefs.copy()
coefs[:n, :n] += 1e-6 * np.eye(n)
sol = np.linalg.solve(coefs, targets)[:n]
return sol
def constrained_newtons_method(
func: Callable[[FloatArray], float],
grad: Callable[[FloatArray], FloatArray],
hess: Callable[[FloatArray], FloatArray],
A: FloatArray,
b: FloatArray,
x0: FloatArray,
alpha: float = 0.2,
beta: float = 0.5,
epsilon: float = 1e-6,
max_iters: int = 100,
constraint_tol: float = 1e-8,
ineqs: list[InEqConstraint] | None = None,
) -> FloatArray:
"""Equality-constrained Newton's method."""
assert np.allclose(A @ x0, b, atol=constraint_tol), "Infeasible starting point"
x = x0.copy()
for i in range(max_iters):
g = grad(x)
Dx = constrained_newton_step(hess(x), g, A)
t0 = 1.0
if ineqs is not None:
while any([c.func(x + t0 * Dx) > -constraint_tol for c in ineqs]):
t0 *= beta
t = backtracking_ls(func, g, x, Dx, alpha, beta, t0=t0)
x += t * Dx
lambda2 = -np.dot(g, Dx)
if lambda2 / 2 <= epsilon:
return x
print(f"Warning: {max_iters=} reached")
return x
@dataclass
class SolverParameters:
"""Solver parameters."""
t0: float = 1.0
mu: float = 50.0
alpha: float = 0.2
beta: float = 0.5
epsilon_inner: float = 1e-6
epsilon_outer: float = 1e-6
max_iters_inner: int = 100
max_iters_outer: int = 100
constraint_tol: float = 1e-8
def barrier_method(
obj: Objective,
x0: FloatArray,
affine_eq: AffineEqConstraint | None = None,
ineqs: list[InEqConstraint] | None = None,
solver_params: SolverParameters | None = None,
) -> Solution:
"""Interior point barrier method."""
if solver_params is None:
solver_params = SolverParameters()
x = x0.copy()
if affine_eq is None:
A = np.empty((0, obj.n))
b = np.empty(0)
else:
A = affine_eq.A
b = affine_eq.b
    if not np.allclose(A @ x0, b, atol=solver_params.constraint_tol):
return Solution(Status.FAILED)
if ineqs is None:
x = constrained_newtons_method(
obj.func,
obj.grad,
obj.hess,
A,
b,
x,
solver_params.alpha,
solver_params.beta,
solver_params.epsilon_inner,
solver_params.max_iters_inner,
solver_params.constraint_tol,
ineqs=ineqs,
)
return Solution(Status.OPTIMAL, x)
    if not all(c.func(x0) < 0.0 for c in ineqs):  # barrier needs strict feasibility
        return Solution(Status.FAILED)
m = len(ineqs)
t = float(solver_params.t0)
def func_aug(x: FloatArray) -> float:
val = t * obj.func(x)
for c in ineqs:
val -= np.log(-c.func(x))
return val
def grad_aug(x: FloatArray) -> FloatArray:
g = t * obj.grad(x)
for c in ineqs:
gx = c.grad(x)
fx = c.func(x)
g -= gx / fx
return g
def hess_aug(x: FloatArray) -> FloatArray:
H = t * obj.hess(x)
for c in ineqs:
gx = c.grad(x)
fx = c.func(x)
if c.hess is None:
H += np.outer(gx, gx) / (fx**2)
else:
H += np.outer(gx, gx) / (fx**2) - c.hess(x) / fx
return H
for _ in range(solver_params.max_iters_outer):
x = constrained_newtons_method(
func_aug,
grad_aug,
hess_aug,
A,
b,
x,
solver_params.alpha,
solver_params.beta,
solver_params.epsilon_inner,
solver_params.max_iters_inner,
ineqs=ineqs,
)
if m / t < solver_params.epsilon_outer:
return Solution(Status.OPTIMAL, x)
t *= solver_params.mu
print(f"Warning: failed to converge in {solver_params.max_iters_outer=}")
return Solution(Status.FAILED, x)
def construct_phase_1_problem(
n: int,
affine_eq: AffineEqConstraint | None,
ineqs: list[InEqConstraint],
) -> tuple[Objective, AffineEqConstraint | None, list[InEqConstraint]]:
"""Construct phase 1 feasibility problem."""
def p1_func(x_s: FloatArray) -> float:
return x_s[-1]
def p1_grad(x_s: FloatArray) -> FloatArray:
g = np.zeros_like(x_s)
g[-1] = 1.0
return g
def p1_hess(x_s: FloatArray) -> FloatArray:
n = len(x_s)
return np.zeros((n, n))
obj = Objective(n + 1, p1_func, p1_grad, p1_hess)
new_ineqs = []
for c in ineqs:
def func(x_s: FloatArray, c=c) -> float:
return c.func(x_s[:-1]) - x_s[-1]
def grad(x_s: FloatArray, c=c) -> FloatArray:
return np.concatenate([c.grad(x_s[:-1]), [-1.0]])
if c.hess is not None:
def hess(x_s: FloatArray, c=c) -> FloatArray:
n = len(x_s) - 1
            return np.block(
                [[c.hess(x_s[:-1]), np.zeros((n, 1))], [np.zeros((1, n)), 0]]
            )
else:
hess = None
new_ineqs.append(InEqConstraint(func, grad, hess))
if affine_eq is not None:
A = affine_eq.A
k = A.shape[0]
new_A = np.block([A, np.zeros((k, 1))])
new_affine_eq = AffineEqConstraint(new_A, affine_eq.b)
else:
new_affine_eq = None
return obj, new_affine_eq, new_ineqs
def phase_1(
n: int,
affine_eq: AffineEqConstraint | None = None,
ineqs: list[InEqConstraint] | None = None,
solver_params: SolverParameters | None = None,
) -> Solution:
"""Phase 1 to find feasible starting point."""
if solver_params is None:
solver_params = SolverParameters()
if affine_eq is None:
x0 = np.zeros(n)
else:
x0 = np.linalg.pinv(affine_eq.A) @ affine_eq.b
if ineqs is None:
return Solution(Status.OPTIMAL, x0)
s0 = max(c.func(x0) for c in ineqs)
if s0 < 0.0:
return Solution(Status.OPTIMAL, x0)
s0 += 1e-5
x_s0 = np.concatenate([x0, [s0]])
p1_obj, p1_aeq, p1_ineqs = construct_phase_1_problem(n, affine_eq, ineqs)
sol = barrier_method(
p1_obj,
x_s0,
p1_aeq,
p1_ineqs,
solver_params,
)
if not sol.status == Status.OPTIMAL:
return Solution(Status.INFEASIBLE)
x = sol.solution[:-1]
s = sol.solution[-1]
if s > solver_params.constraint_tol:
return Solution(Status.INFEASIBLE)
return Solution(Status.OPTIMAL, x)
def solve(
obj: Objective,
affine_eq: AffineEqConstraint | None = None,
ineqs: list[InEqConstraint] | None = None,
solver_params: SolverParameters | None = None,
) -> Solution:
"""Solve convex optimization problem."""
# Phase 1 to determine feasible starting point
p1_sol = phase_1(obj.n, affine_eq, ineqs, solver_params)
if not p1_sol.status == Status.OPTIMAL:
return Solution(Status.INFEASIBLE)
x0 = p1_sol.solution
return barrier_method(
obj,
x0,
affine_eq,
ineqs,
solver_params,
)