Learning by building: A convex solver
Theory
We are interested in solving the following convex optimisation problem:

$$\begin{aligned}
\text{minimise} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}$$

where $x \in \mathbb{R}^n$ is our (primal) optimisation variable, $f_0$ is our objective function, $f_1, \dots, f_m$ are our inequality constraints, and $Ax = b$ is an affine equality constraint.
For this to be a convex optimisation problem all of the functions $f_i$ must be convex, which means that each function's domain is a convex set, and for all $x, y \in \operatorname{dom} f_i$ and $\theta \in [0, 1]$ the following holds:

$$f_i(\theta x + (1 - \theta) y) \le \theta f_i(x) + (1 - \theta) f_i(y)$$

We further require that all $f_i$ are twice continuously differentiable.
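As a quick numerical sanity check, the inequality can be spot-checked for a known convex function; the sketch below uses $f(x) = x^2$ as an arbitrary example:

```python
import numpy as np

# Spot-check the convexity inequality
#   f(theta*x + (1 - theta)*y) <= theta*f(x) + (1 - theta)*f(y)
# for a known convex function, here the example f(x) = x^2.
f = lambda x: x**2

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-10.0, 10.0, size=2)
    theta = rng.uniform(0.0, 1.0)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-12  # holds up to floating-point error
```

Of course a finite sample proves nothing, but a single counterexample would immediately disprove convexity.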
Convex optimisation problems are well studied because
- They are significantly easier to solve than non-convex problems
- Many problems can be recast or approximated as convex problems
A consequence of the convexity of the objective function is that if we can find any local minimum of $f_0$, then it must also be a global minimum, and if $f_0$ is strongly convex, then there exists a unique global minimum.
We will build up to how to solve a fully constrained problem, starting from a significantly easier task, solving an unconstrained convex optimisation problem.
An unconstrained problem
Let's first tackle the simpler problem of

$$\text{minimise} \quad f(x)$$

For simple forms of $f$ this may be analytically solvable, but for now we'll assume it isn't. We can use the fact that the function is convex and simply try to find a local minimum; if we do, we will know that this is also a global minimum, and we have succeeded.
To find a local minima we can use a descent method:
given x  # Starting point
while True:
    choose dx  # Descent direction
    choose t   # Step size, chosen to minimise f(x + t*dx)
    update x = x + t * dx
    if (stopping criterion):
        break
This is pretty straightforward to understand: starting from an initial guess $x$, we first pick a direction in which to search ($\Delta x$), then choose the amount $t$ we move in that direction, then update our current point $x \leftarrow x + t\,\Delta x$, and start again, until we reach some pre-determined stopping threshold.
The Newton step
We want to pick as good a $\Delta x$ as possible, so that our method converges as quickly as possible. Using the fact that $f$ is twice differentiable we can write the second-order approximation of $f$ near our current point $x$ as

$$\hat{f}(x + v) = f(x) + \nabla f(x)^T v + \tfrac{1}{2} v^T \nabla^2 f(x)\, v$$

which for small $v$ should be a pretty good approximation. The choice of $v$ which minimises this approximation is called the Newton step:

$$\Delta x_{\text{nt}} = -\nabla^2 f(x)^{-1} \nabla f(x)$$

As an alternative we could use the first-order approximation of $f$, which decreases fastest along $\Delta x = -\nabla f(x)$. This choice of step direction is just the negative gradient, and the resulting method is called gradient descent. Newton's method generally converges in significantly fewer steps, but requires computing and inverting the Hessian $\nabla^2 f(x)$.
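To see why the Newton step is so effective, consider a quadratic objective: there the second-order approximation is exact, so a single Newton step lands on the minimiser. A small sketch (the matrix `P` and vector `q` are arbitrary example data):

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T P x + q^T x the second-order
# approximation is exact, so a single Newton step reaches the minimiser.
P = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite Hessian
q = np.array([1.0, -1.0])

grad = lambda x: P @ x + q
x = np.array([10.0, -10.0])              # arbitrary starting point

dx_nt = -np.linalg.solve(P, grad(x))     # Newton step: -(Hessian)^-1 grad
x_new = x + dx_nt

x_star = np.linalg.solve(P, -q)          # exact minimiser solves P x* = -q
assert np.allclose(x_new, x_star)        # one step suffices
```

Gradient descent on the same problem would take many iterations, with the count growing as `P` becomes more ill-conditioned.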
Line search
The process of choosing $t$, the amount by which we move along the search direction to update $x$, is called a line search, as we are searching along the line $\{x + t\,\Delta x \mid t \ge 0\}$. To be explicit, we find $t$ from:

$$t = \operatorname{argmin}_{s \ge 0} f(x + s\,\Delta x)$$

Again, for some simple forms of $f$ it may be possible to find the $t$ that exactly minimises this; however, for our purposes an approximate method of finding a reasonably good choice of $t$ is enough. One such method is backtracking line search:
given x, Dx, alpha, beta  # 0 < alpha < 0.5, 0 < beta < 1
t = 1
while f(x + t*Dx) > f(x) + alpha * t * grad(f).T @ Dx:
    t = beta * t
This is called backtracking line search because we start with $t = 1$ and then "backtrack" until we find a suitable value of $t$.
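The pseudocode above translates almost directly into Python; the following is a minimal sketch (function and variable names are my own, not part of any library):

```python
import numpy as np

def backtracking(f, grad_fx, x, dx, alpha=0.2, beta=0.5):
    """Backtracking line search: shrink t until sufficient decrease holds."""
    t = 1.0
    fx = f(x)
    descent = np.dot(grad_fx, dx)  # directional derivative; negative for a descent direction
    while f(x + t * dx) > fx + alpha * t * descent:
        t *= beta
    return t

# Example on f(x) = ||x||^2 starting at x = (1, 1), stepping along -grad
f = lambda x: float(np.dot(x, x))
x = np.array([1.0, 1.0])
g = 2 * x                      # gradient of ||x||^2
t = backtracking(f, g, x, -g)  # here t = 0.5, which happens to hit the exact minimum
```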
Stopping criterion
The quantity

$$\lambda^2 = \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x) = -\nabla f(x)^T \Delta x_{\text{nt}}$$

is called the Newton decrement. It has several interpretations: firstly, $-\lambda^2$ is the directional derivative of $f$ in the direction of the Newton step, and secondly it provides us with an estimate of the difference between our current value of $f$ and the optimal value:

$$f(x) - \inf_y \hat{f}(y) = \frac{\lambda^2}{2}$$

where this uses the second-order approximation $\hat{f}$ of $f$ near $x$ and the fact that the Newton step is the minimiser of this approximation.

This means that we can use $\lambda^2 / 2$ as our stopping criterion: once this drops below some preset threshold $\epsilon$, we can assume that our estimate of the optimal point of $f$ is "good enough".
Newton's method in full
Putting this all together we get Newton's method for minimising an unconstrained convex objective function:
given:
    x                # Starting point
    0 < alpha < 0.5  # Backtracking line search parameters
    0 < beta < 1.0
    epsilon > 0      # Stopping tolerance

while True:
    Compute the Newton step Dx
    Determine t using backtracking line search
    Step: x += t * Dx
    Compute the Newton decrement lambda^2 = -grad(f).T @ Dx
    if 0.5 * lambda^2 < epsilon:
        break
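Here is a minimal self-contained sketch of the whole method (distinct from the full implementation later on), applied to the smooth, strictly convex function $f(x) = \sum_i e^{x_i} + e^{-x_i}$, whose minimiser is $x = 0$:

```python
import numpy as np

def newtons_method(f, grad, hess, x0, alpha=0.2, beta=0.5, eps=1e-8, max_iters=100):
    """Unconstrained Newton's method with backtracking line search."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iters):
        g = grad(x)
        dx = -np.linalg.solve(hess(x), g)  # Newton step
        lam2 = -np.dot(g, dx)              # Newton decrement squared
        if lam2 / 2 <= eps:
            break
        t = 1.0                            # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * np.dot(g, dx):
            t *= beta
        x = x + t * dx
    return x

# f(x) = sum_i exp(x_i) + exp(-x_i): smooth, strictly convex, minimiser x = 0
f = lambda x: float(np.sum(np.exp(x) + np.exp(-x)))
grad = lambda x: np.exp(x) - np.exp(-x)
hess = lambda x: np.diag(np.exp(x) + np.exp(-x))

x_star = newtons_method(f, grad, hess, [2.0, -1.5])  # converges to ~0 in a handful of steps
```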
Newton's method has several advantages over first order methods like gradient descent:
- It is invariant to an affine coordinate transformation
- It is fast to converge (the error reduces quadratically in log-space)
- It scales well with problem size
- It is fairly insensitive to parameter choice ($\alpha$, $\beta$)
The only disadvantage is the cost and complexity of computing and inverting the Hessian, although in practice we never actually compute the inverse of the Hessian, but instead solve the system $\nabla^2 f(x)\, \Delta x = -\nabla f(x)$ using a matrix factorisation, e.g. Cholesky.
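For example, with NumPy the Newton system can be solved via a Cholesky factorisation of the (positive definite) Hessian rather than by forming an inverse; a sketch with made-up numbers:

```python
import numpy as np

# Solve H @ dx = -g via the Cholesky factorisation H = L @ L.T
# instead of forming H^{-1} explicitly.
H = np.array([[4.0, 1.0], [1.0, 3.0]])  # a positive definite "Hessian"
g = np.array([1.0, 2.0])                # a "gradient"

L = np.linalg.cholesky(H)
y = np.linalg.solve(L, -g)    # forward solve:  L @ y = -g
dx = np.linalg.solve(L.T, y)  # backward solve: L.T @ dx = y
# (np.linalg.solve uses a general LU here; scipy.linalg.solve_triangular
#  would additionally exploit the triangular structure)

assert np.allclose(H @ dx, -g)
```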
Equality constrained problems
Now we will look at problems of the form:

$$\begin{aligned}
\text{minimise} \quad & f(x) \\
\text{subject to} \quad & Ax = b
\end{aligned}$$

We can solve this using the method of Lagrange multipliers, i.e. first we form the Lagrangian:

$$L(x, \nu) = f(x) + \nu^T (Ax - b)$$

and then the optimality conditions are found by setting the derivatives of the Lagrangian to zero:

$$\nabla f(x^*) + A^T \nu^* = 0, \qquad Ax^* = b$$

These are also known as the KKT conditions, or the dual and primal feasibility conditions respectively. The variable $\nu$ is the dual variable, also known as the Lagrange multiplier.
As always, for some simpler problems these equations may be directly solvable, leading immediately to an optimal primal-dual pair $(x^*, \nu^*)$. But more generally we have to proceed as before, repeatedly solving an approximation to the full problem and iterating closer and closer to the true optimal point.

As before we will use a descent method to optimise the above problem, which means that we assume we are at some current point $x$ and first need to compute $\Delta x$, the direction to step in.
Firstly we assume that we are starting from a feasible point $x$, so $Ax = b$ (i.e. the current point satisfies the constraints), and as before we can work with a second-order approximation to $f$ and write

$$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \tfrac{1}{2} \Delta x^T \nabla^2 f(x)\, \Delta x$$

If we drop this into the KKT conditions we end up with the following system of equations:

$$\nabla^2 f(x)\, \Delta x + A^T w = -\nabla f(x), \qquad A\, \Delta x = 0$$

or we can write this as

$$\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}$$

which we can solve to recover $\Delta x$, the search direction, and $w$, the dual variables.
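For a concrete example, the KKT system for a small equality-constrained quadratic can be assembled and solved directly with NumPy (the problem data here are arbitrary):

```python
import numpy as np

# Minimise 0.5 x^T P x subject to the single affine constraint x1 + x2 = 1.
# At a feasible x, the Newton step solves the KKT system directly.
P = np.array([[2.0, 0.0], [0.0, 4.0]])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x = np.array([0.5, 0.5])                  # feasible starting point: A @ x == b
g = P @ x                                 # gradient of the quadratic at x
n, k = P.shape[0], A.shape[0]

KKT = np.block([[P, A.T], [A, np.zeros((k, k))]])
rhs = np.concatenate([-g, np.zeros(k)])
sol = np.linalg.solve(KKT, rhs)
dx, w = sol[:n], sol[n:]

x_new = x + dx                              # exact for a quadratic objective
assert np.allclose(A @ x_new, b)            # A @ dx = 0 keeps us feasible
assert np.allclose(P @ x_new + A.T @ w, 0)  # stationarity of the Lagrangian
```

Because the objective is quadratic, one Newton step solves the problem exactly; for a general convex $f$ this becomes one iteration of the descent loop.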
Otherwise we can continue as before, i.e. use a backtracking line search to choose $t$, the step size, and terminate the iterations once we reach the stopping criterion.
Inequality constrained problems
Finally we can proceed to the full problem:

$$\begin{aligned}
\text{minimise} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}$$
We follow a similar playbook to before - rather than directly solving the full problem we solve an approximation to the full problem that is easier to handle, and then refine our solution iteratively until we reach the desired accuracy.
To proceed we replace the "hard" inequality constraints with "soft" logarithmic barrier functions:

$$\hat{I}_-(u) = -\frac{1}{t} \log(-u)$$

This has several desirable properties:
- It grows to $\infty$ as $u$ approaches zero from below
- It is continuous
- It is twice differentiable
- The "steepness" of the barrier is controllable with the barrier parameter $t$
This indicates how we can proceed: we construct and solve

$$\begin{aligned}
\text{minimise} \quad & t f_0(x) - \sum_{i=1}^{m} \log(-f_i(x)) \\
\text{subject to} \quad & Ax = b
\end{aligned}$$

for a given value of $t$. Then we increase $t$ and re-solve, again and again, until we've reached the stopping criterion (the suboptimality at the minimiser for a given $t$ is bounded by $m/t$).
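A one-dimensional example makes this concrete: minimising $f_0(x) = x$ subject to $1 - x \le 0$ has optimum $x^* = 1$, and the barrier objective $t x - \log(x - 1)$ is minimised at $x(t) = 1 + 1/t$. The sketch below follows this central path with a guarded 1-d Newton iteration (my own toy construction, not the implementation below):

```python
# Minimise f0(x) = x subject to f1(x) = 1 - x <= 0; the optimum is x* = 1.
# The barrier objective phi(x) = t*x - log(x - 1) is minimised at x(t) = 1 + 1/t,
# so the central path approaches x* as t grows.
for t in [1.0, 10.0, 100.0, 1000.0]:
    x = 2.0                                # strictly feasible start
    for _ in range(50):                    # 1-d Newton with a feasibility guard
        g = t - 1.0 / (x - 1.0)            # phi'(x)
        h = 1.0 / (x - 1.0) ** 2           # phi''(x) > 0
        step = g / h
        while x - step <= 1.0:             # backtrack to stay strictly feasible
            step *= 0.5
        x -= step
    assert abs(x - (1.0 + 1.0 / t)) < 1e-8
    # the suboptimality x(t) - x* = 1/t matches the m/t bound (m = 1 here)
```

Note how the gap to the true optimum shrinks as $1/t$, which is exactly why the outer loop can stop once $m/t$ drops below the desired tolerance.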
Phase 1 methods
The barrier method requires a strictly feasible starting point, as $-\log(-f_i(x))$ blows up as $f_i(x) \to 0$ from below and is undefined for $f_i(x) \ge 0$. In general we will not know (a priori) a strictly feasible starting point, but we can use our existing methodology to determine one. Methods such as these (to determine strictly feasible starting points) are called "Phase 1 methods".
A simple phase 1 method is to first solve the modified problem:

$$\begin{aligned}
\text{minimise} \quad & s \\
\text{subject to} \quad & f_i(x) \le s, \quad i = 1, \dots, m \\
& Ax = b
\end{aligned}$$

where we can choose the starting value $s_0 = \max_i f_i(x_0)$ (plus a small margin), where $x_0$ is any point which satisfies $Ax_0 = b$. In simple terms:
- We pick a point which satisfies our affine equality constraints
- We add a "tolerance" variable $s$ to our constraints such that this starting point satisfies all of the relaxed constraints
- We minimise this tolerance variable
If after this procedure we have $s^* < 0$, then we have a strictly feasible starting point (the corresponding $x^*$), because by construction we will have all $f_i(x^*) \le s^* < 0$. If on the other hand we have $s^* > 0$, then we know that the problem is primal infeasible.
Satisfyingly this reuses the existing implementation of the barrier method!
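To illustrate the phase 1 idea on two tiny one-dimensional instances, the sketch below minimises $s(x) = \max_i f_i(x)$ by brute-force grid search (a stand-in for the real solver):

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 100001)

def min_max_violation(constraints):
    """min over x of s(x) = max_i f_i(x), by brute-force grid search."""
    s = np.max([f(xs) for f in constraints], axis=0)
    return float(s.min())

# Feasible: f1 = x - 1 <= 0, f2 = -x <= 0 (satisfied strictly on (0, 1))
s_feas = min_max_violation([lambda x: x - 1.0, lambda x: -x])
assert s_feas < 0    # strictly feasible points exist

# Infeasible: f1 = x - 1 <= 0, f2 = 2 - x <= 0 (needs x <= 1 and x >= 2)
s_infeas = min_max_violation([lambda x: x - 1.0, lambda x: 2.0 - x])
assert s_infeas > 0  # best achievable "tolerance" is 0.5 > 0
```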
Implementation
The following is a ~350 line Python implementation of this method, with minimal dependencies and clear, readable code. Obviously it is completely impractical for real-world use, but writing it was an interesting and satisfying learning exercise.
For more details and better explanations of everything see the brilliant book "Convex Optimization" by Stephen Boyd and Lieven Vandenberghe, available entirely for free here.
"""
ConvexiToy - A toy convex solver
A very simple pure-python implementation of an interior point convex solver.
"""
from typing import Callable
import numpy as np
from dataclasses import dataclass
from enum import Enum
from numpy.typing import NDArray
type FloatArray = NDArray[np.floating]
@dataclass
class Objective:
"""Optimization objective with function, gradient and hessian."""
n: int
func: Callable[[FloatArray], float]
grad: Callable[[FloatArray], FloatArray]
hess: Callable[[FloatArray], FloatArray]
@classmethod
def quadratic(cls, P: FloatArray, q: FloatArray | None = None) -> "Objective":
n = P.shape[0]
assert P.shape == (n, n), "P must be square"
if q is None:
q = np.zeros(n)
def func(x: FloatArray) -> float:
return 0.5 * x.T @ P @ x + q.T @ x
def grad(x: FloatArray) -> FloatArray:
return P @ x + q
def hess(_: FloatArray) -> FloatArray:
return P
return cls(n, func, grad, hess)
@dataclass
class InEqConstraint:
"""Inequality constraint f(x) <= 0."""
func: Callable[[FloatArray], float]
grad: Callable[[FloatArray], FloatArray]
hess: Callable[[FloatArray], FloatArray] | None = None
@classmethod
def affine(cls, a: FloatArray, b: float) -> "InEqConstraint":
def func(x: FloatArray) -> float:
return np.dot(a, x) - b
def grad(_: FloatArray) -> FloatArray:
return a
return cls(func, grad)
@dataclass
class AffineEqConstraint:
"""Affine equality constraint Ax = b."""
A: FloatArray
b: FloatArray
class Status(Enum):
"""Solution status."""
OPTIMAL = 1
FAILED = 2
INFEASIBLE = 3
@dataclass
class Solution:
"""Optimization solution."""
status: Status
solution: FloatArray | None = None
def backtracking_ls(
func: Callable[[FloatArray], float],
g: FloatArray,
x: FloatArray,
Dx: FloatArray,
alpha: float,
beta: float,
t0: float = 1.0,
) -> float:
"""Backtracking line search."""
t = t0
f = func(x)
agDx = alpha * np.dot(g, Dx)
while func(x + t * Dx) > f + t * agDx:
t *= beta
return t
def constrained_newton_step(H: FloatArray, g: FloatArray, A: FloatArray) -> FloatArray:
"""Compute constrained Newton step."""
n = H.shape[0]
k = A.shape[0]
if k == 0:
# No equality constraints - just solve H * dx = -g
# Add regularization if needed
coefs = H
targets = -g
else:
coefs = np.block([[H, A.T], [A, np.zeros((k, k))]])
targets = np.concatenate([-g, np.zeros(k)])
try:
sol = np.linalg.solve(coefs, targets)[:n]
except np.linalg.LinAlgError:
# Add regularization to H block only
coefs = coefs.copy()
coefs[:n, :n] += 1e-6 * np.eye(n)
sol = np.linalg.solve(coefs, targets)[:n]
return sol
def constrained_newtons_method(
func: Callable[[FloatArray], float],
grad: Callable[[FloatArray], FloatArray],
hess: Callable[[FloatArray], FloatArray],
A: FloatArray,
b: FloatArray,
x0: FloatArray,
alpha: float = 0.2,
beta: float = 0.5,
epsilon: float = 1e-6,
max_iters: int = 100,
constraint_tol: float = 1e-8,
ineqs: list[InEqConstraint] | None = None,
) -> FloatArray:
"""Equality-constrained Newton's method."""
assert np.allclose(A @ x0, b, atol=constraint_tol), "Infeasible starting point"
x = x0.copy()
for i in range(max_iters):
g = grad(x)
Dx = constrained_newton_step(hess(x), g, A)
t0 = 1.0
if ineqs is not None:
while any([c.func(x + t0 * Dx) > -constraint_tol for c in ineqs]):
t0 *= beta
t = backtracking_ls(func, g, x, Dx, alpha, beta, t0=t0)
x += t * Dx
lambda2 = -np.dot(g, Dx)
if lambda2 / 2 <= epsilon:
return x
print(f"Warning: {max_iters=} reached")
return x
@dataclass
class SolverParameters:
"""Solver parameters."""
t0: float = 1.0
mu: float = 50.0
alpha: float = 0.2
beta: float = 0.5
epsilon_inner: float = 1e-6
epsilon_outer: float = 1e-6
max_iters_inner: int = 100
max_iters_outer: int = 100
constraint_tol: float = 1e-8
def barrier_method(
obj: Objective,
x0: FloatArray,
affine_eq: AffineEqConstraint | None = None,
ineqs: list[InEqConstraint] | None = None,
solver_params: SolverParameters | None = None,
) -> Solution:
"""Interior point barrier method."""
if solver_params is None:
solver_params = SolverParameters()
x = x0.copy()
if affine_eq is None:
A = np.empty((0, obj.n))
b = np.empty(0)
else:
A = affine_eq.A
b = affine_eq.b
    if not np.allclose(A @ x0, b, atol=solver_params.constraint_tol):
return Solution(Status.FAILED)
if ineqs is None:
x = constrained_newtons_method(
obj.func,
obj.grad,
obj.hess,
A,
b,
x,
solver_params.alpha,
solver_params.beta,
solver_params.epsilon_inner,
solver_params.max_iters_inner,
solver_params.constraint_tol,
ineqs=ineqs,
)
return Solution(Status.OPTIMAL, x)
    if not all(c.func(x0) < 0.0 for c in ineqs):  # barrier needs strict feasibility
        return Solution(Status.FAILED)
m = len(ineqs)
t = float(solver_params.t0)
def func_aug(x: FloatArray) -> float:
val = t * obj.func(x)
for c in ineqs:
val -= np.log(-c.func(x))
return val
def grad_aug(x: FloatArray) -> FloatArray:
g = t * obj.grad(x)
for c in ineqs:
gx = c.grad(x)
fx = c.func(x)
g -= gx / fx
return g
def hess_aug(x: FloatArray) -> FloatArray:
H = t * obj.hess(x)
for c in ineqs:
gx = c.grad(x)
fx = c.func(x)
if c.hess is None:
H += np.outer(gx, gx) / (fx**2)
else:
H += np.outer(gx, gx) / (fx**2) - c.hess(x) / fx
return H
for _ in range(solver_params.max_iters_outer):
x = constrained_newtons_method(
func_aug,
grad_aug,
hess_aug,
A,
b,
x,
solver_params.alpha,
solver_params.beta,
solver_params.epsilon_inner,
solver_params.max_iters_inner,
ineqs=ineqs,
)
if m / t < solver_params.epsilon_outer:
return Solution(Status.OPTIMAL, x)
t *= solver_params.mu
print(f"Warning: failed to converge in {solver_params.max_iters_outer=}")
return Solution(Status.FAILED, x)
def construct_phase_1_problem(
n: int,
affine_eq: AffineEqConstraint | None,
ineqs: list[InEqConstraint],
) -> tuple[Objective, AffineEqConstraint | None, list[InEqConstraint]]:
"""Construct phase 1 feasibility problem."""
def p1_func(x_s: FloatArray) -> float:
return x_s[-1]
def p1_grad(x_s: FloatArray) -> FloatArray:
g = np.zeros_like(x_s)
g[-1] = 1.0
return g
def p1_hess(x_s: FloatArray) -> FloatArray:
n = len(x_s)
return np.zeros((n, n))
obj = Objective(n + 1, p1_func, p1_grad, p1_hess)
new_ineqs = []
for c in ineqs:
def func(x_s: FloatArray, c=c) -> float:
return c.func(x_s[:-1]) - x_s[-1]
def grad(x_s: FloatArray, c=c) -> FloatArray:
return np.concatenate([c.grad(x_s[:-1]), [-1.0]])
if c.hess is not None:
def hess(x_s: FloatArray, c=c) -> FloatArray:
n = len(x_s) - 1
            return np.block(
                [[c.hess(x_s[:-1]), np.zeros((n, 1))], [np.zeros((1, n)), 0]]
            )
else:
hess = None
new_ineqs.append(InEqConstraint(func, grad, hess))
if affine_eq is not None:
A = affine_eq.A
k = A.shape[0]
new_A = np.block([A, np.zeros((k, 1))])
new_affine_eq = AffineEqConstraint(new_A, affine_eq.b)
else:
new_affine_eq = None
return obj, new_affine_eq, new_ineqs
def phase_1(
n: int,
affine_eq: AffineEqConstraint | None = None,
ineqs: list[InEqConstraint] | None = None,
solver_params: SolverParameters | None = None,
) -> Solution:
"""Phase 1 to find feasible starting point."""
if solver_params is None:
solver_params = SolverParameters()
if affine_eq is None:
x0 = np.zeros(n)
else:
x0 = np.linalg.pinv(affine_eq.A) @ affine_eq.b
if ineqs is None:
return Solution(Status.OPTIMAL, x0)
s0 = max(c.func(x0) for c in ineqs)
if s0 < 0.0:
return Solution(Status.OPTIMAL, x0)
s0 += 1e-5
x_s0 = np.concatenate([x0, [s0]])
p1_obj, p1_aeq, p1_ineqs = construct_phase_1_problem(n, affine_eq, ineqs)
sol = barrier_method(
p1_obj,
x_s0,
p1_aeq,
p1_ineqs,
solver_params,
)
if not sol.status == Status.OPTIMAL:
return Solution(Status.INFEASIBLE)
x = sol.solution[:-1]
s = sol.solution[-1]
if s > solver_params.constraint_tol:
return Solution(Status.INFEASIBLE)
return Solution(Status.OPTIMAL, x)
def solve(
obj: Objective,
affine_eq: AffineEqConstraint | None = None,
ineqs: list[InEqConstraint] | None = None,
solver_params: SolverParameters | None = None,
) -> Solution:
"""Solve convex optimization problem."""
# Phase 1 to determine feasible starting point
p1_sol = phase_1(obj.n, affine_eq, ineqs, solver_params)
if not p1_sol.status == Status.OPTIMAL:
return Solution(Status.INFEASIBLE)
x0 = p1_sol.solution
return barrier_method(
obj,
x0,
affine_eq,
ineqs,
solver_params,
)