This post is a set of learning notes on numerical analysis.

Convergence

Key Question

For a sequence $(x_n)$ generated by an iterative algorithm, we generally ask two questions.

  1. Does the sequence converge, i.e., is there an $x$ with $\lim_{n\to \infty} x_n = x$?
  2. If so, how fast?

For the first question, standard tools from analysis are enough. Here we just define the error $e_n$

\[e_n = x - x_n\]

For the second question, there are two ways to describe the speed: the convergence rate and the order of convergence.

Convergence Rate

For a sequence $(x_n)$, we say $x_n \to x$ at rate $O(\beta_n)$ if there exist a sequence $\beta_n \to 0$ and a constant $\lambda \ge 0$ such that

\[|e_n| = |x - x_n| \le \lambda \beta_n\]

for all sufficiently large $n$.

Generally, $\beta_n$ is a standard sequence that tends to zero, such as $1/n$, $1/n^2$, $1/\log(n)$, or $e^{-n}$.

Order of Convergence

For a sequence $(x_n)$ converging to $x$, if there exist positive numbers $\alpha$ and $\lambda$ such that

\[\lim_{n\to \infty} \frac{|e_{n+1}|}{|e_n|^{\alpha}} = \lambda\]

we say the sequence (or algorithm) has order of convergence $\alpha$ with asymptotic error constant $\lambda$. If $\alpha = 1$ (and $\lambda < 1$), the convergence is linear; if $\alpha = 2$, it is quadratic. For two sequences $(x_n)$ and $(y_n)$ with the same order $\alpha_1 = \alpha_2$, the smaller asymptotic error constant wins: if $\lambda_1 < \lambda_2$, then $(x_n)$ converges faster than $(y_n)$.
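
Since $|e_{n+1}| \approx \lambda |e_n|^{\alpha}$ for large $n$, the order can be estimated numerically from consecutive errors via $\alpha \approx \log(|e_{n+1}|/|e_n|) / \log(|e_n|/|e_{n-1}|)$. A minimal Matlab sketch (the error data below is made up for illustration):

% estimate the order of convergence from consecutive errors
% e is a vector of errors |x - x_n|; the values are invented and roughly quadratic
e = [1e-1, 1e-2, 1e-4, 1e-8, 1e-16];
alpha = log(e(3:end) ./ e(2:end-1)) ./ log(e(2:end-1) ./ e(1:end-2))
% each entry is close to 2, suggesting quadratic convergence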

Taylor’s Theorem

A really useful tool in numerical methods is Taylor’s Theorem. If $f$ is a smooth function on an interval $(a, b)$ containing $p$, then for any $x \in (a, b)$,

\[f(x) = f(p) + \frac{f'(p)}{1!} (x-p) + \frac{f''(p)}{2!} (x-p)^2 + \cdots + \frac{f^{(n)}(p)}{n!}(x-p)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!} (x-p)^{n+1}\]

where $\xi$ is between $p$ and $x$.

There are also some useful Taylor series.

\[e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots\]
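
As a quick sanity check (my own example, not from the notes), truncating this series and comparing with Matlab's exp shows how fast the error shrinks:

% partial sums of the Taylor series of e^x at x = 1
x = 1;
n = 10;
terms = x .^ (0:n) ./ factorial(0:n);
approx = cumsum(terms);              % 1, 1 + x, 1 + x + x^2/2!, ...
err = abs(exp(x) - approx)           % the error decays rapidly with the number of terms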

Rootfinding

For $f: \mathbb{R} \to \mathbb{R}$, $p$ is a root if $f(p) = 0$. Moreover, we say a root $p$ of $f$ has multiplicity $m$ if

\[f(x) = (x-p)^m q(x)\]

where $\lim_{x\to p} q(x) \neq 0$. Furthermore, one can prove that a root $p$ of a smooth function $f$ has multiplicity $m$ if and only if

\[f(p) = f'(p) = \cdots = f^{(m-1)}(p) = 0\]

but

\[f^{(m)}(p) \neq 0\]

Fixed Point Iteration Schemes

A fixed point of a function $g$ is a point $x$ such that $g(x) = x$. There are several methods we can use to find a fixed point.

Theorem 1. If $g$ is continuous on $[a, b] \subset \mathbb{R}$ with $g(a), g(b) \in [a, b]$, then $g$ has a fixed point in $[a, b]$.

Theorem 2. Let $g$ be a smooth function with $g: [a, b] \to [a, b]$, and suppose there exists $0 \le \theta < 1$ such that $|g'(x)| \le \theta$ for all $x \in [a, b]$. Then

  1. there exists a unique fixed point of $g$ in $[a, b]$.
  2. $x_{n+1} = g(x_n)$ converges to the fixed point if $x_0 \in [a,b]$.
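
A minimal sketch of the iteration in Matlab, with $g(x) = \cos(x)$ as my own example: it maps $[0, 1]$ into itself and $|g'(x)| = |\sin(x)| \le \sin(1) < 1$ there, so Theorem 2 applies.

% fixed point iteration x_{n+1} = g(x_n) for g(x) = cos(x) on [0, 1]
g = @(x) cos(x);
x = 0.5;                             % any x_0 in [0, 1] works by Theorem 2
for n = 1:50
    x = g(x);
end
x                                    % approaches the fixed point ~0.739085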

Bisection Method

To find a root of a continuous function $f: \mathbb{R} \to \mathbb{R}$, we are given $[a_0, b_0]$ such that $f(a_0) f(b_0) \le 0$ (so the interval contains at least one root). To go from $[a_n, b_n]$ to $[a_{n+1}, b_{n+1}]$,

  1. Let $x_n = \frac{1}{2}(a_n + b_n)$.
  2. If $f(x_n) f(a_n) \le 0$, $a_{n+1} = a_n$, $b_{n+1} = x_n$.
  3. Else $a_{n+1} = x_n$, $b_{n+1} = b_n$.

Error estimate

\[|x_n - p| \le \frac{b_0 - a_0}{2^{n+1}}\]

Order of convergence: $\alpha = 1$, i.e., the method converges linearly (the error bound is halved at each step).
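
A minimal Matlab sketch of the method (the function name bisection and the stopping rule, a fixed number of steps, are my own choices):

% bisection method: assumes f is continuous and f(a) * f(b) <= 0
function x = bisection(f, a, b, N)
    for n = 1:N
        x = (a + b) / 2;             % midpoint
        if f(x) * f(a) <= 0
            b = x;                   % the root lies in [a, x]
        else
            a = x;                   % the root lies in [x, b]
        end
    end
end

For example, bisection(@(t) t^2 - 2, 1, 2, 50) approximates $\sqrt{2}$.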

Newton’s Method

Newton’s method linearizes $f$ at the current iterate $x_n$ and takes the root of that linear approximation as the next iterate, i.e.

\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]
  • Pros: The order of convergence is $\alpha = 2$ (for a simple root and a good initial guess), so it is very fast, much faster than the bisection method.
  • Cons: It is not guaranteed to converge from an arbitrary initial guess $x_0$.
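
A minimal sketch of the iteration (the function $f(x) = x^2 - 2$ and the starting point are my own toy example):

% Newton's method for f(x) = x^2 - 2
f  = @(x) x^2 - 2;
df = @(x) 2 * x;                     % derivative supplied by hand
x = 1;                               % initial guess
for n = 1:6
    x = x - f(x) / df(x);            % Newton update
end
x                                    % converges rapidly to sqrt(2)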

Another drawback of Newton’s method, which we can overcome, is that the order of convergence drops to $\alpha = 1$ when $p$ is a root of multiplicity $m \ge 2$. There are two fixes.

  1. Know $m$ in advance and let \(x_{n+1} = x_n - m\cdot \frac{f(x_n)}{f'(x_n)}\)
  2. Apply Newton’s method to \(\widetilde{f}(x) = \frac{f(x)}{f'(x)}\), whose roots are all simple.

Secant Method

The secant method is similar to Newton’s method, except that we use a different linear approximation: the derivative is replaced by a difference quotient,

\[f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}\]

so the update becomes

\[x_{n+1} = x_n - \frac{f(x_n) (x_n - x_{n-1})}{f(x_n) - f(x_{n-1})}\]

One can prove $\alpha = \frac{\sqrt{5} + 1}{2} \approx 1.618$, so the convergence is superlinear (though slower than Newton’s method).
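
A minimal sketch on the same toy problem as above (my own example); note that two initial guesses are needed:

% secant method for f(x) = x^2 - 2
f = @(x) x^2 - 2;
x0 = 1; x1 = 2;                      % two initial guesses
for n = 1:8
    x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0));
    x0 = x1; x1 = x2;
end
x1                                   % approaches sqrt(2)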

Newton’s Method for Systems

For systems of nonlinear equations, let

\[F(x) = \left ( \begin{matrix} f_1(x_1, x_2, \cdots) \\ \vdots \\ f_n(x_1, x_2, \cdots)\end{matrix}\right)\]

The natural analogue of $F'(x)$ is the Jacobian $J(x)$

\[J(x) = \left(\begin{matrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n}\end{matrix}\right)\]

And

\[x_{n+1} = x_n - J(x_n)^{-1} F(x_n)\]

In practice, we do not form $J^{-1}$; instead we solve the linear system $J(x_n)\, a = F(x_n)$ for $a$ and set $x_{n+1} = x_n - a$. In Matlab,

% Newton's method for nonlinear systems
% assumes f is a handle returning F(x) as a column vector,
% df a handle returning the Jacobian J(x), and x an initial guess
for i = 1:5
    x = x - df(x) \ f(x);            % solve J(x) a = F(x), then update x
    fprintf("(%.10f, %.10f)\n", x(1), x(2));
end
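
For the loop above to run, f, df, and x must already be defined. As a hypothetical example (my own, not from the notes), for the system $x_1^2 + x_2^2 = 1$, $x_1 = x_2$:

% example handles for a 2x2 nonlinear system (unit circle intersected with x1 = x2)
f  = @(x) [x(1)^2 + x(2)^2 - 1; x(1) - x(2)];
df = @(x) [2*x(1), 2*x(2); 1, -1];   % Jacobian of f
x  = [1; 0];                         % initial guess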

Systems of Equations

Linear Systems

A linear system is

\[Ax = b\]

where $A$ is an $n\times n$ matrix and $b$ is an $n\times 1$ vector. In order to have a unique solution $x$, $A$ must be invertible. If $A$ is upper triangular, we can solve the system by back substitution, and if $A$ is lower triangular, by forward substitution; both are much simpler than solving a general linear system.

Matlab code that solves $Ax = b$ by computing a QR factorization (see the next section) and then back-substituting on $Rx = Q^T b$:

function x = back_sub(A, b)
    [~, n] = size(A);
    [Q, R] = mgs(A);                 % modified Gram-Schmidt (defined below)
    b = Q' * b;                      % reduce to the triangular system R x = Q' b
    x = zeros(1, n);
    for i = n:-1:1
        s = b(i);
        for j = n:-1:(i+1)
            s = s - R(i, j) * x(j);
        end
        x(i) = s / R(i, i);
    end
end

QR Factorization

For an $m\times n$ matrix $A$ ($m \ge n$) with linearly independent columns, we have

\[A = QR\]

where $Q$ is an $m\times n$ matrix with orthonormal columns and $R$ is an $n\times n$ upper triangular matrix. It can be computed by the Gram-Schmidt procedure.

% Gram-Schmidt Procedure
function [Q, R] = gs(A)
    [m, n] = size(A);
    Q = zeros(m, n);
    R = zeros(n, n);
    for j = 1:n
        v = A(:, j);
        for i = 1:j-1
            R(i, j) = Q(:, i)' * A(:, j);
            v = v - R(i, j) * Q(:, i);
        end
        R(j, j) = norm(v);
        Q(:, j) = v / R(j, j);
    end
end
% Modified Gram-Schmidt Procedure (numerically more stable than classical Gram-Schmidt)
function [Q, R] = mgs(A)
    [m, n] = size(A);
    Q = zeros(m, n);
    R = zeros(n, n);
    V = A;

    for i = 1:n
        R(i, i) = norm(V(:, i));
        Q(:, i) = V(:, i) / R(i, i);
        for j = (i+1):n
            R(i, j) = Q(:, i)' * V(:, j);
            V(:, j) = V(:, j) - R(i, j) * Q(:, i);
        end
    end
end

High-level Steps to solve the linear system:

  1. For a linear system $Ax = b$, factor $A = QR$. Now we have $QRx = b$.
  2. Since $Q$ has orthonormal columns, $Q^T Q = I$, so we just need to solve $Rx = Q^T b$ via back substitution.

QR factorization can also be used to solve the least squares problem: for an overdetermined system $Ax = b$, find

\[x_{*} = \arg \min_{x \in \mathbb{R}^n} \|Ax - b\|_2\]

Similarly, solving $Rx = Q^T b$ (with the reduced QR factorization of $A$) gives the minimizer.
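
A small sketch of a least squares fit via the mgs code above (the data is made up; R \ (Q' * y) applies backslash only to the triangular system, which Matlab solves by back substitution):

% least squares line fit y ~ c(1) + c(2) * t via QR
t = (0:5)';
y = 2 * t + 1 + 0.1 * randn(6, 1);   % noisy samples of the line y = 2t + 1
A = [ones(6, 1), t];
[Q, R] = mgs(A);                     % reduced QR factorization
c = R \ (Q' * y)                     % c(1) is close to 1, c(2) close to 2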

LU Factorization

For an invertible matrix $A$ (assuming no zero pivots are encountered), we have

\[A = LU\]

where $L$ is a unit lower triangular matrix and $U$ is an upper triangular matrix. It can be computed efficiently by Gaussian elimination.

% LU factorization via Gaussian elimination (no pivoting)
function [L, U] = lu_factor(A)
    [~, n] = size(A);
    L = eye(n);
    for i = 1:n-1
        for j = i+1:n
            L(j, i) = A(j, i) / A(i, i);            % multiplier
            A(j, :) = A(j, :) - L(j, i) * A(i, :);  % eliminate entry (j, i)
        end
    end
    U = A;
end

An important application of LU factorization is solving many systems with the same $A$ but different right-hand sides $b_1, b_2, \cdots, b_N$: factor once, then each solve only needs a forward and a back substitution.
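
A minimal sketch of this reuse (toy data of my own; the backslashes below act on triangular matrices, so they amount to forward and back substitution):

% factor once, then solve for several right-hand sides
A = [4, 3; 6, 3];
B = [1, 0; 0, 1];                    % two right-hand sides, stored as columns
[L, U] = lu_factor(A);
X = zeros(2, 2);
for k = 1:2
    y = L \ B(:, k);                 % forward substitution with L
    X(:, k) = U \ y;                 % back substitution with U
end
X                                    % here X = inv(A), since B is the identity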

Eigenvalue-Eigenvector Problem

For an $n\times n$ matrix $A$, we are looking for pairs $(\lambda, v)$ where $\lambda \in \mathbb{C}, v \in \mathbb{C}^n, v \neq 0$ s.t.

\[Av = \lambda v\]

In linear algebra, the eigenvalues $\lambda$ are the solutions of

\[\mathrm{det}(A - \lambda I) = 0\]

Power Iteration

Power iteration can be used to find the eigenvalue of $A$ that is largest in magnitude, together with a corresponding eigenvector. For a diagonalizable $n \times n$ matrix $A$, assume its eigenvalues satisfy

\[|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|\]

For essentially any initial guess $x_0$ (one with a nonzero component in the direction of the dominant eigenvector $v_1$), the iterates

\[x_{n+1} = Ax_n\]

align with the direction of $v_1$ as $n \to \infty$; after normalization they converge to a multiple of $v_1$. In summary, we randomly choose $x_0$, then

  1. Iteration: $\hat{x}_{n+1} = Ax_n$
  2. Renormalize: $x_{n+1} = \hat{x}_{n+1} / || \hat{x}_{n+1}|| $
  3. Approximate eigenvalue: $\mu_{n+1} = x_{n+1}^T A x_{n+1}$

function e = eigen(A, N)
    % estimate the largest eigenvalue of a matrix A.
    [n, ~] = size(A);
    v = rand(n, 1);
    for i=1:N
        v = A * v;
        v = v / norm(v, 2);
        e = v' * A * v;
    end
end

The order of convergence is linear, with rate governed by $|\lambda_2 / \lambda_1|$.

Shifted Inverse Power Iteration

Trick 1: If $\lambda_1, \cdots, \lambda_n$ are the eigenvalues of $A$, the eigenvalues of $A^{-1}$ are $1/\lambda_{1}, \cdots, 1/\lambda_{n}$.

Trick 2: If $\lambda_1, \cdots, \lambda_n$ are the eigenvalues of $A$, the eigenvalues of $A - cI$ are $\lambda_{1}-c, \cdots, \lambda_{n}-c$.

Then, for a shift $\mu_0$ close enough to a target eigenvalue, we can apply power iteration to $(A - \mu_0 I)^{-1}$ and obtain the eigenvalue of $A$ closest to $\mu_0$. However, the order is still linear.
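
A minimal sketch (the function name sipi and its interface are my own): power iteration applied to $(A - \mu_0 I)^{-1}$, where each step solves a linear system instead of forming the inverse.

function [mu, v] = sipi(A, N, mu0)
    % shifted inverse power iteration with a fixed shift mu0
    [n, ~] = size(A);
    v = rand(n, 1);
    for i = 1:N
        v = (A - mu0 * eye(n)) \ v;  % one step of power iteration on (A - mu0*I)^{-1}
        v = v / norm(v);
    end
    mu = v' * A * v;                 % Rayleigh quotient estimate of the eigenvalue
end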

Rayleigh Quotient Iteration

We can use Rayleigh Quotient Iteration to speed up convergence: the order is quadratic. However, it requires a better initial guess, so in practice we tend to run a few steps of power iteration first and then switch to Rayleigh Quotient Iteration.

function [mu, v] = rq_it(A, N, mu, x)
    % implement the Rayleigh Quotient iteration.
    % A - matrix
    % N - iteration num
    % mu - initial eigenvalue estimate (shift)
    % x - initial guess for the eigenvector
    [n, ~] = size(A);
    v = x;
    for i = 1:N
        v = (A - mu * eye(n)) \ v;
        v = v / norm(v);
        mu = (A * v)' * v;
    end
end

Simultaneous Power Iteration

Simultaneous power iteration is used to find all the eigenvalues of a matrix at once.

function [Q, mu] = spi(A, N)
    [n, ~] = size(A);
    Q = rand(n, n);              % random initial basis
    for i = 1:N
        [Q, ~] = mgs(A * Q);     % block power step, then re-orthonormalize
        mu = Q' * A * Q;         % eigenvalue estimates appear on the diagonal of mu
    end
end

QR Algorithm

The QR algorithm approximates all eigenvalues of a matrix simultaneously; with suitable shifts its convergence can be cubic (for symmetric matrices).

% QR algorithm: A_{k+1} = R_k * Q_k, where A_k = Q_k * R_k
function d = qr_algorithm(A, N)
    for i = 1:N
        [Q, R] = mgs(A);
        A = R * Q;               % similar to the previous iterate, so same eigenvalues
    end
    d = diag(A);                 % approximate eigenvalues
end

Polynomial Interpolation

Polynomial interpolation through $n+1$ points $(x_0, y_0), \cdots, (x_n, y_n)$ is equivalent to solving the linear system

\[\left (\begin{matrix}1 & x_0 & x_0^2 & \cdots & x_0^{n}\\1 & x_1 & x_1^2 & \cdots & x_1^{n}\\\vdots & \vdots & \vdots & \ddots & \vdots\\1 & x_{n} & x_{n}^2 & \cdots & x_{n}^{n}\\\end{matrix} \right )\left (\begin{matrix}a_0\\a_1\\\vdots\\a_{n}\\\end{matrix} \right )=\left (\begin{matrix}y_0\\y_1\\\vdots\\y_{n}\\\end{matrix} \right )\]

The coefficient matrix is a Vandermonde matrix, which tends to be severely ill-conditioned (nearly singular), so solving this system directly is numerically unreliable.

Lagrange Form of the Interpolating Polynomial

A better approach is the Lagrange form of the interpolating polynomial.

\[L_i(x) = \prod_{j=0,\, j\neq i}^{n} \frac{x-x_j}{x_i-x_j}\] \[p(x) = \sum_{i=0}^n y_i L_i(x)\]
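
A small Matlab sketch (the function name lagrange_eval is my own) that evaluates $p(t)$ directly from this formula:

% evaluate the Lagrange form of the interpolating polynomial at a point t
% x, y - vectors of interpolation points and values
function p = lagrange_eval(x, y, t)
    n = numel(x);
    p = 0;
    for i = 1:n
        Li = 1;
        for j = [1:i-1, i+1:n]
            Li = Li * (t - x(j)) / (x(i) - x(j));   % build L_i(t)
        end
        p = p + y(i) * Li;
    end
end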

Error Estimation

If $x_0, \cdots, x_n$ are interpolation points for a smooth function $f$ on $[a, b]$, then for any $x \in [a,b]$ there exists $\xi \in [a,b]$ such that

\[f(x) = p(x) + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)(x-x_1)\cdots(x-x_n)\]

Chebyshev Polynomials

\[T_n(x) = \cos(n \cos^{-1}(x))\] \[T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)\] \[T_0(x) = 1\] \[T_1(x) = x\]
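
A small sketch (the function name cheb_eval is my own) that evaluates $T_0, \cdots, T_N$ at given points using this recurrence:

% evaluate Chebyshev polynomials T_0 ... T_N at the points in x
function T = cheb_eval(N, x)
    x = x(:);                        % work with a column vector
    T = zeros(numel(x), N + 1);
    T(:, 1) = 1;                     % T_0(x) = 1
    if N >= 1
        T(:, 2) = x;                 % T_1(x) = x
    end
    for n = 2:N
        T(:, n + 1) = 2 .* x .* T(:, n) - T(:, n - 1);   % three-term recurrence
    end
end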

Legendre Polynomials

\[\hat{p}_j(x) = \frac{\sqrt{2}}{\sqrt{2j+1}} p_j(x)\] \[\hat{p}_0(x) = 1\] \[\hat{p}_1(x) = x\] \[\hat{p}_j(x) = \frac{2j-1}{j} x \hat{p}_{j-1}(x) - \frac{j-1}{j} \hat{p}_{j-2}(x)\]

Numerical Quadrature

Replacing $f$ by its Lagrange interpolating polynomial at nodes $x_0, \cdots, x_n$ and integrating over $[a, b]$ gives an interpolatory quadrature rule

\[I_n(f) = \sum_{i=0}^n f(x_i) \int_a^b L_i(x)\,dx\]
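
A small sketch of computing the weights $w_i = \int_a^b L_i(x)\,dx$ with Matlab's polynomial utilities (the nodes, interval, and integrand are my own example; with these three equally spaced nodes the rule reduces to Simpson's rule):

% interpolatory quadrature: weights are integrals of the Lagrange basis polynomials
a = 0; b = 1;
x = [0, 0.5, 1];                     % quadrature nodes
n = numel(x);
w = zeros(1, n);
for i = 1:n
    others = x([1:i-1, i+1:n]);
    Li = poly(others) / prod(x(i) - others);   % coefficients of L_i(x)
    P  = polyint(Li);                          % antiderivative of L_i
    w(i) = polyval(P, b) - polyval(P, a);      % w_i = integral of L_i over [a, b]
end
f = @(t) exp(t);
I = sum(w .* f(x))                   % approximates exp(1) - 1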

References

  • A Friendly Introduction to Numerical Analysis.
  • MATH 371 Introduction to Numerical Methods, University of Michigan.