Notes of Numerical Analysis
This post collects my learning notes on numerical analysis.
Convergence
Key Question
For a sequence $(x_n)$ generated by an iterative algorithm, we generally ask two questions.
- Does the sequence converge, i.e. does $\lim_{n\to \infty} x_n = x$ hold for some $x$?
- How fast?
For the first question, we already have the standard tools of analysis. Here we just define the error $e_n$
\[e_n = x - x_n\]For the second question, we have two ways to describe the speed: rate and order.
Convergence Rate
For a sequence $(x_n)$, we say $x_n \to x$ at rate $O(\beta_n)$ if there exist a sequence $\beta_n \to 0$ and a constant $\lambda \ge 0$ such that
\[|e_n| = |x - x_n| \le \lambda \beta_n\]for all sufficiently large $n$.
Generally, $\beta_n$ is a standard sequence tending to zero, such as $1/n$, $1/n^2$, $1/\log(n)$, or $e^{-n}$.
Order of Convergence
For a sequence $(x_n)$, if there exist positive numbers $\alpha$ and $\lambda$ such that
\[\lim_{n\to \infty} \frac{|e_{n+1}|}{|e_n|^{\alpha}} = \lambda\]we say the sequence/algorithm has order of convergence $\alpha$ with asymptotic error constant $\lambda$. If $\alpha = 1$, the convergence is linear; if $\alpha=2$, it is quadratic. For two sequences $(x_n)$ and $(y_n)$ with equal orders $\alpha_1 = \alpha_2$ but constants $\lambda_1 > \lambda_2$, the sequence $(y_n)$ converges faster than $(x_n)$.
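To make the definition concrete, here is a short Python check (a sketch of mine, not from the notes): $\alpha$ can be estimated from three consecutive errors, since $\log|e_{n+1}| \approx \alpha \log|e_n| + \log\lambda$. The helper name `estimate_order` and the $\sqrt{2}$ iteration are illustrative choices.

```python
import math

# Estimate the order of convergence from three consecutive errors:
# alpha ~ log(e_{n+1}/e_n) / log(e_n/e_{n-1})
def estimate_order(errors):
    e0, e1, e2 = errors[-3], errors[-2], errors[-1]
    return math.log(e2 / e1) / math.log(e1 / e0)

# Newton's iteration for sqrt(2): quadratic convergence expected.
x, root = 1.0, math.sqrt(2.0)
errors = []
for _ in range(4):
    x = 0.5 * (x + 2.0 / x)   # x_{n+1} = (x_n + 2/x_n) / 2
    errors.append(abs(x - root))

print(estimate_order(errors))  # close to 2
```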
Taylor’s Theorem
A really useful technique in numerical methods is Taylor’s Theorem. If $f(x)$ is a smooth function on an interval $(a, b)$ containing $p$, then for $x \in (a, b)$,
\[f(x) = f(p) + \frac{f'(p)}{1!} (x-p) + \frac{f''(p)}{2!} (x-p)^2 + \cdots + \frac{f^{(n)}(p)}{n!}(x-p)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!} (x-p)^{n+1}\]where $\xi$ is between $p$ and $x$.
There are also some useful Taylor series.
\[e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots\]Rootfinding
For $f: \mathbb{R} \to \mathbb{R}$, $p$ is a root if $f(p) = 0$. We say a root $p$ of $f(x)$ has multiplicity $m$ if
\[f(x) = (x-p)^m q(x)\]where $\lim_{x\to p} q(x) \neq 0$. Furthermore, one can prove that a root $p$ of a smooth $f$ has multiplicity $m$ if and only if
\[f(p) = f'(p) = \cdots = f^{(m-1)}(p) = 0\]but
\[f^{(m)}(p) \neq 0\]Fixed Point Iteration Schemes
A fixed point of a function $g$ is a point $x$ such that $g(x) = x$. There are several methods we can use to find a fixed point.
Theorem 1. If $g$ is a continuous function on $[a, b] \subset \mathbb{R}$ with $g(a), g(b) \in [a, b]$, then $g$ has a fixed point in $[a, b]$.
Theorem 2. Let $g$ be a smooth function such that $g: [a, b] \to [a, b]$ and there exists $0 \le \theta < 1$ s.t. $|g'(x)| \le \theta$ for all $x \in [a, b]$. Then
- there exists a unique fixed point of $g$ in $[a, b]$.
- $x_{n+1} = g(x_n)$ converges to the fixed point if $x_0 \in [a,b]$.
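As a sketch of Theorem 2 in action (Python here, though the notes use MATLAB): $g(x) = \cos(x)$ maps $[0, 1]$ into itself and $|g'(x)| = |\sin(x)| \le \sin(1) < 1$ there, so the iteration must converge. The helper name `fixed_point` is mine.

```python
import math

# Fixed-point iteration x_{n+1} = g(x_n).
# For g(x) = cos(x) on [0, 1], Theorem 2 guarantees convergence
# to the unique fixed point (the "Dottie number", ~0.739085).
def fixed_point(g, x0, n_iter=50):
    x = x0
    for _ in range(n_iter):
        x = g(x)
    return x

p = fixed_point(math.cos, 0.5)
print(p)  # ~0.739085
```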
Bisection Method
To find a root of a function $f: \mathbb{R} \to \mathbb{R}$, given $[a_0, b_0]$ s.t. $f(a_0)f(b_0) \le 0$ (so there is at least one root in the interval), we go from $[a_n, b_n]$ to $[a_{n+1}, b_{n+1}]$ as follows.
- Let $x_n = \frac{1}{2}(a_n + b_n)$.
- If $f(x_n) f(a_n) \le 0$, $a_{n+1} = a_n$, $b_{n+1} = x_n$.
- Else $a_{n+1} = x_n$, $b_{n+1} = b_n$.
Error estimate
\[|x_n - p| \le \frac{b_0 - a_0}{2^{n+1}}\]Order of convergence: $\alpha = 1$. It converges linearly.
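A minimal Python sketch of the loop above (the function name `bisect` and the test problem $x^2 - 2$ on $[1, 2]$ are illustrative choices):

```python
# Bisection: halve the bracketing interval, keeping a sign change inside.
def bisect(f, a, b, n_iter=40):
    for _ in range(n_iter):
        x = 0.5 * (a + b)
        if f(x) * f(a) <= 0:   # root is in [a, x]
            b = x
        else:                  # root is in [x, b]
            a = x
    return 0.5 * (a + b)

root = bisect(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)  # ~1.41421356, error bounded by (b0 - a0) / 2^41
```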
Newton’s Method
Newton’s method linearizes $f$ at $x_n$ and takes the root of the tangent line as the next iterate:
\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]- Pros: The order of convergence is $\alpha = 2$ for simple roots. It is very fast (much faster than the naive bisection method).
- Cons: It does not converge from every initial guess $x_0$.
Another drawback of Newton’s method, which we can overcome, is that the order of convergence drops to $\alpha = 1$ when $p$ is a root of multiplicity $m \ge 2$. There are two ways around this.
- Know $m$ in advance and let \(x_{n+1} = x_n - m\cdot \frac{f(x_n)}{f'(x_n)}\)
- Apply Newton’s method to \(\widetilde{f}(x) = \frac{f(x)}{f'(x)}\), which has a simple root at $p$.
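A Python sketch comparing the two behaviors on a hypothetical example $f(x) = (x-1)^2$, a root of multiplicity $m = 2$ (the names and the guard for hitting the root exactly are mine):

```python
# Newton's method with an optional multiplicity factor m:
#   x_{n+1} = x_n - m * f(x_n) / f'(x_n)
def newton(f, df, x0, n_iter, m=1):
    x = x0
    for _ in range(n_iter):
        fx = f(x)
        if fx == 0.0:          # already at a root; avoid 0/0
            break
        x = x - m * fx / df(x)
    return x

f = lambda x: (x - 1.0) ** 2   # double root at x = 1
df = lambda x: 2.0 * (x - 1.0)

plain = newton(f, df, 2.0, 10)          # linear: error only halves each step
modified = newton(f, df, 2.0, 10, m=2)  # lands on the root immediately here
print(plain, modified)
```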
Secant Method
The secant method is similar to Newton’s method, except that it uses a different linear approximation: the derivative is replaced by a difference quotient,
\[f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}\]which gives
\[x_{n+1} = x_n - \frac{f(x_n) (x_n - x_{n-1})}{f(x_n) - f(x_{n-1})}\]We can prove $\alpha = \frac{\sqrt{5} + 1}{2} \approx 1.618$, so the convergence is superlinear.
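A Python sketch of the secant update (the test function and starting points are illustrative; the early-exit guard for $f(x_n) = f(x_{n-1})$ is mine):

```python
# Secant method: Newton's update with the derivative replaced by
# the slope through the last two iterates.
def secant(f, x0, x1, n_iter=10):
    for _ in range(n_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:           # flat secant (e.g. already converged)
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
    return x1

root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root)  # ~1.41421356
```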
Newton’s Method for Systems
For systems of nonlinear equations, let
\[F(x) = \left ( \begin{matrix} f_1(x_1, x_2, \cdots) \\ \vdots \\ f_n(x_1, x_2, \cdots)\end{matrix}\right)\]The analogue of $f'(x)$ is the Jacobian $J(x)$
\[J(x) = \left(\begin{matrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n}\end{matrix}\right)\]And
\[x_{n+1} = x_n - (J(x_n))^{-1} \cdot F(x_n)\]In practice, we solve the linear system $J a = F$ rather than forming $J^{-1} F$ explicitly. In Matlab,
% Newton's method for nonlinear systems:
% x is the current iterate, f(x) = F(x), df(x) = J(x)
for i = 1:5
    x = x - df(x) \ f(x);   % backslash solves J(x) a = F(x)
    fprintf("(%.10f, %.10f)\n", x(1), x(2));
end
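For readers without MATLAB, here is an equivalent Python/NumPy sketch on an assumed example system ($x^2 + y^2 = 1$, $x = y$); like the backslash above, `np.linalg.solve` solves $J a = F$ rather than inverting $J$.

```python
import numpy as np

# Newton's method for the 2x2 system
#   f1(x, y) = x^2 + y^2 - 1 = 0
#   f2(x, y) = x - y        = 0
def F(v):
    x, y = v
    return np.array([x * x + y * y - 1.0, x - y])

def J(v):
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [1.0,     -1.0]])

v = np.array([1.0, 0.5])
for _ in range(8):
    v = v - np.linalg.solve(J(v), F(v))   # solve J a = F, then update
print(v)  # ~ (0.70710678, 0.70710678)
```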
System of Equations
Linear Systems
A linear system is
\[Ax = b\]where $A$ is an $n\times n$ matrix and $b$ is an $n\times 1$ vector. In order to have a unique solution $x$, $A$ must be invertible. Note that if $A$ is an upper-triangular matrix, we can use back substitution to solve the system, and if $A$ is a lower-triangular matrix, we can use forward substitution. Both are intuitive procedures compared with solving a general linear system.
Matlab code for back substitution (here combined with a QR solve of $Ax = b$):
function x = back_sub(A, b)
    % solve Ax = b via QR factorization and back substitution
    [~, n] = size(A);
    [Q, R] = mgs(A);  % modified Gram-Schmidt (defined below)
    b = Q' * b;
    x = zeros(n, 1);
    for i = n:-1:1
        s = b(i);
        for j = (i+1):n
            s = s - R(i, j) * x(j);
        end
        x(i) = s / R(i, i);
    end
end
QR Factorization
For an $m \times n$ matrix $A$ with linearly independent columns, we have
\[A = QR\]where $Q$ is $m\times n$ with orthonormal columns and $R$ is an invertible upper-triangular $n\times n$ matrix. It can be computed efficiently by the Gram-Schmidt procedure.
% Gram-Schmidt procedure
function [Q, R] = gs(A)
    [m, n] = size(A);
    Q = zeros(m, n);
    R = zeros(n, n);
    for j = 1:n
        v = A(:, j);
        for i = 1:j-1
            R(i, j) = Q(:, i)' * A(:, j);
            v = v - R(i, j) * Q(:, i);
        end
        R(j, j) = norm(v);
        Q(:, j) = v / R(j, j);
    end
end
% Modified Gram-Schmidt procedure (more numerically stable)
function [Q, R] = mgs(A)
    [m, n] = size(A);
    Q = zeros(m, n);
    R = zeros(n, n);
    V = A;
    for i = 1:n
        R(i, i) = norm(V(:, i));
        Q(:, i) = V(:, i) / R(i, i);
        for j = (i+1):n
            R(i, j) = Q(:, i)' * V(:, j);
            V(:, j) = V(:, j) - R(i, j) * Q(:, i);
        end
    end
end
High-level steps to solve the linear system:
- For a linear system $Ax = b$, factor $A = QR$. Now we have $QRx = b$.
- Since $Q$ has orthonormal columns, $Q^T Q = I$. We just need to solve $Rx = Q^T b$ via back substitution.
QR factorization can also be used to solve least-squares problems. The problem is, for $Ax = b$, to find
\[x_{*} = \arg \min_{x \in \mathbb{R}^n} \|Ax - b\|_2\]Similarly, solving $Rx = Q^T b$ (with the reduced QR factorization) gives the minimizer.
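A quick NumPy sketch of the least-squares use (the line-fit data are made up; `np.linalg.qr` returns the reduced factorization, so $R$ is square and invertible for full-column-rank $A$):

```python
import numpy as np

# Least-squares fit of a line y = a0 + a1*t to 4 points via QR:
# solve R x = Q^T b.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.1, 3.9])
A = np.column_stack([np.ones_like(t), t])  # 4x2 design matrix

Q, R = np.linalg.qr(A)           # reduced QR: Q is 4x2, R is 2x2
x = np.linalg.solve(R, Q.T @ b)  # R x = Q^T b
print(x)  # [intercept, slope]
```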
LU Factorization
For an invertible matrix $A$, we have
\[A = LU\]where $L$ is a lower-triangular matrix and $U$ is an upper-triangular matrix. It can be computed efficiently by Gaussian elimination (the version below assumes no pivoting is needed).
function [L, U] = lu_factor(A)
    % Gaussian elimination without pivoting:
    % assumes A(i, i) ~= 0 at every step
    [~, n] = size(A);
    L = eye(n);
    for i = 1:n-1
        for j = i+1:n
            L(j, i) = A(j, i) / A(i, i);
            A(j, :) = A(j, :) - L(j, i) * A(i, :);
        end
    end
    U = A;
end
An important application of LU factorization is solving many systems with the same $A$ but different right-hand sides $b_1, b_2, \cdots, b_N$: factor once, then each solve costs only two triangular substitutions.
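A Python sketch of that reuse pattern (the helper names `lu_nopivot` and `lu_solve` are mine; `lu_nopivot` mirrors `lu_factor` above, and since there is no pivoting, the example matrix is chosen to need none):

```python
import numpy as np

# Factor A = LU once, then solve A x = b_k for several right-hand
# sides by forward + back substitution.
def lu_nopivot(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for i in range(n - 1):
        for j in range(i + 1, n):
            L[j, i] = A[j, i] / A[i, i]
            A[j, :] -= L[j, i] * A[i, :]
    return L, A   # A has been reduced to U

def lu_solve(L, U, b):
    n = len(b)
    y = np.zeros(n)
    for i in range(n):              # forward substitution: L y = b
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):  # back substitution: U x = y
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 1.0], [2.0, 3.0]])
L, U = lu_nopivot(A)
for b in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(lu_solve(L, U, b))  # columns of A^{-1}
```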
Eigenvalue-Eigenvector Problem
For an $n\times n$ matrix $A$, we are looking for pairs $(\lambda, v)$ where $\lambda \in \mathbb{C}, v \in \mathbb{C}^n, v \neq 0$ s.t.
\[Av = \lambda v\]In linear algebra, $\lambda$ is the solution of
\[\mathrm{det}(A - \lambda I) = 0\]Power Iteration
Power iteration can be used to find the eigenvalue-eigenvector pair whose eigenvalue is largest in magnitude. For a diagonalizable $n \times n$ matrix $A$, assume its eigenvalues satisfy
\[|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|\]For essentially any initial guess $x_0$ (one with a nonzero component along $v_1$), the iteration
\[x_{n+1} = Ax_n\]will converge in direction to $cv_1$ as $n \to \infty$. In summary, we randomly choose $x_0$, then
- Iteration: $\hat{x}_{n+1} = Ax_n$
- Renormalize: $x_{n+1} = \hat{x}_{n+1} / || \hat{x}_{n+1}|| $
- Approximate eigenvalue (Rayleigh quotient): $\mu_{n+1} = x_{n+1}^T A x_{n+1}$
function e = eigen(A, N)
    % estimate the largest-magnitude eigenvalue of A by power iteration
    [n, ~] = size(A);
    v = rand(n, 1);
    for i = 1:N
        v = A * v;
        v = v / norm(v, 2);
        e = v' * A * v;  % Rayleigh quotient
    end
end
The order of convergence is linear, with rate governed by $|\lambda_2/\lambda_1|$.
Shifted Inverse Power Iteration
Trick 1: If $\lambda_1, \cdots, \lambda_n$ are the eigenvalues of $A$, the eigenvalues of $A^{-1}$ are $1/\lambda_{1}, \cdots, 1/\lambda_{n}$.
Trick 2: If $\lambda_1, \cdots, \lambda_n$ are the eigenvalues of $A$, the eigenvalues of $A - cI$ are $\lambda_{1}-c, \cdots, \lambda_{n}-c$.
Combining the two tricks: for a shift $\mu_0$ close enough to a target eigenvalue, we can apply power iteration to $(A - \mu_0 I)^{-1}$ and obtain the eigenvalue of $A$ closest to $\mu_0$. However, the order is still linear.
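A Python sketch of shifted inverse power iteration (the matrix, shift, and helper name are illustrative choices): each step applies one power-iteration step to $(A - \mu_0 I)^{-1}$ by solving a linear system instead of forming the inverse.

```python
import numpy as np

# Power iteration on (A - mu0*I)^{-1} converges to the eigenvector
# whose eigenvalue is closest to the shift mu0.
def shifted_inverse_power(A, mu0, n_iter=50):
    n = A.shape[0]
    v = np.random.default_rng(0).random(n)
    M = A - mu0 * np.eye(n)
    for _ in range(n_iter):
        v = np.linalg.solve(M, v)   # one power step on M^{-1}
        v = v / np.linalg.norm(v)
    return v @ A @ v                # Rayleigh quotient of A

A = np.array([[2.0, 0.0],
              [0.0, 5.0]])
print(shifted_inverse_power(A, 1.9))  # ~2, the eigenvalue closest to 1.9
```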
Rayleigh Quotient Iteration
We can use Rayleigh quotient iteration to speed up convergence. The order is quadratic. However, it requires a better initial guess. Hence, in practice, we tend to use power iteration first, and then switch to Rayleigh quotient iteration.
function [mu, v] = rq_it(A, N, mu, x)
    % Rayleigh quotient iteration
    % A  - matrix
    % N  - number of iterations
    % mu - initial eigenvalue estimate
    % x  - initial eigenvector guess
    [n, ~] = size(A);
    v = x;
    for i = 1:N
        v = (A - mu * eye(n)) \ v;   % shifted inverse power step
        v = v / norm(v);
        mu = (A * v)' * v;           % update the shift
    end
end
Simultaneous Power Iteration
Simultaneous power iteration is used to find all the eigenvalues of a matrix.
function [Q, mu] = spi(A, N)
    % simultaneous power iteration: orthonormalize A*Q each step
    [n, ~] = size(A);
    Q = rand(n, n);
    for i = 1:N
        [Q, ~] = mgs(A * Q);
        mu = Q' * A * Q;  % tends toward triangular form; eigenvalues on the diagonal
    end
end
QR Algorithm
The QR algorithm finds all eigenvalues of a matrix simultaneously; with suitable shifts (not shown here), its convergence can be cubic for symmetric matrices.
function mu = qr_algorithm(A, N)
    % unshifted QR iteration: factor A = QR, then set A <- RQ
    for i = 1:N
        [Q, R] = mgs(A);
        A = R * Q;
    end
    mu = diag(A);  % eigenvalue estimates
end
Polynomial Interpolation
Polynomial interpolation through $n+1$ points $(x_0, y_0), \cdots, (x_n, y_n)$ is equivalent to solving a linear system
\[\left (\begin{matrix}1 & x_0 & x_0^2 & \cdots & x_0^{n}\\1 & x_1 & x_1^2 & \cdots & x_1^{n}\\\vdots & \vdots & \vdots & \ddots & \vdots\\1 & x_{n} & x_{n}^2 & \cdots & x_{n}^{n}\\\end{matrix} \right )\left (\begin{matrix}a_0\\a_1\\\vdots\\a_{n}\\\end{matrix} \right )=\left (\begin{matrix}y_0\\y_1\\\vdots\\y_{n}\\\end{matrix} \right )\]The coefficient matrix is a Vandermonde matrix, which becomes nearly singular (ill-conditioned) as $n$ grows.
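A tiny NumPy sketch of the Vandermonde approach (the three data points are made up); it works fine at this size, though the ill-conditioning bites as the number of points grows:

```python
import numpy as np

# Interpolate 3 points with a quadratic by solving the Vandermonde system.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, 7.0])

V = np.vander(xs, increasing=True)  # rows [1, x_i, x_i^2]
a = np.linalg.solve(V, ys)          # coefficients a_0, a_1, a_2
print(a)  # p(x) = 1 + x + x^2, i.e. [1., 1., 1.]
```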
Lagrange Form of the Interpolating Polynomial
Actually, a better approach is the Lagrange form of the interpolating polynomial.
\[L_i(x) = \prod_{j=0,j\neq i}^{n} \frac{(x-x_j)}{(x_i-x_j)}\] \[p(x) = \sum_{i=0}^n y_i L_i(x)\]Error Estimation
If $x_0, \cdots, x_n$ are interpolation points for a smooth function $f$ in $[a, b]$, for any $x \in [a,b]$, there exists $\xi \in [a,b]$,
\[f(x) = p(x) + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)(x-x_1)\cdots(x-x_n)\]Chebyshev Polynomials
\[T_n(x) = \cos(n \cos^{-1}(x))\] \[T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)\] \[T_0(x) = 1\] \[T_1(x) = x\]Legendre Polynomial
\[\hat{p}_j(x) = \frac{\sqrt{2}}{\sqrt{2j+1}} p_j(x)\] \[\hat{p}_0(x) = 1\] \[\hat{p}_1(x) = x\] \[\hat{p}_j(x) = \frac{2j-1}{j} x \hat{p}_{j-1}(x) - \frac{j-1}{j} \hat{p}_{j-2}(x)\]Numerical Quadrature
\[I_n(f) = \sum_{i=0}^n f(x_i) \int_a^b L_i(x)\,dx\]That is, integrate the Lagrange interpolant of $f$ at the nodes $x_i$.
Reference
- A Friendly Introduction to Numerical Analysis.
- MATH 371 Introduction to Numerical Methods, University of Michigan.