---
jupytext:
  cell_metadata_filter: -all
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.10.3
kernelspec:
  display_name: Julia 1.7.1
  language: julia
  name: julia-fast
---
```{code-cell}
:tags: [remove-cell]
using FundamentalsNumericalComputation
FNC.init_format()
```

(section-leastsq-qr)=
# The QR factorization

```{index} ! orthogonal vectors, ! orthonormal vectors
```

Sets of vectors satisfying a certain property are useful both theoretically and computationally.

::::{proof:definition} Orthogonal vectors
Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are **orthogonal** if $\mathbf{u}^T\mathbf{v}=0$. We say that a collection of vectors $\mathbf{q}_1,\ldots,\mathbf{q}_k$ is orthogonal if

```{math}
:label: orthogonality
i \neq j \quad \Rightarrow \quad \mathbf{q}_i^T\mathbf{q}_j = 0.
```

If {eq}`orthogonality` applies and also $\mathbf{q}_i^T\mathbf{q}_i=1$ for all $i=1,\ldots,n$, we say the vectors are **orthonormal**.
::::

```{index} inner product
```

In two and three dimensions, orthogonality is the same as perpendicularity. 

Orthogonal vectors simplify inner products. For example, if $\mathbf{q}_1$ and $\mathbf{q}_2$ are orthogonal, then

```{math}
:label: orthosubtract
\| \mathbf{q}_1 - \mathbf{q}_2 \|_2^2 = (\mathbf{q}_1-\mathbf{q}_2)^T(\mathbf{q}_1-\mathbf{q}_2)
= \mathbf{q}_1^T\mathbf{q}_1 - 2 \mathbf{q}_1^T\mathbf{q}_2 + \mathbf{q}_2^T\mathbf{q}_2
= \|\mathbf{q}_1\|_2^2 + \|\mathbf{q}_2\|_2^2.
```

As in the rest of this chapter, we will be using the 2-norm exclusively. 

```{index} subtractive cancellation
```

Equation {eq}`orthosubtract` is the key to the computational attractiveness of orthogonality. {numref}`fig-nonorthogonal` shows how nonorthogonal vectors can allow a multidimensional version of subtractive cancellation, in which $\|\mathbf{x}-\mathbf{y}\|$ is much smaller than $\|\mathbf{x}\|$ and $\|\mathbf{y}\|$. As the figure illustrates, orthogonal vectors do not allow this phenomenon. By {eq}`orthosubtract`, the magnitude of a vector difference or sum is larger than the magnitudes of the original vectors. 

```{figure} figures/nonorthogonal.svg
:name: fig-nonorthogonal
Nonorthogonal vectors can cause cancellation when subtracted, but orthogonal vectors never do.
```

```{proof:observation}
Addition and subtraction of vectors are guaranteed to be well conditioned when the vectors are orthogonal.
```
## Orthogonal and ONC matrices

Statements about orthogonal vectors are often made more easily in matrix form. Let $\mathbf{Q}$ be an $n\times k$ matrix whose columns $\mathbf{q}_1, \ldots, \mathbf{q}_k$ are orthogonal vectors. The orthogonality conditions {eq}`orthogonality` become simply that $\mathbf{Q}^T\mathbf{Q}$ is a diagonal matrix, since

```{math}
\mathbf{Q}^T \mathbf{Q} =
\begin{bmatrix}
\mathbf{q}_1^T \\[1mm] \mathbf{q}_2^T \\ \vdots \\ \mathbf{q}_k^T
\end{bmatrix}
\begin{bmatrix}
\mathbf{q}_1 & \mathbf{q}_2 &  \cdots & \mathbf{q}_k
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{q}_1^T\mathbf{q}_1 & \mathbf{q}_1^T\mathbf{q}_2 & \cdots & \mathbf{q}_1^T\mathbf{q}_k \\[1mm]
\mathbf{q}_2^T\mathbf{q}_1 & \mathbf{q}_2^T\mathbf{q}_2 & \cdots & \mathbf{q}_2^T\mathbf{q}_k \\
\vdots & \vdots & & \vdots \\
\mathbf{q}_k^T\mathbf{q}_1 & \mathbf{q}_k^T\mathbf{q}_2 & \cdots & \mathbf{q}_k^T\mathbf{q}_k
\end{bmatrix}.
```

```{index} ! ONC matrix
```

If the columns of $\mathbf{Q}$ are orthonormal, then $\mathbf{Q}^T\mathbf{Q}$ is the $k\times k$ identity matrix. This is such an important property that we will break with common practice here and give this type of matrix a name. 

```{proof:definition} ONC matrix
An **ONC matrix** is one whose columns are an orthonormal set of vectors. 
```

(theorem-qr-ONC)=
````{proof:theorem} ONC matrix

Suppose $\mathbf{Q}$ is a real $n\times k$ ONC matrix (matrix with orthonormal columns). Then:

1. $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$ ($k\times k$ identity).
2. $\| \mathbf{Q}\mathbf{x} \|_2 = \| \mathbf{x} \|_2$ for all $k$-vectors $\mathbf{x}$.
3. $\| \mathbf{Q} \|_2=1$.
````

````{proof:proof}
The first part is derived above. The second part follows a pattern that has become well established by now:

```{math}
\| \mathbf{Q}\mathbf{x} \|_2^2 = (\mathbf{Q}\mathbf{x})^T(\mathbf{Q}\mathbf{x}) = \mathbf{x}^T \mathbf{Q}^T \mathbf{Q} \mathbf{x} = \mathbf{x}^T \mathbf{I} \mathbf{x} = \| \mathbf{x} \|_2^2.
```

The last part of the theorem is left to the exercises.
````

```{index} ! orthogonal matrix
```

Of particular interest is a *square* ONC matrix.[^ortho] 

```{proof:definition}
An **orthogonal matrix** is a square matrix with orthonormal columns.
```

Orthogonal matrices have properties beyond {numref}`Theorem {number} <theorem-qr-ONC>`. 

[^ortho]: Confusingly, a square matrix whose columns are orthogonal is not necessarily an orthogonal matrix; the columns must be orthonormal, which is a stricter condition.


(theorem-qr-orthogmatrix)=
````{proof:theorem} Orthogonal matrix

Suppose $\mathbf{Q}$ is an $n\times n$ real orthogonal matrix. Then:
1. $\mathbf{Q}^T = \mathbf{Q}^{-1}$.
2. $\mathbf{Q}^T$ is also an orthogonal matrix.
3. $\kappa(\mathbf{Q})=1$ in the 2-norm.
4. For any other $n\times n$ matrix $\mathbf{A}$, $\| \mathbf{A}\mathbf{Q} \|_2=\| \mathbf{A} \|_2$.
5. If $\mathbf{U}$ is another $n\times n$ orthogonal matrix, then $\mathbf{Q}\mathbf{U}$ is also orthogonal.
````
::::{proof:proof}
Since $\mathbf{Q}$ is an ONC matrix, $\mathbf{Q}^T\mathbf{Q}=\mathbf{I}$. All three matrices are $n\times n$, so $\mathbf{Q}^{-1}=\mathbf{Q}^T$. The proofs of the other statements are left to the exercises.
::::

## Orthogonal factorization

```{index} ! matrix factorization; QR
```

Now we come to another important way to factor a matrix: the **QR factorization**. As we will show below, the QR factorization plays a role in linear least squares analogous to the role of LU factorization in linear systems.

(theorem-qr-QR)=
````{proof:theorem} QR factorization
Every real $m\times n$ matrix $\mathbf{A}$ ($m\ge n$) can be written as $\mathbf{A}=\mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an $m\times m$ orthogonal matrix and $\mathbf{R}$ is an $m\times n$ upper triangular matrix.
````

In most introductory books on linear algebra, the QR factorization is derived through a process known as **Gram–Schmidt orthogonalization**. However, while it is an important tool for theoretical work, the Gram–Schmidt process is numerically unstable. We will consider an alternative construction in {numref}`section-leastsq-house`.

When $m$ is much larger than $n$, which is often the case, there is a compressed form of the factorization that is more efficient. In the product

```{math}
\mathbf{A} =
\begin{bmatrix}
\mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_m
\end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots &  r_{2n} \\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & r_{nn} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots &  & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix},
```

the vectors $\mathbf{q}_{n+1},\ldots,\mathbf{q}_m$ always get multiplied by zero. Nothing about $\mathbf{A}$ is lost if we delete them and reduce the factorization to the equivalent form

```{math}
:label: economyqr
\mathbf{A} =
\begin{bmatrix}
\mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_n
\end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots &  r_{2n} \\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & r_{nn}
\end{bmatrix} = \hat{\mathbf{Q}} \hat{\mathbf{R}}.
```

::::{proof:definition} Thin QR factorization
The thin QR factorization is $\mathbf{A} = \hat{\mathbf{Q}} \hat{\mathbf{R}}$, where $\hat{\mathbf{Q}}$ is $m\times n$ and ONC, and $\hat{\mathbf{R}}$ is $n\times n$ and upper triangular.
::::

(demo-qr-qrfact)=
```{proof:demo}
```

```{raw} html
<div class='demo'>
```

```{raw} latex
%%start demo%%
```

Julia provides access to both the thin and full forms of the QR factorization.

```{code-cell}
A = rand(1.:9.,6,4)
@show m,n = size(A);
```

Here is a standard call:

```{code-cell}
Q,R = qr(A);
Q
```

```{code-cell}
R
```

::::{grid} 1 1 2 2
:gutter: 2

:::{grid-item}
:columns: 7

```{raw} latex
\begin{minipage}[t]{0.5\textwidth}
```
If you look carefully, you see that we seemingly got a full $\mathbf{Q}$ but a thin $\mathbf{R}$. However, the $\mathbf{Q}$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.

```{raw} latex
\end{minipage}\hfill
```
:::
:::{grid-item-card}
:columns: 5

```{raw} latex
\begin{minipage}[t]{0.4\textwidth}\begin{mdframed}[default]\small
```
To enter the accented character `Q̂`, type `Q\hat` followed by <kbd>Tab</kbd>.
```{raw} latex
\end{mdframed}\end{minipage}
```
:::
::::

```{code-cell}
Q̂ = Matrix(Q)
```

We can test that $\mathbf{Q}$ is an orthogonal matrix:

```{code-cell}
opnorm(Q'*Q - I)
```

The thin $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC:

```{code-cell}
Q̂'*Q̂ - I
```
```{raw} html
</div>
```

```{raw} latex
%%end demo%%
```

## Least squares and QR

```{index} linear least-squares problem
```

```{index} normal equations
```

If we substitute the thin factorization {eq}`economyqr` into the normal equations {eq}`normaleqns`, we can simplify expressions a great deal.

```{math}
\begin{split}
  \mathbf{A}^T\mathbf{A} \mathbf{x} &= \mathbf{A}^T \mathbf{b}, \\
  \hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \hat{\mathbf{Q}} \hat{\mathbf{R}} \mathbf{x} &= \hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \mathbf{b}, \\
  \hat{\mathbf{R}}^T \hat{\mathbf{R}} \mathbf{x}& = \hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \mathbf{b}.
\end{split}
```

In order to have the normal equations be well posed, we require that $\mathbf{A}$ is not rank-deficient (as proved in {numref}`Theorem %s <theorem-ATA>`). This is enough to guarantee that $\hat{\mathbf{R}}$ is nonsingular (see [Exercise 4](problem-qr-nonsingR)). Therefore, its transpose is nonsingular as well, and we arrive at

```{math}
:label: lsqr
\hat{\mathbf{R}} \mathbf{x}=\hat{\mathbf{Q}}^T \mathbf{b}.
```

(algorithm-qr-solve)=
::::{proof:algorithm} Solution of linear least squares by thin QR
1. Compute the thin QR factorization $\hat{\mathbf{Q}}\hat{\mathbf{R}}=\mathbf{A}$.
1. Compute $\mathbf{z} = \hat{\mathbf{Q}}^T\mathbf{b}$.
1. Solve the $n\times n$ linear system $\hat{\mathbf{R}}\mathbf{x} = \mathbf{z}$ for $\mathbf{x}$.
::::

This algorithm is implemented in {numref}`Function {number} <function-lsqrfact>`. It is essentially the algorithm used internally by Julia when `A\b` is called. The execution time is dominated by the factorization, the most common method for which is described in {numref}`section-leastsq-house`.

(function-lsqrfact)=
```{proof:function} lsqrfact
**Linear least-squares solution by QR factorization**
```{code-block} julia
:lineno-start: 1
"""
    lsqrfact(A,b)

Solve a linear least-squares problem by QR factorization. Returns
the minimizer of ||b-Ax||.
"""
function lsqrfact(A,b)
    Q,R = qr(A)
    c = Q'*b
    x = backsub(R,c)
    return x
end
```

The solution of least-squares problems via QR factorization is more stable than when the normal equations are formulated and solved directly.

(demo-qr-stable)=
```{proof:demo}
```

```{raw} html
<div class='demo'>
```

```{raw} latex
%%start demo%%
```

We'll repeat the experiment of {numref}`Demo {number} <demo-normaleqns-instab>`, which exposed instability in the normal equations. 

```{code-cell}
t = range(0,3,length=400)
f = [ x->sin(x)^2, x->cos((1+1e-7)*x)^2, x->1. ]
A = [ f(t) for t in t, f in f ]
x = [1.,2,1]
b = A*x;
```

The error in the solution by {numref}`Function {number} <function-lsqrfact>` is within the bound predicted by the condition number.

```{code-cell}
observed_error = norm(FNC.lsqrfact(A,b)-x)/norm(x);
@show observed_error;
κ = cond(A)
@show error_bound = κ*eps();
```

```{raw} html
</div>
```

```{raw} latex
%%end demo%%
```

## Exercises

1. ✍ Prove part 3 of {numref}`Theorem %s <theorem-qr-ONC>`.

2. ✍ Prove {numref}`Theorem %s <theorem-qr-orthogmatrix>`. For the third part, use the definition of the 2-norm as an induced matrix norm, then apply some of our other results as needed.

    (problem-qr-legendre)=
3. ⌨ Let $t_0,\ldots,t_m$ be $m+1$ equally spaced points in $[-1,1]$. Let $\mathbf{A}$ be the matrix in {eq}`vandersystemrect` for $m=400$ and fitting by polynomials of degree less than 5. Find the thin QR factorization of $\mathbf{A}$, and, on a single graph, plot every column of $\hat{\mathbf{Q}}$ as a function of the vector $t$.

    (problem-qr-nonsingR)=
4. ✍ Prove that if the $m\times n$ ($m\ge n$) matrix $\mathbf{A}$ is not rank-deficient, then the factor $\hat{\mathbf{R}}$ of the thin QR factorization is invertible. (Hint: Suppose on the contrary that $\hat{\mathbf{R}}$ is singular. Show using the factored form of $\mathbf{A}$ that this would imply that $\mathbf{A}$ is rank-deficient.)

5. ✍ Let $\mathbf{A}$ be $m\times n$ with $m>n$. Show that if $\mathbf{A}=\mathbf{Q}\mathbf{R}$ is a QR factorization and $\mathbf{R}$ has rank $n$, then $\mathbf{A}^+=\mathbf{R}^+\mathbf{Q}^T$.

6. ✍ Let $\mathbf{A}$ be $m\times n$ with $m>n$. Show that if $\mathbf{A}=\hat{\mathbf{Q}}\hat{\mathbf{R}}$ is a thin QR factorization and $\hat{\mathbf{R}}$ is invertible, then $\mathbf{A}^+=\hat{\mathbf{R}}^{-1}\hat{\mathbf{Q}}^T$.

7. ⌨ Repeat [Exercise 3.1.2](problem-fitting-census), but use thin QR factorization rather than the backslash to solve the least-squares problem.

8. ✍ The matrix $\mathbf{P}=\hat{\mathbf{Q}} \hat{\mathbf{Q}}^T$ derived from the thin QR factorization has some interesting and important properties.

    **(a)** Prove that $\mathbf{P}=\mathbf{A}\mathbf{A}^+$.

    **(b)** Prove that $\mathbf{P}^2=\mathbf{P}$. (This property defines a *projection matrix*.)

    **(c)** Any vector $\mathbf{x}$ may be written trivially as $\mathbf{x}=\mathbf{u}+\mathbf{v}$, where $\mathbf{u}=\mathbf{P}\mathbf{x}$ and $\mathbf{v}=(\mathbf{I}-\mathbf{P})\mathbf{x}$. Prove that $\mathbf{u}$ and $\mathbf{v}$ are orthogonal. (Together with part (b), this means that $\mathbf{P}$ is an *orthogonal projector*.)

