# 7.5. Dimension reduction#

The SVD has another important property that proves very useful in a variety of applications. Let $$\mathbf{A}$$ be a real $$m\times n$$ matrix with SVD $$\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^T$$ and (momentarily) $$m\ge n$$. Another way of writing the thin form of the SVD is

(7.5.1)#$\begin{split}\begin{split} \mathbf{A} = \hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^T &= \begin{bmatrix} \rule[-0.3em]{0pt}{1em} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{bmatrix} \: \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} \: \begin{bmatrix} \mathbf{v}_1^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix}\ \\ &= \begin{bmatrix} \rule[-0.3em]{0pt}{1em} \sigma_1\mathbf{u}_1 & \cdots & \sigma_n\mathbf{u}_n \end{bmatrix}\: \begin{bmatrix} \mathbf{v}_1^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} \\ &= \sigma_1 \mathbf{u}_{1}\mathbf{v}_{1}^T + \cdots + \sigma_r \mathbf{u}_{r}\mathbf{v}_{r}^T = \sum_{i=1}^r \sigma_i \mathbf{u}_{i}\mathbf{v}_{i}^T, \end{split}\end{split}$

where $$r$$ is the rank of $$\mathbf{A}$$. The final formula also holds for the case $$m<n$$.

Each outer product $$\mathbf{u}_{i}\mathbf{v}_{i}^T$$ is a rank-1 matrix of unit 2-norm. Thanks to the ordering of singular values, then, Equation (7.5.1) expresses $$\mathbf{A}$$ as a sum of decreasingly important contributions. This motivates the definition, for $$1\le k \le r$$,

(7.5.2)#$\mathbf{A}_k = \sum_{i=1}^k \sigma_i \mathbf{u}_{i}\mathbf{v}_{i}^T = \mathbf{U}_k \mathbf{S}_k \mathbf{V}_k^T,$

where $$\mathbf{U}_k$$ and $$\mathbf{V}_k$$ are the first $$k$$ columns of $$\mathbf{U}$$ and $$\mathbf{V}$$, respectively, and $$\mathbf{S}_k$$ is the upper-left $$k\times k$$ submatrix of $$\mathbf{S}$$.

The rank of a sum of matrices is always less than or equal to the sum of the ranks, so $$\mathbf{A}_k$$ is a rank-$$k$$ approximation to $$\mathbf{A}$$. It turns out that $$\mathbf{A}_k$$ is the best rank-$$k$$ approximation of $$\mathbf{A}$$, as measured in the matrix 2-norm.

Theorem 7.5.1

Suppose $$\mathbf{A}$$ has rank $$r$$ and let $$\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^T$$ be an SVD. Let $$\mathbf{A}_k$$ be as in (7.5.2) for $$1\le k < r$$. Then

1. $$\| \mathbf{A} - \mathbf{A}_k \|_2 = \sigma_{k+1}, \quad k=1,\ldots,r-1$$, and

2. If the rank of $$\mathbf{B}$$ is $$k$$ or less, then $$\| \mathbf{A}-\mathbf{B} \|_2\ge \sigma_{k+1}$$.

Proof

(part 1 only) Note that (7.5.2) is identical to (7.5.1) with $$\sigma_{k+1},\ldots,\sigma_r$$ all set to zero. This implies that

$\mathbf{A} - \mathbf{A}_k = \mathbf{U}(\mathbf{S}-\hat{\mathbf{S}})\mathbf{V}^T,$

where $$\hat{\mathbf{S}}$$ has those same values of $$\sigma_i$$ replaced by zero. But that makes the above an SVD of $$\mathbf{A} - \mathbf{A}_k$$, with singular values $$0,\ldots,0,\sigma_{k+1},\ldots,\sigma_r$$, the largest of which is $$\sigma_{k+1}$$. That proves the first claim.

## Compression#

If the singular values of $$\mathbf{A}$$ decrease sufficiently rapidly, then $$\mathbf{A}_{k}$$ may capture the most significant behavior of the matrix for a reasonably small value of $$k$$.

Demo 7.5.2

We make an image from some text, then reload it as a matrix.

plot(annotations=(0.5,0.5,text("Hello world",44,:center,:middle)),
grid=:none,frame=:none,size=(400,150))
savefig("hello.png")
A = @. Float64(Gray(img))
Gray.(A) Next we show that the singular values decrease until they reach zero (more precisely, until they are about $$\epsilon_\text{mach}$$ times the norm of the matrix) at around $$k=45$$.

U,σ,V = svd(A)
scatter(σ,xaxis=(L"i"), yaxis=(:log10,L"\sigma_i"),
title="Singular values")


The rapid decrease suggests that we can get fairly good low-rank approximations.

plt = plot(layout=(2,2),frame=:none,aspect_ratio=1,titlefontsize=10)
for i in 1:4
k = 3i
Ak = U[:,1:k]*diagm(σ[1:k])*V[:,1:k]'
plot!(Gray.(Ak),subplot=i,title="rank = \$k")
end
plt


Consider how little data is needed to reconstruct these images. For rank-9, for instance, we have 9 left and right singular vectors plus 9 singular values, for a compression ratio of better than 12:1.

m,n = size(A)
compression = m*n / (9*(m+n+1))

12.099213551119178


## Exercises#

1. ✍ Suppose that $$\mathbf{A}$$ is an $$n\times n$$ matrix. Explain why $$\sigma_n$$ is the distance (in 2-norm) from $$\mathbf{A}$$ to the set of all singular matrices.

2. ✍ Suppose $$\mathbf{A}$$ is a $$7\times 4$$ matrix and the eigenvalues of $$\mathbf{A}^*\mathbf{A}$$ are 3, 4, 7, and 10. How close is $$\mathbf{A}$$ in the 2-norm to (a) a rank-3 matrix? (b) a rank-2 matrix?

3. (a) ⌨ Find the rank-1 matrix closest to

$\begin{split} \mathbf{A}=\displaystyle \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix}, \end{split}$

as measured in the 2-norm.

(b) ⌨ Repeat part (a) for

$\begin{split} \mathbf{A}=\displaystyle \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix}. \end{split}$
4. ✍ Find the rank-1 matrix closest to

$\begin{split} \mathbf{A}=\displaystyle \begin{bmatrix} 1 & b \\ b & 1 \end{bmatrix}, \end{split}$

as measured in the 2-norm, where $$b>0$$.

5. ⌨ Following Demo 7.5.2 as a guide, load the “mandrill” test image and convert it to a matrix of floating-point pixel grayscale intensities. Using the SVD, display as images the best approximations of rank 5, 10, 15, and 20.

1

In statistics this quantity may be interpreted as the fraction of explained variance.