7.4. Symmetry and definiteness
As we saw in Section 2.9, symmetry can simplify the LU factorization into the symmetric form \(\mathbf{A}=\mathbf{L}\mathbf{D}\mathbf{L}^T\). Important specializations occur as well for the eigenvalue and singular value factorizations. In this section we stay with complex-valued matrices, so we are interested in the case when \(\mathbf{A}^*=\mathbf{A}\), i.e., \(\mathbf{A}\) is hermitian. However, we often loosely speak of symmetry to mean this property even in the complex case. All of the statements in this section easily specialize to the real case.
Normality
Suppose now that \(\mathbf{A}^*=\mathbf{A}\) and that \(\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*\) is an SVD. Since \(\mathbf{S}\) is real and square, we have

\[ \mathbf{A}^* = \mathbf{V}\mathbf{S}^*\mathbf{U}^* = \mathbf{V}\mathbf{S}\mathbf{U}^*, \]
and it’s tempting to conclude that \(\mathbf{U}=\mathbf{V}\). Happily, this is nearly true. The following theorem is typically proved in an advanced linear algebra course.
Theorem 7.4.1. If \(\mathbf{A}=\mathbf{A}^*\), then \(\mathbf{A}\) has a diagonalization \(\mathbf{A}=\mathbf{V} \mathbf{D} \mathbf{V}^{-1}\) in which \(\mathbf{V}\) is unitary and \(\mathbf{D}\) is diagonal and real.
Another way to state the result of this theorem is that a hermitian matrix has real eigenvalues and a complete set of orthonormal eigenvectors; in particular, it is normal. Because hermitian matrices are normal, their eigenvalue condition number is guaranteed to be 1 by Theorem 7.2.9.
The converse of Theorem 7.4.1 is also true: every normal matrix with real eigenvalues is hermitian. This was illustrated in Demo 7.2.11.
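Here is a brief numerical check of Theorem 7.4.1; the random hermitian matrix below is only an illustrative choice, and the snippet is a sketch rather than part of the text's demos.

using LinearAlgebra
B = randn(6,6) + 1im*randn(6,6)
A = (B + B')/2              # hermitian by construction
λ,V = eigen(A)              # unitary diagonalization A = V*Diagonal(λ)*V'
println(λ)                  # the eigenvalues come out real
println(opnorm(V'*V - I))   # ≈ 0, so the columns of V are orthonormal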
For a hermitian matrix, the EVD

\[ \mathbf{A} = \mathbf{V}\mathbf{D}\mathbf{V}^{-1} = \mathbf{V}\mathbf{D}\mathbf{V}^* \]
is almost an SVD.
If \(\mathbf{A}^*=\mathbf{A}\) and \(\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}\) is a unitary diagonalization, then

\[ \mathbf{A} = (\mathbf{V}\mathbf{T})\,|\mathbf{D}|\,\mathbf{V}^* \]
is an SVD, where \(|\mathbf{D}|\) is the elementwise absolute value and \(\mathbf{T}\) is diagonal with \(|T_{ii}|=1\) for all \(i\).
Proof. Let \(T_{ii}=\operatorname{sign}(D_{ii})\) for all \(i\), taking \(T_{ii}=1\) whenever \(D_{ii}=0\). Then \(\mathbf{T}^2=\mathbf{I}\), \(|\mathbf{D}|=\mathbf{T}\mathbf{D}\), and

\[ \mathbf{A} = \mathbf{V}\mathbf{D}\mathbf{V}^* = \mathbf{V}\mathbf{T}\mathbf{T}\mathbf{D}\mathbf{V}^* = (\mathbf{V}\mathbf{T})\,|\mathbf{D}|\,\mathbf{V}^*. \]

Since \(\mathbf{T}\) is unitary, so is \(\mathbf{V}\mathbf{T}\), and the factorization above satisfies the requirements of an SVD.
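The following sketch illustrates the construction in this proof, using an arbitrarily chosen symmetric matrix whose eigenvalues have both signs; note that the singular values produced this way are not necessarily in descending order.

using LinearAlgebra
A = [1.0 2 0; 2 -3 1; 0 1 2]            # arbitrary symmetric example with mixed-sign eigenvalues
λ,V = eigen(Symmetric(A))
T = Diagonal(sign.(λ))                   # T² = I and |D| = T*D
U = V*T                                  # still has orthonormal columns
σ = abs.(λ)                              # candidate singular values (not yet sorted)
println(opnorm(U*Diagonal(σ)*V' - A))    # ≈ 0, confirming A = (VT)|D|V*
println(sort(σ,rev=true) ≈ svdvals(A))   # true: these are the singular values of A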
Rayleigh quotient
Recall that for a matrix \(\mathbf{A}\) and compatible vector \(\mathbf{x}\), the quadratic form \(\mathbf{x}^* \mathbf{A} \mathbf{x}\) is a scalar.
Given hermitian \(\mathbf{A}\) and nonzero vector \(\mathbf{x}\), the Rayleigh quotient is the function

\[ R_{\mathbf{A}}(\mathbf{x}) = \frac{\mathbf{x}^* \mathbf{A} \mathbf{x}}{\mathbf{x}^* \mathbf{x}}. \]
If \(\mathbf{v}\) is an eigenvector such that \(\mathbf{A} \mathbf{v}=\lambda \mathbf{v}\), then one easily calculates that \(R_{\mathbf{A}}(\mathbf{v})=\lambda.\) That is, the Rayleigh quotient maps an eigenvector into its associated eigenvalue.
If \(\mathbf{A}^*=\mathbf{A}\), then the Rayleigh quotient has another interesting property: \(\nabla R_{\mathbf{A}}(\mathbf{v})=\boldsymbol{0}\) if \(\mathbf{v}\) is an eigenvector. By a multidimensional Taylor series, then, for any fixed vector \(\mathbf{z}\),

\[ R_{\mathbf{A}}(\mathbf{v}+\epsilon\mathbf{z}) = R_{\mathbf{A}}(\mathbf{v}) + O(\epsilon^2) = \lambda + O(\epsilon^2) \]
as \(\epsilon\to 0\). The conclusion is that a good estimate of an eigenvector becomes an even better estimate of an eigenvalue.
To see this in action, we will use a symmetric matrix with a known EVD.
using LinearAlgebra
n = 20;
λ = 1:n
D = diagm(λ)
Q,_ = qr(randn(n,n))   # get a random orthogonal factor
V = Matrix(Q)          # convert to a plain matrix so columns can be extracted
A = V*D*V';
The Rayleigh quotient is a scalar-valued function of a vector.
R = x -> (x'*A*x)/(x'*x);
The Rayleigh quotient evaluated at an eigenvector gives the corresponding eigenvalue.
R(V[:,7])
6.999999999999999
If the input to the Rayleigh quotient is within a small \(\delta\) of an eigenvector, its output is within \(O(\delta^2)\) of the corresponding eigenvalue. In this experiment, we observe that each additional digit of accuracy in an approximate eigenvector gives two more digits to the eigenvalue estimate coming from the Rayleigh quotient.
δ = @. 1/10^(1:5)
eval_diff = zeros(size(δ))
for (k,delta) in enumerate(δ)
    e = randn(n);  e = delta*e/norm(e);   # random perturbation with norm delta
    x = V[:,7] + e                        # perturbed eigenvector for λ = 7
    eval_diff[k] = R(x) - 7               # error in the Rayleigh quotient estimate
end
using PrettyTables
label = ["perturbation δ","δ^2","R(x) - λ"]
pretty_table([δ δ.^2 eval_diff]; header=label)
┌────────────────┬─────────┬─────────────┐
│ perturbation δ │     δ^2 │    R(x) - λ │
├────────────────┼─────────┼─────────────┤
│            0.1 │    0.01 │   0.0390208 │
│           0.01 │  0.0001 │ 0.000316182 │
│          0.001 │  1.0e-6 │  3.66714e-6 │
│         0.0001 │  1.0e-8 │  3.21225e-8 │
│         1.0e-5 │ 1.0e-10 │ 3.82387e-10 │
└────────────────┴─────────┴─────────────┘
Definite, semidefinite, and indefinite matrices
In the real case, we called a symmetric matrix \(\mathbf{A}\) symmetric positive definite (SPD) if \(\mathbf{x}^T \mathbf{A}\mathbf{x} > 0 \) for all nonzero vectors \(\mathbf{x}\). In the complex case the relevant property is hermitian positive definite (HPD), meaning that \(\mathbf{A}^*=\mathbf{A}\) and \(\mathbf{x}^* \mathbf{A}\mathbf{x} > 0\) for all nonzero complex vectors \(\mathbf{x}\). Putting this property together with the Rayleigh quotient leads to the following.
Theorem 7.4.5. If \(\mathbf{A}^*=\mathbf{A}\), then the following statements are equivalent.
1. \(\mathbf{A}\) is HPD.
2. The eigenvalues of \(\mathbf{A}\) are positive numbers.
Proof. Suppose item 1 is true. If \(\mathbf{A}\mathbf{x} = \lambda \mathbf{x}\) is an eigenpair, then the Rayleigh quotient implies that

\[ \lambda = R_{\mathbf{A}}(\mathbf{x}) = \frac{\mathbf{x}^* \mathbf{A} \mathbf{x}}{\mathbf{x}^* \mathbf{x}} > 0. \]
Hence item 2 is true. Conversely, suppose item 2 is known. Then we can write the EVD as \(\mathbf{A}=\mathbf{V}\mathbf{S}^2\mathbf{V}^*\), where the \(S_{ii}\) are positive square roots of the eigenvalues. Hence

\[ \mathbf{x}^* \mathbf{A} \mathbf{x} = \mathbf{x}^* \mathbf{V}\mathbf{S}^2\mathbf{V}^* \mathbf{x} = \bigl\| \mathbf{S}\mathbf{V}^*\mathbf{x} \bigr\|_2^2 > 0 \]

for every nonzero \(\mathbf{x}\),
as both \(\mathbf{S}\) and \(\mathbf{V}\) are invertible. Thus, item 1 is true.
According to Theorem 7.4.5, for an HPD matrix, the EVD \(\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^*\) meets all the requirements of the SVD, provided the ordering of eigenvalues is chosen appropriately.
A hermitian matrix with all negative eigenvalues is called negative definite, and one with eigenvalues of different signs is indefinite. Finally, if one or more eigenvalues is zero and the rest have one sign, it is positive or negative semidefinite.
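A hermitian matrix can therefore be classified numerically from the signs of its computed eigenvalues. The helper function below is a hypothetical sketch rather than anything defined in the text; in floating point, distinguishing semidefinite from definite reliably would require a tolerance rather than exact comparisons with zero.

using LinearAlgebra
# Hypothetical helper: classify a hermitian matrix by the signs of its eigenvalues.
function definiteness(A)
    λ = eigvals(Hermitian(A))
    all(λ .> 0)  && return "positive definite"
    all(λ .< 0)  && return "negative definite"
    all(λ .>= 0) && return "positive semidefinite"
    all(λ .<= 0) && return "negative semidefinite"
    return "indefinite"
end

println(definiteness([2.0 -1; -1 2]))   # positive definite (eigenvalues 1 and 3)
println(definiteness([1.0 2; 2 1]))     # indefinite (eigenvalues 3 and -1)
println(isposdef([2.0 -1; -1 2]))       # built-in check via Cholesky, returns true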
Exercises
✍ Each line below is an EVD for a hermitian matrix. State whether the matrix is definite, indefinite, or semidefinite. Then state whether the given factorization is also an SVD, and if it is not, modify it to find an SVD.
(a) \(\begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\)
(b) \(\begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -0.5 \\ -0.5 & -1 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0.8 & -0.4 \\ -0.4 & -0.8 \end{bmatrix}\)
(c) \(\begin{bmatrix} -5 & 3\\ 3 & -5 \end{bmatrix} = \begin{bmatrix} \alpha & \alpha \\ \alpha & -\alpha \end{bmatrix} \begin{bmatrix} -2 & 0 \\ 0 & -8 \end{bmatrix} \begin{bmatrix} \alpha & \alpha \\ \alpha & -\alpha \end{bmatrix}, \quad\alpha=1/\sqrt{2}\)
⌨ Determine whether each matrix is positive definite, negative definite, positive or negative semidefinite, or indefinite.
(a)
matrixdepot("pei",5)-6I
(b)
matrixdepot("hilb",8)-2I
(c)
matrixdepot("dingdong",20)
(d)
matrixdepot("lehmer",100)
(e)
matrixdepot("fiedler",200)
✍ Prove true, or give a counterexample: If \(\mathbf{A}\) and \(\mathbf{B}\) are hermitian matrices of the same size, then
\[ R_{\mathbf{A}+\mathbf{B}}(\mathbf{x}) = R_{\mathbf{A}}(\mathbf{x})+R_{\mathbf{B}}(\mathbf{x}). \]

⌨ The range of the function \(R_{\mathbf{A}}(\mathbf{x})\) is a subset of the complex plane known as the field of values of the matrix \(\mathbf{A}\). Use 500 random vectors to plot points in the field of values of \(\mathbf{A} = \displaystyle \begin{bmatrix} 1 & 0 & -2\\ 0 & 2 & 0\\ -2 & 0 & 1 \end{bmatrix}\). Then compute its eigenvalues and guess what the exact field of values is.
✍ Let \(\mathbf{A}=\displaystyle \begin{bmatrix} 3 & -2 \\ -2 & 0 \end{bmatrix}.\)
(a) Write out \(R_{\mathbf{A}}(\mathbf{x})\) explicitly as a function of \(x_1\) and \(x_2\).
(b) Find \(R_{\mathbf{A}}(\mathbf{x})\) for \(x_1=1\), \(x_2=2\).
(c) Find the gradient vector \(\nabla R_{\mathbf{A}}(\mathbf{x})\).
(d) Show that the gradient vector is zero when \(x_1=1\), \(x_2=2\).
✍ A skew-Hermitian matrix is one that satisfies \(\mathbf{A}^*=-\mathbf{A}\). Show that if \(\mathbf{A}\) is skew-Hermitian, then \(R_{\mathbf{A}}\) is imaginary-valued.
⌨ Thanks largely to Theorem 7.4.1, the eigenvalue problem for symmetric/hermitian matrices is easier than for general matrices.
(a) Let \(\mathbf{A}\) be a \(1000\times 1000\) random real matrix, and let \(\mathbf{S}=\mathbf{A}+\mathbf{A}^T\). Using @elapsed, time the eigvals function for \(\mathbf{A}\) and then for \(\mathbf{S}\). You should find that the computation for \(\mathbf{S}\) is around an order of magnitude faster.

(b) Perform the experiment from part (a) on \(n\times n\) matrices for \(n=200,300,\ldots,1600\). Plot running time as a function of \(n\) for both matrices on a single log-log plot. Is the ratio of running times roughly constant, or does it grow with \(n\)?