# Sparsity and structure

## Contents

# 8.1. Sparsity and structure¶

Very large matrices cannot be stored all within primary memory of a computer unless they are **sparse**. A sparse matrix has *structural zeros*, meaning entries that are known to be exactly zero.} For instance, the adjacency matrix of a graph has zeros where there are no links in the graph. To store and operate with a sparse matrix efficiently, it is not represented as an array of all of its values. There is a variety of sparse formats available; for the most part, you can imagine that the matrix is stored as triples \((i,j,A_{ij})\) for all the nonzero \((i,j)\) locations.

## Computing with sparse matrices¶

Most graphs with real applications have many fewer edges than the maximum possible \(n^2\) for \(n\) nodes. Accordingly, their adjacency matrices have mostly zero elements and should be represented sparsely. Julia functions to deal with sparse matrices are found in the `SparseArrays`

package in the standard library.

Here we load the adjacency matrix of a graph with 2790 nodes. Each node is a web page referring to Roswell, NM, and the edges represent links between web pages. (Credit goes to Panayiotis Tsaparas and the University of Toronto for making this data public.)

```
@load "roswell.jld2" A; # file is on the book's website
```

We may define the density of \(\mathbf{A}\) as the number of nonzeros divided by the total number of entries.

```
m,n = size(A)
@show density = nnz(A) / (m*n);
```

```
density = nnz(A) / (m * n) = 0.0010902994565845762
```

The computer memory consumed by any variable can be discovered using `summarysize`

. We can use it to compare the space needed for the sparse representation to its dense counterpart, that is, the space needed to store all the elements, whether zero or not.

```
F = Matrix(A)
Base.summarysize(F) / Base.summarysize(A)
```

```
550.268980630567
```

As you can see, the storage savings are dramatic. Matrix-vector products are also much faster using the sparse form because operations with structural zeros are skipped.

```
x = randn(n)
A*x; # make sure * is loaded and compiled
@elapsed for i in 1:300; A*x; end
```

```
0.003042209
```

```
F*x;
@elapsed for i in 1:300; F*x; end
```

```
0.650395333
```

Arithmetic operations such as `+`

, `-`

, `*`

, and `^`

respect and exploit sparsity if the matrix operands are sparse. However, matrix operations may substantially decrease the amount of sparsity, a phenomenon known as **fill-in**.

Here is the adjacency matrix of a graph representing a small-world network, featuring connections to neighbors and a small number of distant contacts.

```
@load "smallworld.jld2" A
graphplot(A,linealpha=0.5)
```

Because each node connects to relatively few others, the adjacency matrix is quite sparse.

```
spy(A,title="Nonzero locations",m=2,color=:blues)
```

By Theorem 7.1.5, the entries of \(\mathbf{A}^k\) give the number of walks of length \(k\) between pairs of nodes, as with “*k* degrees of separation” within a social network. As \(k\) grows, the density of \(\mathbf{A}^k\) also grows.

```
plt = plot(layout=(1,3),legend=:none,size=(600,240))
for k in 2:4
spy!(A^k,subplot=k-1,color=:blues,
title=latexstring("\\mathbf{A}^$k"))
end
plt
```

Use

`nnz`

to count the number of nonzeros in a sparse matrix.