# The Galerkin method

## Contents

# 10.6. The Galerkin method¶

Using finite differences we defined a collocation method in which an approximation of the differential equation is required to hold at a finite set of nodes. In this section we present an alternative based on integration rather than differentiation. Our presentation will be limited to the linear BVP

However, we will assume that the linear problem is presented in the equivalent form

Such a transformation is always possible, at least in principle (see Exercise 3), and the case when \(u(a)\) and \(u(b)\) are nonzero can also be incorporated (see Exercise 2). The approach is also adaptable to Neumann conditions (see Exercise 5). As with finite differences, a nonlinear problem is typically solved by using a Newton iteration to create a sequence of linear problems.

## Weak formulation¶

Let (10.6.1) be multiplied by a generic function \(\psi(x)\) called the **test function**, and then integrate both sides in \(x\):

The last line above used an integration by parts.

We now make an important and convenient assumption about the test function. The first term in (10.6.2), consisting of boundary evaluations, disappears if we require that \(\psi(a)=\psi(b)=0\). Doing so leads to

which is known as the **weak form** of the differential equation (10.6.1).

Every solution of (10.6.1) (what we might now call the strong form of the problem) is a weak solution, but the converse is not always true. While the weak form might look odd, in many mathematical models it could be considered more fundamental than (10.6.1).

## Galerkin conditions¶

Our goal is to solve a finite-dimensional problem that approximates the weak form of the BVP. Let \(\phi_0,\phi_1,\ldots,\phi_m\) be linearly independent functions satisfying \(\phi_i(a)=\phi_i(b)=0\). If we require

then (10.6.3) becomes, after some rearrangement,

One way to satisfy this condition is to ensure that the term inside the brackets is zero for each possible value of \(i\), that is,

for \(i=1,\ldots,m\). The independence of the \(\phi_i\) furthermore guarantees that this is the *only* possibility, so we no longer need to consider the \(z_i\).

Now that we have approximated the weak form of the BVP by a finite set of constraints, the next step is to represent the approximate solution by a finite set as well. A natural choice is to approximate \(u(x)\) the same way as we did the test function \(\psi\), where the \(\phi_j\) form a basis for representing the solution:

Substituting (10.6.5) into (10.6.4) implies

for \(i=1,\ldots,m\). This rearranges easily into

still for each \(i=1,\ldots,m\). These are the **Galerkin conditions** defining a numerical solution. They follow entirely from the BVP and the choice of the \(\phi_i\).

The conditions (10.6.6) are a linear system of equations for the unknown coefficients \(w_j\). Define \(m\times m\) matrices \(\mathbf{K}\) and \(\mathbf{M}\), and the vector \(\mathbf{f}\), by

Then (10.6.6) is simply

The matrix \(\mathbf{K}\) is called the **stiffness matrix** and \(\mathbf{M}\) is called the **mass matrix**. By their definitions, they are symmetric. The last piece of the puzzle is to make some selection of \(\phi_1,\ldots,\phi_m\) and obtain a fully specified algorithm.

Suppose we are given \(-u''+4u=x\), with \(u(0)=u(\pi)=0\). We could choose the basis functions \(\phi_k=\sin(kx)\) for \(k=1,2,3\). Then

With some calculation (or computer algebra), we find

Upon solving the resulting diagonal linear system, the approximate solution is

## Finite elements¶

One useful and general choice for the \(\phi_i\) are the piecewise linear hat functions constructed in Section 5.2. As usual, we select nodes \(a=t_0 < t_1 < \cdots < t_n=b\). Also define

Then we set \(m=n-1\), and the \(\phi_i\) in (10.6.5) are

Recall that these functions are cardinal, i.e., \(H_i(t_i)=1\) and \(H_i(t_j)=0\) if \(i\neq j\). Hence

where as usual \(u_j\) is the value of the numerical solution at \(t_j\). Note that we omit \(H_0\) and \(H_n\), which equal one at the endpoints of the interval, because the boundary conditions on \(u\) render them irrelevant.

The importance of the hat function basis in the Galerkin method is that each one is nonzero in only two adjacent intervals. As a result, we shift the focus from integrations over the entire interval in (10.6.7) to integrations over each subinterval, \(I_k=[t_{k-1},t_k]\). Specifically, we use

Start with the first subinterval, \(I_1\). The only hat function that is nonzero over \(I_1\) is \(H_1(x)\). Thus the only integrals we need to consider over \(I_1\) have \(i=j=1\):

which contribute to the sums for \(K_{11}\), \(M_{11}\), and \(f_1\), respectively.

Before writing more formulas, we make one more very useful simplification. Unless the coefficient functions \(c(x)\), \(s(x)\), and \(f(x)\) specified in the problem are especially simple functions, the natural choice for evaluating all of the required integrals is numerical integration, say by the trapezoid formula. As it turns out, though, such integration is not really necessary. The fact that we have approximated the solution of the BVP by a piecewise linear interpolant makes the numerical method second-order accurate overall. It can be proven that the error is still second order if we replace each of the coefficient functions by a constant over \(I_k\), namely the average of the endpoint values:

Thus the integrals in (10.6.11) can be evaluated solely from the node locations. For instance,

Now consider interval \(I_2=[t_1,t_2]\). Here both \(H_1\) and \(H_2\) are nonzero, so there are contributions to all of the matrix elements \(K_{11}\), \(K_{12}=K_{21}\), \(K_{22}\), to \(M_{11}\), \(M_{12}=M_{21}\), \(M_{22}\), and to \(f_1\) and \(f_2\). Over \(I_2\) we have \(H_2'= h_2^{-1}\) and \(H_{1}'= -h_2^{-1}\). Hence the contributions to \(K_{11}\) and \(K_{22}\) in (10.6.11) are \(\overline{c}_2/h_2\), and the contributions to \(K_{12}=K_{21}\) are \(-\overline{c}_2/h_2\). We summarize the relationship by

where the squiggly arrow is meant to show that the values of the \(2\times 2\) matrix on the left are added to the appropriate submatrix of \(\mathbf{K}\) on the right. Similar expressions are obtained for contributions to \(\mathbf{M}\) and \(\mathbf{f}\) in (10.6.11); see below.

In general, over \(I_k\) for \(1<k<n\), we have \(H_k'= h_k^{-1}\) and \(H_{k-1}'=-h_k^{-1}\). The stiffness matrix contributions over \(I_k\) become

One finds the contributions to the other structures by similar computations:

and

The contribution from \(I_n\) affects just \(K_{n-1,n-1}\), \(M_{n-1,n-1}\), and \(f_{n-1}\), and it produces formulas similar to those for \(I_1\).

Each \(I_k\) contributes to four elements of each matrix and two of the vector \(\mathbf{f}\), except for \(I_1\) and \(I_n\), which each contribute to just one element of each matrix and \(\mathbf{f}\). The spatially localized contributions to the matrices characterize a **finite element method** (FEM). Putting together all of the contributions to (10.6.8) to form the complete algebraic system is often referred to as the *assembly* process.

## Implementation and convergence¶

Function 10.6.3 implements the piecewise linear FEM on the linear problem as posed in (10.6.3), using an equispaced grid. The code closely follows the description above.

**Piecewise linear finite elements for a linear BVP**

```
1"""
2 fem(c,s,f,a,b,n)
3
4Use a piecewise linear finite element method to solve a two-point
5boundary value problem. The ODE is (`c`(x)u')' + `s`(x)u = `f`(x) on
6the interval [`a`,`b`], and the boundary values are zero. The
7discretization uses `n` equal subintervals.
8
9Return vectors for the nodes and the values of u.
10"""
11function fem(c,s,f,a,b,n)
12 # Define the grid.
13 h = (b-a)/n
14 x = @. a + h*(0:n)
15
16 # Templates for the subinterval matrix and vector contributions.
17 Ke = [1 -1; -1 1]
18 Me = (1/6)*[2 1; 1 2]
19 fe = (1/2)*[1; 1]
20
21 # Evaluate coefficient functions and find average values.
22 cval = c.(x); cbar = (cval[1:n]+cval[2:n+1]) / 2;
23 sval = s.(x); sbar = (sval[1:n]+sval[2:n+1]) / 2;
24 fval = f.(x); fbar = (fval[1:n]+fval[2:n+1]) / 2;
25
26 # Assemble global system, one interval at a time.
27 K = zeros(n-1,n-1); M = zeros(n-1,n-1); f = zeros(n-1);
28 K[1,1] = cbar[1]/h; M[1,1] = sbar[1]*h/3; f[1] = fbar[1]*h/2;
29 K[n-1,n-1] = cbar[n]/h; M[n-1,n-1] = sbar[n]*h/3; f[n-1] = fbar[n]*h/2;
30 for k in 2:n-1
31 K[k-1:k,k-1:k] += (cbar[k]/h) * Ke
32 M[k-1:k,k-1:k] += (sbar[k]*h) * Me
33 f[k-1:k] += (fbar[k]*h) * fe
34 end
35
36 # Solve system for the interior values.
37 u = (K+M) \ f
38 u = [0; u; 0] # put the boundary values into the result
39 return x,u
40end
```

We solve the equation

in which

Here are the coefficient function definitions. Even though \(s\) is a constant, it has to be defined as a function for Function 10.6.3 to use it.

```
c = x -> x^2;
q = x -> 4;
f = x -> sin(π*x);
```

```
x,u = FNC.fem(c,q,f,0,1,50)
plot(x,u,label="",
xaxis=(L"x"),yaxis=(L"u"),title="Solution by finite elements")
```

Because piecewise linear interpolation on a uniform grid of size \(h\) is \(O(h^2)\) accurate, the accuracy of the FEM method based on linear interpolation as implemented here is similar to the second-order finite-difference method.

## Exercises¶

⌨ For each linear BVP, use Function 10.6.3 to solve the problem and plot the solution for \(n=40\). Then for each \(n=10,20,40,\ldots,640\), compute the norm of the error. Make a log-log convergence plot of error versus \(n\) and compare graphically to second-order convergence.

**(a)**\(-u''+u=-8 + 16 x^2 - x^4, \quad u(0) =u(2) =0\)Exact solution: \(x^2(4-x^2)\)

**(b)**\([(2+x)u']' +11x u = -e^x \left(12 x^3+7 x^2+1\right), \quad u(-1) =u(1) =0\)Exact solution: \(e^x \left(1-x^2\right)\)

**(c)**\(u''+x(u'+u) = -x[4 \sin(x)+5 x \cos(x)], \quad u(0) =u(2\pi) =0\)Exact solution: \(-x^2\sin(x)\)

✍ Suppose you want to solve the differential equation \(-[c(x)u']'+d(x)u = f(x)\), as in Equation (10.6.1), except with the boundary conditions \(u(a)=\alpha\), \(u(b)=\beta\). Find constants \(p\) and \(q\) such that if \(v(x)=u(x)+px+q\), then \(v\) satisfies the same BVP, except that \(v(a)=v(b)=0\) and \(f\) is replaced by a different function.

✍ Suppose \(p(x)u''(x)+q(x)u'(x)+r(x)=0\), and assume that \(p(x)\neq 0\) for all \(x\) in \([a,b]\). Let \(z(x)\) be any function satisfying \(z'=q/p\). Show that the differential equation is equivalent to one in the form (10.6.1), and find the functions \(c(x)\), \(d(x)\), and \(f(x)\) in that equation. (Hint: Start by multiplying through the equation by \(\exp(z)\).)

✍ Derive (10.6.13), starting from the mass matrix definition in (10.6.11). You should replace \(s(x)\) by the constant \(\overline{s}_k\) within interval \(I_k\).

Suppose the Dirichlet boundary conditions \(u(a)=u(b)=0\) are replaced by the homogeneous Neumann conditions \(u'(a)=u'(b)=0\).

**(a)**✍ Explain why the weak form (10.6.3) can be derived without any boundary conditions on the test function \(\psi\).**(b)**⌨ The result of part (a) suggests replacing (10.6.10) with\[ u(x) = \sum_{j=0}^{n} u_j H_j(x) \]and making (10.6.11) hold for all \(i,j\) from \(0\) to \(n\). Modify Function 10.6.3 to do this and thereby solve the Neumann problem. (Note that \(I_1\) and \(I_n\) now each make multiple contributions, like all the other integration subintervals.)

**(c)**⌨ Test your function on the problem\[ u''+u = -2 \sin(x), \quad u'(0)=u'(1)=0, \]whose exact solution is \((x-1)\cos(x) - \sin(x)\). Show second-order convergence.