Simple Integration

For the sake of smoothness :), we start by 'formalizing' the notion of definite integration. If you haven't seen my previous tutorial on 'Representation, Operations, and Semantics', be aware that this 'formalization' is an application of the concepts discussed in 'ROS'.

The idea is to take you through my own journey in Computational Physics. In this specific tutorial, you will see how I took a concept from a Calculus 101 course and turned it into a 'Computational Idea', which I then implemented in Haskell.

You do not have to do this in Haskell, but I recommend it. Even if you plan to work in 'C' or 'FORTRAN', I believe (that is the reason we are here) that Haskell can teach you more about 'Computational Ideas' and their relationship to 'Mathematical Ideas' (and later, we will discuss 'Physical Ideas') than any other language out there. Also, Haskell allows you to write efficient programs, which is important in Computational Physics. Furthermore, I will teach you (if you keep coming back) what optimization is, and how to do it smartly.

But enough blabla, let's dive in.

From Exact to Approximate Integration

Let us consider definite integrals.

integral f a b

How can we make sense of this concept in a 'computational' manner?

In mathematics, integral is a function that takes a function and an interval as input, and returns the area under the curve of that function over that interval.

In a computational context, integral is likely to be an Operation rather than a Representation.

So, our task is to find a 'computational idea' where integral is an operation.

Exercise: How would you solve this problem? How would you represent real numbers? How would you represent a function from the real to the real? How would you represent an interval?

Solution: Two for the price of One

Note about higher-order Functions

Take-away from our first two tries

So you notice that these two notions of integration are different. (I'm talking about the Solution to the first Exercise.)

Clearly, the first integral really corresponds directly to the mathematical idea of an exact (definite) integral.

Given a function f, and an interval (a,b), then there is really only one exact integral in math, and one exact integral in the first representation.

However, the second representation offers many representations of the same function. (You are free to choose how many points you use to represent your function.)
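For reference, here is a sketch of what the types of the two representations could look like (the names are mine; the Solution above is where they are worked out):

-- First representation: keep the function and the interval exact.
exactIntegral :: (Double -> Double) -> (Double, Double) -> Double
exactIntegral f (a, b) = undefined   -- not computable exactly in general

-- Second representation: the function is a list of sampled values fs,
-- together with the points xs at which it was sampled.
listIntegral :: [Double] -> [Double] -> Double
listIntegral fs xs = undefined       -- filled in as we go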

For instance:


f x = x
a = 0
b = 1

fs1 = [0, 1]      --representing f with 2 points on the interval (a,b)
fs2 = [0, 0.5, 1] --................... 3 ...........................

xs1 = [0, 1]      -- the corresponding variables
xs2 = [0, 0.5, 1]

So what is essential here?

Let us suppose that we always evaluate the function at points separated by a constant increment.

Now, the only freedom we have is the number of points contained in our interval.

Thus, a more useful representation than 1) for approximate integration is

A third representation; now we are talking about approximate integration.


type Func = Double -> Double
data Interval = I (Double,Double) Int

integral :: Func -> Interval -> Double
integral f (I (a,b) n) = ...   -- to be filled in by the integration rules below

This is a representation of a discrete interval, which now makes the two representations (2 and 3) equivalent.

However, it is possible to define integration without the constraint of constant increment.
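For instance, here is a sketch (the name and formulation are mine) of a list-based integral that makes no constant-increment assumption; each sampled value is simply weighted by the width of its own slice, and the last point gets no slice at all:

-- Integral over arbitrary (increasing) sample points xs with values fs.
integralNonUniform :: [Double] -> [Double] -> Double
integralNonUniform fs xs = sum $ zipWith (*) fs widths
  where widths = zipWith (-) (tail xs) xs   -- slice widths x_{i+1} - x_i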

More: The second and third representation are compatible

Simple Rules

OK, now that we really really know what an integral is, let us look at various algorithms for integration.

For the moment (we will come back to this when we talk about Stochastic Methods), the algorithms we use to compute integrals all have the following mathematical form:


integral f xs = sum [f x_i * w x_i | x_i <- xs]
--or
integral' fs xs = sum $ zipWith (*) ws fs

However, this form doesn't tell us about w or ws. Intuitively, you can think of an integral as a sum, where the ws are the weights of each of the fs.

Note: why we work with the 2nd rep. from now on.

Left rule

This is the simplest way to integrate;

  1. we split the interval (a,b) into (n-1) slices of equal length h = (b - a)/(n-1),
  2. we approximate the function on each of those slices by a constant.

What is n again?

In the case of the Left rule, the function on the first slice will be approximated by f a, on the second by f (a + h), and so forth. Thus, f b is never evaluated.

Formally,


integral f (I (a,b) n) = let h = (b - a) / fromIntegral (n - 1)
                             xs = map (\i -> a + fromIntegral i * h) [0 .. n-2]
                         in sum [h * f x_i | x_i <- xs]
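As a quick sanity check (just plugging numbers into the formula above): integrate f x = x over (0,1), whose exact value is 1/2.

-- With n = 11 points, h = 0.1, the Left rule gives ~0.45;
-- the error, ~0.05, is h/2, and it shrinks only like 1/n (see below).
checkLeft :: Double
checkLeft = integral (\x -> x) (I (0,1) 11)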

To help us later, we want to define w, the weight function.

wLeft h = h -- w is a constant, always h ... for the first n - 1 points.

This is not quite right; we would like to be explicit about the fact that if we were to include b in xs, then w would be 0.

--here, number is the position in the sum, starting at 1.
wLeft :: Double -> Int -> Int -> Double
wLeft h n number | number == n               = 0
                 | number >= 1 && number < n = h
                 | otherwise                 = error "number is not in range with n"
                 

Therefore, we can have something like:

integral f (I (a,b) n) = let h = (b - a) / fromIntegral (n - 1)
                             xs = toXs (I (a,b) n)   -- the sample points of the interval
                             fs = map f xs
                             ws = map (wLeft h n) [1..n]
                         in sum $ zipWith (*) fs ws

-- If we already have the list of function values from somewhere else.
integralOfList fs (I (a,b) n) = let h = (b - a) / fromIntegral (n - 1)
                                    ws = map (wLeft h n) [1..n]
                                in sum $ zipWith (*) fs ws
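The helper toXs used above turns an Interval into its list of sample points. It is not shown in this section, so here is a minimal sketch of it, assuming n evenly spaced points from a to b inclusive:

-- The n evenly spaced sample points of an Interval.
toXs :: Interval -> [Double]
toXs (I (a,b) n) = let h = (b - a) / fromIntegral (n - 1)
                   in map (\i -> a + fromIntegral i * h) [0 .. n-1]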

We see that the structure is rather independent of the specific choice of wLeft.

Why bother with integralOfList? (First Reason)

Why bother with integralOfList? (Second Reason)

This method---although simple---is never used, because the error is proportional to 1/n. Let's do better.

Trapezoidal rule

Before, we approximated the function locally as a constant. Now, we will do just a tiny bit better, and the error will be proportional to 1/n^2 (still bad, but much better than 1/n).

The idea is that for each slice of x, you approximate f by a linear function.

This gives you (on a single interval)

integral f [x_0, x_1] = let h = x_1 - x_0
                        in h * (f x_0 + f x_1)/2                       

The name of the rule comes from the formula above (the area of each slice is that of a trapezoid).

Exercise: 1) Draw a trapezoid. 2) What shape corresponds to the Left rule?

If you carry this rule through for a whole interval (by summing over all slices), you get the following:

integral [f1,f2,..,fN] (I (a,b) n) = let h = (b - a)/ fromIntegral (n - 1)
                                     in h*f1/2 + h*f2 + .. + h*fN/2

Why different weights for the edges?

So basically, we only need to define w for the trapezoidal rule in accordance with this, and we are good.

--here, number is the position in the sum, starting at 1.
wTrap :: Double -> Int -> Int -> Double
wTrap h n number | number == 1 || number == n = h/2
                 | number > 1 && number < n   = h
                 | otherwise                  = error "number is not in range with n"
                 

Now, the full integral formula:

integralOfList fs (I (a,b) n) = let h = (b - a) / fromIntegral (n - 1)
                                    ws = map (wTrap h n) [1..n]
                                in sum $ zipWith (*) fs ws

The only difference between this one and the one for the Left rule is the weight function.
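As a quick check of this (it relies on the toXs sketch above, so treat it as a sketch too): integrate f x = x**2 over (0,1), where the exact answer is 1/3. With n = 11 points, the Left-rule weights give roughly 0.285, while the trapezoidal weights give roughly 0.335:

quickTrap :: Double
quickTrap = integralOfList (map (**2) (toXs iv)) iv   -- ~0.335, using the wTrap version above
  where iv = I (0,1) 11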

Common Patterns?
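The structure of all these integrators is the same: compute h, turn the weight function into a list of weights, then zipWith (*) and sum. Only the weight function changes. We can therefore pass the weight function in as an argument; here is a sketch of that generalization (intSimp below calls integralOfList in exactly this form):

-- The integration rule is entirely captured by its weight function
-- (wLeft, wTrap, wSimp, ...), which becomes the first argument.
integralOfList :: (Double -> Int -> Int -> Double) -> [Double] -> Interval -> Double
integralOfList w fs (I (a,b) n) = let h  = (b - a) / fromIntegral (n - 1)
                                      ws = map (w h n) [1..n]
                                  in sum $ zipWith (*) fs ws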

Simpson's Rule

This method only works if the number of points n is an odd number.

The idea is to approximate the function by a quadratic in each slice of three points; it amounts to this:


f x = f x_0 + 
    (x - x_0) * (f (x_0 + h) - f (x_0 - h))/ (2*h) + 
    (x - x_0)**2 / 2 * (f (x_0 + h) - 2 * f x_0 + f (x_0 - h))/ h**2 +
    O (h**3)

-- where ** means exponentiation,
-- and O is Big-O notation: what is left over, after the first
-- Taylor terms have been written out, is proportional to h**3.

This gives you:


integral f [x_0, x_1, x_2] = let h = (x_2 - x_0) / 2
                             in h * (f x_0 + 4 * f x_1 + f x_2)/3 
                        

If you are skeptical, try to integrate f x as approximated above with your knowledge of mathematical (exact) integration. Remember, a quadratic can be integrated exactly.
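Here is that computation, sketched in the same pseudo-notation: expand around the middle point x_1 (so x_0 = x_1 - h and x_2 = x_1 + h) and integrate the quadratic approximation over the pair of slices from x_1 - h to x_1 + h. The linear term integrates to zero by symmetry.

integral quadApprox (x_1 - h, x_1 + h)
    = 2*h * f x_1
    + 0                                                  -- the (x - x_1) term is odd around x_1
    + (h**3 / 3) * (f x_0 - 2 * f x_1 + f x_2) / h**2    -- from the (x - x_1)**2 / 2 term
    = h * (f x_0 + 4 * f x_1 + f x_2) / 3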

If you carry this rule through for a whole interval (by summing all the slices), you get the following:

integral [f1,f2,..,fN] (I (a,b) n) = let h = (b - a)/ fromIntegral (n - 1)
                                     in (1/3) * h*f1 + 
                                        (4/3) * h*f2 +
                                        (2/3) * h*f3 +
                                        (4/3) * h*f4 +
                                        .. + 
                                        (1/3) * h*fN

So basically, we only need to define w for Simpson's rule in accordance with this, and we are good.

--here, number is the position in the sum, starting at 1.
wSimp :: Double -> Int -> Int -> Double
wSimp h n number | number == 1 || number == n              = (1/3)*h
                 | number > 1 && number < n && even number = (4/3)*h
                 | number > 1 && number < n && odd number  = (2/3)*h
                 | otherwise = error "number is not in range with n"
                 

The full integral formula is:

intSimp fs (I (a,b) n) = if odd n
                         then integralOfList wSimp fs (I (a,b) n)
                         else error "Simpson's rule only works with an odd number of points; the Interval given had an even n."

Exercise: Use both Simpson's rule and the trapezoidal rule to create an integration routine which never fails due to an even n, and has the best possible accuracy.

Simpson's rule makes the error go as 1/n^4.

A little Problem just for You

The error function is defined exactly by:


erf x = 2 / sqrt pi * integral (\y -> exp (-y**2)) (0,x)

Now, suppose we fix x = 1.

I am telling you that erf 1 = 0.842700792949715.

Your mission---should you choose to accept it---is to use the three methods we saw (1. Left, 2. Trapezoidal, 3. Simpson) to investigate how well they approximate this value.

More precisely, for each of these methods, you can define the relative error as a function of n:

--this is not actual code; part of your mission is to define relative error properly.

relError n = abs (methodAnswer n - exactAnswer) / exactAnswer
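To get you started, here is a sketch of one possible methodAnswer (the name erfSimp is mine; it relies on the toXs sketch and on intSimp from above):

-- erf 1 via Simpson's rule on n sample points (n must be odd).
erfSimp :: Int -> Double
erfSimp n = 2 / sqrt pi * intSimp fs (I (0,1) n)
  where fs = map (\y -> exp (-y**2)) (toXs (I (0,1) n))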

Then, you must:

  1. Compute this function for all three methods for n = 3, 5, 9, 16, 32, 64, 128.

  2. Make a table, or plot your results.

  3. Compare the three methods.

  4. (Very important) Have fun.

  5. Hold your breath for the upcoming Gaussian Quadrature in 'Advanced Integration'.

Note on Performance.

As we will see later, the Gaussian quadrature is much better than Simpson's rule, but suppose we didn't know about such quadrature; how could we improve the performance of Simpson's rule?

So far, we used lists to represent our function. The compiler does an amazing job for you when you're working with lists, but (this is completely irrelevant now, and we will come back to it when it becomes relevant) lists are not the best representation for things of fixed length.

Because our integrator doesn't return a list (in fact, it doesn't have to take a list as input to begin with), we could replace lists by some similar representation.

A better representation, which has the same interface (the same names for similar functions) and operations, is the Vector representation. If you are a performance junkie, I challenge you to implement Simpson's rule without using any lists; instead, use Vector from Data.Vector.Unboxed. See the documentation of Data.Vector for more info.
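To give you a feel for what 'same interface' means, here is a sketch of the list-to-vector swap (not a full solution to the challenge): Data.Vector.Unboxed exports map, zipWith and sum under the same names, so the integrator translates almost word for word.

import qualified Data.Vector.Unboxed as V

-- Simpson integration over an unboxed vector of function values.
integralOfVector :: V.Vector Double -> Interval -> Double
integralOfVector fs (I (a,b) n) =
    let h  = (b - a) / fromIntegral (n - 1)
        ws = V.generate n (\i -> wSimp h n (i + 1))   -- positions in the sum start at 1
    in V.sum $ V.zipWith (*) fs ws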

A few thoughts on Optimization (we will come back to it)