10:38 PM
Last time, I said I couldn't figure out how to justify the discussion of g(x) for fixed-point iteration. I just realized this was actually discussed in the class, but I was too lost at the time to figure it out.
Back up for a moment to consider an example of bisection. For example, compute the square root of A, where A is some non-negative number. The square root of A is a root of the function f(x)=x^2−A. We are looking for the positive number r which satisfies f(r)=0. Let's be lazy and note that r must be between 1 and A (even if A<1), so we can start with 1 and A as the endpoints of the bracket and start iterating. With each iteration, one endpoint of the bracket will move, and x_i, which is the midpoint of the bracket and our best guess for r, will converge towards r. Three iterations increase the accuracy of x_i by about one decimal place, so if we need a lot of decimal places it might take a while.
There are a couple of things to think about if we want a method that converges faster than bisection. The first is that we are only moving one endpoint at a time. The second is that we know f(x) in this case, and we aren't using that information. This suggests the possibility of moving both endpoints with each iteration by using what we know about f(x). For f(x)=x^2−A, if x<r then r<A/x and vice versa. Therefore x and A/x bracket r. So if we start with a bracket with endpoints a and b, we can get a new bracket with endpoints x=(a+b)/2 and A/x.
When we try this approach to find r, we find that the number of accurate digits in our approximation x_i approximately doubles with each iteration. This is way faster than basic bisection, which requires 3 iterations just to get one additional digit of accuracy. In addition, since our two endpoints are x and A/x, we only have to keep track of one number in each iteration, as opposed to the two numbers required for bisection. This leads to the iteration formula x_(i+1)=(x_i+A/x_i)/2=g(x_i).
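Here's a minimal Python sketch of that square-root iteration (the choice A=2 and the starting point are just examples, not anything from the derivation above). Watching the printed error, the number of correct digits roughly doubles with each pass:

    # Square-root iteration x_(i+1)=(x_i+A/x_i)/2, starting from one end of
    # the bracket [1, A]. A=2 is just an example.
    import math

    A = 2.0
    x = A
    for i in range(6):
        x = (x + A / x) / 2
        print(f"iteration {i + 1}: x = {x:.15f}, error = {abs(x - math.sqrt(A)):.2e}")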
Applying this idea of bisection with both endpoints moving to cube roots (f(x)=x^3−A) leads to a guess of an iterating function g(x)=(x+A/x^2)/2. However, this iterating function gives performance not much better than bisection, which prompts a general discussion of fixed-point iteration and the demonstration that if 0<|g′(r)|<1, fixed-point iteration has linear convergence, but if g′(r)=0, fixed-point iteration has quadratic convergence.
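For comparison, here's the same experiment with the cube-root iteration (again, A=2 is just an example). This time the error only shrinks by roughly a factor of 2 per pass, which is the linear convergence discussed below:

    # Cube-root iteration g(x)=(x+A/x^2)/2. The error shrinks by roughly a
    # constant factor per iteration instead of the digit-doubling seen above.
    A = 2.0
    x = A
    true_root = A ** (1 / 3)
    for i in range(10):
        x = (x + A / x**2) / 2
        print(f"iteration {i + 1}: x = {x:.10f}, error = {abs(x - true_root):.2e}")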
Having established that fixed-point iteration gives quadratic convergence if g′(r)=0, we can ask how to determine g(x) from a given f(x), which leads to Newton's method. However, I think conceptually Newton's method makes more sense geometrically, so I would introduce it geometrically, then show that it meets the criteria of fixed-point convergence with g′(r)=0 as long as f′(r) is not also zero and f″(r) is not infinite.
4:32 PM
My policy of being really lazy about writing equations and using images is likely to severely reduce the comprehensibility of this post. This post is screaming out for some graphs, but I'm going to try to do everything in plain text anyway. Depending on how badly this goes, I may work on figuring out how to include graphs in the future.
A quick review from last time: Numerical Analysis is the process of computing approximate values for mathematical expressions that are difficult or impossible to evaluate exactly. Last time we looked at approximating e^x as a Taylor series expansion. Computing individual terms can be done with a few multiplications and additions, and a finite number of terms can be summed to approximate the true value of e^x.
There are two important limitations here: first, the uncertainty, or error, of the approximation depends on the number of terms in the Taylor series sum. The uncertainty can be reduced by computing more terms, meaning there is a tradeoff between the speed of the computation and the precision of the computation. Second, because of the way that computers represent numbers, rounding errors are introduced into all computations. Depending on the problem being solved and how the solution is computed, this can greatly reduce the accuracy of the computation regardless of the number of terms used in the computation.
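For anyone who wants to see the recap in code form, here's a rough Python sketch of that idea (the function name exp_taylor and the choice x=1 are mine, just for illustration): sum the first n terms x^k/k! and compare against the library value of e.

    # Sum the first n terms of the Taylor series for e^x and compare against
    # the library value. More terms means a smaller truncation error but more work.
    import math

    def exp_taylor(x, n_terms):
        total, term = 0.0, 1.0
        for k in range(n_terms):
            total += term            # add x^k / k!
            term *= x / (k + 1)      # next term: x^(k+1) / (k+1)!
        return total

    for n in (2, 5, 10, 20):
        approx = exp_taylor(1.0, n)
        print(f"{n:2d} terms: {approx:.12f}, error = {abs(approx - math.e):.2e}")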
This time, we are looking at a different type of problem. Given a function f(x), how do we determine the roots, or the values of x such that f(x)=0? We will look at three approaches, each with different advantages and disadvantages.
Bisection
The bisection method of finding roots is really simple and reliable, at the expense of not being very fast. The idea is this: Suppose that you know that for a function f(x), a root r exists such that f(r)=0. Suppose also that you know two numbers, a and b, such that f(a) is a negative number and f(b) is a positive number. As long as f(x) is continuous, r must be between a and b. (If you don't believe me, draw a picture. The point (a,f(a)) is below the x-axis and the point (b,f(b)) is above the x-axis. f(x) can be any curved or straight line that connects the two points, as long as f(x) has exactly one value for every x and there are no jumps in f(x). f(x) must cross the x-axis between a and b, and the value of x where it crosses is r. Note that this is true whether a<b or a>b.)
This is kind of nifty, in that we basically haven't done anything at all, and we already have an estimate of r. If we guess r as (a+b)/2, since we know that r must be between a and b, the maximum error in our guess is |b−a|/2. Suppose that we need a better guess. Then we can set c to the midpoint of a and b, or (a+b)/2, and find f(c). If f(c) is negative, replace a with c. If f(c) is positive, replace b with c. (If f(c)=0, then c=r and we are done, but we should not expect that to happen.) With our new a and b, we can be certain that r is between a and b, and a and b are now closer together (half the distance, in fact), so our estimate of r is better.
We now have a way to get an accurate estimate of r to any desired accuracy. If we start with an interval between a and b that contains r, by repeating the process of finding c, the midpoint of a and b, and setting it as one of the endpoints of the interval, the interval will get progressively smaller. In fact, the interval is half as large with each iteration, since c bisects the interval. Eventually it will be small enough to be an accurate estimate of r, regardless of how large the starting interval was.
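To put the whole procedure in one place, here's a minimal Python sketch of bisection as described above (the function name, tolerance, and example are arbitrary; it checks signs with a product, so it doesn't care whether a or b is the negative end):

    # Bisection: repeatedly halve the bracket [a, b] until it is smaller than
    # the requested tolerance. Assumes f(a) and f(b) have opposite signs.
    def bisect(f, a, b, tol=1e-10):
        fa = f(a)
        if fa * f(b) > 0:
            raise ValueError("f(a) and f(b) must have opposite signs")
        while abs(b - a) / 2 > tol:
            c = (a + b) / 2
            fc = f(c)
            if fa * fc <= 0:     # the root lies between a and c
                b = c
            else:                # the root lies between c and b
                a, fa = c, fc
        return (a + b) / 2

    # Example: the square root of 2 is a root of f(x) = x^2 - 2.
    print(bisect(lambda x: x**2 - 2, 1.0, 2.0))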
So, the good news is that bisection always works. Given that f(x) is continuous and has a root r, bisection will always converge to r. (Note that if f(x) has multiple roots, it may not converge to the root you want.) You don't even need to know what the function f(x) actually is, as long as you can find the value of f(x) for any given x.
There are two pieces of bad news. First, you need the starting points a and b. The function x^2 obviously has a root at 0, but you can't use bisection to find it because there is no starting value of a such that a^2 is a negative number. If a function has multiple roots and you are interested in one in particular, it may be difficult to find starting values of a and b that will converge to the desired root.
The second piece of bad news is that bisection is slow. The error, or uncertainty, in r decreases by a factor of 1/2 with each iteration, so if your starting error is 1, then 10 iterations are required to get 3 decimal places of accuracy. This may not seem like a big deal, but if f(x) takes a long time to compute and you need many digits of accuracy, bisection may be too slow to be useful.
Fixed-Point Iteration
I haven't figured out how to justify Fixed-Point Iteration. The problem we are trying to solve is how to find roots of the function f(x), or values r such that f(r)=0. Fixed-point iteration, however, is based around the equation g(x)=x, for the function g(x). If you have a function g(x), and a value x=r such that g(r)=r, then r is a root of the function f(x) where f(x)=g(x)−x. And if you start with the equation f(x)=0, it can be rewritten in a variety of ways into the form g(x)=x, but it's hard to argue why you would want to, or, given multiple choices for rewriting f(x)=0 as g(x)=x, why you would choose one particular form of g(x) over another.
That said, let's go ahead and look at g(x)=x anyway. The premise behind that equation is either neat or pointless depending on your perspective. The idea is that you start with a number x, do a bunch of things to that number, and get the same number x out at the end. If you had g(x) programmed into your calculator, you could enter the value x, compute g(x), take the result of that calculation and calculate g(x) again, and keep doing that as long as you want, and the number you entered originally would never change. I'm assuming that if you think that's a colossal waste of time, you've already lost interest in this blog and moved on; so if you're still reading, you're like me and think this is pretty neat.
Let's look at an example. x^2 is a standard calculator function. If you type either 0 or 1 into your calculator and then start repeatedly tapping the x^2 button, absolutely nothing will happen. This means that 0 and 1 are roots of the function f(x)=g(x)−x=x^2−x. 0 and 1 are also the fixed points of the function g(x). In general, a fixed point r of a function g(x) is a value of x such that g(r)=r.
The interesting question is what happens when you start at a value of x where g(x)≠x. If you start with a number between 0 and 1, square it, square the result, and continue indefinitely, the result will converge towards 0. If you start with a number greater than 1 and square it repeatedly, the number will grow towards infinity. This suggests that if we rewrite the equation f(x)=0 in the form g(x)=x, then plugging a starting value x0 into g(x) and repeatedly plugging the result back into g(x) may converge on the solution g(r)=r. In this case, r is a root of f(x). But then again, it may not converge. In the case of g(x)=x^2, repeated iteration converges on the solution g(0)=0, but will not converge on the solution g(1)=1.
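Here's a quick Python sketch of that experiment (the starting values 0.9 and 1.1 are just examples):

    # Repeatedly apply g(x) = x^2. Starting just below 1 the iterates drift
    # toward the fixed point at 0; starting just above 1 they blow up, so the
    # fixed point at 1 is never reached this way.
    for x0 in (0.9, 1.1):
        x = x0
        print(f"starting at {x0}:")
        for i in range(8):
            x = x * x
            print(f"  after {i + 1} squarings: {x:.6g}")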
So, when is this useful? This is fairly easy to visualize. Draw a graph with the diagonal line y=x. Now imagine a function g(x) crossing the line y=x. The point where it crosses is the fixed point r we are trying to find. There are several possibilities for what g(x) could look like. If g(x) is rising as x increases, then it could be rising faster than the line y=x. In that case, when x<r then g(x) is below the line y=x and when x>r then g(x) is above the line. If g(x) is rising but more slowly than y=x, g(x) will be above the line when x<r and below the line when x>r. (Again, the words may be hard to understand, but if you draw it, it will be clear.) In the case where g(x) is falling, it's a little more complicated but the important question to keep in mind is whether it is falling slowly or quickly.
Repeatedly calculating new values for x can be easily visualized on this graph by creating a cobweb graph. Start at the point (x0,0). Draw a vertical line to the function g(x). It doesn't matter whether the line is going up or down, just that it's vertical. The point where the line intersects g(x) is the point (x0, g(x0)). Next, move horizontally to the line y=x. Again, it doesn't matter whether you are moving to the left or to the right. The point where the horizontal line intersects the line y=x is the point (g(x0),g(x0)). If we label g(x0) as x1, this gives a new starting point. From (x1,x1), move vertically to g(x), at the point (x1,g(x1)). Then move horizontally to the line y=x. This pair of vertical and horizontal moves can be repeated indefinitely.
Depending on what g(x) looks like, there are 4 basic paths that this cobweb can take. If g(x) is rising, then the cobweb will be a staircase. If g(x) is rising slowly, the staircase will move toward the fixed point. If g(x) is rising quickly, the staircase will move away from the fixed point. (It doesn't matter whether x0 is greater or less than r. In either case, the cobweb will move towards r if g(x) is rising slowly and away if g(x) is rising quickly. Don't trust me. Draw it!) If g(x) is falling, then the cobweb will follow a spiral path. Like the rising case, if it is falling slowly, it will spiral towards the fixed point, and if it is falling quickly, it will spiral away.
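If you'd rather let the computer draw the cobweb than sketch it by hand, here's a rough matplotlib sketch (the choice g(x)=cos(x) is my own example of a slowly falling function, not something from the discussion above; its fixed point is the x where cos(x)=x):

    # Draw g(x), the line y = x, and the cobweb path starting from (x0, 0):
    # vertical move to g(x), horizontal move to y = x, repeat.
    import numpy as np
    import matplotlib.pyplot as plt

    def cobweb(g, x0, n_steps=20):
        xs = np.linspace(0, 1.5, 200)
        plt.plot(xs, g(xs), label="g(x)")
        plt.plot(xs, xs, "--", label="y = x")
        x, y = x0, 0.0
        for _ in range(n_steps):
            gx = g(x)
            plt.plot([x, x], [y, gx], "k", lw=0.8)    # vertical move to g(x)
            plt.plot([x, gx], [gx, gx], "k", lw=0.8)  # horizontal move to y = x
            x, y = gx, gx
        plt.legend()
        plt.show()

    cobweb(np.cos, 0.2)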
So that's what's going on visually. What's going on mathematically? g(x) rising or falling slowly or quickly sounds vague, but it can easily be made mathematically precise. Rising or falling slowly means that |g′(x)|<1, while rising or falling quickly means that |g′(x)|>1. In the rising case, this is pretty obviously true: the slope of y=x is 1, and g(x) is either rising faster (slope greater than 1) or slower (slope less than 1) than y=x. The falling case is basically a mirror image, and the comparison line is a line with slope −1 through the point (r, g(r)).
So, now we know when this is useful. If we want to find a root of f(x), we should rewrite the equation f(x)=0 to look like g(x)=x, such that |g′(x)|<1 near the root. We can then start with a guess x0 for the root and repeatedly calculate g(x) starting with the results of the previous calculation, and x will converge to the root r. There is an important question here: how do you rewrite f(x)=0 so that |g′(x)|<1 near the root of f(x)? Newton's method, which is described below, will give one answer.
Leaving that question aside, assuming you have the function g(x), how fast is it? Starting with the guess x0, the error is |x0−r|. But of course we don't know r, so we can't use that to put a bound on the error. But we can estimate the error if we know g′(x). Suppose we know that 0≤g′(x)≤m<1 for some constant m for all values of x between x0 and r. (It's fair to ask how we can know m if we don't know r, but at least in some cases we can estimate it, and this is a worst case estimate anyway. We hope that Fixed-Point Iteration will work better than this.) Then if we draw a straight line with slope m through the point (x0,g(x0)), this line will intersect the line y=x at some point, and r will lie between x0 and the point of intersection. Note that if m>1, r will not be between the intersection and x0, and if m=1 the lines will not intersect at all, so this approach is only valid if m<1. (Again, this is using lots of words to describe something easy to see with a picture.) This point of intersection gives an upper bound on the range of possible values for r, and thus an upper bound on the error |r−x0|. The x value at the intersection point is x_int=(g(x0)−mx0)/(1−m), so the error is at most |g(x0)−x0|/(1−m).
We can use this to estimate the change in the error in each iteration. Starting from the point (x0,g(x0)), we can set x1=g(x0) and compute the new point (x1,g(x1)). In the worst case, g′(x)=m for all points between x0 and x1. Then g(x1)=(1+m)x1−mx0, and therefore the error in x1 is at most |g(x1)−x1|/(1−m)=|mx1−mx0|/(1−m). Since the error in x0 was |g(x0)−x0|/(1−m)=|x1−x0|/(1−m), the error in x1 is at most the error in x0 multiplied by m. As long as m is less than 1, the error will decrease with every iteration. If we are lucky, m will also decrease with every iteration, meaning the closer we get to r, the faster we get closer.
In the case that g(x) is a decreasing function, the logic is a little more complicated since the cobweb path is more complicated, but the same basic conclusion holds. For any function g(x), if |g′(x)|≤m<1 for all x near r and including x0, the maximum error in the estimate of r decreases by a factor of m with each iteration. Obviously, if we want to use fixed-point iteration to find a root of f(x), we should try to rewrite the equation f(x)=0 in the form g(x)=x so that g′(x) is as close to zero as possible around the root. Again, Newton's method, described below, can be thought of as an application of fixed-point iteration that generates g(x) such that g′(x) is near zero when x is close to r.
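Here's a minimal Python sketch of plain fixed-point iteration that uses the worst-case bound above as a stopping test (the value m=0.5 in the example is my own rough overestimate of |g′(x)| for that particular g, not something derived in the post):

    # Fixed-point iteration: if |g'(x)| <= m < 1 near the root, the error in
    # the current x is at most |g(x) - x| / (1 - m), so stop when that bound
    # drops below the tolerance.
    def fixed_point(g, x0, m, tol=1e-10, max_iter=100):
        x = x0
        for _ in range(max_iter):
            gx = g(x)
            error_bound = abs(gx - x) / (1 - m)
            x = gx
            if error_bound < tol:
                break
        return x

    # Example: the square-root iteration g(x) = (x + 2/x)/2 from earlier.
    print(fixed_point(lambda x: (x + 2 / x) / 2, x0=2.0, m=0.5))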
We can now compare fixed-point iteration with bisection. For any function f(x) with a root r, bisection will converge to a root given two starting points a and b such that f(a)<0 and f(b)>0. In comparison, fixed-point iteration only requires one starting point, x0, but only converges if the equation f(x)=0 can be rewritten in the form g(x)=x, with |g′(x)|<1 near r. The maximum error in the estimate of r is exactly known in the case of bisection, and the maximum error decreases by a factor of 1/2 with every iteration. For fixed-point iteration, the maximum error is only known if m≥|g′(x)| for x near r is known. The error decreases by a factor of m with each iteration, which means that fixed-point iteration can be potentially much faster or much slower (or not work at all) than bisection depending on g(x).
Newton's method
The thing about bisection is that it is dumb. Given the two starting points, our next estimate is always the midpoint of the two points. If we know something about the shape of the function, we should be able to make smarter guesses about the value of r than just taking the midpoint. Newton's method is one way of making smarter guesses.
Like fixed-point iteration, Newton's method starts with a single point, x0, but Newton's method works directly with f(x). Also, while fixed-point iteration can be carried out using only g(x) (g′(x) is only needed to estimate the error), Newton's method requires computing f′(x) as well.
Start with x0, and find the point (x0,f(x0)). Next, find f′(x0) and use it to draw a line through the point (x0,f(x0)) which is tangent to f(x). The point where this line crosses the x-axis is the new estimate of the root r, and this process can be repeated to get better estimates of r. The new point x1 has the value x0−f(x0)/f′(x0). Conceptually, this process starts at the point (x0,f(x0)) and then follows along the curve f(x) towards the point (r,f(r)) (where f(r)=0). It does this by approximating f(x) as a straight line, which has the advantage of being fast, but the disadvantage that for some functions f(x), this is not a very good approximation.
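Here's a minimal Python sketch of that procedure (the example function, derivative, and starting point are arbitrary):

    # Newton's method: follow the tangent line at (x, f(x)) down to the x-axis
    # and repeat, stopping when the step becomes tiny.
    def newton(f, fprime, x0, tol=1e-12, max_iter=50):
        x = x0
        for _ in range(max_iter):
            step = f(x) / fprime(x)
            x = x - step             # x-intercept of the tangent line
            if abs(step) < tol:
                break
        return x

    # Example: a root of f(x) = x^2 - 2, i.e. the square root of 2.
    print(newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=2.0))

Incidentally, for f(x)=x^2−A the Newton step x−f(x)/f′(x) simplifies to (x+A/x)/2, which is exactly the square-root iteration discussed earlier.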
If we define g(x)=x−f(x)/f′(x), then we can see that Newton's method is an example of fixed-point iteration. The error and rate of convergence of fixed-point iteration are dependent on g′(x), which is equal to 1−((f′(x))^2−f(x)f″(x))/(f′(x))^2=f(x)f″(x)/(f′(x))^2. When x is close to r, f(x) is close to 0, so g′(x) is close to 0 if f′(x) is not close to zero and f″(x) is not approaching ∞. In fixed-point iteration, the error in x1 is equal to or less than the error in x0 multiplied by g′(x) for some x between x0 and r. Since g′(x) is close to 0 when x is close to r (as long as f′(x) is not close to zero and f″(x) does not approach ∞), this implies that Newton's method converges extremely quickly.
If we define e1 as the error in x1 (=|r−x1|) and e0 as the error in x0, then for fixed-point iteration, e1/e0≈|g′(r)| when x0 is near r. In the case of Newton's method, g′(r)=0, which implies that e1/e0≈0. This is great, but not very specific. To be a bit more specific about what ≈0 means, we can look at e1/(e0)^2. The Taylor series approximation of f(x) tells us that f(r)=f(x0)+(r−x0)f′(x0)+(r−x0)^2·f″(c)/2, where c is some point between x0 and r. Since f(r)=0, we can rewrite the Taylor series approximation as x0−f(x0)/f′(x0)−r=(r−x0)^2·f″(c)/(2f′(x0)). However, x0−f(x0)/f′(x0)=x1, |r−x1|=e1, and |r−x0|=e0. Therefore e1=(e0)^2·|f″(c)/(2f′(x0))|.
This means that for bisection and for fixed-point iteration in general, e1=k·e0 for some k. As long as k is less than 1, this will converge to r steadily, but relatively slowly. (Convergence of this form is called linear convergence.) However, for fixed-point iteration when g′(r)=0, including Newton's method when f′(r)≠0 and f″(r)≠∞, e1=k·(e0)^2 for some k. This means that when e is small, it gets much smaller with each iteration, and x converges towards r much faster than the other methods. (Convergence of this form is known as quadratic convergence.)
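To see the difference numerically, here's a quick check using Newton's method on f(x)=x^2−2 (my own example): the ratio e1/e0 keeps shrinking, while e1/(e0)^2 settles near a constant, about 0.35 for this f, which matches |f″(r)/(2f′(r))|.

    # Track the Newton iterates for f(x) = x^2 - 2 and print the error ratios.
    import math

    f = lambda x: x**2 - 2
    fprime = lambda x: 2 * x
    r = math.sqrt(2)

    x, e_prev = 2.0, abs(2.0 - math.sqrt(2))
    for i in range(4):
        x = x - f(x) / fprime(x)
        e = abs(x - r)
        print(f"iteration {i + 1}: e1/e0 = {e / e_prev:.2e}, "
              f"e1/(e0)^2 = {e / e_prev**2:.3f}")
        e_prev = e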
Newton's method requires knowing f′(x), which means that it requires more information about f(x) than bisection and fixed-point iteration, which only require f(x). Like fixed-point iteration, it only requires a single starting value for x. Also like fixed-point iteration, it can fail. While fixed-point iteration fails if |g′(r)|≥1, Newton's method fails if f′(r)=0 or f″(r)=∞. Like fixed-point iteration, it can be difficult to estimate the error. (The maximum error depends on the size of f″(x) between r and x0, as well as on f′(x0).) However, when it converges, Newton's method converges quadratically, meaning that many fewer iterations are required to reach the desired accuracy than with the linear convergence of bisection or fixed-point iteration.