Built-in errors: William Kahan and floating point arithmetic
Computers. We all know that inside, there are just (finitely many) little zeroes and ones being manipulated. Which means that there's a fundamental mismatch between real numbers, they of the generally infinite tail of decimal places, and the fact that computers (obviously) can only store a finite number of decimal places.
But we generally don't let that bother us. After all, some smart person has implemented floating point arithmetic on our computer, no? And that can't be all that hard a problem. Basic, really. And we usually keep more decimal places around than we even need for our calculations, so we'll be on the safe side.
Except that William Kahan tells us that's wrong. And since he's the "father of the floating point", helping to create the relevant specifications (1, 2, 3), it's not so easy to just dismiss what he says.
Let's start with the fun part. You have Excel on your computer?
Good. Try the following, one of a number of examples that Kahan writes about in this set of slides (this example is on p. 3):
Use Excel to compute V=4/3.
Set your accuracy (Format -> Cell) to 20 digits or so, scientific notation.
You will get 1.33333333333333000E+00, in other words: 14 threes as decimal places, then zeros.
Obviously, that's the accuracy used for the calculation: 14 decimal places.
Now compute W=V-1.
Wait a minute: 3.33333333333333000E-01.
Where did the extra 3 come from?
Something as simple as removing the initial 1 has - what? Changed the accuracy? Magically turned a digit 0 into a digit 3? Certainly, this is not according to the rules of arithmetic. Kahan talks about "cosmetic rounding".
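For those without Excel at hand, here is a small Python sketch of what might be going on with that extra 3. It assumes two things that are not from Kahan's slides: that the spreadsheet stores an ordinary 64-bit double, and that it shows at most 15 significant digits, padding the rest with zeros. The helper show15 is my own hypothetical stand-in for the cell format, not anything Excel provides.

```python
# Assumption: values are IEEE 754 doubles (as Python floats are), and the
# display shows 15 significant digits. show15 is a hypothetical helper that
# imitates such a display; it is not Excel's actual formatting routine.

def show15(x: float) -> str:
    """Scientific notation with 1 digit before and 14 after the point."""
    return format(x, ".14E")

v = 4 / 3
w = v - 1

print(show15(v))  # 1.33333333333333E+00
print(show15(w))  # 3.33333333333333E-01

# The stored values carry more digits than the 15 that are displayed:
print(repr(v))
print(repr(w))
```

Under these assumptions, both results are shown with 15 significant digits. For V, one of them is the leading 1, which leaves 14 threes. For W, the leading 1 is gone, so all 15 displayed digits are threes. The digits were in the stored number all along; only the display cut them off.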
Now, let's compute X = W*3.
Shouldn't that be 9.99999999999999000E-01, with the limited accuracy we've seen in the previous examples?
Let's remove the initial one, Y = X-1, to be sure:
Yep, all the .9999... has gone away.
To be sure, let's multiply by a large number, Z = Y*2^52:
Yep, definitely gone.
To sum up,
(4/3 - 1)*3 - 1 = 0.00000000000000000E+00.
Except... wait for it: except if you enclose it in parentheses.
Then you get, ta-da:
((4/3 - 1)*3 - 1) = -2.22044604925031000000E-16.
But not if you multiply that by 2^52, because then it's suddenly
((4/3 - 1)*3 - 1)*2^52 = -1.00000000000000000000E+00.
Quoting Kahan: "Excel's arithmetic is weird." Underneath the hood, computers are apparently doing some very curious things.
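As a cross-check, here is the same chain of operations as a minimal Python sketch. Python floats are 64-bit IEEE 754 doubles and Python does no cosmetic rounding of intermediate results, so this shows what plain double arithmetic gives. That Excel uses the same doubles underneath is my assumption here; the variable names simply follow the steps above.

```python
# Plain IEEE 754 double arithmetic, no cosmetic rounding anywhere.
v = 4 / 3        # nearest double to 4/3, slightly below the true value
w = v - 1        # exact subtraction: the error in v is carried along
x = w * 3        # comes out as 1 - 2**-52, not 1
y = x - 1        # so this is -2**-52, not 0
z = y * 2**52    # scaling by a power of two is exact

print(y)  # -2.220446049250313e-16
print(z)  # -1.0
print(y == -(2.0 ** -52))  # True
```

So the values that Excel shows for the parenthesized expressions agree with what unrounded double arithmetic produces: the tiny error from storing 4/3 survives all the way to the end. The zero in the unparenthesized version is the odd one out.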
Here is a sample spreadsheet with these examples that works for me, using Excel for Mac 2008, version 12.1.0.
That, of course, leaves open the question: Is it relevant?
Kahan claims that "scientists and engineers are almost all unaware ...
• ... of how high is the incidence of misleadingly inaccurate computed results.
• ... of how necessary is the investigation of every suspicious computed result as a potential harbinger of substantially worse to come.
• ... of the potential availability of software tools that would reduce those investigations’ costs in expertise and time by orders of magnitude.
• ... that these tools will remain unavailable unless producers of software development systems (languages, compilers, debuggers) know these tools are in demand."
If all goes as planned, I'll have a chance to interview Kahan on Tuesday. A good opportunity to ask about the specific dangers, and what to do about them. I'll also ask Kahan, since he seems to be something of a voice crying in the wilderness, what it takes to convince those people who could do something to do something. It's one thing knowing there is a problem - it's quite another to get people to fix it. I've known this for the big problems such as, say, climate change. But apparently, it also applies to computers manipulating their ones and zeroes in unexpected ways.