Maths ideas from John Allen Paulos

There’s always enough random success to justify anything to someone who wants to believe.
(Innumeracy, p.33)

It’s easier and more natural to react emotionally than it is to deal dispassionately with statistics or, for that matter, with fractions, percentages and decimals.
(A Mathematician Reads the Newspaper p.81)

I’ve just read two of John Allen Paulos’s popular books about maths, A Mathematician Reads the Newspaper: Making Sense of the Numbers in the Headlines (1995) and Innumeracy: Mathematical Illiteracy and Its Consequences (1988).

My reviews tended to focus on the psychological, logical and cognitive errors which Paulos finds so distressingly common on modern TV and in newspapers, among politicians and commentators, and in every walk of life. I focused on these for the simple reason that I didn’t understand the way he explained most of his mathematical arguments.

I also criticised the style and presentation of the books, which I found meandering, haphazard and so quite difficult to follow, especially since he was packing in so many difficult mathematical concepts.

Looking back at my reviews I realise I spent so much time complaining that I missed out promoting and explaining large chunks of the mathematical concepts he describes (sometimes at length, sometimes only in throwaway references).

This blog post is designed to give a list and definitions of the mathematical principles which John Allen Paulos describes and explains in these two books.

The concepts appear, in the list below, in the same order as they crop up in the books.

1. Innumeracy: Mathematical Illiteracy and Its Consequences (1988)

The multiplication principle If some choice can be made in M different ways and some subsequent choice can be made in N different ways, then there are M x N different ways the choices can be made in succession. If a woman has 5 blouses and 3 skirts she has 5 x 3 = 15 possible combinations. If I roll two dice, there are 6 x 6 = 36 possible combinations.

If, however, I want the second category to exclude the option which occurred in the first category, the second number is reduced by one. If I roll two dice, there are 6 x 6 = 36 possible combinations. But the number of outcomes where the number on the second die differs from the first one is 6 x 5. The number of outcomes where the faces of three dice differ is 6 x 5 x 4.
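The counts above can be checked by brute-force enumeration. What follows is a minimal Python sketch rather than anything taken from Paulos's book; the variable names are made up for illustration.

```python
from itertools import product

faces = range(1, 7)  # the six faces of one die

# Every ordered outcome of rolling two dice: 6 x 6
two_dice = list(product(faces, repeat=2))

# Outcomes where the second die differs from the first: 6 x 5
two_distinct = [roll for roll in two_dice if roll[0] != roll[1]]

# Outcomes where all three dice show different faces: 6 x 5 x 4
three_distinct = [roll for roll in product(faces, repeat=3) if len(set(roll)) == 3]

print(len(two_dice), len(two_distinct), len(three_distinct))  # 36 30 120
```

Listing the outcomes and counting them gives the same totals as multiplying the number of choices available at each step.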

If two events are independent in the sense that the outcome of one event has no influence on the outcome of the other, then the probability that they will both occur is computed by multiplying the probabilities of the individual events. The probability of getting two heads in two flips of a coin is ½ x ½ = ¼ which can be written (½)². The probability of five heads in a row is (½)⁵.

The probability that an event doesn’t occur is 1 minus the probability that it will occur. If there’s a 20% chance of rain, there’s an 80% chance it won’t rain. Since a 20% chance can also be expressed as 0.2, we can say there is a 0.2 chance it will rain and a 1 – 0.2 = 0.8 chance it won’t rain.

Binomial probability distribution arises whenever a procedure or trial may result in a ‘success’ or ‘failure’ and we are interested in the probability of obtaining R successes from N trials.

Dirichlet’s Box Principle aka the pigeonhole principle Given n boxes and m>n objects, at least one box must contain more than one object. If the postman has 21 letters to deliver to 20 addresses he knows that at least one address will get two letters.

Expected value The expected value of a quantity is the average of its values weighted according to their probabilities. If a quarter of the time a quantity equals 2, a third of the time it equals 6, another third of the time it equals 15, and the remaining twelfth of the time it equals 54, then its expected value is 12. (2 x ¼) + (6 x 1/3) + (15 x 1/3) + (54 x 1/12) = 12.
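The weighted average in this example can be reproduced exactly with fractions. This is only an illustrative sketch; the list name `outcomes` is an assumption, not something from the book.

```python
from fractions import Fraction

# (value, probability) pairs from the example above
outcomes = [
    (2, Fraction(1, 4)),
    (6, Fraction(1, 3)),
    (15, Fraction(1, 3)),
    (54, Fraction(1, 12)),
]

# The probabilities must account for every case
assert sum(p for _, p in outcomes) == 1

expected_value = sum(value * p for value, p in outcomes)
print(expected_value)  # 12
```

Note that 12 is not one of the values the quantity can actually take; the expected value is a long-run average, not a prediction of any single outcome.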

Conditional probability Unless the events A and B are independent, the probability of A is different from the probability of A given that B has occurred. If the event of interest is A and the event B is known or assumed to have occurred, ‘the conditional probability of A given B’, or ‘the probability of A under the condition B’, is usually written as P(A | B), or sometimes P_B(A) or P(A / B).

For example, the probability that any given person has a cough on any given day may be only 5%. But if we know that the person has a cold, then they are much more likely to have a cough. The conditional probability of someone with a cold having a cough might be 75%. So the probability of any member of the public having a cough is 5%, but the probability of any member of the public who has a cold having a cough is 75%. P(Cough) = 5%; P(Cough | Sick) = 75%

The law of large numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances. As the number of experiments increases, the actual ratio of outcomes will converge on the theoretical, or expected, ratio of outcomes.

For example, if a fair coin (where heads and tails come up equally often) is tossed 1,000,000 times, about half of the tosses will come up heads, and half will come up tails. The heads-to-tails ratio will be extremely close to 1:1. However, if the same coin is tossed only 10 times, the ratio will likely not be 1:1, and in fact might come out far different, say 3:7 or even 0:10.
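A quick simulation makes the contrast between few and many tosses visible. This is a rough sketch using Python's pseudo-random generator; the function name `heads_fraction` and the fixed seed are assumptions made so that the run is repeatable.

```python
import random

def heads_fraction(n_tosses, seed=0):
    """Simulate n_tosses fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```

With only 10 tosses the fraction can sit well away from 0.5, while with 100,000 tosses it should land very close to it.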

The gambler’s fallacy a misunderstanding of probability: the mistaken belief that because a coin has come up heads a number of times in succession, it becomes more likely to come up tails. Over a very large number of instances the law of large numbers comes into play; but not in a handful.

Regression to the mean in any series with complex phenomena that are dependent on many variables, where chance is involved, extreme outcomes tend to be followed by more moderate ones. Or: the tendency for an extreme value of a random quantity whose values cluster around an average to be followed by a value closer to the average or mean.

Poisson probability distribution measures the probability that a certain number of events occur within a certain period of time. The events need to be a) unrelated to each other b) to occur with a known average rate. The Ppd can be used to work out things like the numbers of cars that pass on a certain road in a certain time, the number of telephone calls a call center receives per minute.
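For readers who want to see the distribution in action, here is a small sketch of the standard Poisson formula, P(k) = rate^k × e^(−rate) / k!. The call centre averaging 3 calls per minute is a hypothetical figure chosen purely for illustration.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events when the average is `rate` per interval."""
    return rate**k * math.exp(-rate) / math.factorial(k)

# Hypothetical call centre averaging 3 calls per minute
rate = 3
for k in range(6):
    print(k, round(poisson_pmf(k, rate), 3))
```

The probabilities over every possible count add up to 1, and the most likely counts sit near the average rate.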

Bayes’ Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes’ theorem, a person’s age can be used to more accurately assess the probability that they have cancer, compared to the assessment of the probability of cancer made without knowledge of the person’s age.

Arrow’s impossibility theorem (1951) no rank-order electoral system can be designed that always satisfies these three “fairness” criteria:

• If every voter prefers alternative X over alternative Y, then the group prefers X over Y.
• If every voter’s preference between X and Y remains unchanged, then the group’s preference between X and Y will also remain unchanged (even if voters’ preferences between other pairs like X and Z, Y and Z, or Z and W change).
• There is no “dictator”: no single voter possesses the power to always determine the group’s preference.

The prisoner’s dilemma (1950) Two criminals are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge. The prosecutors offer each prisoner a bargain. Each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The offer is:

• If A and B each betray the other, each of them serves two years in prison
• If A betrays B but B remains silent, A will be set free and B will serve three years in prison (and vice versa)
• If A and B both remain silent, both of them will only serve one year in prison (on the lesser charge).

Prisoner’s dilemma graphic. Source: Wikipedia
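The sentences listed above can be put in a small table to see why the situation is a dilemma. The sketch below is illustrative only; the names `SENTENCES` and `best_reply_for_a` are made up for this example.

```python
# Years in prison for (A, B), indexed by (A's choice, B's choice),
# using the sentences listed above
SENTENCES = {
    ("betray", "betray"): (2, 2),
    ("betray", "silent"): (0, 3),
    ("silent", "betray"): (3, 0),
    ("silent", "silent"): (1, 1),
}

def best_reply_for_a(b_choice):
    """Return the choice that minimises A's sentence, given what B does."""
    return min(("betray", "silent"), key=lambda a: SENTENCES[(a, b_choice)][0])

for b_choice in ("betray", "silent"):
    print(f"If B chooses {b_choice}, A's best reply is: {best_reply_for_a(b_choice)}")
```

Whatever B does, A serves less time by betraying, and the same reasoning applies to B. Yet if both follow that logic they each serve two years, which is worse than the one year each would get by staying silent.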

Binomial probability Binomial means it has one of only two outcomes such as heads or tails. A binomial experiment is one that possesses the following properties:

• The experiment consists of n repeated trials
• Each trial results in an outcome that may be classified as a success or a failure (hence the name, binomial)
• The probability of a success, denoted by p, remains constant from trial to trial and repeated trials are independent.

The number of successes X in n trials of a binomial experiment is called a binomial random variable. The probability distribution of the random variable X is called a binomial distribution.

Type I and type II errors Type I error is where a true hypothesis is rejected. Type II error is where a false hypothesis is accepted.

Confidence interval Used in surveys, the confidence interval is a range of values, above and below a finding, in which the actual value is likely to fall. The confidence interval represents the accuracy or precision of an estimate.

Central limit theorem In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed. OR: the sum or average of a large bunch of measurements follows a normal curve even if the individual measurements themselves do not. OR: averages and sums of non-normally distributed quantities will nevertheless themselves have a normal distribution. OR:

Under a wide variety of circumstances, averages (or sums) of even non-normally distributed quantities will nevertheless have a normal distribution (p.179)
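The theorem can be watched at work with a simulation. The sketch below averages throws of a fair die, whose individual results are flat rather than bell-shaped; the sample sizes (30 throws per average, 5,000 averages) and the seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for repeatability

def mean_of_rolls(n_rolls):
    """Average of n_rolls throws of a fair die (a flat, non-normal distribution)."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

sample_means = [mean_of_rolls(30) for _ in range(5_000)]

# A single die has mean 3.5 and variance 35/12, so the averages should
# cluster around 3.5 with a standard deviation of about sqrt(35/12/30) = 0.31
mu = statistics.mean(sample_means)
sigma = statistics.stdev(sample_means)
print(mu, sigma)

# For a normal curve about 68% of values lie within one standard deviation
within_one_sigma = sum(abs(m - mu) <= sigma for m in sample_means) / len(sample_means)
print(within_one_sigma)
```

Roughly two thirds of the averages should fall within one standard deviation of 3.5, which is the signature of a bell curve even though no single throw of a die is bell-shaped.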

Regression analysis There are many types of regression analysis, but at their core they all examine the influence of one or more independent variables on a dependent variable. Performing a regression allows you to confidently determine which factors matter most, which factors can be ignored, and how these factors influence each other. In order to understand regression analysis you must comprehend the following terms:

• Dependent Variable: This is the factor you’re trying to understand or predict.
• Independent Variables: These are the factors that you hypothesize have an impact on your dependent variable.

Correlation is not causation a principle which cannot be repeated too often.

Gaussian distribution Gaussian distribution (also known as normal distribution) is a bell-shaped curve, and it is assumed that during any measurement values will follow a normal distribution with an equal number of measurements above and below the mean value.

The normal distribution is the most important probability distribution in statistics because it fits so many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution.

Statistical significance A result is statistically significant if it is sufficiently unlikely to have occurred by chance.

2. A Mathematician Reads the Newspaper: Making Sense of the Numbers in the Headlines (1995)

Incidence matrices In mathematics, an incidence matrix is a matrix that shows the relationship between two classes of objects. If the first class is X and the second is Y, the matrix has one row for each element of X and one column for each element of Y. The entry in row x and column y is 1 if x and y are related (called incident in this context) and 0 if they are not. Paulos creates an incidence matrix to show

Complexity horizon On the analogy of an ‘event horizon’ in physics, Paulos suggests this as the name for levels of complexity in society around us beyond which mathematics cannot go. Some things just are too complex to be understood using any mathematical tools.

Nonlinear complexity Complex systems often have nonlinear behavior, meaning they may respond in different ways to the same input depending on their state or context. In mathematics and physics, nonlinearity describes systems in which a change in the size of the input does not produce a proportional change in the size of the output.

The Banzhaf power index is a power index defined by the probability of changing an outcome of a vote where voting rights are not necessarily equally divided among the voters or shareholders. To calculate the power of a voter using the Banzhaf index, list all the winning coalitions, then count the critical voters. A critical voter is a voter who, if he changed his vote from yes to no, would cause the measure to fail. A voter’s power is measured as the fraction of all swing votes that he could cast. There are several algorithms for calculating the power index.
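The counting procedure just described (list the winning coalitions, count the critical voters) can be sketched by brute force for a small example. The three shareholders and the quota of 51 below are hypothetical, and this simple enumeration is only one of the possible ways of computing the index.

```python
from fractions import Fraction
from itertools import combinations

def banzhaf(weights, quota):
    """Return each voter's share of all swing (critical) votes."""
    voters = list(weights)
    swings = {v: 0 for v in voters}
    for size in range(1, len(voters) + 1):
        for coalition in combinations(voters, size):
            total = sum(weights[v] for v in coalition)
            if total < quota:
                continue  # not a winning coalition
            for v in coalition:
                # v is critical if the coalition loses without v
                if total - weights[v] < quota:
                    swings[v] += 1
    all_swings = sum(swings.values())
    return {v: Fraction(count, all_swings) for v, count in swings.items()}

# Hypothetical shareholders: 51 of the 100 votes are needed to win
print(banzhaf({"A": 50, "B": 49, "C": 1}, quota=51))
```

In this made-up case A holds three fifths of the power, while B with 49 votes and C with a single vote hold one fifth each: voting power is not the same thing as the number of votes held.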

Vector field may be thought of as a rule f saying that ‘if an object is currently at a point x, it moves next to point f(x), then to point f(f(x)), and so on’. The rule f is non-linear if the variables involved are squared or multiplied together, and the sequence of the object’s positions is its trajectory.

Chaos theory (1960) is a branch of mathematics focusing on the behavior of dynamical systems that are highly sensitive to initial conditions.

‘Chaos’ is an interdisciplinary theory stating that within the apparent randomness of chaotic complex systems, there are underlying patterns, constant feedback loops, repetition, self-similarity, fractals, self-organization, and reliance on programming at the initial point known as sensitive dependence on initial conditions.

The butterfly effect describes how a small change in one state of a deterministic nonlinear system can result in large differences in a later state, e.g. a butterfly flapping its wings in Brazil can cause a hurricane in Texas.
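Sensitive dependence on initial conditions is easy to demonstrate with a single non-linear rule applied over and over. The rule used below, f(x) = 4x(1 − x), is the logistic map, a standard textbook example chosen here as an assumption; Paulos's own examples may differ.

```python
def trajectory(x0, steps, r=4.0):
    """Iterate the rule f(x) = r * x * (1 - x), returning every position visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = trajectory(0.2, 60)
b = trajectory(0.2 + 1e-10, 60)  # almost identical starting point

for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, abs(a[step] - b[step]))
```

The two trajectories start a ten-billionth apart and stay close for the first few steps, but the gap roughly doubles with each iteration until the two sequences bear no resemblance to each other.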

Linear models are used more often not because they are more accurate but because they are easier to handle mathematically.

All mathematical systems have limits, and even chaos theory cannot predict even relatively simple nonlinear situations.

Zipf’s Law states that given a large sample of words used, the frequency of any word is inversely proportional to its rank in the frequency table. So word number n has a frequency proportional to 1/n. Thus the most frequent word will occur about twice as often as the second most frequent word, three times as often as the third most frequent word, etc. For example, in one sample of words in the English language, the most frequently occurring word, “the”, accounts for nearly 7% of all the words (69,971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36,411 occurrences), followed by “and” (28,852). Only about 135 words are needed to account for half of the words in a large sample.
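The three counts quoted above can be compared with what a strict 1/n rule would predict. This is just an arithmetic sketch; the dictionary name `counts` is an assumption.

```python
# Word counts quoted above for one English sample
counts = {"the": 69_971, "of": 36_411, "and": 28_852}

top = counts["the"]
for rank, (word, actual) in enumerate(counts.items(), start=1):
    predicted = top / rank  # Zipf: frequency proportional to 1 / rank
    print(f"{rank} {word}: actual {actual}, Zipf predicts about {predicted:.0f}")
```

The fit is close for the second-ranked word and looser for the third, which is a reminder that Zipf's Law is an empirical regularity rather than an exact formula.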

Benchmark estimates Benchmark numbers are numbers against which other numbers or quantities can be estimated and compared. Benchmark numbers are usually multiples of 10 or 100.

Non standard models Almost everyone, mathematician or not, is comfortable with the standard model (N : +, ·) of arithmetic. Less familiar, even among logicians, are the non-standard models of arithmetic.

The S-curve A sigmoid function is a mathematical function having a characteristic “S”-shaped curve or sigmoid curve. Often, sigmoid function refers to the special case of the logistic function, defined by the formula:

S(x) = 1 / (1 + e^(-x))

This curve, sometimes called the logistic curve is extremely widespread: it appears to describe the growth of entities as disparate as Mozart’s symphony production, the rise of airline traffic, and the building of Gothic cathedrals (p.91)

Differential calculus The study of rates of change, rates of rates of change, and the relations among them.

Algorithmic complexity The complexity of a sequence is measured by the length of the shortest program (algorithm) needed to generate it (p.123)

Chaitin’s theorem states that every computer, every formalisable system, and every human production is limited; there are always sequences that are too complex to be generated, outcomes too complex to be predicted, and events too dense to be compressed (p.124)

Simpson’s paradox (1951) A phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.
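A small worked example shows how the reversal can happen. The figures below are illustrative numbers of the kind commonly used to demonstrate the paradox; they are not taken from Paulos's book, and the treatment and group labels are assumptions.

```python
# Illustrative (successes, patients) figures, not taken from Paulos's book
results = {
    "Treatment A": {"mild cases": (81, 87), "severe cases": (192, 263)},
    "Treatment B": {"mild cases": (234, 270), "severe cases": (55, 80)},
}

def rate(successes, patients):
    """Success rate as a fraction between 0 and 1."""
    return successes / patients

for treatment, groups in results.items():
    total_successes = sum(s for s, _ in groups.values())
    total_patients = sum(n for _, n in groups.values())
    per_group = {name: round(rate(s, n), 2) for name, (s, n) in groups.items()}
    print(treatment, per_group, "combined:", round(rate(total_successes, total_patients), 2))
```

Treatment A has the higher success rate among mild cases and among severe cases, yet Treatment B looks better once the groups are combined, because A was given mostly to the severe cases, where success is harder to achieve.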

The amplification effect of repeatedly playing the same game, rolling the same dice, tossing the same coin.

Innumeracy by John Allen Paulos (1988)

Our innate desire for meaning and pattern can lead us astray… (p.81)

Giving due weight to the fortuitous nature of the world is, I think, a mark of maturity and balance. (p.133)

John Allen Paulos is an American professor of mathematics who won fame beyond his academic milieu with the publication of this short (134-page) but devastating book thirty years ago, the first of a series of books popularising mathematics in a range of spheres from playing the stock market to humour.

As Paulos explains in the introduction, the world is full of humanities graduates who blow a fuse if you misuse ‘infer’ and ‘imply’, or end a sentence with a dangling participle, but are quite happy to believe and repeat the most hair-raising errors in maths, statistics and probability.

The aim of this book was:

• to lay out examples of classic maths howlers and correct them
• to teach readers to be more alert when maths, stats and data need to be used
• and to provide basic rules in order to understand when innumerate journalists, politicians, tax advisors and other crooks are trying to pull the wool over your eyes, or are just plain wrong

There are five chapters:

1. Examples and principles
2. Probability and coincidence
3. Pseudoscience
4. Whence innumeracy
5. Statistics, trade-offs and society

Many common themes emerge:

Don’t personalise, numeratise

One contention of this book is that innumerate people characteristically have a strong tendency to personalise – to be misled by their own experiences, or by the media’s focus on individuals and drama… (p.1)

Powers

The first chapter uses lots of staggering statistics to get the reader used to very big and very small numbers, and how to compute them.

1 million seconds is 11 and a half days. 1 billion seconds is 32 years.

He suggests you come up with personal examples of numbers for each power up to 12 or 13 i.e. meaningful embodiments of thousands, tens of thousands, hundreds of thousands and so on to help you remember and contextualise them in a hurry.

A snail moves at 0.005 miles an hour, Concorde at 2,000 miles per hour. Escape velocity from earth is about 7 miles per second, or 25,000 miles per hour. The mass of the Earth is 5.98 x 10²⁴ kg.

Early on he tells us to get used to the nomenclature of ‘powers’ – using 10 to the power 3 or 10³ instead of 1,000, or 10 to negative powers to express numbers below 1. (In fact, right at this early stage I found myself stumbling because one thousand means more to me than 10³ and a thousandth means more than 10⁻³ but if you keep at it, it is a trick you can acquire quite quickly.)

He introduces us to basic ideas like the multiplication principle (aka the rule of product), which states that if some choice can be made in M different ways and some subsequent choice can be made in N different ways, then there are M x N different ways these choices can be made in succession – which can be applied to combinations of multiple items of clothes, combinations of dishes on a menu, and so on.

Thus the number of results you get from rolling a die is 6. If you roll two dice, you can now get 6 x 6 = 36 possible outcomes. With three dice it is 6 x 6 x 6 = 216. If you want to exclude the number you get on the first die from the second one, the number of ways of rolling two different numbers on two dice is 6 x 5, of rolling different numbers on three dice is 6 x 5 x 4, and so on.

Thus: Baskin Robbins advertises 31 different flavours of ice cream. Say you want a triple scoop cone. If you’re happy to have any combination of flavours, including where any 2 or 3 flavours are the same – that’s 31 x 31 x 31 = 29,791. But if you ask how many combinations of flavours there are, without a repetition of the same flavour in any of the cones – that is 31 x 30 x 29 = 26,970 ways of combining.

Probability

I struggled with even the basics of probability. I understand a 1 in five chance of something happening, reasonably understand a 20% chance of something happening, but struggled when probability was expressed as a decimal number e.g. 0.2 as a way of writing a 20 percent or 1 in 5 chance.

With the result that he lost me on page 16 on or about the place where he explained the following example.

Apparently a noted 17th century gambler asked the famous mathematician Pascal which is more likely to occur: obtaining at least one 6 in four rolls of a single die, or obtaining at least one 12 in twenty four rolls of a pair of dice. Here’s the solution:

Since 5/6 is the probability of not rolling a 6 on a single roll of a die, (5/6)⁴ is the probability of not rolling a 6 in four rolls of the die. Subtracting this number from 1 gives us the probability that this latter event (no 6s) doesn’t occur; in other words, of there being at least one 6 rolled in four tries: 1 – (5/6)⁴ = .52. Likewise, the probability of rolling at least one 12 in twenty-four rolls of a pair of dice is seen to be 1 – (35/36)²⁴ = .49.

a) He loses me in the second sentence which I’ve read half a dozen times and still don’t understand – it’s where he says the chances that this latter event doesn’t occur: something about the phrasing there, about the double negative, loses me completely, with the result that b) I have no idea whether .52 is more likely or less likely than .49.
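For what it is worth, the calculation in the quoted passage can be reproduced one step at a time. The sketch below is illustrative Python, not something from the book, and the variable names are made up to spell out what each number means.

```python
# Chance of NOT rolling a 6 on one throw of a die is 5/6, so the chance of
# no 6 in four independent throws is 5/6 multiplied by itself four times.
p_no_six = (5 / 6) ** 4
p_at_least_one_six = 1 - p_no_six

# A pair of dice shows double six in 1 of 36 outcomes, so 35/36 is the
# chance of missing it on one throw of the pair.
p_no_double_six = (35 / 36) ** 24
p_at_least_one_double_six = 1 - p_no_double_six

print(round(p_at_least_one_six, 2), round(p_at_least_one_double_six, 2))  # 0.52 0.49
```

A probability of .52 is a 52% chance and .49 is a 49% chance, so getting at least one 6 in four rolls of a single die is slightly the more likely of the two events.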

He goes on to give another example: if 20% of drinks dispensed by a vending machine overflow their cups, what is the probability that exactly three of the next ten will overflow?

The probability that the first three drinks overflow and the next seven do not is (.2)³ x (.8)⁷. But there are many different ways for exactly three of the ten cups to overflow, each way having probability (.2)³ x (.8)⁷. It may be that only the last three cups overflow, or only the fourth, fifth and ninth cups, and so on. Thus, since there are altogether (10 x 9 x 8) / (3 x 2 x 1) = 120 ways for us to pick three out of the ten cups, the probability of some collection of exactly three cups overflowing is 120 x (.2)³ x (.8)⁷.

I didn’t understand the need for the (10 x 9 x 8) / (3 x 2 x 1) equation – I didn’t understand what it was doing, and so didn’t understand what it was measuring, and so didn’t understand the final equation. I didn’t really have a clue what was going on.
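One way to see what the (10 x 9 x 8) / (3 x 2 x 1) term is doing is to list every possible set of three overflowing cups and count them. The sketch below is an illustration under that reading of the passage; the names `cups` and `ways` are assumptions.

```python
from itertools import combinations
from math import comb

cups = range(1, 11)  # the next ten drinks, numbered 1 to 10

# Every way of choosing which three of the ten cups overflow
ways = list(combinations(cups, 3))
print(len(ways), comb(10, 3), (10 * 9 * 8) // (3 * 2 * 1))  # 120 120 120

# Each of those ways has the same probability: three overflows, seven not
p_one_way = 0.2**3 * 0.8**7
p_exactly_three = len(ways) * p_one_way
print(round(p_exactly_three, 3))  # 0.201
```

The 10 x 9 x 8 counts the ways of picking a first, second and third cup in order; dividing by 3 x 2 x 1 removes the duplicates, since the same three cups can be picked in six different orders. The final answer is about a 20% chance that exactly three of the next ten drinks overflow.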

In fact, by page 20, he’d done such a good job of bamboozling me with examples like this that I sadly concluded that I must be innumerate.

More than that, I appear to have ‘maths anxiety’ because I began to feel physically unwell as I read that problem paragraph again and again and again and didn’t understand it. I began to feel a tightening of my chest and a choking sensation in my throat. Rereading it now is making it feel like someone is trying to strangle me.

Maybe people don’t like maths because being forced to confront something you don’t understand, but which everyone around you is saying is easy-peasy, makes you feel ill.

2. Probability and coincidence

Having more or less given up on trying to understand Paulos’s maths demonstrations in the first twenty pages, I can at least latch on to his verbal explanations of what he’s driving at, in sentences like these:

A tendency to drastically underestimate the frequency of coincidences is a prime characteristic of innumerates, who generally accord great significance to correspondences of all sorts while attributing too little significance to quite conclusive but less flashy statistical evidence. (p.22)

It would be very unlikely for unlikely events not to occur. (p.24)

There is a strong general tendency to filter out the bad and the failed and to focus on the good and the successful. (p.29)

Belief in the… significance of coincidences is a psychological remnant of our past. It constitutes a kind of psychological illusion to which innumerate people are particularly prone. (p.82)

Slot machines light up and make a racket when people win, but there is unnoticed silence for all the failures. Big winners on the lottery are widely publicised, whereas every one of the tens of millions of failures is not.

One result is ‘Golden Age’ thinking when people denigrate today’s sports or arts or political figures, by comparison with one or two super-notable figures from the vast past, Churchill or Shakespeare or Michelangelo, obviously neglecting the fact that there were millions of also-rans and losers in their time as well as ours.

The Expected value of a quantity is the average of its values weighted according to their probabilities. I understood these words but I didn’t understand any of the five examples he gave.

The likelihood of probability In many situations, improbability is to be expected. The probability of being dealt a particular hand of 13 cards in bridge is less than 1 in 600 billion. And yet it happens every time someone is dealt a hand in bridge. The improbable can happen. In fact it happens all the time.

The gambler’s fallacy The belief that, because a tossed coin has come up tails for a number of tosses in a row, it becomes steadily more likely that the next toss will be a head.

3. Pseudoscience

Paulos rips into Freudianism and Marxism for the way they can explain away any result counter to their ‘theories’. The patient gets better due to therapy: therapy works. The patient doesn’t get better during therapy, well the patient was resisting, projecting their neuroses on the therapist, any of hundreds of excuses.

But this is just warming up before he rips into a real bugbear of his, the wrong-headedness of Parapsychology, the Paranormal, Predictive dreams, Astrology, UFOs, Pseudoscience and so on.

As with predictive dreams, winning the lottery or miracle cures, many of these practices continue to flourish because it’s the handful of successes which stand out and grab our attention and not the thousands of negatives.

Probability

As Paulos steams on with examples from tossing coins, rolling dice, playing roulette, or poker, or blackjack, I realise all of them are to do with probability or conditional probability, none of which I understand.

This is why I have never gambled on anything, and can’t play poker. When he explains precisely how accumulating probabilities can help you win at blackjack in a casino, I switch off. I’ve never been to a casino. I don’t play blackjack. I have no intention of ever playing blackjack.

When he says that probability theory began with gambling problems in the seventeenth century, I think, well since I don’t gamble at all, on anything, maybe that’s why so much of this book is gibberish to me.

Medical testing and screening

Apart from gambling the two most ‘real world’ areas where probability is important appear to be medicine and risk and safety assessment. Here’s an extended example he gives of how even doctors make mistakes in the odds.

Assume there is a test for cancer which is 98% accurate i.e. if someone has cancer, the test will be positive 98 percent of the time, and if one doesn’t have it, the test will be negative 98 percent of the time. Assume further that .5 percent – one out of two hundred people – actually have cancer. Now imagine that you’ve taken the test and that your doctor sombrely informs you that you have tested positive. How depressed should you be? The surprising answer is that you should be cautiously optimistic. To find out why, let’s look at the conditional probability of your having cancer, given that you’ve tested positive.

Imagine that 10,000 tests for cancer are administered. Of these, how many are positive? On the average, 50 of these 10,000 people (.5 percent of 10,000) will have cancer, and, so, since 98 percent of them will test positive, we will have 49 positive tests. Of the 9,950 cancerless people, 2 percent of them will test positive, for a total of 199 positive tests (.02 x 9,950 = 199). Thus, of the total of 248 positive tests (199 + 49 = 248), most (199) are false positives, and so the conditional probability of having cancer given that one tests positive is only 49/248, or about 20 percent! (p.64)
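The arithmetic in the quoted passage can be followed line by line in a few lines of code. This is a sketch of the same head-count reasoning, with variable names chosen here for illustration.

```python
population = 10_000
prevalence = 0.005   # .5 percent actually have cancer
accuracy = 0.98      # the test is right 98 percent of the time either way

with_cancer = population * prevalence        # about 50 people
without_cancer = population - with_cancer    # about 9,950 people

true_positives = with_cancer * accuracy             # about 49
false_positives = without_cancer * (1 - accuracy)   # about 199

p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(round(p_cancer_given_positive, 3))  # 0.198
```

Because healthy people vastly outnumber sick ones, even a 2% error rate among the healthy produces about four times as many false positives as true positives, which is why a positive result still means only about a one in five chance of having the disease.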

I struggled to understand this explanation. I read it four or five times, controlling my sense of panic, and did, eventually, I think, follow the argument.

However, worse in a way, when I think I did finally understand it, I realised I just didn’t care. It’s not just that the examples he gives are hard to follow. It’s that they’re hard to care about.

Whereas his descriptions of human psychology and cognitive errors in human thinking are crystal clear and easy to assimilate:

If we have no direct evidence or theoretical support for a story, we find that detail and vividness vary inversely with likelihood; the more vivid details there are to a story, the less likely the story is to be true. (p.84)

4. Whence innumeracy?

It came as a vast relief when Paulos stopped trying to explain probability and switched to a long chapter puzzling over why innumeracy is so widespread in society, which kicks off by criticising the poor level of teaching of maths in school and university.

This was like the kind of hand-wringing article you can read any day of the week in a newspaper or online, and so felt reassuringly familiar and easy to assimilate. I stopped feeling so panic-stricken.

This puzzling over the disappointing level of innumeracy goes on for quite a while. Eventually it ends with a digression about what appears to be a pet idea of his: the notion of introducing a safety index for activities and illnesses.

Paulos’s suggestion is that his safety index would be on a logarithmic scale, like the Richter Scale – so straightaway he has to explain what a logarithm is: The logarithm for 100 is 2 because 100 is 10², the logarithm for 1,000 is 3 because 1,000 is 10³. I’m with him so far, as he goes on to explain that the logarithm of 700 i.e. between 2 (100) and 3 (1,000) is 2.8. Since 1 in 5,300 Americans die in a car crash each year, the safety index for driving would be 3.7, the logarithm of 5,300. And so on with numerous more examples, whose relative risks or dangers he reduces to figures like 4.3 and 7.1.

I did understand his aim and the maths of this. I just thought it was bonkers:

1. What is the point of introducing a universal index which you would have to explain every time anyone wanted to use it? Either it is designed to be usable by the widest possible number of citizens; or it is a neat exercise on maths to please other mathematicians and statisticians.

2. And here’s the bigger objection – What Paulos, like most of the university-educated, white, liberal intellectuals I read in papers, magazines and books, fails to take into account is that a large proportion of the population is thick.

Up to a fifth of the adult population of the UK is functionally innumerate, which means they don’t know what a ‘25% off’ sign means on a shop window. For me, an actual social catastrophe being brought about by this attitude is the introduction of Universal Credit by the Conservative government which, from top to bottom, is designed by middle-class, highly educated people who’ve all got internet accounts and countless apps on their smartphones, and who have shown a breath-taking ignorance about what life is like for the poor, sick, disabled, illiterate and innumerate people who are precisely the people the system is targeted at.

Same with Paulos’s scheme. Smoking is one of the most dangerous and stupid things which any human can do. Packs of cigarettes have for years, now, carried pictures of disgusting cancerous growths and the words SMOKING KILLS. And yet despite this, about a fifth of adults, getting on for 10 million people, still smoke. 🙂

Do you really think that introducing a system using ornate logarithms will get people to make rational assessments of the risks of common activities and habits?

Paulos then goes on to complicate the idea by suggesting that, since the media is always more interested in danger than safety, maybe it would be more effective, instead of creating a safety index, to create a danger index.

You would do this by

1. working out the risk of an activity (i.e. number of deaths or accidents per person doing the activity)
2. converting that into a logarithmic value (just to make sure that nobody understands it) and then
3. subtracting the logarithmic value of the safety index from 10, in order to create a danger index
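Taken literally, the three steps above amount to one subtraction. The sketch below is only an illustration under that reading: the function name danger_index is hypothetical, and the only input used is the 1-in-5,300 driving figure quoted earlier.

```python
import math

def danger_index(one_in_n: float) -> float:
    """Danger index as described in the steps: 10 minus the safety index."""
    safety = math.log10(one_in_n)  # step 2: logarithm of the '1 in N' risk
    return 10 - safety             # step 3: subtract from 10

# Driving: 1 in 5,300 per year gives a safety index of about 3.7
print(round(danger_index(5_300), 1))  # 6.3
```

So on the danger scale the direction flips: the bigger the number, the bigger the risk.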

He goes on to say that driving a car and smoking, with safety indices of 3.7 and 2.9, would have ‘danger indices’ of 6.3 and 7.1, respectively. The trouble was that by this point I had completely ceased to understand what he was saying. I felt like I’d stepped off the edge of a tall building into thin air. I began to have that familiar choking sensation, as if someone was squeezing my chest. Maths anxiety.

Under this system being kidnapped would have a safety index of 6.7. Playing Russian roulette once a year would have a safety index of 0.8.

It is symptomatic of the uselessness of the whole idea that Paulos has to remind you what the values mean (‘Remember that the bigger the number, the smaller the risk.’ Really? You expect people to run with this idea?)

Having completed the danger index idea, Paulos returns to his extended lament on why people don’t like maths. He gives a long list of reasons why he thinks people are so innumerate, a condition which is, for him, a puzzling mystery.

For me this lament is a classic example of what you could call intellectual out-of-touchness. He is genuinely puzzled why so many of his fellow citizens are innumerate, can’t calculate simple odds and fall for all sorts of paranormal, astrology, snake-oil blether.

He proposes typically academic, university-level explanations for this phenomenon – such as that people find maths too cold and analytical and worry that it prevents them thinking about the big philosophical questions in life. He worries that maths has an image problem.

In other words, he fails to consider the much more obvious explanation that maths, probability and numeracy in general might be a combination of fanciful, irrelevant and deeply, deeply boring.

I use the word ‘fanciful’ deliberately. When he writes that the probability of drawing two aces in succession from a pack of cards is not (4/52 x 4/52) but (4/52 x 3/51) I do actually understand the distinction he’s making (having drawn one ace there are only 3 aces left and only 51 cards left) – I just couldn’t care less. I really couldn’t care less.
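The difference between the two calculations can be seen with exact fractions. This is just a sketch of the arithmetic, assuming a standard 52-card pack with four aces; the variable names are illustrative.

```python
from fractions import Fraction

# Wrong: treats the second draw as if the first ace had been put back
with_replacement = Fraction(4, 52) * Fraction(4, 52)

# Right: after one ace is drawn, 3 aces remain among 51 cards
without_replacement = Fraction(4, 52) * Fraction(3, 51)

print(with_replacement)     # 1/169
print(without_replacement)  # 1/221
```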

Or take this paragraph:

Several years ago Pete Rose set a National League record by hitting safely in forty-four consecutive games. If we assume for the sake of simplicity that he batted .300 (30 percent of the time he got a hit, 70 percent of the time he didn’t) and that he came to bat four times a game, the chances of his not getting a hit in any given game were, assuming independence, (.7)⁴ = .24… [at this point Paulos has to explain what ‘independence’ means in a baseball context: I couldn’t care less]… So the probability he would get at least one hit in any game was 1 - .24 = .76. Thus, the chances of him getting a hit in any given sequence of forty-four consecutive games were (.76)⁴⁴ = .0000057, a tiny probability indeed. (p.44)

I did, in fact, understand the maths and the working out in this example. I just don’t care about the problem or the result.
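The working out in the Pete Rose passage can be reproduced in a few lines. This is a sketch under the same simplifying assumptions the quotation states (a .300 average, four at-bats per game, every at-bat independent); the variable names are illustrative.

```python
p_hit = 0.3    # chance of a hit in one at-bat
at_bats = 4    # at-bats per game
games = 44     # length of the streak

# Chance of going hitless in a game: all four at-bats fail
p_hitless_game = (1 - p_hit) ** at_bats

# Chance of at least one hit in a game
p_hit_in_game = 1 - p_hitless_game

# Chance of at least one hit in each of 44 given consecutive games
p_streak = p_hit_in_game ** games

print(round(p_hitless_game, 2))  # 0.24
print(round(p_hit_in_game, 2))   # 0.76
print(f"{p_streak:.7f}")         # 0.0000057
```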

For me this is a – maybe the – major flaw of this book. In the blurbs on the front and back, in the introduction and all the way through the text, Paulos goes on and on about how we as a society need to be mathematically numerate because maths (and particularly probability) impinges on so many areas of our life.

But when he tries to show this – when he gets the opportunity to show us what all these areas of our lives actually are – he completely fails.

Almost all of the examples in the book are not taken from everyday life; they are remote and abstruse problems of gambling or sports statistics.

• which is more likely: obtaining at least one 6 in four rolls of a single die, or obtaining at least one 12 in twenty four rolls of a pair of dice?
• if 20% of drinks dispensed by a vending machine overflow their cups, what is the probability that exactly three of the next ten will overflow?
• Assume there is a test for cancer which is 98% accurate i.e. if someone has cancer, the test will be positive 98 percent of the time, and if one doesn’t have it, the test will be negative 98 percent of the time. Assume further that .5 percent – one out of two hundred people – actually have cancer. Now imagine that you’ve taken the test and that your doctor sombrely informs you that you have tested positive. How depressed should you be?
• What are the odds on Pete Rose getting a hit in a sequence of forty-four games?
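For the record, the first three puzzles in the list above can each be settled in a line or two of Python. This is a hedged sketch, not Paulos's own working: it assumes fair dice, independent drinks, and reads '98% accurate' exactly as the puzzle defines it. All variable names are illustrative.

```python
from math import comb

# Puzzle 1: at least one 6 in four rolls vs at least one double six in 24 rolls
p_six_in_4 = 1 - (5 / 6) ** 4               # about 0.518
p_double_six_in_24 = 1 - (35 / 36) ** 24    # about 0.491

# Puzzle 2: exactly 3 of the next 10 drinks overflow, each with probability 0.2
p_three_overflow = comb(10, 3) * 0.2**3 * 0.8**7    # about 0.201

# Puzzle 3: chance of actually having cancer given a positive test (Bayes' rule)
prevalence = 0.005           # 1 person in 200 has cancer
sensitivity = 0.98           # positive 98% of the time if you have it
false_positive_rate = 0.02   # positive 2% of the time if you don't
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_cancer_given_positive = sensitivity * prevalence / p_positive   # about 0.198

print(round(p_six_in_4, 3), round(p_double_six_in_24, 3))
print(round(p_three_overflow, 3))
print(round(p_cancer_given_positive, 3))
```

On these assumptions the single die is the slightly better bet, and a positive result from the '98% accurate' test means roughly a one in five chance of having cancer, because the healthy majority produces most of the positives.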

Are these the kinds of problems you are going to encounter today? Or tomorrow? Or ever?

No. The longer the book went on, the more I realised just how little a role maths plays in my everyday life. In fact more or less the only role maths plays in my life is looking at the prices in supermarkets, where I am attracted to goods which have a temporary reduction on them. But I do that because their labels are coloured red, not because I calculate the savings. That, and being aware of the time, so I know when to do household chores or be somewhere punctually. Those are the only times I used numbers today.

5. Statistics, trade-offs and society

This feeling that the abstruseness of the examples utterly contradicts the bold claims that reading the book will help us with everyday experiences was confirmed in the final chapter, which begins with the following example.

Imagine four dice, A, B, C and D, strangely numbered as follows: A has 4 on four faces and 0 on two faces; B has 3s on all six faces; C has four faces with 2 and two faces with 6; and D has 5 on three faces and 1 on three faces…

I struggled to the end of this sentence and just thought: ‘No, no more, I don’t have to make myself feel sick and unhappy any more’ – and skipped the couple of pages detailing the fascinating and unexpected results you can get from rolling such a collection of dice.
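For anyone curious about what dice numbered this way actually do, a brute-force sketch follows. It simply counts every pairing of faces for the four dice as described in the quotation, assuming each face is equally likely; the helper name p_beats is hypothetical.

```python
from fractions import Fraction
from itertools import product

# The four dice exactly as numbered in the quotation
dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [2, 2, 2, 2, 6, 6],
    "D": [5, 5, 5, 1, 1, 1],
}

def p_beats(x: str, y: str) -> Fraction:
    """Probability that die x shows a higher number than die y."""
    wins = sum(1 for a, b in product(dice[x], dice[y]) if a > b)
    return Fraction(wins, 36)  # 6 x 6 equally likely face pairs

for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"{x} beats {y} with probability {p_beats(x, y)}")
```

Each line prints 2/3: A usually beats B, B usually beats C, C usually beats D, and yet D usually beats A, so there is no 'best' die.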

This chapter goes on to a passage about the Prisoner’s Dilemma, a well-known problem in logic, which I have read about and instantly forgotten scores of times over the years.

Paulos gives us three or four variations on the idea, including:

• Imagine you are locked up in prison by a philanthropist with 20 other people.

Or:

• Imagine you are locked in a dungeon by a sadist with 20 other people.

Or:

• Imagine you are one of two drug traffickers making a quick transaction on a street corner and forced to make a quick decision.

Or:

• Imagine you are locked in a prison cell, and another prisoner is locked in an identical cell down the corridor.

Well, I’m not any of these things, I’m never likely to be, and I am not really interested in these fanciful speculations.

Moreover, I am well into middle age, have travelled round the world, had all sorts of jobs in companies small, large and enormous – and I am not aware of having ever been in any situation which remotely resembled any variation of the Prisoner’s Dilemma I’ve ever heard of.

In other words, to me, it is another one of the endless pile of games and puzzles which logicians and mathematicians love to spend all day playing but which have absolutely no impact whatsoever on any aspect of my life.

Pretty much all of his examples conclusively prove how remote mathematical problems and probabilistic calculations are from the everyday lives you and I lead. When he asks:

How many people would there have to be in a group in order for the probability to be half that at least two people in it have the same birthday? (p.23)

Imagine a factory which produces small batteries for toys, and assume the factory is run by a sadistic engineer… (p.117)

It dawns on me that my problem might not be that I’m innumerate, so much as I’m just uninterested in trivial or frivolous mental exercises.
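For completeness, the birthday question has a well-known answer that can be checked with a short loop. The sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def p_shared_birthday(n: int) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)th person must avoid the k birthdays already taken
        p_all_different *= (365 - k) / 365
    return 1 - p_all_different

# Smallest group for which a shared birthday is at least a 50:50 bet
n = 1
while p_shared_birthday(n) < 0.5:
    n += 1

print(n)  # 23
```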

Someone offers you a choice of two envelopes and tells you one has twice as much money in it as the other. (p.127)

Flip a coin continuously until a tail appears for the first time. If this doesn’t happen until the twentieth (or later) flip, you win $1 billion. If the first tail occurs before the twentieth flip, you pay $100. Would you play? (p.128)

No. I’d go and read an interesting book.
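For what it is worth, the coin-flipping game can be priced with the notion of expected value. This is a sketch assuming a fair coin, and reading 'the twentieth (or later) flip' as meaning the first nineteen flips all come up heads; the variable names are illustrative.

```python
from fractions import Fraction

# You win only if the first 19 flips are all heads
p_win = Fraction(1, 2) ** 19

prize = 1_000_000_000   # dollars won if the first tail comes on flip 20 or later
stake = 100             # dollars paid otherwise

# Expected value: average result per game over a very large number of games
expected = p_win * prize - (1 - p_win) * stake

print(p_win)                      # 1/524288
print(round(float(expected), 2))  # 1807.35
```

So on average the game is worth about $1,800 a go, even though you would lose $100 almost every single time you played.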

Thoughts

If Innumeracy: Mathematical Illiteracy and Its Consequences is meant to make its readers more numerate, it failed with me.

This is for a number of reasons:

1. crucially – because he doesn’t explain maths very well; or, the way he explained probability had lost me by about page 16 – in other words, if this is meant to be a primer for innumerate people it’s a fail
2. because the longer it went on, the more convinced I became that I rarely use maths, arithmetic and probability in my day-to-day life: whole days go by when I don’t do a single sum, and so I lost all motivation to submit myself to the brain-hurting ordeal of trying to understand his examples
3. also because the structure and presentation of the book is a mess: it meanders through a fog of jokes, anecdotes and maths trivia, baseball stories and gossip about American politicians – before suddenly unleashing a fundamental aspect of probability theory on the unwary reader

I’d have preferred the book to have had a clear, didactic structure, with an introduction and chapter headings explaining just what he was going to do, an explanation, say, of how he was going to take us through some basic concepts of probability one step at a time.

And then for the concepts to have been laid out very clearly and explained very clearly, from a number of angles, giving a variety of different examples until he and we were absolutely confident we’d got it – before we moved on to the next level of complexity.

The book is nothing like this. Instead it sacrifices any attempt at logical sequencing or clarity for anecdotes about Elvis Presley or UFOs, for digressions about Biblical numerology, the silliness of astrology, the long and bewildering digression about introducing a safety index for activities (summarised above), or prolonged analyses of baseball or basketball statistics. Oh, and a steady drizzle of terrible jokes.

Which two sports have face-offs?
Ice hockey and leper boxing.

Half way through the book, Paulos tells us that he struggles to write long texts (‘I have a difficult time writing at extended length about anything’, p.88), and I think it really shows.

It certainly explains why:

• the blizzard of problems in coin tossing and dice rolling stopped without any warning, as he switched tone completely, giving us first a long chapter about all the crazy irrational beliefs people hold, and then another chapter listing all the reasons why society is innumerate
• the last ten pages of the book give up the attempt of trying to be a coherent narrative and disintegrate into a bunch of miscellaneous odds and ends he couldn’t find a place for in the main body of the text

Also, I found that the book was not about numeracy in the broadest sense, but mostly about probability. Again and again he reverted to examples of tossing coins and rolling dice. One enduring effect of reading this book is going to be that, the next time I read a description of someone tossing a coin or rolling a die, I’m just going to skip right over the passage, knowing that if I read it I’ll either be bored to death (if I understand it) or have an unpleasant panic attack (if I don’t).

In fact in the coda at the end of the book Paulos explicitly says it has mostly been about probability – God, I wish he’d explained that at the beginning.

Right at the very, very end he briefly lists key aspects of probability theory which he claims to have explained in the book – but he hasn’t: some of them are only briefly referred to with no explanation at all, including statistical tests and confidence intervals, cause and correlation, conditional probability, independence, the multiplication principle, the notion of expected value and of probability distribution.

These are now names I have at least read about, but they are all concepts I am nowhere near understanding, and light years away from being able to use in practical life.

Innumeracy – or illogicality?

Also there was an odd disconnect between the broadly psychological and philosophical prose explanations of what makes people so irrational, and the incredibly narrow scope of the coin-tossing, baseball-scoring examples.

What I’m driving at is that, in the long central chapter on Pseudoscience, when he stopped to explain what makes people so credulous, so gullible, he didn’t really use any mathematical examples to disprove Freudianism or astrology or so on: he had to appeal to broad principles of psychology, such as:

• people are drawn to notable exceptions, instead of considering the entire field of entities i.e.
• people filter out the bad and the failed and focus on the good and the successful
• people seize hold of the first available explanation, instead of considering every single possible permutation
• people humanise and personalise events (‘bloody weather, bloody buses’)
• people over-value coincidences

My point is that there is a fundamental conceptual confusion in the book, revealed in the long chapter about pseudoscience: his complaint is not, deep down, right at bottom, that people are innumerate; it is that people are hopelessly irrational and illogical.

Now this subject – the fundamental ways in which people are irrational and illogical – is dealt with much better, at much greater length, in a much more thorough, structured and comprehensible way in Stuart Sutherland’s great book, Irrationality, which I’ll be reviewing and summarising later this week.

Innumeracy amounts to random scratches on the surface of the vast iceberg which is the deep human inability to think logically.

Conclusion

In summary, for me at any rate, this was not a good book – badly structured, meandering in direction, unable to explain even basic concepts but packed with digressions, hobby horses and cul-de-sacs, unsure of its real purpose, stopping for a long rant against pseudosciences and an even longer lament on why maths is taught so badly – it’s a weird curate’s egg of a text.

Its one positive effect was to make me want to track down and read a good book about probability.