Chapter 16

The Wisdom and Foolishness of Crowds

In this chapter, we begin our exploration of the power of beliefs. We here explore how rational players update their beliefs upon receiving information (signals), and how these beliefs shape the aggregate behavior of a crowd of people; in particular, as we shall see, two very different types of effects can arise depending on the context. (We here make extensive use of notions from probability theory; see Appendix A for the necessary preliminaries.)

16.1  The Wisdom of Crowds

It has been experimentally observed that if we take a large crowd of people, each of whom has a poor estimate of some quantity, the median of the crowd’s estimates tends to be fairly good: the seminal paper by Sir Francis Galton [Gal07] (who was Charles Darwin’s cousin), published in Nature in 1907, investigated a crowd of people attempting to guess the weight of an ox at a county fair; while most people were individually far off, the “middlemost” (i.e., median) estimate was surprisingly close: the actual weight of the ox was 1,198 pounds, and the middlemost estimate was 1,207 pounds. And the mean of the guesses was even closer!

A simple signaling game Let us introduce a simple model to explain this “wisdom of crowds” phenomenon. We will focus on a simple “yes/no” (i.e., binary) decision (e.g., “does smoking cause cancer?” or “is climate change real?”). Formally, the state of the world can be expressed as a single bit W. Now, let us assume we have a set of n individuals. A priori, all of them consider W = 0 and W = 1 to be equally likely—that is, according to their beliefs, Pr[W = 1] = 1/2—but then each individual i receives some independent signal Xi that is correlated with the state of the world, but only very weakly so: for every b ∈ {0,1},

$$\Pr[X_i = b \mid W = b] \geq \frac{1}{2} + \epsilon,$$

where 𝜖 is some small constant, and all Xi are independent random variables.
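To make the model concrete, here is a minimal Python sketch of the signal-generation process (the function draw_signal, the seed, and the choice 𝜖 = 0.1 are our own illustrative assumptions, not part of the formal model):

```python
import random

def draw_signal(w, eps):
    """Return a signal that equals the true state w with probability 1/2 + eps."""
    return w if random.random() < 0.5 + eps else 1 - w

random.seed(0)
W = random.randint(0, 1)     # nature picks the state of the world uniformly
eps = 0.1                    # each signal is only weakly informative
signals = [draw_signal(W, eps) for _ in range(1000)]
print(sum(s == W for s in signals) / len(signals))   # about 0.6 = 1/2 + eps
```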

Consider a game where each individual/player is supposed to output a guess for W and receives “high” utility if they output the correct answer and “low” utility otherwise.1 The players all need to make these guesses simultaneously (or at least independently of the guesses of the other players), so the only information a player has when making their guess is their own signal (i.e., they cannot be influenced by the decisions of others). Intuitively, in such a situation, a player i wishing to maximize their expected utility should thus output Xi as their guess. To formalize this, we will rely on Bayes’ rule (see Theorem A.1 in Appendix A): Since the utility for a correct guess is higher than for an incorrect one, a player i getting a signal Xi = 1 should output 1 if

$$\Pr[W = 1 \mid X_i = 1] \geq \Pr[W = 0 \mid X_i = 1].$$

Let us analyze the left-hand side (LHS) and right-hand side (RHS) separately. By Bayes’ rule,

$$\mathrm{LHS} = \Pr[W = 1 \mid X_i = 1] = \frac{\Pr[X_i = 1 \mid W = 1] \cdot \Pr[W = 1]}{\Pr[X_i = 1]} \geq \frac{\frac{1}{2} + \epsilon}{2\Pr[X_i = 1]}.$$

By similar logic,

$$\mathrm{RHS} = \Pr[W = 0 \mid X_i = 1] = \frac{\Pr[X_i = 1 \mid W = 0] \cdot \Pr[W = 0]}{\Pr[X_i = 1]} \leq \frac{\frac{1}{2} - \epsilon}{2\Pr[X_i = 1]},$$

which clearly is smaller than the LHS. So, we conclude that whenever a rational player—who wants to maximize their expected utility—receives a signal Xi = 1, they should output 1 as their guess; by the same argument it follows that a rational player should output 0 whenever they receive Xi = 0 as their signal.
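As a quick numerical sanity check of this Bayesian update, the following sketch computes the posterior directly (assuming, for illustration, that the inequality holds with equality, i.e., Pr[Xi = b | W = b] = 1/2 + 𝜖):

```python
def posterior_w1_given_x1(eps):
    """Pr[W = 1 | X_i = 1] via Bayes' rule, with a uniform prior on W."""
    prior = 0.5
    like_w1 = 0.5 + eps    # Pr[X_i = 1 | W = 1]
    like_w0 = 0.5 - eps    # Pr[X_i = 1 | W = 0]
    return like_w1 * prior / (like_w1 * prior + like_w0 * prior)

print(posterior_w1_given_x1(0.1))   # 0.6 > 0.5: outputting one's own signal is optimal
```

Note that the denominator here is simply Pr[Xi = 1] = 1/2, so the posterior works out to exactly 1/2 + 𝜖.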

Analyzing the aggregate behavior What is the probability that the “majority bit” b that got the most guesses (which is equivalent to the “median vote” for the case of binary decisions) is actually equal to the true state W? At first sight, one might think that it would still only be 1/2 + 𝜖—each individual player is only weakly informed about the state of W, so why would majority voting increase the chances of producing an “informed” result? Indeed, if the signals were dependent, majority voting may not help (e.g., if all players receive the same signal). But, we are here considering a scenario where each player receives a “fresh,” independent signal. In this case, we can use a variant of the Law of Large Numbers to argue that the majority vote equals W with high probability (depending on n).

To show this, let us introduce the useful Hoeffding bound [Hoe63], which can be thought of as a quantitative version of the Law of Large Numbers:2

Theorem 16.1 (Hoeffding Bound). Let $Y_1, \dots, Y_n$ be $n$ independent random variables taking values in $[a,b]$, and let $M = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Then:

$$\Pr\big[|M - \mathbb{E}[M]| \geq \epsilon\big] \leq 2e^{-\frac{2\epsilon^2 n}{(b-a)^2}}.$$

In other words, the Hoeffding bound bounds the deviation of the “empirical mean” of independent random variables (over some bounded range) from its expectation. We will omit the proof of this theorem; instead, we will prove that in the above-described game, the majority vote will be correct with high probability (if players act rationally). Let Majority(X1,…,Xn) = 0 if at least n/2 of the Xi are zero, and 1 otherwise. We now have the following theorem.

Theorem 16.2. Let W ∈ {0,1}, and let X1,…,Xn ∈ {0,1} be independent random variables such that Pr[Xi = W] ≥ 1/2 + 𝜖. Then:

$$\Pr[\mathrm{Majority}(X_1, \dots, X_n) = W] \geq 1 - 2e^{-2\epsilon^2 n}.$$

Proof. Define random variables Y1,…,Yn such that Yi = 1 if Xi = W and Yi = −1 otherwise. Note that if

$$M = \frac{1}{n}\sum_{i=1}^{n} Y_i > 0,$$

then clearly Majority(X1,…,Xn) = W. We will show that Pr[M ≤ 0] is sufficiently small, which by the above implies the theorem.

By linearity of expectations, we have

$$\mathbb{E}[M] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[Y_i].$$

Note that, for all i,

$$\mathbb{E}[Y_i] \geq 1 \cdot \left(\frac{1}{2} + \epsilon\right) + (-1) \cdot \left(\frac{1}{2} - \epsilon\right) = \frac{1}{2} + \epsilon - \frac{1}{2} + \epsilon = 2\epsilon.$$

Thus, 𝔼[M] ≥ 2𝜖. Since M ≤ 0 then implies |M − 𝔼[M]| ≥ 2𝜖, we get

$$\Pr[M \leq 0] \leq \Pr\big[|M - \mathbb{E}[M]| \geq 2\epsilon\big],$$

which by the Hoeffding bound (setting a = −1, b = 1) is at most

$$2e^{-\frac{2(2\epsilon)^2 n}{(b-a)^2}} = 2e^{-\frac{2 \cdot 4\epsilon^2 n}{2^2}} = 2e^{-2\epsilon^2 n}.$$

This concludes the proof.

Observe that, as we might expect, the (lower bound on the) probability that the majority of the guesses is correct increases with both (a) the number of players n, and (b) the advantage 𝜖 that each individual signal has over a uniformly random guess.
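To see the bound in action, here is a small Monte Carlo sketch (our own illustration; the parameter choices are arbitrary) comparing the empirical success probability of the majority vote with the lower bound of Theorem 16.2:

```python
import math
import random

def majority_correct_rate(n, eps, trials=10_000):
    """Estimate Pr[Majority(X_1, ..., X_n) = W] by simulation."""
    wins = 0
    for _ in range(trials):
        w = random.randint(0, 1)
        # each rational player simply guesses their own signal
        guesses = [w if random.random() < 0.5 + eps else 1 - w for _ in range(n)]
        majority = 1 if sum(guesses) > n / 2 else 0
        wins += majority == w
    return wins / trials

eps = 0.1
for n in (11, 101, 1001):
    bound = 1 - 2 * math.exp(-2 * eps**2 * n)   # the lower bound from Theorem 16.2
    print(n, majority_correct_rate(n, eps), ">=", round(bound, 3))
```

(For small n the lower bound is vacuous, as it is negative, but the empirical success probability already exceeds 1/2 + 𝜖; as n grows, both rapidly approach 1.)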

A connection to two-candidate voting The above theorem shows that the majority voting rule (studied in chapter 11) has the following property: if one of the two outcomes over which players vote is objectively “good” and the other “bad” (but players are uncertain about which is which), majority voting leads to the “good” outcome with high probability, as long as players receive independent signals that are correlated with the correct state of the world.

Beyond binary decisions Let us briefly point out that the Hoeffding bound applies also in a setting where players need to guess some (nonbinary) real number in some bounded interval (like in the “guessing the weight of the ox” example). Consider, for instance, a scenario where everyone receives as a signal an independently “perturbed” version of the “true value,” where the expectation of the perturbation is 0 (i.e., positive and negative perturbations are as likely)—that is, everyone gets an independently drawn signal whose expected value is the “true value.” The Hoeffding bound then says that, with high probability, the mean of the signals of all the players is close to the true value.
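For instance, the following minimal sketch mirrors the ox example (the true value, the noise range, and the crowd size are our own illustrative assumptions; all the Hoeffding bound requires is that the perturbations are bounded and have expectation 0):

```python
import random

random.seed(1)
true_value = 1198                     # e.g., the weight of Galton's ox, in pounds
# each player's signal is the true value plus independent, zero-mean, bounded noise
signals = [true_value + random.uniform(-300, 300) for _ in range(800)]
print(sum(signals) / len(signals))    # the crowd's mean lands close to 1198
```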

16.2  The Foolishness of Crowds: Herding

Let us now return to the binary decision scenario, but change the problem a bit: Instead of having the players announce their guesses simultaneously, the players announce them sequentially, one after the other; additionally, before making their own guess, each player first gets to observe the guesses of everyone who preceded them. (For instance, think of this as posting your opinion on Facebook after first seeing what some of your friends posted.)

How should people act in order to maximize the probability that their own guess is correct (and thus maximize their expected utility)? Again, formally, we can model the situation as a game where a player receives some high utility for a correct guess and a low utility for an incorrect guess, but this time it is a game over multiple stages (i.e., an extensive-form game). As before, a priori, the players consider W = 0 and W = 1 to be equally likely; that is, according to their beliefs, Pr[W = 1] = 1/2. Let us now additionally assume that all players’ evidence is equally strong—that is, for b ∈ {0,1},

$$\Pr[X_i = b \mid W = b] = \frac{1}{2} + \epsilon$$

for all i (i.e., we have equality instead of ≥). Most importantly, let us now also assume that players not only are rational themselves, but that it is commonly known that everyone is rational—that is, everyone is rational, everyone knows that everyone is rational, everyone knows that everyone knows that everyone is rational, and so on. We shall see how to formalize this notion of common knowledge of rationality in chapter 17, but for now we appeal to intuition (which will suffice for this example).

Analyzing the aggregate behavior To understand what happens in this game, let us analyze the reasoning of the players one by one:

• Player 1 has nothing to go on except their own signal, so by the analysis in Section 16.1, a rational player 1 outputs g1 = X1.

• Player 2 knows that player 1 is rational, and thus that g1 = X1. If X2 = g1, both signals point in the same direction, and player 2 outputs g2 = X2. If X2 ≠ g1, the two (equally strong) signals cancel out, so player 2 is indifferent and may break the tie in favor of their own signal, again outputting g2 = X2.3 Either way, g2 = X2.

• Player 3 thus knows that g1 = X1 and g2 = X2. If g1 ≠ g2, the first two signals cancel out, so player 3 is in the same situation as player 1 and outputs g3 = X3. But if g1 = g2 = b, then player 3 has observed two independent signals in favor of b and holds at most one signal against it; by Bayes’ rule, W = b is strictly more likely, so a rational player 3 outputs g3 = b regardless of X3.

• In the latter case, player 3’s guess reveals nothing about X3, so player 4 faces exactly the same evidence as player 3 and likewise outputs b; by induction, so does every subsequent player. An information cascade, or herd, has formed.

Notice that, if, say, W = 0, a cascade where everyone guesses the incorrect state happens whenever X1 = 1 and X2 = 1, which occurs with probability (1/2 − 𝜖)². Even with a relatively large 𝜖—say, 𝜖 = 0.1—this still occurs with probability 0.4² = 0.16. So, to conclude, if rational players make their guesses sequentially, rather than in parallel, then with probability (1/2 − 𝜖)², not only will the majority be incorrect, but we get a herding behavior where everyone “copies” the first two players and thus guesses incorrectly!

In fact, the situation is even worse: if X1 ≠ X2, the remaining players effectively ignore the guesses of the first two players and “start over.” Thus, we will eventually (in fact, rather quickly) reach a situation where a cascade starts, and with probability close to 1/2 the cascade is “bad” in the sense that all the cascading players output an incorrect guess.4
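The following sketch simulates the sequential game with players following the rational strategy derived above; the lead variable (the net count of signals revealed by non-cascading players) is our own implementation device:

```python
import random

def sequential_round(n, eps, rng):
    """Simulate one play of the sequential guessing game; return (W, guesses)."""
    w = rng.randint(0, 1)
    lead = 0              # net count of revealed 1-signals minus revealed 0-signals
    guesses = []
    for _ in range(n):
        x = w if rng.random() < 0.5 + eps else 1 - w   # the player's private signal
        if lead >= 2:     # two unanswered signals for 1: cascade on 1
            g = 1
        elif lead <= -2:  # two unanswered signals for 0: cascade on 0
            g = 0
        else:             # otherwise, guess one's own signal, thereby revealing it
            g = x
            lead += 1 if x == 1 else -1
        guesses.append(g)
    return w, guesses

rng = random.Random(2)
trials = 10_000
bad = sum(guesses[-1] != w for w, guesses in
          (sequential_round(100, 0.1, rng) for _ in range(trials)))
print(bad / trials)   # about 0.31 = 0.4**2 / (0.6**2 + 0.4**2); cf. footnote 4
```

Note that the last player guesses incorrectly roughly 31% of the time, essentially matching footnote 4, whereas in the simultaneous setting Theorem 16.2 bounds the majority’s error probability by 2e^(−2𝜖²n) ≈ 0.27 for n = 100 (and the true error probability there is far smaller).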

Of course, in real life, decisions are never fully sequentialized. But a similar analysis applies as long as they are sufficiently sequential—in contrast, if they are sufficiently parallelized, then the previous section’s “wisdom of crowds” analysis applies.

Let us end this section by pointing out that there are several other reasons why cascades may not happen as easily in real life (even if decisions are sequentialized). For instance, in real life, people have a tendency to weight their own evidence (or, for instance, that of their friends or relatives) more strongly than others’ opinions or evidence, whereas in the above-described model, rational agents weight every piece of evidence equally. Furthermore, the herding model disregards the ability of agents to acquire new evidence—for instance, if the third player knows that their own evidence will be ignored no matter what, then they may wish to procure additional evidence at a low cost, if the option is available.

Notes

The term “the wisdom of crowds” was coined by Surowiecki in [Sur05]; Surowiecki’s book also contains many case studies illustrating the phenomenon that the aggregation of information in groups often results in decisions that are better than any single member of the group could have made.

Herding was first explored by Banerjee in [Ban92]; Banerjee’s analysis, however, relied on stronger rationality assumptions than the ones we use here.

1Note that this is not a normal-form game as we need to model the fact that players are uncertain about W and receive the signals Xi; formally, this can be done through the notion of a Bayesian game that we have alluded to in the past, but (for our purposes) this extra formalism will only add cumbersome notation without adding any new insights.

2For the analysis of this binary signaling game, a simpler bound called the Chernoff bound [Che52] actually suffices, but the Hoeffding bound is useful also for nonbinary decisions.

3In fact, at this point (although it makes the argument cleaner) we do not even have to assume that player 2 knows that player 1 is rational, since no matter what X1 is, player 2 can rationally output g2 = X2.

4Formally, the probability of the cascade being bad is $\frac{(1/2 - \epsilon)^2}{(1/2 + \epsilon)^2 + (1/2 - \epsilon)^2}$: a cascade starts at the first “fresh start” at which two consecutive players receive agreeing signals; both of those signals are incorrect with probability (1/2 − 𝜖)², both are correct with probability (1/2 + 𝜖)², and otherwise the process restarts, so conditioning on a cascade starting yields the expression above.