Chapter 16

The Wisdom and Foolishness of Crowds

In this chapter, we begin our exploration of the power of beliefs. We here explore how rational players update their beliefs upon receiving information (signals), and how these beliefs shape the aggregate behavior of a crowd of people; in particular, as we shall see, two very different types of effects can arise depending on the context. (We here make extensive use of notions from probability theory; see Appendix A for the necessary preliminaries.)

16.1  The Wisdom of Crowds

It has been experimentally observed that if we take a large crowd of people, each of whom has a poor estimate of some quantity, the median of the crowd’s estimates tends to be fairly good: the seminal paper by Sir Francis Galton [Gal07] (who was Charles Darwin’s cousin), published in Nature in 1907, investigated a crowd of people attempting to guess the weight of an ox at a county fair; while most people were individually far off, the “middlemost” (i.e., median) estimate was surprisingly close: the actual weight of the ox was 1,198 pounds, and the middlemost estimate was 1,207 pounds. And the mean of the guesses was even closer!

A simple signaling game Let us introduce a simple model to explain this “wisdom of crowds” phenomenon. We will focus on a simple “yes/no” (i.e., binary) decision (e.g., “does smoking cause cancer?” or “is climate change real?”). Formally, the state of the world can be expressed as a single bit W. Now, let us assume we have a set of n individuals. A priori, all of them consider W = 0 and W = 1 to be equally likely—that is, according to their beliefs, Pr[W = 1] = 1/2—but then each individual i receives some independent signal Xi that is correlated with the state of the world, but only very weakly so: for every b ∈ {0,1},

$$\Pr[X_i = b \mid W = b] \geq \frac{1}{2} + \epsilon,$$

where 𝜖 is some small constant, and all Xi are independent random variables.
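To make the model concrete, here is a minimal Python sketch of the signal-generation process (the function draw_signal, the seed, and the choice 𝜖 = 0.1 are our own illustrative assumptions, not part of the formal model):

```python
import random

def draw_signal(w, eps):
    """Return a signal that equals the true state w with probability 1/2 + eps."""
    return w if random.random() < 0.5 + eps else 1 - w

random.seed(0)
W = random.randint(0, 1)     # nature picks the state of the world uniformly
eps = 0.1                    # each signal is only weakly informative
signals = [draw_signal(W, eps) for _ in range(1000)]
print(sum(s == W for s in signals) / len(signals))   # about 0.6 = 1/2 + eps
```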

Consider a game where each individual/player is supposed to output a guess for W and receives “high” utility if they output the correct answer and “low” utility otherwise.1 The players all need to make these guesses simultaneously (or at least independently of the guesses of the other players), so the only information a player has when making their guess is their own signal (i.e., they cannot be influenced by the decisions of others). Intuitively, in such a situation, a player i wishing to maximize their expected utility should thus output Xi as their guess. To formalize this, we will rely on Bayes’ rule (see Theorem A.1 in Appendix A): Since the utility for a correct guess is higher than for an incorrect one, a player i getting a signal Xi = 1 should output 1 if

$$\Pr[W = 1 \mid X_i = 1] \geq \Pr[W = 0 \mid X_i = 1].$$

Let us analyze the left-hand side (LHS) and right-hand side (RHS) separately. By Bayes’ rule,

$$\mathrm{LHS} = \Pr[W = 1 \mid X_i = 1] = \frac{\Pr[X_i = 1 \mid W = 1] \cdot \Pr[W = 1]}{\Pr[X_i = 1]} \geq \frac{\frac{1}{2} + \epsilon}{2\Pr[X_i = 1]}.$$

By similar logic,

$$\mathrm{RHS} = \Pr[W = 0 \mid X_i = 1] = \frac{\Pr[X_i = 1 \mid W = 0] \cdot \Pr[W = 0]}{\Pr[X_i = 1]} \leq \frac{\frac{1}{2} - \epsilon}{2\Pr[X_i = 1]},$$

which clearly is smaller than the LHS. So, we conclude that whenever a rational player—who wants to maximize their expected utility—receives a signal Xi = 1, they should output 1 as their guess; by the same argument it follows that a rational player should output 0 whenever they receive Xi = 0 as their signal.
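As a quick numerical sanity check of this Bayesian update, the following sketch computes the posterior directly (assuming, for illustration, that the inequality holds with equality, i.e., Pr[Xi = b | W = b] = 1/2 + 𝜖):

```python
def posterior_w1_given_x1(eps):
    """Pr[W = 1 | X_i = 1] via Bayes' rule, with a uniform prior on W."""
    prior = 0.5
    like_w1 = 0.5 + eps    # Pr[X_i = 1 | W = 1]
    like_w0 = 0.5 - eps    # Pr[X_i = 1 | W = 0]
    return like_w1 * prior / (like_w1 * prior + like_w0 * prior)

print(posterior_w1_given_x1(0.1))   # 0.6 > 0.5: outputting one's own signal is optimal
```

Note that the denominator here is simply Pr[Xi = 1] = 1/2, so the posterior works out to exactly 1/2 + 𝜖.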

Analyzing the aggregate behavior What is the probability that the “majority bit” b that got the most guesses (which is equivalent to the “median vote” for the case of binary decisions) is actually equal to the true state W? At first sight, one might think that it would still only be 1/2 + 𝜖—each individual player is only weakly informed about the state of W, so why would majority voting increase the chances of producing an “informed” result? Indeed, if the signals were dependent, majority voting may not help (e.g., if all players receive the same signal). But, we are here considering a scenario where each player receives a “fresh,” independent signal. In this case, we can use a variant of the Law of Large Numbers to argue that the majority vote equals W with high probability (depending on n).

To show this, let us introduce the useful Hoeffding bound [Hoe63], which can be thought of as a quantitative version of the Law of Large Numbers:2

Theorem 16.1 (Hoeffding Bound). Let $Y_1, \dots, Y_n$ be $n$ independent random variables taking values in $[a,b]$, and let $M = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Then:

$$\Pr\big[|M - \mathbb{E}[M]| \geq \epsilon\big] \leq 2e^{-\frac{2\epsilon^2 n}{(b-a)^2}}.$$

In other words, the Hoeffding bound bounds the deviation of the “empirical mean” of independent random variables (over some bounded range) from its expectation. We will omit the proof of this theorem; instead, we will prove that in the above-described game, the majority vote will be correct with high probability (if players act rationally). Let Majority(X1,…,Xn) = 0 if at least n/2 of the Xi are zero, and 1 otherwise. We now have the following theorem.

Theorem 16.2. Let W ∈ {0,1}, and let X1,…,Xn ∈ {0,1} be independent random variables such that Pr[Xi = W] ≥ 1/2 + 𝜖. Then:

$$\Pr[\mathrm{Majority}(X_1, \dots, X_n) = W] \geq 1 - 2e^{-2\epsilon^2 n}.$$

Proof. Define random variables Y1,…,Yn such that Yi = 1 if Xi = W and Yi = −1 otherwise. Note that if

$$M = \frac{1}{n}\sum_{i=1}^{n} Y_i > 0,$$

then clearly Majority(X1,…,Xn) = W. We will show that Pr[M ≤ 0] is sufficiently small, which by the above implies the theorem.

By linearity of expectations, we have

$$\mathbb{E}[M] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[Y_i].$$

Note that, for all i,

$$\mathbb{E}[Y_i] \geq 1 \cdot \left(\frac{1}{2} + \epsilon\right) + (-1) \cdot \left(\frac{1}{2} - \epsilon\right) = \frac{1}{2} + \epsilon - \frac{1}{2} + \epsilon = 2\epsilon.$$

Thus, 𝔼[M] ≥ 2𝜖. Since M ≤ 0 then implies |M − 𝔼[M]| ≥ 2𝜖, we get

$$\Pr[M \leq 0] \leq \Pr\big[|M - \mathbb{E}[M]| \geq 2\epsilon\big],$$

which by the Hoeffding bound (setting a = −1, b = 1) is at most

$$2e^{-\frac{2(2\epsilon)^2 n}{(b-a)^2}} = 2e^{-\frac{2 \cdot 4\epsilon^2 n}{2^2}} = 2e^{-2\epsilon^2 n}.$$

This concludes the proof.

Observe that, as we might expect, the (lower bound on the) probability that the majority of the guesses is correct increases with both (a) the number of players n, and (b) the advantage 𝜖 that each individual signal has over a uniformly random guess.
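To see the bound in action, here is a small Monte Carlo sketch (our own illustration; the parameter choices are arbitrary) comparing the empirical success probability of the majority vote with the lower bound of Theorem 16.2:

```python
import math
import random

def majority_correct_rate(n, eps, trials=10_000):
    """Estimate Pr[Majority(X_1, ..., X_n) = W] by simulation."""
    wins = 0
    for _ in range(trials):
        w = random.randint(0, 1)
        # each rational player simply guesses their own signal
        guesses = [w if random.random() < 0.5 + eps else 1 - w for _ in range(n)]
        majority = 1 if sum(guesses) > n / 2 else 0
        wins += majority == w
    return wins / trials

eps = 0.1
for n in (11, 101, 1001):
    bound = 1 - 2 * math.exp(-2 * eps**2 * n)   # the lower bound from Theorem 16.2
    print(n, majority_correct_rate(n, eps), ">=", round(bound, 3))
```

(For small n the lower bound is vacuous, as it is negative, but the empirical success probability already exceeds 1/2 + 𝜖; as n grows, both rapidly approach 1.)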

A connection to two-candidate voting The above theorem shows that the majority voting rule (studied in chapter 11) has the following property: if one of the two outcomes over which players vote is objectively “good” and the other “bad” (but players are uncertain about which is which), majority voting leads to the “good” outcome with high probability, as long as players receive independent signals that are correlated with the correct state of the world.

Beyond binary decisions Let us briefly point out that the Hoeffding bound applies also in a setting where players need to guess some (nonbinary) real number in some bounded interval (like in the “guessing the weight of the ox” example). Consider, for instance, a scenario where everyone receives as a signal an independently “perturbed” version of the “true value,” where the expectation of the perturbation is 0 (i.e., positive and negative perturbations are as likely)—that is, everyone gets an independently drawn signal whose expected value is the “true value.” The Hoeffding bound then says that, with high probability, the mean of the signals of all the players is close to the true value.
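For instance, the following minimal sketch mirrors the ox example (the true value, the noise range, and the crowd size are our own illustrative assumptions; all the Hoeffding bound requires is that the perturbations are bounded and have expectation 0):

```python
import random

random.seed(1)
true_value = 1198                     # e.g., the weight of Galton's ox, in pounds
# each player's signal is the true value plus independent, zero-mean, bounded noise
signals = [true_value + random.uniform(-300, 300) for _ in range(800)]
print(sum(signals) / len(signals))    # the crowd's mean lands close to 1198
```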

16.2  The Foolishness of Crowds: Herding

Let us now return to the binary decision scenario, but change the problem a bit: Instead of having the players announce their guesses simultaneously, the players announce them sequentially, one after the other; additionally, before making their own guess, each player first gets to observe the guesses of everyone who preceded them. (For instance, think of this as posting your opinion on Facebook after first seeing what some of your friends posted.)

How should people act in order to maximize the probability that their own guess is correct (and thus maximize their expected utility)? Again, formally, we can model the situation as a game where a player receives some high utility for a correct guess and a low utility for an incorrect guess, but this time it is a game over multiple stages (i.e., an extensive-form game). As before, a priori, the players consider W = 0 and W = 1 to be equally likely; that is, according to their beliefs, Pr[W = 1] = 1/2. Let us now additionally assume that all players’ evidence is equally strong—that is, for b ∈ {0,1},

$$\Pr[X_i = b \mid W = b] = \frac{1}{2} + \epsilon$$

for all i (i.e., we have equality instead of ≥). Most importantly, let us now also assume that players not only are rational themselves, but that it is commonly known that everyone is rational—that is, everyone is rational, everyone knows that everyone is rational, everyone knows that everyone knows that everyone is rational, and so on. We shall see how to formalize this notion of common knowledge of rationality in chapter 17, but for now we appeal to intuition (which will suffice for this example).

Analyzing the aggregate behavior To understand what happens in this game, let us analyze the reasoning of the players one by one:

• Player 1 has nothing to go on except their own signal, so by the analysis in Section 16.1, a rational player 1 outputs g1 = X1.

• Player 2 knows that player 1 is rational, and thus that g1 = X1. If X2 = g1, both signals point in the same direction, and player 2 outputs g2 = X2. If X2 ≠ g1, the two (equally strong) signals cancel out, so player 2 is indifferent and may break the tie in favor of their own signal, again outputting g2 = X2.3 Either way, g2 = X2.

• Player 3 thus knows that g1 = X1 and g2 = X2. If g1 ≠ g2, the first two signals cancel out, so player 3 is in the same situation as player 1 and outputs g3 = X3. But if g1 = g2 = b, then player 3 has observed two independent signals in favor of b and holds at most one signal against it; by Bayes’ rule, W = b is strictly more likely, so a rational player 3 outputs g3 = b regardless of X3.

• In the latter case, player 3’s guess reveals nothing about X3, so player 4 faces exactly the same evidence as player 3 and likewise outputs b; by induction, so does every subsequent player. An information cascade, or herd, has formed.

Notice that, if, say, W = 0, a cascade where everyone guesses the incorrect state happens whenever X1 = 1 and X2 = 1, which occurs with probability (1/2 − 𝜖)². Even with a relatively large 𝜖—say, 𝜖 = 0.1—this still occurs with probability 0.4² = 0.16. So, to conclude, if rational players make their guesses sequentially, rather than in parallel, then with probability (1/2 − 𝜖)², not only will the majority be incorrect, but we get a herding behavior where everyone “copies” the first two players and thus guesses incorrectly!

In fact, the situation is even worse: if X1 ≠ X2, the remaining players effectively ignore the guesses of the first two players and “start over.” Thus, we will eventually (in fact, rather quickly) reach a situation where a cascade starts, and with probability close to 1/2 the cascade is “bad” in the sense that all the cascading players output an incorrect guess.4
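The following sketch simulates the sequential game with players following the rational strategy derived above; the lead variable (the net count of signals revealed by non-cascading players) is our own implementation device:

```python
import random

def sequential_round(n, eps, rng):
    """Simulate one play of the sequential guessing game; return (W, guesses)."""
    w = rng.randint(0, 1)
    lead = 0              # net count of revealed 1-signals minus revealed 0-signals
    guesses = []
    for _ in range(n):
        x = w if rng.random() < 0.5 + eps else 1 - w   # the player's private signal
        if lead >= 2:     # two unanswered signals for 1: cascade on 1
            g = 1
        elif lead <= -2:  # two unanswered signals for 0: cascade on 0
            g = 0
        else:             # otherwise, guess one's own signal, thereby revealing it
            g = x
            lead += 1 if x == 1 else -1
        guesses.append(g)
    return w, guesses

rng = random.Random(2)
trials = 10_000
bad = sum(guesses[-1] != w for w, guesses in
          (sequential_round(100, 0.1, rng) for _ in range(trials)))
print(bad / trials)   # about 0.31 = 0.4**2 / (0.6**2 + 0.4**2); cf. footnote 4
```

Note that the last player guesses incorrectly roughly 31% of the time, essentially matching footnote 4, whereas in the simultaneous setting Theorem 16.2 bounds the majority’s error probability by 2e^(−2𝜖²n) ≈ 0.27 for n = 100 (and the true error probability there is far smaller).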

Of course, in real life, decisions are never fully sequentialized. But a similar analysis applies as long as they are sufficiently sequential—in contrast, if they are sufficiently parallelized, then the previous section’s “wisdom of crowds” analysis applies.

Let us end this section by pointing out that there are several other reasons why cascades may not happen as easily in real life (even if decisions are sequentialized). For instance, in real life, people have a tendency to weight their own evidence (or, for instance, that of their friends or relatives) more strongly than others’ opinions or evidence, whereas in the above-described model, rational agents weight every piece of evidence equally. Furthermore, the herding model disregards the ability of agents to acquire new evidence—for instance, if the third player knows that their own evidence will be ignored no matter what, then they may wish to procure additional evidence at a low cost, if the option is available.

Notes

The term “the wisdom of crowds” was coined by Surowiecki in [Sur05]; Surowiecki’s book also contains many case studies illustrating the phenomenon that the aggregation of information in groups often results in decisions that are better than any single member of the group could have made.

Herding was first explored by Banerjee in [Ban92]; Banerjee’s analysis, however, relied on stronger rationality assumptions than the ones we use here.

1Note that this is not a normal-form game as we need to model the fact that players are uncertain about W and receive the signals Xi; formally, this can be done through the notion of a Bayesian game that we have alluded to in the past, but (for our purposes) this extra formalism will only add cumbersome notation without adding any new insights.

2For the analysis of this binary signaling game, a simpler bound called the Chernoff bound [Che52] actually suffices, but the Hoeffding bound is useful also for nonbinary decisions.

3In fact, at this point (although it makes the argument cleaner) we do not even have to assume that player 2 knows that player 1 is rational, since no matter what X1 is, player 2 can rationally output g2 = X2.

4Formally, the probability of the cascade being bad is $\frac{(1/2 - \epsilon)^2}{(1/2 + \epsilon)^2 + (1/2 - \epsilon)^2}$: a cascade starts at the first “fresh start” at which two consecutive players receive agreeing signals; both of those signals are incorrect with probability (1/2 − 𝜖)², both are correct with probability (1/2 + 𝜖)², and otherwise the process restarts, so conditioning on a cascade starting yields the expression above.