Computer Science 280: Homework 9

Homework 9:
4/9/08 (due 4/16/08) Variance and Chebyshev's inequality is not covered in DAM2. For those using DAM2, I will hand out Section 6.6 from DAM3 that covers it on Monday.

Section   Number           Points    Comments/Hints
6.6        2 		    3         
           4                3         
           9(a)             4        Hint: use Chebyshev's Inequality
           19(b)            3
6.8 	   7 		    3        DAM2, 6.6, 7
Extra problems (the last two are due to Jon Kleinberg):

1. [5 points] You are trying to estimate the probability that a coin lands heads. How many times do you have to toss the coin to guarantee that your esimate is within 0.01 of the true probability with probability at least .9. That is, if p is the true probability, you toss the coin n times, and #H is the number of heads you get, you want Pr(|#H/n -p| ≤ 0.01) ≥ 0.9. (Hint: use Chebyshev's inequality.)

2. [3 points] From past experience, a professor know that the test score of a student taking the final exam is a random variable with mean 75. Use Markov's inequality to get an upper bound for the probability that a student's test score is greater than or equal to 85.

3. [3 points] Suppose that in addition the professor knows that the variance of the test score is 25. What is a lower bound on the probability that a student scores between 65 and 85.

4. [4 points] In class I proved Markov's inequality: If X is a nonnegative random variable, then

Show that this is the best we can hope for. Given a ≥ 1, construct a random variable X and a probability distribution Pr such that

5. [7 points] In performing blood donation, it is important to be careful that the type of the blood being donated is compatible with the blood type of the recipient. Concretely, this works as follows. A person's own blood supply has certain antigens present (we can think of antigens as a kind of molecular signature), and a person cannot receive blood with a particular antigen if their own blood does not have this antigen present. This principle underpins the division of blood into four types: A, B, AB, and O. [The Austrian scientist Karl Landsteiner received the Nobel Prize in 1930 for his discovery of the blood types A, B, O, and AB.] Blood of type A has the A antigen, blood of type B has the B antigen, blood of type AB has both, and blood of type O has neither. Thus, patients with type A can receive only blood types A or O in a transfusion, patients with type B can receive only B or O, patients with type O can receive only O, and patients with type AB can receive any of the four types. Notice that blood of type O has the property that it can be donated to anyone, so hospitals are particularly interested in having a reasonable quantity of type-O blood in supply. Roughly, the breakdown of blood types among people in the U.S. is as follows: 45% have type O, 42% have type A, 10% have type B, and 3% have type AB. Now, consider the following scenario. A large blood donation clinic gets 10,000 random visitors, each of whom donates one unit of blood. They want to claim that with high probability, the amount of type-O blood they receive will be at least reasonably large. Let's model this as the following random experiment: each of the 10,000 donors who arrive has a blood type chosen independently with probability .45 of O, probability .42 of A, probability .10 of B, and probability .03 of AB. Let X be a random variable equal to the number of people, from among this group, who have type-O blood.

(a) Determine E(X) and Var(X). Show the reasoning and/or calculations you use to get these values. (You can use the fact that the variance of the sum of independent random variables is the sum of the variances of each one.)

(b) Show that Pr(X < 4000) ≤ .01. In other words, with high probability there will be at least 4000 type-O donors in this group. Explain your reasoning and/or calculations in arriving at this answer.

6. [4 points] Suppose you're managing the storage for a data-intensive satellite survey of the Earth's surface. Each day the survey generates an amount of data that is a whole number of terabytes drawn uniformly at random from the integers between 5 and 15 inclusive. (In other words, the amount of data generated in one day is equally likely to be one of the values 5 TB, 6 TB, 7 TB, ..., 14 TB, 15 TB.) We'll assume that the (random) amount of data generated on one day is independent of the amount generated on any other day. The survey is going to last for 100 days, and the project's budget allows you to buy 1200 terabytes of disk storage. Naturally, you're interested in knowing how likely it is that the survey will generate so much data that it overows the disks you have available. Show that with probability at least 95%, the survey generates an amount of data that fits on your available disk storage. Give an explanation for your answer.