Homework 2

CS 482 - Spring 2005

Due: Friday, Feb 11

Note: Please include your Cornell NetID on your each section of your homework. This simplifies the process of recording your grades.

Part A

  1. This is problem 4 at the end of Chapter 4. 

    Some of your friends have gotten into the burgeoning field of time-series data mining, in which one looks for patterns in sequences of events that occur over time. Purchases at stock exchanges --- what's being bought --- are one source of data with a natural ordering in time. Given a long sequence S of such events, your friends want an efficient way to detect certain "patterns" in them --- e.g. they may want to know if the four events

    buy Yahoo, buy eBay, buy Yahoo, buy Oracle

    occur in this sequence S, in order but not necessarily consecutively.

    They begin with a finite collection of possible events (e.g. the possible transactions) and a sequence S of n of these events. A given event may occur multiple times in S (e.g. Yahoo stock may be bought many times in a single sequence S). We will say that a sequence S' is a subsequence of S if there is a way to delete certain of the events from S so that the remaining events, in order, are equal to the sequence S'. So for example, the sequence of four events above is a subsequence of the sequence 

    buy Amazon, buy Yahoo, buy eBay, buy Yahoo, buy Yahoo, buy Oracle

    Their goal is to be able to dream up short sequences and quickly detect whether they are subsequences of S. So this is the problem they pose to you: Give an algorithm that takes two sequences of events --- S' of length m and S of length n, each possibly containing an event more than once --- and decides in time O(m + n) whether S' is a subsequence of S.

Part B

  1. This is problem 19 at the end of Chapter 4. 

    Every September, somewhere in a far-away mountainous part of the world, the county highway crews get together and decide which roads to keep clear through the coming winter. There are n towns in this county, and the road system can be viewed as a (connected) graph G = (V,E) on this set of towns, each edge representing a road joining two of them. In the winter, people are high enough up in the mountains that they stop worrying about the length of roads and start worrying about their altitude --- this is really what determines how difficult the trip will be.

    So each road --- each edge e in the graph --- is annotated with a number ae that gives the altitude of the highest point on the road. We'll assume that no two edges have exactly the same altitude value ae. The height of a path P in the graph is then the maximum of ae over all edges e on P. Finally, a path between towns i and j is declared to be winter-optimal if it achieves the minimum possible height over all paths from i to j.

    The highway crews are going to select a set E' subset of E of the roads to keep clear through the winter; the rest will be left unmaintained and kept off limits to travelers. They all agree that whichever subset of roads E' they decide to keep clear, it should clearly have the property that (V,E') is a connected subgraph; and more strongly, for every pair of towns i and j, the height of the winter-optimal path in (V,E') should be no greater than it is in the full graph G = (V,E). We'll say that (V,E') is a minimum-altitude connected subgraph if it has this property.

    Given that they're going to maintain this key property, however, they otherwise want to keep as few roads clear as possible. One year, they hit upon the following conjecture: 

    The minimum spanning tree of G, with respect to the edge weights ae, is a minimum-altitude connected subgraph.

     (In an earlier problem, we claimed that there is a unique minimum spanning tree when the edge weights are distinct. Thus, thanks to the assumption that all ae are distinct, it is okay for us to speak of the minimum spanning tree.)

    Initially, this conjecture is a somewhat counter-intuitive claim, since the minimum spanning tree is trying to minimize the sum of the values ae, while the goal of minimizing altitude seems to be asking for a fairly different thing. But lacking an argument to the contrary, they begin considering an even bolder second conjecture: 

    A subgraph (V,E') is a minimum-altitude connected subgraph if and only if it contains the edges of the minimum spanning tree.

    Note that this second conjecture would immediately imply the first one, since the minimum spanning tree contains its own edges.

    So here's the question:

    1. Is the first conjecture true, for all choices of G and altitudes ae? Give a proof or a counter-example with explanation.
    2. Is the second conjecture true, for all choices of G and altitudes ae? Give a proof or a counter-example with explanation.
        
  2.  
    1. Consider the Change-Making Problem: choose the minimum number of coins when making change. A simple greedy algorithm works for the US coin system (i.e., always choose the largest possible coin first). Does the same simple greedy strategy always give the minimum number of coins for every potential coin system? Provide either an algorithm (and proof) or a counterexample.
    2. You are given a set of boxes to be packed into bins.  All the boxes have the same width and depth (the same as the width and depth of the bins), but they have different heights.  The heights are given in a list H=(h1,...,hn).  The goal is to pack the boxes in bins using as few bins as possible.  Consider the following greedy algorithm for this problem:
      FirstFit (H):
      	Sort H from largest to smallest;
      	for i = 1 to n do
      		Find the first bin that has enough room for H[i];
      		Place box i in this bin;
      end FirstFit.
      Does this algorithm always use the minimum number of bins? Either prove the algorithm produces an optimal solution or give an example to show that it does not.

Part C

  1. This is problem 6 at the end of Chapter 4.

    Your friend is working as a camp counselor, and he is in charge of organizing activities for a set of junior-high-school-age campers. One of his plans is the following mini-triathalon exercise: each contestant must swim 20 laps of a pool, then bike 10 miles, then run 3 miles. The plan is to send the contestants out in a staggered fashion, via the following rule: the contestants must use the pool one at a time. In other words, first one contestant swims the 20 laps, gets out, and starts biking. As soon as this first person is out of the pool, a second contestant begins swimming the 20 laps; as soon as he/she's out and starts biking, a third contestant begins swimming... and so on.)

    Each contestant has a projected swimming time (the expected time it will take him or her to complete the 20 laps), a projected biking time (the expected time it will take him or her to complete the 10 miles of bicycling), and a projected running time (the time it will take him or her to complete the 3 miles of running). Your friend wants to decide on a schedule for the triathalon: an order in which to sequence the starts of the contestants. Let's say that the completion time of a schedule is the earliest time at which all contestants will be finished with all three legs of the triathalon, assuming they each spend exactly their projected swimming, biking, and running times on the three parts. (Again, note that participants can bike and run simultaneously, but at most one person can be in the pool at any time.) What's the best order for sending people out, if one wants the whole competition to be over as early as possible? More precisely, give an efficient algorithm that produces a schedule whose completion time is as small as possible.