CS410: Solutions to Review Questions for Midterm

Solutions (mainly) by Prof. K. Kedem.
  1. Solve the following recurrences:
    1. T(n) = 2T(n/2) + lg(n)

      Using the master theorem, we have a=b=2 and f(n) = lg(n). Since lg(n) = O(n1-epsilon) for all epsilon between 0 and 1, we have T(n) = O(n).

    2. T(n) = 16T(n/2) + (n lg(n))4

      I didn't mean to give you this one. If we try to use the master theorem, we have a=16, b=2, f(n) = (n lg(n))4. It's easy to check (try it!) that none of the three cases apply. You should try to show by the substitution method that T(n) = O((n lg(n))5).

      Note: while it wouldn't be fair to ask you for the prelim to guess the right answer for T(n), it would be OK to ask you to verify the answer using the substitution method.

    3. T(n) = 9T(n/3) + n3 lg(n)

      Apply the master theorem. We have a=9, b=3, and f(n) = n3 lg(n). It's easy to see that we're in case 3 (with epsilon between 0 and 1) and 9f(n/3) < n3lg(n)/3 = (1/3)f(n). Thus, T(n) = Omega(n3 lg(n)).

  2. Fill in the table below with the expected time for each operation. Use big-O notation. The operations are insert (place a new item in the data structure), search (test if a given item is in the data structure), getMin (return the value of the minimum item in the data structure and delete it from the data structure), and successor (given an item, return the successor of that item).
    Data Structure insert search getMin successor
    sorted array O(n) O(log n) O(1)* O(log n)
    unsorted array O(1) O(n) O(n) O(n)
    balanced tree (red-black tree) O(log n) O(log n) O(log n) O(log n)
    hashing O(1)** O(1)** O(n) O(n)
    sorted linked list O(n) O(n) O(1) O(n)
    unsorted linked list O(1) O(n) O(n) O(n)

    * In a sorted array, getMin() leaves a gap, but the data doesn't have to be moved to fill in the gap until the next insert.
    ** For hashing, double hashing is assumed, so standard hashtable operations run in constant time.

  3. Short Answer.
    1. What is an abstract data type? Give an example.
      A model and a set of operation on that model. Examples: Dictionary, PriorityQueue.
    2. What is a data structure? Give an example.
      A way of organizing data. Examples: array, red-black-tree, singly linked list.
    3. Order the following running times from fastest to slowest. All logarithms are base 2.
      O(n), O(1), O(n2), O(log n), O(n log n), O(n/(log n)).
      O(1), O(log n), O(n/(log n)), O(n), O(n log n), O(n2)
    4. Where is the smallest element in a red-black tree?  In a B-tree?
      In both cases, it's the leftmost item in the node you reach by starting at the root and always moving left.
    5. Explain why double hashing is better than hashing with linear probing.
      With linear probing, clustering can occur. Double hashing distributes the data more evenly.
    6. Why must care be taken to delete an item from a hashtable with open addressing?
      If you simply leave an empty spot then it can look like an item is not in the table even when it is. So you have to leave a ``deleted'' mark.
    7. When the subject is balanced trees, what does rotation mean?
      This is the process of changing the root of a subtree. One of the children of this node becomes the new root of the subtree. The order of the data is not affected by this operation.

      The following questions refer to an implementation of an ADT with operation Insert, Delete, and isMember.
    8. Under what conditions would you use a red-black tree instead of hashing with chaining?
      If I needed a guaranteed worst-case time of O(log n) for Dictionary operations.
    9. Under what conditions would you use an unordered array instead of a red-black tree?
      If I were short of space or if the set to be stored is small.
    10. What implementation would you use to get the best expected time for a search?
      Hashing with table doubling.  Expected search time is O(1).
    11. What implementation would you use to get the best worst-case time for a search?
      Any balanced tree scheme (234-tree or red-black tree) or even sorted array provide worst-case time O(log n) for a search.
  4. Build a binary search tree by inserting the following integers: 7, 0, 5, 2, 4, 6, 3, 8, 1. What is the order of the integers if we do a preorder traversal of the tree? An inorder traversal? A postorder traversal?
    
    
    
            7
    
    
    
           / \
    
    
    
          0   8
    
    
    
           \
    
    
    
            5
    
    
    
           / \
    
    
    
          2   6
    
    
    
         / \
    
    
    
        1   4
    
    
    
           /
    
    
    
          3
    
    
    
    
    
    
    
    Preorder:  705214368
    
    
    
    Inorder:   012345678
    
    
    
    Postorder: 134265087
    
    
    
    
  5. For each of the following problems, choose the best of the listed data structures and explain why your choice is best. Where several operations are listed, you should assume, unless stated otherwise that the operations occur with about equal frequency.
    1. Operations are Insert, DeleteMax, and DeleteMin.  DeleteMax (Min) operation deletes the data with largest (smallest) key.
      balanced tree or sorted doubly-linked list
      The balanced tree is better since all operations take O(log n) time. The sorted doubly-linked list is O(1) for DeleteMax and DeleteMin, but Insert is O(n); thus, the average time per operation is O(n).
    2. Operations are Insert and FindMedian. (The median is the item m such that half the items are less than m and half are greater than m.)
      red-black trees or sorted array
      You can use two red-black trees plus an additional variable to hold the median, one red-black tree for items less than the median and one for items greater than the median. When you insert, you can keep track of the median by moving items from one tree to the other. With this scheme, Insert takes O(log n) time and FindMedian take O(1) time. The sorted array takes O(n) time for Insert.
    3. Operations are Insert and FindMedian.
      sorted linked list or sorted array
      Here, the sorted array is best. Both take O(n) time for insert while the sorted array can be used to do FindMedian in O(1) time.
    4. A large dictionary (operations: Insert, Delete, Find), but we must conserve space.
      balanced tree or singly linked list or array
      Because of space consderations the array is best; the others require space for pointers.
    5. Items are integers between 1 and 100. There is data associated with each item. Operations are Insert, Delete, and Find. Duplicate items do not occur.
      array or linked list
      The best solution here is to use an array indexed from 1 to 100. With this scheme, all operations can be done in O(1) time.
    6. You have a dictionary that contains at most 10 words.
      red-black tree or unordered array
      For a set this small, the unordered array is faster; a red/black tree requires a lot of overhead.
    7. You have a dictionary that can contain anywhere from 100 to 10,000 words.
      unordered linked-list or red-black tree
      The red/black tree is better, since the operations require O(n) time for the linked-list and O(log n) time for the red/black tree. For 10,000 words this could certainly be significant.
    8. You have a large set of integers with operations insert, findMax, and deleteMax.
      unordered array or hashing with linear probing
      Neither data structure is good for this problem. An unordered array is slightly better since it has less overhead and it's easier to program.
  6. You have a hashtable of size m=11 and (not very good) hash functions h1 and h2:

    h1(x) = (sum of the values of the first and last letters of x) mod m
    h2(x) = ((value of the last letter) mod (m-1)) +1

    where the value of a letter is its position in the alphabet (e.g., value(a)=1, value(b)=2, etc.). Here are some precomputed hash values:
    word ape bat bird carp dog hare ibex mud koala stork
    h1 6 0 6 7 0 2 0 6 1 8
    h2 6 1 5 7 8 6 10 5 2 2

    Draw a picture of the resulting hashtable after inserting, in order, the following words: ibex, hare, ape, bat, koala, mud, dog, carp, stork. Also highlight cells that are looked at when trying to find bird. Do this for each of the following techniques.

    chaining linear probing double hashing
    0 ibex bat dog
    1 koala
    2 hare
    3
    4
    5
    6 ape mud
    7 carp
    8 stork
    9
    10
    0 ibex
    1 bat
    2 hare
    3 koala
    4 dog
    5
    6 ape
    7 mud
    8 carp
    9 stork
    10
    0 ibex
    1 bat
    2 hare
    3 koala
    4
    5 mud
    6 ape
    7 carp
    8 dog
    9
    10 stork

    The cells that are visited when searching for bird are:

    • ape mud
    • ape mud carp stork
    • ape ibex mud stork

    Why not use h2(x) = (value of the last letter) mod m? (i.e., why did we include -1 and +1?)
    h2 cannot be allowed to return a value of zero. This would make the increment zero, thus successive probes would all land in the same place.

    Why not use the same function for both hash functions?
    This would lead to clustering, since any two words that had the same index on their first probe would have the same index on all subsequent probes. Also, this strategy could allow an increment of zero.

  7. We wish to implement a RangeSearch operation on sets of integers. For example, RangeSearch(a,b) will return all the integers c in the set such that a<c<b. The time for a RangeSearch will be of the form O(X+i) where i is the number of integers within the given range and X is some function that depends on n, the number of items in the set. (We are stuck with i here since it takes that much time just to print the answer to a given RangeSearch query.) For each of the following data structures, determine the best value of X in the RangeSearch query time. Give a short explanation of why you think that amount of time is needed.
    (a) Sorted array. (b) Balanced tree. (c) Hashtable.
    Sorted array: X = log n. Do a binary search on a then report all integers from there until b is reached.

    Balanced tree: X = log n. O(log n) time is needed to find a and b in the tree; branches of the tree that lead to numbers outside (a,b) can be marked at this time. The resulting integers can be reported in O(i + log n) time by doing a traversal of the unmarked part of the tree.

    Hashtable: X = n. The hashtable doesn't have any structure to it that we can use. Basically, we have to walk through the entire table, reporting any numbers between a and b as we go.
  8. Suppose you are given the following information about a hashtable.
    Space Available (in words) 10000
    Words per Item 7
    Words per Pointer 1
    Number of Items 1000
    Proportion Successful Searches 1/3

    Which hashing method, chaining or double hashing, looks best (i.e., the one that we expect to use the least amount of time)?  You should use the following assumptions:

    • It takes twice as long to compute a hash function as it does to do a key comparison.
    • For hashing with chaining use singly linked list.
    • Double hashing behaves like random hashing.
    • During double hashing, we always compute both hash functions, even when we find the item we're looking for on the first probe.
    • Items are not copied directly into the hashtable; instead we maintain a link to the item.  Note though that the items themselves still use up some of the space.

    We need to compute the value of the load factor, alpha.  In both cases, the items themselves take up 7000 words, leaving 3000 words for the hashtable.

    For chaining, we use 2 words per list-node (ptr to data and ptr to next in chain) and we use 1000 nodes.  Thus 2000  words are used for the items; this leaves 1000 words for the table.  The load factor, alpha, is thus 1000/1000 = 1.   The expected time for an unsuccessful search is 1+alpha = 2.  The expected time for a successful search is 1+alpha/2 = 1.5.  Using the data on the proportion of successful searches, we conclude that the expected time for a search is about 1.83333... = 11/6.

    For double hashing, store the pointers to the data directly into the hashtable; thus, all 3000 words are available for the hashtable.  The load factor, alpha, is 1000/3000 = 1/3.  The expected time for an unsuccessful search is 1/(1-alpha) or 3/2.  The expected time for a successful search is (1/alpha)ln(1/(1-alpha)) or 1.2164.  Using the data on the proportion of successful searches, we conclude that the expected time for a search is about 1.41.

    For this situation, the expected search time for double hashing is better than the time for separate chaining, but we still have to take into account the extra hash function computed during double hashing.  A hash function computation costs about as much as 2 probes, so separate chaining costs about 3.833 probes while double hashing costs about 5.41 probes.  Conclusion: separate chaining will be better.

  9. Suppose we wish to insert the keys D A T A S T R U C T U R E into a Dictionary.   Assume we are using a tree-based implementation. Further, assume that identical keys share a single node (i.e., their data reside in a linked list attached to this single node). 
    1. Show the resulting Binary Search Tree.
    2. Show the resulting 2-3-4 Tree.
    (You were only asked to do the BST, but I've included the 234-Tree for your edification.)

    Here are the solutions for the BST and the 234-Tree. 

  10. BST  
        D
    
    
    
      /   \
    
    
    
    A       T
    
    
    
     \     / \
    
    
    
      C   S   U
    
    
    
         /
    
    
    
        R
    
    
    
       /
    
    
    
      E 
    
    
    
         ( D S )
    
    
    
         /  |  \
    
    
    
    (A C) (E R) (T U)
     

    Assume now that we are using a hashtable of size 17 with the hash function h(x) = x mod 17 for the letter at position x in the alphabet (e.g., A=1, B=2, etc.).

    3. Show the hashtable that results when we use chaining.
    4. Show the hashtable that results when we use linear probing.
    5. Show the hashtable that results when we use double hashing. Use g(x) = 1 + (x mod 13) for the second hash function.

    Chaining Linear Probing Double Hashing
     0:
    
    
    
     1: A R
    
    
    
     2: S
    
    
    
     3: T C
    
    
    
     4: D U
    
    
    
     5: E
    
    
    
     6:
    
    
    
     7:
    
    
    
     8:
    
    
    
     9:
    
    
    
    10:
    
    
    
    11:
    
    
    
    12:
    
    
    
    13:
    
    
    
    14:
    
    
    
    15:
    
    
    
    16:
    
    
    
    
     0:
    
    
    
     1: A
    
    
    
     2: S
    
    
    
     3: T
    
    
    
     4: D
    
    
    
     5: R
    
    
    
     6: U
    
    
    
     7: C
    
    
    
     8: E
    
    
    
     9:
    
    
    
    10:
    
    
    
    11:
    
    
    
    12:
    
    
    
    13:
    
    
    
    14:
    
    
    
    15:
    
    
    
    16:
    
    
    
    
     0:
    
    
    
     1: A
    
    
    
     2: S
    
    
    
     3: T
    
    
    
     4: D
    
    
    
     5: E
    
    
    
     6:
    
    
    
     7: R
    
    
    
     8:
    
    
    
     9:
    
    
    
    10:
    
    
    
    11: C
    
    
    
    12:
    
    
    
    13: U
    
    
    
    14:
    
    
    
    15:
    
    
    
    16: