- Fill in the table below with the expected time for each operation. Use big-O
notation. The operations are insert (place a new item in the data structure), member (test
if a given item is in the data structure), getMin (return the value of the minimum item in
the data structure and delete it from the data structure), and successor (given an item,
return the successor of that item).
| Data Structure | insert | member | getMin | successor |
|---|---|---|---|---|
| sorted array | O(n) | O(log n) | O(1) | O(log n) |
| unsorted array | O(1) | O(n) | O(n) | O(n) |
| balanced tree (red-black tree) | O(log n) | O(log n) | O(log n) | O(log n) |
| hashing | O(1)* | O(1)* | O(n) | O(n) |
| sorted linked list | O(n) | O(n) | O(1) | O(n) |
| unsorted linked list | O(1) | O(n) | O(n) | O(n) |
*I am assuming that table doubling is being used for the hashtable.
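The O(log n) member bound for the sorted array comes from binary search; a minimal sketch in Python (the function name is mine):

```python
import bisect

def member(sorted_arr, x):
    """Binary search in a sorted list: O(log n) comparisons."""
    i = bisect.bisect_left(sorted_arr, x)
    return i < len(sorted_arr) and sorted_arr[i] == x
```

For example, member([1, 3, 5, 7], 5) is True while member([1, 3, 5, 7], 4) is False.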
- Short Answer.
- Where is the smallest element in a red-black tree? In a 234-tree?
In both cases, it's the leftmost item in the node you reach by starting at the root
and always moving left.
- When the subject is balanced trees, what does rotation mean?
This is the process of changing the root of a subtree. One of the children of this
node becomes the new root of the subtree. The order of the data is not affected by this
operation.
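A minimal sketch of a right rotation on plain binary-search-tree nodes (the Node class and function names are mine). Note that the in-order traversal, i.e. the order of the data, is unchanged:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(root):
    """Make root's left child the new root of this subtree."""
    pivot = root.left            # left child becomes the new subtree root
    root.left = pivot.right      # pivot's right subtree crosses over
    pivot.right = root           # old root becomes pivot's right child
    return pivot

def inorder(node):
    """The data order; rotation must leave this unchanged."""
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)
```

Rotating the tree 3(2(1)) to the right yields 2(1, 3); both trees have in-order [1, 2, 3].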
- What is path compression in the union/find algorithm?
When doing the find operation, each node passed on the way to the root is made to
point at the root.
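A sketch of find with path compression (shown together with union by size, as in the later union/find questions); parent and size are plain dicts here, and the names are mine:

```python
def find(parent, x):
    """Find the root of x's tree; then make every node on the
    path point directly at the root (path compression)."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:         # second pass: compress the path
        parent[x], x = root, parent[x]
    return root

def union(parent, size, a, b):
    """Union by size: attach the smaller tree under the larger root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]
```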
- How long does it take to insert a new element into a heap? To return the smallest thing
in a min-heap? To delete the smallest thing in a min-heap? To find the largest thing in a
min-heap?
insert = O(log n); findMin = O(1); deleteMin = O(log n). Finding the largest thing in
a min-heap is expensive: O(n).
- The following picture represents a 234-tree.
24
/ \
2 , 22 30 , 40 , 48
/ | \ / / \ \
1 5,11,19 23 29 32,36 41,42,43 50
- Draw an equivalent red-black-tree.
I'll use parens () to indicate a red node. The 3-nodes (2,22) and (32,36) can be
represented in more than one way.
                24
              /    \
            22       40
           /  \     /    \
         (2)   23 (30)   (48)
         / \      /  \    / \
        1   11   29   36 42  50
           /  \      /   / \
         (5)  (19) (32)(41) (43)
- Draw a picture of the 234-tree that results from inserting 31 into the original 234-tree.
24 , 40
/ | \
2 , 22 30 48
/ | \ / \ / \
1 5,11,19 23 29 31,32,36 41,42,43 50
- For each of the following problems, choose the best of the listed data structures and
explain why your choice is best. Where several operations are listed, you should assume,
unless stated otherwise that the operations occur with about equal frequency.
- Operations are Insert, DeleteMax, and DeleteMin.
balanced tree or sorted doubly-linked list
The balanced tree is better since all operations take O(log n) time. The sorted
doubly-linked list is O(1) for DeleteMax and DeleteMin, but Insert is O(n); thus, the
average time per operation is O(n).
- Operations are Insert and FindMedian. (The median is the item m such that half the items
are less than m and half are greater than m.)
red-black trees or sorted array
You can use two red-black trees plus an additional variable to hold the median, one
red-black tree for items less than the median and one for items greater than the median.
When you insert, you can keep track of the median by moving items from one tree to the
other. With this scheme, Insert takes O(log n) time and FindMedian takes O(1) time. The
sorted array takes O(n) time for Insert.
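Python's standard library has no red-black tree, so here is the same idea sketched with two heaps instead of two trees: a max-heap for the items below the median and a min-heap for the items above it. Insert is still O(log n) and FindMedian O(1); for an even count this sketch returns the lower median. The class name is mine:

```python
import heapq

class RunningMedian:
    """Two-heap variant of the two-tree scheme."""
    def __init__(self):
        self.lo = []   # max-heap (via negated values): items <= median
        self.hi = []   # min-heap: items > median

    def insert(self, x):                       # O(log n)
        if self.lo and x > -self.lo[0]:
            heapq.heappush(self.hi, x)
        else:
            heapq.heappush(self.lo, -x)
        # rebalance so len(lo) is len(hi) or len(hi) + 1
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def find_median(self):                     # O(1)
        return -self.lo[0]
```

After inserting 5, 2, 8, 1, 9 the median reported is 5.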
- You have a dictionary containing the keywords of the Pascal programming language.
ordered array or red-black tree
In this situation the ordered array is best. Both data structures take time O(log n) to
find an item. An ordered array takes longer to insert or delete, but we don't expect to be
creating or destroying keywords, so there shouldn't be any insertion or deletion. The
ordered array is simpler to program and takes less space.
- You have a dictionary that can contain anywhere from 100 to 10,000 words.
unordered linked-list or red-black tree
The red/black tree is better, since the operations require O(n) time for the linked-list
and O(log n) time for the red/black tree. For 10,000 words this could certainly be
significant.
- You have a large set of integers with operations insert, findMax, and deleteMax.
unordered array or Hashtable
Neither data structure is good for this problem. An unordered array is slightly better
since it has less overhead and it's easier to program.
- You have a hashtable of size m=11 and a (not very good) hash function h:
h(x) = (sum of the values of the first and last letters of x) mod m
where the value of a letter is its position in the alphabet (e.g., value(a)=1, value(b)=2,
etc.). Here are some precomputed hash values:
| word | h |
|------|---|
| ape | 6 |
| bat | 0 |
| bird | 6 |
| carp | 8 |
| dog | 0 |
| hare | 2 |
| ibex | 0 |
| mud | 6 |
| koala | 1 |
| stork | 8 |
Draw a picture of the resulting hashtable (using chaining) after
inserting, in order, the following words: ibex, hare, ape, bat, koala, mud, dog, carp,
stork. Which cells are looked at when trying to find bird?
| cell | chain |
|------|-------|
| 0 | ibex → bat → dog |
| 1 | koala |
| 2 | hare |
| 3 | |
| 4 | |
| 5 | |
| 6 | ape → mud |
| 7 | |
| 8 | carp → stork |
| 9 | |
| 10 | |
Since h(bird) = 6, only cell 6 is examined when searching for bird; its whole chain
(ape, then mud) is compared before reporting that bird is absent.
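The table can be checked mechanically; a small Python sketch of chaining with the stated hash function (names are mine). Note that it gives h("carp") = (3 + 16) mod 11 = 8:

```python
def h(word, m=11):
    """Sum of the values of the first and last letters, mod m,
    where value(a) = 1, value(b) = 2, and so on."""
    val = lambda c: ord(c) - ord('a') + 1
    return (val(word[0]) + val(word[-1])) % m

table = [[] for _ in range(11)]
for w in ["ibex", "hare", "ape", "bat", "koala", "mud", "dog", "carp", "stork"]:
    table[h(w)].append(w)        # chaining: append to the cell's list

# Searching for "bird" probes only cell h("bird") = 6.
```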
- Suppose you are given the following information about a hashtable.
| Quantity | Value |
|---|---|
| Space available (in words) | 10000 |
| Words per item | 7 |
| Words per link | 1 |
| Number of items | 1000 |
| Proportion of successful searches | 1/3 |
What is the expected number of probes for a search operation when
hashing with chaining is used?
We need to compute the value of the load factor, alpha. The items themselves
take up 7000 words and their links would use another 1000; this leaves 2000 words for the
hashtable (the table of list headers). Thus the load factor alpha = (number of
items)/(number of table positions) is 1000/2000 = 0.5
For chaining, the expected number of probes for an unsuccessful search is alpha or
about 0.5. The expected time for a successful search is 1+alpha/2 or about 1.25.
Using the data on the proportion of successful searches, we conclude that the
expected number of probes for a search is (2*0.5 + 1*1.25)/3 = 0.75.
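The arithmetic, as a quick check:

```python
# 1000 items at 7 words each = 7000 words, plus 1000 one-word links,
# leaves 2000 of the 10000 words for the table of list headers.
table_slots = 10000 - 1000 * 7 - 1000 * 1      # 2000 one-word headers
alpha = 1000 / table_slots                     # load factor = 0.5

unsuccessful = alpha                           # expected probes, chaining
successful = 1 + alpha / 2                     # = 1.25

# One third of searches succeed, two thirds fail:
expected = (1 / 3) * successful + (2 / 3) * unsuccessful   # = 0.75
```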
- Consider a tree implementation for the union/find problem in which the smaller set is
merged to the larger and the name of the set is taken to be the element stored at the root
of the tree. Suppose we initialize our sets so that each integer between 1 and 8
(inclusive) is contained within its own set.
- Give a sequence of seven unions that produces a tree whose height is as large as
possible. Your answer should be a sequence of procedure calls of the form Union(a,b)
where a and b are integers between 1 and 8. Draw the resulting tree.
There are lots of possible sequences that would work here. One of them is: U(1,2),
U(3,4), U(5,6), U(7,8), U(1,3), U(5,7), U(1,5).
- Give a sequence of seven unions, on the original eight sets, that produces a tree of
minimum height. Draw the resulting tree.
Again, there are many correct answers. Here's one: U(1,2), U(1,3), U(1,4), U(1,5),
U(1,6), U(1,7), U(1,8).
- Explain why both the min- and max-height trees use seven unions.
max height:
        1
      / | \
     2  3  5
        |  / \
        4 6   7
              |
              8
min height:
           1
    / / /  |  \ \ \
   2 3 4   5   6 7 8
We are building trees on 8 nodes and all such trees have 7 edges. (You should
be able to prove by induction that all n-node trees have n-1 edges.) Each Union
operation creates a single edge, so to build an 8-node tree, we always need at least 7
Union operations.
- The following questions refer to an implementation of an ADT with operation Insert,
Delete, and isMember.
- Under what conditions would you use a red-black tree instead of hashing with chaining?
If I needed a guaranteed worst-case time of O(log n) for Dictionary operations.
The good results for hashing are only expected time.
- Under what conditions would you use an unordered array instead of a red-black tree?
If I were short of space or if the set to be stored is small.
- Under what conditions would you use a binary search tree instead of a heap?
Always. A BST does each of these operations in O(log n) expected time. A
heap does these operations no better than using an unordered list.
- What implementation would you use to get the best expected time for a search?
Hashing with table doubling. Expected search time is O(1).
- What implementation would you use to get the best worst-case time for a search?
Any balanced tree scheme (234-tree or red-black tree) or even a sorted array.
These all provide worst-case time O(log n) for a search.
- Consider the following program outline where |A| represents the number of items in array
A.
method Review (array A) {
if (|A| > 1) {
Do something to A that takes time O(|A| log(|A|));
Split A into two equal size pieces called B and C;
Review(B); Review(C);
Modify A using O(|A|) time;
}
}
What is the recurrence that describes the time used by this program?
T(n) = 2*T(n/2) + n log n
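Assuming T(1) = 1 and logs base 2, the recurrence can be evaluated directly; it solves to O(n log^2 n):

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    """T(n) = 2 T(n/2) + n log2(n) with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n * log2(n)
```

For instance T(2) = 2·1 + 2·1 = 4 and T(4) = 2·4 + 4·2 = 16.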
234 Tree (Dictionary):
        (3 , 5)
        /  |  \
    (1,2) (4) (6)
Max-Heap (Priority Queue):
        6
       / \
      3   4
     / \
    1   2
For each of the preceding trees, find a sequence of appropriate operations that will
produce it starting from an empty tree.
There are lots of sequences that will do this. Here's one possible sequence for the
234-tree: Insert 2,3,4,5,6,7,1 then Delete 7. Here's one that doesn't use Delete:
Insert 4,5,6,3,2,1. And here's one for the Max-Heap: Insert 6,3,4,1,2.
- Determine the best possible big-O bounds on T(n) where T(n) is defined by the following
recurrence:
T(n) = cT(n/2) + n^2 and T(1) = 1. The variable c here represents a
positive constant. You should have 3 different answers depending on the value of c.
For c = 4, the solution is O(n^2 log n). For c < 4, the solution is
O(n^2). For c > 4, the solution is O(n^(log c)) where the log
is base 2.
- You are given a k×k checkerboard with a nonnegative number in each square (the square
colors are not significant). A token is moved from square to square on the board.
Each time the token enters a square it is charged the amount written in that
square. Assume that the only legal moves are to-the-right, down, and diagonally
right-down. Give an algorithm that runs in O(k^2) time to find the cost of
the minimum cost sequence of moves beginning in the upper left corner of the board and
ending at the lower right corner.
This is very similar to the algorithm used for protein alignment.
best(array B[1..k,1..k]):           // B[i,j] is the cost in square [i,j]
    C[1,1] = 0                      // C[i,j] is the best cost to reach [i,j];
                                    // the starting square is not charged
    for i = 1 to k do
        for j = 1 to k do
            if (i,j) != (1,1) then
                C[i,j] = B[i,j] + min{C[i-1,j], C[i-1,j-1], C[i,j-1]}
                                    // out-of-range entries count as infinity
    return C[k,k]
end.
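A runnable Python version of the same dynamic program (0-indexed, and following the convention that the starting square is not charged):

```python
INF = float("inf")

def best(B):
    """Minimum cost to move from B[0][0] to B[k-1][k-1], moving right,
    down, or diagonally right-down; entering a square costs B[i][j].
    O(k^2): each cell is filled once from at most three neighbors."""
    k = len(B)
    C = [[INF] * k for _ in range(k)]
    C[0][0] = 0                       # no charge for the starting square
    for i in range(k):
        for j in range(k):
            if i == 0 and j == 0:
                continue
            up   = C[i-1][j]   if i > 0 else INF
            left = C[i][j-1]   if j > 0 else INF
            diag = C[i-1][j-1] if i > 0 and j > 0 else INF
            C[i][j] = B[i][j] + min(up, left, diag)
    return C[k-1][k-1]
```

On the board [[0, 3, 1], [2, 8, 2], [4, 1, 0]] the cheapest route (down, then diagonal, then right) costs 3.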
- Suppose we are given a sequence A = (a_1, a_2, ..., a_n) of
numbers. Describe an O(n^2) time algorithm for finding the longest
subsequence of A such that the numbers in this subsequence are monotonically increasing.
(Hint: For each a_i, compute the longest increasing subsequence that ends
at a_i.) It's possible, but more difficult, to develop an algorithm that
takes time O(n log n). [from Goodrich & Tamassia 98]
Let S[i] represent the length of the longest increasing subsequence that ends at a_i.
We know that S[1] is 1. The following algorithm will compute the other
values:
for i = 2 to n do
    S[i] = 1 + max{S[b] : 0 < b < i and a[b] < a[i]}   // max of an empty set is 0
The idea is that we look over all the preceding longest subsequences to find the ones
that fit with the current a[i] (a[i] has to be larger than the last element to continue
the subsequence); then we choose the longest. It's easy to see that this runs in time O(n^2).
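A direct Python version of this O(n^2) scheme (the function name is mine):

```python
def longest_increasing_length(a):
    """Length of the longest strictly increasing subsequence of a.
    S[i] = length of the longest increasing subsequence ending at a[i]."""
    n = len(a)
    if n == 0:
        return 0
    S = [1] * n
    for i in range(1, n):
        for b in range(i):
            if a[b] < a[i]:                   # a[i] can extend the run ending at a[b]
                S[i] = max(S[i], S[b] + 1)
    return max(S)
```

For [3, 1, 4, 1, 5, 9, 2, 6] the answer is 4 (e.g. the subsequence 3, 4, 5, 9).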