CS312 Lecture 24: Hash Tables

Data Sets

Until now we have used different data structures, such as lists and trees, to store sets of values. The following table shows the running time of each operation on these structures:
Set type          Insert      Delete      Member
Linked list       O(1)        O(n)        O(n)
Red-black tree    O(log n)    O(log n)    O(log n)

We would like to improve on these results. To do so, we introduce a structure that performs all of the above operations in O(1) expected time, provided its keys are spread evenly over the structure.

Hash Tables

The basic idea is to define a map as a set of (key, value) pairs. A map is nothing more than a partial function from keys to values. When the keys are strings, we say that the map is a dictionary.

A mutable map is a map in which an element (a key-value pair) can be removed or changed after it has been inserted. A hash table is an implementation of a mutable map that is built on top of an array.

The O(1) running time is obtained by exploiting the fact that arrays give O(1) access to any position. We define a bucket as a slot of this array in which we can store one element of the map.

Since we are storing our elements in an array, we would like to compute an index from the key of each element. This index tells us where in the array to store the element. The function that computes these indexes is called a hash function.
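As a minimal sketch of this idea, assuming string keys, the code below computes an index from a key and uses it to store and retrieve one element. The function hashString, the constant 31, and the table size of 8 are illustrative assumptions, not part of the lecture.

```sml
(* Hypothetical hash function for string keys: fold the characters into an
   integer, reducing modulo the number of buckets at every step so that the
   result is always a valid array index in 0 .. numBuckets - 1. *)
fun hashString (numBuckets : int) (s : string) : int =
  foldl (fn (c, acc) => (acc * 31 + Char.ord c) mod numBuckets) 0 (explode s)

(* A table with 8 buckets, each holding at most one (key, value) pair. *)
val numBuckets = 8
val table : (string * int) option array = Array.array (numBuckets, NONE)

(* Store the pair ("x", 2) in the bucket chosen by the hash function. *)
val () = Array.update (table, hashString numBuckets "x", SOME ("x", 2))

(* Retrieve it again with a single O(1) array access. *)
val found = Array.sub (table, hashString numBuckets "x")
```

This sketch simply overwrites whatever was in the bucket before, so it is only correct as long as no two keys receive the same index.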

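What if the hash function returns the same index for two different keys? This situation is called a conflict (or collision). Conflicts become more likely when the hash function spreads keys unevenly over the buckets, or when the table holds many elements relative to its number of buckets.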

There are many ways to resolve conflicts. A simple approach is to store a list of elements in each bucket. However, if the load factor (the number of elements divided by the number of buckets) is too high, the lists grow long and the structure starts behaving like a linked list, losing the performance we were looking for.
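The list-per-bucket approach can be sketched as follows. This is one possible design under stated assumptions, not the implementation used in the course: keys are strings, each bucket is an association list, and the names table, hashString, create, lookup, remove, insert, and loadFactor are all hypothetical.

```sml
(* Each bucket holds a list of (key, value) pairs that hash to its index. *)
type 'a table = (string * 'a) list array

(* Hypothetical string hash, reduced modulo the number of buckets. *)
fun hashString (numBuckets : int) (s : string) : int =
  foldl (fn (c, acc) => (acc * 31 + Char.ord c) mod numBuckets) 0 (explode s)

fun create (numBuckets : int) : 'a table = Array.array (numBuckets, [])

fun bucketOf (t : 'a table) (k : string) : int = hashString (Array.length t) k

(* Scan only the one bucket that k hashes to. *)
fun lookup (t : 'a table) (k : string) : 'a option =
  case List.find (fn (k', _) => k' = k) (Array.sub (t, bucketOf t k)) of
      SOME (_, v) => SOME v
    | NONE => NONE

(* Drop any pair with key k from its bucket. *)
fun remove (t : 'a table) (k : string) : unit =
  let
    val i = bucketOf t k
  in
    Array.update (t, i, List.filter (fn (k', _) => k' <> k) (Array.sub (t, i)))
  end

(* Replace any existing binding for k, then add the new pair. *)
fun insert (t : 'a table) (k : string, v : 'a) : unit =
  let
    val i = bucketOf t k
  in
    remove t k;
    Array.update (t, i, (k, v) :: Array.sub (t, i))
  end

(* Load factor: number of stored elements divided by number of buckets. *)
fun loadFactor (t : 'a table) : real =
  real (Array.foldl (fn (b, n) => n + length b) 0 t) / real (Array.length t)

(* With a single bucket every key conflicts, yet lookups stay correct. *)
val t : int table = create 1
val () = insert t ("a", 1)
val () = insert t ("b", 2)
val b = lookup t "b"
```

Every operation costs time proportional to the length of one bucket's list, which is why the load factor matters: with one bucket, as in the usage lines above, the table degenerates into a single linked list.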

Some hash functions are used frequently. For instance, modular hashing takes an integer key k and computes k mod m for a base m = 2^p. Multiplicative hashing takes an integer key k and computes (k*m / 2^p) mod 2^q, with an appropriate choice of p and q. In general, these functions are well behaved.
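The two formulas can be written out as a small sketch. It assumes that the m in the multiplicative formula is a constant multiplier (called mult below), which is a reading of the notation rather than something the text states. The names pow2, modularHash, and multiplicativeHash, and the sample constants, are illustrative.

```sml
(* 2^n for a small non-negative n. *)
fun pow2 0 = 1
  | pow2 n = 2 * pow2 (n - 1)

(* Modular hashing: k mod 2^p, which keeps the low p bits of k. *)
fun modularHash (p : int) (k : int) : int = k mod pow2 p

(* Multiplicative hashing: (k * mult / 2^p) mod 2^q, using integer division.
   Assumes k * mult is small enough not to overflow the int type. *)
fun multiplicativeHash (mult : int, p : int, q : int) (k : int) : int =
  ((k * mult) div pow2 p) mod pow2 q

val h1 = modularHash 4 37                   (* 37 mod 16 = 5 *)
val h2 = multiplicativeHash (157, 4, 3) 10  (* (1570 div 16) mod 8 = 2 *)
```

With q bits kept at the end, the result always lies in the range 0 to 2^q - 1, so it can be used directly as an index into a table with 2^q buckets.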

Practical use

An immediate use of a hash table is to represent the binding environment in our evaluator. We could implement the environment as a hash table whose keys are the names being bound and whose values are the values bound to them.

This would work for a top-level environment, but how do we model nested environments? For instance, suppose we have

val x = 2;
let val x = 4 in x end;

We could do this by keeping a list of values in each bucket, inserting and deleting them as we enter and exit the scope of local environments.
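One way to picture this is the sketch below, which is an assumption about how such an environment might look rather than the design presented in the course. Each bucket is a list in which the newest binding for a name sits in front of, and therefore shadows, any older one. The names env, bind, unbind, and lookup are hypothetical.

```sml
(* Each bucket lists (name, value) bindings, newest first. *)
type 'a env = (string * 'a) list array

(* Hypothetical string hash, reduced modulo the number of buckets. *)
fun hashString (numBuckets : int) (s : string) : int =
  foldl (fn (c, acc) => (acc * 31 + Char.ord c) mod numBuckets) 0 (explode s)

fun slot (env : 'a env) (name : string) : int = hashString (Array.length env) name

(* Entering a scope: push the new binding in front of any older ones. *)
fun bind (env : 'a env) (name : string, v : 'a) : unit =
  let
    val i = slot env name
  in
    Array.update (env, i, (name, v) :: Array.sub (env, i))
  end

(* Remove only the first (newest) binding for the given name. *)
fun removeFirst name [] = []
  | removeFirst name ((n, v) :: rest) =
      if n = name then rest else (n, v) :: removeFirst name rest

(* Leaving a scope: drop the newest binding, uncovering the older one. *)
fun unbind (env : 'a env) (name : string) : unit =
  let
    val i = slot env name
  in
    Array.update (env, i, removeFirst name (Array.sub (env, i)))
  end

(* The first match in the bucket is the innermost binding. *)
fun lookup (env : 'a env) (name : string) : 'a option =
  case List.find (fn (n, _) => n = name) (Array.sub (env, slot env name)) of
      SOME (_, v) => SOME v
    | NONE => NONE

(* Mirrors: val x = 2; let val x = 4 in x end; *)
val env : int env = Array.array (8, [])
val () = bind env ("x", 2)       (* top-level val x = 2 *)
val () = bind env ("x", 4)       (* enter the let *)
val inner = lookup env "x"       (* SOME 4 inside the let *)
val () = unbind env "x"          (* exit the let *)
val outer = lookup env "x"       (* SOME 2 again at top level *)
```

Shadowing falls out of the list order, so entering and leaving a scope are both cheap as long as the buckets stay short.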

In the next lecture we will look closely at several implementations that address this problem.


CS312  © 2002 Cornell University Computer Science