Hashing

We introduce hashing, in which a hash table is used to implement a set. The amazing point is that determining whether a value e is in the set takes expected constant time O(1). On average, it takes only about two tests, or probes, to decide whether e is in the set, even if the set contains more than 1,000 elements. We start by reviewing the speed of a simple data structure to implement a set.

A standard array implementation of a set requires time O(n)

We can implement a set using an array b and an int variable n, with the set elements being in b[0..n-1]. However, adding an element to this set takes expected and worst-case time O(n), even if we keep the array sorted (say, in ascending order). This short video explains why. If you know this fact already, you don't have to watch this video. (1:50 minutes) 01ArrayImplementation.pdf
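The cost of adding to a sorted array can be seen in a minimal sketch. The class name SortedArraySet and its methods are hypothetical, chosen only for illustration, and the set holds ints to keep the code short. Finding the insertion point takes O(log n) with binary search, but making room for the new element may shift up to n elements.

```java
import java.util.Arrays;

/** A set of ints kept in ascending order in b[0..n-1].
 *  Hypothetical class, for illustration only. */
class SortedArraySet {
    private int[] b = new int[8]; // set elements are in b[0..n-1], ascending
    private int n = 0;            // number of elements in the set

    /** Return true iff e is in the set. Binary search: O(log n). */
    boolean contains(int e) {
        return Arrays.binarySearch(b, 0, n, e) >= 0;
    }

    /** Add e to the set; return true iff e was not already in it.
     *  Worst case O(n) because of the shift. */
    boolean add(int e) {
        int i = Arrays.binarySearch(b, 0, n, e);
        if (i >= 0) return false;       // e is already in the set
        int pos = -(i + 1);             // insertion point reported by binarySearch
        if (n == b.length) b = Arrays.copyOf(b, 2 * n); // grow when full
        // Shift b[pos..n-1] up one position: up to n element moves.
        System.arraycopy(b, pos, b, pos + 1, n - pos);
        b[pos] = e;
        n++;
        return true;
    }

    /** Return the number of elements in the set. */
    int size() {
        return n;
    }
}
```

Adding the values 20, 19, ..., 1 in that order is the worst case here: every new value is smaller than all the others, so each add shifts the whole array.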

A hash table for sets of Objects (part 1)

We illustrate the basic idea behind hashing in Java by providing an implementation for a set of Objects using "chaining". There is a caveat: None of the Objects in the set can override class Object's functions equals and hashCode. We'll relax this caveat in a later video. This is only a partial, basic implementation. (3:20 minutes) 02hashing.pdf
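As a rough sketch of the idea, and not the code developed in the video, a set with chaining might look like the following. The class name ChainedHashSet, the method names, and the fixed table size are assumptions made for illustration. The hash code of an element selects a bucket, and only that bucket's linked list is searched.

```java
import java.util.LinkedList;

/** A set of Objects implemented with a hash table and chaining.
 *  Hypothetical sketch: the table never grows. */
class ChainedHashSet {
    private final LinkedList<Object>[] b; // b[h] is the bucket for hash index h, or null
    private int size;                     // number of elements in the set

    /** Create an empty set with the given number of buckets (> 0). */
    @SuppressWarnings("unchecked")
    ChainedHashSet(int capacity) {
        b = new LinkedList[capacity];
    }

    /** Return the bucket index for e, in 0..b.length-1. */
    private int bucketOf(Object e) {
        // floorMod keeps the index nonnegative even if hashCode() is negative
        return Math.floorMod(e.hashCode(), b.length);
    }

    /** Return true iff e is in the set. */
    boolean contains(Object e) {
        LinkedList<Object> bucket = b[bucketOf(e)];
        return bucket != null && bucket.contains(e);
    }

    /** Add e to the set; return true iff e was not already in it. */
    boolean add(Object e) {
        int h = bucketOf(e);
        if (b[h] == null) b[h] = new LinkedList<>();
        if (b[h].contains(e)) return false;
        b[h].add(e);
        size++;
        return true;
    }

    /** Return the number of elements in the set. */
    int size() {
        return size;
    }
}
```

Because the table in this sketch has a fixed size, buckets grow longer as elements are added, and operations slow down accordingly.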

A hash table for sets of Objects (part 2)

We now extend the hash table data structure discussed in part 1 by showing how to create a larger hash table when the "load factor" gets above 0.75. This causes us to introduce the notion of amortization. Adding an element takes amortized time O(1). (5:35 minutes). 03loadFactor.pdf
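The resizing step can be sketched as follows. This is an illustrative version under stated assumptions, not the code from the video: the class name ResizingHashSet, the initial capacity of 4, and the choice to double the table are all assumptions. The 0.75 threshold is the one mentioned above.

```java
import java.util.LinkedList;

/** A chained hash set that doubles its table when the load factor exceeds 0.75.
 *  Hypothetical sketch, for illustration only. */
class ResizingHashSet {
    private static final double MAX_LOAD = 0.75;

    private LinkedList<Object>[] b = newTable(4); // the buckets; none is null
    private int size;                             // number of elements in the set

    /** Return a table of the given capacity whose buckets are all empty lists. */
    @SuppressWarnings("unchecked")
    private static LinkedList<Object>[] newTable(int capacity) {
        LinkedList<Object>[] t = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) t[i] = new LinkedList<>();
        return t;
    }

    /** Return the bucket index for e in the current table. */
    private int bucketOf(Object e) {
        return Math.floorMod(e.hashCode(), b.length);
    }

    /** Return true iff e is in the set. */
    boolean contains(Object e) {
        return b[bucketOf(e)].contains(e);
    }

    /** Add e to the set; return true iff e was not already in it. */
    boolean add(Object e) {
        if (contains(e)) return false;
        b[bucketOf(e)].add(e);
        size++;
        // Load factor = number of elements / number of buckets.
        if ((double) size / b.length > MAX_LOAD) rehash();
        return true;
    }

    /** Double the table size and move every element to its new bucket. */
    private void rehash() {
        LinkedList<Object>[] old = b;
        b = newTable(2 * old.length);
        for (LinkedList<Object> bucket : old) {
            for (Object e : bucket) {
                b[bucketOf(e)].add(e); // bucketOf now uses the new length
            }
        }
    }

    int size() { return size; }
    int capacity() { return b.length; }
}
```

A single add that triggers rehash takes time proportional to the number of elements. Because the table doubles each time, the total number of elements moved over all rehashes stays proportional to the number of adds, which is why the amortized cost per add is O(1).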

Now that you know a little bit about hashing, read this 2-page pdf file to learn about hash functions.

Writing functions equals and hashCode

The work discussed above uses functions equals and hashCode in class Object. We now discuss overriding functions equals and hashCode. The requirement is that if e1.equals(e2), then this must hold: e1.hashCode() == e2.hashCode(). We give a few examples of hash codes. (3:23 minutes). 04equalsHashcode.pdf
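A minimal example of overriding both functions consistently is shown below. The class Point is hypothetical and is not taken from the video. Both functions are computed from the same fields, so two equal points always have the same hash code.

```java
import java.util.Objects;

/** An immutable point (x, y). Hypothetical class, for illustration only. */
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    /** Return true iff o is a Point with the same coordinates as this one. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    /** Return a hash code computed from exactly the fields that equals compares. */
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

If equals were overridden without hashCode, two equal points would usually land in different buckets, and a hash set could end up holding both of them.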

==========================================

The idea behind open addressing with linear probing

You now know the basics of hashing. Pretty simple, but powerfully efficient. You saw hashing with chaining, in which each bucket is a LinkedList, a doubly linked list. We now introduce open addressing with linear probing, in which linked lists are not used. The main advantage is the saving of space. (2:45 minutes). 05LinearProbing1.pdf

The class invariant and two methods used in open addressing with linear probing

Based on the previous video, we declare and describe the fields needed for open addressing with linear probing. We then write two of the methods: linearProbe and add. (4:30 minutes). 06LinearProbing2.pdf
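The following is a rough sketch of what linearProbe and add might look like. It is not the code from the video: the field names, the initial table size of 8, and the rule of doubling the table once it is more than half full are assumptions made for illustration. Removal is left out because it needs extra care with open addressing.

```java
/** A set of non-null Objects using open addressing with linear probing.
 *  Hypothetical sketch, for illustration only.
 *  Class invariant: the elements are exactly the non-null entries of b,
 *  size is the number of non-null entries, and at least half of b is null. */
class LinearProbingSet {
    private Object[] b = new Object[8]; // null marks an empty slot
    private int size;                   // number of elements in the set

    /** Return the index of e in b if e is in the set; otherwise return the
     *  index of the first empty slot in e's probe sequence.
     *  Terminates because b always contains at least one null. */
    private int linearProbe(Object e) {
        int h = Math.floorMod(e.hashCode(), b.length);
        while (b[h] != null && !b[h].equals(e)) {
            h = (h + 1) % b.length; // try the next slot, wrapping around
        }
        return h;
    }

    /** Return true iff e is in the set. */
    boolean contains(Object e) {
        return b[linearProbe(e)] != null;
    }

    /** Add e to the set; return true iff e was not already in it. */
    boolean add(Object e) {
        int h = linearProbe(e);
        if (b[h] != null) return false; // e is already in the set
        b[h] = e;
        size++;
        if (2 * size > b.length) rehash(); // keep the table at most half full
        return true;
    }

    /** Double the table size and reinsert every element. */
    private void rehash() {
        Object[] old = b;
        b = new Object[2 * old.length];
        for (Object e : old) {
            if (e != null) b[linearProbe(e)] = e;
        }
    }

    /** Return the number of elements in the set. */
    int size() {
        return size;
    }
}
```

The elements are stored directly in the array, so no list nodes are allocated. The price is that the table must keep some empty slots so that every probe sequence ends.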

2-page discussion of linear probing, quadratic probing, and double hashing: pdf file.
Paper titled "How Caching Affects Hashing": pdf file.
2-page intro to cuckoo hashing: pdf file.
2-page intro to Robin Hood hashing: pdf file.