Discussion 10: Reverse Indexing with Hashing

Compression was fun, but do you know how to efficiently reshape data for a different access pattern?

In today’s discussion, you’ll flip a map of class rosters from courses → students (that is, associating each course with its list of enrolled students) into students → courses (that is, associating each students with their list of enrolled courses) using Java’s hashing collections (HashMap, LinkedHashMap). This gives practice with hash-based maps and stable iteration order while enforcing invariants

Learning Outcomes

Use Java’s hash-based maps to traverse and reorganize nested collections.
Enforce/document iteration order (first-seen via LinkedHashMap) and de-duplicate efficiently.
Reason about asymptotic costs for HashMap/HashSet operations and iterator semantics.

Reminder: Discussion Guidelines

The work that you complete in discussion serves as a formative assessment tool, meaning it offers the opportunity to assess your understanding of the material and for our course staff to get a “pulse” on how things are going so we can make adjustments in future classes. Your grade in discussion is based entirely on attendance and participation; if you show up and you are actively engaged with the activity (working on the activity on your computer, discussing the activity with your group, asking and answering questions, etc.) for the entire 50-minute section period, you will earn full credit. If you complete the activity early, helping other students is a great way to further your own understanding. You do not need to submit any of the work that you complete during discussion.

Since discussion activities are not graded for correctness, we do not place any restrictions on resources that you may use to complete them, which include notes, books, unrestricted conversations with other students, internet searches, and the use of large language models or other generative AI. We advise you to be pragmatic about your use of these resources and think critically about whether they are enhancing your learning. Discussion activities are intended to serve as “strength training” for programming tasks we will expect on assignments and exams (and that you will encounter in future courses and careers), and their main benefit comes from critically thinking to “puzzle” them out.

Working together in small groups is encouraged during discussion!

Background: Reverse Indexing with Hash Maps

Today, you are given a forward index (hash map of rosters): Map<Course, List<Student>> rosters that you’ll clean (creating a new Map<Course, List<Student>> coursesToStudents) and use to generate a reverse index (hash map of schedules): Map<Student, List<Course>> studentsToCourses Then, you’ll use both of these indices to efficiently answer queries about their data.

Concretely, we are trying to answer the inverse question efficiently:

Given a course, find its students. (Forward index you already have.)
Given a student, find their courses. (Reverse index you’ll build.)

Hash maps shine here because they provide expected O(1) inserts/queries and do not force you to sort or scan all entries.

Ordering within hashed and nested structures

When we created the forward index (and later will traverse it to construct the reverse index), there may have been some significance to the order in which the courses were added (for example, sorted by course number, arranged by college, etc.). We might want to preserve this order and also have it reflected in the students’ course lists in the reverse index.

In general, we don’t have any guarantee of how the (key, value) pairs will be arranged in the underlying memory (i.e., the buckets). In particular, we won’t be able to easily recover the insertion order of these pairs given just the array of buckets. We generally think about the hash function as “scrambling” up the entries, which could cause us to lose track of this order. As a result, the Java library includes a LinkedHashMap class that supports the iteration over its keys/entries in insertion order.

Data cleaning: duplicate and null entries

Real rosters may include some repeated and missing data. Some students may have enrolled in the course, dropped it, and re-enrolled later. Other students may have transferred out of the university after registering for classes, leaving an empty spot in the roster. When you construct your index you should skip over null entries. You should also remove duplicate entries so that each student appears at most once on the roster of each course and each course appears at most once on the schedule of each student. You should preserve the order of first occurrence of each student.

Your Tasks

Exercise 1: Reverse Index by Hand

In this exercise you will convert the (uncleaned) forward map below to its reverse counterpart. As you do this, keep track of what steps you are taking. Which tasks are you repeating for each table row, or for each entry in one of the lists in the second column? These will correspond to loops in your code. What checks are you performing before writing new information into the reverse map?

Notes:

As mentioned above, any null student should be ignored.
As mentioned above, a student may appear multiple times in a course list; keep only the first occurrence of that student.
The first-seen student order is determined by scanning the table top-to-bottom, left-to-right.

Forward map (courses → students)

Course	Enrolled Students (in order)
"CS 1110"	["s1111", "s1234", "s9001"]
"CS 2110"	["s1234", "s9001", "s1111", "s9001", "s2222"]
"CS 2800"	["s7777", "s1111", `null`, "s1234"]
"CS 5154"	["s9001", "s2222", "s7777"]

Reverse map (students → courses)

(You may not need all of the rows.)

Student	Enrolled Courses (in order)

Exercise 2: Programming the Reverse Index

Now that we have a high level understanding, lets get ready to implement this! Download the starter code and take a couple minutes to read through the functionality that is already provided.

Download Starter Code download

The provided Course and Student record are very simplistic, but could be expanded with additional fields and functionality to build up a more realistic course registrar system (like Cornell's Student Center).

All of the code that you will write will be in EnrollmentIndex.java.

(a)

buildReverseIndex() Constructor

Most of the work of the EnrollmentIndex class is done by its constructor, which initializes both its forward and reverse indices to satisfy their class invariants. Complete the definition of this constructor to build these indices in one pass over the rosters map. It may help to reference the Map interface documentation to remind yourself what operations are available.

(b)

Index Query Methods

Complete the definitions of the isEnrolled(), courseCountOf(), and enrollmentOf() methods, which run small queries against the indices that you built. Determine which index (forward or reverse) to use to get the best performance. Make sure that you carefully consider the case that a student or course may not be present in the index to avoid propagating any exceptions.

Determine the runtimes of your implementations of these methods. If you only had the forward index coursesToStudents, which of these operations would have a strictly worse runtime?

(c)

Runtime Analysis

Determine the runtimes of your implementations of the methods from part (b) in terms of \(|S|\), the total number of students, and \(|C|\), the total number of courses.

If you only had the forward index coursesToStudents, which of these operations would have a strictly worse runtime? Explain your answer.

(d)

Iterator Methods

Complete the definitions of the students(), coursesOf(), and studentsIn() methods, which each return an Iterator. In the case of a student or course that is not present in the index, return Collections.emptyIterator(). You should not need to define any nested iterator classes to implement these methods.

Once you have completed your code, you can run the EnrollmentIndexDemo to confirm visually that your constructor correctly builds the reverse index and that your accessor methods work as intended. You can also run the provided test cases in tests/EnrollmentIndexTest.java to confirm the correctness of your implementation.