# Discussion 12 handout

### Group members (names & NetIDs)

## Objectives

- Add auxiliary properties to objects
- Share data with recursive helper functions
- Modify depth-first search to implement topological sort
- Extend topological sort with filtering

Note: the skills practiced here will be very useful when completing A6.

## Overview

Download the dis12 code and open it as a project in IntelliJ. Briefly explore the `Graph`

and `Vertex`

interfaces; you will only need a few of their methods for this activity:

`Vertex::label()`

: Observe the vertex’s label`Vertex::successors()`

: Allow iterating over other vertices linked via outgoing edges`Graph::vertices()`

: Allow iterating over all vertices in a graph`OopGraph.readGraph()`

: Static factory method for reading directed graphs from a file.

Next, look at `Main1`

, which uses `Vertex`

’s factory method to load a graph from “courses.txt”. This graph is a simplified representation of the prerequisites of Cornell’s Computer Science courses (ignoring corequisites and equivalencies); edges point from prerequisites to the courses that depend on them. The goal of `Main1`

is simply to print every course as discovered by depth-first search from multiple starting vertices; later, you will make modifications to print the courses in topological order, and then to only print the courses on the prerequisite chain of the courses that *you* want to eventually take.

Finally, note `VertexState`

: it succinctly defines an `enum`

type for the “discovery states” that vertices go through during traversal; you can refer to a value as if it were a static field; e.g. `VertexState.SETTLED`

is the value to use to indicate that all of a vertex’s successors have been discovered by the traversal.

## Implementing depth-first search

Depth-first search can be expressed very succinctly using recursion:

```
/** Settle all vertices reachable along undiscovered paths from `start`.
* Requires: `start` is undiscovered. */
dfsVisit(Vertex start) {
start.state = DISCOVERED;
for (v : start.successors()) {
if (v.state == UNDISCOVERED) {
dfsVisit(v);
}
}
g.state = SETTLED;
}
```

But there are some potential obstacles to implementing this as written:

- How should the starting vertex
`start`be selected? - What if the user wants to explore the entire graph? If the graph has disconnected components, no single
`start`will suffice. - The notion of “discovery state” is convenient for the algorithm, but what if the
`Vertex`

type doesn’t have a`state`

property that the algorithm can modify?

Start with the last obstacle. **What data structure** could the algorithm employ itself to efficiently associate each vertex with a state without modifying the `Vertex`

class?

Now that you’ve identified a way to track additional properties associated with each vertex, you need to share that data with each recursive invocation, and you need to create that data structure in the first place. Having each recursive call create its own copy of the data structure wouldn’t work, because then the data wouldn’t be shared between calls. This suggests that the task should be assigned to a different method, rather than our recursive `dfsVisit()`

. This is a common pattern, and we’ve seen it before with trees: the other method becomes a “driver,” while the recursive method serves as its helper function.

In `Main1`

, method `dfs()`

serves as the driver, while `dfsVisit()`

is the recursive helper. Note how the driver is responsible for looping over all possible starting vertices, resolving the concern above.

**Modify the signature** of `dfsVisit()`

so that you can provide each call with state data for all vertices.

Then **construct an appropriate data structure** for storing that state data at the appropriate place in the driver. Does this data structure need to be initialized with any data? If so, write the initialization; if not, explain how you can get by without it? (Hint: either answer can be correct.)

Finally, translate the pseudocode of each method into Java. Run your program to verify that all courses are printed.

### Returning the vertices

Right now `dfs()`

prints the label of each course it visits, but it would be a more versatile function if it instead returned a list of these vertices. Since Java’s standard collection classes are *mutable*, this is again a situation where it is most efficient to share objects between multiple recursive calls. **Modify the signatures** (and specifications) of `dfs()`

and `dfsVisit()`

so that `dfs()`

returns a list of the vertices visited (in the same order they would have been printed). Then modify `main()`

to print the contents of this list to check your work.

## Topological sort

By printing or appending vertices at the top of `dfsVisit()`

, we are processing the vertices in *discovery order* (that is, we do something special with a vertex before visiting its neighbors, analogous to *preorder* traversal in trees).
It turns out that, in order to perform a topological sort using DFS, we simply need to process vertices in *reverse settled order* (analogous to *reverse postorder* in trees).
“Settling” means processing the vertex only after discovering all of its neighbors, while the reversal can be achieved by prepending, rather than appending, to the list that is returned.

You will make these changes in `Main2`

, which has updated method names and specifications accordingly. Feel free to copy your solution from `Main1`

as a starting point.

Since you will be *prepending* vertices to the results list, **which implementation of the List interface** should your driver construct and hand to its recursive helper? Why?

Modify your previous implementation of `dfsVisit()`

to satisfy the new specifications and thus help perform a topological sort. Note that a topological sort is impossible if there is a cycle in the graph, but DFS can detect cycles if it ever encounters a successor that is already discovered but not settled.

Check your results against the graph above; do courses always come somewhere after their prerequisites?

## Filtering results

Most students do not take every CS class that the department offers; instead, they are interested in the topologically sorted *subgraph* that fulfills all of the prerequisites of the courses they want to take. You will implement this filtering in `Main3`

(again, feel free to copy your solution from `Main2`

as a starting point).

In `Main3`

, `dfsVisit()`

now returns a value. Read the specifications carefully, then **modify your implementation** to compute the appropriate return value. This should feel familiar to the “counting pattern” used in some recursive methods on trees.

An important case to consider is what happens when a successor is already marked as settled. This does not indicate a cycle; it simply means that an earlier search started in the middle of the dependency structure, while the current search started earlier (in a topological sense) and has now caught up with the previous results. **How can you determine** whether the previously searched vertex was on a path to a target?

Try out your new course planning tool! List the CS classes you are most interested in (e.g. “CS4450 CS4120”) in the program arguments and run the program, and it will show you which prerequisites to take in which order.

## Submission

- Open the assignment page for “Discussion activity 12” in CMSX
- [Recorder] Find the “Group Management” section and invite each group member
- [Others] Refresh the page and accept your invitation
- [Recorder] Take a picture of your work and save as either a JPEG or a PDF file named “discussion_responses”.
*After all invitations have been accepted*, upload your picture along with your code as your group’s submission.- Recommended scanning apps: Microsoft Office Lens, Adobe Scan, Genius Scan, Evernote Scannable

Ensure that your group is formed and your work submitted before the Friday evening deadline.

### Tips and reminders

- Discussion is not a race. Focus on the current activity being facilitated by your TA and engage your whole group to propose and explain ideas.
- Elect a recorder to maintain the “master” copy of your work (others are still encouraged to jot down ideas on scratch paper). Rotate this position each week.
- It is a violation of academic integrity to credit someone for work if they were not present when the work was done, and the whole group is accountable. Your CMS group must only include classmates who attended section with you on that day. Remember that our participation policies accommodate occasional absences without penalty.
- It is your individual responsibility to ensure that you are grouped on CMS and that your group’s work is submitted before the deadline. Log into CMS the day after section to ensure that everything is in order, and contact your groupmates if not. It would be prudent for multiple members to photograph the group’s work.
- Only one group member (typically the recorder) needs to upload a submission. But their submission must not be uploaded until after all group members have confirmed their membership in CMS (contact your TA if you have trouble grouping in CMS).