T-Th 9:05
or
T-Th 11:15
in Olin 155

CS 1110: Introduction to Computing Using Python

Fall 2014

Assignment 4:
Word Puzzles

Due to CMS by Sunday, November 2nd at 11:59 pm.

jumbles
Image Credit: A Fool and His Money

Cross word puzzles. Jumbles. Cryptograms. Word games have been popular since ancient times. Newspapers (aka. web pages on paper) devote entire pages for to these puzzles. If you are great with words, they can be a lot of fun. But if you have trouble unscrambling words, or simply do not know the word the puzzle is using, they can be really frustrating.

Fortunately, with a little bit of coding, word puzzles become really easy. While this might seem like cheating, this actually a matter of debate, particularly when the puzzle uses obscure words like "variegate". But whether or not you think it is fair, that is exactly what you will do in this assignment: cheat at word puzzles.

Of course, cheating is no fun unless you have something to cheat at. In addition to the programming assignment, we have provided you with a simple word game (also written in Python). We want you to solve the word puzzles and submit your saved game as part of the assignment. You are free to solve the word puzzles without your cheat program, but you will be graded on whether your solution is correct. So cheat away!

Update (October 26, 2014): Several students are doing clever (but ugly and slow) ways to get around the recursion in parts C and D. So we are forced to introduce new rules:
  • You must use recursion in Parts C and D
  • You may not make any helper functions beyond the ones we have given you.

A Mild Warning

This assignment is brand new. It has never been used in this class before. We have beta-tested this assignment on other students, but that will not catch all of the errors and misunderstandings that will inevitably pop up. Furthermore, while we have some idea of how long the assignment should take, that is different than how long it will actually take you. The survey for this assignment is incredibly important.

First of all, we highly recommend that you have a partner, even if you have never had a partner for this class before. That will make things a lot easier. In addition, we recommend that you start early, and work on a little bit each day. In particular, we have indicated several suggested mini-deadlines throughout the assignment. These are not enforced, but if you follow them, this assignment should be very manageable.

Finally, make sure that you read all of the instructions before starting. Most of your questions will have answers here somewhere. If not, you should check this site and Piazza regularly for updates and clarifications. If we hear enough complaints that this assignment (or parts of this assignment) are too hard, we will make modifications to adjust the difficulty.

Learning Objectives

This assignment serves several important roles.

  • It gives you experience with writing for-loops.
  • It gives you experience with writing recursive functions.
  • It gives you experience working with text files.
  • It introduces the notion of a prefix map
  • It gives you (more) experience with using asserts to enforce your preconditions.

Table of Contents

Author: W. White


Academic Integrity and Collaboration

Academic Integrity

This is a harder assignment. It is also brand new. The upperclassmen in your dorm have never seen it before, and they may or may not be able to help you. You might be tempted to talk to students in other groups. We would like to remind you of our Academic Integrity Policy.

If you talk to other students, please cite the collaboration in your source code. Depending on the amount of collaboration, we may or may not take off points. We generally reserve point deductions for sharing large amounts of code, and not just a few lines. However, we are not going to explain our deductions ahead of time. The only way to guarantee that you do not lose points is if you do not look at anyone else's code.

The bigger problem is if you do not cite your collaborators at all. If we discover large similarities between two submissions, and neither of you cited the other, we will be forced to hold an Academic Integrity Hearing. Neither of us wants that.

Collaboration Policy

You may do this assignment with one other person. If you are going to work together, then form your group on CMS as soon as possible. If you do this assignment with another person, you must work together. It is against the rules for one person to do some programming on this assignment without the other person sitting nearby and helping.

With the exception of your CMS-registered partner, is not a good idea to look at anyone else's code or show your code to anyone else, in any form what-so-ever. If you do see someone else's code, and it affects how you write your own, then you must cite the source in your submission.


Working with Word Puzzles

The key to solving word puzzles is to have a good dictionary. We mean is a traditional dictionary like Websters, not the dict type built into Python (though that is useful too). On a computer, a dictionary is often just a list of words, one on each line. For example, the file complete.txt provided as part of this assignment is a simple text file with over 130,000 words.

As part of this assignment, you are going to have to read a text file like complete.txt and load it into Python. And after you do load into Python, you are going to have to arrange the words in some special way so that it is easy to search them. In this section, we cover how to do both of these.


Reading from a Dictionary

Reading a file in Python is (relatively) easy. When you open a file, Python creates a file object (e.g. a folder) that behaves sort-of like a sequence. You can use it as the loop sequence in a for-loop, just like a list or a string. However, unlike a list or string, you cannot slice a file.

To access a file, you use the built-in function open. This function takes a string argument, which is the name of the file you want to open. This function creates an object for the file, and returns the name of this object when evaluated. For example, suppose you had a file called 'myfile.txt'. You would process it with the following code:

  file = open('myfile.txt')

  for line in file:
      # Do something with the line
      
  file.close()

The loop variable line contains a string. Python coverts each line (up to the '\n' character) into a string for you automatically. You are now free to slice up the string further, or put it in an accumulator variable.

Note that we close the file when we are done with it. The file is an object and close() is a method. This is not completely necessary -- Python will work without it -- but it is a very good idea to do this. If you do not close the file when you are done, there is a small chance that you might corrupt the file, making it useless for later word puzzles.

If you have more questions about working with files, read Chapter 14 in your text.


Storing Words in Prefix Maps

Once you have loaded the file complete.txt, you now have a list with 130,000 strings in it. Assuming that you stored the name of this list in a variable called wordlist, you can check if a string s is a valid word with the boolean expression

  s in wordlist

This suggests one way to unscramble a string: try all possible orders of the letters, until you find one that is in the word list. However, this approach is going to be very slow. The number of ways to reorder a string with n distinct letters is factorial(n). That means that a word with 10 letters has over 3 million possible orders! There has got to be a better way.

Fortunately, there is. The secret is based on the same principle of autocompletion that you use on your phone. Type a letter into the box on the right. This web page has a very small word list associated with it, and it will suggest all of the words in its list that start with that letter. When you type a second letter, it will narrow this list down to words starting with the first two letters, and so on. This process makes sure that we are always working with strings that are prefixes to a valid word. If you type in a string that is not a prefix to a word, you will get nothing back.

For example, type y into the box above. You will get a small list of words. You will notice that all the words have either a, e, or o. Typing one of these three letters will eliminate the other possibilities from the list. Typing anything other than these three letters causes the list to disappear, as there are no valid words (in our dictionary).

It is clear that this can help us to unscramble words faster. Once we have the first few letters, the number of possible words drops tremendously, so we can limit the number of reorderings to look at.

What we need is some way to take a prefix and list the words that start with this prefix. There are many different ways to do this. In this assignment, you will use what is known as a prefix map. This is not the most efficient way to handle words (in CS 2110 you may learn about an alternative called a trie), but it is straight-forward enough for beginning programmers.

A prefix map is a dict where the keys are strings and the values are lists of single characters. The key is a prefix, and the value is the list of all valid "next" letters. For example, suppose we have a dictionary with the four words 'at', 'ate', 'are', and 'by'. In that case, the prefix map is

  {
    ''   : [ 'a', 'b' ],
    'a'  : [ 'r', 't' ],
    'ar' : [ 'e' ],
    'are': [ '' ],
    'at' : [ '', 'e' ],
    'ate': [ '' ],
    'b'  : [ 'y' ],
    'by' : [ '' ]
  }

The words 'at' and 'ate' illustrate an unusual problem. Sometimes a string is both a prefix of another word, and a word in its own right. To take care of this, we sometimes add the empty string '' to the list of next letters. If the empty string is there (as in the care of 'at'), that prefix is a word. If the empty string is not there (as in the case of 'ar'), then it is not.


Word Puzzles and Recursion

We are going to work with several different types of word puzzles in this assignment, but the solutions will all have the same flavor. They are all essentially variations on "how many words can you make in Scrabble?" Once you get this idea down, you should be able to work through the whole assignment.

To understand what we mean, suppose you have a tile rank in Scrabble, like the one below. We want to figure out all of the words that that you can make from this rack. What do we do?

scrabble_rack

One way to approach it is set aside some space next to the rack that we will call the "prefix space". We will slowly add letters to prefix space. When we form a word, we add it to the list. We then check to see if there are any other words with this prefix (which we can do with a prefix map). If there are none, we start from the beginning. Otherwise, we pick the next valid letter and look at the prefix again.

This is perhaps a bit confusing. As we said in class, you should always write a clear specification before writing code. If we try to specify what we are doing with the scrabble rack, it sounds something like this:

Find all words that start with the letters in the prefix space, and whose additional letters all come from the tile rack, and only use each element in the rack once.

We will call this process of finding Scrabble words to extend the prefix the "prefix search". To illustrate this process, suppose we have started making words with our tile rack and we have 'st' in the prefix space. This will look something like the picture below.

scrabble_st

The only letters left to use are 'o', 'r', 'd', and 'w'. In addition, if we use the dictionary provided in complete.txt, we see that the only possible "next letters" after 'st' are 'a', 'e', 'h', 'i', 'o', 'r', 'u', and 'y'. The only letter both of those have in common is 'o'. So we add this letter to our prefix and start a new prefix search. As you can see, this is a recursive process.

Now suppose we have 'wor' in the prefix space. This will look something like the picture below.

scrabble_st

Again, if we use the dictionary provided in complete.txt, we will see that 'd', 's', and 't' are all valid ways to add a next letter. This means that we have to try out three more prefix searches, namely 'word' 'wors', and 'wort'. So while the process is recursive (at each step we get a new prefix search to try), we also have to repeat the search for all of the acceptable next letters. That means that many of our functions will combine a for-loop and a recursive call.


The Word Puzzle App

When you write code, it is always good to have some motivation for the code you are writing. Therefore, we have provided you with your own personal word puzzle that you can cheat at. In fact, we want you to cheat, as you will be submitting a save game file as part of the assignment.


Starting the App

The word puzzle app is the most complex Python application we have seen so far in this class. It is made up of several modules. It also includes its own sound effects. To make everything nice and neat, we have stored everything inside of a folder, called wordapp. More importantly, you launch the app by running the folder as a script, not by running any of the modules inside the folder.

To start up the app, navigate the command line to the folder that contains the folder wordapp; do not go inside the wordapp folder. Then type

  python wordapp

This tells Python to run the modules inside of this folder as a script. It will launch a GUI window with several buttons at the top and a lot of white boxes, as shown below.

wordapp-start


Starting a New Puzzle

Before you can do anything, you must start in a new puzzle. Press the New button to do this. Once you press the button, you will see a box asking you for your netid, as shown below.

wordapp-login

The purpose of asking for your netid is to ensure that every single person has their own unique puzzle to cheat at. You will get the same puzzle every time you type in the same netid, but a different puzzle when you type in a different netid.

Once you have entered your netid, the app will create the puzzle for you. For example, if you enter your instructor's netid (wmw2), you will see the following puzzle.

wordapp-first


Solving the Puzzle

We call this type of a puzzle a seven-cross The purpose of the puzzle is to unscramble the letters so that you get six seven-letter words that match on all of the letters where they intersect. You will solve one word at a time, alternating vertical and horizontal, until the entire cross is solved.

When you start the puzzle, you will notice that only the rightmost column of boxes is white, while all of the other boxes are greyed out. This is a hint that you are only supposed to unscramble the letters in those boxes, and nothing else.

To unscramble the word, click on two different letters. They app will then swap the two letters. If you click on a letter by mistake, you can always unselect it by clicking again. Once the word is successfully unscrambled, the column will turn blue, as shown below.

wordapp-second

The puzzle will now make a new set of boxes white, directing you to unscramble the next word. Notice that the last letter of this word is in a blue box. This letter is now locked and you can no longer swap it with other letters.

You complete this process until the puzzle is completely solved. When the puzzle is finished, all of the boxes will be blue and the app will say "SOLVED!" in the top right corner, as shown below.

wordapp-solved

Other Features

The puzzle app has several other features. First of all, you can save your game at any time by pressing Save. This will create a file called data.save inside your assignment directory (not inside of wordapp). To restore your saved game, press the Load button. This will reset the boxes to where they were when you saved.

Finally, you can always undo all of your progress in the puzzle. If you press the Reset, it will rescramble all of the words, and remove any blue boxes.


Working on the Assignment

As with the previous assignment, this assignment provides you with a lot of code already written for you. We have both an primary module for the assignment and a script for testing it. You also have a graphical application to try out your module. The only difference between this and the previous assignment is the difficulty of the functions.


Assignment Source Code

Before you do anything else, you should download the zip archive A4.zip from this link. You should unzip it and put the contents in a new directory. As with the previous assignment, this directory will contain a few files:
a4.py
This is the primary file that you will need to modify. It contains a series of function stubs that you will complete.
a4test.py
This is a unit test module to verify that the module a4 is working properly. We have gotten you started on some tests, but there are many tests still missing.
wordapp
This is a folder that contains a lot of Python modules in it. These files are uses to launch the word puzzle app described above. You will not modify this file. However, you will use it to generate a save file that you will submit.
complete.txt
This is a dictionary of 130,000 English words. We promise that all of the puzzles in the word game use words taken from this list.
common.txt
This dictionary is a list of the 100 most common English words. It is provided so that you have something smaller that you can test with.
short.txt
This dictionary is a list of the 10 most common English words. This is the dictionary that a4test.py uses for most of its testing.

Pacing Yourself

You should start as this assignment soon as possible. This is a new assignment and we are not sure how long it will take. We have tested it, and believe that it is feasible. However, the earlier you start, the more help we can give you on the assignment.

At the end of each part, we have a suggested completion date. While this is not enforced, we recommend that you try to hit these deadlines. If you cannot make this deadlines, it might be a sign that you are having difficulty and need some extra help.


Testing and Debugging

You should continue using the testing and debugging methodologies that you used in the previous assignments. We have already started several of the test procedures in the unit test with a4test.py. Once again, will be graded on the completeness of your test cases. Remember our comments from the previous assignments.

At the top of a4test.py you will see a new function called assert_lists_equal. You cannot use assert_equals to compare two lists. Just like the colors in the previous assignment, lists are different folders, so we need to compare them element by element. This is a new testing function that does that for you automatically. You do not need to understand this function, but if you read it over, you can see how to make your own testing functions.

Debugging recursive functions can be hard, so you will probably add a lot of print statements to a4.py. This is encouraged, but remember to remove them before you submit the assignment.


Asserting Preconditions

As with the last assignment, we would like you to continue the policy of enforcing the preconditions with assert statements. However, this is a little more difficult this time. For example, note that one of the preconditions of the function autocomplete is that pmap a prefix map. To enforce this precondition, we have to verify the following:

  1. pmap is a dictionary.
  2. The keys of pmap are strings, and the values are lists.
  3. The contents of all the value lists are either single characters or the empty string.
  4. For every key of pmap, the prefixes of that key are also keys.
  5. For every key k and character m in pmap[k], k+m is also a key.

Checking all of these will take more time than executing any of the functions that we are writing for this assignment. And if we have to check this at every recursive call, the entire assignment grinds to a halt.

Remember that enforcing preconditions is a courtesy to other programmers. We are not required to do it, though it can help with debugging later on. In a situation like this, we might choose to enforce (1) and (2) above, or maybe just (1), since it is quick to do that. But we certainly would not enforce all 5.

Choosing which parts of the precondition to enforce and which not to enforce are a bit of art form. To make this easier, we have spelled it out for you. In each specification, after the precondition, there is a paragraph called "enforced preconditions". This explains what part of the precondition the function is expected to enforce with an assert statement. Your implementation should enforce these parts of the precondition, and nothing more.


Assignment Instructions

This assignment is broken up into five tasks. The first four are programming tasks, while the last task is using your assignment to solve the puzzles in the Word Puzzle App.

The programming tasks all correspond to function stubs in a4.py. The tasks are clearly labeled, and the functions are fully specified. You will find this assignment to be a lot easier if you complete and fully test one task before moving on to the next.


Task 1. Word Lists

The first task is a collection of functions working with word lists. There are three functions to implement.

  def build_word_list(file):
      """Returns a list of words from the given file"""

  def word_list_by_size(words, size):
      """Returns the elements of words that have length size"""

  def word_list_extend(words, prefix):
      """Returns the word list that is the result of adding prefix
      to the start of every word in the list words."""

The complete specifications for these functions are given in a4.py. There are examples there to help you understand the functions. Remember to enforce (part of) the preconditions. The function specifications clearly outline what you should and should not enforce.

You do not need recursion for any of the functions in this task. You can easily implement these with for-loops.

We have provided you with completed test procedures for these three functions in a4test.py. These tests use the small dictionary short.txt. You might want to test your functions on the longer dictionaries. However, this is not required.

Try to finish this part by Tuesday, October 21.

Task 2. Prefix Maps

In our overview of word puzzles above, we introduced the notion of a prefix map. Your second task is to implement several functions for working with prefix maps. In particular, you need to be able to create them and test whether or not a word is in a prefix maps. This task requires that you implement four functions.

  def pmap_add_word(pmap,word):
      """Adds a single word to a prefix map."""

  def word_list_to_pmap(words):
      """Returns the prefix map for the given word list."""

  def pmap_to_word_list(pmap):
      """Returns the word list for the given prefix map."""

  def pmap_has_word(pmap,word):
      """Returns True if word is in the prefix map."""

The complete specifications for these functions are given in a4.py. There are examples there to help you understand the functions. Again, remember to enforce (part of) the preconditions.

Once again, you do not need recursion for any of the functions in this task. You can easily implement these with for-loops. One of these functions will not need loops at all.

We have provided you with completed test procedures for these four functions in a4test.py. These tests use very short word lists of no more than a few words. You might want to test your functions out on longer words lists, such as the list in short.txt or common.txt. Once again, this is not required.

Try to finish this part by Friday, October 24.

Task 3. Autocomplete

The third task is to implement a single function:

  def autocomplete(prefix, pmap):
      """Returns the list of all words that complete prefix in pmap"""

Once again, the complete specification is given in a4.py. You should reread our discussion of prefix maps, as it explains how to use a prefix map to help with autocompletion. Remember to enforce the appropriate preconditions.

You must use recursion to implement this function. However, we will tell you that you also need to combine recursion with a for-loop. You need the for-loop to build the list of words (using a list as an accumulator); you need recursion to compute the words that go into the list.

This time, we have not provided any tests in a4test.py. It is up to you to provide the tests this time. Hopefully you can figure out what tests you need to write from looking at the tests for the previous tasks. Off the top of our head, we can think of at least four tests that would be good to try.

Doesn't This Break the Rules?

When you write the autocomplete function, you will discover that prefix gets larger and larger at each recursive call. This is completely the opposite of what we said in class, that the argument must get smaller at each recursive call in order for it to terminate.

While it is true that the prefix is getting larger, the number of words that can complete the prefix is always getting smaller. That is why we are not breaking the rules we laid out in class.

Update (October 26, 2014)

We are adding new rules for this function.

  • You must use recursion to implement this function.
  • You may not make any helper functions beyond the ones we have given you.
Try to finish this part by Tuesday, October 28.

Task 4. Puzzle Functions

The final task is to implement the functions that will help you with the puzzle game. There are two functions to implement.

  def scrabble(rack,size,pmap):
      """Returns the list of all valid words that you can form 
      from the tile rack using EXACTLY size letters."""

  def match(template,pmap):
      """Returns the list of all valid words that match the given 
      template."""

Once again, the complete specifications are given in a4.py. The function scrabble is useful for unscrambling words. The function match is useful for solving crossword style puzzles where you know some, but not all, of the letters. You can see how both of these functions are useful for the word puzzle game.

You will notice that both of these functions already have implementations. Both of them call a helper function (scrabble_helper and match_helper, respectively). The helper functions change the specification of the original functions so they look like the Scrabble strategy we described above. You are to implement these helper functions using recursion together with a for-loop.

These helper functions show off a common strategy with recursive functions. Many times, it is easier to make a function recursive if we rewrite the specification. However, as we have said many times, you are never allowed to change specifications. On the other hand, you are always allowed to make a new function and give it a different specification. That is what we have done here.

Once again, we have not provided any tests in a4test.py for these two functions. It is up to you to test them. You should have at least four tests for each, and possibly more.

Update (October 26, 2014)

We are adding new rules for these two functions.

  • You must use recursion to implement these two functions.
  • You may not make any helper functions beyond the ones we have given you.
Try to finish this part by Saturday, November 1.

Task 5. Solve Your Personal Puzzle

You now have all the tools that you need to cheat at the word puzzle app. Start a new puzzle with your netid. If you are working with a partner, you only need one puzzle, so pick one of your two netids. Once you have your puzzle, solve it.

We do not care how you solve the puzzle. If you know enough words to solve it without help, then great. However, we have picked some pretty obscure words, so you will probably want to cheat.

When you are done, save your game. We want you to submit data.save as part of this assignment.


Finishing the Assignment

Once you have everything working you should go back and make sure that your program meets the class coding conventions. In particular, you should check that the following are all true:

  1. There are no tabs in the file, only spaces (recall, tabs look like arrows if whitespace is visible)
  2. Functions are each separated by two blank lines.
  3. Lines are short enough (80 chars) that horizontal scrolling is not necessary.
  4. The specifications for all of the functions are complete and are docstrings.
  5. Specifications are immediately after the function header and indented.
  6. The enforced preconditions are all fully checked and asserted.

Furthermore, at the top of both a4.py and a4test.py you should have three single line comments with (1) the module name, (2) your name(s) and netid(s), and (3) the date you finished the assignment.

To finish the assignment, you will need to upload three files: a4.py, a4test.py, and data.save. The last file is your saved game from your solved puzzle. Upload these files to CMS by the due date: Sunday, November 2nd at 11:59 pm. Do not submit any files with the extension/suffix .pyc. It will help to set the preferences in your operating system so that extensions always appear.

Survey

In addition to turning in the assignment, we ask that you complete the survey posted in CMS. Once again, the surveys will ask about things such as how long you spent on the assignment, your impression of the difficulty, and what could be done to improve it. The survey is particularly important this time because it is the first time that we have ever given this assignment. We want to know whether we should keep it, or look for a new assignment next year.

Please try to complete the survey within a few days of turning in this assignment. Remember that participation in surveys comprise 1% of your final grade. Remember that individuals must complete the survey, even if done as a group.


The Origin of this Assignment

CS 1110 first switched to Python in 2012. Believe it or not, I (your instructor) learned Python that same year so I could teach this class. Before that year, I had taught this class in Java. I came to like Python a lot, and I am glad that we made the switch.

At the same time that I was learning Python, Cliff Johnson released the puzzle game A Fool and His Money. Among puzzle gamers, this was a game that we had been expecting for a very, very long time. It is a sequel to the famous puzzle game The Fool's Errand, which I played when I was an college freshman, just like many of you.

A Fool and His Money has a lot of word puzzles in it, including the seven-cross that you played as part of this assignment. It has a lot more word puzzles than that, many of them much more difficult. I was convinced by Andrew Plotkin's blog post (himself a famous puzzle designer) that writing a program to solve a word puzzle was not actually cheating. The interesting part is coming up with the program anyway.

So I decided to use this language that I had just learned -- Python -- to solve the word puzzles in the game. As part of this process I discovered that Python is great for working with strings and word puzzles. I wrote a lot of functions to solve the game. And all of them were recursive. So when we decided to make changes to the recursion assignment, I knew exactly what to do.

The functions that I wrote for the seven-cross were some of the simplest. However, there are a lot more functions that I wrote that are not part of this assignment. I am not going to tell you what those functions are. If you want a challenge, get your own copy of A Fool and His Money and figure them out on your own. Have fun!


Course Material Authors: D. Gries, L. Lee, S. Marschner, & W. White (over the years)