**Deadline:** Friday, 03/31/17, 11:59 pm
**Special late policy:** Because of the snow day, the course schedule
had to slip. To give you maximum time to work
on the assignment, the revised deadline is the
day before Spring Break, which [officially][key-acad-dates]
begins Saturday, 04/01/17, 1:10 pm. So you may submit between
the deadline and the beginning of Spring Break for a penalty of 10%
(and the usual 20 min grace period will be in effect),
and no submissions will be accepted after the beginning of Spring Break.
*This assignment may be done as individuals or with one partner.*
## Problem 1 (25 pts)
*based on [Schneider, chapter 5, problem 5.3]*
You have been asked to determine the feasibility of conducting a
brute-force attack on passwords. Your goal is to estimate the resources
it would require for an attacker to produce a dictionary for use in an
offline guessing attack.
Assume that the attacker does not yet have the password database at the
time they construct the dictionary. Also assume that the attacker must
construct a complete dictionary with all the possible passwords and
salts. (Note that latter assumption rules out *rainbow tables* and other
probabilistic dictionaries.) And assume that the cryptographic hash
function `H` used below is SHA-512.
1. How much time and space would it take to create a complete dictionary
under both of the following assumptions?
- Passwords are chosen from this supposed list of [leaked MySpace
passwords](myspace.txt) [source: www.skullsecurity.org].
- The stored form of password `pwd` is `H(salt, pwd)`, where
the salt is four bytes. (That's how OS X Lion stores
2. How much time and space would it take to create a complete dictionary
under both of the following assumptions?
- Passwords are chosen from the space of all
eight-or-fewer character passwords over the alphabet containing
lower-case letters and digits (i.e., `a-z` and `0-9`).
- The stored form of password `pwd` is `H(pwd)`.
No salt is used.
3. Based on your answers above, which dictionary, (1) or (2), requires
more resources to compute?
4. Now suppose that the attacker instead did have the password database
(including salts) before computing the dictionary, and that the attacker's
goal was to crack only one particular user's password. Assume that the user
had indeed chosen their password in accordance with the ways given above.
Which scenario, (1) or (2), will require fewer computational resources
for the attacker to construct a complete dictionary to crack
that single user's password? Justify your answer.
5. Consider the following hypothetical scenario. You are a system administrator,
and your system supports two configurations for automatic password creation
and storage. Those configurations correspond to the two scenarios
above. That is, the system can either (1) randomly choose a leaked MySpace
password, assign it to a user, and store it with four-byte salt; or
(2) randomly choose an alphanumeric password of at most eight
characters, assign it to a user, and store it without salt. Only
one configuration may be active. The system does not allow users
to choose their own passwords. Which way would you configure the
system, (1) or (2)? Justify your choice, based in part on your
answers to the previous questions. *Hint: This is one of those questions
where there isn't necessarily a right answer, so your answer will be
evaluated primarily on the criteria you identify and how you apply them.*
To answer the questions above, you will need to do some programming and
experimentation to measure how much time and space it takes to hash and
store passwords to a dictionary file. Report on how you went about this
experimentation, including the kind of processor, machine, OS, etc.
Your answers for questions 1 and 2 should be broken into three parts:
1. Formulas for calculating the amount of time and space.
2. Your experimental measurements.
3. Time and space estimates constructed from the previous two parts.
**What to submit:** a file named `bruteforce.pdf` containing your answers to the
above questions. Include any important fragments of source code as part of
## Problem 2 (40 pts)
Rather than enforcing a particular password recipe, some websites will
instead indicate to users whether a new password they have
chosen is strong or weak. Your task is to program your own password
classifier that, given a password, classifies it as either strong or
weak. More nuanced classification—e.g., very strong, strong,
weak, very weak—could be possible, but we're considering only a binary
You may use any heuristics that you want to build your classifier,
including those we've covered in class as well as any you might discover
by doing independent research. You may not, however, reuse any code or
libraries specifically designed for password classification.
Any particular algorithms that you've identified by
independent research must be attributed to their proper inventor in
comments in your source code.
The executable for your program should be named `classify`.
Your program should read a string from standard input and write back to
standard output either the string `weak` or `strong`. Here are some
We will evaluate your classifier by running it against passwords that we
have previously classified ourselves. We will generate high-strength
passwords, which we will label `strong`, and low-strength passwords,
labeled `weak`. How will we generate and label them? Based on the
[work by Kelley et al.][kelley12] that we discussed in lecture. Our
high-strength passwords will be generated by recipes that their Figure 1
suggest are hard to crack, and likewise for low-strength passwords. *We
are not attempting to be tricky here. Your classification does not need
to be 100% identical to ours for you to get full credit on this problem.*
Your classifier must output exactly the string `weak` or the string `strong`
so that our automated scripts can test your classifier; any other outputs
will be deemed incorrect.
It is permissible for your program to perform an initial download of
static data files (e.g., wordlists) as part of installation or its first
execution. Any wordlists we happen to use in generation will be those we
could freely download, not wordlists for which payment is required. So
you have no motivation to pay for wordlists. Needless to say, using a
download to update your source code itself would be a serious violation
of academic integrity.
You're welcome to develop your classifier in any environment you wish.
But grading will take place in a uniform execution environment, which is
a Ubuntu 16.04 Desktop machine, as further specified below. So you will
either want to develop your classifier in that environment, or port and
test your classifier in it. If you opt for the latter (which is reasonable),
check up front that your language of choice, as well as any libraries,
are actually available in the grading environment.
You can install a virtual machine running Ubuntu 16.04 Desktop by following these
1. Install [VirtualBox][virtualbox].
2. Install [Vagrant][vagrant].
3. Run `vagrant init box-cutter/ubuntu1604-desktop`.
4. Run `vagrant up`. The VM will download and launch.
The username and password are both `vagrant`.
There is a known issue with Gnome Terminal in the VM, but `xterm` works
fine. If you wish to use Gnome Terminal, run these commands in `xterm`:
sudo locale-gen en_US en_US.UTF-8
sudo dpkg-reconfigure locales
The second command will cause a text interface to appear. On the first
screen, simply choose OK. On the second, change the default locale to
en_US.UTF-8. After the command finishes regenerating locales, log out
and log back in. Gnome Terminal should now be working.
You may use any programming or scripting language that can be installed
on the virtual machine you created according to the instructions above, as long
as that language can be installed by `apt-get`. For example, here are
some suitable choices:
* Python 2 and 3: already installed
* Perl 5: already installed
* Ruby: `sudo apt-get install ruby`
* Java 8: `sudo apt-get install openjdk-8-jdk`
* OCaml: `sudo apt-get install ocaml`
Before issuing any of the `apt-get` commands, make sure to first
run `sudo apt-get update`.
Provide a file named README.txt that documents the exact commands a
grader needs to run to install, configure, and run your program. For
example, if you use any packages installed by `apt-get`, `pip`, or
`opam`, then document in the README the commands that need to be
executed. Your documentation must be sufficiently clear that the grader
can get your classifier installed and running within a couple minutes of
active work. Documentation that is unclear may be penalized.
**What to submit:** an archive named `classifier.zip` with your source
code, README, and any additional data files your program needs for operation.
There will be a CMS-enforced limit of 10 MB on the size of this archive;
design your classifier with that constraint in mind.
If you work with a partner, first form a group on CMS; submit as that group,
rather than submitting the same solution independently.
Submit the files specified above to [CMS][cms].
For PDFs, use 10 point or larger type.
Be succinct; it's unlikely you will need to write more than a couple pages per problem.
You will be evaluated on the quality of your solutions and on your
adherence to the submission stipulations above. We'll use the following
criteria in evaluating quality:
- *Validity:* do you present a logical, lucid, coherent, clearly
focused, well structured, and appropriately detailed argument?
- *Consistency:* do you employ concepts, principles, and terminology
as they are used in this course?
- *Evidence:* do you adequately support your conclusions?
- *Writing:* do you use proper mechanics, grammar, and style?