CS5430 Homework 5: Password Strength Classifier

General Instructions. You are expected to work alone on this assignment.

Due: March 24, 2020 at 11:59pm. No late assignments will be accepted.

Submit your solution using CMS.

Some websites indicate to users whether a password they choose is strong or weak. Your task in this assignment is to write a program that implements such a password classifier. Given a password as input, it should classify that password as either strong or weak.
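As a toy illustration of the kind of rule such classifiers sometimes apply, consider counting how many character classes a password draws from. The sketch below is hypothetical and not a recommended design -- your actual rules should come from the literature you study in Part 1:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count how many character classes (lowercase,
# uppercase, digit, other) a password draws from. This is only an
# illustration of one possible heuristic, not a suggested classifier.
charclasses() {
  local p="$1" n=0
  case "$p" in *[a-z]*) n=$((n + 1)) ;; esac
  case "$p" in *[A-Z]*) n=$((n + 1)) ;; esac
  case "$p" in *[0-9]*) n=$((n + 1)) ;; esac
  case "$p" in *[!a-zA-Z0-9]*) n=$((n + 1)) ;; esac
  printf '%s\n' "$n"
}
```

Heuristics of this composition-rule shape are questioned in parts of the research literature, which is exactly the kind of finding Part 1 should surface.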

The project involves three parts, each of which contributes to your overall grade.

Part 1. Study the Literature

Learn about prior work by studying the research literature. Use a search engine to find recent and/or influential research papers on this topic. Such papers are likely to have appeared in conferences like "IEEE Symposium on Security and Privacy", "USENIX Security", and "ACM Conference on Computer and Communications Security".

There will be many papers -- too many for you to read in the limited time available. By reading the abstract and introduction of a paper, though, you should be able to ascertain whether that paper will be useful for this assignment. A paper will be useful if it suggests an approach for evaluating password strength or if it explains why some previously proposed approach is flawed. Skip papers that focus on machine learning, because that approach has been ruled out for this assignment.

Document your investigations in this part by listing, for between 3 and 5 papers, the following.

The write-up for each paper (typeset in a 10-point font and in a document called priorwork.pdf) will likely be between half a page and a full page --- certainly no more than 2 pages.

Part 2. Design Documentation

Write a design document (to be called design.pdf) that discusses your classifier's implementation and explains how any findings from the literature (documented in Part 1) influenced the design. One sensible way to structure this document would be to use the following sections.

Part 3. Actual Implementation

Implement your classifier based on the description you gave in design.pdf from Part 2. We will execute this implementation on some test files. We may also inspect the code.

Your script takes two inputs:

  1. the path to an input .txt file containing one or more passwords to classify, one password per line; the file must end with a newline.

  2. the path to an output .txt file where your program will write a classification for each password it reads from the input file, in the order read. For each password, your program should write either strong or weak, followed by a newline, according to the strength of that password. The output file should end with a newline. An automated grading script will be used to test your classifier, and that script treats any other output as incorrect.

As an example, for a given input file passwords.txt stored in directory ~/inputs/ with the following contents:

  i love you 
your program would be called as follows:
  $ ./classify.sh ~/inputs/passwords.txt ~/outputs/classified_passwords.txt
and would produce a corresponding output file classified_passwords.txt stored in directory ~/outputs/ with the following contents:

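Putting the I/O contract above together, here is a minimal sketch of what classify.sh might look like. The length-12 threshold is a placeholder heuristic, not a suggested design; your real strength test belongs where that comparison sits:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of classify.sh. Reads one password per line from the
# input file ($1) and writes "strong" or "weak", one per line and in order,
# to the output file ($2). The length test is only a stand-in for a real
# strength metric.
classify_file() {
  local infile="$1" outfile="$2" password
  : > "$outfile"                      # truncate or create the output file
  while IFS= read -r password; do
    if [ "${#password}" -ge 12 ]; then
      printf 'strong\n' >> "$outfile"
    else
      printf 'weak\n' >> "$outfile"
    fi
  done < "$infile"
}

if [ "$#" -eq 2 ]; then
  classify_file "$1" "$2"
fi
```

Note that `IFS= read -r` preserves leading/trailing spaces and backslashes in passwords, and that writing exactly `strong` or `weak` plus a newline per line matches what the grading script expects.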
A setup script is allowed. You may provide a setup script setup.sh to perform any initial compilation or configuration needed prior to execution of your program on test data. This script may download data files (e.g. wordlists). But note that any wordlists we use in generating test cases will be those we could freely download---not wordlists for which payment is required. So you have no motivation to pay for wordlists.
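A minimal sketch of such a setup.sh follows. WORDLIST_URL is a placeholder, not a real resource; substitute any freely downloadable list:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of setup.sh: fetch a freely downloadable wordlist once,
# before the grader runs the classifier. WORDLIST_URL is a placeholder.
set -e
WORDLIST=wordlist.txt
if [ ! -f "$WORDLIST" ]; then
  # curl -fsSL -o "$WORDLIST" "$WORDLIST_URL"   # placeholder download
  :                                             # no-op until a real URL is chosen
fi
```

The classifier could then flag any password that appears verbatim in the list, e.g. with `grep -Fxq -- "$password" wordlist.txt`.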

There are many password classifiers available for download on the web. Needless to say, you must not consult them when writing your system, and your system must not download them as part of the files it fetches prior to execution. To do otherwise would be a serious violation of academic integrity.

Target Environment. You may develop your system anywhere, but we will grade it by running it on the Linux hosts in the UGCLab (ugclinux.cs.cornell.edu). So use a programming or scripting language available within that environment, and use the UGCLab Linux hosts to test what you will submit.

Submissions that do not run on the Linux hosts in UGCLab will receive no credit for executing correctly. Visit the UGCLab and test your system before you submit it, leaving plenty of time to make changes that may be needed.

Submission and Grading

What to submit. CMS will be set up for submissions of various elements, as follows.

Notes on Grading. Here is a rough breakdown of the relative importance of your submission.

Note that a particularly "bad" classifier could be penalized twice: once because the prior work was poorly chosen, and again because those methods led to bad performance.

Resources. For a refresher on bash scripting, see the CS 2043 lectures on that topic.