CS5430 Homework 5: Password Strength Classifier

General Instructions. You are expected to work alone on this assignment.

Due: March 24, 2020 at 11:59pm. No late assignments will be accepted.

Submit your solution using CMS.

Some websites indicate to users whether a password they choose is strong or weak. Your task in this assignment is to write a program that implements such a password classifier. Given a password as input, it should classify that password as either strong or weak.
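As a toy illustration of the kind of rule such classifiers sometimes apply, consider counting how many character classes a password draws from. The sketch below is hypothetical and not a recommended design -- your actual rules should come from the literature you study in Part 1:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count how many character classes (lowercase,
# uppercase, digit, other) a password draws from. This is only an
# illustration of one possible heuristic, not a suggested classifier.
charclasses() {
  local p="$1" n=0
  case "$p" in *[a-z]*) n=$((n + 1)) ;; esac
  case "$p" in *[A-Z]*) n=$((n + 1)) ;; esac
  case "$p" in *[0-9]*) n=$((n + 1)) ;; esac
  case "$p" in *[!a-zA-Z0-9]*) n=$((n + 1)) ;; esac
  printf '%s\n' "$n"
}
```

Heuristics of this composition-rule shape are questioned in parts of the research literature, which is exactly the kind of finding Part 1 should surface.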

The project involves three parts, each of which contributes to your overall grade.

Part 1. Study the Literature

Learn about prior work by studying the research literature. Use a search engine to find recent and/or influential research papers on this topic. Such papers are likely to have appeared in conferences like "IEEE Symposium on Security and Privacy", "USENIX Security", and "ACM Conference on Computer and Communications Security".

There will be many papers -- too many for you to read in the limited time available. By reading the abstract and introduction of a paper, though, you should be able to ascertain whether that paper will be useful for this assignment. A paper will be useful if it suggests an approach for evaluating password strength or if it explains why some previously proposed approach is flawed. Skip papers that focus on machine learning, because that approach has been ruled out for this assignment.

Document your investigations in this part by listing, for between 3 and 5 papers, the following.

The write-up for each paper (typeset in a 10-point font and in a document called priorwork.pdf) will likely be between half a page and a full page --- certainly no more than 2 pages.

Part 2. Design Documentation

Write a design document (to be called design.pdf) that discusses your classifier's implementation and explains how any findings from the literature (documented in Part 1) influenced the design. One sensible way to structure this document would be to use the following sections.

Part 3. Actual Implementation

Implement your classifier based on the description you gave in design.pdf from Part 2. We will execute this implementation on some test files. We may also inspect the code.

Your script takes two inputs:

  1. the path to an input .txt file containing one or more passwords to classify, one password per line; the file must end with a newline.

  2. the path to an output .txt file where your program will write a classification for each password it reads from the input file, in the order read. For each password, your program should write either strong or weak, followed by a newline, according to the strength of that password. The output file should end with a newline. An automated grading script will be used to test your classifier, and that script treats any other output as incorrect.

As an example, for a given input file passwords.txt stored in directory ~/inputs/ with the following contents:

  i love you 
your program would be called as follows:
  $ ./classify.sh ~/inputs/passwords.txt ~/outputs/classified_passwords.txt
and would produce a corresponding output file classified_passwords.txt stored in directory ~/outputs/ with the following contents:

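Putting the I/O contract above together, here is a minimal sketch of what classify.sh might look like. The length-12 threshold is a placeholder heuristic, not a suggested design; your real strength test belongs where that comparison sits:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of classify.sh. Reads one password per line from the
# input file ($1) and writes "strong" or "weak", one per line and in order,
# to the output file ($2). The length test is only a stand-in for a real
# strength metric.
classify_file() {
  local infile="$1" outfile="$2" password
  : > "$outfile"                      # truncate or create the output file
  while IFS= read -r password; do
    if [ "${#password}" -ge 12 ]; then
      printf 'strong\n' >> "$outfile"
    else
      printf 'weak\n' >> "$outfile"
    fi
  done < "$infile"
}

if [ "$#" -eq 2 ]; then
  classify_file "$1" "$2"
fi
```

Note that `IFS= read -r` preserves leading/trailing spaces and backslashes in passwords, and that writing exactly `strong` or `weak` plus a newline per line matches what the grading script expects.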
A setup script is allowed. You may provide a setup script setup.sh to perform any initial compilation or configuration needed prior to execution of your program on test data. This script may download data files (e.g. wordlists). But note that any wordlists we use in generating test cases will be those we could freely download---not wordlists for which payment is required. So you have no motivation to pay for wordlists.
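A minimal sketch of such a setup.sh follows. WORDLIST_URL is a placeholder, not a real resource; substitute any freely downloadable list:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of setup.sh: fetch a freely downloadable wordlist once,
# before the grader runs the classifier. WORDLIST_URL is a placeholder.
set -e
WORDLIST=wordlist.txt
if [ ! -f "$WORDLIST" ]; then
  # curl -fsSL -o "$WORDLIST" "$WORDLIST_URL"   # placeholder download
  :                                             # no-op until a real URL is chosen
fi
```

The classifier could then flag any password that appears verbatim in the list, e.g. with `grep -Fxq -- "$password" wordlist.txt`.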

There are many password classifiers available for download on the web. Needless to say, you must not consult them when writing your system, and your system must not download them as part of the files it fetches prior to execution. To do otherwise would be a serious violation of academic integrity.

Target Environment. You may develop your system anywhere, but we will grade it by running it on the Linux hosts in the UGCLab (ugclinux.cs.cornell.edu). So use a programming or scripting language available within that environment, and use the UGCLab Linux hosts to test what you will submit.

Submissions that do not run on the Linux hosts in UGCLab will receive no credit for executing correctly. Visit the UGCLab and test your system before you submit it, leaving plenty of time to make changes that may be needed.

Submission and Grading

What to submit. CMS will be set up for submissions of various elements, as follows.

Notes on Grading. Here is a rough breakdown of the relative importance of your submission.

Note that a particularly "bad" classifier could be penalized twice: once because the prior work was poorly chosen, and again because those methods led to bad performance.

Resources. For a refresher on bash scripting, see the CS 2043 lectures on that topic.