CS 2110 Homework Assignment 1.  Due: Wed 9/9 11:59pm

New! Assignment 1 FAQ

New! Assignment 1 Test Suite (please run before submitting your work!)

Reading the Animals data files

For this assignment, you must install the JDK and Eclipse on your machine, copy the gzip files containing the animal data and image files to your machine, and extract them into a working directory. The data files have names like A12.dat and the images have names like Fuzzy_Tribble.bmp. During development you may want to start with the fake animal data in the provided ZIP file, whose DNA sequences are simplified. Once things are running, we'll work with the real pictures and data files.

You will write a Java program that takes a list of animal data file names as its input on the command line. In Eclipse, you supply these through the « Run Configurations » menu item, which also lets you specify the folder in which your program will be executed. The run and debug configurations are normally the same, and you can name and save a configuration if you need more than one.

For example, you might run « my_hw1 A1.dat A2.dat A3.dat »
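A minimal sketch of that outer loop might look like the following. The class name `Hw1Main` and the `run` and `processFile` helpers are hypothetical (the handout lets you name your classes however you want), and `processFile` is only a placeholder for the per-file steps listed next.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry point: the handout does not prescribe a main class name.
class Hw1Main {
    public static void main(String[] args) {
        // e.g. arguments "A1.dat A2.dat A3.dat" from the Eclipse run configuration
        for (String name : run(args)) {
            System.err.println("Skipped unreadable file: " + name);
        }
    }

    // Visits each command-line file name in order and returns the names that
    // could not be read, so one bad argument does not abort the whole run.
    static List<String> run(String[] fileNames) {
        List<String> skipped = new ArrayList<>();
        for (String name : fileNames) {
            Path path = Paths.get(name); // bare names resolve against the working directory
            if (!Files.isReadable(path)) {
                skipped.add(name);
                continue;
            }
            processFile(path);
        }
        return skipped;
    }

    // Placeholder for the per-file steps (read, parse, extract genes, store, close).
    static void processFile(Path path) {
        System.out.println("Processing " + path);
    }
}
```

Collecting skipped names instead of stopping at the first bad argument is just one reasonable choice; the assignment does not specify what to do with a missing file.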

Your program will loop over the list of input file names, and for each must:

1)      Open and read the file line by line

2)      Parse each line, extracting its « command ». Our data files have the form:

Command=”value”

This is the list of commands in use:

Name=”name of this animal”

LatinName=”name of this animal in scientific gibberish”

Image=”name of the file where the image for this animal is found”

DNA=”string of genetic symbols (A C I O)”

Note that the equal sign and quotes aren’t part of the value. Also keep in mind that for an unqualified file pathname like the ones here, your program will look for the file in the directory where it is executed. In Eclipse, you choose that working directory in the "Run Configurations..." dialog, found under the "Run" menu. You can also specify the arguments to your program's main method there.

For example:

Name=”Biscuit”

LatinName=”Retrieverus_Aurum”

Image=”Biscuit.bmp”

DNA=”AIOCOICAIAIAOOCA”

(The lines above are the contents of "Biscuit.dat". The corresponding image file, "Biscuit.bmp", is a picture of a golden retriever.)

Obviously, for the animals in our real data set, the DNA will be much longer than the string above.

3)      Next, parse each DNA string into its genes, storing them in (for example) an ArrayList so that you can iterate over the gene substrings. Recall that a gene starts after an AOC and ends with an ICA. Note that AOC can appear inside a gene, and ICA can appear in the junk DNA region. The AOC and ICA are not part of the gene.

4)      Store each gene in a HashMap so that any given gene is inserted only once. Number the genes sequentially, starting with zero, in the order in which they are first encountered. (Note: You may use something other than a HashMap, but if you do, you must explain your design decision! A HashMap is really the best structure for this.)

5)      Close the file when EOF is reached. We recommend that you not do this in a finalize() method: as we heard in class, there is no way to know when a finalize() method will be called, and in some implementations of Java it might not get called at all.
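Steps 1 through 5 can be sketched roughly as follows. This is not the SpeciesReader, DNAParser or Genome interface from CMS; the class and method names are hypothetical, and the sketch assumes that a gene ends at the first ICA after its AOC and that a value may be wrapped in either straight or curly quotes. Reading from a `BufferedReader` lets the same code run on a real file (via `Files.newBufferedReader(Paths.get(fileName))`) or on an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 1-5; these are not the CMS interfaces.
class AnimalPipelineSketch {

    // Step 2: split a line of the form Command="value" into (command, value).
    // Returns null if the line does not have that shape. Accepting curly
    // quotes as well as straight ones is an assumption about the data files.
    static Map.Entry<String, String> parseCommand(String line) {
        if (line == null) return null;
        int eq = line.indexOf('=');
        if (eq <= 0) return null;
        String command = line.substring(0, eq).trim();
        String quoted = line.substring(eq + 1).trim();
        int n = quoted.length();
        if (n < 2 || !isQuote(quoted.charAt(0)) || !isQuote(quoted.charAt(n - 1))) return null;
        return new AbstractMap.SimpleImmutableEntry<>(command, quoted.substring(1, n - 1));
    }

    private static boolean isQuote(char c) {
        return c == '"' || c == '\u201C' || c == '\u201D';
    }

    // Steps 1 and 5: read line by line until EOF. try-with-resources closes the
    // reader when the loop ends (or an exception escapes), with no finalize().
    static Map<String, String> readCommands(BufferedReader in) throws IOException {
        Map<String, String> commands = new HashMap<>();
        try (BufferedReader reader = in) {
            String line;
            while ((line = reader.readLine()) != null) {
                Map.Entry<String, String> pair = parseCommand(line);
                if (pair != null) commands.put(pair.getKey(), pair.getValue());
            }
        }
        return commands;
    }

    // Step 3: a gene is the text between an AOC and the next ICA (assumption:
    // the first ICA ends the gene). An AOC inside a gene is just gene content,
    // and an ICA in junk DNA is skipped because ICA is only searched after an AOC.
    static List<String> extractGenes(String dna) {
        List<String> genes = new ArrayList<>();
        if (dna == null) return genes;
        int pos = 0;
        while (true) {
            int start = dna.indexOf("AOC", pos);
            if (start < 0) break;
            int end = dna.indexOf("ICA", start + 3);
            if (end < 0) break; // no closing ICA: treat the rest as junk
            genes.add(dna.substring(start + 3, end));
            pos = end + 3;
        }
        return genes;
    }

    // Step 4: insert each distinct gene once, numbered from zero in the order
    // it was first seen. size() is evaluated before the insertion happens.
    static void addGenes(Map<String, Integer> genome, List<String> genes) {
        for (String gene : genes) {
            genome.putIfAbsent(gene, genome.size());
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "Name=\"Biscuit\"\nLatinName=\"Retrieverus_Aurum\"\n"
                + "Image=\"Biscuit.bmp\"\nDNA=\"AIOCOICAIAIAOOCA\"\n";
        Map<String, String> commands = readCommands(new BufferedReader(new StringReader(data)));
        Map<String, Integer> genome = new HashMap<>();
        addGenes(genome, extractGenes(commands.get("DNA")));
        System.out.println(commands.get("Name") + " has " + genome.size() + " distinct genes");
    }
}
```

Note that the short Biscuit DNA string contains no AOC (only a junk ICA), so it yields no genes and the genome stays empty; the real data files are needed to see genes appear.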

How we plan to test your code

We’ll be using an automated testing program that takes specific classes from your solution and subjects them to various tests with our own inputs. For this to work, you need to implement the classes using the interfaces we’ve uploaded on CMS, including the constructors. Unfortunately, the constructors can’t be included in the interfaces, because Java does not allow an interface to declare a constructor, so we’ve specified the required constructor signatures in the comments.
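To see why the documented constructor signature matters, here is a hypothetical sketch of one way a grader could create your objects while knowing only a class name: Java reflection (`Class.forName` plus `getConstructor`). The interface `GeneNumbering`, the class `MyGeneNumbering` and the public no-argument constructor are illustrative stand-ins, not the real CMS interfaces or their required constructors, and the actual test program may work differently.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: it can declare methods, but no constructor.
interface GeneNumbering {
    int numberOf(String gene);
}

// Hypothetical implementation; the constructor signature is the part
// an interface cannot enforce, so it must match what the grader expects.
class MyGeneNumbering implements GeneNumbering {
    private final Map<String, Integer> ids = new HashMap<>();

    public MyGeneNumbering() { }

    // Returns the gene's number, assigning the next one (from zero) if new.
    @Override
    public int numberOf(String gene) {
        Integer id = ids.get(gene);
        if (id == null) {
            id = ids.size();
            ids.put(gene, id);
        }
        return id;
    }
}

class ReflectiveGraderSketch {
    // Loads a class by name and calls its public no-argument constructor.
    // This fails at run time if the constructor the grader expects is missing.
    static GeneNumbering instantiate(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        Constructor<?> ctor = cls.getConstructor();
        return (GeneNumbering) ctor.newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        GeneNumbering g = instantiate(MyGeneNumbering.class.getName());
        System.out.println(g.numberOf("IO") + " " + g.numberOf("AOCO") + " " + g.numberOf("IO"));
    }
}
```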

We basically extract your class implementations, compile them against a test program of our own that lacks implementations of these classes but uses the same interfaces, and then call your methods with various arguments to see whether they do the right things. This is called “JUnit testing”, after JUnit, a unit-testing framework for Java that Eclipse supports out of the box. Most professional Java development uses this approach, so this is good practice for working in teams if you write code as part of your future career.

What classes we’ll be testing

We’ll be using your version of the class called constants, plus your implementations of SpeciesReader, DNAParser and Genome. We will use our own main procedure in our own main class. We’ll also read your code by hand to make sure it is well written and well commented, and (where we required an ArrayList or a HashMap) that it actually was implemented the way we requested.

But there isn’t any output!

Obviously, to develop and test your program, you’ll need to print all sorts of things on the console. Our test procedures won’t be looking at output this time, so what to print while debugging is up to you. However, we prefer that the classes themselves not print anything, because our test procedures aren’t set up to deal with console output. So if you have console print statements in your code, please guard each one with an if statement that tests a boolean flag (for example, “constants.verbose_mode”). You can set this flag to true while developing and change it to false before handing in your work. Keep in mind that we’ll be using YOUR version of the constants class.
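A minimal sketch of such a flag, assuming your constants class holds it as a static boolean. The field name follows the example above; the `log` helper is hypothetical. Declaring the flag `static final` makes it a compile-time constant, so when it is false the compiler can drop the guarded print statements entirely.

```java
// Sketch of a constants class with a debugging flag. The lowercase class name
// and the field name follow the handout's example; log() is a hypothetical helper.
class constants {
    // Set to true while developing; change to false before handing in.
    public static final boolean verbose_mode = false;

    private constants() { } // holds shared settings only; never instantiated

    // Prints only when verbose_mode is on, so graded runs stay silent.
    public static void log(String message) {
        if (verbose_mode) {
            System.out.println(message);
        }
    }
}
```

Elsewhere in your code, a call such as `constants.log("parsed " + genes.size() + " genes")` then replaces a bare `System.out.println`.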

What about the input?

For your tests, the main procedure you code should take its input as command-line arguments (most easily supplied via the Eclipse run configuration). When we test your code, we’ll run it in a folder that contains .dat files (perhaps not the same ones you used to develop and test your code!), and we’ll pass in the file names we want you to use as inputs to SpeciesReader. In effect, we’ll compile your code into a program of our own and then check that this combination follows all the rules and does what we expect it to do.
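Since the file names arrive as bare names like A12.dat, they are resolved relative to the directory the program runs in, as the note about unqualified pathnames earlier in this handout explains. A small sketch (the class and method names are hypothetical) that shows where a given name will be looked up:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class WorkingDirSketch {
    // A bare name like "A12.dat" is a relative path, so Java resolves it against
    // the directory the JVM was started in (the "user.dir" system property).
    static Path resolveInput(String fileName) {
        return Paths.get(fileName).toAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        for (String name : args) {
            System.out.println(name + " -> " + resolveInput(name));
        }
    }
}
```

Running this with the same run configuration as your real program is a quick way to check that the working directory points at the folder holding the .dat and .bmp files.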

Some students feel it is unfair when their code works on all the data in our given data set, yet our more aggressive testing exposes a bug. But this is the real world of real programs: once you write a program and put it out there, you don’t know what inputs it will be executed on. Those inputs could contain errors, such as null strings or very long strings, and a well-written program should protect itself against bad inputs.
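One way to act on this advice, sketched under the assumption that valid DNA uses only the four uppercase symbols A, C, I and O: reject null and unexpected symbols explicitly, and scan with a loop rather than recursion so that very long strings cannot overflow the stack. The class and method names are hypothetical, and whether bad input should throw, be skipped, or yield an empty result is a design decision; check the interface comments on CMS for any required behavior.

```java
// Hypothetical helper illustrating defensive checks; not one of the CMS interfaces.
class DefensiveParsingSketch {

    // Assumption: valid DNA contains only the symbols A, C, I and O.
    static boolean isValidDna(String dna) {
        if (dna == null) return false;
        for (int i = 0; i < dna.length(); i++) {
            char c = dna.charAt(i);
            if (c != 'A' && c != 'C' && c != 'I' && c != 'O') return false;
        }
        return true;
    }

    // Counts genes (AOC ... ICA) with an iterative left-to-right scan. There is no
    // recursion and no string building, so even very long inputs take roughly
    // linear time and constant stack space.
    static int countGenes(String dna) {
        if (!isValidDna(dna)) {
            throw new IllegalArgumentException("DNA must be non-null and use only A, C, I, O");
        }
        int count = 0;
        int pos = 0;
        while (true) {
            int start = dna.indexOf("AOC", pos);
            if (start < 0) return count;
            int end = dna.indexOf("ICA", start + 3);
            if (end < 0) return count; // unterminated gene is ignored
            count++;
            pos = end + 3;
        }
    }
}
```

Throwing `IllegalArgumentException` makes the failure explicit; returning zero or an empty list would also be defensible, as long as the choice is deliberate and commented.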

The interfaces we want you to support

For this assignment, you must write implementations of three interfaces:

SpeciesReader: Reads commands and values from a data file

DNAParser: Parses genes out of a DNA string

Genome: Stores a collection of genes and assigns them unique numbers

These interfaces are contained in cs2110_a1.jar (see "How to add libraries to your Eclipse project").

What to turn in

Structure your code so that all of your classes belong to the package netid.assignment1, where netid is your Net ID.

For example, if your NetID is "foo123" your Eclipse workspace might contain classes:

foo123.assignment1.MySpeciesReader
foo123.assignment1.MyDNAParser
foo123.assignment1.MyGenome

...along with any other classes you create.  Your classes can be named however you want.

You will submit a source jar to CMS.  A source jar bundles a package's .java sources and compiled .class files.

To create a source jar in Eclipse:

1. Right-click on your package in the Package Explorer tab and select "Export..."
2. Choose to export a JAR File
3. Be sure to check both "Export generated class files and resources" and "Export Java source files and resources"
4. Click Finish

Submit this .jar file to CMS.


CMS submission is now open! Please run the test suite on your .jar before submitting!