CS 2110 Homework Assignment 1. Due: Wed 9/9, 11:59pm
Reading the Animals data files
For this assignment, you must install the JDK and Eclipse on your machine, copy the gzip files containing the animals data and image files to your machine, and extract them into a working directory. The data files have names like A12.dat, and the images have names like Fuzzy_Tribble.bmp.
You will write a Java program that takes a list of animal data file names as its input on the command line. In Eclipse, you supply these through the « Run Configurations » menu item, which also lets you specify the folder in which your program will be executed. The run and debug configurations are normally the same, and you can name and save a configuration if you need more than one.
For example, you might run « my_hw1 A1.dat A2.dat A3.dat ».
Your program will loop over the list of input file names and, for each one, must:
1) Open and read the file line by line.
2) Parse each line, extracting the associated « command ». Our data files have the form:
Command="value"
This is the list of commands in use:
Name="name of this animal"
LatinName="name of this animal in scientific gibberish"
Image="name of the file where the image for this animal is found"
DNA="string of genetic symbols (A C I O)"
Note that the equal sign and quotes aren’t part of the value.
For example:
Name="Biscuit"
LatinName="Retrieverus_Aurum"
Image="Biscuit.bmp"
DNA="AIOCOICAIAIAOOCA"
[Image: Biscuit.bmp]
Obviously, for the animals in our real data set, the DNA will be much longer than the string above.
3) Now parse each DNA string, using (for example) an ArrayList so that you can iterate over the gene substrings within it. Recall that a gene starts after an AOC and ends with an ICA. Note that AOC can appear inside a gene, and ICA can appear in the junk DNA region: once a gene has started, only an ICA ends it, and outside a gene, only an AOC matters. The AOC and ICA are not part of the gene.
4) Store each gene in a HashMap, such that any single gene is inserted into the HashMap only once. Number the genes sequentially, starting with zero, in the order in which they are encountered. (Note: You may use something other than a HashMap, but if you do so, you must explain your design decision! A HashMap is really the best structure for this.)
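Steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the required interface implementation: the classes AnimalSketch and GeneTable and their methods are hypothetical names, and the sketch assumes straight ASCII quotes, one Command="value" pair per line, and that an AOC with no later ICA (an unterminated gene) is discarded, which the assignment does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not the SpeciesReader/DNAParser interfaces from CMS.
class AnimalSketch {
    private static final String START = "AOC"; // marks the start of a gene
    private static final String STOP = "ICA";  // marks the end of a gene

    /** Splits a line of the form Command="value" into {command, value}; null if malformed. */
    static String[] parseLine(String line) {
        if (line == null) {
            return null;
        }
        int eq = line.indexOf('=');
        if (eq <= 0) {
            return null; // no command name before '='
        }
        String command = line.substring(0, eq).trim();
        String rest = line.substring(eq + 1).trim();
        // The value must be wrapped in quotes, which are not part of the value.
        if (rest.length() < 2 || !rest.startsWith("\"") || !rest.endsWith("\"")) {
            return null;
        }
        return new String[] {command, rest.substring(1, rest.length() - 1)};
    }

    /**
     * Returns the genes in dna, in order. A gene starts after an AOC and ends
     * before the next ICA. Searching only for the marker we currently need means
     * an AOC inside a gene and an ICA in junk DNA are both ignored.
     */
    static List<String> extractGenes(String dna) {
        List<String> genes = new ArrayList<>();
        if (dna == null) {
            return genes;
        }
        int pos = 0;
        while (true) {
            int start = dna.indexOf(START, pos);
            if (start < 0) {
                break; // no more genes
            }
            int geneStart = start + START.length();
            int end = dna.indexOf(STOP, geneStart);
            if (end < 0) {
                break; // assumption: an unterminated gene is discarded
            }
            genes.add(dna.substring(geneStart, end));
            pos = end + STOP.length(); // resume scanning in junk DNA
        }
        return genes;
    }
}

// Hypothetical gene store: each distinct gene gets the next number, starting at 0.
class GeneTable {
    private final Map<String, Integer> numbers = new HashMap<>();

    /** Returns the gene's number, inserting the gene first if it is new. */
    int add(String gene) {
        Integer existing = numbers.get(gene);
        if (existing != null) {
            return existing; // already stored; keep its original number
        }
        int next = numbers.size(); // 0 for the first gene, 1 for the second, ...
        numbers.put(gene, next);
        return next;
    }

    int size() {
        return numbers.size();
    }
}
```

For instance, the short Biscuit string above contains no AOC, so it yields no genes. Because genes are never removed from the map, numbers.size() is always the next unused number, and the HashMap makes the "is this gene new?" check a constant expected-time lookup.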
How we plan to test your code
We’ll be using an automated testing program that takes specific classes from your solution and subjects them to various tests with our own inputs. For this to work, you need to implement the classes using the interfaces we’ve uploaded on CMS, including the constructor methods. Unfortunately, the constructors aren’t included in the interfaces, because Java does not allow an interface to declare a constructor, so we’ve specified the constructor signatures our test program requires in the comments.
We basically extract your class implementations, compile them against a test program of our own that lacks implementations of these classes but does use the same interfaces, and then call your methods with various arguments to see if they do the right things.
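The constructor restriction and the compile-against-interfaces setup can be illustrated with a hypothetical interface, GeneCounter, standing in for the real CMS interfaces (whose methods and required constructor signatures are given in their own comments). The test code names your concrete class only once, to construct it, and from then on calls it only through the interface, which is why your constructor must match the documented signature exactly.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface standing in for one of the CMS interfaces.
interface GeneCounter {
    void addGene(String gene);
    int distinctGenes();
}

// Your implementation. The constructor is declared here, because an
// interface cannot declare one.
class MyGeneCounter implements GeneCounter {
    private final Set<String> seen = new HashSet<>();
    private final String name;

    // Hypothetical signature; a grader would document the real one in comments.
    MyGeneCounter(String name) {
        this.name = name;
    }

    @Override
    public void addGene(String gene) {
        seen.add(gene); // a Set stores each distinct gene once
    }

    @Override
    public int distinctGenes() {
        return seen.size();
    }

    String name() {
        return name;
    }
}

// Stand-in for a grader's test code: it only ever uses the interface type.
class HarnessSketch {
    static int run(GeneCounter counter) {
        counter.addGene("AAOCI");
        counter.addGene("CC");
        counter.addGene("CC");
        return counter.distinctGenes();
    }
}
```

If MyGeneCounter lacked a constructor with the documented parameters, the grader's `new MyGeneCounter(...)` call would fail to compile, even though every interface method is implemented.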
This is called “JUnit testing”, and Eclipse supports it directly. Most Java development efforts use this approach, so this is good practice for working in teams if you code as part of your future career.
What classes we’ll be testing
We’ll be using your version of the class called constants, plus your implementations of SpeciesReader, DNAParser and Genome. We will use our own main procedure in our own main class. We’ll also read your code by hand to make sure it is well written and well commented and, where we required an ArrayList or a HashMap, that it actually was implemented the way we requested.
But there isn’t any output!
Obviously, to develop and test your program, you’ll need to print all sorts of things to the console. But our test procedures won’t be looking at output this time, so what you print while debugging is up to you. We prefer that the classes themselves not print anything, because our test procedures aren’t set up to deal with console output. So if you have console print statements in your code, please guard each one with an if statement that tests a boolean flag (for example, constants.verbose_mode). You can set this flag to true while developing and then change it to false before handing in your code. Keep in mind that we’ll be using YOUR version of the constants class.
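A minimal sketch of such a flag, using the names constants and verbose_mode from the text above; the DebugDemo class and its log helper are hypothetical. Declaring the flag final is a design choice here, so that it has exactly one value per build; a non-final static field would also work.

```java
// Project-wide settings. The graders use your copy of this class.
class constants {
    // Set to true while debugging; it must be false in the version you hand in.
    static final boolean verbose_mode = false;
}

class DebugDemo {
    // Prints only when verbose_mode is on, so graded runs produce no output.
    static void log(String message) {
        if (constants.verbose_mode) {
            System.out.println(message);
        }
    }
}
```

Routing every debug print through one helper like log means there is only a single flag to flip before submission.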
What about the input?
For your own tests, the main procedure you write should take its input as command-line arguments (most easily set via the Eclipse run configuration). When we test your code, we’ll do so in a folder that contains .dat files (perhaps not the same ones you used to develop and test your code!), and we’ll just pass in the file names we want you to use as inputs to SpeciesReader. In effect, we’ll take your code, compile it into a program of our own, and then check that this combination follows all the rules and does what we expect it to do.
Some students feel that it is unfair if their code works on all the data in our given data set and yet our more aggressive testing exposes a bug. Actually, though, this is the real world of real programs: once you write a program and put it out there, you don’t know what inputs it will be executed on. Those inputs could contain errors, such as null strings or very long strings, and a well-written program should protect itself against bad inputs.
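The following sketch shows a development-time main that loops over its command-line arguments and reads each file line by line, skipping bad input instead of crashing: a null or empty file name, a missing or unreadable file, and lines that are not of the form Command="value". The class Hw1MainSketch and its readCommands method are hypothetical; in your solution the reading belongs in your SpeciesReader implementation. The sketch assumes UTF-8 (or plain ASCII) data files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class Hw1MainSketch {
    /**
     * Reads Command="value" lines from one file, skipping malformed lines.
     * If the file cannot be opened or read, returns whatever was read so far
     * (an empty map if nothing was), without printing anything.
     */
    static Map<String, String> readCommands(String fileName) {
        Map<String, String> commands = new LinkedHashMap<>();
        if (fileName == null || fileName.isEmpty()) {
            return commands;
        }
        try (BufferedReader in = Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) { // read line by line
                int eq = line.indexOf('=');
                if (eq <= 0) {
                    continue; // no "Command=" prefix
                }
                String value = line.substring(eq + 1).trim();
                if (value.length() < 2 || !value.startsWith("\"") || !value.endsWith("\"")) {
                    continue; // value is not wrapped in quotes
                }
                commands.put(line.substring(0, eq).trim(), value.substring(1, value.length() - 1));
            }
        } catch (IOException | InvalidPathException e) {
            // Missing, unreadable, or badly named file: keep going with what we have.
        }
        return commands;
    }

    public static void main(String[] args) {
        // Example program arguments: A1.dat A2.dat A3.dat
        for (String fileName : args) {
            Map<String, String> commands = readCommands(fileName);
            System.out.println(fileName + " -> " + commands.get("Name"));
        }
    }
}
```

In Eclipse, you would enter A1.dat A2.dat A3.dat as the program arguments in the run configuration and set its working directory to the folder holding the .dat files. readCommands uses a LinkedHashMap so commands come back in file order; if a command appears twice, the later value wins.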
The interfaces we want you to support
For this assignment, you must write implementations of three interfaces:
SpeciesReader | Reads commands and values from a data file
DNAParser | Parses genes out of a DNA string
Genome | Stores a collection of genes and assigns them unique numbers
These interfaces are contained in cs2110_a1.jar. (See: How to add libraries to your Eclipse project.)
What to turn in
Structure your code so that all of your classes belong to the package netid.assignment1, where netid is your NetID.
For example, if your NetID is "foo123", your Eclipse workspace might contain the classes:
foo123.assignment1.MySpeciesReader
foo123.assignment1.MyDNAParser
foo123.assignment1.MyGenome
...along with any other classes you create.
Your classes can be named however you want.
You will submit a source jar to CMS. A source jar bundles a package's .java sources and compiled .class files.
To create a source jar in Eclipse:
1. Right-click on your package in the Package Explorer tab and select "Export..."
2. Choose to export a JAR file.
3. Be sure to check "Export generated class files and resources" and "Export Java source files and resources".
4. Click Finish.
Submit this .jar file to CMS.
CMS submission is now open! Please run the test suite on your .jar before submitting!