cs2110.assignment1
Interface DNAParser


public interface DNAParser

Utility interface for extracting genes from raw DNA strings.

         Parses a DNA string, ignoring (discarding) junk DNA.
 
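         Your implementing class should also provide a constructor that takes
         the DNA string. It is not shown here because an interface cannot
         declare a constructor: public DNAParser(String DNA);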
 
         Recall that a DNA string is a very long list of characters from the
         set {I,T,Y,A}. A gene is a substring that starts with the
         three-character sequence IAY and ends with TYI. Note that IAY could
         also occur inside the gene, and this is perfectly legal. Similarly,
         TYI could occur in the junk region, and that would be legal too.
 
         A typical DNA string will contain between 5 and 15 genes and a
         typical gene will be 150 to 500 characters in length, and no gene
         will ever be longer than 999 characters. Just the same, we recommend
         that you write your code so that violations of these rough limits
         won't cause any problems or buggy behavior.
 
         The ideal tool for parsing genes from DNA is regular expressions. For
         this assignment, you don't have to know how to use them, but in the
         real world a regex is the fastest and most secure solution.
         www.Rubular.com is a great tool for developing regular expressions.
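
         As an illustration of the regular-expression approach mentioned
         above, here is a minimal sketch. It returns gene bodies as plain
         strings because the Gene class is not described on this page; the
         class name RegexGeneSketch and the method extract are hypothetical
         and are not part of the assignment API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the assignment API.
final class RegexGeneSketch {
    // IAY, then as few characters as possible (reluctant .*?), then TYI.
    // Group 1 captures the gene body without the start and end sequences.
    private static final Pattern GENE = Pattern.compile("IAY(.*?)TYI");

    static List<String> extract(String dna) {
        List<String> genes = new ArrayList<>();
        Matcher m = GENE.matcher(dna);
        // find() scans left to right and resumes after the previous match,
        // so an IAY inside a gene never starts a second gene.
        while (m.find()) {
            String body = m.group(1);
            if (!body.isEmpty()) {   // ignore empty genes such as IAYTYI
                genes.add(body);
            }
        }
        return genes;
    }
}
```

         The reluctant quantifier is what makes the next TYI, rather than the
         last one in the string, terminate the gene.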
 

Author:
CS2110 Course Staff

Method Summary
 java.util.List<Gene> parse()
          Parses the DNA string previously set by setDNA().
 void setDNA(java.lang.String DNA)
          Sets the DNA string to be parsed.
 

Method Detail

setDNA

void setDNA(java.lang.String DNA)
Sets the DNA string to be parsed.

Parameters:
DNA - A string of amino acids I,T,Y,A

parse

java.util.List<Gene> parse()
Parses the DNA string previously set by setDNA().
 Rules for parsing DNA are as follows:
 - Parse from left to right.
 - When you encounter the start sequence IAY, begin a gene.
 - The next occurrence of the end sequence TYI terminates the gene.
 - If already inside a gene, IAY does not start another gene.
 - The start and end sequences are not a part of the gene.
 - Ignore any empty genes.
 - A string of DNA may contain duplicate genes. In this case, the parser
   does not remove duplicates before returning.
 

Returns:
A list of genes found in the string.
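
The parsing rules for parse() can be sketched without regular expressions using String.indexOf. This is only one possible reading of the rules, not the reference solution. It returns gene bodies as strings because the Gene class is not described here, and the names ScanGeneSketch and extract are hypothetical. Two behaviors are assumptions the rules do not spell out: a gene with no terminating TYI is discarded, and scanning resumes after the end sequence, so the final I of TYI is not reused as the first character of a following IAY.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the assignment API.
final class ScanGeneSketch {
    private static final String START = "IAY";
    private static final String END = "TYI";

    static List<String> extract(String dna) {
        List<String> genes = new ArrayList<>();
        int pos = 0;
        while (true) {
            // Skip junk until the next start sequence.
            int start = dna.indexOf(START, pos);
            if (start < 0) {
                break;
            }
            int bodyStart = start + START.length();
            // The next TYI ends the gene; any IAY before it is part of the body.
            int end = dna.indexOf(END, bodyStart);
            if (end < 0) {
                break;  // assumption: an unterminated gene is discarded
            }
            if (end > bodyStart) {  // ignore empty genes
                genes.add(dna.substring(bodyStart, end));
            }
            // Assumption: resume scanning after the end sequence.
            pos = end + END.length();
        }
        return genes;  // duplicates are kept
    }
}
```

For example, extract("IAYAATYITTIAYAATYI") yields a list with two entries, both "AA", since duplicates are not removed.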