CS 1132 lab exercise 5

Do all the problems first using Matlab, testing and debugging using your own analysis and test cases. Then submit your solutions for Problems 2 & 3 on Matlab Grader by Wednesday, October 13, at 2:30pm (before next week’s lab).

1. chars, not strings

Matlab has the type char and a newer type string. We will work with the basic type char only. A piece of text, e.g., a word, is stored as a vector of characters. For example, think of the text “key lime” as the length-8 vector ['k','e','y',' ','l','i','m','e']. Notice that a space is a character. Type each of the following statements in the Command Window and note the result.

a= pi;   % A numeric scalar
b= 'pi'  % A char array. Use SINGLE quotes to enclose a char or multiple chars

c= length(b)            % __________  b is an array, so one can use function length on it

d= ['apple '  b  'es']  % Vector concatenation. d should be the string 'apple pies'

e= [d; 'muffin']        % __________________________________________
e= [d; 'mmmuffins ']    % Note the two extra 'm's and one trailing space

[nr,nc]= size(e)        % _____________ e is a matrix, so one can use function size on it

f= e(1, 7:9)            % _________________________________________ Accessing a subarray

e(1, 7:10)= 'core'      % _________________________________________

g= ones(2,3)*67;        % A NUMERIC 2-by-3 matrix, each component has the value 67

h= char(g)              % ________________________________ Character with ASCII code 67

i= double(h)            % ______________________________ Get ASCII code of the character 

jj= char(floor(rand(1)*26) + 'A')  % ________________________ A random upper case letter

k= jj>='a' && jj<='z'   % __________ True or false: character stored in jj is lower case

L= jj - 'A'             % __________ "Distance" of a character from 'A'

M= jj - 'A' + 1         % ______ A convenient way to use letters as indices from 1 to 26.
                        % If jj is 'A', then M is 1; if jj is 'B', then M is 2; ...

n= strcmp('abcd', 'ab') % ________________________________ strcmp compares the arguments

o= 'abcd'=='ab'         % ERROR: attempted vectorized code on vectors of different lengths

p= 'abcd'=='abCd'       % __________________________ Vectorized code--result is a vector

q= sum('abcd'=='abCd')  % __________________________ The number of matches

r= sum('abcd'~='abCd')  % __________________________ The number of mismatches
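
The last few statements suggest a way to test whether two char vectors of EQUAL length are identical without calling strcmp. The following is a minimal sketch of that idea; the variable names s1, s2, matches, nMatch, and isSame are made up for illustration.

```matlab
% Compare two char vectors of EQUAL length without strcmp.
s1 = 'GATTACA';
s2 = 'GATTTCA';                     % differs from s1 at position 5 only

matches = (s1 == s2);               % logical vector: 1 where characters agree
nMatch  = sum(matches);             % number of matching positions (6 here)
isSame  = (nMatch == length(s1));   % true only if EVERY position matches

disp(nMatch)
disp(isSame)
```

Remember that == on char vectors of different lengths is an error, so check the lengths first if they might differ.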

2. Counting a DNA pattern

Write a function countPattern(dna,p) that returns how many times a pattern p occurs in dna. Assume that both parameters are char vectors containing only the letters 'A', 'T', 'C', and 'G'. Note that if p is longer than dna, then p appears in dna zero times. Use a loop to solve this problem.

  1. (Version 1) Use the built-in function strcmp() to compare two char vectors.
  2. (Version 2) Do not use strcmp(); instead use vectorized code and sum as demonstrated in Part 1 above to compare two char vectors.
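
As a hint for either version, one possible approach is to look at every length-k "window" of dna in turn. The sketch below only extracts and prints the windows; it does not count anything. The names k, i, window, and nWindows are hypothetical, and the example assumes that overlapping windows are all of interest, which the problem statement does not specify.

```matlab
% Visit every contiguous subvector ("window") of length k in dna.
dna = 'ATCGAT';
k = 3;                               % hypothetical window length
nWindows = 0;                        % how many windows were visited

for i = 1:length(dna)-k+1
    window = dna(i:i+k-1);           % the k characters starting at index i
    fprintf('window starting at %d: %s\n', i, window);
    nWindows = nWindows + 1;
end
```

If k is greater than length(dna), the range 1:length(dna)-k+1 is empty, so the loop body never runs.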

3. Splitting a 1-d char array

Write a function splitString(str, sep) that, given a 1-d char array str and a separating character sep, splits str at the locations of sep. The function should return a 1-d cell array of 1-d char arrays, each containing one piece of the original str and none of the occurrences of the separating character. For example, if

str = 'MATLAB is cool!!';
sep = ' ';                 % a single space

then splitString(str, sep) should return the length-3 cell array

  {'MATLAB', 'is', 'cool!!'}

You can assume that neither str nor sep is empty and that sep never occurs twice in a row in str. Do not use any built-in functions other than length().
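
One building block you may find useful is growing a cell array one cell at a time. The sketch below is not a solution: the index ranges are typed in by hand for the example text, whereas your function has to find them. The variable name pieces is made up for illustration.

```matlab
% Grow a row cell array one cell at a time, using only length().
str = 'MATLAB is cool!!';

pieces = {};                              % start with an empty cell array
pieces{length(pieces)+1} = str(1:6);      % 'MATLAB'
pieces{length(pieces)+1} = str(8:9);      % 'is'
pieces{length(pieces)+1} = str(11:16);    % 'cool!!'

disp(length(pieces))                      % number of cells stored so far
```

Each assignment uses braces on the left-hand side because it stores a char array INSIDE a new cell.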

4. Obtaining subarrays—simple array vs. cell array

Type the following commands into the Matlab Command Window and observe both the screen output and the class (type) name in the Workspace pane.

m= rand(4,3)  % A 4-row-by-3-column matrix of numbers

a= m(2,:)     % One row vector that is the 2nd row of m
              % Note use of ()

C= {'ant', 'boo', 'cat'}  % A row cell array of length 3


D= C{2}                   % ______________________________ Note use of {}
                          % Braces are used to access the CONTENT of an
                          % INDIVIDUAL cell. D is a simple char array, not a cell.

E= C(2:3)                 % ______________________________ Note use of ()
                          % Parentheses select a SUBARRAY of cells (one or
                          % more cells), so the result is itself a cell array.
                          % E is a cell array; it has length 2.
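
To see the difference between braces and parentheses more directly, it can help to compare the class of each result. This is a small sketch; the variable names F, firstLetter, and alsoFirst are made up for illustration.

```matlab
% Braces return the content of a cell; parentheses return a cell array.
C = {'ant', 'boo', 'cat'};

D = C{2};               % the char vector 'boo'
F = C(2);               % a 1-by-1 cell array that CONTAINS 'boo'
E = C(2:3);             % a 1-by-2 cell array

disp(class(D))          % char
disp(class(F))          % cell
disp(class(E))          % cell

firstLetter = D(1);     % 'b': index directly into the char vector
alsoFirst = F{1}(1);    % 'b': open the cell with {} first, then index with ()
```

Note that C(2) with a single index still gives a cell array, not a char array.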