02/01

NOTE:
+ contains helpful items not covered in lecture!
+ pay attention to vector operations and try them out yourself to learn
  what they do!

Announcements
+ Hand in E2
+ Marianna Klochko, mak47@cornell.edu, research assistant in Organizational
  Behavior department, needs paid volunteers for experiments
+ E3 already posted; due in lecture Tuesday

Reminders
+ newsgroup required. start homework early.
+ P2 due Thu 2/8, R1 at 3pm on Sun 2/11, T1 at 7:30pm on Tue 2/13
+ bonuses: online evaluation, asking questions, finding errors

Topics
+ P2 help
+ strings
+ $-notation
+ random numbers
+ more loops
+ more vectors and operations
_______________________________________________________________________________
P2 connections

P2.1 structs
+ use strings to represent color
P2.2 e series
+ use loops or element-wise vector operations
+ use $sort$
+ $factorial$ won't take a list, but $cumprod$ can be used to replace it
P2.3 dna
+ strings are a special case of vectors
P2.4 bacteria
+ use ideas from Pascal's triangle
P2.5 simulation
+ reuse ideas from birthday problem
_______________________________________________________________________________
$-notation (CS100 invention -- not used elsewhere)
+ need a way to indicate in text which items are program elements,
  e.g. variables or function names
+ cannot use double quotes (") since they are Java program elements
+ cannot use single quotes (') since they are Java and Matlab program elements
+ so use dollar sign ($) since $ is not a Matlab program element and,
  for us, is never used as a Java program element

note:
+ matlab has online help (e.g. try $help plot$), which you should use
+ matlab's online help uses capitalization to indicate variable and
  function names
  + BAD! matlab is case sensitive, so how can you tell what to capitalize?
  + BAD! only works for letters, but sometimes need to mark digits and/or
    punctuation as program elements
+ version 6 of matlab has another help system -- access using the Help menu
  + probably uses bold or a different font to indicate program elements
_______________________________________________________________________________
string = text = what you read = sequence of characters

character
+ one symbol
+ in matlab, put inside single quotes: $'a'$, $'A'$, $'3'$, $' '$
+ for the single quote as a character, type it in twice: $''''$
+ note: do not confuse $''''$ (single quote character) with $'"'$
  (double quote character)
+ internally represented as a "small" (magnitude) number, but for now,
  we don't care what the numbers are

string
+ list of characters, i.e. special kind of vector
+ examples: $'Babylon 5'$, $''$, $'can''t'$, $'hello world'$, $'78'$

note: now know how to represent colors in P2.1, e.g. $'cyan'$, $'pink'$
_______________________________________________________________________________
question: what is the "expected" (average) max frequency of shared birthdays?

solution: do computer simulations (experiments)

algorithm:
  throw darts at a calendar (assume all days exposed, equally likely)
  count max number of darts on a day

problem: we get some number. is it expected?
solution: use multiple trials (i.e. run the experiment many times)
  to find the expected value
_______________________________________________________________________________
random numbers
+ $rand$: return random number in range 0(----)1 (0 and 1 themselves excluded)
+ how to get random integers 1, 2, or 3? (equally likely)
  + $round(rand*3)$? NO! can return 0
  + $1+round(rand*2)$? NO! numbers not equally likely:

              0(--).5(--1--)1.5(--)2
        round:   0      1        2

    $1$ is as likely as $0$ and $2$ combined
  + use $ceil(rand*3)$ or $1+floor(rand*3)$:

        ceil:     1      2      3
              0(----)1(----)2(----)3
        floor:    0      1      2
_______________________________________________________________________________
matlab code to solve problem:

>> ntrials = 10000;  % number of trials
>> nbuckets = 365;   % number of buckets
>> ndarts = 300;     % number of darts
>> freqlimit = 20;   % for now, guess $maxfreq$<=$freqlimit$ (smarter later)
>> count = zeros(1,freqlimit);
>> for k = 1:ntrials
>>     % perform one trial: throw $ndarts$ darts at $nbuckets$ buckets
>>     % and compute maximum frequency $maxfreq$
>>         buckets = zeros(1,nbuckets);
>>         % throw darts
>>             for i = 1:ndarts
>>                 dart = ceil(rand*nbuckets);
>>                 buckets(dart) = 1+buckets(dart);
>>             end
>>         maxfreq = max(buckets);
>>     count(maxfreq) = 1 + count(maxfreq);
>> end

notes:
+ $-notation: easier to spot variable names and other program elements
+ named constants ($ntrials$, $nbuckets$, $ndarts$) make it easy to modify
  parameters without changing many lines of code
+ comments on variable/constant definitions are allowed to go off to the side
+ indentation under statement-comments
+ indentation of loop bodies
+ good comments don't mention irrelevant details: to understand at a high
  level, can ignore statements under statement-comments!
+ $buckets$ are reset to empty for each trial: for each trial, throw darts
  at a fresh, "empty" calendar

questions: what to do with $count$ data? how plot histogram? how print mean?
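For comparison, the same dart-throwing experiment can be sketched outside Matlab. The Python below is a hypothetical translation, not course code: the names $simulate$ and $max_frequency_trial$ are made up here, buckets are numbered from 0 (so $rng.randrange(nbuckets)$ plays the role of $ceil(rand*nbuckets)$), and, as a deliberate swap, $count$ grows whenever a larger maximum frequency shows up instead of guessing $freqlimit$ up front. The returned mean answers the "print mean" question as a weighted average of the histogram: the sum of f times count(f), divided by the number of trials.

```python
import random


def max_frequency_trial(nbuckets, ndarts, rng):
    """One trial: throw ndarts darts at nbuckets equally likely buckets
    (a fresh, empty calendar) and return the most darts in any one bucket."""
    buckets = [0] * nbuckets
    for _ in range(ndarts):
        # 0-based bucket index, uniform over 0..nbuckets-1
        # (the role ceil(rand*nbuckets) plays in the 1-based Matlab code)
        dart = rng.randrange(nbuckets)
        buckets[dart] += 1
    return max(buckets)


def simulate(ntrials=10000, nbuckets=365, ndarts=300, seed=None):
    """Run ntrials independent trials (assumes ntrials >= 1, ndarts >= 1).

    Returns (count, mean): count[f - 1] is the number of trials whose
    maximum frequency was f, and mean is the average maximum frequency.
    """
    rng = random.Random(seed)  # own generator; a fixed seed repeats a run
    count = []
    for _ in range(ntrials):
        maxfreq = max_frequency_trial(nbuckets, ndarts, rng)
        # grow the histogram on demand instead of guessing a freqlimit
        if maxfreq > len(count):
            count.extend([0] * (maxfreq - len(count)))
        count[maxfreq - 1] += 1
    # weighted average: sum of f * count(f), divided by the number of trials
    mean = sum(f * c for f, c in enumerate(count, start=1)) / sum(count)
    return count, mean
```

e.g. $count, mean = simulate(ntrials=1000, seed=1)$ gives a repeatable run; the exact numbers depend on the random stream, so none are quoted here. Growing $count$ on demand sidesteps the "what if $maxfreq$>20" problem, at the cost of not knowing the histogram's length in advance.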
>> bar(count)                                    % plot histogram
>> sum(count .* (1:length(count))) / sum(count)  % print mean

note: now know how to do simulation part of P2.5 (coincidences)
_______________________________________________________________________________
vector operations (principle: avoid loops for sequential access)

dimensions:
+ a vector can be a *row* (horizontal) or *column* (vertical)
+ in matlab, every vector has 2 dimensions:
    1st dimension is vertical
    2nd dimension is horizontal
+ for now, we use only 1-dimensional vectors
+ $size(x,d)$: size of $x$ along $d$-th dimension: $size(x,1)$, $size(x,2)$
+ $[m,n] = size(x)$: assign both vertical and horizontal dimensions
+ $length(x)$: size of longest dimension

creation:
+ *scalar* is 1-by-1, e.g. int (integer), double, char (character)
+ $[]$ is empty
+ "concatenation" = "glue together"
+ "juxtaposition" = "write side by side", e.g. multiplication: 2x = 2*x
+ horizontal concatenation: juxtaposition within [ ]
+ vertical concatenation: semicolon or newline within [ ]
+ $:$ create vector with evenly spaced numbers: $lo:hi$, $lo:step:hi$,
  $hi:step:lo$ (with negative $step$)
! $linspace(lo,hi)$, $linspace(lo,hi,n)$
+ $zeros(1,n)$, $zeros(m,1)$, $ones(1,n)$, $ones(m,1)$
  ! careful: $zeros(n)$, $ones(n)$ are square 2-D matrices
+ special case: strings

careful! *overloading*
+ meaning of $;$ depends on context: "suppress printing" versus "vert. cat."
+ somehow figure out which meaning from "context"
+ e.g. plus (addition) for us is different on ints, fractions, time of day

functions and operations:
+ transposition: $'$, i.e. switch between vertical and horizontal
+ $sort$, $fliplr$, $flipud$
+ accumulation: $min$, $max$, $sum$, $prod$, $cumsum$, $cumprod$
  example: $cumsum([1 3 8])$ is [1 1+3 1+3+8], i.e. cumulative sums
+ pairwise: $diff$
  example: $diff([1 3 8 2])$ is [3-1 8-3 2-8], i.e. pairwise differences
+ element-wise: $abs$, $exp$, $log$, $sqrt$
+ element-wise with another vector of same length: $+$, $-$, $.*$, $./$, $.^$
+ element-wise with scalar: $+$, $-$, $*$, $v/s$, $v./s$, $s./v$, $s.^v$, $v.^s$
+ logical (0=false, 1=true): $<$, $<=$, $==$, $>=$, $>$, $~=$
+ special case (strings only): $upper$, $lower$

note: try to use $*$, $/$ instead of $.*$, $./$ when legal

for the curious: why $.*$, $./$, $.^$ instead of $*$, $/$, $^$?
+ answer: $+$, $-$, $*$, $/$, $^$ are matrix operations
  + matrix addition and subtraction are as you expect
  + matrix multiplication (and therefore division and exponentiation)
    is kind of weird

examples:
+ $sum(2 .^ (0:4))$ = 2^0 + 2^1 + 2^2 + 2^3 + 2^4
+ $sum( s .^ [5 3 1 0] .* [7 4 -8 6])$ = 7 s^5 + 4 s^3 - 8 s + 6
+ $sum(x == 5)$ = number of 5s in vector $x$
+ $sum('hello world' == 'l')$ = number of 'l's in "hello world"

note: now know how to do computations in P2.2 (e series)
note: now know how to do data analysis parts of P2.5 (coincidences)
note: now know how to manipulate strings for P2.3 (dna)
_______________________________________________________________________________
another example: print rows 0..N of pascal's triangle

notes:
+ by "row", we mean "the non-zero elements of a row"
+ print 1+N rows: 0, 1, 2, ..., N
+ for now, without nice formatting

    triangle                  matlab representation    row i
    0   0   1   0   0         [ 1 ]                      0
     \ / \ / \ / \ /            |\
      0   1   1   0           [ 1 1 ]                    1
     / \ / \ / \ / \            |\|\
    0   1   2   1   0         [ 1 2 1 ]                  2
     \ / \ / \ / \ /            |\|\|\
      1   3   3   1           [ 1 3 3 1 ]                3
     / \ / \ / \ / \            |\|\|\|\
    1   4   6   4   1         [ 1 4 6 4 1 ]              4

solution 1:
>> N = 4;
>> row = [ 1 ];
>> disp(row);
>> for i = 1:N
>>     new = zeros(1,i+1);
>>     for j = 1:i
>>         new(j) = new(j) + row(j);
>>         new(j+1) = new(j+1) + row(j);
>>     end
>>     row = new;
>>     disp(row);
>> end

observation: [ 1 3 3 1 ] makes 2 copies:

    [ 1 3 3 1 ]    and    [ 1 3 3 1 ]
      | | | |               \ \ \ \
      1 3 3 1                 1 3 3 1

so the result is the sum

    [ 1 3 3 1 ]
      |\|\|\|\
      1 3 3 1 0   = [ row 0 ]
    + 0 1 3 3 1   = [ 0 row ]
    -------------
    [ 1 4 6 4 1 ]

solution 2:
>> N = 4;
>> row = [ 1 ];
>> disp(row);
>> for i = 1:N
>>     row = [row 0] + [0 row];
>>     disp(row);
>> end

note similarities with P2.4 (bacteria):
+ both take an old row/population and produce a new row/population
+ both process each item twice (add left + add right versus age + replicate)
+ both can be solved with nested loops or with just one loop
  (might also need conditionals -- only mature bacteria replicate)
_______________________________________________________________________________
answers to questions asked in lecture:

Q: in $sum(x==5)$ counting "number of 5s", does 55 count as two 5s or zero 5s?
A: as zero 5s: $x==5$ compares whole elements, and 55 is not equal to 5.

Q: in birthday code, if a dart (birthday) is missing, would a zero
   automatically appear in $count$?
A: yes, that is what $count=zeros(1,freqlimit);$ is for: every entry starts
   at 0, so a maximum frequency that never occurs stays 0.

Q: how does $round$ round fractional parts of exactly .5?
A1: try it out and see!
A2: rounds towards the larger-magnitude integer (i.e. away from zero):
    $round(2.5)$ = 3, $round(1.5)$ = 2, $round(-1.5)$ = -2

Q: in birthday code, what happens if in some trial we see $maxfreq$>20?
A: then $count(maxfreq)=1+count(maxfreq)$ tries to read a non-existent
   element of $count$ (the index exceeds $length(count)$), so matlab would
   complain with an error.

Q: with $-notation, do you actually type $s into matlab code?
A: yes inside comments, no otherwise. e.g., if i tell you to type $help plot$
   into Matlab, then at the $>>$ prompt you would type just "help plot"
   (without the quotes):
   >> help plot

Q: how many decimals does $rand$ give?
A: about 16.
_______________________________________________________________________________
(bonus) technical tangent to $rand$ accuracy question:
+ suppose we compute with non-zero floating point numbers of the form
  $x.y * 10^z$ (e.g. $1.3 * 10^20$), where
  + $x$ is a non-zero digit (integer from 1 to 9, inclusive)
  + $y$ is a digit (integer from 0 to 9, inclusive)
  + $z$ is a signed (+ or -) two-digit integer (from -99 to +99, inclusive)
+ suppose we have a fair 10-sided die with numbers 0 to 9 on its sides.
  how do we generate a random floating point number x, 0