Folding is one of the most useful tools in functional
programming. You will use folds many times the semester. The following
exercises are designed to get you comfortable with folds that perform
useful computations on lists. Note: All of these functions can be
written without folding, but the point of the exercise is to get used
to folding, so every function must include
List.fold_right. Note that
tail recursive and
List.fold_right is not. For full
credit, your solution must not contain the
min2 : int list -> intthat returns the second-smallest element of a list, when put into sorted order. Note that if list contains duplicates, the second-smallest element and the smallest element may be identical; your code should return it.
min2 [2110; 4820; 3110; 4120] = 3110.
consec_dedupe : ('a -> 'a -> bool) ->'a list -> 'a listthat removes consecutive duplicate values from a list. More specifically, the
consec_dedupefunction takes two arguments: a function
equivrepresenting an equivalence relation, and a list
lst. It returns a list containing the same elements as
lst, but without any duplicates, where two elements are considered equal if applying
equivto them yields
consec_dedupe (=) [1; 1; 1; 3; 4; 4; 3] = [1; 3; 4; 3].
prefixes: 'a list -> ('a list) listthat returns a list of all non-empty prefixes of a list, ordered from shortest to longest.
prefixes [1;2;3;4] = [; [1;2]; [1;2;3]; [1;2;3;4]].
k_sublist : int list -> int -> int list. Given a list of integers
lstand an integer
k, the function
k_sublistcomputes the contiguous sublist of length
kwhose elements have the largest sum.
k_sublist [1; 2; 4; 5; 8] 3 = [4; 5; 8].
powerset : 'a list -> 'a list listsuch that
powerset sreturns the powerset of
s, interpreting lists as unordered sets. You may assume there are no duplicate elements in
s. More precisely, for every subset
s, there is a list in
powerset sthat contains exactly the elements of
powerset [1; 2; 3] = [; ;
; ; [1; 2]; [1; 3]; [2; 3]; [1; 2; 3]].
The order of the sublists you return as well as the order of the elements in each sublist do not matter. Your program may return the elements in a different order from the example above.
In this part you will use higher-order functions to implement an interpreter for a simple stack language. The commands in this language are:
||Initializes the stack to empty. This is always the first command in a program and never appears again in a program.|
Pushes the integer
||Removes the top element of the stack.|
||Pops the two top elements of the stack, computes their sum, and pushes the result back onto the stack.|
||Pops the two top elements of the stack, computes their product, and pushes the result back onto the stack.|
Pops the top element of the stack
||Pushes a duplicate copy of the top element onto the stack.|
||Terminates the execution of the program. This is always the last command.|
We can implement each of the commands as OCaml functions, in a way
that lets us write a series of stack-language commands directly as
OCaml expressions. The value of the OCaml expression should be the
final stack configuration. For example, the following series of
start (push 2) (push 3) (push 4) mul mul halt
evaluates to a stack containing a single element,
haltcommands are already done for you in the starter code provided on CMS. You should represent the stack as a list, such that the previous example returns the stack . Your code should fail with
StackExceptionif a command cannot be performed on the current stack. For example, the execution of the command add in the program
start add haltshould raise a
|| Stores the value current value on top of the stack in memory
under the string
|| Loads the value associated with the string
start, assume the memory is empty, and for
haltyou can throw away anything in memory (that is, just return the stack). If done correctly, programs without the new commands should behave identically, with no additional syntax required. If you have trouble with
loadworking, but have the first part working correctly, submit the working version for the first part and a commented-out version of what you did for the second part. This way we can test your first part before you modified them for the second part, and then award partial credit for the second part.
In this problem, you will write a function that implements a very
basic search engine, using term frequency-inverse document frequency
weighting to rank the similarity between documents. Your function will
take a search query
q as input and a database of
documents db and returns the identifier of the document
most similar to the terms appearing in the query.
First, a little background. The term frequency (TF) of a
t in a document
d is defined as the
number of times that
t appears in
d. We will
use TF to give more weight to documents that contain many occurrences
of a given term mentioned in the search query.
The inverse document frequency (IDF) of a term
is defined as
N is the total
number of documents, and
df is the number of documents
a occurs in. We will use IDF to reduce the weight of
terms which are very common and appear in many documents (e.g.,
"the") and hence are of less use.
In addition, we will weight these basic statistics by a logarithmic
factor. The log-weighted TF is
tf is greater
tfis the statistic described above. Likewise, the
log-weighted IDF is
To compute the similarity between a term and a document, we multiply the log-weighted TF of the term by its log-weighted IDF. To compute the similarity between the search query and a document, we sum up the similarities computed for each term in the query.
tf : string list -> string -> floatthat takes in a document as a string list, and a term as a string, and returns the log-weighted TF as described above.
tf ["3110"; "is"; "fun"; "or"; "is"; "it"] "is" = 1.30102999566398125
tf ["3110"; "is"; "fun"; "or"; "is"; "it"] "fun" = 1.0
idf:string list list -> string -> floatthat takes in the database (list of documents) and a term and returns the log-weighted IDF as described above.
idf [["Ocaml"; "is"; "fun"];["because";"fun";"is"; "a"]; ["keyword"]] "fun" = 0.176091259055681237
idf [["Ocaml"; "is"; "fun"];["because";"fun";"is"; "a"; "keyword"]] "fun" = 0.0
doc_rank: (int * string list) list -> string list -> int * floatthat takes a database of documents, represented as a list of pairs of integers (representing document identifiers) and strings (representing document contents); a search query, represented as a list of strings; and returns the identifier of the document with the highest similarity, as well as its similarity.
load_documentsand several test databases. The load_documents function takes a database and returns an (int*string list) list. For the sake of consistency sake, if two or more documents have the same score, your code should return the first one in the database.
doc_rank (load_documents "test1.txt") ["the"; "Bahia"] = (1, 1.28863187775142785)