Question 2: Similarity matrix
(a)  What is a similarity measure?
(b)  What is a similarity matrix?
(c)  Suppose that you are clustering documents based on
co-occurrence of citations.  Suggest a similarity measure that you
might use.
(d)  Explain the ideas behind the inverted file algorithm for
calculating a similarity matrix.