Question 2: Similarity matrix


(a) What is a similarity measure?

(b) What is a similarity matrix?

(c) Suppose that you are clustering documents based on co-occurrence of citations. Suggest a similarity measure that you might use.

(d) Explain the ideas behind the inverted file algorithm for calculating a similarity matrix.
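
As a rough illustration of the idea in (d), and not a model answer, the sketch below builds an inverted file from each citation to the documents that cite it, then counts shared citations by walking the posting lists instead of comparing every pair of documents directly. The names `similarity_matrix`, `doc_terms`, and `citations` are hypothetical, the sample data is invented, and Dice's coefficient is only one assumed choice of similarity measure for (c).

```python
from collections import defaultdict
from itertools import combinations


def similarity_matrix(doc_terms):
    """doc_terms maps a document id to the set of references it cites."""
    # Step 1: build the inverted file (reference -> documents citing it).
    # Documents are visited in sorted order, so each posting list is sorted.
    inverted = defaultdict(list)
    for doc in sorted(doc_terms):
        for term in doc_terms[doc]:
            inverted[term].append(doc)

    # Step 2: every pair of documents on the same posting list shares
    # that reference, so increment the pair's shared-citation count.
    shared = defaultdict(int)
    for docs in inverted.values():
        for a, b in combinations(docs, 2):
            shared[(a, b)] += 1

    # Step 3: normalise with Dice's coefficient (an assumed choice).
    # Pairs with nothing in common never appear, so the result is sparse.
    return {
        (a, b): 2 * n / (len(doc_terms[a]) + len(doc_terms[b]))
        for (a, b), n in shared.items()
    }


# Invented example: d1 and d2 share two references, d3 shares none.
citations = {
    "d1": {"r1", "r2", "r3"},
    "d2": {"r2", "r3"},
    "d3": {"r4"},
}
sims = similarity_matrix(citations)
```

Only document pairs that share at least one citation are ever touched, which is the main saving over an all-pairs comparison when most pairs have nothing in common.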