CS 430 / INFO 430
Information Retrieval
Fall 2006
Questions and Answers

General questions

Question: I am running late on the assignment and will not be finished by the time it is due. Can I get an extension?

Answer: It is often possible to get an extension if you request it before the assignment is due. Speak to the instructor after class or send email to wya@cs.cornell.edu.

Question: How should file names be specified in Java programs?

Answer: This is not required, but the graders would be grateful if you kept the following points in mind:

Java is meant to be portable, but it allows file paths to be system dependent. For example, new File("sourcefiles\\datafile1.txt") (the backslash must be doubled inside a Java string literal) will only work properly on Windows. If you must hard code path names, consider these portable alternatives:

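new File("sourcefiles/datafile1.txt")
new File("sourcefiles", "datafile1.txt")
new File("sourcefiles" + File.separator + "datafile1.txt")

(The first form works because java.io.File also accepts "/" as a separator on Windows. Use File.separator, not File.pathSeparator: the latter is the character that separates entries in a list of paths, such as a classpath.)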

Alternatively, do not hard code paths. Keep them in an external file, or allow the user to specify them on the command line (making note of this in your report).

When zipping your files for submission, zip a directory containing your files rather than the files themselves.

Avoiding hard-coded filenames is good practice, because it makes your program more modular. However, assignments often include long lists of files. If you do not hard code filenames, please do not make your grader type them all in by hand, and be sure to explain in your instructions how the grader can avoid this.
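One way to satisfy both points is to take a single directory on the command line and process every file in it. The sketch below assumes Java 8 or later and that every regular file in that directory is an input; the class name ListInputFiles and the default directory "sourcefiles" are hypothetical.

```java
import java.io.File;
import java.util.Arrays;

// Sketch: the data directory comes from the command line instead of being
// hard coded. "sourcefiles" is a hypothetical default used when no argument is given.
class ListInputFiles {

    // Returns the regular files (not subdirectories) in dir, sorted by path
    // so that repeated runs process the files in the same order.
    static File[] inputFiles(File dir) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            // listFiles returns null when dir is not a readable directory.
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        Arrays.sort(files);
        return files;
    }

    public static void main(String[] args) {
        // Use the first argument if present, otherwise fall back to the default.
        File dir = new File(args.length > 0 ? args[0] : "sourcefiles");
        for (File f : inputFiles(dir)) {
            System.out.println(f.getPath());
        }
    }
}
```

The grader then runs something like java ListInputFiles path/to/data, typing one directory name (or nothing) rather than a list of files.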

Assignment 2

Question: How do I choose k, the number of singular values used to represent the concepts?

Answer: In the paper, the authors carried out a systematic experiment. This required a set of test queries for which the relevant documents were known. Since you do not have such a set of tests, you will have to use your judgment. Perhaps you could make a series of test runs with several values of k. Or you could inspect the singular values on the diagonal of S. Or you could look at the original data and estimate the number of distinct concepts. There is no single correct answer.
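If you decide to inspect S, one common heuristic (an assumption here, not the method used in the paper) is to choose the smallest k whose leading singular values account for a chosen share, say 90%, of the sum of the squared singular values. A minimal sketch, with a hypothetical class name ChooseK and made-up singular values:

```java
// Sketch of one heuristic for choosing k by inspecting S. The 0.9 threshold
// and the example values are assumptions for illustration only.
class ChooseK {

    // Smallest k such that s[0]^2 + ... + s[k-1]^2 is at least `fraction`
    // of the sum of all squared singular values. Assumes s is sorted in
    // decreasing order, as SVD routines normally return it.
    static int smallestK(double[] s, double fraction) {
        double total = 0.0;
        for (double v : s) total += v * v;
        double running = 0.0;
        for (int k = 1; k <= s.length; k++) {
            running += s[k - 1] * s[k - 1];
            if (running >= fraction * total) return k;
        }
        return s.length;
    }

    public static void main(String[] args) {
        // Hypothetical singular values; substitute the diagonal of your own S.
        double[] s = {9.0, 5.0, 3.0, 1.0, 0.5};
        double total = 0.0;
        for (double v : s) total += v * v;
        // Print the cumulative share so a levelling-off point can also be judged by eye.
        double running = 0.0;
        for (int i = 0; i < s.length; i++) {
            running += s[i] * s[i];
            System.out.printf("k=%d  s=%.2f  cumulative share=%.3f%n", i + 1, s[i], running / total);
        }
        System.out.println("smallest k covering 90%: " + smallestK(s, 0.9));
    }
}
```

The threshold is arbitrary, so it is worth combining this with a few test runs at the values of k it suggests.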


Question: What is intended by, "Store the representation of the pages in the concept space"?

Answer: Both terms and documents can be represented in the concept space by some product of the T, S, and D matrices. It is possible to carry out the full multiplications every time you need to refer to a document, but it is more efficient to carry out the calculation once and store the result. Either approach is acceptable, but state in your report which you chose.
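A minimal sketch of the store-once approach follows. It assumes the SVD has already been computed, that D is held as an array of rows (one row per document, columns in decreasing order of singular value), and that s holds the diagonal of S. Under those assumptions, one natural choice is to store row j of D_k S_k as the vector for document j; term vectors could be stored the same way as the rows of T_k S_k. The class name, the cosine helper, and the data are hypothetical.

```java
// Sketch: compute each document's k-dimensional concept-space vector once,
// as a row of D_k S_k, and reuse the stored vectors afterwards.
class ConceptSpace {

    // d: rows of D (one per document); s: diagonal of S; k: number of concepts kept.
    static double[][] documentVectors(double[][] d, double[] s, int k) {
        double[][] docs = new double[d.length][k];
        for (int j = 0; j < d.length; j++) {
            for (int c = 0; c < k; c++) {
                // Multiplying by the diagonal matrix S_k just scales column c by s[c].
                docs[j][c] = d[j][c] * s[c];
            }
        }
        return docs;
    }

    // Cosine similarity between two stored concept-space vectors
    // (undefined, NaN, if either vector is all zeros).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Hypothetical D (orthonormal columns) and S for three documents, keeping k = 2.
        double[][] d = {{0.6, 0.8, 0.0}, {0.8, -0.6, 0.0}, {0.0, 0.0, 1.0}};
        double[] s = {4.0, 2.0, 1.0};
        double[][] docs = documentVectors(d, s, 2);   // computed once, then stored
        System.out.println(cosine(docs[0], docs[1]));
    }
}
```

Because S is diagonal, the precomputation costs only one multiplication per stored entry, and every later comparison reads the stored vectors instead of repeating the matrix products.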

Assignment 3





Last changed: October 15, 2006