The archive contains two files:
Because the data set is large and most implementations of k-nn are O(n^2) in the number of cases in the training set, we strongly suggest you develop and debug your code using a smaller sample until you are sure it is working, then run the final experiments with the entire data set. If you have performance problems with such a large data set, you should do experiments using a sample from the data set, such as 5000 or 10000 cases.
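One way to draw such a sample is sketched below in Python. This is only an illustration: the function name `subsample`, the `seed` parameter, and the stand-in list `all_cases` are hypothetical, and the sketch assumes the cases have already been loaded into a list, since the file format is not described here.

```python
import random


def subsample(cases, n, seed=0):
    """Return up to n cases drawn uniformly at random without replacement.

    A fixed seed makes the sample repeatable, so debugging runs
    always see the same cases.
    """
    if n >= len(cases):
        # Nothing to cut down: return a copy of everything.
        return list(cases)
    rng = random.Random(seed)
    return rng.sample(cases, n)


# Stand-in data (hypothetical): 20000 (id, label) pairs in place of the real cases.
all_cases = [(i, i % 2) for i in range(20000)]

# Develop and debug on 5000 cases, then switch back to all_cases for the final runs.
dev_cases = subsample(all_cases, 5000)
print(len(dev_cases))
```

Sampling without replacement keeps duplicate cases out of the sample. For k-nn a duplicate would be its own nearest neighbor at distance zero, which could make accuracy estimates look better than they are.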
We apologize for the delay in making the dataset available. We had trouble dealing with missing values.