Sunita Sarawagi: Explaining Differences in Multidimensional Aggregates. VLDB 1999: 42-53
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery 1 (1):29-53, 1997.
Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi: On the Computation of Multidimensional Aggregates. VLDB 1996: 506-521
S. Sarawagi, R. Agrawal, N. Megiddo: "Discovery-driven exploration of OLAP data cubes", Proc. of the Sixth Int'l Conference on Extending Database Technology (EDBT), Valencia, Spain, March 1998. Expanded version available as IBM Research Report RJ 10102 (91918), January 1998.
Class was cancelled 11/7.
Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim: Efficient Algorithms for Mining Outliers from Large Data Sets. Proceedings of the ACM SIGMOD Conference, 2000.
Breunig M. M., Kriegel H.-P., Ng R., Sander J.: LOF: Identifying Density-Based Local Outliers, Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 2000), Dallas, TX, 2000.
M. Hernandez and S. Stolfo: Real-World Data is Dirty: Data Cleansing and The Merge/Purge Problem. Journal of Data Mining and Knowledge Discovery, 1997.
Jeff A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models (1998).
Classic reference, the paper that started it all: A. P. Dempster, N. M. Laird, D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-38. Cornell people can get this paper online, although printing it is a pain (it is scanned in, and some restrictions apply).
A useful reference is Michael Collins' (UPenn) exam paper The EM Algorithm. http://www.cis.upenn.edu/~mcollins/papers/wpeII.4.ps
Edwin M. Knorr and Raymond T. Ng. "Finding Intensional Knowledge of Distance-Based Outliers", Proc. VLDB, Edinburgh, Scotland, September 7-10, 1999, pp. 211-222.
Edwin M. Knorr, Raymond T. Ng, and Vladimir Tucakov. "Distance-Based Outliers: Algorithms and Applications", The VLDB Journal, 8(3), February 2000, pp. 237-253. The conference version of this paper is: Edwin M. Knorr and Raymond T. Ng. "Algorithms for Mining Distance-Based Outliers in Large Datasets", Proceedings of the 24th VLDB Conference, New York, August 24-27, 1998, pp. 392-403.
Edwin M. Knorr and Raymond T. Ng. "A Unified Notion of Outliers: Properties and Computation", Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, August 14-17, 1997, AAAI Press, pp. 219-222.
J. E. Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, and Wei-Yin Loh. BOAT -- Optimistic Decision Tree Construction. In Proceedings of the 1999 SIGMOD Conference, Philadelphia, Pennsylvania, 1999.
Johannes Gehrke, Wei-Yin Loh, and Raghu Ramakrishnan. Data Mining with Decision Trees. (Slides and References.) Tutorial at the 1999 SIGKDD Conference, San Diego, California, 1999.
R. Agrawal, R. Srikant: "Mining Sequential Patterns", Proc. of the Int'l Conference on Data Engineering (ICDE), Taipei, Taiwan, March 1995. Expanded version available as IBM Research Report RJ9910, October 1994.
Nevill-Manning, C.G. and Witten, I.H. (1997) "Identifying Hierarchical Structure in Sequences: A linear-time algorithm," Journal of Artificial Intelligence Research, 7, 67-82.
R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules", Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. Expanded version available as IBM Research Report RJ9839, June 1994.
R. J. Bayardo Jr., "Efficiently Mining Long Patterns from Databases", Proc. of the ACM SIGMOD Conference on Management of Data, Seattle, Washington, 85-93, June 1998.
R. J. Bayardo Jr., R. Agrawal, and D. Gunopulos. "Constraint-Based Rule Mining in Large, Dense Databases". Proc. of the 15th Int'l Conf. on Data Engineering, 188-197, Sydney, Australia, March 1999. Expanded version available as IBM Research Report RJ 10146, July 1999.