Art Munson @ Cornell University


May 2010: I just finished my Ph.D. in Cornell's computer science department. Please contact me if you would like a copy of my dissertation for research purposes.

My research focus is applied machine learning and data mining, with a broad interest in application areas. In no particular order, I am interested in natural language processing problems, dimensionality reduction techniques, anomaly detection (especially in the security domain), information-theoretic approaches to learning, and the intersection of machine learning and systems research (e.g., can we learn how to monitor, or even tune, a complex system so that it dynamically adapts to changes in running conditions?). I am also intrigued by the idea of programs "talking" to each other, and by the opportunities and requirements of such a computing environment.

Since my arrival in 2003, I have worked on a variety of projects, including:

My advisor is Rich Caruana.

Contact information (email and phone are best choices):

Art Munson
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
607-255-4428 (FAX)


Professional Activities

Current Work

NEW! I'm conducting a survey on the difficulty and importance of various modeling steps.

Currently, most of my time is spent in a collaboration with Cornell's Lab of Ornithology. The Lab of O, as we call them, is building a large data warehouse of bird observation data as part of their Avian Knowledge Network project. This data is collected from across North America and spans a number of years. The data has several interesting aspects, not the least of which is that many observations are collected by volunteers: people who just plain like birds. The project's challenges include many missing values, noisy observations, and the requirement to ultimately build understandable models. And of course, more data is collected every year, so the solutions we find need to scale.

We have successfully a) built bagged tree models of the winter feeding habits of almost 100 bird species across the continental United States, b) analyzed the models to determine which features (a.k.a. predictor variables, if you speak statistician) are most important to a model's predictions, and c) isolated and plotted the effects of the most important features on the probability of seeing particular birds. A paper from KDD 2006 describing this work is in my publications list. The analysis results are publicly available through the Avian Knowledge Network. (Warning: the Lab of O people are constantly tweaking, revising, and updating this website, so the link might not work. You can probably find it from the AKN home page under Exploratory Analysis.)
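The three steps above (bagging, ranking features by importance, and plotting a feature's effect) can be illustrated with a small standard-library sketch. It is not the code behind the KDD 2006 work. It makes several simplifying assumptions: depth-1 trees (decision stumps) stand in for full decision trees, permutation importance stands in for whatever importance measure the project used, and partial dependence is one common way to "isolate" a feature's effect. All function names here are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, labels):
    """Fit a depth-1 regression tree (decision stump) that minimizes squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[j] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # no valid split (all rows identical): predict the overall mean
        overall = mean(labels)
        return lambda row: overall
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def fit_bagged(rows, labels, n_models=10, seed=0):
    """Bagging: fit each stump on a bootstrap resample, then average their predictions."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return lambda row: mean(m(row) for m in models)


def mse(model, rows, labels):
    return mean((model(r) - y) ** 2 for r, y in zip(rows, labels))


def with_value(row, j, value):
    """Copy of a row (a list) with feature j replaced by value."""
    return row[:j] + [value] + row[j + 1:]


def permutation_importance(model, rows, labels, seed=0):
    """Increase in error when one feature's column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    base = mse(model, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [with_value(r, j, v) for r, v in zip(rows, column)]
        scores.append(mse(model, shuffled, labels) - base)
    return scores


def partial_dependence(model, rows, j, grid):
    """Average prediction when feature j is forced to each grid value."""
    return [mean(model(with_value(r, j, g)) for r in rows) for g in grid]


# Toy data (not bird data): the label depends only on feature 0.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(150)]
labels = [1.0 if r[0] > 0.5 else 0.0 for r in rows]
model = fit_bagged(rows, labels)
scores = permutation_importance(model, rows, labels)
effect = partial_dependence(model, rows, 0, [0.1, 0.9])
```

On this toy data, shuffling feature 0 sharply increases the error, while shuffling feature 1 leaves it unchanged. The partial dependence curve for feature 0 rises from low to high across the grid. In a real analysis you would compute importance and partial dependence on held-out data, and you would use a grid over each feature's observed range to draw the plot.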

My current focus is finding a way to reduce the number of features while maintaining the performance achievable with the full set (currently 500 features and counting). The motivation is that too many of the features are closely correlated (or loosely correlated but related). For example, we have more than 30 features describing human population, taken from the 2000 US census. What we would really like is one (or a few) constructed feature(s) that capture all the information in those 30 human population features that is needed to predict bird abundance. That would make studying the effects of important features much easier. In some sense, we are searching for the latent factors that really matter for making the predictions. The interesting wrinkle is that the features fall naturally into clusters of related features. One of our goals is to preserve this grouping (i.e., each discovered factor corresponds to a single group) so that the factors are easier to understand.
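One simple baseline for group-preserving reduction is to run principal component analysis separately inside each feature group. Each group's first principal component then becomes that group's single constructed feature, so every factor maps to exactly one group by construction. The sketch below uses only the standard library, with power iteration for the leading eigenvector. It is a hypothetical illustration, not the method this project uses. The feature names and groups are invented for the example.

```python
import math
import random


def standardize(column):
    """Center a feature to mean 0 and scale it to unit variance."""
    mu = sum(column) / len(column)
    sd = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column)) or 1.0
    return [(x - mu) / sd for x in column]


def first_component(columns, iters=200, seed=0):
    """Leading eigenvector of the group's correlation matrix, via power iteration."""
    k, n = len(columns), len(columns[0])
    corr = [[sum(columns[a][i] * columns[b][i] for i in range(n)) / n
             for b in range(k)] for a in range(k)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(k)]  # arbitrary nonzero starting vector
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def group_factors(features, groups):
    """Replace each group of related features with one constructed feature (its PC1 score)."""
    factors = {}
    for name, members in groups.items():
        cols = [standardize(features[f]) for f in members]
        v = first_component(cols)
        factors[name] = [sum(v[a] * cols[a][i] for a in range(len(cols)))
                         for i in range(len(cols[0]))]
    return factors


# Toy data with hypothetical feature names: two noisy copies of one latent
# "population" signal, plus two unrelated "climate" features.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(100)]
features = {
    "pop_total": [x + 0.1 * rng.gauss(0, 1) for x in latent],
    "pop_density": [2 * x + 0.1 * rng.gauss(0, 1) for x in latent],
    "temp_jan": [rng.gauss(0, 1) for _ in range(100)],
    "precip_annual": [rng.gauss(0, 1) for _ in range(100)],
}
groups = {
    "human_population": ["pop_total", "pop_density"],
    "climate": ["temp_jan", "precip_annual"],
}
factors = group_factors(features, groups)
```

The sign of each component is arbitrary. This baseline is also unsupervised: it keeps the variance within a group, not necessarily the information needed to predict bird abundance. A supervised variant, such as running partial least squares within each group, would be closer to that stated goal.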

Things I Wish I Found Sooner

Resources for Technical Paper Reviewing

Links about Publishing Research and Access to Published Work

Computer Science Education: Things to Consider

Fun Artificial Intelligence

Fun Computer Science

Web Standards: HTML vs XHTML

(Hmmm, I guess this page does not follow these best practices...)


Words to live by

I am not very diligent about updating this page; please check the last modified date below to gauge how reliable the information is.