CS 501
Software Engineering
Spring 2003

Project Concepts

MIDAS


Title

Updating a DOS Social Science lab tool, MIDAS, for the Web

Client

Professor Michael Macy, Department Chair, Sociology (mwm14@cornell.edu)
Noni Korf Vidal, Instructional Designer, Academic Technology Center, CIT (nk11@cornell.edu)

Project outline

Professor Michael Macy developed a DOS-based software program in 1989 called MIDAS (Microcomputer Integrated Data Analysis System). This program interacts with a large public domain data set via the statistical program STATA. It enables non-statisticians to review actual data in meaningful ways. His aim was to incorporate quantitative reasoning and empirical research throughout the undergraduate sociology and political science curriculum. Although this program was used successfully for over 10 years, it is now in danger of dying unless it can evolve to run under Windows. 

Professor Macy was approached two years ago by a commercial enterprise that was so interested in the product that it purchased the rights to develop the program. In the end, for various reasons, including a downturn in the online-education dot-com market, and after spending quite a bit of money on the rights and development costs, the company abandoned the project, and the rights to develop it were returned to Professor Macy. The pedagogical approach and functionality are of obvious value. Is it possible within an academic environment to implement a project that industry failed to complete?

There are several approaches to this project, and several ways to break it down into either phases of development or sets of tools. At first glance, project activities could include the following:

At the outset of the project, after a requirements and scoping phase, expectations would be set about the progress that could be made during a semester. For work that could not be completed during the semester, an estimate of the hours of work remaining would be drawn up, and there would be the possibility of paid work in the summer. This is a real project with an existing implementation, a planned deployment in future Cornell classes, and perhaps even uses beyond Cornell.

Resources Available

Cornell has a STATA site license.
The General Social Survey data set is in the public domain.
There is an existing working copy of the DOS-based program.


Below there are some additional comments about MIDAS that were part of a recent proposal that Professor Macy made to earn a Faculty Innovation in Teaching Grant. 

MIDAS is not a statistics program but an expert user interface designed to facilitate the incorporation of quantitative reasoning and empirical research into the general social science curriculum. MIDAS uses a "point-and-click" user interface to link a powerful statistical engine (STATA, from Computing Resource Center) with a massive social science data archive -- the General Social Survey. The menu-based interface guides the user step-by-step from the selection of variables to the interpretation of the results. Any time the student wonders what to do, they can press the Help Key and get a clear, context-sensitive explanation. The program also provides extensive explanations of statistical concepts and procedures through its online statistical glossary, as well as detailed codebook information on all variables. 

After the student starts the program, MIDAS responds with a menu of available datasets, such as the GSS, US Census, or a more specialized dataset that the instructor has created and installed. After pressing HELP for a brief description of the datasets, the student picks one, say the GSS. MIDAS then offers a menu of topics into which the 750 variables have been organized, and then a menu of variables, identified by the 40-character labels. The student can again press HELP to read the original codebook documentation describing what the variable measures in greater detail. After picking the variables they want to use, the student has the choice of analyzing one or more of those variables as well as using a variable to focus their analysis on a specific subpopulation. For each procedure, the student has only to press HELP for a full explanation. 

MIDAS interprets the menu-choices made by the student, locates and retrieves the necessary variables from the data archive, and writes and submits a program for STATA which is running invisibly in "background." MIDAS then displays the graphs, interprets the STATA output, and writes a report in "plain English," translating the results (including significance tests) into non-technical language. 
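The step of turning menu choices into a STATA program can be sketched as follows. This is only an illustration in Python, not the way MIDAS itself is written: the function name build_do_file, the file name gss.dta, and the variable names happy, marital, and sex are assumptions made up for the example. The STATA commands it emits (use, keep if, tabulate with the chi2 option) are standard ones.

```python
def build_do_file(dataset_path, variables, subpop=None):
    """Return the text of a STATA do-file for the student's menu choices.

    dataset_path -- path to the data file (hypothetical, e.g. "gss.dta")
    variables    -- list of one or two variable names picked from the menus
    subpop       -- optional STATA condition restricting the cases
    """
    # Load the data, discarding anything already in memory.
    lines = [f'use "{dataset_path}", clear']

    # Restrict the analysis to a subpopulation if one was chosen.
    if subpop:
        lines.append(f"keep if {subpop}")

    if len(variables) == 1:
        # One variable: a simple frequency table.
        lines.append(f"tabulate {variables[0]}")
    elif len(variables) == 2:
        # Two variables: a cross-tabulation with a chi-squared test.
        lines.append(f"tabulate {variables[0]} {variables[1]}, chi2")
    else:
        raise ValueError("this sketch handles one or two variables only")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_do_file("gss.dta", ["happy", "marital"], subpop="sex == 2"))
```

A Web version would presumably pass the generated text to STATA on the server and capture the output, but how that hand-off works is left open here.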

Throughout this process, MIDAS keeps track of the selected dataset, variables, cases, skip patterns, measurement level, number of values, and causal order, and decides on the fly what procedural options to give the user, what statistical tests to use, and how best to construct graphs and tables. Knowledge of the measurement limitations of the data, statistical rules, design of graphs and tables, and even the statistical meaning of the results is built into the program so that novice users cannot apply inappropriate procedures (like correlating nominal variables), lose all their cases, or be baffled by arcane presentation of the findings. If something still manages to go wrong (e.g. losing all one's cases because of a control variable with many missing values), MIDAS diagnoses the problem and suggests that the student drop that control. The program thus frees the student to focus on the substantive dimensions of their assignment while exposing them to the research process using appropriate methodology.
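The idea of choosing a procedure from the measurement level of the variables, and of diagnosing an empty set of cases, might look roughly like the sketch below. The rules shown are simplified textbook conventions chosen for illustration; they are assumptions, not the actual rules built into MIDAS, and the names choose_procedure and check_cases are hypothetical.

```python
LEVELS = ("nominal", "ordinal", "interval")


def choose_procedure(level_x, level_y):
    """Pick a bivariate procedure from two measurement levels.

    Illustrative rules only (assumed, not taken from MIDAS).
    """
    for level in (level_x, level_y):
        if level not in LEVELS:
            raise ValueError(f"unknown measurement level: {level!r}")

    # Correlation is offered only when both variables are interval level,
    # so a novice can never correlate nominal variables.
    if level_x == "interval" and level_y == "interval":
        return "correlation"
    # One interval variable: compare its mean across the other's categories.
    if "interval" in (level_x, level_y):
        return "comparison of group means"
    # Two categorical variables: a cross-tabulation.
    return "cross-tabulation"


def check_cases(n_cases, control=None):
    """Return advice for the student if no cases remain, else None."""
    if n_cases > 0:
        return None
    if control:
        return f"No cases remain. Try dropping the control variable '{control}'."
    return "No cases remain. Try a broader subpopulation."


if __name__ == "__main__":
    print(choose_procedure("nominal", "ordinal"))
    print(check_cases(0, control="income"))
```

Keeping the rules in one small function means the menus can simply hide any option the function would not return.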

MIDAS includes a full range of univariate, bivariate, and multivariate procedures, with an emphasis on graphic displays. Users can focus their analysis on subpopulations and can also look at relations between variables while controlling for confounding factors. Results are presented along with non-technical annotations that explain what the numbers mean. Where appropriate, the output concludes with a "Technical Report" that exposes students to more sophisticated and detailed presentation of results as well. 
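As a rough illustration of a non-technical annotation, the sketch below turns the p-value of a significance test into a plain-English sentence. The wording, the 0.05 default threshold, and the function name describe_significance are assumptions made for this example; the text MIDAS actually produces is not reproduced here.

```python
def describe_significance(p_value, alpha=0.05):
    """Translate a p-value into a sentence for a non-statistician."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")

    # Express the threshold as "N out of 100" for readability.
    per_hundred = f"{alpha * 100:g}"

    if p_value < alpha:
        return (
            "The relationship is statistically significant: if there were "
            "no relationship in the population, a pattern at least this "
            f"strong would appear in fewer than {per_hundred} out of 100 samples."
        )
    return (
        "The relationship is not statistically significant: a pattern "
        "this strong could easily arise by chance in a sample, so the "
        "data give no clear evidence of a relationship in the population."
    )


if __name__ == "__main__":
    print(describe_significance(0.003))
    print(describe_significance(0.40))
```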

MIDAS imposes several important limitations. It will not recode or transform variables, and it does not support N-way contingency tables, ANOVA, time-series analysis, or other advanced procedures. However, as students become more adept, MIDAS grows with them, eventually letting them "go behind" its menus and analyze their data directly with STATA.

The result is a structured research process that is flexible, manageable, and user-responsive yet also remarkably bulletproof.



William Y. Arms
(wya@cs.cornell.edu)

Last changed: January 27, 2003