Lillian Lee and Regina Barzilay's research featured in NYTimes
section features the work of Lillian Lee and Regina Barzilay:
CS Students win ACM Greater New York 2003 Regional Programming Contest

Cornell's Department of Computer Science is pleased to announce that CS students have won the ACM Regional Collegiate Programming Contest held on November 9th at New York Institute of Technology. This is their third victory in a row.
The CS programming team, consisting of Bill Barksdale and Pet Chean Ang (juniors) and Xin Qi (first-year Ph.D. student), placed first in a competition of 60 teams from 28 schools, with 7 problems solved. They will advance to the World Finals to be held in Prague, Czech Republic, this spring. A team from Columbia University, with 6 problems solved, came in second place. New York University, with 5 problems solved, came in third place.
The department's second team, consisting of Alex Harn (freshman) and Dongjae Lim and Bo Wang (juniors), took a nice 12th place. They can do even better next year!
The trip was sponsored by Greenhills Software.
The official contest homepage:
http://www.acmgnyr.org/
Some photos from the trip:
http://www.cs.cornell.edu/people/mpal/contest/03regionals/
Two CU computer science professors receive NSF Early Career Awards
By Bill Steele
Cornell Chronicle (September 11, 2003)
Two Cornell researchers have been awarded Faculty Early Career Development (Career) grants from the National Science Foundation (NSF). They are Thorsten Joachims and Jayavel Shanmugasundaram, both assistant professors of computer science. Coincidentally, both are working on better ways to search the Web.
The Career program is the NSF's most prestigious award for new faculty members, designed to recognize and support the early career-development activities of those teacher-scholars "who are most likely to become the academic leaders of the 21st century." Each award carries a substantial grant to support the faculty member's research.
Before coming to Cornell in the fall of 2001, Shanmugasundaram worked for two years at the IBM Almaden Research Center in San Jose, Calif., a major center for database research. He will use his five-year, $406,750 NSF grant to develop a new data management system that can search plain text or query structured databases and, if necessary, combine the results. He hopes to make it possible to search plain text with the kinds of sophisticated queries possible in databases, such as searching for a limited range of prices or dates.
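As a rough illustration of mixing keyword search over plain text with a database-style range condition, the following Python sketch filters a small in-memory list. The `Listing` record, the sample data, and the `search` function are hypothetical and are not drawn from Shanmugasundaram's system.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    description: str  # free text, searched by keyword
    price: float      # structured field, searched by range


def search(listings, keywords, min_price=None, max_price=None):
    """Return listings whose text contains every keyword and whose price is in range."""
    results = []
    for item in listings:
        words = set(item.description.lower().split())
        if not all(k.lower() in words for k in keywords):
            continue
        if min_price is not None and item.price < min_price:
            continue
        if max_price is not None and item.price > max_price:
            continue
        results.append(item)
    return results


# Hypothetical sample data.
listings = [
    Listing("used red sedan low mileage", 4500.0),
    Listing("used blue sedan one owner", 7200.0),
    Listing("new red pickup truck", 21000.0),
]
matches = search(listings, ["used", "sedan"], min_price=4000, max_price=5000)
```

Here the keyword test plays the role of the text search and the price bounds play the role of the structured query; a real system would presumably use indexes rather than a linear scan.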
One application is to make accessible what's called the "deep Web": information stored in formally structured relational databases that can be reached online but ordinarily can be searched only one at a time by proprietary systems. Examples range from eBay's list of items for auction to lists of used cars maintained by thousands of individual car dealers. A prototype Web site that will allow sophisticated searches of Shakespeare's plays will go online in a few months, he said.
Shanmugasundaram received a B.E. in computer science from the Regional Engineering College, Tiruchirappalli, India, in 1995, an M.S. in computer science from the University of Massachusetts-Amherst in 1997 and a Ph.D. in computer science from the University of Wisconsin-Madison in 2001.
Joachims, who receives a $400,000 NSF grant, is working on tools that will help individual Web users zero in on the material they want out of the vast array that sometimes appears in response to an Internet search. "The top 10 results are prime real estate," he pointed out. They should be the items the user most wants to see, not necessarily the ones Google or some other search engine selects, he said.
"I want to have a system that gets better the more you use it, that learns by looking over your shoulder," he explained. He is developing software that will notice which results users click on, then rank search results in accordance with their interests. Academics, for example, may be more likely to click on links to .edu than .com sites. "And when I type in 'Michael Jordan,' I usually mean the professor at Berkeley, not the basketball player," he said.
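The idea of learning from clicks can be sketched in a few lines of Python. This is a toy illustration only: the article does not describe how Joachims's software works, and the `ClickProfile` class, its scoring rule (counting clicks per top-level domain), and the example URLs are all assumptions made for the sketch.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the host name, e.g. 'edu' or 'com'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]


class ClickProfile:
    """Hypothetical per-user profile that counts clicks by top-level domain."""

    def __init__(self):
        self.clicks = Counter()

    def record_click(self, url):
        self.clicks[top_level_domain(url)] += 1

    def rerank(self, urls):
        # sorted() is stable, so results with equal scores keep the engine's order.
        return sorted(urls, key=lambda u: -self.clicks[top_level_domain(u)])


profile = ClickProfile()
profile.record_click("http://example.edu/a")
profile.record_click("http://example.edu/b")
profile.record_click("http://example.com/x")

results = [
    "http://example.com/basketball",
    "http://example.org/wiki",
    "http://example.edu/professor",
]
reranked = profile.rerank(results)
```

After two clicks on .edu pages and one on a .com page, the .edu result moves to the top of the list for this user.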
Joachims plans to build a prototype search engine in about a year, possibly to search Cornell's online physics arXiv, a collection of scholarly papers in physics and mathematics, which offers special challenges for searches.
Joachims received his B.S. in 1997 and Ph.D. in 2001, both in computer science, from the University of Dortmund, Germany. From 1994 to 1996 he was a visiting scholar at Carnegie Mellon University. He became a postdoctoral research associate with the Knowledge Discovery Team of Fraunhofer Institute of Autonomous Intelligent Systems in Germany before coming to Cornell in the fall of 2001.
Network Programming
Research in wireless sensor networks is moving to prime time, and undergrad Jay Ayres plays a leading role.
By Joe Wilensky
Reprinted from Cornell Engineering Magazine, Summer 2003
Take a young, dynamic computer science professor and add an ambitious and intellectually curious undergrad. Mix well against the backdrop of Cornell's reputation as a leading research university. The result: cutting-edge research in wireless sensor networks and an investment in the next generation of researchers in the field of computer science.
Jay Ayres '04 is the student; Johannes Gehrke is the prof. But what are wireless sensor networks? It's something of a futuristic idea that has reached the prototype stage only recently.
The Defense Advanced Research Projects Agency (DARPA) began funding work on this concept at the University of California, Berkeley, in 1998, with a vision of "smart dust" for military applications. In their scenario, thousands of tiny wireless sensors the size of dust motes could be scattered over a battlefield without arousing the enemy or risking human life. These "dust particles" would organize themselves as a network, gather data on such things as troop location or presence of chemical warfare agents, then relay significant information back to headquarters. Like the Internet (another DARPA brainchild), these wireless sensor networks may have even more utility in non-military settings. Once you wrap your brain around the concept--that the physical world can become a computing platform--any number of consumer and research applications come to mind. For example, "intelligent" buildings outfitted with sensors could measure and adjust temperature, noise, and light and even respond to queries to report whether Johannes is in his office or if there's an empty seat available in a meeting room. Used in the environment, such networks could detect particular animal species and record patterns of movement or migration. Deployed in a forest, sensor networks could monitor fire emergencies. Networks could be used to control inventory, monitor product quality, and provide an interface for people who are disabled.
Clearly there is vast potential across a wide range of fields for using networks made up of many small sensor nodes, but there are many problems to be worked out, including power consumption, computing power, and communication quality. Working with Gehrke's research group, Ayres is taking on the challenge of making the sensor network itself do much of the processing of queries, thereby making it far more flexible as an adaptable system and more powerful for the user.
A Cornell Presidential Research Scholar, Ayres had little experience with research before coming to Cornell, other than a short paper on computer graphics he wrote in high school for a class assignment. "It wasn't anything like what I'm doing now," he says. "I didn't really know what college-level research would entail. I knew Cornell was a large research university, and that the research opportunities would definitely be something I would want to take advantage of while I'm here."
Up to 75 Cornell Presidential Research Scholars (CPRS) are admitted to Cornell each year in all seven undergraduate colleges. The four-year program offers each student an opportunity to work with a faculty mentor on an individualized program of faculty-directed research.
All CPRS students attend a colloquium during their freshman year to get acquainted with some of the research opportunities on campus. Ayres became involved with the Cornell Database Group, a group of faculty, researchers, and students in the Department of Computer Science who work on new database and data mining technologies.
Gehrke, an assistant professor in computer science and a member of the database group, received his Ph.D. from the University of Wisconsin at Madison and came to Cornell in 1999, just a year ahead of Ayres. He welcomed the collaboration with an eager freshman looking for exciting research work.
"It's like an investment," Gehrke says of working with undergraduates. "The first year, there's a very steep learning curve. It's during the second year that students start actually becoming productive." Ayres admits he didn't come to Cornell with much research experience, which he says is true for most freshmen. During his first winter break, he started reading research papers to get acquainted with the area of data mining research, in which he worked initially. "When I first started reading these research papers, it was very daunting just trying to understand the language," he says. "It was extremely technical. That was the biggest hurdle to overcome."
Ayres began his research with Gehrke by working on the Himalaya data mining project. Data mining uses algorithms to extract useful information from very large databases. Imagine a database, kept by a grocery chain, of every customer receipt issued in every store throughout the history of the chain's existence. (Such databases are no longer merely imagined; they exist today.) Through data mining, customer buying patterns can be analyzed by finding sequences in the database. As a simple example, Gehrke offers that data mining could be used to figure out what percentage of customers who bought milk on one visit bought bread on the next visit.
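Gehrke's milk-and-bread example can be worked through on a handful of made-up receipts. The sketch below assumes one reading of the question: among customers who ever bought milk, what percentage bought bread on the visit immediately after a milk purchase? The customer names, baskets, and the function `next_visit_percentage` are hypothetical.

```python
def next_visit_percentage(histories, first, then):
    """Percentage of customers who bought `first` that bought `then` on the very next visit."""
    bought_first = 0
    followed_up = 0
    for visits in histories.values():
        if not any(first in basket for basket in visits):
            continue
        bought_first += 1
        # Compare each visit with the one immediately after it.
        if any(first in a and then in b for a, b in zip(visits, visits[1:])):
            followed_up += 1
    return 100.0 * followed_up / bought_first if bought_first else 0.0


# Each customer maps to a time-ordered list of baskets (hypothetical data).
histories = {
    "alice": [{"milk"}, {"bread", "eggs"}],
    "bob": [{"milk", "bread"}, {"eggs"}],
    "carol": [{"eggs"}, {"milk"}, {"bread"}],
    "dave": [{"bread"}],
}
pct = next_visit_percentage(histories, "milk", "bread")
```

In this toy data three customers bought milk and two of them bought bread on the following visit, so `pct` is two thirds of 100.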
His first year on the project, Ayres helped develop an algorithm that used a novel technique to quickly mine huge databases and return results. "At the time and probably even up until now, it is the fastest published algorithm for this problem that's out there," Gehrke says.
Ayres explains that the algorithm, which was implemented in C++, uses a bitmap method to store the transactional database as the algorithm is performing operations on it. That allows the algorithm to use simple Boolean and/or operations to find larger and larger sequences throughout the database.
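The bitmap idea can be illustrated with a small Python sketch. It is not the published C++ implementation; the function names, the use of Python integers as bitmaps, and the sample data are assumptions made for illustration. Each customer gets one bitmap per item, with bit i set when the item appears in visit i. To extend a sequence by one item, the sketch keeps only the visits after the sequence's first possible end point and ANDs that mask with the new item's bitmap.

```python
def item_bitmap(visits, item):
    """Bit i is set when `item` appears in the customer's i-th visit."""
    bits = 0
    for i, basket in enumerate(visits):
        if item in basket:
            bits |= 1 << i
    return bits


def after_first(bits, n):
    """Mask of all visits strictly after the earliest set bit (0 if none is set)."""
    if bits == 0:
        return 0
    first = (bits & -bits).bit_length() - 1  # index of the lowest set bit
    all_visits = (1 << n) - 1
    return all_visits & ~((1 << (first + 1)) - 1)


def support(db, sequence):
    """Number of customers whose history contains `sequence` in order, on distinct visits."""
    count = 0
    for visits in db.values():
        n = len(visits)
        bits = item_bitmap(visits, sequence[0])
        for item in sequence[1:]:
            # Boolean AND: later visits that also contain the next item.
            bits = after_first(bits, n) & item_bitmap(visits, item)
        if bits:
            count += 1
    return count


# Hypothetical transaction database: customer -> time-ordered baskets.
db = {
    "c1": [{"milk"}, {"bread"}, {"eggs"}],
    "c2": [{"bread"}, {"milk"}],
    "c3": [{"milk", "bread"}, {"bread", "eggs"}],
}
milk_then_bread = support(db, ["milk", "bread"])
milk_bread_eggs = support(db, ["milk", "bread", "eggs"])
```

Longer sequences are checked by repeating the same mask-and-AND step, which is one way to read the description of finding "larger and larger sequences" with simple Boolean operations.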
"One of the ways the algorithm can be described is that it uses clever data structures," Gehrke says, "which are optimized for our current processors and for the specific problem." Gehrke describes the concept of a "market basket": what a customer buys on one visit, whether it's to a physical store or a website. What the algorithm can do is find temporal or sequential patterns in market baskets over time that suggest what customers like and how they make purchases, he says. An example of such a pattern might reveal a number of customers who first purchased the book The Lord of the Rings and later bought the second and third books in the trilogy. The algorithm can cull these temporal patterns from very large amounts of data.
Ayres worked on the data mining project with Gehrke for about a year and a half and published a paper, "Sequential Pattern Mining (SPAM) Using a Bitmap Representation," which they presented at the Association for Computing Machinery's International Conference on Knowledge Discovery and Data Mining in 2002.
Since then, Ayres has been working with Gehrke on the Cougar project. "The Cougar system investigates a novel paradigm of interacting with wireless sensor networks," Gehrke explains. By abstracting the sensor network as a database, users can program the sensor network in a declarative language: the sensors are told what to do without being told how to do it. Everything the sensors do--from finding an average temperature over a geographic area to tracking a moving object--relies on the Cougar system to optimize user queries and to implement them across the network.
By setting the system up this way, the network can be constantly customized for a variety of applications. Sensors can be added to the network at any point in time, and they can be queried in an ad-hoc way without requiring the user to write a complicated program. A traditional sensor network, in contrast, relies on individual sensors programmed for specific applications with a predefined set of actions they can take and with a predefined set of data to be extracted. "The novelty lies in the in-network processing," Gehrke says. "The network is not just being used as a big data-gathering system, with data sent to a gateway node. For example, there might be aggregation of the data taking place within the network, or other types of processing." This not only simplifies the programming of the sensor network, making it easier for applications to use the network, but also saves energy. "An application can write the kinds of queries it needs," Gehrke explains. "The application doesn't have to know how to write code on the sensors or how to disseminate the data from the sensors back to the gateway node."
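The contrast between shipping every reading to the gateway and aggregating inside the network can be sketched as follows. This is a simplified model, not Cougar code: the `SensorNode` class, the tree topology, and the assumption that one message carries either a single reading or a single (sum, count) pair are all made up for the example.

```python
class SensorNode:
    """Hypothetical sensor with one temperature reading and child nodes that route through it."""

    def __init__(self, reading, children=()):
        self.reading = reading
        self.children = list(children)


def forward_all(node):
    """Baseline: every reading is relayed hop by hop. Returns (readings, messages sent)."""
    readings = [node.reading]
    sent = 0
    for child in node.children:
        child_readings, child_sent = forward_all(child)
        readings.extend(child_readings)
        # The child relays each reading it holds as a separate message.
        sent += child_sent + len(child_readings)
    return readings, sent


def aggregate_in_network(node):
    """Each node merges its children's partial results. Returns (sum, count, messages sent)."""
    total, count, sent = node.reading, 1, 0
    for child in node.children:
        c_total, c_count, c_sent = aggregate_in_network(child)
        total += c_total
        count += c_count
        sent += c_sent + 1  # one (sum, count) message per link
    return total, count, sent


# Gateway node with two children; one child has a child of its own.
gateway = SensorNode(20.0, [SensorNode(22.0, [SensorNode(24.0)]), SensorNode(18.0)])

readings, raw_messages = forward_all(gateway)
raw_average = sum(readings) / len(readings)

total, count, agg_messages = aggregate_in_network(gateway)
agg_average = total / count
```

Both approaches compute the same average, but in this toy tree the aggregating version sends one message per link, while the baseline sends more as readings are relayed over several hops. Under the stated message model, the gap widens as the network gets deeper.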
Scientists could deploy these sensors and use a query system like this in any number of applications--bird or other animal habitats, for example. The researchers wouldn't be tasked with programming the sensor network at the lowest level, but they would have flexibility in changing variables, such as what data they need or the frequency with which they make queries to the network.
The small, commercially available sensors are each like a miniature desktop computer, Gehrke says. They use a special operating system for networked sensors, the TinyOS operating system developed at the University of California, Berkeley. Continuing challenges include reducing the energy use of each sensor, supporting more sophisticated queries, and improving the communication structures between sensors.
The Cougar project is supported by the National Science Foundation, the Defense Advanced Research Projects Agency, the Cornell Information Assurance Institute, and a gift from Intel.
Besides looking at systems issues, Gehrke is also working on developing a graphical user interface (GUI) that individuals might use to query the sensor network, as well as the code on the sensors themselves. Another CPRS student, Joel Ossher '06, is working with Gehrke on the GUI end of the project.
Gehrke says involving undergraduates in research is one of the exciting opportunities at Cornell. "Although I could probably work more efficiently only with graduate students, it's such a nice experience to see undergraduate students come in who do not know how to do research, and then to really see them grow into a stage where they can do research on their own," he says.
Ayres says the professor's enthusiasm for working with undergraduates on new research projects was immediately apparent. "He really spends a lot of his time with the undergraduates on his research team," Ayres says. "It's a lot of one-on-one time, just discussing the research. He clearly is very enthusiastic about what he's doing and the research itself."
Ayres speaks with passion about the value of undergraduates conducting research alongside not only other undergraduates, but master's students, doctoral students, and a faculty mentor. "This experience has given me exposure to what it's like to be a Ph.D. student," he says. Ayres plans to pursue at least a master's degree and has found himself drawn to the wireless sensor network area of research. He enjoys looking at novel areas of research and reading new research papers to expand his knowledge.
Computer science is one of the most active and important research fields today, Gehrke says. "Computation is becoming the foundation for a lot of the scientific endeavors of today." He notes that researchers in his group are collaborating with physicists, astronomers, and biologists on current projects. "Computer science is one of the disciplines that permeates all these fields," he says. It's not just core computer science anymore, but the application of computer science techniques to other areas, that makes this collaboration possible--and crucial to the development of new science.
Students like Ayres can develop invaluable skills in a major like computer science by conducting research with a faculty mentor, Gehrke stresses. "It depends on the skills the students bring along and their willingness to do research and to spend the time learning it. If students devote enough time to it, they can make tremendous contributions to research projects and lift their educational experience here at Cornell to a new level."
Joe Wilensky is a staff writer in Cornell's office of Communications and Marketing Services.
Cornell University professors Keshav Pingali, Steve Vavasis, and Tony Ingraffea, and research associates Paul Stodghill, Gerd Heber, and Rob Cronin have successfully demonstrated a geographically-distributed simulation system, based on industry-standard Web Services, for solving coupled fluid/thermal/mechanical fracture problems.
"Grid computing is a metaphor representing many styles of distributed computing," said Cornell's India Professor of Computer Science Keshav Pingali, who is also an Associate Director at the Cornell Theory Center (CTC). "What we have shown is that some of the more useful styles of grid computing can be done quite effectively using existing industry-standard protocols and software such as SOAP and XML."
The path to discovery
The breakthrough was made in the course of implementing the Adaptive Software Project (ASP), a multi-institutional, multi-disciplinary computational science project, which is studying adaptivity in computational science applications. Researchers from the University of Alabama, Mississippi State University (MSU), Ohio State University, the College of William and Mary, and Clark-Atlanta University are partnering with Cornell in this project.
The benefits of the ASP approach of using industry-standard Web Services became evident while the team was exploring a simulation of a fracture in rocket engine components, such as those used in the space shuttle. These components transport high-pressure, high-velocity chemically reacting gases, which can create large thermo-mechanical stresses on component walls. To simulate fracture initiation and growth, the group had to integrate a number of large software systems, including a finite-element mesh generation code developed jointly by Cornell and the College of William and Mary, a chemically-reacting flow simulation code developed at MSU's Engineering Research Center and the University of Alabama, and a linear elastic fracture code developed at CTC.
"The traditional approach to integrating such software modules is to port all of them to a single computing platform," said Pingali. "Not only is this very time-consuming, but every time a new release of a module becomes available, some poor soul has to repeat the entire process of downloading and porting the code, re-compiling it, re-linking the compiled code with the rest of the software, and so on."
To simplify the job of integrating software components while respecting individual software developers' choices of hardware platforms, operating systems, and programming languages, the ASP team decided to deploy each major component as a Web Service running on a server at the institution where that component was developed. The flow simulation code, for example, runs on an IBM x330 Linux server at MSU, while the fracture simulation code runs on CTC's Windows cluster. The team uses industry-standard Web Service implementations such as Apache SOAP, and XML-based data exchange formats developed by Professor Steve Vavasis of the Cornell Computer Science Department.
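To give a sense of what a SOAP request to such a service looks like, the following Python sketch builds a SOAP 1.1 envelope with the standard library. It only constructs the XML message and does not send it. The operation name `RunFlowSimulation`, the namespace `urn:example:flow-service`, and the parameters are hypothetical and are not taken from the ASP project.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(operation, service_ns, params):
    """Return a SOAP 1.1 envelope (as a string) that calls `operation` with `params`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameters, for illustration only.
request_xml = build_request(
    "RunFlowSimulation",
    "urn:example:flow-service",
    {"meshId": "mesh-001", "steps": 100},
)
```

Because the message is plain XML sent over a standard protocol, a client written in one language on one platform can call a service implemented in another, which is the property the team relies on.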
"We view the person running the simulation as a client who writes a few hundred lines of code to invoke the various Web Services to orchestrate the simulation," said research associate Paul Stodghill, who wrote software for deploying legacy Unix codes as Web Services. "Our motto is 'write once, run from anywhere.'" He said that the overhead of using geographically-distributed Web Services for their simulation is about 10%.
Tony Ingraffea, the Dwight D. Baum Professor of Civil Engineering at Cornell and CTC Associate Director, feels that this overhead is worth paying. "Most applications people are not interested in using geographically-distributed computers to solve a large linear system, for example," said Ingraffea. "What we need is a way of building virtual organizations within which project members can work with each other's codes easily, while being sensitive to intellectual property issues."
Gordon Bell, senior researcher at Microsoft's Bay Area Research Center, concurs. "This project demonstrates the potential of a new way to build applications and the potential for a new software industry structure based on delivering results," he said. "Users don't have to buy apps programs and maintain a more complex software environment; they simply call a program or database. This is one of the few projects that I would call a Web Service, and it is well beyond what is running on today's experimental grid."
Frederica Darema, Senior Science and Technology Advisor in the CISE Directorate at NSF, is the cognizant NSF official for the ASP project. "It is to the credit of the scientists working on this project that they have developed such a cohesive collaboration," she said. "I am pushing for a new paradigm in application simulation and measurement capabilities called Dynamic Data Driven Application Systems, and the ASP model of multidisciplinary collaboration, together with the technology advances made by the project, are essential for enabling this new paradigm. I am very pleased with the outcomes of this project and its broader impact."
About the Cornell Theory Center
CTC is a high-performance computing and interdisciplinary research center located on the Ithaca campus of Cornell University with additional offices in Manhattan. CTC currently operates a Dell/Intel/Windows cluster complex consisting of more than 1500 processors, in addition to Unisys ES7000 Servers. Scientific and engineering projects supported by CTC represent a vast variety of disciplines, including bioinformatics, behavioral and social sciences, computer science, engineering, finance, geosciences, mathematics, physical sciences, and business. For more information, visit http://www.tc.cornell.edu or http://www.ctc-hpc.com.
According to some colleagues, senior Omar Khan is the best undergraduate computer science researcher in the whole country. "That's a bit overblown," Khan said. He did, however, win the Computing Research Association award for Outstanding Male Undergraduate for 2003.
One cool thing about the award, he said, is that he got a free trip to San Diego for the presentation in June. The ceremony was held at the same time as a major computer science meeting, attended by many big names in the field.
The research that led to the award, done with Professor John Hopcroft and Associate Professor Bart Selman, is hard to visualize, because it involves graphs in many dimensions. Khan and his professors are trying to see how the links between ideas can be used to predict trends. Their database is the vast number of papers published each year in computer science, and the links are formed by the ways these papers contain citations to one another. By comparing the pattern of citations from one year to the next, the researchers hope to spot emerging trends.
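A very small version of comparing citations from one year to the next might look like the Python sketch below. The researchers' actual method is not described in the article, so this is only a toy: the topic labels, the sample citation records, and the `citation_growth` function are hypothetical.

```python
from collections import Counter


def citation_growth(citations, year_a, year_b):
    """Change in citation count per topic between two years.

    `citations` is a list of (year of citing paper, topic of cited paper) pairs.
    """
    counts_a = Counter(topic for year, topic in citations if year == year_a)
    counts_b = Counter(topic for year, topic in citations if year == year_b)
    topics = set(counts_a) | set(counts_b)
    return {topic: counts_b[topic] - counts_a[topic] for topic in topics}


# Hypothetical citation records.
citations = [
    (2001, "data mining"),
    (2001, "compilers"),
    (2001, "compilers"),
    (2002, "data mining"),
    (2002, "data mining"),
    (2002, "data mining"),
    (2002, "compilers"),
    (2002, "sensor networks"),
]
growth = citation_growth(citations, 2001, 2002)
fastest_rising = max(growth, key=growth.get)
```

A topic whose citation count rises sharply between the two years is a candidate for an emerging trend in this simplified view.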
"We're still exploring," Khan said. "We're not ready to make predictions yet." The principles involved might be applied to a wide variety of topics, from text searching to inquiring into the structure of the human brain.
Computing has been a big part of Khan's life for as long as he can remember. He recalls getting his first computer at the age of 6, and writing a few simple programs--mostly games--in later years. Some of the interest probably rubbed off on him from Khan's father, who immigrated from India to Canada in the 1960s, working first in computer science and then moving to administration.
At Cornell, Khan at first devoted considerable time to Cornell Mock Trial, serving on a team that placed in the top five in the 2001 national competition. Later he worked as a teaching assistant and course consultant in computer science and, at the Cornell Theory Center, helped to develop a virtual world designed to teach genetics to high school students. And he received the Frank and Rosa Rhodes Scholarship for 2002-03.
He spent summers doing research internships at the McGill University School of Computer Science and Xerox PARC. After graduation Khan will go to work for Google, a search engine company that is also very much interested in how things with complex linking patterns behave. His work there will build upon his current research and will give him an opportunity to develop and improve Google's search services. "They have so many ideas and they don't have enough people to implement them," he explained.
Bill Steele, Cornell News Service
(Cornell Chronicle, April 3, 2003)
Three members of Cornell's faculty, two from the Ithaca campus and one from the Weill Cornell Medical College in New York City, have been named Alfred P. Sloan Foundation fellows. They are among 117 outstanding young researchers from 50 colleges and universities in the United States and Canada to receive awards of $40,000 over two years.
The three fellows are Johannes Gehrke, assistant professor of computer science, and David Lin, assistant professor of biomedical sciences, both on the Ithaca campus, and Diana Murray, assistant professor of microbiology and immunology and director of the Computational Genomics Core Facility at Weill Cornell.
The Sloan Research Fellowship Program is one of the oldest such programs in North America. Fellows, who are selected from among hundreds of scientists in the early stages of their careers on the basis of their exceptional promise, are free to pursue whatever lines of inquiry are of most interest to them.
Gehrke's award will support research on privacy-preserving data mining and distributed database systems. His research is focusing on the design and implementation of a database system for sensor networks. He earned his Ph.D. in computer science from the University of Wisconsin-Madison in 1999. He joined the Cornell faculty in the same year.
Lin's award will support his study of neuronal connectivity in the mouse olfactory system. He is focusing on determining the mechanisms that enable olfactory sensory neurons to form connections in the mouse brain. He earned his Ph.D. in molecular biology from the University of California-Berkeley in 1994 and joined the Cornell College of Veterinary Medicine faculty in 2001.
Murray studies how the recruitment of proteins to different cellular membranes is achieved and regulated. Her goal is to use computational methods to characterize the structural and energetic basis for the binding of lipid-interacting domains to phospholipid membranes and, in turn, to better understand the underlying forces that govern signal transduction and retroviral assembly. She earned her Ph.D. in physics at the State University of New York at Stony Brook in 1994. She joined the Weill Cornell faculty in 2001.
About 60 Cornell students and faculty members and one Ithaca eighth grader attended the panel on "Perspectives on Women in Computer Science" in Statler Hall Dec. 9, sponsored by the Women's Mentorship Program in Computer Science at Cornell, a program of the Department of Computer Science. The event was an opportunity for people with interests in computing and the information sciences to gather with other students, faculty and computing professionals to discuss important issues affecting women in the field. Among the panelists were Claire Cardie, professor of computer science and coordinator of Undergraduate Programs in Information Science, and Priyanka Nishar '03, a bachelor of science degree candidate in engineering and president of the Association of Computer Science Undergraduates.