New Faculty - Golan Yona

Golan Yona, a Burroughs Wellcome postdoctoral fellow in computational molecular biology at Stanford University, will join the Cornell faculty in January as an assistant professor in the Department of Computer Science. Yona searches for global principles that might chart a "road map" of the space of all possible proteins. His goal: to establish a reliable, unified framework for sequence and structure analysis.

"Cornell's computer science is an excellent place to develop my research program. The university is putting a lot of attention toward genomic science, acknowledging the importance of this field. The wide interest in my research, and the collegial atmosphere in the Department of Computer Science, convinced me that Cornell would be the best place for me."
-- Assistant Professor Golan Yona

Date Posted: 10/05/2000

New Scientists

By Roger Segelken

Cornell is making it clear that it aims to become the very best place to learn sciences that don't fit old academic labels. The inclusion of an educational component in interdisciplinary research is viewed as essential training for what will be the first generation of researchers in the new science of life. These invaluable opportunities for education and research in novel scientific areas will enable more Cornell students to take up President Hunter Rawlings on his offer to "study at the best research university for undergraduate research." As just one example, the loosely defined "computational biology group" at Cornell is gaining a wide reputation not only for its researchers but also because it is making education an essential part of this frontier discipline. There also are plans to create an undergraduate program for students who want to add computational biology to their existing field of study. And a graduate degree program in computational molecular biology has just started.

The freshman year, some educators say, is not too soon to start. Some freshmen find themselves quickly enlisted by Carlo Montemagno, the associate professor of biological engineering whose nanobiotechnology research tries to create biomechanical devices even smaller than bacteria. "Freshmen are doing some extraordinary things," Montemagno says. A senior research associate in his group agrees: Hercules Neves, who oversees undergraduates working in the Cornell Nanofabrication Facility, reports that the young scientists-in-training work side-by-side with graduate students, postdocs and staff researchers. The undergraduates, notes Neves, are accorded the same research privileges and the same expectations of success. Undergraduate research fellowships from corporations, including General Electric, provide work-study salaries and laboratory supplies. The professor's job, said Montemagno, is to "keep the students focused on what they are doing and increase the challenges incrementally so they are not swamped but not bored, either."

Ron Elber, professor of computer science who is a leading figure in the computational biology group, co-teaches a graduate course in plant breeding and computer science, "Problems and Perspectives in Computational Molecular Biology," that also is open to undergraduates. The best way to conduct such a course, says co-instructor Susan McCouch, associate professor of plant breeding, is to pair up students from biology backgrounds with students from math, computer science and statistics. McCouch is particularly proud of the undergraduates she works with in the Presidential Research Scholars program. And she cites student Chris Maher as an outstanding example: He excels in plant research while creating instructional materials for high school students. Among Maher's science-outreach projects is a computer game to teach younger students about transposable genetic elements in rice and maize.

"My undergraduate research at Cornell has been an enriching experience, allowing me to interact with remarkable faculty members in multiple disciplines," Maher said. "It has been very rewarding working for faculty who have given me many opportunities and responsibilities." One new curriculum grew from an interdisciplinary field that hardly existed until a Cornell professor coined a name for it: nanobiotechnology. Now the W.M. Keck Program in Nanobiotechnology enrolls doctoral students who work at the "nano" scale (a nanometer is a billionth of a meter) on medically oriented devices that combine properties of the organic and the inorganic. Although based in the College of Engineering, Keck Fellows are free to range across the university, learning their skills wherever they are taught. "We knew that tomorrow's nanobiotechnologists would need the best education from several different disciplines," said Michael Isaacson, Cornell physicist and founding director of the Keck Program. A new class for Keck Fellows and anyone else who is interested and qualified has mathematician Richard Durrett teaching a course that seems to symbolize the interdisciplinary nature of the Keck program: "Using Probability Models to Understand DNA Sequence Evolution." Another course has Elber teaching "Numerical Methods in Comparative Molecular Biology."

Date Posted: 10/05/2000

HeinOnline

By Beth Goelzer Lyons

Law is grounded in the past, in the decisions and reasoning of generations of lawyers, judges, juries, professors. Ready access to this history is vital to solid legal research, but until this summer much of it was buried in vast collections of aging paper journals. Enter HeinOnline, an ambitious project to make the country's law school journals available on the web, from each journal's date of inception, and then to expand to an array of other classic legal materials. A collaboration among William S. Hein and Co. Inc. (the world's largest distributor of legal periodicals), Cornell Law Library and Cornell Information Technologies, HeinOnline (http://heinonline.org/) currently spans more than 3,000 journals, or over half a million pages. It's expected to continue growing at more than 175,000 pages a month.

"HeinOnline is nice because it shows the exact image of a page," said Claire Germain, Edward Cornell Law Librarian and professor of law.  "That protects the integrity of the original document and ensures its authenticity.  But is still allows you to cut and paste text, making it easy to use for research." 

Browsing the collections and searching for specific information are both supported, which means researchers don't have to trade the convenience of online access for the ability to "flip" through the pages of journals. Another big plus is being able to enter a citation and instantly pull up the article.

"Before HeinOnline, using these materials could be difficult," said Germain.  "You were dependent on the journals being on the shelf when you needed them, and the only tool you had for finding articles was paper-based indexes." 

Cornell's involvement began three years ago when Hein was investigating ways to put its collections on the web and Cornell University Library's "Making of America" project caught its attention. "Making of America" focuses on 19th-century American social history. Like HeinOnline, it relies on a Cornell-developed digital library protocol called Dienst.

"Dienst attempts to bridge the gap between the ultra-organization of the traditional library and the chaos of the web. What Hein especially liked was its document-structuring capabilities," explained Carl Lagoze, digital library scientist in the Department of Computer Science and Dienst developer. 

"When Dienst is used to build a digital library, you specify how documents will be accessed -- table of contents, index, particular chapter, citation and so on.  Thats important for researchers and for managing copyright issues.  Dienst is also designed to work seamlessly with collections of materials housed across the Internet." 

What people actually see when they use HeinOnline is the interface designed by Rich Marisa, manager of digital library and electronic publishing applications for Cornell Information Technologies, with input from Daniel Rosati of Hein, Germain and her colleagues at Cornell Law Library, and law school librarians at Yale, Boston University, the University of Pennsylvania, the University of Chicago, New York University and several other institutions.

JSTOR, Lexis-Nexis and Westlaw -- other popular online research tools -- were used as the comparison standards, and second- and third-year Cornell law students in advanced legal research courses tested the prototypes. 

"There has been great synergy between what Hein has and the talents at Cornell," said Rosati.  "Weve been digitizing legal materials since 1994 and serving law libraries since 1920, but this is our first venture into digital libraries." 

Libraries are increasingly looking to digital collections as a way to ease shelf-space pressures. Now that HeinOnline has put historical law journals online, law libraries have the flexibility of keeping only single copies of those journals on their shelves, putting them in storage or discarding them, depending on their preservation policies.

Those outside the legal field may also benefit from HeinOnline.  "The experience of building HeinOnline puts us in a good position to apply the results of digital library research to digital asset management problems across the campus," noted Marisa. 

The digital library protocol Dienst is supported by the Digital Libraries Initiative Phase 2, which is coordinated by the National Science Foundation. Details can be found at www.cs.cornell.edu/cdlrg.

Date Posted: 9/28/2000

CU to lead project to create software that adapts

Contact: Bill Steele
Office: 607-255-7164
E-Mail: ws21@cornell.edu

Contact: Margaret Corbit
Office: 607-254-8716
E-Mail: corbitm@tc.cornell.edu

ITHACA, N.Y. -- Computer programs that can adapt to changing conditions -- both in the virtual worlds they are creating and in the hardware on which they are running -- will be developed under a $5 million project funded as part of the $90 million Information Technology Research initiative of the National Science Foundation (NSF), announced by the White House Wednesday (September 13).

The program, called the Adaptive Software Project, will be conducted by researchers at Cornell University, Mississippi State University, the College of William and Mary, Ohio State University and Clark Atlanta University.

Cornell will be the lead institution for the project and Keshav Pingali, Cornell professor of computer science, will be principal investigator, leading an interdisciplinary team of computer scientists, engineers and physicists. 

"In building a computer simulation you need to make decisions as you detect new conditions," Pingali says. "We are going to study the mechanisms of evolution and steal ideas from nature. The biological model is a useful one: When conditions change it acts as a stimulus, and the response is to evolve." 

Adaptive programs, he says, will be able to adjust to changing conditions in the simulation, and also to changes in the hardware on which they are running, accessing more or fewer processors as they are needed and available, and correcting for problems such as the failure of one or more processors in a parallel processing computer. In other words, Pingali says, the software will behave like a living organism, adapting "either to improve results or to improve its own survival." 

The project will develop two simulations as test beds: 

Cornell Theory Center's Computational Materials Institute (CMI) will develop a simulation of the propagation of cracks in structures, such as those that form in aging aircraft, under the leadership of Tony Ingraffea, the Dwight C. Baum Professor of Engineering at Cornell. The CMI has a world reputation for its work on fracture simulation, and in the past three years alone, has won a Grand Challenge award ($1.8 million), a Research Infrastructure award ($1.5 million), and a Knowledge and Distributed Intelligence award ($1.8 million) from the NSF. 

The Engineering Research Center at Mississippi State will simulate the behavior of "multi-phase" fluids in which a chemical reaction is taking place, such as the mixtures of liquids and gases that are found in combustion chambers, under the leadership of Professor Bharat Soni. 

William and Mary scientists will contribute software for "mesh generation," a step in the solution of complex equations of motion. Ohio State will develop software to create graphic displays as the output of simulations. 

These simulations will run on the Cornell Theory Center's (CTC) Velocity and Velocity+ clusters, supercomputers made up of large numbers of Intel Pentium III processors running in parallel.

Traditionally in such simulations, programmers must set up the conditions of the simulated world and the methods the computer will use in advance. They must also specify the hardware on which the programs will run, setting aside, for example, a certain number of parallel processors and a certain amount of memory. 

Adaptive software will be able to choose different mathematical models depending on how the simulation develops, based both on the conditions of the simulation and the availability of processors and memory. 
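
As a rough illustration of that idea, here is a minimal sketch of a simulation step that picks between a cheap and an expensive model based on a sensed error estimate and the processors currently available. The model functions, thresholds and numbers are invented placeholders, not the project's actual code.

    import os

    def coarse_model(state):
        return state * 0.99      # cheap, low-resolution update (placeholder math)

    def fine_model(state):
        return state * 0.999     # expensive, high-resolution update (placeholder)

    def step(state, error_estimate):
        # Choose a model from both the simulation's condition (the error
        # estimate) and the hardware available right now (processor count).
        workers = os.cpu_count() or 1
        if error_estimate > 0.1 and workers >= 4:
            return fine_model(state)
        return coarse_model(state)

    state, err = 1.0, 0.5
    for _ in range(10):
        state = step(state, err)
        err *= 0.7               # pretend our error estimate improves as we go
    print(f"final state: {state:.4f}")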

In crack propagation, for example, the Cornell researchers are able to do very sophisticated simulations only at the scale of the object itself, "but all the physics is occurring in a very small volume around the crack tip," Ingraffea explains. The new software will be able to shift down through intermediate scales to the atomic scale, choosing the best method to use at each scale, based on the conditions it senses at the surrounding levels. "This ITR award demonstrates once again the innovative research being done in CTC's Computational Materials Institute," says CTC director Thomas F. Coleman. 

Cornell team members, in addition to Pingali and Ingraffea, include Paul Chew and Paul Stodghill, research associates in the Department of Computer Science; Steve Vavasis, associate professor of computer science; physicists Thomas Arias, associate professor, and James Sethna, professor; and Gerd Heber and Chris Myers, CTC research associates.

Related World Wide Web sites: The following sites provide additional information on this news release. Some might not be part of the Cornell University community, and Cornell has no control over their content or availability. 

Cornell Theory Center: http://www.tc.cornell.edu/ 
The Cornell Fracture Group: http://www.cfg.cornell.edu/ 

EDITORS: Keshav Pingali is currently on sabbatical. He may be reached at pingali@cs.cornell.edu.

Date Posted: 9/21/2000

Six degrees of separation Chronicle article

Explanation is personal networking, Cornell computer scientist says

ITHACA, N.Y. -- We all know it's a small world: Any one of us is only about six acquaintances away from anyone else. Even in the vast confusion of the World Wide Web, on the average, one page is only about 16 to 20 clicks away from any other. But how, without being able to see the whole map, can we get a message to a person who is only "six degrees of separation" away?

A Cornell University computer scientist has concluded that the answer lies in personal networking: We use "structural cues" in our local network of friends. "It's a collective phenomenon. Collectively the network knows how to find people even if no one person does," says Jon Kleinberg, assistant professor of computer science, who published his explanation in the latest issue (Aug. 24) of the journal Nature. 

His research is based on a computer model showing that an "ideal" network structure is one in which connections spread out in an "inverse square" pattern. In human terms that means that an "ideal" person in the model would have just about as many friends in the rest of the county as in the neighborhood, just as many in the rest of the state as in the county, just as many in the whole nation as in the state, and so on, as you might find in a highly mobile society.

Kleinberg's answers might have a very practical use in helping to reduce the number of clicks needed when surfing the web, as well as helping to speed up other kinds of networks. 

Although Kleinberg has been instrumental in the development of improved search engines for the web, he doesn't see this work as applying to traditional search engines. They already have the "big picture" of the network, he explains, since they work from indexes of the web. Rather, he sees it being useful in the construction of "agents," computer programs that will jump around the web looking for specific information. 

It could also apply to the distribution of data over the Internet, where computers called routers must send packets of information on their way toward their destinations without knowing what the state of the network is outside of their own immediate neighborhood. 

Kleinberg has shown that a computer algorithm (the basic design for a program) can choose the best way to send a message to a faraway place in a network even if it has knowledge only about the characteristics of its immediate neighborhood. "The correlation between local structure and long-range connections provides fundamental cues for finding paths through the network," he writes in the Nature paper. 

Kleinberg's work is a refinement of an earlier study by two other Cornellians, Steven H. Strogatz, professor of theoretical and applied mechanics, and his graduate student, Duncan Watts, now an assistant professor in Columbia University's sociology department. 

Strogatz and Watts offered a mathematical explanation for the results of a landmark experiment performed in the 1960s at Harvard by social psychologist Stanley Milgram. The researcher gave letters to randomly chosen residents of Omaha, Neb., and asked them to deliver the letters to people in Massachusetts by passing them from one person to another. The average number of steps turned out to be about six, giving rise to the popular notion of "six degrees of separation," and eventually the "six degrees of Kevin Bacon" game in which actors are connected by their movie appearances with other actors. 

Strogatz and Watts created a mathematical model of a network in which each point, or node, is closely connected to many other nodes nearby. When they added just a few random connections between a few widely separated nodes, messages could travel from one node to any other much faster than the size of the network would suggest. The six degrees of separation idea works, they said, because in every small group of friends there are a few people who have wider connections, either geographically or across social divisions. They also showed that such cross-connected networks exist not only between human beings but also in such diverse places as computer networks, power grids and the human brain.

But Kleinberg has found mathematically that the model proposed by Strogatz and Watts doesn't explain how messages can travel so quickly through real human networks. "The Strogatz-Watts model had random connections between nodes. Completely random connections bring everyone closer together," Kleinberg explains, "but a computer algorithm would have only local information. The long-range connections are so random that it [the algorithm] gets lost."

So Kleinberg designed a model in which nodes are arranged in a square grid and each node is connected randomly to others but with "a bias based on geography." As a result each node is connected to many nearby, a few at a longer distance and even fewer at a great distance -- the "inverse square" structure. "This bias builds in the structural cues in my long-range connections, and it's the bias that is implicitly guiding you to the target," Kleinberg explains. "In the Strogatz-Watts model, there is no bias at all and, hence, no cues -- the structure of the long-range connections gives you no information at all about the underlying network structure."

The sender of a message in this system doesn't know where all the connections are but does know the general geographic direction of the destination, and if messages are sent in that direction, Kleinberg says, they arrive much faster than they would by completely random travel.
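
Kleinberg's model is simple enough to simulate directly. The following rough Python sketch (grid size, seed and trial count chosen arbitrarily) builds an n-by-n grid in which each node gets one long-range contact with probability proportional to the inverse square of the distance, then routes messages greedily toward a target.

    import random

    def dist(a, b):
        # Lattice ("Manhattan") distance between two grid points.
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def long_range_contact(u, n):
        # Sample one contact v != u with probability proportional to dist(u, v)**-2.
        nodes = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
        weights = [dist(u, v) ** -2 for v in nodes]
        return random.choices(nodes, weights)[0]

    def greedy_route(n, source, target):
        # Each holder forwards the message to whichever of its contacts (the
        # four grid neighbors plus its one long-range contact) lies closest to
        # the target. Long-range contacts are sampled lazily, one per node.
        contacts, u, steps = {}, source, 0
        while u != target:
            if u not in contacts:
                contacts[u] = long_range_contact(u, n)
            x, y = u
            neighbors = [(x + dx, y + dy)
                         for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= x + dx < n and 0 <= y + dy < n]
            u = min(neighbors + [contacts[u]], key=lambda v: dist(v, target))
            steps += 1
        return steps

    random.seed(0)
    n = 50
    trials = [greedy_route(n, (0, 0), (n - 1, n - 1)) for _ in range(20)]
    print(f"average greedy path length on a {n}x{n} grid: "
          f"{sum(trials) / len(trials):.1f}")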

Kleinberg explains, "The Watts and Strogatz model is sort of like a large group of people who know each other purely through electronic chat on the Internet. If you are given the user ID of someone you don't know, it's really hard to guess which of your friends is liable to help you forward a message to them.

"The inverse square model is more like the geographic view of Milgram's experiment -- if you live on the West Coast and need to forward a message to someone in Ithaca, you can guess that a resident of New York state is a good first step in the chain. They are more likely to know someone in the Finger Lakes region, who in turn is more likely to know someone in Ithaca and so forth. Knowing that our distribution of friends is correlated with the geography lets you form guesses about how to forward the message quickly."

The geographic information on the grid, he adds, is an analogue of the social connections between people. Just as nodes on his simulated network choose the correct geographical direction to send a message, so humans may choose a social direction: A mathematician who wants to send a message to a politician might start by handing it to a lawyer. 

On the other hand, he says, "When long-range connections are generated uniformly at random, our model describes a world in which short chains exist but individuals, faced with a disorienting array of social contacts, are unable to find them." 

The paper in Nature is titled "Navigation in a Small World." 

Date Posted: 8/31/2000

Cancer cell growth appears related to evolutionary development of plump fruits and vegetables

ITHACA, N.Y. -- The genetic mechanism that through millennia of evolution has created plump and juicy fruits and vegetables could also be involved in the proliferation of human cancer cells.  

Plant biologists and computer scientists at Cornell University have essentially made a direct genetic connection between the evolutionary processes involved in plant growth and the processes involved in the growth of mammalian tumors.

Studying the genetic map of the tomato (Lycopersicon esculentum), the researchers traced a "plumping" characteristic to a single gene, called ORFX, that is expressed early in the plant's floral development. The protein sequence obtained from the gene was computationally predicted to be similar to that of the human oncogene c-H-ras p21, suggesting a common mechanism in the cellular processes leading to large, edible fruit in plants and cancer in humans. The research is detailed in the latest edition (July 7) of Science.

"We're beginning to understand that there are very common mechanisms that create life," says Steven D. Tanksley, Cornell Liberty Hyde Bailey Professor of plant breeding and the lead author of the research paper. "This is a case where we found a connection between agricultural research in how plants make edible fruit and how humans become susceptible to cancer. That's a connection nobody could have made in the past.

"In this era of genomics, many people are looking at divergent organisms, and we're starting to realize connections we never imagined," Tanksley says.

The discovery springs from the Cornell researchers' attempts to identify the remote evolutionary changes that led to the bountiful produce associated with modern agriculture. Tanksley explains that wild fruit and vegetable varieties were not always fit for human consumption. Usually they were too scrawny to provide much nourishment. But over millennia, not only did humans cross plants, but the plants also crossed themselves. Thus, many varieties became fleshy enough to eat. 

Corn kernels were not originally succulent, and wild tomatoes looked more like red blueberries. "When you see a beautiful ear of corn, you're actually looking at a gross exaggeration of the corn's real anatomy. Tomatoes and all other fruits and vegetables show the same thing, compared with their ancestors. They have gross exaggerations of the particular parts of their anatomy -- their fruit, for example -- valued by humans," says Tanksley.

If not for this mechanism, humans would not have developed beyond the hunter-gatherer stage, and modern civilizations would not have been born. "We're now codependents with domesticated fruits and vegetables. Without them, we can't sustain ourselves, and without us the domesticated plants can't sustain themselves either," says Tanksley. "The humans and the plants have been caught up in a dance of co-evolution." 

The Cornell researchers were able to peer into the primeval development of food with the aid of computer science. They found a quantitative trait locus, or QTL (a chromosomal region associated with a measurable characteristic), involved in the evolution and domestication of the tomato from a small berry into what has become a large fruit. The ORFX gene, located at the fw2.2 locus on the plant's DNA, changes fruit weight by up to 30 percent. The researchers say this gene is a key to domesticating wild plants.

The discovery might have taken decades if the researchers had used conventional methods. After obtaining the nucleotide sequence of the gene, which in turn provided the protein sequence, Tanksley collaborated with Ron Elber, Cornell professor of computer science and acting director of the Center for Parallel Processing Resources for Biomedical Scientists at Cornell. Using a computational biology program developed at the Cornell Theory Center, the scientists predicted a three-dimensional structure from the protein sequence. The software, called the Learning, Observing and Outputting of Protein Patterns (LOOPP), matches sequences to protein structures.

Elber and researcher Jaroslaw Meller matched the novel tomato sequence with known three-dimensional protein shapes and found a hit. Using as a template the structure of the ras (human oncogene) protein, they were able to identify some of the protein's specific chores. "Putting the protein into 3-D shape gave us guidelines for what to look for, and to see the critical amino acids," says Elber. "Without any hints, it is difficult."

The algorithm developed by Meller and Elber at Cornell is exceptionally efficient: it matched the tomato protein sequence to a structure in less than a minute.
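
As a toy illustration of the threading idea, the sketch below scores a query sequence against a small library of structural templates and ranks the hits. The scoring rule (rewarding hydrophobic residues at buried positions), the sequences and the template names are all invented stand-ins; LOOPP's actual energy functions are far more sophisticated.

    # Environment class of each template position: 'b' buried, 'e' exposed.
    TEMPLATES = {
        "ras_like": "bbeebbeebbee",
        "barrel":   "bebebebebebe",
    }

    HYDROPHOBIC = set("AVLIMFWC")   # residues that prefer buried positions (toy rule)

    def thread_score(seq, envs):
        # Reward hydrophobic residues at buried positions and polar ones exposed.
        score = 0
        for aa, env in zip(seq, envs):
            buried = (env == "b")
            score += 1 if (aa in HYDROPHOBIC) == buried else -1
        return score

    query = "AVKELIMDFQWC"
    for name in sorted(TEMPLATES, key=lambda t: thread_score(query, TEMPLATES[t]),
                       reverse=True):
        print(name, thread_score(query, TEMPLATES[name]))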

Says Tanksley: "It's astounding this kind of research could be done at the speed at which it was done. This would have been impossible a few years ago."

Other authors of the Science paper, "fw2.2: A Quantitative Trait Locus Key to the Evolution of Tomato Fruit Size," are researchers Anne Frary and Esther van der Knaap; visiting fellows T. Clint Nesbitt and Silvana Grandillo; Cornell graduate student Amy Frary; and postdoctoral researchers Meller, Bin Cong and Jiping Liu. The late Kevin B. Alpert, who earned his doctoral degree with Tanksley, was an integral part of the paper's research. Alpert died April 1, 1999, of amyotrophic lateral sclerosis. The paper was dedicated to him.

The research was funded by the National Research Initiative Cooperative Grants Program, the U.S. Department of Agriculture Plant Genome Program, the National Science Foundation, the Binational Agriculture Research and Development Fund and the National Institutes of Health National Center for Research Resources.

Date Posted: 7/01/2000

Selman Made Fellow of the AAAI

Each year the AAAI (American Association for Artificial Intelligence) recognizes a few individuals (typically four or five) who have made significant, sustained contributions to artificial intelligence by electing them Fellows of the AAAI. The Computer Science Department is happy to announce that Bart Selman was elected this year. The citation reads: "For significant contributions to the field of knowledge representation and reasoning, and the development of widely used randomized methods in reasoning, search, and planning."

Date Posted: 5/31/2000

Completing Latin Squares - Carla Gomes

Science News: Week of May 6, 2000; Vol. 157, No. 19

Using only the numbers 1, 2, 3, and 4, arrange four sets of these numbers into a four-by-four array so that no number appears twice in any row or column. The result is known as a Latin square. In effect, each row (and each column) is a permutation of four distinct numbers (or symbols).

 Such arrays have proved useful for a variety of purposes. Suppose, for example, you wanted to check the resistance to wear of four brands of automobile tires. Putting one tire of each brand on the four wheels of one car isn't good enough because the amount of wear may differ in those four positions and vary from week to week owing to different weather conditions. A better experiment would be to use four tires for four weeks and to interchange the four positions from week to week according to a four-by-four Latin square.

 Latin squares are also linked to algebraic objects known as quasigroups. A quasigroup is defined in terms of a set, Q, of distinct symbols and a binary operation (called multiplication) involving only the elements of Q. A quasigroup's multiplication table turns out to be a Latin square. Each element of the set Q occurs exactly once in each row and each column of the table.

The so-called quasigroup completion problem concerns a table that is correctly but only partially filled in. The question is whether the remaining blanks in the table can be filled in to obtain a complete Latin square (or a proper quasigroup multiplication table).
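
A plain backtracking search makes the problem concrete. The sketch below (an illustrative Python version, not Gomes's code) tries to complete a partially filled table into a Latin square; the web demonstration described further on uses a randomized variant of this kind of backtrack search.

    def complete(square):
        # Fill blanks (0) one at a time; undo and backtrack on dead ends.
        n = len(square)
        for r in range(n):
            for c in range(n):
                if square[r][c] == 0:
                    used = {square[r][k] for k in range(n)} | \
                           {square[k][c] for k in range(n)}
                    for v in range(1, n + 1):
                        if v not in used:          # keeps the row and column Latin
                            square[r][c] = v
                            if complete(square):
                                return True
                            square[r][c] = 0       # undo and try the next value
                    return False                   # no value fits: backtrack
        return True                                # no blanks left: done

    partial = [[1, 0, 0, 4],
               [0, 0, 1, 0],
               [0, 1, 4, 0],
               [4, 0, 0, 1]]
    if complete(partial):
        for row in partial:
            print(row)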

 Computer scientists have proved that the quasigroup completion problem belongs to a category of difficult computational problems described as NP-complete. As the number of elements, n, increases, a computer's solution time grows exponentially in the worst case. In effect, systematically solving such a worst-case problem involving many elements can take an enormous amount of time, even on the fastest available computers.

In typical cases, however, solution times can vary tremendously and depend on the specific search algorithm used to find an answer. Many search problems take much less time to solve than worst-case scenarios would suggest.

Moreover, researchers have discovered that, just as water abruptly freezes when the temperature dips below the freezing point, the quasigroup completion problem exhibits a computational phase transition -- a shift from cases for which a Latin square can be found to those for which no solution is possible -- when the fraction of initially filled-in squares in the table crosses a certain threshold value.

"At the phase transition, problem instances switch from being almost all solvable to being almost all unsolvable," Carla P. Gomes of Cornell University and her coworkers report in a recent issue of the Journal of Automated Reasoning. That transition occurs when about 42 percent of the blanks in the table are initially filled in.

Interestingly, the problems that take longest to resolve one way or the other (soluble versus insoluble) lie at the phase transition, where there are enough constraints (filled-in squares) to limit the number of good choices but not enough to show immediately that the case is hopeless.

Gomes has set up a Web page at http://www.cs.cornell.edu/Info/People/gomes/QUASIdemo.html where you can try out various instances of the quasigroup completion problem. In this demonstration, Gomes uses a 10-by-10 grid and 10 different colors.

You can select a sample pattern of colored-in squares as your starting point, generate a random colored array filling a specified fraction of the grid, or manually create your own partial Latin square. Clicking a button then directs the computer to solve the problem using a built-in algorithm (a randomized backtrack search procedure, in this case). Clicking again presents a different solution.

On certain runs, the computer solves the problem easily, Selman observes. On other runs, it may fail to complete the pattern within the allotted time. Varying the fraction of filled-in squares also alters computation times. When the fraction is close to the critical value, computation times become very large.

Such experiments provide insights into the factors that make certain computational problems particularly difficult and may suggest improved search strategies for solving a variety of problems.

Date Posted: 5/06/2000

Changes of Mathematical State - Bart Selman

Untangling a web of conflicting demands can be tough on computers

Science News: Week of May 6, 2000; Vol. 157, No. 19

 By I. Peterson

The frazzled host of an impending dinner party is trying to come up with a seating plan that would satisfy his guests.

Angelina would like to sit beside Bertrand, but she can't stand Charles. Darva and Rick, who've recently split up, insist on being as far apart as possible. And Eve, whose belated acceptance has thrown the original seating plan into disarray, is willing to sit next to Bertrand or Charles but not Angelina.

Is there an arrangement that would please everyone? As the list of guests and their various demands increases, the number of seating combinations that must be checked to avoid conflicts grows rapidly. Solving such a problem by trying out a vast number of different combinations can take years, even if the host enlists a powerful computer.

Sometimes the search for a solution is relatively quick, however. When guests impose few conditions, for example, many combinations meet the requirements. At the other extreme, guests may specify so many constraints that it's easy to recognize the problematic arrangements, and the host can then focus on the few that might work out.

Of course, conflicting requests may make the problem impossible to solve. What if Bertrand refuses to sit next to Charles and Charles demands to sit next to Bertrand? Then, the quest is over as soon as the host realizes the demands are at odds and the problem is insoluble.

Researchers have discovered, to their amazement, that just as water abruptly freezes when the temperature dips below the freezing point, a computer-search problem can change very rapidly from soluble to insoluble. When, for a given number of guests, the number of constraints exceeds a certain threshold, the host will realize that he might as well give up.

Interestingly, the hardest problems to deal with lie in the middle -- when there are enough constraints to limit the number of good choices but not enough to immediately eliminate the bad ones or signal that the case is hopeless.

Researchers have found that a wide variety of search problems exhibit a similar transition phenomenon, such as the shift from soluble to insoluble. Indeed, the analogy between rapid shifts in computational behavior and physical changes of state has inspired mathematicians, computer scientists, physicists, and others to explore physical models to obtain insights into precisely what makes many important computational problems hard to solve.

"Understanding intractability is one of the fundamental problems in computer science," says Jennifer Tour Chayes of Microsoft Research in Redmond, Wash. She described the current state of research on computational "phase transitions" earlier this year at the annual meeting of the American Association for the Advancement of Science in Washington, D.C. Concepts from physics could help elucidate the source of such difficulties, she notes.

 "There are some properties of these abrupt phase transitions that we may be able to exploit," remarks computer scientist Bart Selman of Cornell University. Such insights could lead to improved procedures, or algorithms, for solving challenging computational problems, whether they involve searching for optimal seating plans, developing airline schedules, or laying out complex circuitry on computer chips.

 Working out a seating plan that would placate fractious dinner guests is a version of the logic puzzle in theoretical computer science known as the satisfiability (SAT) problem.

Computer scientists can translate such constraint-satisfaction problems into symbolic logic, where each variable is either true or false. Consider, for example, two variables, x and y, and the logical statement (x OR y) AND ((NOT x) OR (NOT y)). The OR means that the clause (x OR y) is true if either x or y is true, and the AND means that the whole statement is true only if both clauses are true.

Solving this puzzle requires assigning a value of true or false to each of the variables so that the entire expression is satisfied--meaning that the statement must evaluate to true. Here, x must be true and y false, or vice versa.
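
With only two variables, the claim is easy to verify by brute force, as this small Python sketch shows:

    from itertools import product

    # Enumerate all truth assignments for (x OR y) AND ((NOT x) OR (NOT y)).
    for x, y in product([False, True], repeat=2):
        if (x or y) and ((not x) or (not y)):
            print(f"x={x}, y={y} satisfies the formula")
    # Prints exactly the two assignments in which x and y differ.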

Computer scientists rate the difficulty of problems by how long it takes a computer to solve them in the worst possible case. Satisfiability problems that involve expressions with only two variables in each clause (termed 2-SAT) can be solved in what is called polynomial time. In other words, as the number, n, of variables increases in the entire statement, the solution time grows modestly at a rate proportional to some power of n. In effect, computers can solve 2-SAT problems efficiently.

In contrast, when clauses each contain three variables, increasing the total number of variables means that the solution time grows exponentially in the worst case. Hence, systematically solving such a 3-SAT problem involving many variables can take an enormous amount of time.

The focus on worst cases, however, can be misleading. Researchers know from experience that solution times vary tremendously from case to case and depend on the specific search algorithm used to find an answer. Many search problems take much less time to solve than worst-case scenarios would suggest. Hence, "it makes a lot of sense to study typical cases," Chayes says.

In earlier work on randomly generated instances of the 3-SAT problem, Selman, Scott Kirkpatrick of the IBM Thomas J. Watson Research Center in Yorktown Heights, N.Y., and their coworkers learned that, on average, the hardest problems occur when the ratio of the number of clauses to the number of variables is about 4.25. Below this critical value, nearly all cases are satisfiable, and above that value, almost all of them are unsatisfiable. For 2-SAT problems, the threshold occurs when the ratio of clauses to variables is 1.
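
The experiment is easy to reproduce in miniature. This rough sketch (brute force, so the number of variables is kept tiny; with so few variables the crossover is blurred and shifted relative to the large-n threshold of about 4.25) generates random 3-SAT formulas at several clause-to-variable ratios and estimates how many are satisfiable:

    import random
    from itertools import product

    def random_3sat(n_vars, n_clauses):
        # Each clause: three distinct variables, each negated with probability 1/2.
        return [tuple(random.choice([v, -v])
                      for v in random.sample(range(1, n_vars + 1), 3))
                for _ in range(n_clauses)]

    def satisfiable(formula, n_vars):
        # Brute force: try every truth assignment.
        for bits in product([False, True], repeat=n_vars):
            if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                   for clause in formula):
                return True
        return False

    random.seed(1)
    n = 10
    for ratio in (2.0, 3.0, 4.0, 4.25, 4.5, 5.0):
        m = round(ratio * n)
        sat = sum(satisfiable(random_3sat(n, m), n) for _ in range(30))
        print(f"clause/variable ratio {ratio:4.2f}: {sat}/30 satisfiable")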

These results fit the observation that hard problems cluster near the boundary between satisfiability and unsatisfiability, where it's difficult to tell whether solutions exist or not. Moreover, such a concentration of hard cases occurs at the same critical value for a wide range of search methods, says Tad Hogg of the Xerox Palo Alto (Calif.) Research Center. In other words, "this behavior is a property of the problems rather than of the details of the search algorithm," he says. Not only do these dramatic changes in computational difficulty resemble the abrupt changes in a physical system's state at a certain parameter value, they also mirror a distinction made by physicists.

Physical phase transitions come in two flavors. In a first-order transition, such as the freezing of water when chilled, the jump from one state to the other is discontinuous. In a second-order transition, such as the demagnetization of iron when heated, the change is continuous, exhibiting a gradual change in behavior rather than a sharp break.

Remarkably, computational experiments by Selman and his coworkers recently showed that in 2-SAT problems, the transition from satisfiable to unsatisfiable is continuous. In 3-SAT problems, the transition is much more abrupt. Selman and his colleagues are definitely onto something in highlighting the types of phase transition, Chayes says. However, "it's not that simple," she contends. "There are exceptions." Selman admits, "There are subtle issues here." For example, the discontinuous phase transition found for the 3-SAT case is itself quite complicated. "We are still studying further properties of this transition," he notes.

 "The empirical results are interesting and, if done well, can lead to insights which we can then try to verify rigorously," comments University in Pittsburgh. Mathematicians and computer scientists, for example, have shown theoretically that the threshold for the 3-SAT phase transition must lie between 3.2 and 4.5, and new efforts are gradually narrowing that range. This is consistent with Selman's empirical finding.

Theoretical constructs called spin glasses, which serve as simple models of magnetic systems, show an analogous phase behavior, Selman and others have demonstrated. This model starts with an array of magnetic particles, each one oriented either up or down (equivalent to variables that are true or false). A particle's orientation can flip from one state to the other with a certain probability, and each orientation influences neighboring orientations. Getting this so-called spin glass to settle into a lowest-energy state, in which all the particles comfortably coexist without further flipping, is equivalent to solving a satisfiability problem.

Although a solution often arises quickly, computer simulations show that difficulties arise when particles happen to settle into certain configurations. Patterns can get frozen prematurely, and then the spin glass as a whole can't readily reach the state with the lowest energy. In search problems, similar behavior could occur if a computer making random choices locks in some of the critical variables with the wrong values, then spends an enormous time fruitlessly trying combinations involving those values.
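
A minimal version of that picture: spins with random couplings relax by single flips that lower the energy, and the greedy descent below typically freezes into a local minimum rather than the global one. The couplings, system size and random seed here are arbitrary illustrations.

    import random

    random.seed(2)
    N = 30
    # Symmetric random couplings J[i][j] = J[j][i] = +1 or -1.
    J = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            J[i][j] = J[j][i] = random.choice([-1, 1])
    spins = [random.choice([-1, 1]) for _ in range(N)]   # "up" or "down"

    def local_field(i):
        return sum(J[i][j] * spins[j] for j in range(N) if j != i)

    # Greedy single-spin descent: flip any spin whose flip lowers the energy.
    changed = True
    while changed:
        changed = False
        for i in range(N):
            if spins[i] * local_field(i) < 0:
                spins[i] = -spins[i]
                changed = True

    energy = -sum(J[i][j] * spins[i] * spins[j]
                  for i in range(N) for j in range(i + 1, N))
    print(f"settled into a (possibly only local) energy minimum: {energy}")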

Chayes and her coworkers have investigated similar behavior in a physical model known as the Potts magnet. Unlike a spin-glass particle, a Potts particle can take on any one of three or more different values. This simple change greatly complicates the theory, and the resulting behavior can be very complex. In this case, knowing the type of phase transition by itself isn't enough to account for the model's behavior, Chayes says.

By getting a better understanding of how such complexities arise, especially near a phase transition, researchers can look for ways to weed out hopeless cases and focus their efforts on solving the rest. "We have found that there are efficient heuristic methods that have a good chance of identifying those [troublesome] variables early on in a search," Selman says.

In airline scheduling, for example, some connections may pose special difficulties. One might first want to schedule the main flights between hubs, then later fill in the shorter commuter flights to and from the hubs, Selman suggests. "If you get certain destinations right, the rest of the problem is easy," he says. Selman, Kirkpatrick, and Cornell's Carla P. Gomes have just started a major effort to employ their theoretical results to develop a computational system that adapts to the difficulty of the problem at hand.

To obtain an estimate of the expected complexity of a computational task, they identify where a problem lies with respect to the relevant phase transition, Selman says. If the problem has characteristics that put it at the threshold, the user may be able to relax the constraints to shift the problem away from the transition, into a realm where it is more easily soluble. In practical terms, this could mean insisting that the airplanes always land in Chicago, no matter where else they go or how many people they carry, simply because the rest of the scheduling problem then becomes much easier. "There is still quite a bit of work to be done before we can apply the insights developed for SAT to hard real-world computational problems," Selman says. "In any case, the results on phase transitions and complexity have given us a new set of tools for attacking such computational challenges."

Satisfiability problems serve as important test cases in such studies because they share crucial characteristics with thousands of other problems now known to be computationally difficult. What is learned about the behavior of satisfiability problems can then be applied to a variety of other thorny computer problems. For example, the idea that particularly hard-to-solve instances of a problem occur at a computational phase transition may prove helpful in developing schemes for encrypting data. Moving toward a phase transition would make a code particularly tough to crack.

Nonetheless, "while the results are encouraging, a major open question is how the results for random [satisfiability] problems relate to those encountered in practice," Hogg emphasizes. In real-world problems, for  instance, correlations among the constraints can shift the location of the transition point and throw off predictions based on randomly chosen instances. Fruitful interactions among theory, experiment, and application continue to drive the study of computational phase transitions.

"Looking at a problem from a new perspective can add new insight," says Carnegie Mellon's Le phase transitions into [computational] complexity has this potential. Even if it does not characterize complexity, figuring out to what extent it may or may not be relevant would be extremely interesting."

Date Posted: 5/06/2000

Hartmanis Wins CRA Distinguished Service Award

Juris Hartmanis, Walter R. Read Professor of Engineering, has won this year's CRA Distinguished Service Award. The Computing Research Association (CRA) is an association of more than 180 North American academic departments of computer science and computer engineering (CS&CE), along with laboratories and centers in industry, government, and academia engaging in basic computing research. It is the main voice of computer science research in the country. The award recognizes service in the areas of government affairs, professional societies, publications or conferences, and leadership that has a major impact on computing research. Dr. Hartmanis has been a dominant force in all of these areas.

Past Winners:
1999: Bill Joy, Sun Microsystems and Ken Kennedy, Rice University
1998: Merrell Patrick, National Science Foundation
1997: Anita Jones, University of Virginia
1996: Paul Young, University of Washington
1995: Randy Katz, University of California at Berkeley
1994: William A. Wulf, University of Virginia
1992: Joseph Traub, Columbia University
1991: David Gries, Cornell University
1990: Robert Kahn, Corporation for National Research Initiatives

Date Posted: 4/01/2000
