Human Computation for Science and Computational Sustainability

Researchers in several scientific and sustainability fields have recently achieved exciting results by involving the general public in the acquisition of scientific data and the solution of challenging computational problems. One example is the eBird project (www.ebird.org) of the Cornell Lab of Ornithology, where field observations uploaded by bird enthusiasts are providing continent-scale data on bird distributions that support the development and testing of hypotheses about bird migration. Another example is the FoldIt project (www.fold.it), where volunteers interacting with the FoldIt software have been able to solve the 3D structures of several biologically important proteins.

Despite these early successes, the involvement of the general public in these efforts poses many challenges for machine learning. Human observers vary widely in their degree of expertise. They conduct observations when and where they see fit, rather than following carefully designed experimental protocols. Paid participants (e.g., from Amazon Mechanical Turk) may not follow the rules or may even deliberately mislead the investigators.

A second challenge is that problem instances presented to human participants can vary in difficulty. Some instances (e.g., of visual tasks) may be impossible for most people to solve. Because hard instances tend to go unsolved or unreported, the collected data are biased toward easy instances, which can confuse learning algorithms.

A third issue with crowdsourcing is that in many of these problems, there is no available ground truth because the true quantities of interest are only indirectly observed. For example, the BirdCast project seeks to model the migration of birds. However, the eBird reports only provide observations of birds on or near the ground, rather than in migratory flight (which occurs predominantly at night). In such situations, it is hard to evaluate the accuracy of the learned models, because predictive accuracy does not guarantee that the values of latent variables are correct or that the model is identifiable.
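
The identifiability concern above can be illustrated with a minimal sketch. It assumes a deliberately simplified single-visit reporting model that is not taken from eBird or BirdCast: a site yields a report only if it is occupied and the observer detects the bird, so the reporting probability is the product of two latent quantities. The names `occupancy` and `detectability`, and the counts used, are hypothetical.

```python
import math


def log_likelihood(detections, visits, occupancy, detectability):
    """Binomial log-likelihood of single-visit reports (illustrative model).

    Assumption: a site produces a report only if it is occupied AND the
    observer detects the bird, so each visit reports with probability
    occupancy * detectability.
    """
    q = occupancy * detectability
    return (
        math.log(math.comb(visits, detections))
        + detections * math.log(q)
        + (visits - detections) * math.log(1.0 - q)
    )


# Hypothetical data: 30 reports from 100 single-visit sites.
# Two very different latent explanations with the same product (0.3):
ll_a = log_likelihood(30, 100, occupancy=0.6, detectability=0.5)
ll_b = log_likelihood(30, 100, occupancy=0.375, detectability=0.8)

# The observed data cannot distinguish the two settings.
print(math.isclose(ll_a, ll_b, rel_tol=1e-9))
```

Both parameter settings fit the observations equally well, so good predictive fit alone says nothing about which latent values are correct. Under this simplified model, separating the two quantities would require additional information, such as repeated visits to the same site.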

This workshop will bring together researchers at the interface of machine learning, citizen science, and human computation. The goals of the workshop are i) to identify common problems, ii) to propose benchmark datasets, common practices, and improved methodologies for dealing with the challenges described above, iii) to identify methods for evaluating learned models in the absence of ground truth, iv) to share approaches for implementing and deploying citizen science and human computation projects in scientific and sustainability domains, and v) to foster new connections between the scientific, sustainability, and human computation research communities.

There will be two Best Contribution awards ($250 book vouchers) for oral and/or poster presentations, sponsored by the Institute for Computational Sustainability (www.cis.cornell.edu/ics).

Contact: nips2012wp@gmail.com