Tracking Environmental Change based on Bird Abundance Data
Overview
This project is a collaboration between faculty and researchers at the Cornell Lab of Ornithology (CLO) and Cornell's Department of Computer Science.
Birds have repeatedly demonstrated their importance as bioindicators. For example, declines in bird abundance have revealed that politically motivated changes in farming practices affect rural ecosystems. Changes in bird abundance have also shown that climate change disrupts interactions among species by altering food chains in natural ecosystems. Moreover, birds are keystone components of entire ecosystems, as seen in the impact of the disappearance of vultures from the Indian subcontinent, which was brought on by their inadvertent poisoning via livestock medication. The results in each of these examples come directly from observed changes in the abundance of birds over time. Monitoring bird abundance is relatively easy because birds are conspicuous, are found in all habitats, and are enjoyed by millions of people.
Biologists and tens of thousands of citizen volunteers collect bird abundance data every year. These data represent one of the largest and longest-running resources of environmental time-series data in existence. For example, the number of bird monitoring records for the U.S. and Canada is estimated to approach 60 million, spanning more than a century of data collection. However, direct access to the data is often limited for the general public and even for professional ecologists.
The goal of this project is to give scientists, educators, and citizens greater ability to identify and explore changes in bird abundance, aiding in the conservation and management of the earth's natural systems. To achieve this goal, we are addressing a diverse set of challenging research problems in the areas of data mining and machine learning, interactive exploration of spatio-temporal data, and information integration and dissemination.
Research
Data Mining and Machine Learning
- Estimation of bird abundance. The collected bird abundance data typically contain the number of birds of a certain species observed by a person at a certain time and location. These data do not necessarily reflect the true abundance because of factors such as detection probability (which depends on species, habitat, weather, time of day, etc.), observer bias (e.g., expertise, age), and protocol bias (e.g., feeder watch versus backyard bird count). The state of the art relies on parametric models for each stage of the detection process. Our goal is to improve on this by developing novel end-to-end models that map directly from the observational data (and all of the attributes describing each observation) to predicted true abundance (or predicted true presence/absence). In particular, we are currently working on approaches based on ensemble learning, multi-task learning, and spatially-cognizant smoothing.
- Mining for change. The detection, quantification, and description of change are of crucial importance to the scientist who wants not only to understand that a change has occurred, but also to pinpoint where it has occurred. The existence of change has a far-reaching impact on any type of abundance analysis. For example, when constructing a data mining model for a given spatio-temporal abundance pattern, data collected before a change can bias the model towards characteristics that no longer hold. Most existing work has concentrated on algorithms that adapt to changing distributions by giving older input data less weight or by discarding old input in a heuristic manner. The current state of the art either assumes that the data come from a specific parameterized probability distribution, or it lacks a formal definition of change. Thus existing algorithms cannot specify precisely when and how the underlying distribution changes. Our goal is to use non-parametric techniques and to formally quantify statistically significant change in spatio-temporal abundance data.
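As a rough illustration of what a non-parametric test for change can look like, the sketch below applies a standard two-sample permutation test to counts recorded before and after a candidate change point. It is not the project's method: the function name `permutation_test`, its parameters, and the yearly counts are all hypothetical, and the test only asks whether the mean count differs between the two periods without assuming any particular probability distribution.

```python
import random


def permutation_test(before, after, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in mean counts.

    Returns (observed_difference, p_value). No distributional form is
    assumed; the null distribution is built by reshuffling the pooled data.
    """
    rng = random.Random(seed)
    n_before = len(before)
    n_after = len(after)
    observed = sum(after) / n_after - sum(before) / n_before

    pooled = list(before) + list(after)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign observations to the "before" and "after" groups.
        rng.shuffle(pooled)
        diff = sum(pooled[n_before:]) / n_after - sum(pooled[:n_before]) / n_before
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical yearly counts of one species at one site (illustrative only).
counts_before = [42, 38, 45, 40, 44, 39, 41, 43]
counts_after = [30, 28, 33, 27, 31, 29, 32, 26]

diff, p = permutation_test(counts_before, counts_after)
print(f"difference in mean count: {diff:.1f}, p-value: {p:.4f}")
```

A small p-value suggests that the two periods differ by more than random reshuffling would explain. Locating when and where a change occurs in spatio-temporal data is a considerably harder problem than this two-group comparison.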
Interactive Analysis of Spatio-Temporal Data
- Development of interactive analytical tools with web interfaces. These tools will enable biologists and bird enthusiasts with little or no experience in the manipulation of large data sets to extract biologically relevant information of their choosing from the data. We anticipate that users will expect rapid feedback from their queries. Therefore, we are developing a set of exploratory tools that respond rapidly to queries intended to (i) determine the availability of data from a particular time and area, and (ii) graphically represent simple associations (e.g., abundance through time, co-occurrence of two species). These preliminary data explorations will help researchers identify relevant components of the data to download and analyze in more detail. A first result is a set of novel sketch-based techniques that provide users with very fast approximate results, accompanied by approximation quality guarantees (confidence intervals).
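To make the idea of a fast approximate answer with a confidence interval concrete, the sketch below estimates a mean count from a uniform random sample and reports a 95% interval based on the normal approximation. This is plain random sampling, used here as a simpler stand-in; it is not the sketch-based technique mentioned above. The function name `approximate_mean`, its parameters, and the synthetic data are assumptions made for illustration.

```python
import math
import random
from statistics import mean, stdev


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, lower, upper), where [lower, upper] is an approximate
    95% confidence interval based on the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = mean(sample)
    half_width = z * stdev(sample) / math.sqrt(sample_size)
    return estimate, estimate - half_width, estimate + half_width


# Synthetic stand-in for a large table of per-observation counts.
data_rng = random.Random(1)
counts = [data_rng.randint(0, 20) for _ in range(100_000)]

estimate, lower, upper = approximate_mean(counts, sample_size=1_000)
exact = mean(counts)
print(f"approximate mean: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"exact mean: {exact:.2f}")
```

The approximate query reads only 1,000 of the 100,000 values, and the interval narrows in proportion to the square root of the sample size, so a user can trade response time for precision.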
Data Integration and Dissemination
- Integration of CLO's and other providers' (e.g., USGS, Bird Studies Canada, Klamath Bird Observatory) bird-monitoring data resources through a web services interface, using an existing data exchange schema for the bird-monitoring community.
- Dissemination of research results through a web-enabled bird-monitoring access node.
People
Rich Caruana
Daniel Fink
John Fitzpatrick
Johannes Gehrke
Wesley Hochachka
Steve Kelling
Art Munson
Mirek Riedewald
Daria Sorokina