Water managers in the western United States (U.S.) rely on longterm forecasts of temperature and precipitation to prepare for droughts and other wet weather extremes. To improve the accuracy of these longterm forecasts, the Bureau of Reclamation and the National Oceanic and Atmospheric Administration (NOAA) launched the Subseasonal Climate Forecast Rodeo, a year-long real-time forecasting challenge, in which participants aimed to skillfully predict temperature and precipitation in the western U.S. two to four weeks and four to six weeks in advance. Here we present and evaluate our machine learning approach to the Rodeo and release our SubseasonalRodeo dataset, collected to train and evaluate our forecasting system.

Our system is an ensemble of two regression models. The first integrates the diverse collection of meteorological measurements and dynamic model forecasts in the SubseasonalRodeo dataset and prunes irrelevant predictors using a customized multitask model selection procedure. The second uses only historical measurements of the target variable (temperature or precipitation) and introduces multitask nearest neighbor features into a weighted local linear regression. Each model alone is significantly more accurate than the operational U.S. Climate Forecasting System (CFSv2), and our ensemble skill exceeds that of the top Rodeo competitor for each target variable and forecast horizon. We hope that both our dataset and our methods will serve as valuable benchmarking tools for the subseasonal forecasting problem.

Lester Mackey is a researcher at Microsoft Research New England. From 2013-2016, he was an assistant professor of Statistics and, by courtesy, Computer Science at Stanford University. Prior to this, Lester spent a year as a Simons Math+X postdoctoral fellow, working with Emmanuel Candes. He received his Ph.D. in Computer Science (2012) and M.A. in Statistics (2011) from UC Berkeley. He received his B.S.E. in Computer Science (2007) from Princeton University. 

His current research interests include statistical machine learning, scalable algorithms, high-dimensional statistics, approximate inference, andprobability. Lately, Lester has been developing and analyzing scalable learning algorithms for healthcare, climate forecasting, approximate posterior inference, high-energy physics, recommender systems, and the social good.