Rediet Abebe
Using Search Queries to Understand Health Information Needs in Africa

The lack of comprehensive, high-quality health data in developing nations creates a roadblock for combating the impacts of disease. One key challenge is understanding the health information needs of people in these nations. Without understanding people's everyday needs, concerns, and misconceptions, health organizations and policymakers lack the ability to effectively target education and programming efforts. In this talk, we propose a bottom-up approach that uses search data from individuals to uncover and gain insight into health information needs in Africa. We analyze Bing searches related to HIV/AIDS, malaria, and tuberculosis from all 54 African nations. For each disease, we automatically derive a set of common search themes or topics, revealing widespread interest in various types of information, including disease symptoms, drugs, concerns about breastfeeding, as well as stigma, beliefs in natural cures, and other topics that may be hard to uncover through traditional surveys. We expose the different patterns in health information needs that emerge across demographic groups (age and sex) and countries. We also uncover discrepancies, by topic, in the quality of content that search engines return to users. We explore what interventions these results suggest and, in particular, how they inform targeted education efforts both online and offline. We conclude with a broader discussion of how computational techniques can be used to deepen our understanding of socioeconomic inequality and inform interventions aimed at mitigating it.

This talk is based on joint work with Shawndra Hill, H. Andrew Schwartz, Peter M. Small, and Jennifer Wortman Vaughan. 

Faculty Advisor: Jon Kleinberg


Dylan Foster

Logistic Regression: The Power of Being Improper
Logistic regression is a fundamental task in machine learning and statistics. For the simple case of linear models, Hazan et al. (2014) showed that any algorithm that estimates model weights from samples must exhibit exponential dependence on the weight magnitude. As an alternative, we explore a counterintuitive technique called improper learning, wherein one estimates a linear model by fitting a non-linear model. Past success stories for improper learning have focused on cases where it can improve computational complexity. Surprisingly, we show that for sample complexity (the number of examples needed to achieve a desired accuracy level), improper learning leads to a doubly-exponential improvement in dependence on weight magnitude over estimation of model weights, and more broadly over any so-called "proper" learning algorithm. This provides a positive resolution to a COLT 2012 open problem of McMahan and Streeter. As a consequence of this improvement, we also resolve two open problems on the sample complexity of boosting and bandit multiclass classification.

Faculty Advisor: Karthik Sridharan


Fabian Muehlboeck

Efficient and Principled Gradual Typing
The goal of gradual typing is to allow programmers to mix statically and dynamically type-checked code. This lets programmers trade off the costs and benefits of static type-checking for each individual part of their program as needed, and even change their decisions about those trade-offs later. Designing gradually typed languages involves trade-offs of its own: existing gradually typed languages have all essentially had to choose between being efficient and behaving in expected and safe ways, and those decisions were largely imposed by the existing languages to which gradual typing was added. In this talk, I’ll discuss my work on designing a gradually typed programming language that is both efficient and well-behaved, which points the way towards a new generation of programming languages that can be used to transition seamlessly between personal scripting or rapid prototyping and large-scale software engineering.

Faculty Advisor: Ross Tate