Adith Swaminathan

PhD Candidate
Department of Computer Science
Cornell University

E-Mail: [first name]

344 Gates Hall,
Ithaca, NY 14853-5169

Curriculum Vitae
courtesy Mihai Budiu
I am on the job market!

About Me

   I am a final-year graduate student in the Department of Computer Science at Cornell University. I am easily distracted by (cover-letter-speak: my interests are) new or elegant ideas in Machine Learning and, more broadly, Artificial Intelligence, with applications that grapple with ambiguity (e.g. Natural Language Processing) and/or adapt to an ever-changing worldview.

   Following the parable of the Rabbit's Ph.D., I'm advised by Prof. Thorsten Joachims. He is flanked by Prof. Johannes Gehrke, Prof. Éva Tardos and Prof. Robert Kleinberg on my committee. My graduate minor is in Applied Mathematics.

   My thesis studies machine learning for interactive systems. I develop principles and algorithms that re-use the interaction logs of these systems to inform the design of (i.e., to train and evaluate) future systems -- technically speaking, solving batch learning from bandit feedback.

   I interned with the Machine Learning group at Microsoft Research NYC during the summer of 2015, the Computer Human Interactive Learning group at Microsoft Research Redmond during the summer of 2013, and Search Labs at Microsoft Research SVC during the summer of 2012, and I worked as a strategist at Tower Research Capital for 14 months, from June 2010 to July 2011.

   I received a Bachelor of Technology degree in Computer Science and Engineering from the Indian Institute of Technology, Bombay in 2010, and a Master of Science degree in Computer Science from Cornell University in 2014. I interned during the summer of 2009 at the Microsoft India Development Centre as a Software Development Engineer-Tester.

[Website under construction]


POEM: Policy Optimization for Exponential Models. POEM performs counterfactual risk minimization on logged bandit feedback data.

Test-Bed for Counterfactual Evaluation and Learning (SIGIR 2016 Tutorial) This test-bed implements a news recommendation scenario and provides simple implementations of several off-policy estimators (for evaluation) and learning algorithms (for optimization) for benchmarking.


Counterfactual Risk Minimization: Learning from Logged Bandit Feedback. [abstract, WWW Companion 2015] [paper, ICML 2015] [extended, JMLR 2015].
Adith Swaminathan and Thorsten Joachims.
   The logs of an interactive learning system contain valuable information about the system's performance and user experience. However, they are biased (actions preferred by the system are over-represented) and incomplete (user feedback for actions the system did not take is unavailable), so they cannot be used directly to train and evaluate better systems through supervised learning. We developed a new learning principle and an efficient algorithm that re-uses such logs to solve structured prediction problems. The key insight is to reason about the variance introduced when correcting for the bias of the logs, and to derive an efficient conservative approximation of this variance so that the problem can be solved at scale.
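As a minimal illustration (not the paper's actual code), the idea can be sketched as a clipped inverse-propensity-score risk estimate plus a variance penalty; the two-action toy logs, the function names, and the clipping and regularization constants below are all invented for the example.

```python
import math

def crm_objective(logs, policy_prob, clip=10.0, lam=0.5):
    """Counterfactual risk of a new policy, estimated from logged bandit data.

    logs: list of (context, action, loss, logging_propensity) tuples.
    policy_prob(context, action): probability the new policy takes `action`.
    Returns the clipped IPS risk plus a conservative variance penalty.
    """
    weighted = []
    for x, a, loss, p0 in logs:
        w = min(policy_prob(x, a) / p0, clip)   # clipped importance weight
        weighted.append(loss * w)
    n = len(weighted)
    mean = sum(weighted) / n                    # IPS risk estimate
    var = sum((u - mean) ** 2 for u in weighted) / (n - 1)
    return mean + lam * math.sqrt(var / n)      # penalize high-variance policies

# toy logs: a uniform logging policy over two actions, lower loss for action 0
logs = [("x", 0, 0.1, 0.5), ("x", 1, 0.9, 0.5), ("x", 0, 0.2, 0.5)]
always0 = lambda x, a: 1.0 if a == 0 else 0.0   # deterministic candidate policy
print(crm_objective(logs, always0))
```

The variance term is what makes the objective conservative: two policies with the same IPS estimate are distinguished by how noisy that estimate is.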

The Self-Normalized Estimator for Counterfactual Learning. NIPS 2015.
Adith Swaminathan and Thorsten Joachims.
   We identified an interesting new kind of overfitting, which we call propensity overfitting, in the batch learning from logged bandit feedback scenario (the scenario that arises when trying to learn from the logs of interactive systems like search, recommendation and ad placement systems). The conventional counterfactual risk estimator is very vulnerable to this overfitting when optimized over rich hypothesis spaces. We identified a property, equivariance, that guards against this problem, and it motivates the use of the self-normalized risk estimator.
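A sketch of the self-normalized estimator in the same toy two-action setup as above; the function and variable names are illustrative, not from the paper.

```python
def snips_risk(logs, policy_prob):
    """Self-normalized counterfactual risk: divide by the sum of importance
    weights instead of n, so translating all losses by a constant c shifts
    the estimate by exactly c (equivariance), for every candidate policy.

    logs: list of (context, action, loss, logging_propensity) tuples.
    """
    num, den = 0.0, 0.0
    for x, a, loss, p0 in logs:
        w = policy_prob(x, a) / p0      # importance weight
        num += loss * w
        den += w
    return num / den if den > 0 else float("inf")

# the vanilla IPS estimate divides by n, so a loss translation shifts it by
# c * (sum of weights / n) -- rewarding policies whose weights happen to be
# small; the self-normalized version removes that incentive
logs = [("x", 0, 0.1, 0.5), ("x", 1, 0.9, 0.5)]
uniform = lambda x, a: 0.5
print(snips_risk(logs, uniform))
```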

Unbiased Ranking Evaluation on a Budget. WWW Companion 2015.
Tobias Schnabel, Adith Swaminathan and Thorsten Joachims.
   Evaluating structured prediction systems is hard and costly. We re-write several popular evaluation metrics for these systems as expectations and use ideas from importance sampling to derive a simple, efficient, and unbiased strategy that also generates re-usable judgements.
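The paper's estimators aren't reproduced here, but the core importance-sampling trick can be sketched for DCG under a judging budget; the sampling distribution, the names, and the toy ranking below are assumptions for illustration.

```python
import math, random

def sampled_dcg(ranking, judge, n_samples=200, seed=0):
    """Unbiased DCG estimate from a budget of sampled relevance judgements.

    DCG = sum_r judge(d_r) / log2(r + 2) (r zero-indexed) is an expectation
    under any sampling distribution q after importance correction; sampling
    ranks in proportion to their weights keeps the estimator's variance low.
    """
    rng = random.Random(seed)
    weights = [1.0 / math.log2(r + 2) for r in range(len(ranking))]
    z = sum(weights)
    q = [w / z for w in weights]                 # sampling distribution
    total = 0.0
    for _ in range(n_samples):
        r = rng.choices(range(len(ranking)), weights=q)[0]
        total += weights[r] * judge(ranking[r]) / q[r]   # importance correction
    return total / n_samples

# sanity check: if every document is relevant, each sample contributes exactly
# z, so the estimate matches the true DCG regardless of the number of samples
est = sampled_dcg(["d1", "d2", "d3"], judge=lambda d: 1.0)
print(est)
```

Each call to `judge` stands for asking a human assessor, so the budget is `n_samples` judgements rather than one per document.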

Temporal corpus summarization using submodular word coverage. CIKM 2012.
Ruben Sipos, Adith Swaminathan, Pannaga Shivaswamy and Thorsten Joachims.
   Evolving corpora can be summarized through timelines indicating their influential elements. We created interesting timelines of articles in the ACL Anthology and the NIPS conference proceedings over a span of 15 years, using landmark documents, influential authors and topic key-phrases. We achieved this by formulating a sub-modular objective that extends the coverage problem over words (traditional summarization) to incorporate coverage across time.
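Because the objective is monotone sub-modular, the standard greedy algorithm gives a (1 - 1/e) approximation. A toy sketch, where the (word, era) coverage sets and document ids are invented for illustration:

```python
def greedy_coverage(candidates, budget):
    """Greedy maximization of coverage, the classic (1 - 1/e)-approximate
    algorithm for monotone sub-modular objectives.

    candidates: dict mapping document id -> set of (word, era) pairs, so a
    word is counted once per time period, as in the temporal objective.
    """
    covered, chosen = set(), []
    for _ in range(budget):
        # pick the document with the largest marginal coverage gain
        best = max(candidates, key=lambda d: len(candidates[d] - covered))
        if not candidates[best] - covered:
            break                        # no document adds anything new
        chosen.append(best)
        covered |= candidates[best]
    return chosen

docs = {
    "a": {("learning", 1), ("kernels", 1)},
    "b": {("learning", 1), ("learning", 2), ("bandits", 2)},
    "c": {("kernels", 1)},
}
print(greedy_coverage(docs, budget=2))
```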

Beyond myopic inference in Big Data pipelines. KDD 2013.
Karthik Raman, Adith Swaminathan, Johannes Gehrke and Thorsten Joachims.
   Pipelines constructed from modular components have very brittle performance: errors in earlier components cascade through the pipeline, causing catastrophic failures in the eventual output. We explored simple ideas from Probabilistic Graphical Models to make the inference procedure more robust, while still using off-the-shelf components to compose pipelines.

Mining videos from the web for electronic textbooks. ICFCA 2014.
Rakesh Agrawal, Maria Christoforaki, Sreenivas Gollapudi, Anitha Kannan, Krishnaram Kenthapadi and Adith Swaminathan.
   There is a wealth of educational content available online, but the primary teaching aid for the vast majority of students (e.g. in India and China) continues to be the textbook. By recognizing how textbook sections are organized and mining them for concepts using Formal Concept Analysis, we automatically retrieved appropriate videos from YouTube EDU. By using the transcripts that educators provide so videos remain accessible to deaf students, we side-stepped the challenges of automatic audio transcription and content-based video retrieval.


This timeline collects some of the fun research projects I worked on, arranged chronologically. Entries are marked as Talks, Unpublished research projects, or Publications.

In an ideal world with unlimited leisure, I would re-visit these projects and assess whether I'm satisfied with the state-of-the-art solutions linked below.

NASH: An intelligent Hex player.

Hex is a two-player board game invented by Piet Hein and John Nash. The first player to move has an advantage (a winning strategy exists, but none is known for 9x9 boards and larger). Hex has a large branching factor in the size of the board, which makes search-based AI for Hex challenging.

Advised by Prof. Amitabha Sanyal, Nikhil Hooda and I coded a functional AI for Hex in Scheme, using micro- and macro-reductions in an abstract connection game, Y. Using differential calculus to infer an approximate Probability of Ownership for unoccupied hexes, NASH consistently beat Hexy (2000 Olympiad Gold, AAAI 2000 Outstanding Paper) and Queenbee (2000 Olympiad Silver) [both use alpha-beta search based on virtual connections] on 7x7 boards playing as Player 1, and was competitive with both programs on 9x9 boards. Note that Olympiads are typically played on 11x11 boards and use the pie rule to mitigate Player 1's advantage, neither of which we simulated. The current state-of-the-art is MoHex, which uses Monte Carlo Tree Search. Read more about AI research for Hex here.


K-server: Numerous Algorithms by RHS.

Advised by Prof. G. Sivakumar, Karthik Raman, Nikhil Hooda and I tackled the classic online K-server problem. We derived the competitive ratio for several online algorithms, and constructed request sequences showing that these ratio bounds are tight. We also created a variant of the Harmonic algorithm that was O(2^k log k)-competitive and efficient. The state-of-the-art is the Work Function Algorithm, which is (2k-1)-competitive (but very inefficient); we couldn't construct a request sequence to show this bound is tight. The K-server conjecture (that there exists a deterministic k-competitive algorithm) remains open.


Affective Computing.

Samhita Kasula, Omkar Wagh, Karthik Raman and I were intrigued that we readily anthropomorphize pets but "intelligent entities" (say, Google) are still firmly considered tools. We studied affective computing, and presented work on simulating human-like emotions (e.g. incorporating prosody in Natural Language Processing -- a separate project that Nikhil Hooda, Prachur Goel and I worked on) to have more emotionally intelligent human-computer interactions. Read more about affective computing here.


Infinity+1 and other surreal numbers.

Surreal numbers are a simple, constructive number system, invented by John Conway, that generalizes the reals, the infinitesimals and the infinite numbers. Following Conway's insight connecting surreal numbers to two-player games, I demonstrated how surreals can be used to reason about Hex -- however, this representation was far less efficient than NASH's approximations. This talk won Gold at the Exposition General Championship at IIT Bombay. Read more about surreals and games here.


Learning Many-Many Matchings.

Advised by Prof. Soumen Chakrabarti, I built a reviewer-paper matching tool that integrated seamlessly with Microsoft CMT, ConfMaster, etc. The tool scraped DBLP for reviewers' recent publications to build text-profiles, and collected citations from submissions, reviewer bids, and topic/track/category matches from CMT. Optimal weighting of these different affinity cues was learned using transductive regression and Max-Margin Markov Networks. The final assignment was computed by solving a min-cost max-flow problem. Modeling asymmetric costs (dissatisfied reviewers who don't get the papers they want vs. irritated reviewers who get papers they don't want) is important. Check out the Toronto paper matching system for an alternative approach.
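The assignment step can be sketched with a small, self-contained min-cost max-flow solver (successive shortest augmenting paths); the tiny reviewer/paper instance and the score-to-cost scaling below are illustrative, not the tool's actual code.

```python
def min_cost_flow(n, edges, s, t):
    """Min-cost max-flow by successive shortest augmenting paths;
    Bellman-Ford handles the negative (affinity-derived) edge costs.

    edges: list of (u, v, capacity, cost). Returns (total_flow, total_cost).
    """
    INF = float("inf")
    graph = [[] for _ in range(n)]           # entries: [to, cap, cost, rev-index]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])   # residual edge
    total_flow = total_cost = 0
    while True:
        dist, parent = [INF] * n, [None] * n
        dist[s] = 0
        updated = True
        while updated:                       # Bellman-Ford on the residual graph
            updated = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], parent[v] = dist[u] + cost, (u, i)
                        updated = True
        if dist[t] == INF:                   # no augmenting path left
            return total_flow, total_cost
        f, v = INF, t                        # bottleneck capacity on the path
        while v != s:
            u, i = parent[v]
            f, v = min(f, graph[u][i][1]), u
        v = t
        while v != s:                        # push flow, update residuals
            u, i = parent[v]
            graph[u][i][1] -= f
            graph[v][graph[u][i][3]][1] += f
            v = u
        total_flow += f
        total_cost += f * dist[t]

# nodes: 0 = source, 1 = alice, 2 = bob, 3 = paper1, 4 = paper2, 5 = sink;
# edge costs between reviewers and papers are negated affinity scores (x100)
edges = [(0, 1, 1, 0), (0, 2, 1, 0),
         (1, 3, 1, -90), (1, 4, 1, -20),
         (2, 3, 1, -30), (2, 4, 1, -80),
         (3, 5, 1, 0), (4, 5, 1, 0)]
flow, cost = min_cost_flow(6, edges, s=0, t=5)
print(flow, -cost)
```

Reviewer loads and papers' review quotas become the capacities on the source and sink edges; asymmetric costs could be encoded by making the reviewer-to-paper edge costs asymmetric functions of the bids.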


Temporal summarization.

Remember Age of Empires 2? The end-game summary screen showed how your empire fared across the ages -- influential novel events (e.g. wars) were marked as milestones on a timeline. Evolving text corpora can be summarized in a similar way. At any point in time, novelty can be measured relative to the word content of the past, and influence can be measured by looking at the future development of the corpus. By formulating a sub-modular objective incorporating novelty and influence, we found key-phrases and milestone papers. These turn out to complement citations (e.g. survey papers are highly cited but not novel). Try our prototype here. Also check out Dafna Shahaf's work on a similar idea -- “information maps”.


Augmenting textbooks with videos.

Can we make interactive textbooks programmatically, using online educational content (e.g. Khan Academy, YouTube EDU)? The Vidya project at MSR SVC attempted this by retrieving appropriate videos from the web as one reads through a textbook section. Classic retrieval techniques (e.g. Bag-of-Words and topic models) don't work well, so we developed a pipeline to associate sections with distinctive phrases. Using transcripts of videos, we side-stepped the challenge of automatic audio transcription. Although this worked better than conventional approaches, the bigger problem of determining what is appropriate -- over and above what is relevant -- remains open.


Beyond myopic inference in Big Data pipelines.

We connected the workflow of Big Data pipelines to inference in probabilistic graphical models. This allowed us to detect and recover from prediction errors in several workflows, using simple message-passing algorithms (beam search) for inference in graphical models. Moreover, it allowed us to trade off the computation cost of inference against accuracy using heuristically adapted beam sizes.
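A minimal sketch of the beam-search idea: keeping several hypotheses alive across pipeline stages avoids committing to a locally best but globally poor prediction. The toy two-stage "pipeline" below is invented for illustration.

```python
def beam_search(start, expand, score, beam_size, depth):
    """Keep the top-`beam_size` partial hypotheses at each pipeline stage
    instead of committing to the single best (myopic) one.

    expand(state) -> iterable of successor states
    score(state)  -> higher is better
    """
    beam = [start]
    for _ in range(depth):
        candidates = [nxt for state in beam for nxt in expand(state)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_size]
    return max(beam, key=score)

# states are strings; the greedy choice 'b' looks best after stage one,
# but only keeping 'a' alive as well reaches the better completion 'ac'
succ = {"": ["a", "b"], "a": ["ac"], "b": ["bd"]}
val = {"": 0, "a": 1, "b": 2, "ac": 5, "bd": 3}
expand = lambda s: succ.get(s, [])
score = lambda s: val[s]
print(beam_search("", expand, score, beam_size=1, depth=2))  # 'bd' (myopic)
print(beam_search("", expand, score, beam_size=2, depth=2))  # 'ac'
```

The beam size is the knob mentioned above: larger beams cost more computation but recover from more upstream errors.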


Doubly semi-supervised learning.

There is ample evidence that unlabeled inputs, in addition to labeled data, can improve accuracy in univariate classification (see semi-supervised learning). For structured output prediction, perhaps “unpaired” outputs will also help (since they tell us which structures are a priori more likely). We derived a simple objective that approximated the marginal distribution of these unpaired outputs to learn a discriminative prediction rule, but the quality of this approximation was terrible for high-dimensional problems with limited data, which hurt the eventual performance of the approach. If you have a problem with unpaired outputs, perhaps you are better off deriving empirical constraints over expected outputs and using expectation regularization, or using a generative model.

Miscellaneous links

  SPICMACAY Cornell (website I made)

  IIT Bombay Institute Football League 2009 (website I made)

  My YouTube videos (amateur Carnatic music performances)

  Scratchpad (mostly Vogon poetry)