CS7792 - Bias and Fairness in Learning Systems

Special Topics in Machine Learning

Spring 2020
Prof. Thorsten Joachims
Cornell University, Department of Computer Science & Department of Information Science

 

Time and Place

First meeting: January 22, 2020
Last meeting: April 29, 2020
Time: Wednesdays, 10:10am-11:10am
Room: Gates Hall G11 (Ithaca) / Bloomberg Center 397 (Cornell Tech)

Course Description

We are now using artificial intelligence and machine learning to make decisions in an ever-increasing range of settings. At one end of the range are systems that inform life-altering decisions about hiring, loans, and college admissions. At the other end are small decisions, such as recommending a product to a user. In aggregate, however, these small decisions also have a large impact, because they fundamentally shape markets such as content streaming and e-commerce. This creates a responsibility to ensure that these systems are fair, unbiased, and lead to societally desirable outcomes. While quantifying and formalizing decision criteria in a computable way holds a lot of promise, we have come to realize that naively applying machine learning to these problems can lead to undesirable outcomes, can be unfair, and can perpetuate existing biases.

This seminar discusses issues of bias and fairness in learning systems as an emerging research area at the intersection of machine learning, causal inference, economics, and information retrieval. Topics include causal inference and the potential outcomes model, causality and fairness, handling selection bias in data, and fairness criteria for learning. Concepts will be illustrated with applications, especially in search engines and recommender systems.

The prerequisites for the class are knowledge of machine learning algorithms and their theory, basic probability, basic statistics, and general mathematical maturity.

Enrollment is limited to PhD students.

 

Syllabus

  • 01/22: Introduction [slides]
    • Examples of bias and fairness issues in learning systems.
    • Overview of course.
    • Course administration and policies.
  • 01/29: Primer on Causal Inference for Intelligent Systems [slides]
    • Potential outcomes model and the contextual bandit model.
    • Model the world vs. model the bias.
    • Policies, reward regression, inverse propensity score (IPS) weighting.
  • 02/05: Presentation Bias in Ranking [slides]
    • Reading: T. Joachims, A. Swaminathan, T. Schnabel, Unbiased Learning-to-Rank with Biased Feedback, International Conference on Web Search and Data Mining (WSDM), 2017. [PDF]
  • The rest of the schedule is in CMT.
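The 01/29 session lists IPS weighting among the core tools, and the 02/05 reading applies the same propensity-weighting idea to ranking. As a rough illustration only (not taken from the course slides or the paper), here is a minimal Python sketch of the standard IPS estimator for evaluating a new policy from logged contextual bandit data. It assumes a log of (context, action, reward, propensity) tuples in which the propensity is the logging policy's known, nonzero probability of the logged action; the names `ips_estimate` and `LogEntry` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical log format: (context, action, reward, propensity), where
# propensity = probability that the logging policy chose `action` in `context`.
LogEntry = Tuple[int, int, float, float]


def ips_estimate(log: Sequence[LogEntry],
                 target_prob: Callable[[int, int], float]) -> float:
    """Inverse propensity score (IPS) estimate of a target policy's expected reward.

    Each logged reward is reweighted by target_prob(context, action) / propensity,
    which corrects for the logging policy having chosen some actions more often
    than the target policy would. The estimate is unbiased if every action the
    target policy can take had a positive logging propensity.
    """
    if not log:
        raise ValueError("log must be non-empty")
    total = 0.0
    for context, action, reward, propensity in log:
        if propensity <= 0.0:
            raise ValueError("IPS needs a positive propensity for each logged action")
        total += reward * target_prob(context, action) / propensity
    return total / len(log)


if __name__ == "__main__":
    # Toy log: the logging policy picked each of two actions uniformly at random.
    log = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5), (1, 1, 0.0, 0.5), (1, 0, 1.0, 0.5)]
    # Deterministic target policy: always pick the action that differs from the context id.
    print(ips_estimate(log, lambda x, a: 1.0 if a != x else 0.0))  # 1.0
    # Deterministic target policy: always pick the action equal to the context id.
    print(ips_estimate(log, lambda x, a: 1.0 if a == x else 0.0))  # 0.0
```

Reweighting by the inverse propensity is what lets a single log collected under one policy be reused to compare many candidate policies, which is the "model the bias" approach contrasted with reward regression ("model the world") in the 01/29 outline.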

Contact

Please use the CS7792 Piazza forum for questions and discussions. Otherwise, contact Thorsten Joachims (office hours: Wednesdays, 11:10am-12:00pm, Gates 418).

For peer feedback, we are using a CMT instance set up for this course.

For grades, we are using CMS.

For remote participation, we are using a dedicated Zoom room.

 

Grading

This is a 1-credit seminar. S/U only (no letter grade, no audit). Grades will be determined based on quizzes, paper presentations, peer reviewing, and class participation.

For the paper presentations, we will use peer review: you will comment on other students' presentations and give constructive feedback. The quality of your reviews is also a component of your own grade.

To limit the effect of outlier grades for quizzes and peer reviews, the lowest grade is replaced by the second-lowest grade when grades are totaled at the end of the semester. So missing one week is no big deal.

To pass the course, you need to get at least half of the cumulative quiz points, half of the presentation points, half of the peer reviewing points, and half of the class participation points.
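The replacement rule and the pass criterion above amount to simple arithmetic. The following Python sketch is only an illustration under assumptions the policy does not spell out: scores are plain numbers per week, the replacement is applied within one component (quizzes or peer reviews) at a time, and the points possible in each component are unaffected by the replacement. The function names, component keys, and point values are hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical component names for the four graded parts of the seminar.
COMPONENTS = ("quiz", "presentation", "peer_review", "participation")


def cumulative_with_drop(scores: Sequence[float]) -> float:
    """Total of weekly scores after replacing the lowest score with the second-lowest."""
    if len(scores) < 2:
        return float(sum(scores))  # nothing to replace the lowest score with
    ordered = sorted(scores)
    ordered[0] = ordered[1]  # the lowest score becomes a copy of the second-lowest
    return float(sum(ordered))


def passes(earned: Dict[str, float], possible: Dict[str, float]) -> bool:
    """Pass (S) only if at least half of the possible points are earned in every component."""
    return all(earned[c] >= 0.5 * possible[c] for c in COMPONENTS)


if __name__ == "__main__":
    # Four quizzes worth 10 points each; the missed quiz (0) is replaced by the 8.
    print(cumulative_with_drop([10, 0, 8, 9]))  # 35.0
```

Because every component must clear the halfway mark independently, a strong quiz total cannot compensate for, say, missing peer reviews.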

 

Reference Material

We will mostly read original research papers, but the following books and tutorials provide entry points for the main topics of the class:

  • Imbens, Rubin, "Causal Inference for Statistics, Social, and Biomedical Sciences", Cambridge University Press, 2015. (online via Cornell Library)
  • Morgan, Winship, "Counterfactuals and Causal Inference", Cambridge University Press, 2007.
  • Barocas, Hardt, Narayanan, "Fairness and Machine Learning". (online)

Other sources for general background on machine learning are:

  • Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
  • Shalev-Shwartz, Ben-David, "Understanding Machine Learning: From Theory to Algorithms", Cambridge University Press, 2014. (online)
  • Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
  • Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
  • Mitchell, "Machine Learning", McGraw Hill, 1997.
  • Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
  • Vapnik, "Statistical Learning Theory", Wiley, 1998.

Academic Integrity

This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit must be the student's own work. Collaboration is allowed only if explicitly permitted. Violations of the rules (e.g., cheating, copying, non-approved collaboration) will not be tolerated. Respectful, constructive, and inclusive conduct is expected of all class participants.