Yiwei Bai  

Ph.D. Student
CS Department@Cornell University

Office: 344, Gates Hall, Cornell, Ithaca, NY 14853
Email: yb263 [at] cornell (dot) edu

GitHub  /  LinkedIn


Hi! I am a CS Ph.D. student at Cornell University, where I am supervised by Professor Carla P. Gomes. My research interests lie at the intersection of reinforcement learning, decision making, and computational sustainability. I received a B.Eng. from the ACM Honors Class, Zhiyuan College, Shanghai Jiao Tong University.

Cornell University, USA
Ph.D. in Computer Science, Aug. 2018 to Present
Shanghai Jiao Tong University, China
Bachelor of Engineering, Sep. 2014 to Jun. 2018
Cornell University, USA
Research Intern, Jun. 2017 to Dec. 2017

Zero Training Overhead Portfolios for Learning to Solve Combinatorial Problems
Yiwei Bai, Wenting Zhao, Carla P. Gomes
Under review.

  • We observe that well-trained models for combinatorial problems, taken from the same training trajectory and with similar top validation performance, perform well on very different validation instances.

  • ZTop leverages these diverse models to increase the test performance with (almost) zero training overhead.

Batch Learning from Bandit Feedback through Bias Corrected Reward Imputation
Lequn Wang, Yiwei Bai, Arjun Bhalla, Thorsten Joachims
Appeared in the Real-world Sequential Decision Making Workshop, ICML 2019.

  • We introduce a new "Model the World" style batch learning from logged bandit feedback algorithm: Bias Corrected Reward Imputation (BCRI).

  • BCRI learns a reward-regression model and derives a policy from the estimated rewards.

  • The problem is formulated as a bi-level optimization, where the upper level maximizes the DM estimate and the lower level fits a weighted reward regression.

Publication

Deep Reasoning Networks for Unsupervised Pattern De-mixing with Constraint Reasoning
Di Chen, Yiwei Bai, Wenting Zhao, Sebastian Ament, John M. Gregoire, Carla P. Gomes
ICML 2020.

  • We proposed Deep Reasoning Networks (DRNets), an end-to-end framework that combines deep learning with reasoning for solving complex tasks.

  • DRNets exploit problem structure and prior knowledge by tightly combining logic and constraint reasoning with stochastic-gradient-based neural network optimization.

  • We illustrate the power of DRNets on de-mixing overlapping hand-written Sudokus and on a substantially more complex task in scientific discovery: crystal-structure phase mapping.

Scalable Relaxations of Sparse Packing Constraints: Optimal Biocontrol in Predator-Prey Networks
Johan Bjorcks, Yiwei Bai, Yexiang Xue, Xiaojian Wu, Mark Whitemore, Carla P. Gomes
In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.

  • In this research, we addressed an important problem in computational sustainability: biological control of invasive species.

  • We proposed an approximation algorithm based on a width relaxation and randomized projections, which scales better than previous work and can be applied to realistic problems.

  • We evaluated our algorithm in the context of biocontrol for the insect pest hemlock woolly adelgid (HWA) in eastern North America.

An Empirical Study of Collective Behaviors in Many-agent Reinforcement Learning (Extended Abstract)
Yiwei Bai*, Lantao Yu*, Yaodong Yang*, Jun Wang, Weinan Zhang, Ying Wen, Yong Yu
In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2018.

  • In this research, we designed and developed a million-agent-scale multi-agent reinforcement learning platform. You can find the platform here.

  • We used multi-agent reinforcement learning to study AI populations and to test whether principles developed in the real world also apply to them.

Research Projects

Yi, an AI platform for playing the game of Go
Yiwei Bai, Lequn Chen, and colleagues at Tianrang (advised by Professor Guirong Xue), Jan. 2017

  • Yi won fourth prize in the first International Computer Go Competition.
  • Yi won ninth place in the 10th UEC Cup.
  • My colleagues and I trained and tuned the policy network and the value network.
  • My colleagues and I designed and implemented the reinforcement learning framework that combines the value network with the policy network.