Deep Learning with Logged Bandit Feedback

Authors: Thorsten Joachims and Adith Swaminathan
Cornell University
Department of Computer Science

Version: 1.00
Date: 23.02.2018


BanditNet permits the use of logged contextual bandit feedback for training deep neural networks. Such contextual bandit feedback can be available in huge quantities (e.g., logs of search engines, recommender systems) at little cost, opening up a path for training deep networks on orders of magnitude more data. To this end, we propose a Counterfactual Risk Minimization (CRM) approach for training deep networks using an equivariant empirical risk estimator with variance regularization, BanditNet, and show how the resulting objective can be decomposed in a way that allows Stochastic Gradient Descent (SGD) training. We empirically demonstrate the effectiveness of the method by showing how deep networks -- ResNets in particular -- can be trained for object recognition without conventionally labeled images. The code and data given on this page were used in the experiments for the ICLR 2018 paper, and it should be easy to extend the code to other domains.
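
As a rough illustration of the estimator described above, the following sketch computes a self-normalized inverse-propensity estimate of a policy's risk from logged bandit feedback, together with the per-example terms obtained by subtracting a fixed Lagrange multiplier from the loss. It is a minimal sketch under stated assumptions, not the code shipped in the archive: the function names, the toy data, and the reading of the "equivariant" estimator as the self-normalized form are assumptions made for illustration.

```python
def snips_risk(losses, logging_propensities, new_policy_probs):
    """Self-normalized inverse-propensity estimate of the new policy's risk.

    All three arguments are equal-length lists with one entry per logged
    example: the observed loss, the probability with which the logging
    policy chose the logged action, and the probability the new policy
    assigns to that same action.
    """
    weights = [p_new / p_log
               for p_new, p_log in zip(new_policy_probs, logging_propensities)]
    numerator = sum(d * w for d, w in zip(losses, weights))
    return numerator / sum(weights)


def translated_ips_terms(losses, logging_propensities, new_policy_probs,
                         lagrange_mult):
    """Per-example terms (loss - lambda) * importance weight.

    For a fixed lambda the objective is a plain average of these terms,
    which is what makes minibatch SGD applicable.
    """
    return [(d - lagrange_mult) * p_new / p_log
            for d, p_new, p_log in zip(losses, new_policy_probs,
                                       logging_propensities)]


# Toy log of four examples (hypothetical values, for illustration only).
losses = [0.0, 1.0, 1.0, 0.0]
logging_propensities = [0.5, 0.25, 0.1, 0.2]
new_policy_probs = [0.9, 0.05, 0.02, 0.6]

risk = snips_risk(losses, logging_propensities, new_policy_probs)
terms = translated_ips_terms(losses, logging_propensities, new_policy_probs,
                             lagrange_mult=0.9)
objective = sum(terms) / len(terms)
print(risk, objective)
```

The self-normalized estimate is equivariant in the sense that adding a constant to every loss shifts the estimate by exactly that constant, which is not true of the unnormalized importance-weighted average.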

Source Code

The program is free for scientific use, and it is based on the ResNet implementation from CNTK. If you use BanditNet in your scientific work, please cite the ICLR 2018 paper mentioned above.

The implementation was developed on Linux with CNTK 2.0 under Anaconda Python 2.7. It requires a CNTK build that supports GPU computation.

The archive contains the Python source code of the most recent version of BanditNet, as well as the CNTK downloader for the CIFAR-10 dataset. Unpack the archive using the shell command:

      tar xvfz banditnet.tar.gz 

This expands the archive into the current directory, which now contains all relevant files. Download and install the CIFAR-10 dataset by going into the CIFAR-10 sub-directory

      cd CIFAR-10

and executing the command


This will download the data and convert it to CNTK format.

How to Use

BanditNet consists of a single script that trains the model and evaluates its performance on the test set. Call it as follows:

      python -n resnet20 -c 0.0001 -b 128 -r 0.1 -e 1000 -l 0.9 -x ./CIFAR-10-BLBF/train_map_full1.txt -y ./CIFAR-10-BLBF/train_map_blbf1.txt -z ./CIFAR-10/test_map.txt

The BLBF data derived from the full-information CIFAR-10 data is in the CIFAR-10-BLBF sub-directory. To train the model, specify the filename of the BLBF data with the -y option. You also need to specify the corresponding full-information dataset with the -x option, because it contains the links to the image files and because some debugging information is computed from it. The CIFAR-10-BLBF directory contains training files of various sizes.
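
To make the relationship between the full-information data and the BLBF data more concrete, the sketch below shows one way bandit feedback could be derived from a full-information label: a logging policy samples a single class, and only the loss and sampling probability of that class are recorded. This is an illustration under loud assumptions, not a description of how the files in CIFAR-10-BLBF were produced or of their on-disk format; the function name, the uniform logging policy, and the 0/1 loss are all hypothetical.

```python
import random


def log_bandit_feedback(true_label, logging_probs, rng):
    """Sample one action from the logging policy and record its feedback.

    Returns (chosen_action, loss, propensity). Only the loss of the chosen
    action is revealed; the losses of all other actions stay unknown.
    """
    actions = list(range(len(logging_probs)))
    chosen = rng.choices(actions, weights=logging_probs, k=1)[0]
    loss = 0.0 if chosen == true_label else 1.0  # assumed 0/1 loss
    propensity = logging_probs[chosen]
    return chosen, loss, propensity


rng = random.Random(0)  # fixed seed so the sketch is repeatable
num_classes = 10        # CIFAR-10 has ten classes
uniform_policy = [1.0 / num_classes] * num_classes  # hypothetical logger

chosen, loss, propensity = log_bandit_feedback(
    true_label=3, logging_probs=uniform_policy, rng=rng)
print(chosen, loss, propensity)
```

Recording the propensity alongside the loss is what later allows a different policy to be evaluated on the same log by importance weighting.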

The BLBF training code can be called with the following options:

                                   [-v VAL_IMG_FILE] [-w VAL_BLBF_FILE] -z
                                   TEST_FILE [-n NETWORK] [-c L2_REG_WEIGHT]
                                   [-b MINIBATCH_SIZE] [-r LEARN_RATE]
                                   [-l LAGRANGE_MULT] [-e EPOCHS]
                                   [-p PROFILER_DIR] [-m MODEL_DIR]
                                   [-tensorboard_logdir TENSORBOARD_LOGDIR]
                                   [-s START_CHECKPOINT]

required arguments:
  -x TRAIN_IMG_FILE, --train_img_file TRAIN_IMG_FILE
                        Full-information training file with images and full
                        information label for debugging.
  -y TRAIN_BLBF_FILE, --train_blbf_file TRAIN_BLBF_FILE
                        BLBF training file with losses and propensities.
  -z TEST_FILE, --test_file TEST_FILE
                        Full-information test file.

optional arguments:
  -h, --help            show this help message and exit
  -n NETWORK, --network NETWORK
                        Network type (resnet20 or resnet110)
  -c L2_REG_WEIGHT, --l2_reg_weight L2_REG_WEIGHT
                        L2 regularization parameter
  -b MINIBATCH_SIZE, --minibatch_size MINIBATCH_SIZE
                        Minibatch size
  -r LEARN_RATE, --learn_rate LEARN_RATE
                        Factor for learning rate
  -l LAGRANGE_MULT, --lagrange_mult LAGRANGE_MULT
                        Lagrange multiplier value to subtract from the loss
  -e EPOCHS, --epochs EPOCHS
                        Number of training epochs
  -v VAL_IMG_FILE, --val_img_file VAL_IMG_FILE
                        Full-information validation file with images.
  -w VAL_BLBF_FILE, --val_blbf_file VAL_BLBF_FILE
                        BLBF validation file with losses and propensities.
  -p PROFILER_DIR, --profiler_dir PROFILER_DIR
                        Directory for saving profiler output
  -m MODEL_DIR, --model_dir MODEL_DIR
                        Directory for saving model
  -tensorboard_logdir TENSORBOARD_LOGDIR, --tensorboard_logdir TENSORBOARD_LOGDIR
                        Directory where TensorBoard logs should be created
  -s START_CHECKPOINT
                        Checkpointed model to start training with.
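
The option list above maps naturally onto Python's argparse module. The following is a hedged reconstruction of such a parser, built only from the flags documented here; it is not the script's actual source. The argument types, the restriction of -n to the two documented network names, and the absence of defaults are assumptions, and -s is given no long name because none is documented.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented BanditNet options (a sketch)."""
    parser = argparse.ArgumentParser(
        description="Train on logged bandit feedback (illustrative parser).")

    required = parser.add_argument_group("required arguments")
    required.add_argument("-x", "--train_img_file", required=True,
                          help="Full-information training file with images.")
    required.add_argument("-y", "--train_blbf_file", required=True,
                          help="BLBF training file with losses and propensities.")
    required.add_argument("-z", "--test_file", required=True,
                          help="Full-information test file.")

    # Types below are assumptions; the documentation does not state them.
    parser.add_argument("-n", "--network", choices=["resnet20", "resnet110"],
                        help="Network type.")
    parser.add_argument("-c", "--l2_reg_weight", type=float,
                        help="L2 regularization parameter.")
    parser.add_argument("-b", "--minibatch_size", type=int,
                        help="Minibatch size.")
    parser.add_argument("-r", "--learn_rate", type=float,
                        help="Factor for learning rate.")
    parser.add_argument("-l", "--lagrange_mult", type=float,
                        help="Lagrange multiplier value to subtract from the loss.")
    parser.add_argument("-e", "--epochs", type=int,
                        help="Number of training epochs.")
    parser.add_argument("-v", "--val_img_file",
                        help="Full-information validation file with images.")
    parser.add_argument("-w", "--val_blbf_file",
                        help="BLBF validation file with losses and propensities.")
    parser.add_argument("-p", "--profiler_dir",
                        help="Directory for saving profiler output.")
    parser.add_argument("-m", "--model_dir",
                        help="Directory for saving model.")
    parser.add_argument("-tensorboard_logdir", "--tensorboard_logdir",
                        help="Directory where TensorBoard logs should be created.")
    parser.add_argument("-s", dest="start_checkpoint",
                        help="Checkpointed model to start training with.")
    return parser


# Parse the option values from the example call shown earlier on this page.
example_args = [
    "-n", "resnet20", "-c", "0.0001", "-b", "128", "-r", "0.1",
    "-e", "1000", "-l", "0.9",
    "-x", "./CIFAR-10-BLBF/train_map_full1.txt",
    "-y", "./CIFAR-10-BLBF/train_map_blbf1.txt",
    "-z", "./CIFAR-10/test_map.txt",
]
args = build_parser().parse_args(example_args)
print(args.network, args.epochs, args.lagrange_mult)
```

Options that are not supplied, such as the validation files, come back as None in this sketch, so a caller would need to check for them before use.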


The authors are not responsible for implications from the use of this software. This material is based upon work supported by the National Science Foundation under Award IIS-1615706. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF).

Known Problems