Can a driver trained in TORCS drive in CARLA?

Transferring an agent trained in the TORCS simulator to CARLA

There has been some success in solving continuous-action problems in simulation using RL, but these algorithms still do not work well in real-world environments. One reason is that exploration on real robotic agents is severely limited: an agent left to explore freely risks hardware damage, and such events result in the loss of monetary and technical resources. Efforts to mitigate this problem use transfer learning to carry optimal policies learned in simulation over to the real world. While significant progress has been made in improving learning on a single task, the idea of transfer learning has only recently been applied to reinforcement learning. Agents trained in simulation, if run directly in the real world (the target domain), are bound to be sub-optimal due to implementation gaps: variations in sensor data between simulator and real world, differences in the physical parameters of the environment, and inaccurate vehicle dynamics. Agents that address these differences and generalize across them perform better in the target domain.

The idea of transferring an agent from TORCS, a low-fidelity simulator, to CARLA, a high-fidelity simulator, originates from the reasons discussed above. Training an agent in TORCS and transferring it to CARLA mirrors the sim2real paradigm, in the sense that the two simulators differ considerably in their virtual sensor implementations, physical parameters, and vehicle dynamics. If an agent trained in TORCS works well when transferred to CARLA, that would verify the usefulness of the training approach and help quantify its expected performance in the sim-to-real case.

This project required implementing the input states consumed by the TORCS agent inside CARLA. Upon direct transfer of a DDPG agent from TORCS, the agent fails to run in CARLA and immediately crashes into the side rail. We then experimented with a Variational Autoencoder (VAE), a Denoising Autoencoder (DAE), and a $\beta$-VAE for learning a latent representation of the input states; we call this the representation block. The latent embedding produced by the representation block is then fed to the DDPG agent for training (a sketch of this pipeline is given below). Upon direct transfer of this agent to CARLA, we observe some minor improvement: the agent successfully takes the first turn, but crashes immediately after. Across several experiments, we find that the DAE works best for our case. Further improvement could come from predicting the turning radius instead of the steering angle, and then using Ackermann geometry to recover the steering angle, as sketched at the end of this section. This would help in our setting, since we operate vehicles with different physical dimensions.
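To make the representation-block idea concrete, here is a minimal sketch in PyTorch of how a denoising autoencoder's encoder can sit between the raw state vector and the DDPG actor. The layer sizes, the 29-dimensional state, the 32-dimensional latent, and the 3-dimensional action are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

STATE_DIM = 29   # assumed TORCS-style low-dimensional state
LATENT_DIM = 32  # assumed size of the latent embedding

class DenoisingAutoencoder(nn.Module):
    """Denoising AE: corrupt the input, reconstruct the clean state."""
    def __init__(self, state_dim=STATE_DIM, latent_dim=LATENT_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, state, noise_std=0.1):
        corrupted = state + noise_std * torch.randn_like(state)
        z = self.encoder(corrupted)
        return self.decoder(z), z

class Actor(nn.Module):
    """DDPG actor that sees the latent embedding, not the raw state."""
    def __init__(self, latent_dim=LATENT_DIM, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),  # e.g. steer/accel/brake
        )

    def forward(self, z):
        return self.net(z)

# Training step for the representation block (denoising objective):
dae, actor = DenoisingAutoencoder(), Actor()
state = torch.randn(16, STATE_DIM)                 # dummy batch of states
recon, z = dae(state)
recon_loss = nn.functional.mse_loss(recon, state)

# At policy time, the actor acts on the clean-state encoding:
with torch.no_grad():
    action = actor(dae.encoder(state))
```

The intuition behind using the DAE here is that forcing the encoder to reconstruct clean states from corrupted ones encourages a latent space that is robust to the sensor-level discrepancies between TORCS and CARLA.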
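As a small numeric sketch of the turning-radius proposal: under a simple bicycle-model approximation of Ackermann geometry, the steering angle for a commanded turning radius $R$ is $\delta = \arctan(L/R)$, where $L$ is the wheelbase. The same predicted radius therefore maps to a vehicle-specific steering angle. The wheelbase values below are illustrative, not those of the project's vehicles.

```python
import math

def steer_from_radius(turning_radius_m, wheelbase_m):
    """Bicycle-model Ackermann approximation: delta = atan(L / R)."""
    return math.atan(wheelbase_m / turning_radius_m)

# One predicted radius, two different vehicle geometries:
R = 20.0  # metres, predicted by the policy
for name, L in [("TORCS car", 2.64), ("CARLA car", 2.89)]:
    print(f"{name}: steer = {math.degrees(steer_from_radius(R, L)):.2f} deg")
```

Predicting $R$ and deriving $\delta$ per vehicle would decouple the policy's output from the physical dimensions of the specific car it was trained on.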