POMDP solver for HERB

Manipulation experiments using SARSOP with HERB

Manipulating an object of unknown mass

Object manipulation under uncertainty is one of the most important problems we face when deploying robots in environments containing objects of unknown shapes and sizes. This uncertainty can arise from perceiving the physical parameters of objects (mass distribution, friction, shape), from imperfect models of the environment, and from an imperfect model of the robot itself. A lot of work in the past few years has tried to tackle this problem. TossingBot from Princeton and Google nicely demonstrates how end-to-end methods can deal with arbitrary objects, though it has its own limitations, such as limited interpretability and its reliance on RGB-D sensing alone to infer the physics. Other works utilize meta reinforcement learning to deal with uncertainty and to improve the manipulation policy online. Prior to such methods, a large body of work phrased this problem as a POMDP. POMDPs extend MDPs by modeling the partially observable part of the state as a belief space. For instance, in our toy experiment we use objects of varying mass; since the mass is not directly observable, we maintain a distribution over it (commonly phrased as a ‘belief over X’). PBVI, HSVI2, and SARSOP are some of the point-based solvers commonly known to the research community in this domain. From the Survey of POMDP-Solvers,

A point-based algorithm explores the belief space, focusing on the reachable belief states, while maintaining a value function by applying the point-based backup operator.
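To make the point-based backup concrete, below is a minimal sketch of the backup operator for a discrete POMDP. The array layout (T, O, R), the alpha-vector set Gamma, and the discount factor are illustrative assumptions, not code from any particular solver.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma=0.95):
    """One point-based backup at belief b.

    b:     (S,) belief over states
    Gamma: list of (S,) alpha-vectors (current value function)
    T:     (A, S, S) transition probs  T[a, s, s']
    O:     (A, S, O) observation probs O[a, s', o]
    R:     (A, S)    expected reward   R[a, s]
    Returns the new alpha-vector and its greedy action.
    """
    A, S, num_obs = T.shape[0], T.shape[1], O.shape[2]
    best_alpha, best_action, best_value = None, None, -np.inf
    for a in range(A):
        alpha_a = R[a].astype(float)
        for o in range(num_obs):
            # Back up every alpha-vector through action a, observation o ...
            candidates = [T[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
            # ... and keep the one that scores best at this belief point.
            best = max(candidates, key=lambda v: b @ v)
            alpha_a = alpha_a + gamma * best
        value = b @ alpha_a
        if value > best_value:
            best_alpha, best_action, best_value = alpha_a, a, value
    return best_alpha, best_action
```

Repeating this backup at a growing set of sampled beliefs is what improves the value function in point-based solvers such as PBVI, HSVI2, and SARSOP.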

SARSOP is computationally efficient compared to its predecessors because it samples only from the belief space reachable under near-optimal policies. Compared to other approaches it is extremely sample efficient, with the downside that its discrete state, action, and observation spaces limit its application to complex manipulation tasks. In this project, I developed a framework to integrate SARSOP with the sampling-based motion planners available from OMPL. The main idea is to design high-level action primitives and to discretize the state space, which SARSOP then uses to compute the optimal policy graph. These action primitives use the CRRT planner to perform constrained actions in the continuous action space, allowing the policy graph to be executed on a 7-DOF Barrett WAM; a sketch of this execution loop follows below.
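Here is a minimal sketch of how such a policy graph might be executed on the arm. The PolicyNode structure, the primitive labels, and the observe callback are hypothetical stand-ins; in the actual framework each primitive would invoke a CRRT plan through OMPL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class PolicyNode:
    """One node of the SARSOP policy graph: an action label and, for
    each discretized observation, the node to transition to next."""
    action: str                                   # e.g. "SHIFT", "LIFT"
    edges: Dict[int, "PolicyNode"] = field(default_factory=dict)

def execute_policy(root: PolicyNode,
                   primitives: Dict[str, Callable[[], None]],
                   observe: Callable[[], int],
                   max_steps: int = 50) -> None:
    """Walk the policy graph, running one action primitive per node.

    primitives maps an action label to a closure that plans and executes
    the corresponding constrained motion (e.g. a CRRT query in OMPL);
    observe returns the discretized observation index (e.g. a binned
    F/T reading) used to pick the outgoing edge.
    """
    node = root
    for _ in range(max_steps):
        primitives[node.action]()      # plan + execute the primitive
        o = observe()                  # discretized sensor reading
        if o not in node.edges:        # terminal node / unmodeled obs
            break
        node = node.edges[o]
```

The point of this split is that SARSOP only ever reasons over the discrete graph, while all the continuous-space difficulty is hidden inside the primitives.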

I designed a toy experiment to show the efficacy of this framework. The experiment involves the robot holding a dumbbell of unknown mass placed on the table. The task is to shift-and-lift the dumbbell to a target position. We use the readings from the force/torque (F/T) sensor on the arm as observations to update the belief distribution over the mass of the dumbbell. In the short gif above, the robot moves through the state space, feeding the F/T readings to our framework and following the policy graph obtained from SARSOP.
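For concreteness, here is a minimal sketch of the belief update over a discretized set of candidate masses, assuming a Gaussian noise model on the vertical force reading. The mass bins, noise level, and force model are illustrative assumptions rather than the exact values used in the experiment.

```python
import numpy as np

# Hypothetical discretization: belief over a few candidate masses (kg).
masses = np.array([0.5, 1.0, 2.0, 4.0])
belief = np.full(len(masses), 1.0 / len(masses))   # start uniform

def update_belief(belief, f_measured, f_expected, sigma=0.5):
    """Bayesian belief update from one F/T reading.

    f_expected(m) predicts the vertical force for candidate mass m
    (e.g. m * 9.81 while statically holding the dumbbell); sensor
    noise is modeled as Gaussian with std dev sigma. Both are
    assumptions made for this sketch.
    """
    likelihood = np.exp(-0.5 * ((f_measured - f_expected(masses)) / sigma) ** 2)
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Example: a ~19.6 N reading concentrates belief on the 2 kg bin.
belief = update_belief(belief, f_measured=19.6,
                       f_expected=lambda m: 9.81 * m)
```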