Manipulating deformable objects is a key capability for robotic systems. In our recent work, we have been exploring how to represent deformable objects in the joint domain of vision and touch. The core idea: use forces from touch to deform neural fields, i.e., continuous neural representations of 3D geometries (ICRA 2022 + code/dataset).
Here, we present VIRDO: an implicit, multi-modal, and continuous representation for deformable elastic objects. VIRDO operates directly on visual (point cloud) and tactile (reaction force) modalities and learns rich latent embeddings of contact locations and forces to predict object deformations subject to external contacts. Further, we demonstrate VIRDO's ability to: i) produce high-fidelity cross-modal reconstructions with dense unsupervised correspondences, ii) generalize to unseen contact formations, and iii) perform state estimation with partial visio-tactile feedback.
A key feature of our representation is its multimodality: it seamlessly combines sight and touch.
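To make the core idea concrete, here is a minimal sketch of what "deforming a neural field with contact forces" can look like. The module names, layer sizes, and latent dimensions below are hypothetical and are not taken from the released VIRDO code or dataset; the sketch only illustrates conditioning an implicit signed-distance field on a latent embedding of the applied force.

```python
# Minimal sketch (hypothetical modules, not the released VIRDO implementation):
# a force observation is embedded into a latent code, a deformation field maps
# query points to displacements given that code, and a nominal SDF evaluates
# the undeformed geometry at the displaced points.

import torch
import torch.nn as nn


class ForceEncoder(nn.Module):
    """Embeds a reaction-force observation into a latent code."""

    def __init__(self, force_dim=3, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(force_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, force):
        return self.net(force)


class DeformationField(nn.Module):
    """Predicts a per-point displacement from a query point and a force code."""

    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def forward(self, points, force_code):
        code = force_code.expand(points.shape[0], -1)
        return self.net(torch.cat([points, code], dim=-1))


class NominalSDF(nn.Module):
    """Implicit signed-distance field of the undeformed (nominal) object."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, points):
        return self.net(points)


# Usage: displace query points by the force-conditioned deformation,
# then evaluate the nominal shape at the displaced locations.
encoder, deform, sdf = ForceEncoder(), DeformationField(), NominalSDF()
points = torch.rand(1024, 3)                # query points in the object frame
force = torch.tensor([[0.0, 0.0, -2.0]])    # e.g., a 2 N push along -z
force_code = encoder(force)
sdf_values = sdf(points + deform(points, force_code))
```

In the actual method, the deformation is conditioned on learned embeddings of both contact locations and forces, as described above, which is what supports cross-modal reconstruction and state estimation from partial visio-tactile feedback.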
Video
Authors
Citation
@article{wi2022virdo,
  title={{VIRDO}: Visio-tactile implicit representations of deformable objects},
  author={Wi, Youngsun and Florence, Pete and Zeng, Andy and Fazeli, Nima},
  journal={arXiv preprint arXiv:2202.00868},
  year={2022}
}