Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-Rich Tasks
UC Berkeley, Yonsei University
* New version of code coming soon! In the meantime, you can check out our old simulation and experiment code linked at the bottom of the page.
We present EquiContact, a hierarchical SE(3)-equivariant vision-to-force policy for contact-rich manipulation that achieves spatial generalization. Although it is trained only on a fixed task configuration, the policy generalizes to unseen configurations under arbitrary SE(3) transformations.
EquiContact full pipeline on the peg-in-hole (PiH) task with spatial generalization to unseen configurations. The video demonstrates (1) generalization to translational transformations (flat platform), (2) generalization to rotational + translational transformations (tilted platform), and (3) robustness to extreme transformations.
This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on achieving robust spatial generalization of the policy for the contact-rich tasks trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant Action Chunking Transformers, G-CompACT). G-CompACT operates using only localized observations (geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images) and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on peg-in-hole (PiH), screwing, and surface wiping tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles.
Figure: Overview of EquiContact.
We propose EquiContact, a hierarchical, provably SE(3)-equivariant vision-to-force policy for spatially generalizable contact-rich tasks.
(Left) EquiContact consists of a Diffusion Equivariant Descriptor Field (Diff-EDF), a Geometric Compliant Action Chunking Transformer (G-CompACT), and a Geometric Admittance Controller (GAC).
(Right) G-CompACT is trained only on a fixed task configuration, but, given the reference frames, it generalizes to task configurations that undergo arbitrary SE(3) transformations.
G-CompACT is a low-level visuomotor policy that takes localized observations and outputs actions defined in the end-effector frame. The policy inputs are (i) geometrically consistent error vectors (GCEV) between the current and target end-effector poses, (ii) wrist RGB images, and (iii) force-torque readings in the end-effector frame. The policy outputs relative poses and admittance gains for compliant control. Moreover, using language guidance together with the wrist-camera input makes the policy approximately left-invariant to SE(3) task transformations. G-CompACT is therefore a left-invariant localized policy, which is a key component for spatial generalization and SE(3)-equivariance.
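To make the left-invariance of the GCEV input concrete, here is a minimal NumPy sketch. It assumes the form used in the geometric impedance control literature on SE(3), e = [R^T (p - p_d); (R_d^T R - R^T R_d)^vee], where (R, p) is the current end-effector pose and (R_d, p_d) is the target; the exact definition and scaling used in the paper may differ, and the helper names (`vee`, `gcev`, `random_pose`) are hypothetical. The final check applies the same SE(3) transformation T to both poses and confirms that the error vector is unchanged.

```python
import numpy as np


def vee(S):
    """Map a 3x3 skew-symmetric matrix to its 3-vector."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])


def gcev(g, g_d):
    """Assumed GCEV between current pose g and target pose g_d (4x4 homogeneous matrices)."""
    R, p = g[:3, :3], g[:3, 3]
    R_d, p_d = g_d[:3, :3], g_d[:3, 3]
    e_p = R.T @ (p - p_d)              # position error expressed in the end-effector frame
    e_R = vee(R_d.T @ R - R.T @ R_d)   # rotation error from the skew-symmetric part
    return np.concatenate([e_p, e_R])


def random_pose(rng):
    """Random rigid transform: orthonormalize a Gaussian matrix, flip a column so det = +1."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0
    g = np.eye(4)
    g[:3, :3] = Q
    g[:3, 3] = rng.normal(size=3)
    return g


rng = np.random.default_rng(0)
g, g_d, T = random_pose(rng), random_pose(rng), random_pose(rng)

# Left-invariance: transforming the whole task by T leaves the observation unchanged.
print(np.allclose(gcev(T @ g, T @ g_d), gcev(g, g_d)))  # True
```

By contrast, the world-frame difference p - p_d is rotated by T, so a policy fed world-frame errors would see different inputs on a transformed task.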
We outline three key components for spatially generalizable contact-rich policies, all of which are embodied in EquiContact: compliance, localized policies, and induced equivariance.
Together, these can be summarized in the following punchline:
Our paper also includes a proof of full SE(3)-equivariance, from vision input to force-control output. If you are interested, please check out the paper!
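The composition argument behind this kind of equivariance can be illustrated with a toy rollout. This sketch is not the paper's proof or implementation: `planner` is a hypothetical stand-in for Diff-EDF that returns a target pose rigidly attached to the scene frame (so it is SE(3)-equivariant by construction), and `local_policy` is a hypothetical localized policy that only sees the relative pose g^-1 g_d and returns an action in the end-effector frame. Because the policy input is left-invariant and the action is applied on the right (g <- g @ delta), transforming the scene and the initial pose by T transforms the whole rollout by T.

```python
import numpy as np


def random_pose(rng):
    """Random rigid transform (rotation via QR with det fixed to +1, Gaussian translation)."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0
    g = np.eye(4)
    g[:3, :3] = Q
    g[:3, 3] = rng.normal(size=3)
    return g


def planner(scene_pose, target_offset):
    """Hypothetical Diff-EDF stand-in: target pose fixed relative to the scene frame."""
    return scene_pose @ target_offset


def local_policy(g, g_d, alpha=0.5):
    """Hypothetical localized policy: input is g^-1 g_d, output is an end-effector-frame action."""
    rel = np.linalg.inv(g) @ g_d
    delta = np.eye(4)
    delta[:3, :3] = rel[:3, :3]        # rotate to the target orientation
    delta[:3, 3] = alpha * rel[:3, 3]  # cover a fraction alpha of the remaining translation
    return delta


def rollout(g0, scene_pose, target_offset, steps=5):
    g_d = planner(scene_pose, target_offset)
    g = g0
    for _ in range(steps):
        g = g @ local_policy(g, g_d)   # apply the action in the end-effector frame
    return g


rng = np.random.default_rng(1)
g0, scene, offset, T = (random_pose(rng) for _ in range(4))

# Equivariance: rolling out on the transformed task equals transforming the original rollout.
print(np.allclose(rollout(T @ g0, T @ scene, offset), T @ rollout(g0, scene, offset)))  # True
```

Replacing `local_policy` with one that reads absolute world-frame poses would in general break this check, which is the intuition for why localized observations and end-effector-frame actions matter.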
Task: Erase the marker on the flat/tilted whiteboard
Task: Screw-lock the peg into the threaded flat/tilted platform
Videos demonstrating EquiContact's spatial generalization on surface wiping and screwing tasks. The policy is trained only on a fixed configuration but successfully generalizes to unseen configurations on flat and tilted platforms without much degradation.
Executed without GAC. Without the compliant controller, the robot exerts excessive interaction force, which triggers the safety stop.
Executed with GAC, but with fixed admittance gains. Without real-time gain modulation, the robot often exerts excessive forces or becomes too stiff.
CompACT successfully finishes the task by leveraging compliant control and real-time gain scheduling. However, it fails under out-of-distribution (OOD) task transformations, even for small translational displacements.
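For intuition on why the admittance gains matter in these ablations, here is a 1-DoF toy admittance controller pressing against a stiff wall. It is not the paper's GAC (which acts on SE(3) with gains predicted by G-CompACT); all parameter values are hypothetical and in SI units. Under the admittance law M a + D v + K (x - x_d) = f_ext, the steady-state contact force is k_env K x_d / (K + k_env), so the commanded stiffness K largely sets how hard the robot pushes: with these values, roughly 4.8 N for K = 500 N/m versus 33.3 N for K = 5000 N/m.

```python
def simulate_contact_force(K, x_d=0.01, k_env=1e4, M=1.0, D=200.0, dt=1e-3, steps=5000):
    """1-DoF admittance controller commanded x_d past a wall at x = 0; returns final contact force."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        f_ext = -k_env * max(x, 0.0)              # wall pushes back only while penetrated
        a = (f_ext - D * v - K * (x - x_d)) / M   # admittance law: M a + D v + K (x - x_d) = f_ext
        v += a * dt                               # semi-implicit Euler integration
        x += v * dt
    return k_env * max(x, 0.0)                    # magnitude of the contact force


def steady_state_force(K, x_d=0.01, k_env=1e4):
    """Closed form from the rest condition K (x - x_d) = -k_env x."""
    return k_env * K * x_d / (K + k_env)


for K in (500.0, 5000.0):
    print(K, round(simulate_contact_force(K), 3), round(steady_state_force(K), 3))
```

A single fixed K has to trade off between pushing too hard and tracking too loosely; predicting the gains online, as G-CompACT does, is one way to avoid committing to either regime.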
Here are the extended versions of the videos.