Experiments
Sensor Transferability
We incorporate data from GelSight, GelSlim, DIGIT, and GelSight Mini into the training of AnyTouch to obtain four different models, and compare them across four downstream tasks. We observe performance improvements on the three unseen datasets, with greater gains for unseen sensors than for seen ones. This suggests that knowledge learned from the GelSlim, DIGIT, and GelSight Mini data transfers to GelSight and to other sensors.
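The sketch below illustrates the incremental training protocol behind this comparison; the training and evaluation functions and the task names are placeholders of ours, not the actual AnyTouch pipeline.

```python
# Minimal sketch of the incremental sensor-data study: train one model per
# cumulative sensor subset, then evaluate every model on the same downstream
# tasks. train_anytouch / evaluate_downstream are illustrative stubs only.

SENSOR_ORDER = ["GelSight", "GelSlim", "DIGIT", "GelSight Mini"]
DOWNSTREAM_TASKS = ["task_A", "task_B", "task_C", "task_D"]  # placeholder names


def train_anytouch(sensor_subset):
    """Stub: would pretrain AnyTouch on tactile data from `sensor_subset`."""
    return {"sensors": tuple(sensor_subset)}


def evaluate_downstream(model, task):
    """Stub: would return the downstream metric of `model` on `task`."""
    return 0.0


results = {}
for k in range(1, len(SENSOR_ORDER) + 1):
    subset = SENSOR_ORDER[:k]  # add one sensor's data at a time
    model = train_anytouch(subset)
    results[tuple(subset)] = {t: evaluate_downstream(model, t) for t in DOWNSTREAM_TASKS}

# Comparing the four rows of `results` isolates the contribution of each
# added sensor's data to seen- and unseen-sensor performance.
```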
Multi-Sensor Representation Space
We extract one aligned contact frame from each sensor for the 30 touches in the unused fine-grained subset of TacQuad and use t-SNE to visualize the resulting tactile representations. With our cross-sensor matching task, representations from different sensors mix fully in a shared multi-sensor space and cluster clearly by the objects' tactile properties. This indicates that our model extracts sensor-agnostic features, enabling generalization to unseen sensors.
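A minimal sketch of this visualization, assuming the tactile embeddings have already been extracted by the frozen encoder; the feature array, sensor IDs, and object IDs below are random stand-ins for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in embeddings: 4 sensors x 30 touches, 512-d features (random here;
# in practice these come from the frozen tactile encoder).
rng = np.random.default_rng(0)
features = rng.normal(size=(4 * 30, 512))
sensor_ids = np.repeat(np.arange(4), 30)   # which sensor produced each frame
object_ids = np.tile(np.arange(30), 4)     # which of the 30 touches it shows

# Project to 2-D with t-SNE and color by sensor vs. by object.
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(features)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(emb[:, 0], emb[:, 1], c=sensor_ids, cmap="tab10", s=12)
axes[0].set_title("Colored by sensor (well mixed = sensor-agnostic)")
axes[1].scatter(emb[:, 0], emb[:, 1], c=object_ids, cmap="tab20", s=12)
axes[1].set_title("Colored by touched object (clusters = tactile semantics)")
plt.tight_layout()
plt.show()
```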
Static and Dynamic Perception
To validate the benefit of unified multi-sensor representations for transferring knowledge from multi-sensor data to both seen and unseen sensors, we compare AnyTouch with existing multi-sensor models on two datasets from seen sensors and two from unseen sensors. As shown in the tables, AnyTouch outperforms existing methods on all four datasets, demonstrating strong static perception on both seen and unseen sensors.
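One common way to run such downstream comparisons on frozen representations is linear probing; the sketch below shows that protocol under our own assumptions (it may differ from the exact setup used here), with random stand-in features and placeholder dataset names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def linear_probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit a linear classifier on frozen tactile features, report test accuracy."""
    clf = LogisticRegression(max_iter=1000).fit(train_x, train_y)
    return accuracy_score(test_y, clf.predict(test_x))


# Random stand-ins; real features would come from each compared encoder.
rng = np.random.default_rng(0)
datasets = {
    name: (rng.normal(size=(200, 512)), rng.integers(0, 5, size=200),
           rng.normal(size=(50, 512)), rng.integers(0, 5, size=50))
    for name in ["seen_sensor_A", "seen_sensor_B", "unseen_sensor_A", "unseen_sensor_B"]
}

for name, (xtr, ytr, xte, yte) in datasets.items():
    print(name, round(linear_probe_accuracy(xtr, ytr, xte, yte), 3))
```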
To test the dynamic perception capability of our method in real-world object manipulation, we conduct a real-world fine-grained pouring task: the robot arm must rely entirely on tactile feedback to pour 60 g of small beads out of a cylinder that initially contains 100 g of beads. We conduct 10 real-world test runs and report the mean error. The results demonstrate the importance of learning unified multi-sensor representations from both static and dynamic perspectives for completing diverse tasks, including real-world manipulation.
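As a small illustration of the pouring metric described above (mean error against the 60 g target over 10 trials), the sketch below uses made-up trial measurements, not reported results.

```python
TARGET_G = 60.0  # target mass of beads to pour out


def mean_pour_error(poured_grams):
    """Mean absolute deviation (in grams) of each trial from the 60 g target."""
    return sum(abs(m - TARGET_G) for m in poured_grams) / len(poured_grams)


# Ten made-up trial measurements, purely for illustration.
example_trials = [58.2, 61.5, 59.0, 62.3, 60.4, 57.8, 61.0, 59.6, 60.9, 58.7]
print(f"mean error over {len(example_trials)} runs: {mean_pour_error(example_trials):.2f} g")
```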