Post by Robotrove
The robotics data bottleneck just got unlocked.

14.6 years of internet video. 147 million action segments. Zero annotation.

Rice University's @RobotPI Lab just dropped EgoInfinity: not another static dataset, but a data engine that turns arbitrary internet videos into executable robot training data.

Here's what it does:
- Takes any static-camera RGB video of human manipulation
- Recovers metric 3D hand trajectories + 6-DoF object poses + contact states
- Retargets the motion onto any robot morphology

No mocap. No wearables. No manual labels. No human in the loop.

The key insight: it's not a naive pipeline. Cross-module metric calibration and interaction-aware refinement fix the drift and contact inconsistency you'd get from just chaining vision models together.

Already validated on 4 robots (Unitree G1, Robonaut2, Dual-Franka, XLeRobot), with real-robot demos on Franka FR3 and the LEAP dexterous hand (grasping, cutting, wiping, pouring).

The current corpus is bounded only by Action100M, but the engine itself is corpus-agnostic, so it can scale to whatever video you point it at. This is how you go from hundreds of lab demos to millions of real-world manipulation trajectories.

Huge thanks to the team behind it: Kejia Ren, @Yiting Chen, Howard Qian, Podshara Chanrungmaneekul, and Kaiyu Hang, with @Andrew Morgan from RAI Institute.

Project: https://lnkd.in/eDCtAtZU
Paper: https://lnkd.in/eRVG3xiU
Code: https://lnkd.in/eEwsatwx

Follow the Robotrove page for more drops that push physical AI forward.