Research
Scaling Physical AI for Sidewalk Autonomy
At Coco Robotics, we are redefining the future of urban goods delivery through the power of physical AI. Our mission is to make sidewalk delivery robots as reliable, adaptive, and intelligent as human couriers, and ultimately more so. To achieve this, we are applying the latest advances in computer vision, robot learning, and AI, from foundation model pretraining and generative AI to data-driven simulation and neural 3D reconstruction.
Operating one of the largest fleets of sidewalk robots across multiple cities, Coco is uniquely positioned to transform real-world urban navigation experience into robot intelligence. As Coco's Chief AI Scientist, I organize our strategy toward sidewalk autonomy around three key research pillars:
1. Pretraining on Massive Video Demonstrations
Every day, thousands of Coco Robots navigate sidewalks in diverse urban neighborhoods, capturing an enormous amount of real-world video data. These videos record a rich variety of sidewalk layouts, traffic patterns, weather conditions, and urban interactions, providing an unparalleled window into the dynamics of everyday city life.
Our goal is to leverage this massive and growing video corpus to pretrain a navigation foundation model for autonomous sidewalk navigation. Similar to how large language models learn from internet-scale text, our robots’ perception and planning models will learn from city-scale visual data, building a robust understanding of the visual and behavioral patterns that shape sidewalk environments.
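As an illustration only (not our production pipeline), the core of this kind of self-supervised video pretraining can be reduced to a masked-patch reconstruction objective: hide most patches of each frame and train a model to reconstruct them. In the toy sketch below, the "predictor" is just the mean patch, a hypothetical stand-in for a learned encoder-decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(frame, patch=8):
    """Split an HxW frame into non-overlapping flattened patches."""
    h, w = frame.shape
    patches = frame.reshape(h // patch, patch, w // patch, patch)
    return patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def masked_reconstruction_loss(frames, mask_ratio=0.75, patch=8):
    """Mean-squared error of reconstructing randomly masked patches.
    A real model would predict them from the visible context; here a
    trivial per-frame mean patch stands in for the learned decoder."""
    losses = []
    for frame in frames:
        p = patchify(frame, patch)
        n = len(p)
        masked = rng.choice(n, size=int(n * mask_ratio), replace=False)
        pred = p.mean(axis=0)  # placeholder predictor
        losses.append(np.mean((p[masked] - pred) ** 2))
    return float(np.mean(losses))

# Toy "video": 4 frames of 32x32 grayscale noise.
video = rng.standard_normal((4, 32, 32))
loss = masked_reconstruction_loss(video)
```

At fleet scale the same objective is applied to billions of real frames, so the pretrained representation captures sidewalk-specific visual structure rather than noise.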
Diverse urban navigation videos collected by the Coco Robot fleet.
2. Learning in Simulation Environments
While video pretraining provides strong visual priors, it does not directly teach counterfactual reasoning, causal understanding, or decision-making in unseen scenarios. To achieve these capabilities in our autopilot model, we further fine-tune our models with reinforcement learning in simulation. We are developing high-fidelity, high-performance urban simulation environments that allow us to reproduce complex sidewalk interactions, from pedestrians crossing unexpectedly to delivery obstacles, in a safe and controllable setting.
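To make the idea concrete, here is a deliberately minimal sketch (not our actual simulator or algorithm) of policy-gradient fine-tuning in a toy "sidewalk" with three lanes and an obstacle in the middle lane; the policy learns from trial and error to steer around it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sidewalk: three lanes (0, 1, 2); an obstacle sits in lane 1.
N_LANES, ACTIONS = 3, (-1, 0, 1)  # move left, stay, move right

def step(lane, action):
    lane = int(np.clip(lane + action, 0, N_LANES - 1))
    reward = -1.0 if lane == 1 else 1.0  # penalize the obstacle lane
    return lane, reward

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Tabular softmax policy trained with REINFORCE (no baseline).
logits = np.zeros((N_LANES, len(ACTIONS)))
lr = 0.1
for episode in range(300):
    lane, traj = 1, []
    for _ in range(5):
        probs = softmax(logits[lane])
        a = rng.choice(len(ACTIONS), p=probs)
        new_lane, r = step(lane, ACTIONS[a])
        traj.append((lane, a, r))
        lane = new_lane
    ret = 0.0
    for lane_t, a_t, r_t in reversed(traj):  # returns-to-go
        ret += r_t
        grad = -softmax(logits[lane_t])
        grad[a_t] += 1.0
        logits[lane_t] += lr * ret * grad
```

In our real setting the tabular policy is replaced by the pretrained video model, and the three-lane corridor by a full 3D urban scene, but the interaction loop is the same: act, observe consequences, update.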
Simulation of a Coco Robot navigating UrbanSim (Wu and He et al., CVPR '25) and UrbanVerse (Liu et al., arXiv preprint).
To bridge the gap between simulation and reality, we are also reconstructing real-world 3D environments from Coco videos using Gaussian Splatting and other neural reconstruction methods. This enables simulation scenes that reflect the visual richness and geometric accuracy of real cities, pushing simulation realism closer to the physical world.
Vid2Sim (Xie et al., CVPR '25) turns video into interactive training environments.
3. Continuous Learning from Human Feedback
Even the most advanced models need to continuously adapt to the unpredictability of the real world. During live operations, Coco Robots are remotely monitored by human teleoperators, who can intervene or provide guidance whenever the robot encounters uncertain or ambiguous situations.
We are developing shared-control algorithms that enable our AI models to learn directly from human interventions, turning each teleoperation event into a valuable training signal. Over time, the robots learn to anticipate operator intent, refine their decision-making, and autonomously handle similar challenges in the future.
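The essence of learning from interventions can be sketched as follows. This is a hypothetical toy, not our deployed algorithm: whenever the operator overrides the robot, the policy is nudged toward the human's action and away from the overridden one, with no explicit reward signal needed:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy tabular policy over discrete states and actions.
N_STATES, N_ACTIONS = 4, 3
logits = np.zeros((N_STATES, N_ACTIONS))
lr = 0.5

def update_from_event(state, robot_a, human_a):
    """Process one shared-control event. If the operator intervened
    (human_a is not None), clone the human action and penalize the
    overridden robot action; otherwise leave the policy unchanged."""
    if human_a is None:
        return
    probs = softmax(logits[state])
    grad = -probs
    grad[human_a] += 1.0           # push toward the human's choice
    logits[state] += lr * grad
    logits[state][robot_a] -= lr   # push away from the overridden action

# Hypothetical teleoperation log: in state 0 the operator repeatedly
# corrects the robot from action 2 to action 0.
for _ in range(20):
    update_from_event(0, robot_a=2, human_a=0)
```

After a handful of corrections the policy in that state strongly prefers the operator's action, which is the qualitative behavior we want: each intervention becomes a training signal rather than just a manual override.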
This human-in-the-loop learning forms a crucial feedback loop between real-world deployment, data collection, and continuous model improvement — ensuring that every delivery contributes to a smarter, more adaptive fleet.
My recent research, including PVP4Real (Peng et al., ICRA 2025) and PPL (Cai et al., NeurIPS 2025 Spotlight), lays the foundation for this capability. These works demonstrate efficient real-time policy learning from online human interventions and demonstrations, without requiring explicit rewards or heavy pretraining, significantly improving the data efficiency and training safety of mobile robots operating in complex, dynamic environments.
Toward a Scalable Physical AI for Sidewalk Autonomy
By integrating large-scale video pretraining, reinforcement learning in simulation, and continual learning from human feedback, Coco Robotics is charting a path toward scalable sidewalk autonomy. Our vision is a world where sidewalk robots can safely and intelligently navigate any city, adapting in real time to the vibrant, unpredictable rhythm of urban life.
We have openings for full-time and internship Machine Learning Engineer positions; see here for more details.



