Human-to-Everything Lab
Undergraduate Researcher
The Problem
Autonomous robots face a fundamental tradeoff: large machine learning models are accurate but require cloud inference with high latency, while small edge models are fast but less capable. A robot navigating dynamic environments can't wait several seconds for each decision, but it also can't rely solely on a limited local model in complex scenes.
UniLCD, developed previously at the lab, introduced a routing framework that dynamically chooses between local and cloud models based on scene complexity. My work extends it into eCLR (efficient Cloud-Local Routing), a learned routing policy that adapts to runtime conditions using reinforcement learning.
Routing Architecture
The core idea is simple: use a fast local model for routine navigation, and route to the cloud only when the scene requires it. A lightweight router network analyzes each frame and decides which path to take. The challenge is training this router to make good decisions without ground-truth labels for when to route to the cloud.
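To make the architecture concrete, here is a minimal sketch of what such a router could look like. The layer sizes are illustrative, and the 368-dimensional input assumes the router reads features from a RegNetY-002 backbone (the local model described below); the class name is hypothetical.

```python
import torch
import torch.nn as nn

class RouterNet(nn.Module):
    """Illustrative lightweight router: maps per-frame features to a
    local-vs-cloud decision. Layer sizes are hypothetical."""
    def __init__(self, feature_dim: int = 368, hidden_dim: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # logits: [route_local, route_cloud]
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features)

# Usage: route to the cloud when the cloud logit wins.
router = RouterNet()
features = torch.randn(1, 368)  # stand-in for backbone features
route_to_cloud = router(features).argmax(dim=-1).item() == 1
```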
I implemented a Soft Actor-Critic (SAC) training pipeline to learn this routing policy. SAC is an off-policy reinforcement learning algorithm that maximizes both expected reward and policy entropy, encouraging exploration while remaining sample efficient. My custom reward design combines navigation success with a latency penalty, so the router learns to minimize cloud calls without sacrificing accuracy.
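The reward shaping is the key design decision. Here is a hedged sketch of how the two terms might combine; the function name, penalty form, and coefficient are illustrative, not the exact values used in the pipeline:

```python
def routing_reward(nav_reward: float, used_cloud: bool,
                   latency_s: float, latency_coef: float = 0.1) -> float:
    """Illustrative reward: navigation progress minus a latency penalty.
    latency_coef trades accuracy against cloud usage; its value here
    is hypothetical."""
    penalty = latency_coef * latency_s if used_cloud else 0.0
    return nav_reward - penalty
```

With this shape, the router only earns the cloud's accuracy gains when they outweigh the delay it incurs.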
Local Model
The local path uses RegNetY-002; I plan to validate its inference latency and accuracy under real-world conditions, including RF interference that degrades cloud connectivity. The local model handles straightforward navigation with response times under 5 ms.
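As a sanity check on that budget, local latency can be measured directly with the timm implementation of RegNetY-002; the input size and iteration counts below are illustrative:

```python
import time
import timm
import torch

model = timm.create_model("regnety_002", pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)  # single 224x224 frame

with torch.no_grad():
    for _ in range(10):           # warm-up iterations
        model(x)
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    per_frame_ms = (time.perf_counter() - start) / 100 * 1000

print(f"mean latency: {per_frame_ms:.2f} ms")
```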
Testing on physical hardware may reveal edge cases the simulator missed: network congestion, thermal throttling, and sensor noise all affect real-world performance differently than simulation does. I built a benchmarking robot to quantify these effects and am feeding the measurements back into the training environment.
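One simple way to fold those measurements back in is to replay empirically observed cloud round-trip times inside the training loop, so the router trains against realistic (and occasionally degraded) delays. This sketch and its class name are hypothetical:

```python
import random

class MeasuredLatency:
    """Replays cloud round-trip times recorded on the benchmarking
    robot, so simulated training sees realistic latency."""
    def __init__(self, samples_s: list[float]):
        self.samples_s = samples_s  # e.g. loaded from on-robot logs

    def sample(self) -> float:
        return random.choice(self.samples_s)
```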
Cloud Model
For complex scenes, the system routes to InternVL3-2B, a vision-language-action (VLA) model with stronger visual reasoning than traditional navigation models. Its language pretraining gives it better semantic understanding of scenes, helping it navigate around unexpected obstacles or unusual configurations that weren't well represented in training data.
The tradeoff is latency: cloud inference adds network round-trip time plus model inference time, typically a few seconds depending on load. The routing policy learns when this cost is worth paying, essentially predicting whether the local model would fail on the current frame.
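Putting the pieces together, the runtime loop reduces to a conditional dispatch that also records the latency actually paid, which is exactly the signal the reward needs. This is a sketch under assumed interfaces: extract_features, route_to_cloud, infer, and act are hypothetical method names, not the project's actual API.

```python
import time

def decide_and_act(frame, router, local_model, cloud_client):
    """Run the router, then dispatch to the chosen path. Returns the
    action plus the latency actually paid, which feeds the reward
    during training. All method names are illustrative."""
    features = local_model.extract_features(frame)
    start = time.perf_counter()
    if router.route_to_cloud(features):
        action = cloud_client.infer(frame)  # network RTT + VLA inference
    else:
        action = local_model.act(features)  # sub-5 ms on-device path
    latency_s = time.perf_counter() - start
    return action, latency_s
```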
Training Pipeline
I trained the routing policy in CARLA, an open-source autonomous driving simulator. The original codebase used CARLA 0.9.x on Unreal Engine 4; I'm currently porting it to CARLA 0.10.0, built on Unreal Engine 5, for improved rendering fidelity and sensor simulation.
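For context, the standard CARLA 0.9.x Python client connects to a running server and can switch the world into synchronous mode, which gives the RL loop deterministic fixed-step frames. The port below is the CARLA default, the step size is a common choice, and the training loop itself is elided:

```python
import carla

client = carla.Client("localhost", 2000)  # default CARLA server port
client.set_timeout(10.0)
world = client.get_world()

# Synchronous mode: the simulator advances one fixed step per tick,
# so observations and actions stay aligned during training.
settings = world.get_settings()
settings.synchronous_mode = True
settings.fixed_delta_seconds = 0.05  # 20 Hz simulation step
world.apply_settings(settings)
```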
The SAC implementation replaced an earlier PPO-based approach. SAC's off-policy nature allows reusing experience from previous episodes, significantly improving sample efficiency.
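The practical difference from PPO is the replay buffer: transitions collected under older versions of the router policy remain valid training data. A minimal sketch of such a buffer, with an illustrative capacity and batch size:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) transitions so
    SAC can learn from experience gathered by earlier policies."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int = 256):
        return random.sample(self.buffer, batch_size)
```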
Results
I'm co-authoring a paper on this dynamic control framework for autonomous navigation, currently in preparation for submission. The work demonstrates that learned routing policies can achieve near-cloud accuracy at near-local latency, a practical path to deploying capable vision systems on resource-constrained robots.