Human-to-Everything Lab
Undergraduate Researcher
The Problem
There is a fundamental tradeoff in deploying ML models on small robots. Large models are accurate but have to run on a remote server, which adds seconds of latency per decision. Small models are fast enough to run on the robot itself, but they are less capable. An autonomous robot navigating a busy environment can't afford to wait several seconds for every decision, but it also can't rely entirely on a weak local model when something unexpected happens.
Prior work at the lab, UniLCD (published at ECCV 2024), proposed using reinforcement learning to train a routing policy that makes this local vs. cloud decision dynamically. It used PPO as its RL algorithm and ran in CARLA on Unreal Engine 4, with a RegNet backbone as the cloud model. The framing of the problem carries over to my work, but most of the engineering is new. The RL algorithm changed, both models changed, the simulator engine changed, and the reward function and data collection pipeline were redesigned from scratch.
The Router
The idea is simple: for straightforward scenes like an empty corridor, use the fast local model. For complex scenes like a crowded intersection, route to the cloud. A lightweight router network looks at each camera frame and decides which path to take. The hard part is that there are no labeled examples of when to use the cloud. Nobody can annotate a dataset with that information ahead of time, because whether you need the cloud depends on how the local model would have performed, which you only know after the fact. This makes it a reinforcement learning problem; the router has to learn the decision boundary through trial and error.
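The control loop this implies is small. The sketch below shows the shape of one decision step; `router`, `local_model`, and `cloud_model` are hypothetical placeholders, and the threshold is illustrative, not the lab's actual interface.

```python
def navigate_step(frame, router, local_model, cloud_model, threshold=0.5):
    """One control step: route the camera frame to the local or cloud model.

    All callables here are stand-ins: `router` returns the estimated
    benefit of using the cloud for this frame, and the two models return
    a control action (e.g. steering and speed).
    """
    p_cloud = router(frame)
    if p_cloud > threshold:
        return cloud_model(frame)   # slow round trip, but more capable
    return local_model(frame)       # fast on-robot inference
```

The interesting part is not this loop but how `router` is trained, which is where reinforcement learning comes in.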
The training algorithm is Soft Actor-Critic (SAC), replacing UniLCD's PPO. Simulator steps in CARLA are expensive; each one requires full physics, sensor rendering, and scene updates. PPO is on-policy, meaning it discards experience after each update, so a lot of that expensive simulation data gets thrown away. SAC keeps a replay buffer and learns from past experience across many updates, which makes better use of every frame the simulator produces. SAC also includes entropy regularization, a term in its training objective that penalizes the policy for becoming too certain too early. Without it, the router tends to collapse to always picking the local model, since most frames in any route are easy enough that local works fine. The entropy term keeps the router exploring the boundary between local and cloud long enough to learn an actual tradeoff.
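The entropy term's effect can be seen with a toy two-action version of the soft objective. This is a simplified illustration of the SAC idea, not the lab's training code; the Q-values and temperature `alpha` are made up.

```python
import math

def entropy(p_cloud: float) -> float:
    """Shannon entropy (nats) of the two-way local/cloud decision."""
    probs = [p_cloud, 1.0 - p_cloud]
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_objective(q_local: float, q_cloud: float,
                   p_cloud: float, alpha: float) -> float:
    """SAC-style soft objective: expected Q plus alpha-weighted entropy.

    With alpha > 0, a policy that has collapsed to one action
    (p_cloud = 0 or 1) scores lower than a mixed policy whenever the
    two Q-values are close, which is exactly the anti-collapse pressure
    described above.
    """
    expected_q = (1 - p_cloud) * q_local + p_cloud * q_cloud
    return expected_q + alpha * entropy(p_cloud)
```

For example, when both actions look equally good (`q_local == q_cloud`), the entropy bonus makes the 50/50 policy strictly preferable to the collapsed always-local policy.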
The reward function went through several iterations. The router has to balance two competing objectives: make progress along the route, and avoid using the cloud when it isn't needed. Early versions penalized cloud usage too aggressively, which caused the router to never try it. Later versions tied the progress signal to straight-line distance rather than distance along the route, which made learning inconsistent across routes of different lengths. Each failure pointed to a specific structural issue in how the objective was encoded, and the final formulation addresses both: progress is measured along the route, and the cloud penalty is kept mild enough that the router still explores it.
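A minimal sketch of a reward in this spirit is below. The structure (route-relative progress minus a modest cloud cost) follows the description above, but every coefficient is illustrative, not the final tuned value.

```python
def step_reward(progress_m: float, prev_progress_m: float,
                used_cloud: bool, collided: bool,
                cloud_cost: float = 0.05, collision_cost: float = 1.0) -> float:
    """Sketch of a shaped per-step reward for the router.

    `progress_m` is distance traveled *along the route* (not straight-line),
    so the signal is comparable across routes of different lengths.
    The cloud penalty is small relative to typical progress, so the
    policy can still afford to try the cloud during exploration.
    """
    r = progress_m - prev_progress_m      # forward progress this step
    if used_cloud:
        r -= cloud_cost                   # discourage unnecessary cloud calls
    if collided:
        r -= collision_cost               # safety term (assumed, not stated above)
    return r
```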
Local and Cloud Models
The local model is RegNetY-002, a small convolutional network that takes a cropped camera frame and outputs a steering direction and speed. It runs in under 5 ms, fast enough for real-time control. Training uses expert demonstrations collected in the simulator: a path-following controller drives the robot along known routes and records what it sees and does. One lesson learned the hard way is that the camera setup during training has to match evaluation exactly. Even a few degrees of pitch difference or a slightly different crop region causes the model to perform noticeably worse, because it learns spatial features tied to specific regions of the image. If the road appears in the lower third during training, it needs to appear there during evaluation too.
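One way to enforce that match is to make the camera setup a single shared configuration object imported by both the data-collection and evaluation scripts, so pitch and crop cannot drift apart silently. The field values below are illustrative, not the project's actual settings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CameraConfig:
    """Single source of truth for the camera setup. Both the demonstration
    collector and the evaluation harness read from one instance of this,
    so a pitch or crop change applies to both pipelines at once."""
    pitch_deg: float = -10.0   # illustrative values
    fov_deg: float = 90.0
    crop_top: int = 120        # crop the identical region in both pipelines
    crop_bottom: int = 480

def crop_frame(frame_rows, cfg: CameraConfig):
    """Apply the shared vertical crop to a frame (rows of pixels)."""
    return frame_rows[cfg.crop_top:cfg.crop_bottom]
```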
The cloud model is SimLingo, a roughly one-billion-parameter vision-language-action model pretrained on both images and language. It has a much richer understanding of what it's looking at and can handle scenes the local model has never encountered. The cost is latency: a round trip to the cloud takes 1 to 3 seconds depending on network conditions, and the router's job is to learn when that cost is worth paying.
Fine-Tuning
SimLingo was pretrained on real-world driving footage and an older version of the simulator, so fine-tuning it for the new CARLA environment was necessary. This turned out to be the most debugging-intensive part of the project. The first training run produced a degenerate result: the model output the same steering angle on every frame regardless of what it was looking at. It had found a constant output that minimized training loss without learning to respond to visual input. The root cause was several independent mismatches between how training data was collected and how the model ran at inference. The coordinate frame for target waypoints differed between training and evaluation, the lookahead distance didn't match, and the supervision targets had too little variance across frames to give the model a meaningful learning signal. Any one of these alone might have been survivable, but together they left the model with no reason to look at the camera at all.
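The coordinate-frame mismatch is the clearest of the three to illustrate. The fix amounts to having one function that maps world-frame waypoints into the robot's local frame, used by both the data collector and the inference wrapper. The convention below (x forward, y left) is an assumption for the sketch, not necessarily the project's.

```python
import math

def to_robot_frame(wx: float, wy: float,
                   robot_x: float, robot_y: float,
                   robot_yaw_rad: float) -> tuple:
    """Express a world-frame waypoint in the robot's local frame.

    If training data and inference each do this transform their own way
    (or one skips it), the model sees inconsistent targets; the class of
    bug described above. Sharing this single function closes that gap.
    """
    dx, dy = wx - robot_x, wy - robot_y
    c, s = math.cos(-robot_yaw_rad), math.sin(-robot_yaw_rad)
    return (c * dx - s * dy, s * dx + c * dy)
```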
Separately, the original training data contained no examples of obstacle avoidance. The model had never seen what to do when a pedestrian steps into the path. To fix this, a data collection pipeline was built around ORCA (Optimal Reciprocal Collision Avoidance), which computes smooth, cooperative avoidance trajectories for each agent in the scene. Running the robot through dense pedestrian traffic under ORCA's control produces naturalistic avoidance demonstrations that the cloud model can learn from during fine-tuning.
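The collection loop itself is simple once ORCA supplies the control. The sketch below shows its shape; `sim` and `orca` are hypothetical interfaces standing in for the real simulator and planner, not actual APIs from the project.

```python
def collect_avoidance_demos(sim, orca, num_episodes: int):
    """Sketch of the demonstration-collection loop: drive the robot through
    pedestrian traffic under an ORCA-style controller and record
    (observation, action) pairs as supervision for fine-tuning.

    `orca.preferred_velocity` is assumed to return a collision-free
    velocity given the robot's state and the surrounding pedestrians.
    """
    demos = []
    for _ in range(num_episodes):
        obs = sim.reset()
        done = False
        while not done:
            action = orca.preferred_velocity(sim.robot_state(),
                                             sim.pedestrian_states())
            demos.append((obs, action))   # one training pair per tick
            obs, done = sim.step(action)
    return demos
```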
Simulation Environment
The entire training pipeline was ported from CARLA 0.9.x (Unreal Engine 4) to CARLA 0.10.0 (Unreal Engine 5). The motivation was visual fidelity. UE5 introduced Lumen for global illumination and Nanite for high-detail geometry, both of which produce training images that look substantially more realistic than UE4. This matters because the cloud model is sensitive to visual domain shift; if synthetic training images don't resemble what the robot will actually see, the learned behavior transfers poorly.
The migration surfaced engine-level issues, the most significant being a regression in how CARLA handles pedestrian movement in synchronous mode. In UE4, the physics engine (PhysX) processed character movement in a predictable order relative to the simulation tick. UE5 replaced PhysX with Chaos, which schedules movement integration differently and breaks this assumption. Pedestrians would stutter, teleport, or ignore velocity commands entirely. This is a confirmed open bug in CARLA with no upstream fix. The workaround was bypassing the physics locomotion system and directly setting each pedestrian's world transform every simulation step.
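The workaround is kinematic rather than physical: integrate each pedestrian's position manually and write the transform into the world each tick. The per-tick math is shown below in plain Python; the CARLA-side write would be the actor's `set_transform` call, sketched in comments with a hypothetical `make_transform` helper.

```python
def advance_pedestrian(pos: tuple, vel: tuple, dt: float) -> tuple:
    """One tick of manual kinematic integration for a pedestrian.

    Instead of sending velocity commands to the engine's (now unreliable)
    locomotion system, position is integrated here and the resulting
    transform is written directly to the actor every simulation step.
    """
    x, y = pos
    vx, vy = vel
    return (x + vx * dt, y + vy * dt)

# Shape of the per-tick loop on the CARLA side (illustrative pseudocode):
# for ped in pedestrians:
#     ped.pos = advance_pedestrian(ped.pos, ped.vel, dt)
#     ped.actor.set_transform(make_transform(ped.pos))  # bypass Chaos locomotion
# world.tick()
```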
Route Design
The evaluation set is 25 navigation routes across two maps, ranging from easy (straight corridors, open spaces) to hard (sharp turns in narrow passages with dense pedestrian traffic). Easy routes should be entirely manageable by the local model; hard routes should force the router to use the cloud. Having this spectrum matters because the router needs to see both ends during training to learn where the boundary is.
A procedural route generation tool built on CARLA's waypoint API handles route construction. It samples candidate paths and validates them for clearance from buildings, navigable ground surface, and minimum passage width, which lets the hand-designed routes be augmented with additional variety without manually placing waypoints. For evaluation, there is a strict split: training routes, held-out routes on the same map, and transfer routes on an entirely different map. The transfer evaluation is the most interesting, because it tests whether the router learned something generalizable about when cloud is worth the cost, or just memorized which specific scenes are hard.
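The validation pass reduces to a predicate over candidate routes. A minimal sketch, assuming a hypothetical `clearance_fn(wp)` that queries the lateral free space at a waypoint (e.g. from raycasts against buildings and the navmesh):

```python
def valid_route(waypoints, clearance_fn, min_width_m: float = 1.0) -> bool:
    """Keep a candidate route only if every waypoint has at least
    `min_width_m` of free passage.

    `clearance_fn` is an assumed interface, not part of CARLA's API;
    in practice it would wrap clearance and surface checks against
    the map geometry. Sampled routes that fail are simply discarded
    and regenerated.
    """
    return all(clearance_fn(wp) >= min_width_m for wp in waypoints)
```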
Status
The UE5 migration is complete and stable, all 25 routes are verified, both models are fine-tuned on UE5 data, and the router is in its final training phase. A paper on this framework is currently in preparation for submission.