OpenRouter · Working Thesis · May 12, 2026

Auto-routing is the precursor to the OpenClaw moment for robotics.

Karina's thesis from a week of robotics events in San Francisco — Spencer Huang at Stanford, MagicLab's Silicon Valley summit, the Citrini "China robotics" debate — benchmarked against papers, GitHub, news, and voices. The short version: multi-model routing inside robot stacks is already an active research area, an OSS standard, and a Cloudflare-grade serverless product. OpenRouter has a real lane here.

Compiled by Khutulun for Karina · papers + GitHub + news + voices · last updated 2026-05-12

TL;DR

Six load-bearing claims, with the literal verdict from research / news / GitHub for each.

The thesis, claim by claim

  1. Frontier models just got good enough to help robotics teams build their own simulators and training data. Verdict: strong support. NVIDIA's Spencer Huang frames this as a "data flywheel" between physics simulators and neural world models; V-JEPA 2 (Meta/FAIR, June 2025) and Isaac GR00T N (March 2026) are the public artifacts.
  2. Training data is the binding constraint, not model capacity. Verdict: strong support. Mobile ALOHA / ALOHA 2 are explicitly built around demo throughput; the JAIR "Generalist Robot Learning from Internet Video" survey (2025) and Point Bridge (Jan 2026) treat the data gap as the open problem.
  3. Hardware is still meaningfully behind models. Verdict: mixed. The hardware cost curve is collapsing fast (Unitree G1 at $5,900), but Citrini's on-the-ground tour found that Chinese humanoid OEMs' autonomy still lags Tesla/Figure. Stanford-side sensor research (muscle contraction, glove sensors) backs the "we still don't know enough" framing.
  4. China will win robotics because of training data + factories. Verdict: partial, needs nuance. Citrini's actual call is "Crouching Tiger" — China dominates the supply chain and is a second-mover with components/manufacturing scale, but Tesla/Figure still lead in autonomy. So: China likely wins on hardware throughput, but the model-and-autonomy half of the win is still up for grabs.
  5. Tightly coupled training loops + multi-model robotics inference will matter. Verdict: smoking-gun support. The literal "OpenRouter for robotics" paper — RoboRouter (arXiv 2603.07892, March 9, 2026) — already exists, with +13% real-world success rate vs the best individual policy. Openmind's OM1 OS does plug-and-play OpenAI/Gemini/DeepSeek/xAI in a "Parallel" mode. Google's Gemini Robotics-ER 1.6 (April 2026) ships multi-model robot inference in production cloud. Counter: Skild AI ($14B valuation) explicitly bets on "any robot, any task, one brain."
  6. Auto-routing is the precursor to the OpenClaw moment for robotics. Verdict: plausible, and the surface is being built right now. Nebius launched managed robotics inference + serverless endpoints in March 2026 (Token Factory). OpenRouter's existing Auto Router targets LLM workloads, not VLA / robot policies — that gap is the OpenRouter lane.

1. The Spencer Huang revelation

"The world has changed in the last three weeks. The world has changed in the last six weeks."

Frontier models from external providers (Anthropic, in NVIDIA Robotics' case) just became powerful enough that a single physics PhD on the team, working inside their OpenRouter-style internal setup, could build a model-agnostic, physically faithful simulator for robotics training data.

Who Spencer is, exactly

Spencer Huang is NVIDIA's product lead for robotics software (Isaac, GR00T, OSMO, Cosmos), and Jensen Huang's son. (Karina's notes had "Director of Robotics" — that's slightly off; the right anchor is "product lead for robotics software".) The canonical published interview is the November 2025 Turing Post sit-down.

The Stanford event itself was most likely MagicLab's CONNECT 2026 / Global Embodied Intelligence Summit at Santa Clara Convention Center on April 28, 2026 — multiple Stanford-affiliated speakers (Martin Hellman, Jan Liphardt) plus NVIDIA GEAR Lab (Zhengyi Luo), Dyna Robotics (York Yang), Amazon Frontier AI & Robotics (Haozhi Qi), Chestnut Robotics (Evan Tao). Spencer Huang isn't on the published agenda; if he spoke or was there, it was likely informal / unlisted. So treat the conversation Karina had as an in-person side conversation at that event, not a Stanford-hosted talk per se.

Evidence — Spencer's framing

In the Turing Post interview Spencer describes frontier (neural) and conventional (physics) simulators as "siblings" growing up together inside a data flywheel — neural simulators generate diverse environments quickly but lack physical grounding, while traditional simulators (Isaac Sim, Newton) provide physics accuracy but are slow to set up. Frontier model access closes that loop because the LLM/VLM does the simulator-design heavy lifting.
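
As a loose illustration of the flywheel loop (not Isaac Sim, Cosmos, or any real API), the sketch below stubs out all three roles: the frontier model drafts a scene spec, the physics simulator grounds it and produces rollouts, and failures feed back into the next round of scene generation. Every name and value is a placeholder.

```python
# Toy sketch of the simulator "data flywheel", with everything stubbed out.
import random

def frontier_model_scene_spec(task, feedback):
    # stand-in for an LLM/VLM doing the simulator-design heavy lifting
    return {"task": task, "objects": 5 + len(feedback),
            "friction": round(random.uniform(0.3, 0.9), 2)}

def physics_rollouts(scene_spec, n=64):
    # stand-in for a conventional simulator providing physical grounding
    return [{"scene": scene_spec, "success": random.random() < scene_spec["friction"]}
            for _ in range(n)]

def flywheel(task, rounds=3):
    dataset, feedback = [], []
    for _ in range(rounds):
        spec = frontier_model_scene_spec(task, feedback)
        episodes = physics_rollouts(spec)
        dataset.extend(episodes)                              # training data out
        feedback = [e for e in episodes if not e["success"]]  # failures back in
    return dataset

print(len(flywheel("stack two boxes")))   # 192 synthetic episodes after 3 rounds
```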

Counter — was this actually new in spring 2026? And how good are these models really?

π0 (Physical Intelligence) shipped a strong general-purpose VLA in Oct 2024, and π0.5 demonstrated open-world generalization (cleaning entire kitchens in homes never seen during training) in April 2025. So the "frontier of usable robot models" had been moving for a year already. What's plausibly new in spring 2026: access. Internal NVIDIA robotics teams getting Anthropic-grade external frontier models for the first time is an enterprise / IT story, not a research-frontier story.

And on the "models are good enough" framing: the formal benchmark community is sceptical. VLA-Arena (Dec 2025, 170-task benchmark) found that "current VLAs tend toward memorization over generalization, exhibit asymmetric robustness, lack safety awareness." VLABench (Fudan) bluntly concludes current VLAs are "still far from GPT-2 level" in robotics. Sergey Levine (Physical Intelligence co-founder) on Dwarkesh in Q1 2026 puts the median estimate for household-autonomous robots at ~2030. So Spencer's "world has changed in 3-6 weeks" is plausibly an internal-tooling / data-pipeline statement (fewer weeks of work to set up a sim), not a generalization-frontier statement.

Unverified: The "world has changed in the last three weeks / six weeks" quote is from Karina's in-person conversation with Spencer at the Stanford robotics event — not a public quote. Treat as personal communication, not a citation.

2. Training data is the bottleneck

There's no internet for hands. The robot world doesn't have a Common Crawl.

Robotics doesn't have the equivalent of the open-source code corpus or the open-web text corpus. The "gold standard" — teleoperation, where a human in a harness puppeteers the robot — captures high-fidelity demos but doesn't scale to internet sizes. So the gap between robotics and LLMs is fundamentally a data gap, and the path forward is simulation + video pretraining.

Evidence — this is the consensus framing in the literature

  • paper Towards Generalist Robot Learning from Internet Video: A Survey (JAIR, July 2025) — explicitly frames the data scarcity bottleneck as the central LfV (Learning from Videos) problem and surveys methods to bootstrap robot policies from internet video.
  • paper Point Bridge: 3D Representations for Cross Domain Policy Learning (Jan 2026, Mandlekar / Fox et al., NVIDIA + NYU) — uses domain-agnostic point representations to make synthetic data zero-shot-transferable; up to 44% gains in pure sim-to-real transfer, 66% with limited real data.
  • paper Real-to-Sim Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions (Nov 2025) — builds soft-body digital twins from real videos to evaluate policies at scale.
  • paper Mobile ALOHA (Fu, Zhao, Finn — CoRL 2024) — co-training with 825 static-ALOHA episodes (RT-X) and only 50 mobile demos boosts success rates by up to 90%. The empirical proof that demo throughput is the binding scarce resource.
  • paper ALOHA 2 (May 2024) — explicitly engineered for "1000s of demonstrations per day" across a fleet. The whole hardware redesign is around the data-throughput bottleneck.
  • market view Gartner forecast (cited in NVIDIA's GTC 2026 blog): synthetic data will be more than 90% of edge-scenario robot training data by 2030.

Counter — Skild AI argues simulation + internet video is enough; "stop sprinkling 1% real data on a VLM"

Skild AI raised $1.4B at a $14B+ valuation on the thesis that you can build a unified general-purpose robot brain by pre-training on simulation + internet human videos and post-training with targeted real-world data. CEO Deepak Pathak directly critiques approaches that "sprinkle <1% real robot data on a VLM" and call it a foundation model — Skild's claim is that true scale requires trillions of examples via simulation, and real-robot data isn't the bottleneck if you do the simulation right.

3. Hardware is still behind

The Stanford prof was right: cost is collapsing, capability is not.

Even with frontier models and good simulators, hardware is still the rate limiter — sensor density on hands, replicating muscle contraction, dexterous manipulation. Stanford labs are putting glove sensors on humans at the gym to figure out what proprioceptive signal we even need to capture. Models can scale faster than hands can.

Evidence — hardware capability still lags model capability

Counter — hardware cost curves are collapsing fast (and that may be enough)

Unitree G1 ships at $5,900 base. AMI Labs raised a $1.03B seed with Toyota and NVIDIA partnerships. Tesla Optimus, Figure, MagicLab, AgiBot, Dyna are all shipping production hardware in 2025-2026. The capability gap is real but the price/availability gap is closing faster than anyone forecast in 2024. So "hardware is behind" might be a 2025 framing that's already getting outdated; the real question by 2027 may be whether the foundation models can keep up with how cheap the bodies have gotten.

4. The China thesis

"China will win robotics" — but Citrini's actual on-the-ground call is more nuanced.

Karina's framing: China wins because they have (a) the most surveillance / video training data, (b) the most factories deploying robots and capturing skill data, (c) the supply chain depth. MagicLab is the leading example — already in factories at scale, hosted a Silicon Valley summit in April 2026.

Evidence — China's supply-chain depth and factory deployment is real

Counter — Citrini's call is "Crouching Tiger" not "China wins outright"

The most rigorous on-the-ground analysis (Citrini visited 11+ Chinese robotics companies June 22 – July 1, 2025) lands somewhere more interesting than "China wins":

  • Tesla Optimus and Figure AI remain the global gold standard today for autonomy. The autonomous-humanoid layer is still Western-led.
  • China's edge is supply-chain depth + government backing — the "Crouching Tiger" framing — second-mover advantage in components/manufacturing scale, ready to dominate once the technology matures.
  • Demand for AgiBot's 3,000–5,000 unit 2025 target appears to come almost entirely from government R&D and SOEs (e.g., China Mobile) — not real commercial pull. Hardware shipments aren't demand-driven yet.
  • The investable winners in Citrini's basket are Western pick-and-shovel suppliers (Teradyne via Universal Robots, analog semis, motion-control names) trading at cyclical lows.

So the more defensible version of Karina's thesis is: China will win the body half (cost curves, factory throughput, supply chain), but the brain half is genuinely contested — Skild ($14B), Physical Intelligence, Dyna, Figure, Tesla Optimus, plus the foundation-model heavyweights (Anthropic, Google, NVIDIA) are well-funded and culturally proximate to the AI / VLA research frontier in a way that Chinese teams aren't, yet.

5. Tightly coupled training loops + multi-model robotics inference

This is the most important section. The literature literally lined up under your fingertips while you were typing this thesis.

Different countries / labs / hardware vendors will train their robots on different skills using different models. Switching between those models at inference time — i.e., routing — will matter for robotics in a way analogous to how it matters in OpenRouter's LLM business. Hardware that can host multiple models, and software stacks that can swap them in and out, will win.
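
As a concrete picture of what inference-time routing across robot policies could look like, here is a minimal, self-contained sketch. The endpoint names, skill tags, latencies, and prices are placeholders, not real vendors or measurements; the point is the shape of the decision: skill coverage plus a latency budget, then cheapest wins.

```python
# Minimal sketch of routing a robot request across policy endpoints.
from dataclasses import dataclass

@dataclass
class PolicyEndpoint:
    name: str
    skills: set[str]          # skills this policy was trained on
    p95_latency_ms: float     # expected inference latency
    cost_per_call: float      # arbitrary units

REGISTRY = [
    PolicyEndpoint("vendor-a/manip-vla-large", {"pick", "place", "fold"}, 180.0, 0.004),
    PolicyEndpoint("vendor-b/nav-policy-small", {"navigate", "dock"}, 35.0, 0.0005),
    PolicyEndpoint("vendor-c/generalist-vla", {"pick", "place", "navigate", "wipe"}, 250.0, 0.006),
]

def route(skill: str, latency_budget_ms: float) -> PolicyEndpoint:
    """Pick the cheapest policy that covers the skill within the latency budget."""
    candidates = [p for p in REGISTRY
                  if skill in p.skills and p.p95_latency_ms <= latency_budget_ms]
    if not candidates:
        raise LookupError(f"no policy covers {skill!r} within {latency_budget_ms} ms")
    return min(candidates, key=lambda p: p.cost_per_call)

print(route("navigate", latency_budget_ms=50).name)   # vendor-b/nav-policy-small
print(route("fold", latency_budget_ms=200).name)      # vendor-a/manip-vla-large
```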

Evidence — multi-model robotics OS already exists in OSS

The single strongest piece of evidence is Openmind's OM1. (Karina's notes had "Megamind" — that's a misremember; the real company is Openmind, founded by Stanford bioengineering professor Jan Liphardt, who is also active in crypto via the ERC-7777 human-robot-society standard.)

  • repo openmind/OM1 — 2,695★, MIT, "modular AI runtime for robots," pre-configured LLM endpoints for OpenAI, xAI, DeepSeek, Anthropic, Meta, Gemini, NearAI, and Ollama. Hardware-agnostic across Unitree Go2/G1, TurtleBot, Ubtech Yanshee. Runs on Jetson, Mac mini, generic Linux, Raspberry Pi 5.
  • product OpenMind LLM docs — Single / Dual / Parallel modes — "Parallel" mode runs multiple specialized LLMs simultaneously with action filters, which is literally OpenRouter-style multi-model routing inside a robot stack (a minimal sketch of the pattern follows this list).
  • quote CEO Jan Liphardt: "Just as Android transformed smartphones, we believe an open OS will transform robotics."
  • news Jan Liphardt's personal page — confirms his Stanford bioengineering tenure, sabbatical to build OpenMind, and crypto / DeSci background.
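
To make the Parallel-mode bullet concrete, here is a minimal, self-contained sketch of the pattern it implies: several models queried concurrently, with an action filter arbitrating before anything reaches the actuators. The model stubs, the obstacle field, and the veto rule are invented for illustration; OM1's real interfaces and filters differ.

```python
# Sketch of "Parallel" multi-model inference with an action filter.
from concurrent.futures import ThreadPoolExecutor

def navigation_model(obs):   # stand-in for e.g. a Gemini-backed planner
    return {"action": "move_forward", "speed": 0.4, "source": "nav"}

def dialogue_model(obs):     # stand-in for e.g. a GPT-backed speech model
    return {"action": "say", "text": "Heading to the kitchen.", "source": "chat"}

def safety_model(obs):       # stand-in for a small local guard model
    return {"action": "stop" if obs["obstacle_m"] < 0.3 else "noop", "source": "guard"}

MODELS = [navigation_model, dialogue_model, safety_model]

def action_filter(proposals):
    """Toy arbitration rule: a 'stop' from any model vetoes motion commands."""
    if any(p["action"] == "stop" for p in proposals):
        return [p for p in proposals if p["action"] in ("stop", "say")]
    return proposals

def parallel_step(obs):
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        proposals = list(pool.map(lambda m: m(obs), MODELS))
    return action_filter(proposals)

print(parallel_step({"obstacle_m": 0.2}))   # motion vetoed; stop + speech pass
print(parallel_step({"obstacle_m": 2.0}))   # all three proposals pass
```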

🎯 Evidence — the literal "OpenRouter for robotics" paper exists, published two months ago

Evidence — internal multi-model routing is a hot 2025-2026 research vein

Five additional 2025–2026 papers all do mixture-of-experts / router / fast-slow decoupling inside VLA models:

Evidence — small-footprint world models are real

  • paper LeWorldModel (LeWM) (LeCun et al., March 2026, arXiv 2603.19312) — 15M parameters, single-GPU training in a few hours, 48× faster planning than DINO-WM. The "you can do a world model with very few parameters" claim Karina remembered from the LeCun-adjacent literature is real, and the models got dramatically more compact in spring 2026.

Evidence — serverless robotics inference is shipping

Counter — the single-model bet is also winning

  • Skild AI at $14B+ valuation explicitly bets on "any robot, any task, one brain." They argue routing-style approaches (different models for different tasks) miss the point — the right architecture is one foundation model with a hierarchical low-frequency / high-frequency policy split, not many specialized models stitched together.
  • Dyna Robotics' DYNA-1 is a single foundation model that hits 99.4% success rate without human intervention; $120M Series A at ~$600M valuation in spring 2025. They aren't routing between models — they ship one model that handles edge cases via an engineered reward model + RL.
  • NVIDIA Isaac GR00T N is one open foundation VLA, not a router across models. NVIDIA's bet at the system level is that the multi-model coordination problem is solved internally in a unified VLA, not externally via inference-time routing.

So the strong version of the counter is: routing across foundation models from different vendors may be a transient niche — useful in OSS / hobbyist setups (OM1) and in research (AC2-VLA, MoTVLA), but the production pattern in 2026-2028 may converge on big single-model bets per company. The strong version of Karina's claim is: even those single-model bets internally use routing (mixture-of-experts, fast-slow), and OpenRouter's external auto-routing is the natural analog at the infrastructure layer.

6. What it means for OpenRouter

There's a real lane here, and the surface is being built right now.

Implication 1 — Robotics is greenfield for OpenRouter today

OpenRouter's existing Auto Router targets LLM workloads (38 candidate models, NotDiamond-powered, no extra fee). The Pareto Code Router (April 21, 2026) extends to coding-specific cost-vs-capability tradeoffs. Neither targets VLA / robot-policy workloads. That's the gap. The pattern that wins in robotics is going to look like the Pareto Code Router but for VLA + policy + perception — different latency tiers, different parameter sizes, different physical-AI-grade safety guarantees.
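
To make that concrete, here is a hedged sketch of what a Pareto-style VLA router could look like, by analogy with the code router: keep only the cost/capability frontier, then pick by the caller's latency tier and safety requirement. Every model name, success rate, price, and certification level below is invented.

```python
# Illustrative Pareto-style selection over hypothetical VLA candidates.
CANDIDATES = [
    # name, est. task success, $/1k actions, p95 ms, safety-cert level
    ("tiny-vla-80m",  0.71, 0.02, 12,  "none"),
    ("edge-vla-300m", 0.78, 0.06, 30,  "basic"),
    ("cloud-vla-3b",  0.86, 0.40, 120, "audited"),
    ("cloud-vla-30b", 0.88, 2.10, 350, "audited"),
    ("legacy-vla-1b", 0.74, 0.50, 90,  "basic"),   # dominated: pricier and worse
]

def pareto_frontier(cands):
    """Drop any model that another candidate matches or beats on both capability and cost."""
    return [c for c in cands
            if not any(o[1] >= c[1] and o[2] <= c[2] and o != c for o in cands)]

def pick(cands, max_latency_ms, min_safety):
    order = {"none": 0, "basic": 1, "audited": 2}
    ok = [c for c in pareto_frontier(cands)
          if c[3] <= max_latency_ms and order[c[4]] >= order[min_safety]]
    return max(ok, key=lambda c: c[1]) if ok else None

print(pick(CANDIDATES, max_latency_ms=50,  min_safety="basic"))    # edge-vla-300m
print(pick(CANDIDATES, max_latency_ms=400, min_safety="audited"))  # cloud-vla-30b
```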

Implication 2 — The natural product wedge is "OM1 endpoints, but managed"

Openmind's OM1 already does plug-and-play routing across OpenAI, Gemini, DeepSeek, xAI, Anthropic — but OSS, self-hosted, and hardware-side. The serverless production layer (Nebius Token Factory) is being built by NVIDIA's cloud partners, not by OpenRouter. There's a credible OpenRouter wedge as the inference router for OM1-style robot stacks — built on top of OpenRouter's existing multi-model billing + observability infra, with VLA-specific latency tiers and a path-to-edge story (large model in cloud + small adapter on-device, mirroring AsyncVLA).
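
A toy sketch of that path-to-edge split: the large hosted model replans at roughly 1 Hz while a small on-device adapter turns the latest cached plan into 50 Hz actions, so the control loop never blocks on the network. The periods, function bodies, and field names are stand-ins; this loosely mirrors the fast-slow decoupling attributed to AsyncVLA, not that system's actual interface.

```python
# Toy cloud/edge split: slow cloud planner, fast on-device adapter.
import time

CLOUD_PERIOD_S = 1.0      # big hosted VLA replans at ~1 Hz
CONTROL_PERIOD_S = 0.02   # on-device adapter runs at 50 Hz

def cloud_plan(observation):
    # stand-in for a routed call to a large hosted VLA
    return {"waypoint": observation["goal"], "grip": "open"}

def edge_adapter(plan, state):
    # stand-in for a small local policy conditioned on the latest plan
    dx = plan["waypoint"] - state["x"]
    return {"vx": max(-0.1, min(0.1, dx)), "grip": plan["grip"]}

def run(steps=150):
    state = {"x": 0.0, "goal": 1.0}
    plan, last_plan_t = cloud_plan({"goal": state["goal"]}), time.monotonic()
    for _ in range(steps):
        if time.monotonic() - last_plan_t >= CLOUD_PERIOD_S:
            plan, last_plan_t = cloud_plan({"goal": state["goal"]}), time.monotonic()
        action = edge_adapter(plan, state)   # fast loop never blocks on the cloud
        state["x"] += action["vx"] * CONTROL_PERIOD_S
        time.sleep(CONTROL_PERIOD_S)
    return state

print(run()["x"])   # the robot has crept toward the goal using cached plans
```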

Implication 3 — Cost control still matters, but it's the "second story"

Karina's framing in the original brain dump: "auto-routing matters for cost purposes, which is important, but I think routing is going to be so important in robotics." That's exactly right. The cost-router story (Pareto-style) is the wedge today; the robotics-router story is the bigger lane in 2027-2028. Both should be priced together because they share the same routing infra.

Implication 4 — Watch Openmind, Nebius, and Skild closely

These are the three companies whose product positioning most directly bears on whether OpenRouter has a robotics lane:

  • Openmind (OM1) — the OSS surface that proves routing-in-robotics works as a pattern.
  • Nebius (Token Factory + Serverless AI) — the managed-inference surface that shows what enterprise robotics inference looks like.
  • Skild AI — the strongest counter-positioning. If Skild's "one brain" approach wins decisively, the robotics-routing wedge shrinks; if it plateaus, the wedge is open.

So what should we actually do?

  1. Add OpenMind / OM1 to the dossier as a watch target — they validate the pattern and have an MIT-licensed reference implementation we can study + benchmark against. Track GitHub star velocity and Liphardt's commentary.
  2. Track Nebius Token Factory pricing and SLAs — that's the closest production surface to "OpenRouter for robotics." Worth a benchmark from the routing-and-cost angle.
  3. Add a research note for the auto-router team — VLA-aware latency tiers, edge-vs-cloud split, action-conditioned variants of small models (V-JEPA 2-AC at 300M, V-JEPA 2.1 at 80M, LeWM at 15M) as router candidates for low-latency tiers. This is a paper waiting to be written.
  4. Skill: robotics-watch — extend the daily briefing to include MagicLab, Unitree, Figure, Tesla Optimus, Skild, Physical Intelligence, Openmind, Dyna, Nebius (Physical AI), and the VLA-routing arXiv stream. Weekly cadence; quarterly synthesis.

Sources

Papers

GitHub repos

News / press releases

Citrini Research

Voices & interviews

Counter-evidence (per soul.md inverse-hypothesis discipline)

Newer hits added in second pass

Products