TL;DR
Six load-bearing claims, with the literal verdict from research / news / GitHub for each.
The thesis, claim by claim
- Frontier models just got good enough to help robotics teams build their own simulators and training data. Strong support: NVIDIA's Spencer Huang frames this as a "data flywheel" between physics simulators and neural world models; V-JEPA 2 (Meta/FAIR, June 2025) and Isaac GR00T N (March 2026) are the public artifacts.
- Training data is the binding constraint, not model capacity. Strong support: Mobile ALOHA / ALOHA 2 were explicitly built around demo throughput; the JAIR "Generalist Robot Learning from Internet Video" survey (2025) and Point Bridge (Jan 2026) treat the data gap as the open problem.
- Hardware is still meaningfully behind models. Mixed: the hardware cost curve is collapsing fast (Unitree G1 at $5,900), but Citrini's on-the-ground tour found that Chinese humanoid OEMs' autonomy still lags Tesla/Figure. Stanford-side sensor research (muscle contraction, glove sensors) backs the "we still don't know enough" framing.
- China will win robotics because of training data + factories. Partial, needs nuance: Citrini's actual call is "Crouching Tiger" — China dominates the supply chain and is a second-mover with components/manufacturing scale, but Tesla/Figure still lead in autonomy. So: China likely wins on hardware throughput, but the model-and-autonomy half of the win is still up for grabs.
- Tightly coupled training loops + multi-model robotics inference will matter. Smoking-gun support: the literal "OpenRouter for robotics" paper — RoboRouter (arXiv 2603.07892, March 9, 2026) — already exists, with +13% real-world success rate vs best individual policy. Openmind's OM1 OS does plug-and-play OpenAI/Gemini/DeepSeek/xAI in a "Parallel" mode. Google's Gemini Robotics-ER 1.6 (April 2026) ships multi-model robot inference in production cloud. Counter: Skild AI ($14B valuation) explicitly bets on "any robot, any task, one brain."
- Auto-routing is the precursor to the OpenClaw moment for robotics. Plausible, and the surface is being built right now: Nebius launched managed robotics inference + serverless endpoints in March 2026 (Token Factory). OpenRouter's existing Auto Router targets LLM workloads, not VLA / robot policies — that gap is the OpenRouter lane.
1. The Spencer Huang revelation
"The world has changed in the last three weeks. The world has changed in the last six weeks."
Who Spencer is, exactly
Spencer Huang is NVIDIA's product lead for robotics software (Isaac, GR00T, OSMO, Cosmos), and Jensen Huang's son. (Karina's notes had "Director of Robotics" — that's slightly off; the right anchor is "product lead for robotics software".) The canonical published interview is the November 2025 Turing Post sit-down.
The Stanford event itself was most likely MagicLab's CONNECT 2026 / Global Embodied Intelligence Summit at Santa Clara Convention Center on April 28, 2026 — multiple Stanford-affiliated speakers (Martin Hellman, Jan Liphardt) plus NVIDIA GEAR Lab (Zhengyi Luo), Dyna Robotics (York Yang), Amazon Frontier AI & Robotics (Haozhi Qi), Chestnut Robotics (Evan Tao). Spencer Huang isn't on the published agenda; if he spoke or was there, it was likely informal / unlisted. So treat the conversation Karina had as an in-person side conversation at that event, not a Stanford-hosted talk per se.
Evidence — Spencer's framing
In the Turing Post interview Spencer describes frontier (neural) and conventional (physics) simulators as "siblings" growing up together inside a data flywheel — neural simulators generate diverse environments quickly but lack physical grounding, while traditional simulators (Isaac Sim, Newton) provide physics accuracy but are slow to set up. Frontier model access closes that loop because the LLM/VLM does the simulator-design heavy lifting.
- interview Turing Post — Spencer Huang: NVIDIA's Big Plan for Physical AI (Nov 1, 2025)
- launch NVIDIA GTC 2026 — Isaac GR00T N (open VLA), Isaac Lab 3.0, Newton physics, GEAR-SONIC (March 18, 2026)
- paper V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (Meta FAIR + Mila, LeCun et al.; revised April 2026) — pre-trained on 1M+ hours of video, action-conditioned variant is just 300M params, deployed zero-shot on a Franka arm
- repo facebookresearch/vjepa2 — 3,789★, V-JEPA 2.1 (Mar 16, 2026) added an 80M-param ViT-B/16 checkpoint, V-JEPA 2-AC for robotics dominates Octo and Cosmos baselines on Franka pick-and-place (80% vs 10%)
- news Figure — Helix 02 (Jan 27, 2026) — full-body autonomy with a 4-min, 61-action dishwasher demo. The "System 0" layer was trained on 1,000+ hours of human motion data and replaces 109k LOC of hand-written C++. Concrete instance of frontier-model-style scaling actually shipping on production humanoid hardware.
- news GTC 2026 keynote (Mar 18, 2026) — Cosmos 3, GR00T N1.7, Alpamayo 1.5, Physical AI Data Factory Blueprint. Rev Lebaredian's "compute is data" framing is the citrini-style tightly-coupled hardware-model loop made explicit by NVIDIA leadership.
Counter — was this actually new in spring 2026? And how good are these models really?
π0 (Physical Intelligence) shipped a strong general-purpose VLA in Oct 2024, and π0.5 demonstrated open-world generalization (cleaning entire kitchens in homes never seen during training) in April 2025. So the "frontier of usable robot models" had been moving for a year already. What's plausibly new in spring 2026: access. Internal NVIDIA robotics teams getting Anthropic-grade external frontier models for the first time is an enterprise / IT story, not a research-frontier story.
And on the "models are good enough" framing: the formal benchmark community is sceptical. VLA-Arena (Dec 2025, 170-task benchmark) found that "current VLAs tend toward memorization over generalization, exhibit asymmetric robustness, lack safety awareness." VLABench (Fudan) bluntly concludes current VLAs are "still far from GPT-2 level" in robotics. Sergey Levine (Physical Intelligence co-founder) on Dwarkesh in Q1 2026 puts the median estimate for household-autonomous robots at ~2030. So Spencer's "world has changed in 3-6 weeks" is plausibly an internal-tooling / data-pipeline statement (fewer weeks of work to set up a sim), not a generalization-frontier statement.
- paper π0: A Vision-Language-Action Flow Model for General Robot Control (Oct 2024)
- paper π0.5: A VLA Model with Open-World Generalization (Apr 2025)
- repo physical-intelligence/openpi — 11,368★, last push April 16, 2026
- paper VLA-Arena: 170-task generalization benchmark (Dec 2025) — memorization-heavy, weak safety
- paper VLABench (Fudan) — current VLAs "far from GPT-2 level"
- interview Sergey Levine on Dwarkesh (Q1 2026) — household-autonomous robots median estimate ~2030
2. Training data is the bottleneck
There's no internet for hands. The robot world doesn't have a Common Crawl.
Evidence — this is the consensus framing in the literature
- paper Towards Generalist Robot Learning from Internet Video: A Survey (JAIR, July 2025) — explicitly frames the data scarcity bottleneck as the central LfV (Learning from Videos) problem and surveys methods to bootstrap robot policies from internet video.
- paper Point Bridge: 3D Representations for Cross Domain Policy Learning (Jan 2026, Mandlekar / Fox et al., NVIDIA + NYU) — uses domain-agnostic point representations to make synthetic data zero-shot-transferable; up to 44% gains in pure sim-to-real transfer, 66% with limited real data.
- paper Real-to-Sim Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions (Nov 2025) — builds soft-body digital twins from real videos to evaluate policies at scale.
- paper Mobile ALOHA (Fu, Zhao, Finn — CoRL 2024) — co-training with 825 static-ALOHA episodes (RT-X) and only 50 mobile demos boosts success rates by up to 90%. The empirical proof that demo throughput is the binding scarce resource.
- paper ALOHA 2 (May 2024) — explicitly engineered for "1000s of demonstrations per day" across a fleet. The whole hardware redesign is around the data-throughput bottleneck.
- market view Gartner forecast (cited in NVIDIA's GTC 2026 blog): synthetic data will be more than 90% of edge-scenario robot training data by 2030.
Counter — Skild AI argues simulation + internet video is enough; "stop sprinkling 1% real data on a VLM"
Skild AI raised $1.4B at a $14B+ valuation on the thesis that you can build a unified general-purpose robot brain by pre-training on simulation + internet human videos and post-training with targeted real-world data. CEO Deepak Pathak directly critiques approaches that "sprinkle <1% real robot data on a VLM" and call it a foundation model — Skild's claim is that true scale requires trillions of examples via simulation, and the hardware data isn't the bottleneck if you do the simulation right.
- news Skild AI Expands Generalized Robot Intelligence (March 16, 2026) — partnerships with NVIDIA, ABB, Universal Robots, MiR. Production deployment on NVIDIA Blackwell with Foxconn.
- post Skild AI — Building the General-Purpose Robotic Brain — "any robot, any task, one brain"
3. Hardware is still behind
The Stanford prof was right: cost is collapsing, capability is not.
Evidence — hardware capability still lags model capability
- research report Citrini Research — Crouching Tiger: On the Ground in China's Humanoid Robot Supply Chain (Aug 2025) — UBTech Walker has shaky motion and limited recognition outside controlled demos; Unitree G1/R1 base unit "has no hands and is remote-controlled — a drone with legs"; "useful autonomous humanoids remain aspirational".
- paper Interactive imitation learning for dexterous robotic manipulation: challenges and perspectives — a survey (Frontiers in Robotics, Dec 2025) — articulates the dexterity bottleneck and why hand hardware + interactive feedback are still gating problems.
- product MagicLab's H01 dexterous hand, launched at the April 28-29, 2026 Magic Ecosystem Conference in Silicon Valley — explicit acknowledgement that hands are the bottleneck and need a dedicated product.
Counter — hardware cost curves are collapsing fast (and that may be enough)
Unitree G1 ships at $5,900 base. AMI Labs raised a $1.03B seed with Toyota and NVIDIA partnerships. Tesla Optimus, Figure, MagicLab, AgiBot, Dyna are all shipping production hardware in 2025-2026. The capability gap is real but the price/availability gap is closing faster than anyone forecast in 2024. So "hardware is behind" might be a 2025 framing that's already getting outdated; the real question by 2027 may be whether the foundation models can keep up with how cheap the bodies have gotten.
- news Humanoids Daily — Yann LeCun's World Model Vision Gets a Leaner Engine (April 2026) — covers AMI Labs' $1.03B seed and Toyota / NVIDIA partnerships
- news MagicLab Robotics Unveils Embodied AI Vision in Silicon Valley (April 2026) — Co-Create 1000 Initiative, $1B over 5 years
4. The China thesis
"China will win robotics" — but Citrini's actual on-the-ground call is more nuanced.
Evidence — China's supply-chain depth and factory deployment is real
- news MagicLab unveils MagicBot X1 + Magic-Mix world model + H01 hand at the Silicon Valley summit (April 28-29, 2026) — international markets accounted for 60% of 2025 sales, spanning 50+ countries; $1B "Co-Create 1000" developer ecosystem fund; long-term target of $14B annual revenue by 2036.
- news Humanoids Daily — MagicLab Pivots to Magic-Mix Intelligence with $1B Developer Ecosystem Bet
- news Magic Ecosystem Conference Luma page — confirms York Yang (Co-Founder, Dyna Robotics), Jan Liphardt (Founder, Openmind, Stanford), Haozhi Qi (former Meta Robotics), Lewis Hong (former SpaceX) on panels.
- research report Citrini's Crouching Tiger documents heavy supply-side subsidies — ¥1 trillion VC fund (March 2025), ¥10B local funds in Beijing/Shanghai/Shenzhen/Wuhan, plus state-owned enterprise revenue subsidies disguised as commercial orders.
Counter — Citrini's call is "Crouching Tiger" not "China wins outright"
The most rigorous on-the-ground analysis (Citrini visited 11+ Chinese robotics companies June 22 – July 1, 2025) lands somewhere more interesting than "China wins":
- Tesla Optimus and Figure AI remain the global gold standard today for autonomy. The autonomous-humanoid layer is still Western-led.
- China's edge is supply-chain depth + government backing — the "Crouching Tiger" framing — second-mover advantage in components/manufacturing scale, ready to dominate once the technology matures.
- Demand for AgiBot's 3,000–5,000 unit 2025 target appears to come almost entirely from government R&D and SOEs (e.g., China Mobile) — not real commercial pull. Hardware shipments aren't demand-driven yet.
- The investable winners in Citrini's basket are Western pick-and-shovel suppliers (Teradyne via Universal Robots, analog semis, motion-control names) trading at cyclical lows.
So the more defensible version of Karina's thesis is: China will win the body half (cost curves, factory throughput, supply chain), but the brain half is genuinely contested — Skild ($14B), Physical Intelligence, Dyna, Figure, Tesla Optimus, plus the foundation-model heavyweights (Anthropic, Google, NVIDIA) are well-funded and culturally proximate to the AI / VLA research frontier in a way that Chinese teams aren't, yet.
- research report Citrini Research — Robotics Update (July 10, 2025)
- research report Citrini Research — Thematic Primer: Humanoid Robots (May 16, 2025)
5. Tightly coupled training loops + multi-model robotics inference
This is the most important section: the literature lined up, almost word for word, with the thesis while it was being written.
Evidence — multi-model robotics OS already exists in OSS
The single strongest piece of evidence is Openmind's OM1. (Karina's notes had "Megamind" — that's a misremember; the real company is Openmind, founded by Stanford bioengineering professor Jan Liphardt, who is also active in crypto via the ERC-7777 human-robot-society standard.)
- repo openmind/OM1 — 2,695★, MIT, "modular AI runtime for robots," pre-configured LLM endpoints for OpenAI, xAI, DeepSeek, Anthropic, Meta, Gemini, NearAI, and Ollama. Hardware-agnostic across Unitree Go2/G1, TurtleBot, Ubtech Yanshee. Runs on Jetson, Mac mini, generic Linux, Raspberry Pi 5.
- product OpenMind LLM docs — Single / Dual / Parallel modes — "Parallel" mode runs multiple specialized LLMs simultaneously with action filters, which is literally OpenRouter-style multi-model routing inside a robot stack.
- quote CEO Jan Liphardt: "Just as Android transformed smartphones, we believe an open OS will transform robotics."
- news Jan Liphardt's personal page — confirms his Stanford bioengineering tenure, sabbatical to build OpenMind, and crypto / DeSci background.
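The "Parallel" pattern above can be sketched in a few lines: several LLM backends propose actions concurrently, and an action filter gates what reaches the robot. This is a minimal illustration of the pattern, not OM1's actual API; the backend names, proposals, and whitelist filter are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub LLM backends: each "endpoint" maps an observation to a proposed
# action. In a real stack these would be network calls to different vendors.
def make_backend(name: str, proposal: str):
    def backend(observation: str) -> dict:
        return {"source": name, "action": proposal, "obs": observation}
    return backend

BACKENDS = [
    make_backend("openai", "wave"),
    make_backend("gemini", "sit"),
    make_backend("deepseek", "wave"),
]

# Action filter: only whitelisted primitives may reach the actuators.
ALLOWED_ACTIONS = {"wave", "sit", "stand"}

def parallel_step(observation: str) -> str:
    # Fan out to all backends simultaneously (the "Parallel" mode idea).
    with ThreadPoolExecutor() as pool:
        proposals = list(pool.map(lambda b: b(observation), BACKENDS))
    safe = [p["action"] for p in proposals if p["action"] in ALLOWED_ACTIONS]
    # Simple consensus: the most common safe action wins.
    return max(set(safe), key=safe.count)

action = parallel_step("person ahead")
```

The interesting design question this surfaces is the merge rule: consensus voting (as here), priority ordering, or a learned arbiter are all compatible with the same fan-out-then-filter skeleton.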
🎯 Evidence — the literal "OpenRouter for robotics" paper exists, published two months ago
- paper RoboRouter: Training-Free Policy Routing for Robotic Manipulation (March 9, 2026) — an LLM-based router that selects among heterogeneous robot policies (VLA / VA / code) at inference time. +13% real-world success rate over the best individual policy. This is, almost word-for-word, the multi-model routing thesis Karina articulated — formalized, benchmarked, and shipped two months ago.
- paper SwiftBot: Federated LLM-powered robotic task execution with intelligent container orchestration (March 7, 2026) — 1.5–5.4× startup latency reduction. Decentralized routing across robot fleets.
- paper LLM-MCoX: GPT-4o as a centralized planner routing waypoints across 6 robots (Sept 2025) — 22.7% faster exploration. Multi-model orchestration for multi-robot inference.
- product Gemini Robotics-ER 1.6 (April 14, 2026) — Google's reasoning model natively calls VLA models AND user-defined functions as tools at inference time. Google has shipped multi-model robot inference in production cloud.
- news Boston Dynamics × DeepMind partnership (Jan 5, 2026) — explicit hardware-OEM × frontier-model tight coupling. Atlas + Gemini Robotics integration; the citrini-thesis pattern at the highest visible production tier.
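The routing pattern these results share can be sketched as a training-free selector over heterogeneous policies: an LLM (stubbed below as a keyword scorer) ranks candidate policies for a task description at inference time, and the router dispatches to the top-scoring one. This is a hedged sketch of the general idea, not RoboRouter's actual method; every name, score, and policy here is illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Policy:
    name: str
    kind: str                    # "vla", "va", or "code", as in the paper's taxonomy
    run: Callable[[str], str]    # task -> action trace (stubbed)

def llm_score(task: str, policy: Policy) -> int:
    """Stand-in for an LLM judging task/policy fit. A real router would
    prompt a frontier model with the task and a card describing each policy."""
    keyword_affinity = {"vla": ["pick", "place"], "code": ["sort", "count"], "va": ["push"]}
    return sum(kw in task for kw in keyword_affinity[policy.kind])

def route(task: str, policies: List[Policy]) -> Policy:
    # Training-free: no learned router weights, only inference-time scoring.
    return max(policies, key=lambda p: llm_score(task, p))

policies = [
    Policy("pi-style-vla", "vla", lambda t: f"vla({t})"),
    Policy("scripted-sorter", "code", lambda t: f"code({t})"),
    Policy("pusher", "va", lambda t: f"va({t})"),
]
chosen = route("pick the mug and place it on the shelf", policies)
```

The point of the sketch is that the router adds no training cost: any improvement comes purely from matching tasks to the policy whose distribution covers them, which is where a reported gain like RoboRouter's +13% over the best single policy would originate.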
Evidence — internal multi-model routing is a hot 2025-2026 research vein
Five additional 2025–2026 papers all do mixture-of-experts / router / fast-slow decoupling inside VLA models:
- paper AC2-VLA: Action-Context-Aware Adaptive Computation in VLA Models — lightweight action-prior router; 1.79× speedup, 29.4% of dense FLOPs.
- paper InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation (Feb 2026) — Mixture-of-Transformers; +4.4% static, +26.7% dynamic vs π0.5.
- paper CogVLA: Cognition-Aligned VLA Model via Instruction-Driven Routing & Sparsification (NeurIPS 2025) — 97.4% on LIBERO with 2.5× lower training cost vs OpenVLA.
- paper MoTVLA: VLA Model with Unified Fast-Slow Reasoning (ICLR 2026) — Mixture-of-Transformers integrating fast/slow with diffusion policy.
- paper AsyncVLA: An Asynchronous VLA for Fast and Robust Navigation on the Edge — 8B remote VLA + 76M onboard edge adapter, 40% better under 6s comm delays.
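The fast-slow decoupling these papers converge on reduces, in sketch, to two nested loops: a large model re-plans at low frequency while a small policy emits actions every control tick. The sketch below is a generic illustration of that pattern under assumed numbers; it is not any one paper's architecture, and both model stubs are placeholders.

```python
def slow_reasoner(observation: str) -> str:
    """Large model: refreshes a subgoal at low frequency (stub)."""
    return f"subgoal-for:{observation}"

def fast_policy(subgoal: str, step: int) -> str:
    """Small model: emits an action at every control tick (stub)."""
    return f"act({subgoal}, t={step})"

def control_loop(observations, slow_period: int = 5):
    # The fast loop runs every tick; the slow loop runs every `slow_period`
    # ticks, so the robot keeps acting on a slightly stale plan in between.
    actions, subgoal = [], None
    for t, obs in enumerate(observations):
        if t % slow_period == 0:
            subgoal = slow_reasoner(obs)      # slow loop: re-plan
        actions.append(fast_policy(subgoal, t))  # fast loop: act
    return actions

acts = control_loop([f"obs{t}" for t in range(10)], slow_period=5)
```

An edge deployment (the AsyncVLA-style split) is the same skeleton with the slow call going over the network, which is why tolerance to stale subgoals under communication delay becomes the headline metric.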
Evidence — small-footprint world models are real
- paper LeWorldModel (LeWM) (LeCun et al., March 2026, arXiv 2603.19312) — 15M parameters, single-GPU training in a few hours, 48× faster planning than DINO-WM. The "you can do a world model with very few parameters" claim Karina remembered from the LeCun-adjacent literature is real and got dramatically more compact in spring 2026.
Evidence — serverless robotics inference is shipping
- news Nebius Teams With NVIDIA to Build Cloud for Robotics and Physical AI (March 17, 2026) — integrates NVIDIA Physical AI Data Factory Blueprint; Nebius Token Factory + Serverless AI Endpoints for low-latency execution from cloud to edge; early customers RoboForce, Milestone Systems (Hafnia VLMs), Voxel51.
- product Nebius Physical AI & Robotics
- product Nebius Serverless AI — Jobs (batch), Endpoints (inference), DevPods (dev) — pay-per-second.
Counter — the single-model bet is also winning
- Skild AI at $14B+ valuation explicitly bets on "any robot, any task, one brain." They argue routing-style approaches (different models for different tasks) miss the point — the right architecture is one foundation model with a hierarchical low-frequency / high-frequency policy split, not many specialized models stitched together.
- Dyna Robotics' DYNA-1 is a single foundation model that hits 99.4% success rate without human intervention; $120M Series A at ~$600M valuation in spring 2025. They aren't routing between models — they ship one model that handles edge cases via an engineered reward model + RL.
- NVIDIA Isaac GR00T N is one open foundation VLA, not a router across models. NVIDIA's bet at the system level is that the multi-model coordination problem is solved internally in a unified VLA, not externally via inference-time routing.
So the strong version of the counter is: routing across foundation models from different vendors may be a transient niche — useful in OSS / hobbyist setups (OM1) and in research (AC2-VLA, MoTVLA), but the production pattern in 2026-2028 may converge on big single-model bets per company. The strong version of Karina's claim is: even those single-model bets internally use routing (mixture-of-experts, fast-slow), and OpenRouter's external auto-routing is the natural analog at the infrastructure layer.
6. What it means for OpenRouter
There's a real lane here, and the surface is being built right now.
Implication 1 — Robotics is greenfield for OpenRouter today
OpenRouter's existing Auto Router targets LLM workloads (38 candidate models, NotDiamond-powered, no extra fee). The Pareto Code Router (April 21, 2026) extends to coding-specific cost-vs-capability tradeoffs. Neither targets VLA / robot-policy workloads. That's the gap. The pattern that wins in robotics is going to look like the Pareto Code Router but for VLA + policy + perception — different latency tiers, different parameter sizes, different physical-AI-grade safety guarantees.
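A VLA-aware version of that router might, in sketch, keep only Pareto-optimal candidates on (latency, cost, success) and then pick the best feasible one under a per-request latency budget. All model names, latencies, costs, and success rates below are hypothetical, and the selection rule is one plausible choice among several.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Candidate:
    name: str
    latency_ms: float   # end-to-end inference latency
    cost: float         # price per 1k actions (hypothetical unit)
    success: float      # benchmark success rate in [0, 1]

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Drop candidates dominated on all three axes by some other candidate."""
    def dominated(a: Candidate, b: Candidate) -> bool:
        return (b.latency_ms <= a.latency_ms and b.cost <= a.cost
                and b.success >= a.success and b != a)
    return [a for a in cands if not any(dominated(a, b) for b in cands)]

def route(cands: List[Candidate], latency_budget_ms: float) -> Optional[Candidate]:
    feasible = [c for c in pareto_front(cands) if c.latency_ms <= latency_budget_ms]
    # Within the budget, prefer success rate, then lower cost.
    return max(feasible, key=lambda c: (c.success, -c.cost)) if feasible else None

tiers = [
    Candidate("cloud-vla-2b", 180.0, 4.0, 0.92),
    Candidate("edge-vla-300m", 40.0, 0.5, 0.78),
    Candidate("edge-wm-15m", 8.0, 0.05, 0.55),
    Candidate("cloud-vla-old", 200.0, 5.0, 0.90),  # dominated: worse on every axis
]
```

The latency budget is what makes this robotics-specific: a 10 ms reflex tier and a 200 ms planning tier select different points on the same frontier, which is the structural difference from a pure cost router for chat workloads.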
Implication 2 — The natural product wedge is "OM1 endpoints, but managed"
Openmind's OM1 already does plug-and-play routing across OpenAI, Gemini, DeepSeek, xAI, Anthropic — but OSS, self-hosted, and hardware-side. The serverless production layer (Nebius Token Factory) is being built by NVIDIA's cloud partners, not by OpenRouter. There's a credible OpenRouter wedge as the inference router for OM1-style robot stacks — built on top of OpenRouter's existing multi-model billing + observability infra, with VLA-specific latency tiers and a path-to-edge story (large model in cloud + small adapter on-device, mirroring AsyncVLA).
Implication 3 — Cost control still matters, but it's the "second story"
Karina's framing in the original brain dump: "auto-routing matters for cost purposes, which is important, but I think routing is going to be so important in robotics." That's exactly right. The cost-router story (Pareto-style) is the wedge today; the robotics-router story is the bigger lane in 2027-2028. Both should be priced together because they share the same routing infra.
Implication 4 — Watch Openmind, Nebius, and Skild closely
These are the three companies whose product positioning most directly bears on whether OpenRouter has a robotics lane:
- Openmind (OM1) — the OSS surface that proves routing-in-robotics works as a pattern.
- Nebius (Token Factory + Serverless AI) — the managed-inference surface that shows what enterprise robotics inference looks like.
- Skild AI — the strongest counter-positioning. If Skild's "one brain" approach wins decisively, the robotics-routing wedge shrinks; if it plateaus, the wedge is open.
So what should we actually do?
- Add OpenMind / OM1 to the dossier as a watch target — they validate the pattern and have an MIT-licensed reference implementation we can study + benchmark against. Track GitHub star velocity and Liphardt's commentary.
- Track Nebius Token Factory pricing and SLAs — that's the closest production surface to "OpenRouter for robotics." Worth a benchmark from the routing-and-cost angle.
- Add a research note for the auto-router team — VLA-aware latency tiers, edge-vs-cloud split, action-conditioned variants of small models (V-JEPA 2-AC at 300M, V-JEPA 2.1 at 80M, LeWM at 15M) as router candidates for low-latency tiers. This is a paper waiting to be written.
- Skill: robotics-watch — extend the daily briefing to include MagicLab, Unitree, Figure, Tesla Optimus, Skild, Physical Intelligence, Openmind, Dyna, Nebius (Physical AI), and the VLA-routing arXiv stream. Weekly cadence; quarterly synthesis.
Sources
Papers
- V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning — Meta FAIR + Mila, LeCun et al., revised April 2026
- LeWorldModel (LeWM) — LeCun et al., March 2026, arXiv 2603.19312, 15M-param JEPA world model
- π0: A VLA Flow Model for General Robot Control — Physical Intelligence, October 2024
- π0.5: A VLA Model with Open-World Generalization — Physical Intelligence, April 2025
- Towards Generalist Robot Learning from Internet Video: A Survey — JAIR, July 2025
- Point Bridge: 3D Representations for Cross Domain Policy Learning — NVIDIA + NYU, January 2026
- Real-to-Sim Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions — November 2025
- Mobile ALOHA — Stanford / Fu, Zhao, Finn, CoRL 2024
- ALOHA 2: Enhanced Low-Cost Hardware for Bimanual Teleoperation — May 2024
- Interactive imitation learning for dexterous robotic manipulation: a survey — Frontiers in Robotics, December 2025
- AC2-VLA: Action-Context-Aware Adaptive Computation in VLA Models
- InternVLA-A1: Unifying Understanding, Generation and Action — February 2026
- CogVLA: Cognition-Aligned VLA Model via Instruction-Driven Routing & Sparsification — NeurIPS 2025
- MoTVLA: VLA Model with Unified Fast-Slow Reasoning — ICLR 2026
- AsyncVLA: An Asynchronous VLA for Fast and Robust Navigation on the Edge
GitHub repos
- openmind/OM1 — 2,695★, MIT, modular AI runtime for robots with multi-LLM routing
- facebookresearch/vjepa2 — 3,789★, V-JEPA 2 + 2.1 checkpoints (80M / 300M / 600M / 1B / 2B)
- physical-intelligence/openpi — 11,368★, π0 / π0-FAST / π0.5
News / press releases
- NVIDIA GTC 2026: From Simulation to Production — Build Robots with AI (March 18, 2026)
- MagicLab Robotics Unveils Embodied AI Vision in Silicon Valley (April 2026)
- Humanoids Daily: MagicLab Pivots to Magic-Mix
- Magic Ecosystem Conference Luma page
- Nebius Teams With NVIDIA for Robotics & Physical AI (March 17, 2026)
- Skild AI Expands Generalized Robot Intelligence (March 16, 2026)
- Humanoids Daily: LeWorldModel Coverage (April 2026)
Citrini Research
- Crouching Tiger: On the Ground in China's Humanoid Robot Supply Chain (August 2025)
- Robotics Update (July 10, 2025)
Voices & interviews
- Turing Post: Spencer Huang — NVIDIA's Big Plan for Physical AI (Nov 1, 2025)
- Yann LeCun on V-JEPA 2 (June 2025)
- Jan Liphardt — CoinDesk author page — ERC-7777 standard, "Trusted Autonomy" essay
- Sergey Levine — The Promise of Generalist Robotic Policies — real-robot data flywheel framing
- Sergey Levine on Dwarkesh — household-autonomous robots median estimate ~2030
- Jim Fan year-end thread (Dec 2025) — physical AGI as "the last grand challenge"
Counter-evidence (per soul.md inverse-hypothesis discipline)
- VLA-Arena (Dec 2025) — VLAs are memorization > generalization, weak safety
- VLABench (Fudan) — current VLAs "still far from GPT-2 level" in robotics
- Skild AI Series C ($1.4B at $14B+, Jan 14, 2026) — explicit "one brain" single-model bet
- Dyna Robotics DYNA-1 — single foundation model at 99.4% success
- Figure Helix 02 — single VLA, hardware-coupled, not routed
- Genie Centurion (May 2025) — explicit teleop scaling problem ("1 robot-hour = 1 human-hour")
Newer hits added in second pass
- 🎯 RoboRouter (Mar 9, 2026) — direct hit on Karina's thesis, +13% real-world success vs best individual policy
- SwiftBot (Mar 7, 2026) — federated LLM-powered robotic task execution
- LLM-MCoX — multi-robot LLM router
- Gemini Robotics-ER 1.6 (Apr 14, 2026) — Google production multi-model robot inference
- Boston Dynamics × DeepMind (Jan 5, 2026) — hardware × frontier-model coupling
- Figure Helix 02 (Jan 27, 2026) — full-body autonomy, dishwasher demo
- GTC 2026 keynote (Mar 18, 2026) — Cosmos 3, GR00T N1.7, Alpamayo 1.5
- AgiBot ACoT-VLA — China-side chain-of-thought VLA, SOTA on LIBERO 98.5%
- GigaBrain-0 — 2.5k★ Chinese OSS robot foundation model
- Unitree UnifoLM-X1-0 — embodied AI on G1 in Hangzhou factory; data flywheel from factory deployment shipping
- Genie Centurion (May 2025) — teleop scaling problem canonical citation
- CDF-Glove (Mar 2026) — high-quality teleop demos as primary IL bottleneck
- Meta FAIR OSMO tactile glove (Dec 2025) — 12 three-axis sensors
- RoboArena (CoRL 2025 oral) — community-run real-world VLA benchmark on DROID