Zero-Shot Function Encoder-Based Differentiable Predictive Control
[arxiv] zero-shot-control, differentiable-predictive-control, neural-odes, system-identification, adaptive-control, nonlinear-dynamical-systems
Authors: Hassan Iqbal, Xingjian Li, Tyler Ingebrand, Adam Thorpe, Krishna Kumar, Ufuk Topcu, Ján Drgoňa Year: 2026 Tags: differentiable-predictive-control, function-encoders, neural-odes, zero-shot-adaptation, nonlinear-control, system-identification
TL;DR
Combines function encoder-based neural ODEs (FE-NODE) for online system identification with differentiable predictive control (DPC) to train offline neural policies that adapt instantaneously to unseen parametric dynamics via closed-form least-squares coefficient inference. Demonstrated on four nonlinear benchmarks achieving 2.3–71.5× wall-clock speedup over white-box MPC at inference time while maintaining comparable tracking accuracy.
First pass — the five C's
Category. Research prototype: end-to-end learning-based control methodology.
Context. Nonlinear adaptive control + learning-based MPC. Builds directly on: Ingebrand et al. (2024a) — zero-shot neural ODE transfer via function encoders; Drgoňa et al. (2022, 2024) — self-supervised differentiable predictive control and parametric policy learning with guarantees; Chen et al. (2018a) — neural ODEs; Low et al. (2025) — Hilbert-space formalization and finite-sample error bounds O(R / λ√m) for function encoder approximation.
Correctness. Central assumptions: (1) the family of deployment dynamics lies within the Hilbert subspace spanned by B offline-learned basis functions — violated if test dynamics are out-of-distribution; (2) m = 100 online data points suffice for accurate coefficient estimation across all benchmarks — asserted, not analyzed; (3) the offline training distribution over coefficients c covers deployment scenarios; (4) FE-NODE differentiability is preserved well enough to produce useful DPC gradients. All four assumptions are plausible for the tested parametric families but are not stress-tested.
Contributions.
- Zero-shot adaptive closed-loop control: parametric DPC policy conditioned on FE coefficients inferred online in closed form, requiring no retraining or online optimization at deployment.
- Full differentiable pipeline: FE-NODE dynamics integrated with DPC enables end-to-end gradient-based policy learning over learned dynamics representations.
- Empirical validation across four nonlinear benchmarks (Van der Pol, two-tank, 7D glycolytic oscillator, 12D quadrotor) including abrupt mid-simulation dynamics switching.
- Open-source implementation in NeuroMANCER (code URL deferred to final version).
Clarity. Writing is well-organized and the three-algorithm structure maps cleanly to Figure 1; notation is consistent throughout, though the quadrotor control-input parameterization in Appendix A is dense and would benefit from a brief verbal gloss.
Second pass — content
Main thrust: Offline, train B neural ODE basis functions (FE-NODE) on trajectories from a parametric system family, then train a DPC neural policy conditioned on the resulting FE coefficient vector c; at deployment, estimate c from 100 online observations via regularized least squares and immediately evaluate the policy — no solver call, no gradient step.
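The closed-form coefficient step is the crux of the zero-shot claim: identification reduces to one regularized linear solve over the online data. A minimal NumPy sketch of Tikhonov-regularized least squares in the FE setting, where the matrix `G`, the helper name `infer_coefficients`, and the scalar-output simplification are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def infer_coefficients(G, y, lam=1e-3):
    """Closed-form Tikhonov-regularized least squares for FE coefficients.

    G   : (m, B) matrix whose columns are the B offline-learned basis-function
          predictions evaluated at the m online observations (scalar-valued
          here for simplicity; vector outputs would be stacked row-wise).
    y   : (m,) observed targets.
    lam : Tikhonov regularization weight lambda.
    """
    B = G.shape[1]
    # c = (G^T G + lam * I)^{-1} G^T y  -- one linear solve, no retraining,
    # no iterative online optimization at deployment time.
    return np.linalg.solve(G.T @ G + lam * np.eye(B), G.T @ y)

# Toy usage: recover the coefficients of a known linear combination from
# m = 100 observations (the online sample count used in the paper).
rng = np.random.default_rng(0)
G = rng.normal(size=(100, 8))       # m = 100 samples, B = 8 basis functions
c_true = rng.normal(size=8)
y = G @ c_true
c_hat = infer_coefficients(G, y, lam=1e-8)
```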
Supporting evidence:
- Van der Pol stabilization: FE-DPC MSE 0.002683 vs. WB-MPC 0.002653; inference 0.53 s vs. 1.21 s (2.3× speedup).
- Two-tank reference tracking (700 steps, multiple switches): FE-DPC MSE 0.008452 vs. WB-MPC 0.004164; 1.13 s vs. 6.75 s (6.0× speedup).
- Glycolytic oscillator (7D, stiff, horizon = 50): FE-DPC MSE 0.180299 vs. WB-MPC 0.032320; 5.89 s vs. 136.07 s (23.1× speedup).
- Quadrotor (12D, 20 random parameterizations with abrupt switches): FE-DPC MSE 0.022003 vs. WB-MPC 0.024208; 1.93 s vs. 155.85 s (~80.7× speedup by table arithmetic, though the paper text states a maximum of 71.5×; this discrepancy is not explained).
- Theoretical error bound from Low et al. (2025): approximation error ≤ O(R / λ√m) for a fixed basis, providing asymptotic grounding for online coefficient estimation.
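The quadrotor discrepancy noted above is plain table arithmetic; a quick check of the per-benchmark speedups implied by the wall-clock numbers reported in Table 1:

```python
# Wall-clock inference times in seconds from Table 1: (WB-MPC, FE-DPC).
times = {
    "van_der_pol": (1.21, 0.53),
    "two_tank":    (6.75, 1.13),
    "glycolytic":  (136.07, 5.89),
    "quadrotor":   (155.85, 1.93),
}
speedups = {name: wb / fe for name, (wb, fe) in times.items()}
# The quadrotor ratio works out to about 80.75x, above the paper's
# stated 71.5x maximum; the other three match the text (2.3x, 6.0x, 23.1x).
```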
Figures & tables:
- Figure 1 (pipeline diagram): qualitative only; effectively communicates the three-algorithm flow; no quantitative content.
- Figure 2 (Van der Pol): phase portraits and time series for fixed and switching dynamics; axes labeled (x₁, x₂, time steps, u); no error bars or confidence intervals.
- Figure 3 (two-tank): state and control trajectories over 700 steps with multiple switches; axes labeled; no statistical intervals.
- Figure 4 (glycolytic oscillator, uncontrolled): 7-state trajectories over 1500 steps with log-scale y-axis for x₁; true vs. predicted overlaid; no error bars.
- Figure 5 (glycolytic oscillator, controlled): side-by-side WB-MPC vs. FE-DPC; axes labeled; no confidence intervals; a tracking offset is visible for FE-DPC but not quantified beyond Table 1.
- Figure 6 (quadrotor, 20 trajectories): all 20 runs overlaid; shows distributional robustness qualitatively; no error bands, no success-rate statistic.
- Table 1: MSE and wall-clock time (seconds) for all four benchmarks; no standard deviations, no indication of how many runs were averaged.
Follow-up references:
- Ingebrand et al. (2024a): foundational zero-shot neural ODE transfer via function encoders; essential prerequisite for the dynamics modeling component.
- Drgoňa et al. (2022, 2024): DPC formulation and parametric policy learning with constraint guarantees; necessary for understanding the control half of the pipeline.
- Low et al. (2025): Hilbert-space structure and finite-sample error bounds for function encoders; supplies the theoretical backbone cited in Section 3.1.
- Ingebrand et al. (2024b): function encoders applied to zero-shot reinforcement learning; closest parallel application domain.
Third pass — critique
Implicit assumptions:
- Offline training parameter distributions (ν ~ Pν, x₀ ~ Px₀, ξ ~ Pξ) exactly match or bound deployment distributions; if a new system lies outside these, the learned basis and policy may fail silently with no detection mechanism.
- m = 100 online samples is universally sufficient regardless of system dimensionality or noise level; this is fixed for all four benchmarks without justification or sensitivity analysis.
- Wall-clock speedup is meaningful: FE-DPC runs on an NVIDIA RTX 5090 GPU; CasADi-based WB-MPC hardware is not specified. If MPC runs on CPU, the reported speedups conflate algorithmic and hardware advantages.
- Policy generalization during dynamics switching: the coefficient update is performed in a receding fashion using recent data, but whether and how quickly the 100-point window captures a new regime is not analyzed.
Missing context or citations:
- No comparison to other adaptive or meta-learning control approaches (e.g., MAML-based policy adaptation, L1 adaptive control, GP-MPC with online model updates).
- No comparison to SINDy-MPC or Koopman-based MPC, which are cited as related methods for system identification but never benchmarked.
- Li et al. (2025b), a closely related concurrent paper on zero-shot transferable parametric optimal control from overlapping authors, is cited but not compared experimentally.
- No discussion of constraint satisfaction rates; figures show constraints are respected but no quantitative feasibility statistics are reported.
Possible experimental / analytical issues:
- Table 1 reports single MSE values with no standard deviation, confidence intervals, or statement of how many evaluation episodes were averaged; reproducibility of the numbers is unclear.
- The stated speedup range of "2.3 to 71.5×" in the text is inconsistent with Table 1: 155.85 s / 1.93 s ≈ 80.7× for the quadrotor, not ≤ 71.5×. This arithmetic discrepancy is unexplained.
- The glycolytic oscillator MSE is 5.6× worse for FE-DPC (0.180 vs. 0.032); the paper characterizes this as acceptable given the speedup but does not analyze whether this error level meets any practical control tolerance.
- No ablation on B (number of basis functions) or m (online sample count) is included in this paper; the authors defer entirely to Ingebrand et al. (2025) ablations performed on different problems.
- No formal closed-loop stability or constraint satisfaction guarantees are derived for the combined FE-DPC system; stability is demonstrated only empirically and only for the tested parameter ranges.
- DPC training samples c from Pc (a distribution over coefficients) rather than directly from ν ~ Pν; the relationship between these distributions, and whether Pc truly covers all reachable c values, is not verified.
Ideas for future work:
- Derive formal closed-loop stability or safety guarantees (e.g., control Lyapunov functions or barrier certificates) for the FE-DPC combination, given that DPC alone admits some guarantees (Drgoňa et al., 2024).
- Ablate sensitivity to m (online sample count) and B (basis size) on the benchmarks in this paper, and characterize the tradeoff between identification accuracy and inference latency.
- Incorporate PCA-based dimensionality reduction of the dynamics space (mentioned as future work citing Low et al., 2025) to test whether compact representations maintain zero-shot accuracy at higher B without policy-network blowup.
- Benchmark on a hardware-normalized platform (e.g., both FE-DPC and MPC on CPU, or both on GPU) to isolate algorithmic speedup from hardware effects.
Methods
- function encoder (FE)
- neural ordinary differential equations (NODE)
- differentiable predictive control (DPC)
- regularized least squares coefficient inference
- Tikhonov regularization
- RK4 integration
- automatic differentiation
- adjoint method
- multi-layer perceptron (MLP) policy network
- Adam optimizer
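RK4 above is the fixed-step integrator used to roll out the learned dynamics. A minimal sketch of the FE-NODE prediction form, f(x, u) = Σⱼ cⱼ gⱼ(x, u), where the two toy analytic basis fields below stand in for the neural ODE basis functions learned offline (the names `rk4_step`, `fe_node`, and the Van der Pol-flavored bases are illustrative, not the paper's code):

```python
import numpy as np

def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta 4 step of x' = f(x, u), zero-order-hold u."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Toy basis vector fields standing in for learned neural ODE basis functions.
basis = [
    lambda x, u: np.array([x[1], -x[0]]) + u,              # harmonic term
    lambda x, u: np.array([0.0, (1 - x[0] ** 2) * x[1]]),  # VdP-style damping
]

def fe_node(x, u, c):
    """FE-NODE form: dynamics as a c-weighted sum of basis vector fields."""
    return sum(cj * gj(x, u) for cj, gj in zip(c, basis))

c = np.array([1.0, 0.5])                    # coefficients inferred online
x, u, dt = np.array([1.0, 0.0]), np.zeros(2), 0.01
for _ in range(100):                        # roll out one second of dynamics
    x = rk4_step(lambda s, a: fe_node(s, a, c), x, u, dt)
```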
Datasets
- Van der Pol oscillator
- two-tank level regulation system
- glycolytic oscillator (yeast glycolysis model)
- 12-dimensional quadrotor model
Claims
- The proposed FE-DPC framework achieves zero-shot adaptation to unseen dynamical systems without retraining or online reoptimization by conditioning control policies on function encoder coefficients.
- FE-DPC achieves inference speedups of 2.3 to 71.5 times over classical MPC across tested benchmarks while maintaining competitive tracking accuracy.
- Function encoder-based neural ODE basis functions can represent diverse nonlinear system dynamics as a linear combination, enabling closed-form online coefficient estimation from limited observations.
- The learned parametric control policies remain stable under abrupt online dynamics switches across all tested nonlinear benchmarks, including a 12-dimensional quadrotor model.
- The FE-NODE approximation is theoretically grounded, with approximation error bounded asymptotically at rate O(R / λ√m) given a fixed basis.
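The first claim rests on how the policy is conditioned: the FE coefficients enter as extra inputs, so adapting to new dynamics changes only c, never the weights. A minimal sketch of that inference-time pattern, with random weights standing in for an offline-trained DPC policy (all dimensions and names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_c, n_u, hidden = 2, 8, 1, 32   # state, coefficient, input, hidden dims

# Random weights stand in for the offline-trained MLP policy network.
W1 = rng.normal(scale=0.1, size=(hidden, n_x + n_c))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(n_u, hidden))
b2 = np.zeros(n_u)

def policy(x, c):
    """u = pi([x; c]): a parametric policy conditioned on FE coefficients."""
    z = np.concatenate([x, c])
    h = np.tanh(W1 @ z + b1)
    return W2 @ h + b2

# Zero-shot switch: the same frozen weights produce a different control law
# the instant the online least-squares step yields new coefficients.
x = np.array([1.0, -0.5])
c_old, c_new = rng.normal(size=n_c), rng.normal(size=n_c)
u_old, u_new = policy(x, c_old), policy(x, c_new)
```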