Data Assimilation
- Description: Mathematical methods for combining a dynamical model's forecast with observations to produce a best-estimate trajectory — sequential (Kalman filter family, ensemble Kalman filter, particle filter) vs variational (3D-Var, 4D-Var). Widely used in weather forecasting, oceanography, reservoir engineering, and SLAM.
- My Notion Note ID: K2C-1-3
- Created: 2020-06-01
- Updated: 2026-05-22
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. The Problem
- 2. The Bayesian Frame
- 3. Sequential Methods
- 4. Variational Methods
- 5. Hybrid Methods
- 6. Connection to Filtering and SLAM
- 7. Pitfalls
- 8. References
1. The Problem
- You have a dynamical model (set of ODEs/PDEs) for a physical system — e.g., the atmosphere, the ocean, an underground reservoir, a vehicle's pose.
- You also have noisy observations of some of the state variables, scattered in time and space.
- Neither alone tells the whole story:
- The model drifts: small initial errors compound, parameterizations are imperfect.
- The observations are sparse, noisy, and only see some variables — never the full state.
- Data assimilation = combine the two. Produce a trajectory that is consistent with both the model's dynamics and the observed data, weighted by their respective uncertainties.
Two large algorithm families:
| Family | Idea | Examples |
|---|---|---|
| Sequential | Update the state estimate one observation (or batch) at a time as observations come in. Maintain a posterior estimate + uncertainty. | Kalman filter, Extended KF, Ensemble KF, Particle filter. |
| Variational | Minimize a cost functional over a time window combining model fit + observation fit. One optimization per assimilation window. | 3D-Var, 4D-Var. |
Hybrids combine elements of both (EnVar, 4DEnVar, hybrid 4D-Var).
2. The Bayesian Frame
Both families derive from Bayesian estimation:
p(x_t | y_{1:t}) ∝ p(y_t | x_t) · p(x_t | y_{1:t-1})
Where:
x_t— state at timet.y_{1:t}— observations up to timet.p(x_t | y_{1:t-1})— the forecast (prior), produced by running the model from the previous estimate.p(y_t | x_t)— the observation likelihood.p(x_t | y_{1:t})— the analysis (posterior).
Sequential methods compute the posterior recursively. Variational methods minimize the negative log-posterior over a window, with the model as a strong (4D-Var) or weak constraint.
3. Sequential Methods
3.1 Kalman Filter (KF)
- Optimal for linear dynamics + linear observation operator + Gaussian noise.
- Maintains
x̂(mean) andP(covariance) of the state. - Two-step cycle:
- Forecast:
x̂⁻ = F x̂;P⁻ = F P Fᵀ + Q(process noise covarianceQ). - Analysis: Kalman gain
K = P⁻ Hᵀ (H P⁻ Hᵀ + R)⁻¹;x̂ = x̂⁻ + K (y - H x̂⁻);P = (I - K H) P⁻(observation noise covarianceR, operatorH).
- Forecast:
- Cost: O(n³) per step for the inverse — fine for n in low thousands, infeasible for n = 10⁹ (weather model state size).
3.2 Extended Kalman Filter (EKF)
- For weakly nonlinear systems. Linearize
FandHaround the current estimate (Jacobians). - Used in early SLAM and navigation systems (GPS-INS fusion).
- Breaks down with strong nonlinearity or bimodal posteriors.
3.3 Ensemble Kalman Filter (EnKF)
- Represent the state distribution by an ensemble of N members (typically 50-200 in weather, 20-50 in ocean).
- Forecast = propagate each ensemble member through the (possibly nonlinear) model.
- Estimate covariance from the ensemble:
P ≈ (1/(N-1)) Σ (x_i - x̄)(x_i - x̄)ᵀ. - No tangent-linear or adjoint code needed — huge implementation win.
- Cost per step: O(N · model cost) for propagation + O(N · m²) for update with
mobservations. - Standard in operational weather forecasting (ECMWF, NCEP, JMA all run ensemble systems).
3.4 Particle Filter (Sequential Monte Carlo)
- Represent posterior as weighted set of particles. No Gaussian assumption.
- Forecast: propagate each particle.
- Analysis: reweight by observation likelihood; resample to avoid degeneracy.
- Excellent for strongly nonlinear / non-Gaussian problems.
- Suffers from "curse of dimensionality" — number of particles needed grows exponentially with state dimension. Rarely used at full state-vector scale; common for low-dim subproblems (e.g., target tracking).
4. Variational Methods
4.1 3D-Var
-
Single-time optimization. Find state
xminimizing:J(x) = (x - x_b)ᵀ B⁻¹ (x - x_b) + (y - H(x))ᵀ R⁻¹ (y - H(x))with
x_b= background (prior forecast),B= background error covariance,R= observation error covariance. -
The two terms penalize disagreement with the background and the observations respectively.
-
Solved with conjugate gradient or quasi-Newton (L-BFGS); requires
Hᵀ(adjoint of the observation operator) but not the model's tangent-linear/adjoint.
4.2 4D-Var
-
Like 3D-Var but the model is part of the constraint. Minimize over the initial state
x_0of a window:J(x_0) = (x_0 - x_b)ᵀ B⁻¹ (x_0 - x_b) + Σ_t (y_t - H_t M_{0,t}(x_0))ᵀ R_t⁻¹ (y_t - H_t M_{0,t}(x_0))where
M_{0,t}propagates the model fromt=0tot. -
Requires the adjoint model
Mᵀ— substantial software engineering effort to maintain alongside the forward model. -
Operational backbone at ECMWF, Met Office, Météo-France for global NWP since the early 2000s.
-
"Strong constraint" 4D-Var assumes the model is perfect inside the window; "weak constraint" 4D-Var adds a model-error term.
4.3 Sequential vs Variational
- Sequential advantages: incremental, no need to wait for a whole window, no adjoint required (EnKF). Disadvantage: only the present is updated.
- Variational advantages: uses all observations in the window self-consistently, propagates information backward in time through the adjoint. Disadvantage: adjoint code maintenance, batch latency.
5. Hybrid Methods
Modern operational systems combine both:
- EnVar / Hybrid-Var — use the ensemble to estimate the flow-dependent background error covariance
B, then plug into a variational solver. - 4DEnVar — 4D variational analysis with covariance localized via an ensemble; avoids the adjoint by using ensemble perturbations to estimate the tangent linear.
- ECMWF runs EDA + 4D-Var (Ensemble of Data Assimilations + deterministic 4D-Var). NCEP runs hybrid 4DEnVar.
6. Connection to Filtering and SLAM
The same math powers other fields under different names:
- Robotics / SLAM — pose-graph optimization is essentially weak-constraint 4D-Var. EKF-SLAM, Particle-filter-SLAM are sequential methods. Modern systems (ORB-SLAM, VINS-Fusion) use sliding-window bundle adjustment ≈ short-window 4D-Var.
- Reservoir engineering — history matching = 4D-Var; ensemble methods (EnKF, ES-MDA) widely used.
- Target tracking — Kalman variants; multi-target uses Joint Probabilistic Data Association or Multiple Hypothesis Tracking (combinatorial structure on top of Bayesian filtering).
- State-space modeling in econometrics / signal processing — same filtering theory; smaller scale.
7. Pitfalls
Bmatrix dimensionality — for a global weather model with state dim ~10⁹, storingBexplicitly is impossible (10¹⁸ entries). Operational systems use spectral / wavelet / ensemble-based representations.- Localization — long-range spurious correlations in small ensembles. Apply Schur-product localization (multiply ensemble covariance by a compactly supported function).
- Inflation — small ensembles underestimate variance; multiplicatively inflate the ensemble spread (typical factor 1.05-1.15).
- Observation bias — assumed-unbiased observations are usually biased. Bias correction (
VarBC) adds bias parameters to the variational cost. - Filter divergence — analysis becomes overconfident; observations get ignored; trajectory drifts. Often caused by underestimated
P(inadequate inflation) or missing model error. - Strongly non-Gaussian states (e.g., precipitation, sea ice concentration on [0,1]) — Gaussian-based methods misbehave. Use transformations (anamorphosis), Gamma observation operators, or particle methods.
- Adjoint maintenance — adjoint code drifts from the forward model unless tested rigorously (twin experiments, gradient checks).
- The two-camp confusion — in the source the original note distinguished sequential (序贯) — propagates state forward, updates as observations arrive — from variational (变分) — optimizes the whole window using all observations together. The distinction is real and explained more carefully above (§ 3 vs § 4).
8. References
- Asch, Bocquet, Nodet, Data Assimilation: Methods, Algorithms, and Applications, SIAM, 2016. (Modern textbook treatment.)
- Kalnay, Atmospheric Modeling, Data Assimilation and Predictability, Cambridge, 2003. (Classic.)
- Evensen, Data Assimilation: The Ensemble Kalman Filter, Springer, 2009.
- Particle filter for non-Gaussian DA — Van Leeuwen, "Particle filtering in geophysical systems," 2009.
- ECMWF operational DA documentation — https://www.ecmwf.int/en/research/data-assimilation
- Wikipedia: Data assimilation — https://en.wikipedia.org/wiki/Data_assimilation
- Wikipedia: Kalman filter — https://en.wikipedia.org/wiki/Kalman_filter
- Wikipedia: Ensemble Kalman filter — https://en.wikipedia.org/wiki/Ensemble_Kalman_filter
- For SLAM / bundle adjustment connection — Triggs et al., "Bundle Adjustment — A Modern Synthesis," 2000.