translated from: https://blog.csdn.net/qq_42731705/article/details/129420739

Intro example

When given different measurement devices, how can we obtain the diameter of a coin?

Test 1: Measuring tape

We can use the measuring tape to measure the diameter of the coin twice, giving measurements $z_{1}$ and $z_{2}$ . Taking the average gives the estimate

z_{e} = 25 mm

Test 2: measuring tape + ruler of different resolutions

To obtain the estimate $z_{e}$ would we still average the measurements?

Test 3: measuring tape and micrometer

In this case, would we still take one measurement with each device and average them, even though the micrometer’s resolution is much finer than the measuring tape’s (0.001 mm vs. 1 mm)?

Intuition: No

Obviously, since sensor precisions differ, we can no longer simply average the readings from different sensors: neglecting systematic error, a micrometer’s measurement is clearly more accurate than that of a tape measure.

So how can we obtain the best estimate of the coin’s diameter from measurements made by different sensors? That’s where the data‐fusion methods described later come in!

Data fusion

What does Data fusion do?

Data fusion uses different sensors to achieve the best estimate of the system state. In the example above, that means using different sensors (measuring devices) to give the best estimate of the coin’s diameter.

Prerequisite for Data Fusion — Uncertainty

No sensor is perfectly accurate, nor is there any measurement process that is completely error‑free. In other words, every single measurement comes with some uncertainty. When we ignore systematic errors, a higher‑precision sensor will have a smaller measurement uncertainty. For example, in the case above, the measurement uncertainty of the micrometer is much smaller than that of the tape measure.

In statistics, uncertainty is expressed in terms of standard deviation, variance, and the covariance matrix. Clearly, in the previous example, when computing a weighted average of measurements from different sensors, we should take each sensor’s uncertainty into account: the smaller the uncertainty, the larger the weight its measurement should carry, because it is more accurate. If we know the standard deviation of the measurements produced by each sensor, we should be able to derive a weight $k$ in order to assign different weights (importance/trustworthiness) to each measurement.

Result of Data Fusion: The Statistically Optimal Estimate

Every measurement carries uncertainty, so from a statistical standpoint our measurement is a random variable, and the final estimate of the system state is also a random variable. The optimal data‐fusion result, therefore, should minimize variance; for a multivariate random variable this means achieving the smallest possible trace of the covariance matrix.

Hence, when we derive the weighted average of the measurements from different sensors below, our goal is to choose weights so that the resulting weighted average has the minimum variance.

Optimal Estimation: the fusion result with the minimum uncertainty, i.e. minimum $σ, σ^{2}, P$

Example

Suppose we have measurements:

z_{1} = 24 mm, σ_{1} = 1 mm; z_{2} = 25.003 mm, σ_{2} = 0.001 mm

goal: to find value $k$ such that the estimation has minimal uncertainty

z_{e} = (1 - k) z_{1} + k z_{2}

ans: recall that, for independent measurements $z_{1}$ and $z_{2}$ ,

σ_{z_{e}}^{2} = Var [(1 - k) z_{1} + k z_{2}] = Var [(1 - k) z_{1}] + Var [k z_{2}] = (1 - k)^{2} Var [z_{1}] + k^{2} Var [z_{2}]

due to properties of variance, we have

σ_{z_{e}}^{2} = (1 - k)^{2} σ_{1}^{2} + k^{2} σ_{2}^{2}

Since $σ_{1}$ and $σ_{2}$ are constants in this case, we can differentiate $σ_{z_{e}}^{2}$ with respect to $k$ and set the derivative to zero to find the minimum.

\frac{d σ_{z_{e}}^{2}}{d k} = - 2 (1 - k) σ_{1}^{2} + 2 k σ_{2}^{2} = 0 ⟹ k = \frac{σ_{1}^{2}}{σ_{1}^{2} + σ_{2}^{2}}

The result is easy to interpret in two scenarios:

If $σ_{1}^{2}$ is very large, then $k$ tends toward 1, and the fused result approaches $z_{2}$ . This makes sense: measurement 1 has a large variance (i.e., is less accurate), so we lean on measurement 2.
If $σ_{2}^{2}$ is very large, then $k$ tends toward 0, and the fused result approaches $z_{1}$ . The reasoning is analogous.

Plugging our derived formula into the previous example of measuring the coin’s diameter with a tape measure and a micrometer, we find that the final fused estimate lies very close to the micrometer’s reading, which is exactly the desirable outcome.
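The weight formula can be checked numerically. The following is a minimal sketch in plain Python; the function name `fuse` is just an illustrative choice, and the two measurements are assumed to be independent, as in the derivation above.

```python
def fuse(z1, sigma1, z2, sigma2):
    """Minimum-variance weighted average of two independent measurements."""
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    k = var1 / (var1 + var2)                      # weight placed on measurement 2
    z_e = (1 - k) * z1 + k * z2                   # fused estimate
    var_e = (1 - k) ** 2 * var1 + k ** 2 * var2   # variance of the fused estimate
    return k, z_e, var_e


# Tape measure: 24 mm with sigma 1 mm; micrometer: 25.003 mm with sigma 0.001 mm
k, z_e, var_e = fuse(24.0, 1.0, 25.003, 0.001)
print(f"k = {k:.6f}, z_e = {z_e:.6f} mm, sigma_e = {var_e ** 0.5:.6f} mm")
```

The weight `k` comes out very close to 1, so the fused estimate sits almost on top of the micrometer reading, and the fused variance is smaller than either sensor's variance on its own.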

State Space Representation

In robotic state‑estimation problems, it’s not simply a matter of taking two sensor measurements and fusing them. Instead, one usually builds a motion model of the system that lets us predict its state. When new sensor data arrives, we update that prediction based on the measurements, yielding a more accurate estimate of the system state.

This process is precisely the system’s state‑space formulation. In the Kalman Filter, it’s broken into two parts: the state equation and the observation (measurement) equation.

State equation

We construct a mathematical model of the system’s physics. For example, in SLAM, if we assume the robot travels at constant velocity or under constant acceleration, then our models become the constant‑velocity model or the constant‑acceleration model, respectively.

In short, the state equation is something we calculate or derive. Given the system state at the previous time step, we use the state equation to compute the current system state; this constitutes our prediction.

Another point to note, as mentioned in the Data Fusion section, is that both measurements and our mathematical model carry uncertainty. For instance, our model may be imperfect, so the state equation is subject to noise. In the Kalman Filter, we assume this noise is Gaussian, an essential premise for deriving the Kalman Gain.

x_{k} = A x_{k - 1} + B u_{k - 1} + w_{k - 1},

$A$ : state‑transition matrix
$B$ : control‑input matrix
$u_{k - 1}$ : control vector
$w_{k - 1}$ : process noise, assumed to be drawn from a multivariate normal distribution with zero mean and covariance $Q$

$A$ : State‑Transition Matrix

What it is: A matrix that encodes your system’s built‑in dynamics—how the state moves forward in time if there were no external inputs or noise.
Role in the equation: In

x_{k} = A x_{k - 1} + B u_{k - 1} + w_{k - 1},

the term $A x_{k - 1}$ is your prediction of the new state $x_{k}$ based solely on the old one.

Intuition:
If your state is just a 1D position and you assume it doesn’t change by itself, $A = 1$ .
If your state is $[position, velocity]^{T}$ under a constant‑velocity model, then with timestep $Δ t$ :

A = \begin{bmatrix} 1 & Δ t \\ 0 & 1 \end{bmatrix},

because (from kinematics)

pos_{k} = pos_{k - 1} + Δ t vel_{k - 1},

and (from constant velocity assumption)

vel_{k} = vel_{k - 1} .

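Details: suppose our state is $x_{i} = [p_{i}, v_{i}]^{T}$ at some timestep $i$ , then

x_{k} = A \cdot x_{k - 1} = \begin{bmatrix} 1 & Δ t \\ 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} p_{k - 1} \\ v_{k - 1} \end{bmatrix} = \begin{bmatrix} 1 \cdot p_{k - 1} + Δ t \cdot v_{k - 1} \\ 0 \cdot p_{k - 1} + 1 \cdot v_{k - 1} \end{bmatrix} = \begin{bmatrix} p_{k - 1} + Δ t v_{k - 1} \\ v_{k - 1} \end{bmatrix}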

which matches above.

Q: So we have to derive $A$ ourselves every time we need to apply a Kalman filter?

$B$ : Control‑Input Matrix

What it is: A matrix that describes how your known inputs or controls $u$ (e.g. commanded acceleration, wheel‑encoder speeds) push the state forward.
Role in the equation: In

x_{k} = A x_{k - 1} + B u_{k - 1} + w_{k - 1},

the term $B u_{k - 1}$ injects the effect of your control actions.

Intuition:
If your control $u$ is a direct velocity command in a 1D position‑only model, you might set

A = 1, B = Δ t,

so that $B u = Δ t \cdot (velocity)$ updates your position.

In the 2‑state $[pos; vel]$ constant‑acceleration model, if $u$ is acceleration $a$ , you’d choose

B = \begin{bmatrix} \frac{1}{2} Δ t^{2} \\ Δ t \end{bmatrix},

because

pos increment = \frac{1}{2} a Δ t^{2}, vel increment = a Δ t .

Summary

$A$ tells you “where the system would go on its own.”
$B$ tells you “how external commands or controls nudge it.”
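To make the roles of $A$ and $B$ concrete, here is a small sketch of the noise-free prediction $x_{k} = A x_{k - 1} + B u_{k - 1}$ for the two-state $[pos; vel]$ model with an acceleration input. The helper name `predict_state` and all numeric values are assumptions chosen purely for illustration.

```python
def predict_state(A, x, B, u):
    """Noise-free prediction x_k = A x_{k-1} + B u_{k-1} for a scalar control u."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]


dt = 0.1                          # time step in seconds (illustrative value)
A = [[1.0, dt],
     [0.0, 1.0]]                  # state transition: position advances by dt * velocity
B = [0.5 * dt ** 2, dt]           # effect of a commanded acceleration on [pos, vel]

x_prev = [2.0, 1.0]               # [position (m), velocity (m/s)]
coasting = predict_state(A, x_prev, B, u=0.0)   # A alone: where the system goes on its own
pushed = predict_state(A, x_prev, B, u=2.0)     # B adds the nudge from a = 2 m/s^2
print(coasting, pushed)
```

With `u = 0` only the $A$ term acts and the position simply advances by `dt * velocity`; a nonzero `u` adds the position and velocity increments given by $B$.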

Observation Equation

This is just like our earlier example of measuring the coin’s diameter: it describes how we observe the system’s state. In SLAM, however, the observation is often indirect and mediated by an observation model. For instance, in visual‑inertial odometry (VIO) we want to estimate the system’s 6 DOF pose, but what the camera actually gives us are pixel coordinates of feature points, not the pose itself. Those pixel measurements are related to the 6 DOF state through the camera’s projection model. In other words, the projection model ties the actual observed pixel values to the true pose, and that relationship is our observation equation.

In summary, although the observation equation doesn’t measure the system state directly, it indirectly measures it via the observation model, so it still constitutes a measurement of the state.

Likewise, observations are noisy: feature‐point locations may be imprecise, and the projection model itself might not be perfect. Therefore we add Gaussian noise to the observation equation, and it must be Gaussian, because that assumption is necessary for deriving the Kalman Gain.

Intuition: At each time step $k$ , you get a new sensor reading $z_{k}$ . The Kalman‐filter observation equation

z_{k} = H_{k} x_{k} + v_{k}

says that:

$x_{k}$ is the true system state at time $k$ (e.g. position/velocity, 6‑DOF pose, etc.).
$H_{k}$ is the observation matrix (or model) that tells you how the state maps into whatever your sensor actually measures.

If your state is $[pos, vel]^{T}$ but your sensor only reads position, then

H = \begin{bmatrix} 1 & 0 \end{bmatrix}

so that $z_{k} = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} pos_{k} \\ vel_{k} \end{bmatrix} = pos_{k}$ .

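In a camera case, the observation model would be the projection from the 3‑D pose into pixel coordinates. That projection is nonlinear, so it is a function $h (x_{k})$ rather than a constant matrix $H_{k}$ .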

$v_{k}$ is the measurement noise, capturing all the ways your sensor might err (electronic noise, quantization, feature‐extraction error, etc.). We model it as

v_{k} \sim N (0, R_{k}),

meaning zero‑mean Gaussian with covariance $R_{k}$ . The size and shape of $R_{k}$ encode how much trust you place in that sensor—and whether different measurements are correlated.
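The observation equation can be sketched for the position-only sensor as follows. This is an illustrative example using only the standard library; the function name `observe`, the state values, and the noise level are assumptions, and the scalar measurement-noise variance $R$ corresponds to `noise_std ** 2`.

```python
import random


def observe(H, x, noise_std=0.0, rng=None):
    """Scalar measurement z = H x + v, with v drawn from N(0, noise_std^2)."""
    rng = rng or random.Random()
    v = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    return sum(h * xi for h, xi in zip(H, x)) + v


H = [1.0, 0.0]                    # the sensor sees position only
x_true = [3.0, 0.5]               # [position, velocity], illustrative values

z_ideal = observe(H, x_true)      # noise-free reading: just the position
rng = random.Random(42)           # fixed seed so the run is repeatable
z_noisy = [observe(H, x_true, noise_std=0.1, rng=rng) for _ in range(5)]
print(z_ideal, z_noisy)
```

The noisy readings scatter around the true position; a larger `noise_std` (a larger $R$) means each individual reading deserves less trust.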

Example of System State-Space Equations

Suppose we have a small car equipped with:

A single‑line laser rangefinder that measures its distance from the starting point, and
A wheel encoder that measures the wheel’s speed.

Our goal is to estimate the car’s state, so we can write its state-space model like so:

State Equation

We want to arrive at something in this form

x_{k} = A x_{k - 1} + B u_{k - 1} + w_{k - 1}

Start: Recall the discrete‐time kinematic update:

x_{k} = x_{k - 1} + (true velocity) \times Δ t

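but we only measure velocity with noise. Let

$v_{k - 1}$ be the measured velocity at time $k - 1$ (not to be confused with the measurement noise $v_{k}$ of the observation equation), and
$w_{k - 1}$ be the zero‐mean process noise on that velocity, so that

v_{true} = v_{k - 1} - w_{k - 1} .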

Plug that into the kinematic step:

x_{k} = x_{k - 1} + v_{true} Δ t = x_{k - 1} + (v_{k - 1} - w_{k - 1}) Δ t .

After distributing the $Δ t$ ,

x_{k} = x_{k - 1} + v_{k - 1} Δ t - w_{k - 1} Δ t

We can then match it term by term to

x_{k} = A x_{k - 1} + B u_{k - 1} + W w_{k - 1},

with the measured velocity as the control input, $u_{k - 1} = v_{k - 1}$ :

since $x_{k - 1} = A x_{k - 1}$ , it follows that $A = 1$ ,
likewise $B = Δ t$ and $W = - Δ t$ ,
and $w_{k - 1} \sim N (0, Q)$ .

Note: You always start with your true (possibly nonlinear) update, then group all deterministic terms into $A x + B u$ and bundle every approximation or uncertainty into the additive “noise” term $W w$ .

Observation Equation

z_{k} = H x_{k} + v_{k}

In our running example the sensor reads position directly, so

z_{k} = x_{k} + v_{k}, v_{k} \sim N (0, R),

and hence $H = 1$ .

Note: the starting point for your observation equation always comes from your sensor model, in other words from how the sensor actually measures the thing you care about. In general you write:

z_{k} = h (x_{k}) + v_{k},

where:

$h (x_{k})$ is the true (possibly nonlinear) mapping from state → sensor reading,
$v_{k}$ is the additive measurement noise.

Summary

$A = 1$ : With no inputs and no noise you’d stay in place.
$B = Δ t$ : Converts encoder‐measured velocity into position change.
$w$ : Process‐noise on the motion model (e.g. wheel slip) enters through $W = - Δ t$ .
$H = 1$ : The laser directly measures position.
$v$ : Measurement‐noise on the laser reading.
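The car model can be simulated in a few lines. The sketch below is only an illustration: the function names, the seed, and every numeric value are assumptions, and the encoder reading is modelled as the true velocity plus zero-mean noise, matching $v_{true} = v_{k - 1} - w_{k - 1}$.

```python
import random


def simulate_car(n_steps, dt, true_velocity, encoder_std, laser_std, seed=0):
    """Simulate the 1D car; returns (true positions, encoder readings, laser readings)."""
    rng = random.Random(seed)
    x = 0.0
    positions, encoder, laser = [], [], []
    for _ in range(n_steps):
        u = true_velocity + rng.gauss(0.0, encoder_std)  # encoder: measured = true + w
        x = x + true_velocity * dt                       # true motion (A = 1)
        z = x + rng.gauss(0.0, laser_std)                # laser: z = H x + v with H = 1
        positions.append(x)
        encoder.append(u)
        laser.append(z)
    return positions, encoder, laser


def dead_reckon(encoder, dt, x0=0.0):
    """Integrate encoder velocities only: x_k = x_{k-1} + dt * u_{k-1} (B = dt)."""
    x, out = x0, []
    for u in encoder:
        x = x + dt * u
        out.append(x)
    return out


positions, encoder, laser = simulate_car(100, 0.1, 1.0, encoder_std=0.2, laser_std=0.5)
drift = dead_reckon(encoder, 0.1)[-1] - positions[-1]
print(f"dead-reckoning error after 100 steps: {drift:.3f} m")
```

Integrating the encoder alone accumulates the velocity noise step after step, while each laser reading is noisy but does not drift. That complementary behaviour is what the filter exploits.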

Note: In the setup above, we put the wheel‑encoder measurement into the state equation (as the control input $u$ ) and the laser measurement into the observation equation. You might wonder: aren’t both sensors just making observations? Why not treat the wheel encoder like any other measurement? In fact, these two choices are mathematically equivalent.

Wheel encoder in the state equation:
- You model its noise as part of the process noise $w$ acting on the control input $u$ .
Wheel encoder in the observation equation:
- You set $u = 0$ in the state equation and propagate velocity purely via the previous state, then treat the encoder reading as a second measurement $z$ that constrains the velocity in the observation equation.

Both formulations impose the same information on the filter; they simply differ in whether you view the encoder as an “input” or a “measurement.”

Kalman Filter

What does a Kalman filter do?

Despite its name, the Kalman Filter isn’t really a “filter” in the signal‐processing sense but a state estimator. Its job is to combine your state equation

x_{k} = A x_{k - 1} + B u_{k - 1} + w_{k - 1}, w_{k - 1} \sim N (0, Q)

and your observation equation

z_{k} = H x_{k} + v_{k}, v_{k} \sim N (0, R)

to produce the statistically optimal estimate of the true state $x_{k}$ (i.e. the minimum‑variance unbiased estimate).

Two Key Assumptions

Linearity: Both the state‐transition model and the observation model must be linear in the state variable, hence the constant matrices $A$ and $H$ .
Gaussian Noise: Both the process noise $w$ and the measurement noise $v$ are assumed zero‑mean Gaussian.

Only under these two conditions does the algebraic derivation of the Kalman Gain guarantee that the fused estimate achieves the minimum possible error covariance in a purely statistical sense.

Kalman Filter Intuition

We have two noisy estimates of the same state:

$\hat{x}_{k}$ from the model (state equation), and
$x_{k}^{(m)}$ from the sensor (observation equation).

We fuse them exactly as we did when averaging two coin‐diameter measurements, choosing weights to minimize the final variance.

Derivation

State equation (the “calculated” or predicted estimate)

x_{k} = A x_{k - 1} + B u_{k - 1} + w_{k - 1} .

Since $w_{k - 1}$ is zero‐mean, we replace it by its expectation (zero) when forming the prior estimate:

\hat{x}_{k} = A x_{k - 1} + B u_{k - 1} .

Observation equation (the “measured” estimate)

z_{k} = H x_{k} + v_{k} .

Likewise, $v_{k}$ has zero mean, so to get a direct state estimate from the measurement we momentarily ignore $v_{k}$ and invert $H$ :

x_{k}^{(m)} = H^{- 1} z_{k} .

Why “zero out” the noises and invert $H$ ?

Zero‐mean noise: In reality $w$ and $v$ aren’t zero, but we only know their statistics. If we had the actual noise values we wouldn’t need a filter; we’d just plug them in. Hence in the derivation we substitute the mean of each noise term (zero).
Inverting $H$ : This step is purely for intuition, to view the sensor’s reading as a “direct” state estimate. The final Kalman‐filter update never explicitly computes $H^{- 1}$ ; instead it uses a gain matrix that blends prediction and measurement optimally.

Applying the data‑fusion idea, we simply form a weighted average of the predicted state and the measured state, just as in the coin example:

z_{1} = 24 mm, σ_{1} = 1 mm; z_{2} = 25.003 mm, σ_{2} = 0.001 mm

z_{e} = 0.5 z_{1} + 0.5 z_{2} ⟹ z_{e} = (1 - k) z_{1} + k z_{2}

σ_{e}^{2} = Var [(1 - k) z_{1} + k z_{2}] = (1 - k)^{2} σ_{1}^{2} + k^{2} σ_{2}^{2}

\frac{d σ _{e}^{2}}{d k} = - 2 (1 - k) σ_{1}^{2} + 2 k σ_{2}^{2} = 0 ⟹ k = \frac{σ _{1}^{2}}{σ _{1}^{2} + σ _{2}^{2}}

\tilde{x}_{k} = (I - G) \hat{x}_{k} + G x_{k}^{m}

Kalman Filter Task: Using the State equation and Observation equation to get the optimal state estimation

State equation (calculated):

\hat{x}_{k} = A x_{k - 1} + B u_{k - 1}

Observation equation (measured):

z_{k} = H x_{k}^{m} ⟹ x_{k}^{m} = H^{- 1} z_{k}

Algebraically manipulating that weighted‑average formula eliminates the need to explicitly invert $H$ :

We start with the weighted-average fusion:

\tilde{x}_{k} = (I - G) \hat{x}_{k} + G x_{k}^{m}

$\hat{x}_{k}$ is your prior (predicted) state,
$x_{k}^{m}$ is your measured state (i.e. $H^{- 1} z_{k}$ ),
$G$ is the matrix of weights you’ll choose (analogous to the scalar $k$ in the coin example),
$I - G$ is the complementary weight on the prior.

We then substitute in the direct measurement form, since $x_{k}^{m} = H^{- 1} z_{k}$ ,

\tilde{x}_{k} = (I - G) \hat{x}_{k} + G (H^{- 1} z_{k}) .

Regroup to expose the “innovation” $(z_{k} - H \hat{x}_{k})$ . Expand $(I - G) \hat{x}_{k}$ and collect the terms multiplied by $G$ :

\tilde{x}_{k} = (I - G) \hat{x}_{k} + G H^{- 1} z_{k} = \hat{x}_{k} - G \hat{x}_{k} + G H^{- 1} z_{k} = \hat{x}_{k} + G (H^{- 1} z_{k} - \hat{x}_{k})

Notice that $H^{- 1} z_{k} - \hat{x}_{k} = H^{- 1} (z_{k} - H \hat{x}_{k})$ , so

\tilde{x}_{k} = \hat{x}_{k} + G H^{- 1} (z_{k} - H \hat{x}_{k}) .

The vector $(z_{k} - H \hat{x}_{k})$ is called the innovation or measurement residual: it’s how much the actual sensor reading $z_{k}$ differs from what your prediction $H \hat{x}_{k}$ would imply.

Define the Kalman gain $K$ , let

K = G H^{- 1} .

Then the update becomes the classic Kalman‐filter form:

\tilde{x}_{k} = \hat{x}_{k} + K (z_{k} - H \hat{x}_{k}) .

$\hat{x}_{k}$ is your prior,
$z_{k} - H \hat{x}_{k}$ is the innovation,
$K$ is the gain that determines how much of that innovation you trust.

Summary

You start by imagining “let’s just weight‐average” prediction vs. (inverted) measurement.
Realize that weighting the direct measurement $H^{- 1} z_{k}$ is equivalent to adding a correction proportional to the difference between what you saw $(z_{k})$ and what you expected $(H \hat{x}_{k})$ .
That proportionality matrix is exactly the Kalman gain $K$ .

Choosing $K$ to minimize the posterior covariance is what makes the Kalman filter “optimal” in the minimum‑variance sense.
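The equivalence of the two forms is easy to confirm numerically in the scalar case. The sketch below uses made-up numbers and an invertible scalar $H$ (here a sensor that reports twice the state); the function names are illustrative only.

```python
def fuse_weighted(x_prior, z, H, G):
    """Weighted-average form: (1 - G) * x_prior + G * (z / H)."""
    return (1 - G) * x_prior + G * (z / H)


def fuse_innovation(x_prior, z, H, K):
    """Innovation form: x_prior + K * (z - H * x_prior)."""
    return x_prior + K * (z - H * x_prior)


x_prior, z, H, G = 10.0, 21.0, 2.0, 0.25   # illustrative numbers only
K = G / H                                   # K = G H^{-1} in the scalar case
print(fuse_weighted(x_prior, z, H, G), fuse_innovation(x_prior, z, H, K))
```

Both calls give the same posterior, which is why the filter can work with $K$ directly and never needs $H^{- 1}$ in the final update.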

By applying the data‐fusion idea, our ultimate goal is to find a suitable weighting coefficient so that the final weighted‐average result has the smallest variance. For a multivariate random variable, this means minimizing the trace of its covariance matrix, as shown below.

Posterior update (weighted‐average fusion)

\tilde{x}_{k} = \hat{x}_{k} + K (z_{k} - H \hat{x}_{k}) .

$\hat{x}_{k}$ : prior (predicted) state
$z_{k}$ : new measurement
$H$ : observation matrix
$K$ : gain matrix (to be determined)

Posterior covariance: Denote the resulting covariance of $\tilde{x}_{k}$ by $\tilde{P}_{k}$ .
Optimization criterion: $K = \arg\min_{K} tr (\tilde{P}_{k})$ , i.e. choose $K$ to make the trace of the posterior covariance as small as possible.
Covariances in play:
- Prediction covariance: $\hat{P}_{k} = A \tilde{P}_{k - 1} A^{T} + Q$
- Measurement covariance: $R$

Putting it all together, the Kalman Gain $K$ is the weight that minimizes $tr (\tilde{P}_{k})$ .

Note:

The covariance of the predicted state, $\hat{P}_{k}$ , is obtained by propagating the previous covariance through the state equation.
The covariance for the observation equation, since we haven’t transformed measurements from observation space into state space (i.e. we didn’t compute any pseudo‑inverse of $H$ ), is simply the observation noise covariance $R$ .

Prediction covariance derivation: We have the true state update

x_{k} = A x_{k - 1} + B u_{k - 1} + w_{k - 1}, w_{k - 1} \sim N (0, Q),

and the prior (predicted) estimate

\hat{x}_{k} = A \tilde{x}_{k - 1} + B u_{k - 1} .

Define the prediction (prior) error at time $k$ and the posterior error at time $k - 1$ :

ε_{k} = x_{k} - \hat{x}_{k}, \tilde{ε}_{k - 1} = x_{k - 1} - \tilde{x}_{k - 1} .

Propagate the error by substituting $x_{k}$ from the state equation:

ε_{k} = (A x_{k - 1} + B u_{k - 1} + w_{k - 1}) - (A \tilde{x}_{k - 1} + B u_{k - 1}) = A (x_{k - 1} - \tilde{x}_{k - 1}) + w_{k - 1} = A \tilde{ε}_{k - 1} + w_{k - 1} .

Compute the prior covariance. In general, for any random column vector $y$ we define its covariance as

Cov (y) = E [(y - E [y]) (y - E [y])^{T}] .

Under the usual Kalman assumptions (zero‐mean process and measurement noise, and an unbiased filter), the prediction error $ε_{k}$ itself has zero mean, i.e. $E [ε_{k}] = 0$ . Hence its covariance simplifies to

Cov (ε_{k}) = E [ε_{k} ε_{k}^{T}],

and we arrive at

\hat{P}_{k} = Cov (ε_{k}) = E [ε_{k} ε_{k}^{T}] .

Using $ε_{k} = A \tilde{ε}_{k - 1} + w_{k - 1}$ , the fact that $\tilde{ε}_{k - 1}$ and $w_{k - 1}$ are uncorrelated (so the cross terms vanish), $E [\tilde{ε}_{k - 1} \tilde{ε}_{k - 1}^{T}] = \tilde{P}_{k - 1}$ , and $Cov (w_{k - 1}) = Q$ , we get

\hat{P}_{k} = E [(A \tilde{ε}_{k - 1} + w_{k - 1}) (A \tilde{ε}_{k - 1} + w_{k - 1})^{T}] = A E [\tilde{ε}_{k - 1} \tilde{ε}_{k - 1}^{T}] A^{T} + E [w_{k - 1} w_{k - 1}^{T}] = A \tilde{P}_{k - 1} A^{T} + Q .
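As a concrete illustration of the propagation rule $\hat{P}_{k} = A \tilde{P}_{k - 1} A^{T} + Q$ , here is a small pure-Python sketch for the two-state constant-velocity model. The helper names and the numeric values of $Δ t$ , $\tilde{P}_{k - 1}$ and $Q$ are assumptions picked only for the example.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def propagate_covariance(A, P_post, Q):
    """Prior covariance P_prior = A P_post A^T + Q."""
    APAt = matmul(matmul(A, P_post), transpose(A))
    return [[APAt[i][j] + Q[i][j] for j in range(len(Q))] for i in range(len(Q))]


dt = 0.5
A = [[1.0, dt], [0.0, 1.0]]            # constant-velocity transition matrix
P_post = [[1.0, 0.0], [0.0, 1.0]]      # previous posterior covariance (illustrative)
Q = [[0.01, 0.0], [0.0, 0.01]]         # process-noise covariance (illustrative)
P_prior = propagate_covariance(A, P_post, Q)
print(P_prior)
```

The position variance grows beyond what $Q$ alone adds, because velocity uncertainty leaks into position through the off-diagonal $Δ t$ term of $A$ .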

Measurement covariance derivation: Your observation equation is

z_{k} = H x_{k} + v_{k}, v_{k} \sim N (0, R) .

Here, $R$ is defined to be the covariance of the measurement noise $v_{k}$ . Thus any time you fuse or weight in a new measurement, the amount of uncertainty that measurement brings in is exactly $R$ .

Kalman Gain

The derivation is very complicated; for the full derivation see https://www.bilibili.com/video/BV1hC4y1b7K7/?spm_id_from=333.788&vd_source=1363e3b30e51ca9984f82492949f865b or https://blog.csdn.net/qq_42731705/article/details/129423983

For now we just use the result:

K = \frac{\hat{P}_{k} H^{T}}{H \hat{P}_{k} H^{T} + R} ⟺ K = \hat{P}_{k} H^{T} (H \hat{P}_{k} H^{T} + R)^{- 1} .

This choice of $K$ is exactly the weight matrix that minimizes the trace of the posterior covariance.

Posterior Covariance Update

\tilde{P}_{k} = (I - K H) \hat{P}_{k} .

Together, these two equations complete the Kalman‐filter “measurement‐update” step:

You compute $K$ from your prior covariance $\hat{P}_{k}$ , the observation model $H$ , and the measurement‐noise covariance $R$ .
You then update the covariance to $\tilde{P}_{k}$ , which is guaranteed to be the minimum‐variance (i.e. minimum trace) posterior under the linear‑Gaussian assumptions.
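As a sanity check (a worked special case, not the full derivation), take the scalar case with $H = 1$ and identify the prior variance $\hat{P}_{k}$ with $σ_{1}^{2}$ and the measurement variance $R$ with $σ_{2}^{2}$ . The gain then reduces to the weight $k$ from the coin example, and the posterior variance matches the fused variance found there:

```latex
\begin{aligned}
K &= \frac{\hat{P}_{k}}{\hat{P}_{k} + R}
   = \frac{\sigma_{1}^{2}}{\sigma_{1}^{2} + \sigma_{2}^{2}} = k, \\
\tilde{P}_{k} &= (1 - K)\,\hat{P}_{k}
   = \frac{\sigma_{1}^{2}\,\sigma_{2}^{2}}{\sigma_{1}^{2} + \sigma_{2}^{2}}
   = (1 - k)^{2}\sigma_{1}^{2} + k^{2}\sigma_{2}^{2}.
\end{aligned}
```

Since $σ_{1}^{2} σ_{2}^{2} / (σ_{1}^{2} + σ_{2}^{2})$ is smaller than both $σ_{1}^{2}$ and $σ_{2}^{2}$ , the fused estimate is never less certain than either source alone.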

Summary

State‐Equation Prediction: Using the previous time step’s state and the system’s state‐space equation, we compute a prediction of the current state. At the same time, we update the covariance matrix of that predicted state.

Definitions of all symbols assume $x_{k} \in R^{n}$ , $u_{k} \in R^{m}$ , $z_{k} \in R^{p}$ .

(1) Prior estimation

\hat{x}_{k} = A \tilde{x}_{k - 1} + B u_{k - 1},

$\hat{x}_{k} \in R^{n}$ : prior (predicted) state estimate at time $k$ .
$A \in R^{n \times n}$ : state‑transition matrix.
$\tilde{x}_{k - 1} \in R^{n}$ : posterior state estimate at time $k - 1$ .
$B \in R^{n \times m}$ : control‑input matrix.
$u_{k - 1} \in R^{m}$ : control (input) vector at time $k - 1$ .

(2) Prior covariance

\hat{P}_{k} = A \tilde{P}_{k - 1} A^{T} + Q .

$\hat{P}_{k} \in R^{n \times n}$ : prior error‐covariance at time $k$ .
$\tilde{P}_{k - 1} \in R^{n \times n}$ : posterior covariance at time $k - 1$ .
$Q \in R^{n \times n}$ : process‐noise covariance.

Observation‐Equation Update: First compute the Kalman Gain $K$ . Then treat $K$ as the weight to fuse the predicted state with the sensor’s measurement, yielding the optimal state estimate.

(3) Kalman Gain

K = \frac{\hat{P}_{k} H^{T}}{H \hat{P}_{k} H^{T} + R} ⟺ K = \hat{P}_{k} H^{T} (H \hat{P}_{k} H^{T} + R)^{- 1}

$K \in R^{n \times p}$ : Kalman‐gain matrix.
$\hat{P}_{k} \in R^{n \times n}$ : prior covariance (from (2)).
$H \in R^{p \times n}$ : observation matrix.
$H^{T} \in R^{n \times p}$ : transpose of $H$ .
$R \in R^{p \times p}$ : measurement‐noise covariance.

(4) Posterior (updated) state estimate

\tilde{x}_{k} = \hat{x}_{k} + K (z_{k} - H \hat{x}_{k})

$\tilde{x}_{k} \in R^{n}$ : posterior (updated) state estimate at time $k$ .
$\hat{x}_{k} \in R^{n}$ : prior state estimate (from (1)).
$K \in R^{n \times p}$ : Kalman gain (from (3)).
$z_{k} \in R^{p}$ : measurement vector at time $k$ .
$H \in R^{p \times n}$ : observation matrix.

(5) Posterior covariance

\tilde{P}_{k} = (I - K H) \hat{P}_{k}

$\tilde{P}_{k} \in R^{n \times n}$ : posterior error‐covariance at time $k$ .
$I \in R^{n \times n}$ : identity matrix.
$K \in R^{n \times p}$ : Kalman gain (from (3)).
$H \in R^{p \times n}$ : observation matrix.
$\hat{P}_{k} \in R^{n \times n}$ : prior covariance (from (2)).

Notice that the prediction is in units of the system state, while the measurement is in the sensor’s native units. Therefore the Kalman Gain $K$ carries physical units: it has the same units as $H^{- 1}$ .
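The five steps can be put together in a scalar filter for the car example ( $A = 1$ , $B = Δ t$ , $H = 1$ ). This is a minimal sketch rather than a reference implementation: the function name `kalman_step`, the sensor readings, and the values of $Q$ and $R$ are all made up for illustration, and `Q` here stands for the variance of the whole additive process-noise term on the position.

```python
def kalman_step(x_post, P_post, u, z, A, B, H, Q, R):
    """One predict/update cycle of a scalar Kalman filter, following steps (1)-(5)."""
    x_prior = A * x_post + B * u                 # (1) prior state estimate
    P_prior = A * P_post * A + Q                 # (2) prior covariance
    K = P_prior * H / (H * P_prior * H + R)      # (3) Kalman gain
    x_new = x_prior + K * (z - H * x_prior)      # (4) posterior state estimate
    P_new = (1 - K * H) * P_prior                # (5) posterior covariance
    return x_new, P_new, K


# Car example. All numeric values below are hypothetical.
dt, Q, R = 0.1, 0.01, 0.25
x_est, P_est = 0.0, 1.0                  # initial guess and its variance
encoder = [1.0, 1.1, 0.9, 1.0]           # encoder velocities (m/s), used as control u
laser = [0.12, 0.18, 0.33, 0.41]         # laser ranges (m), used as measurement z
for u, z in zip(encoder, laser):
    x_est, P_est, K = kalman_step(x_est, P_est, u, z, A=1.0, B=dt, H=1.0, Q=Q, R=R)
    print(f"x = {x_est:.4f}, P = {P_est:.4f}, K = {K:.4f}")
```

For vector states the same five lines carry over with matrix products, transposes, and a matrix inverse in step (3); the scalar version is kept here because it maps one-to-one onto the equations above.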
