Oldies but goldies: MMSE estimator

In signal processing, a classic problem consists in estimating a signal, in the form of a complex column vector {x\in\mathbb{C}^{n_x}}, by observing a related signal {y\in \mathbb{C}^{n_y}}, which is produced by multiplying the unknown {x} by a known matrix {H\in \mathbb{C}^{n_y \times n_x}} and adding noise {n\in\mathbb{C}^{n_y}}:

\displaystyle y = H x + n. \ \ \ \ \ (1)

1. Assumptions

The random signal {x} and noise {n} are independent of each other and Gaussian distributed with zero mean and covariance matrices {\Sigma_x:=\mathbb{E}[x x^{\dagger}]} and {\Sigma_n:=\mathbb{E}[n n^{\dagger}]}, respectively, where {(\cdot)^\dagger} denotes the conjugate transpose. In symbols, {x\sim \mathcal{CN}(0,\Sigma_x)} and {n\sim \mathcal{CN}(0,\Sigma_n)}. To simplify computations, we further assume the signal components to be uncorrelated and of unit power, i.e., {\Sigma_x=I}.
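For concreteness, here is a minimal NumPy sketch of the model and the assumptions above; the dimensions, the seed, and the way {\Sigma_n} is built are illustrative choices, not prescribed by this post. It draws one realization of {x}, {n}, and {y=Hx+n}:

import numpy as np

rng = np.random.default_rng(0)
n_x, n_y = 4, 6                      # illustrative sizes

def crandn(*shape):
    # Circularly-symmetric complex Gaussian CN(0, I): real and imaginary
    # parts are independent, each with variance 1/2.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(n_y, n_x)                 # known mixing matrix
x = crandn(n_x)                      # unknown signal, Sigma_x = I

# An arbitrary Hermitian, positive-definite noise covariance Sigma_n.
A = crandn(n_y, n_y)
Sigma_n = A @ A.conj().T + 0.1 * np.eye(n_y)
# Color white noise with a Cholesky factor so that E[n n^dagger] = Sigma_n.
n = np.linalg.cholesky(Sigma_n) @ crandn(n_y)

y = H @ x + n                        # the observation of eq. (1)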

2. MMSE estimate

The minimum mean squared error (MMSE) estimate is the function {\widehat{x}^{\mathrm{MMSE}}} that guesses the unknown {x} upon observing {y} while minimizing the squared estimation error {\left\| x - \widehat{x}(y) \right\|^2}, computed in expectation over all possible realizations of {x} and {y}:

\displaystyle \widehat{x}^{\mathrm{MMSE}} := \mathrm{argmin}_{\widehat{x}:\mathbb{C}^{n_y}\rightarrow \mathbb{C}^{n_x}} \mathbb{E}_{x,y} \left \| x - \widehat{x}(y) \right \|^2. \ \ \ \ \ (2)

As already discussed in another post, the MMSE estimate corresponds to the conditional mean of the unknown {x} given the observation {y}:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(y)=\mathbb{E}[x|y]. \ \ \ \ \ (3)

Intuitively, the best (in the MMSE sense) guess for the unknown {x} is its expected value, conditioned on the observation.

2.1. A more specific formula

Since {x} and {y} are jointly Gaussian with zero mean, the MMSE estimate {\widehat{x}^{\mathrm{MMSE}}} can be computed explicitly in terms of the cross-covariance {\Sigma_{x,y}:=\mathbb{E}[x y^{\dagger}]} and the covariance {\Sigma_{y}:=\mathbb{E}[y y^{\dagger}]} as:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(y)=\Sigma_{x,y} \Sigma_{y}^{-1} y \ \ \ \ \ (4)

We can further specialize the expression above by observing that:

\displaystyle \Sigma_{x,y} = \mathbb{E}\left[ x y^{\dagger} \right] = \mathbb{E}\left[ x x^{\dagger} H^{\dagger} + x n^{\dagger} \right] = \Sigma_x H^{\dagger} = H^{\dagger}, \ \ \ \ \ (5)

\displaystyle \Sigma_{y} = \mathbb{E} \left[ y y^{\dagger} \right] = \mathbb{E} \left[ (Hx+n)(x^{\dagger}H^{\dagger} + n^{\dagger}) \right] = H \Sigma_x H^\dagger + \Sigma_n = H H^\dagger + \Sigma_n. \ \ \ \ \ (6)

Note that in both expressions above we exploited the fact that noise and signal are uncorrelated, together with the assumption {\Sigma_x=I}. By plugging (5) and (6) into (4), we obtain:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(y)= H^{\dagger} \left( H H^\dagger + \Sigma_n \right)^{-1} y. \ \ \ \ \ (7)
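Continuing the NumPy sketch above, eq. (7) can be evaluated directly; a linear solve is used in place of an explicit matrix inverse, a standard numerical precaution rather than anything required by the formula:

Sigma_y = H @ H.conj().T + Sigma_n                    # eq. (6)
x_mmse = H.conj().T @ np.linalg.solve(Sigma_y, y)     # eq. (7)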

3. Let’s simplify our life by whitening the noise

To further simplify (7), it is convenient to whiten the noise, i.e., to work with an equivalent model where the noise covariance matrix is proportional to the identity. This can be achieved by pre-multiplying {y=Hx+n} by the matrix {\sigma_n\Sigma_n^{-1/2}} (which exists provided that {\Sigma_n} is positive definite, i.e., invertible, as we assume from now on), where {\sigma_n^2>0} is a reference noise power. Then, we obtain:

\displaystyle \widetilde{y} = \widetilde{H} x + \widetilde{n}

where {\widetilde{y}=\sigma_n\Sigma_n^{-1/2}y}, {\widetilde{H}=\sigma_n\Sigma_n^{-1/2}H}, {\widetilde{n}=\sigma_n\Sigma_n^{-1/2}n}. We can check that the equivalent noise {\widetilde{n}} is white by computing its covariance matrix:

\displaystyle \mathbb{E}\left[ \widetilde{n} \widetilde{n}^\dagger \right] = \mathbb{E}\left[ \sigma_n\Sigma_n^{-1/2}n n^\dagger\sigma_n \Sigma_n^{-1/2} \right]

\displaystyle \qquad = \sigma_n^2\Sigma_n^{-1/2} \Sigma_n \Sigma_n^{-1/2} = \sigma_n^2 \Sigma_n^{-1/2} \Sigma_n^{1/2} \Sigma_n^{1/2} \Sigma_n^{-1/2} = \sigma_n^2 I,

where we used the fact that {\Sigma_n^{-1/2}} is Hermitian, so that {(\Sigma_n^{-1/2})^{\dagger}=\Sigma_n^{-1/2}}.
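Continuing the sketch, the whitening matrix {\sigma_n\Sigma_n^{-1/2}} can be formed from the eigendecomposition of {\Sigma_n}; the value of sigma_n below is an arbitrary illustrative choice:

sigma_n = 1.0                                   # reference noise power (illustrative)
eigval, U = np.linalg.eigh(Sigma_n)             # Sigma_n = U diag(eigval) U^dagger
W = sigma_n * (U @ np.diag(1.0 / np.sqrt(eigval)) @ U.conj().T)   # sigma_n * Sigma_n^{-1/2}

y_t = W @ y                                     # whitened observation (tilde y)
H_t = W @ H                                     # whitened matrix (tilde H)

# Check that the equivalent noise covariance equals sigma_n^2 * I.
assert np.allclose(W @ Sigma_n @ W.conj().T, sigma_n**2 * np.eye(n_y))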

We can then apply (7) to the whitened model {\widetilde{y} = \widetilde{H} x + \widetilde{n}}, obtaining the simpler expression:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(\widetilde y)=\widetilde H^{\dagger} \left( \widetilde H \widetilde H^\dagger + \sigma_n^2 I \right)^{-1} \widetilde y \ \ \ \ \ (8)

By invoking the matrix inversion lemma, we can equivalently write:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(\widetilde y)= \left( \widetilde H^\dagger \widetilde H + \sigma_n^{2} I\right)^{-1} \widetilde H^{\dagger} \widetilde y \ \ \ \ \ (9)
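A quick numerical check, continuing the sketch above, confirms that (8) and (9) coincide and that both recover the estimate (7) computed on the original, unwhitened data:

x_hat_8 = H_t.conj().T @ np.linalg.solve(H_t @ H_t.conj().T + sigma_n**2 * np.eye(n_y), y_t)
x_hat_9 = np.linalg.solve(H_t.conj().T @ H_t + sigma_n**2 * np.eye(n_x), H_t.conj().T @ y_t)
assert np.allclose(x_hat_8, x_hat_9)     # matrix inversion (push-through) identity
assert np.allclose(x_hat_8, x_mmse)      # whitening does not change the estimate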

4. Corner (but illuminating) cases

To develop a better understanding of how the MMSE estimate works, it is interesting to investigate its behavior in a few extreme but important cases.

4.1. Negligible noise and {n_x=n_y}: Inverse

We start from the simplest sub-case: if the noise is negligible ({\sigma_n\downarrow 0}), the signal and the observation have the same size ({n_x=n_y}), and {\widetilde H} is invertible, then the MMSE estimate simply inverts {\widetilde H}:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(\widetilde y) \overset{ \sigma_n\downarrow 0}{\longrightarrow} \widetilde H^{-1} (\widetilde H^\dagger)^{-1} \widetilde H^{\dagger}\widetilde y=\widetilde H^{-1} \widetilde y. \ \ \ \ \ (10)
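As a sanity check within the running sketch, we can evaluate (9) with a square, invertible matrix, a noise-free observation, and a vanishingly small noise power, and compare it against plain inversion; the square matrix below is an illustrative choice, not part of the post:

H_sq = crandn(n_x, n_x)                  # square, invertible with probability 1
y_sq = H_sq @ x                          # noise-free observation
eps = 1e-6                               # "negligible" noise power
x_hat = np.linalg.solve(H_sq.conj().T @ H_sq + eps**2 * np.eye(n_x), H_sq.conj().T @ y_sq)
assert np.allclose(x_hat, np.linalg.solve(H_sq, y_sq))   # eq. (10): plain inverse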

4.2. Negligible noise: Pseudoinverse

A natural question arises: what happens if {n_x \ne n_y} instead, while the noise is still negligible? Provided that {\widetilde H} has full column rank (which requires {n_y \ge n_x}), (9) tends to:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(\widetilde y) \overset{\sigma_n\downarrow 0}{\longrightarrow} \left(\widetilde H^\dagger \widetilde H \right)^{-1} \widetilde H^{\dagger} \widetilde y \ \ \ \ \ (11)

where {\left(\widetilde H^\dagger \widetilde H \right)^{-1} \widetilde H^{\dagger}} is the (Moore-Penrose) pseudoinverse of {\widetilde H}.

In communication theory, this estimator is also known as the zero-forcing (ZF) receiver. In fact, if we pre-multiply the observation {\widetilde y} by {\left(\widetilde H^\dagger \widetilde H \right)^{-1} \widetilde H^{\dagger}}, then the {i}-th component of the result depends only on the {i}-th signal component {x_i} (plus noise), hence eliminating any interference among different signal components:

\displaystyle \left( \widetilde H^\dagger \widetilde H \right)^{-1} \widetilde H^{\dagger} \widetilde y = x + \left( \widetilde H^\dagger \widetilde H \right)^{-1} \widetilde H^{\dagger} \widetilde n. \ \ \ \ \ (12)
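In NumPy, the limiting estimator (11) is exactly what np.linalg.pinv computes when {\widetilde H} has full column rank; continuing the sketch with the tall matrix H_t defined earlier, and evaluating the MMSE formula (9) with a hypothetical, tiny noise power:

x_zf = np.linalg.solve(H_t.conj().T @ H_t, H_t.conj().T @ y_t)   # eq. (11), zero-forcing
assert np.allclose(x_zf, np.linalg.pinv(H_t) @ y_t)              # Moore-Penrose pseudoinverse

eps = 1e-6
x_small = np.linalg.solve(H_t.conj().T @ H_t + eps**2 * np.eye(n_x), H_t.conj().T @ y_t)
assert np.allclose(x_small, x_zf)        # (9) approaches the ZF solution as the noise vanishes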

4.3. Uni-dimensional signal: Matched filter

Let us turn to another simple corner case: we assume the signal {x} to be uni-dimensional ({n_x=1}), while the observation is still multi-dimensional. In this case, we can write the whitened model as {\widetilde y=\widetilde h x+\widetilde n}, where {\widetilde h:=\widetilde H} is now a column vector. The MMSE estimate (9) then boils down to:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(\widetilde y) = \frac{1}{\| \widetilde h \|^2 + \sigma_n^2} \, \widetilde h^\dagger \widetilde y \ \ \ \ \ (13)

The observation {\widetilde y} is projected onto {\widetilde h} and appropriately scaled. This estimator is known as the matched filter.

4.4. Orthonormal {\widetilde H}: Matched filter

Let us now expand the case above by returning to the original multi-dimensional signal {x}, while assuming the matrix {\widetilde H} to have orthonormal columns: {\widetilde H^\dagger \widetilde H=I}, i.e., its columns are pairwise orthogonal and have unit norm. In this case, the MMSE estimate becomes:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(\widetilde y) = \frac{1}{1+\sigma_n^{2}} \widetilde H^{\dagger} \widetilde y. \ \ \ \ \ (14)

In words, to estimate the {i}-th signal component {x_i}, the MMSE estimate projects the observation {\widetilde y} onto the {i}-th column of {\widetilde H} and scales the result appropriately. Notice that (14) reduces to (13) when {n_x=1}, in which case the single column {\widetilde h} has unit norm.
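The reduction from (9) to (14) is easy to verify numerically; in the sketch below, a matrix with orthonormal columns is obtained from a QR decomposition, an illustrative construction rather than anything prescribed by the post:

Q, _ = np.linalg.qr(crandn(n_y, n_x))          # Q^dagger Q = I (orthonormal columns)
y_q = Q @ x + sigma_n * crandn(n_y)            # whitened-model observation
x_hat_general = np.linalg.solve(Q.conj().T @ Q + sigma_n**2 * np.eye(n_x), Q.conj().T @ y_q)
x_hat_matched = Q.conj().T @ y_q / (1.0 + sigma_n**2)             # eq. (14)
assert np.allclose(x_hat_general, x_hat_matched)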

4.5. Noise is overwhelming: No info

From (13) and (14) it is apparent that, when the noise drowns the signal, the MMSE estimate tends to zero, whatever the observed signal is:

\displaystyle \widehat{x}^{\mathrm{MMSE}}(\widetilde y) \overset{ \sigma_n\uparrow \infty}{\longrightarrow} 0. \ \ \ \ \ (15)

This behavior is supported by intuition: when {\widetilde y} does not bring any useful information on the signal {x}, the best estimate for {x} is our prior (unconditional) mean {\mathbb{E}[x]=0}.
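Finally, the shrinking effect of a growing noise power can be seen directly by evaluating the formula (9) for increasing hypothetical values of {\sigma_n} within the running sketch:

for s in (1.0, 10.0, 100.0, 1000.0):
    x_hat_s = np.linalg.solve(H_t.conj().T @ H_t + s**2 * np.eye(n_x), H_t.conj().T @ y_t)
    print(s, np.linalg.norm(x_hat_s))          # the norm shrinks toward 0 as s grows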


