
Machine Learning: The Gaussian Distribution (Part 1): Maximum Likelihood Estimation and Unbiased Estimation

Tip

This article has been checked by Deepseek and verified to be error-free.

Notation

  1. Data matrix: $X = \begin{pmatrix} x_1 & x_2 & \cdots & x_N \end{pmatrix}^{\top}_{N \times P}$ (one sample per row)

  2. Assumptions:

    • $x_i \in \mathbb{R}^P$
    • $x_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(\mu, \Sigma)$
  3. Parameters: $\theta = (\mu, \Sigma)$

  4. Maximum likelihood estimate: $\theta_{\mathrm{MLE}} = \underset{\theta}{\operatorname{argmax}}\, \mathcal{P}(X \mid \theta)$
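
To make this setup concrete, here is a minimal NumPy sketch that draws a dataset matching the assumptions above for the one-dimensional case used throughout this post; the values of `N`, `mu`, and `sigma` are illustrative assumptions, not part of the derivation:

```python
import numpy as np

# Illustrative setup for the 1-D case (P = 1): draw N i.i.d. samples
# x_i ~ N(mu, sigma^2). N, mu, sigma are assumed values for this sketch.
rng = np.random.default_rng(0)
N, mu, sigma = 1000, 2.0, 1.5
X = rng.normal(loc=mu, scale=sigma, size=N)  # shape (N,), one sample per entry
```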

Maximum Likelihood Derivation (One-Dimensional Case)

Note

Biasedness and unbiasedness are proven in the next section.

P=1P=1θ=(μ,σ2)\theta=(\mu,\sigma^2) , p(x)=12πσ2exp((xμ)22σ2)p(x)=\frac{1}{\sqrt{2\pi}\sigma ^2}\exp(-\frac{(x-\mu)^2}{2\sigma^2}) 。由于 xii.i.d.N(μ,Σ)x_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(\mu, \Sigma) , 有 logP(Xθ)=logi=1Np(xi)\log{\mathcal{P}(X|\theta)} = \log{\prod \limits_{i=1}^N{\mathcal{p}(x_i)}} ,可做如下推导:

$$
\begin{align}
\log{\mathcal{P}(X \mid \theta)} & = \log{\prod_{i=1}^N p(x_i \mid \theta)} \notag \\
& = \sum_{i=1}^N \log p(x_i \mid \theta) \notag \\
& = \sum_{i=1}^N \log\!\left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right) \notag \\
& = \sum_{i=1}^N \left[\log{\frac{1}{\sqrt{2\pi}}} - \log{\sigma} - \frac{(x_i-\mu)^2}{2\sigma^2}\right] \tag{1}
\end{align}
$$
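
As a quick sanity check on (1), the closed form can be compared against a library log-density. A minimal sketch, assuming NumPy/SciPy and illustrative parameter values:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, mu, sigma = 1000, 2.0, 1.5          # illustrative values
X = rng.normal(mu, sigma, size=N)

# Equation (1): sum_i [ log(1/sqrt(2*pi)) - log(sigma) - (x_i-mu)^2/(2*sigma^2) ]
ll_eq1 = np.sum(np.log(1 / np.sqrt(2 * np.pi))
                - np.log(sigma)
                - (X - mu) ** 2 / (2 * sigma ** 2))

# Reference: sum of Gaussian log-densities computed by SciPy
ll_ref = norm.logpdf(X, loc=mu, scale=sigma).sum()

assert np.isclose(ll_eq1, ll_ref)
```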

Deriving $\mu_{\mathrm{MLE}}$

From (1), $\mu_{\mathrm{MLE}}=\underset{\mu}{\operatorname{argmax}}\,\mathcal{P}(X \mid \theta)=\underset{\mu}{\operatorname{argmin}} \sum_{i=1}^N (x_i-\mu)^2$. Setting $\frac{\partial}{\partial\mu} \sum_{i=1}^N (x_i-\mu)^2 = 0$:

$$
\begin{align}
\frac{\partial}{\partial\mu} \sum_{i=1}^N (x_i-\mu)^2 & = \sum_{i=1}^N -2(x_i-\mu) = 0 \notag \\
\Rightarrow \mu_{\mathrm{MLE}} & = \frac{1}{N} \sum_{i=1}^N x_i \tag{2}
\end{align}
$$
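
Equation (2) is easy to verify numerically: a brute-force search over candidate values of $\mu$ should land on the sample mean. A small sketch with assumed illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.5, size=1000)     # illustrative data

mu_mle = X.mean()                       # closed form from (2)

# Brute force: the sum of squares is minimized at the sample mean
grid = np.linspace(mu_mle - 1.0, mu_mle + 1.0, 2001)
losses = [np.sum((X - m) ** 2) for m in grid]
assert abs(grid[np.argmin(losses)] - mu_mle) < 1e-3
```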

Here $\mu_{\mathrm{MLE}}$ is unbiased.

Deriving $\sigma^2_{\mathrm{MLE}}$

Similarly, from (1):

$$
\begin{align}
\sigma^2_{\mathrm{MLE}} & = \underset{\sigma}{\operatorname{argmax}}\,\mathcal{P}(X \mid \theta) \notag \\
& = \underset{\sigma}{\operatorname{argmax}} \underset{\mathcal{L}}{\underbrace{\sum_{i=1}^N \left[-\log{\sigma} - \frac{(x_i-\mu_{\mathrm{MLE}})^2}{2\sigma^2}\right]}} \notag \\
\frac{\partial \mathcal{L}}{\partial \sigma} & = \sum_{i=1}^N \left[-\frac{1}{\sigma} + (x_i-\mu_{\mathrm{MLE}})^2\,\sigma^{-3}\right] = 0 \notag \\
& \Rightarrow \sigma^2_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^N (x_i-\mu_{\mathrm{MLE}})^2 \tag{3}
\end{align}
$$
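
Note that (3) is exactly the $\frac{1}{N}$-normalized sample variance, which NumPy computes with `ddof=0`. A minimal check, with assumed illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.5, size=1000)        # illustrative data

sigma2_mle = np.mean((X - X.mean()) ** 2)  # equation (3)

# np.var with ddof=0 (the default) uses the same 1/N normalization
assert np.isclose(sigma2_mle, np.var(X, ddof=0))
```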

Here $\sigma^2_{\mathrm{MLE}}$ is biased.

On Unbiased Estimation

Definition

E[T(x)]=T(x)\mathop{\mathbb{E}}[T(x)]=T(x) ,则无偏,否则有偏。

Proof That $\mu_{\mathrm{MLE}}$ Is Unbiased

$$
\begin{aligned}
\mathbb{E}[\mu_{\mathrm{MLE}}] & = \mathbb{E}\!\left[\frac{1}{N}\sum_{i=1}^N x_i\right] \\
& = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[x_i] \\
& = \frac{1}{N}\sum_{i=1}^N \mu \quad \left(\text{since } x_i \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)\right) \\
& = \mu
\end{aligned}
$$
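
This unbiasedness is easy to observe empirically: averaging $\mu_{\mathrm{MLE}}$ over many independent datasets should recover $\mu$. A Monte Carlo sketch with assumed illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
N, mu, sigma, trials = 50, 2.0, 1.5, 100_000   # illustrative values

# mu_MLE for each of `trials` independent datasets of size N
mu_mles = rng.normal(mu, sigma, size=(trials, N)).mean(axis=1)
print(mu_mles.mean())   # ~= 2.0, consistent with E[mu_MLE] = mu
```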

Proof That $\sigma^2_{\mathrm{MLE}}$ Is Biased, and the Unbiased Estimator

First, simplify:

$$
\begin{aligned}
\sigma^2_{\mathrm{MLE}} & = \frac{1}{N} \sum_{i=1}^N (x_i-\mu_{\mathrm{MLE}})^2 \\
& = \frac{1}{N} \sum_{i=1}^N \left(x_i^2 - 2x_i\mu_{\mathrm{MLE}} + \mu_{\mathrm{MLE}}^2\right) \\
& = \frac{1}{N} \sum_{i=1}^N x_i^2 - \underset{2\mu_{\mathrm{MLE}}^2}{\underbrace{\frac{1}{N} \sum_{i=1}^N 2x_i\mu_{\mathrm{MLE}}}} + \underset{\mu_{\mathrm{MLE}}^2}{\underbrace{\frac{1}{N} \sum_{i=1}^N \mu_{\mathrm{MLE}}^2}} \\
& = \frac{1}{N} \sum_{i=1}^N x_i^2 - \mu_{\mathrm{MLE}}^2
\end{aligned}
$$
Simplifying $\frac{1}{N} \sum_{i=1}^N 2x_i\mu_{\mathrm{MLE}}$:

$$
\begin{aligned}
\frac{1}{N} \sum_{i=1}^N 2x_i\mu_{\mathrm{MLE}} & = 2\mu_{\mathrm{MLE}} \times \frac{1}{N} \sum_{i=1}^N x_i && (\mu_{\mathrm{MLE}} \text{ is a constant with respect to } i) \\
& = 2\mu_{\mathrm{MLE}}^2
\end{aligned}
$$
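
The simplified identity can be spot-checked numerically (illustrative data assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.5, size=1000)      # illustrative data
mu_mle = X.mean()

lhs = np.mean((X - mu_mle) ** 2)         # (1/N) sum (x_i - mu_MLE)^2
rhs = np.mean(X ** 2) - mu_mle ** 2      # (1/N) sum x_i^2 - mu_MLE^2
assert np.isclose(lhs, rhs)
```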

Substituting this back:

$$
\begin{aligned}
\mathbb{E}[\sigma^2_{\mathrm{MLE}}] & = \mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N x_i^2 - \mu_{\mathrm{MLE}}^2\right] \\
& = \mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N \left(x_i^2 - \mu^2 - (\mu_{\mathrm{MLE}}^2-\mu^2)\right)\right] \\
& = \underset{\text{①}}{\underbrace{\mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N (x_i^2 - \mu^2)\right]}} - \underset{\text{②}}{\underbrace{\mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N (\mu_{\mathrm{MLE}}^2-\mu^2)\right]}}
\end{aligned}
$$

For term ①:

$$
\begin{aligned}
\mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N (x_i^2 - \mu^2)\right] & = \frac{1}{N} \sum_{i=1}^N \mathbb{E}[x_i^2 - \mu^2] \\
& = \frac{1}{N} \sum_{i=1}^N \left(\mathbb{E}[x_i^2] - \mu^2\right) \\
& = \mathbf{Var}(x_i) = \sigma^2
\end{aligned}
$$
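
Term ① can likewise be checked by simulation (illustrative values assumed): the per-dataset average of $x_i^2 - \mu^2$ should concentrate around $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, mu, sigma, trials = 50, 2.0, 1.5, 100_000   # illustrative values

X = rng.normal(mu, sigma, size=(trials, N))
term1 = (X ** 2 - mu ** 2).mean(axis=1)  # (1/N) sum (x_i^2 - mu^2), per dataset
print(term1.mean())   # ~= sigma**2 = 2.25
```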

For term ②:

$$
\begin{aligned}
\mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N (\mu_{\mathrm{MLE}}^2-\mu^2)\right] & = \mathbb{E}[\mu^2_{\mathrm{MLE}}] - \mathbb{E}[\mu_{\mathrm{MLE}}]^2 \\
& = \mathbf{Var}(\mu_{\mathrm{MLE}}) \\
& = \mathbf{Var}\!\left(\frac{1}{N}\sum_{i=1}^N x_i\right) \\
& = \frac{1}{N^2}\sum_{i=1}^N \mathbf{Var}(x_i) \quad (\text{by independence}) \\
& = \frac{1}{N^2}\sum_{i=1}^N \sigma^2 \\
& = \frac{\sigma^2}{N}
\end{aligned}
$$
On the identity $\mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N (\mu_{\mathrm{MLE}}^2-\mu^2)\right] = \mathbb{E}[\mu^2_{\mathrm{MLE}}] - \mathbb{E}[\mu_{\mathrm{MLE}}]^2$:

First, $\mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N (\mu_{\mathrm{MLE}}^2-\mu^2)\right] = \mathbb{E}[\mu_{\mathrm{MLE}}^2-\mu^2] = \mathbb{E}[\mu_{\mathrm{MLE}}^2]-\mu^2$, since the summand does not depend on $i$. Because $\mu_{\mathrm{MLE}}$ is unbiased, i.e. $\mathbb{E}[\mu_{\mathrm{MLE}}]=\mu$, substituting this back gives

$$
\begin{aligned}
\mathbb{E}\!\left[\frac{1}{N} \sum_{i=1}^N (\mu_{\mathrm{MLE}}^2-\mu^2)\right] & = \mathbb{E}[\mu_{\mathrm{MLE}}^2]-\mu^2 \\
& = \mathbb{E}[\mu_{\mathrm{MLE}}^2] - \mathbb{E}[\mu_{\mathrm{MLE}}]^2
\end{aligned}
$$
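
Term ② says that $\mathbf{Var}(\mu_{\mathrm{MLE}}) = \frac{\sigma^2}{N}$, which a quick simulation (illustrative values assumed) confirms:

```python
import numpy as np

rng = np.random.default_rng(0)
N, mu, sigma, trials = 50, 2.0, 1.5, 100_000   # illustrative values

mu_mles = rng.normal(mu, sigma, size=(trials, N)).mean(axis=1)
print(mu_mles.var())   # ~= sigma**2 / N = 2.25 / 50 = 0.045
```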

Substituting ① and ② back, we obtain $\mathbb{E}[\sigma^2_{\mathrm{MLE}}] = \sigma^2 - \frac{\sigma^2}{N} = \frac{N-1}{N} \sigma^2 \neq \sigma^2$, so $\sigma^2_{\mathrm{MLE}}$ is a biased estimator. Rescaling removes the bias: the unbiased estimator is $\hat{\sigma}^2 = \frac{N}{N-1}\sigma^2_{\mathrm{MLE}} = \frac{1}{N-1}\sum_{i=1}^N (x_i-\mu_{\mathrm{MLE}})^2$.
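
The $\frac{N-1}{N}$ shrinkage is visible in simulation: with `ddof=0` NumPy computes $\sigma^2_{\mathrm{MLE}}$, while `ddof=1` applies the $\frac{1}{N-1}$ correction. A sketch with assumed illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
N, mu, sigma, trials = 10, 2.0, 1.5, 200_000   # illustrative values

X = rng.normal(mu, sigma, size=(trials, N))
biased = X.var(axis=1, ddof=0)     # sigma^2_MLE, 1/N normalization
corrected = X.var(axis=1, ddof=1)  # 1/(N-1) correction

print(biased.mean())      # ~= (N-1)/N * sigma^2 = 0.9 * 2.25 = 2.025
print(corrected.mean())   # ~= sigma^2 = 2.25
```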

Summary

In this section we derived the maximum likelihood estimates of the parameters of a one-dimensional Gaussian and proved which are unbiased. In the next section we will derive the multivariate Gaussian and discuss its limitations.