Statistical Inference (3): Exponential Family

Exponential family

1. Exponential family

Definition
- PDF: \(p(y;x)=\exp(\lambda(x)^T t(y)-\alpha(x)+\beta(y))\) \(y\sim \varepsilon(x;\lambda(\cdot),t(\cdot),\beta(\cdot))\)
- natural statistic: \(t(y)\)
- natural parameter: \(\lambda(x)\)
- log-partition function: \(\alpha(x)\)
- partition function: \(Z(x)=\exp(\alpha(x))\)
- base distribution: \(\exp(\beta(y))\)
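Regularity condition (regular): the family is regular if the support of every member \(p(y;x)\) does not depend on x
- In essence, this guarantees that differentiation and integration can be interchanged, as required by the regularity condition of the Cramér-Rao bound (CRB): \[ \mathbb{E}\left[\frac{\partial}{\partial x}\ln p(y;x)\right]=\int_a^b\frac{\partial}{\partial x}p(y;x)dy = \frac{\partial}{\partial x}\int_a^b p(y;x)dy = 0 \] where the endpoints a and b of the support do not depend on x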
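To see why the support condition matters, consider a family that violates it. The example below is an illustration added here, not part of the original notes: for \(y\sim U(0,x)\) the support \([0,x]\) moves with x, and the interchange fails.

\[ p(y;x)=\frac{1}{x}, \ \ 0\le y\le x \\ \int_0^x\frac{\partial}{\partial x}p(y;x)dy=\int_0^x\left(-\frac{1}{x^2}\right)dy=-\frac{1}{x}, \ \ \ \frac{\partial}{\partial x}\int_0^x p(y;x)dy=\frac{\partial}{\partial x}1=0 \]

The two sides differ by the boundary term \(p(x;x)=1/x\), so \(\mathbb{E}\left[\frac{\partial}{\partial x}\ln p(y;x)\right]=-1/x\neq 0\) and the usual CRB argument does not apply to this family.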
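Exponential families arise in several ways
- Many common distributions can be written directly in exponential-family form
  - Bernoulli distribution: \(y\sim \mathcal{B}(x)\)
  \[ p(y;x)=x^y (1-x)^{(1-y)} \\ \ln p(y;x)=\left(\ln\frac{x}{1-x}\right)y-(-\ln(1-x)) \]
  so that \(\lambda(x)=\ln\frac{x}{1-x}\), \(t(y)=y\), \(\alpha(x)=-\ln(1-x)\) and \(\beta(y)=0\)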
  - Gaussian: \(y=[y_1,y_2]^T\) with i.i.d. components \(y_i\sim \mathcal{N}(x,1)\) \[ p(y;x)=\frac{1}{2\pi}\exp\left((y_1+y_2)x-x^2-\frac{y_1^2+y_2^2}{2}\right) \]
- Weighted geometric mean of two distributions \[ p(y;x)=\frac{p_1^x(y)\cdot p_2^{(1-x)}(y)}{Z(x)} \\ \ln p(y;x)=x\ln\left(\frac{p_1(y)}{p_2(y)}\right)-\ln Z(x)+\ln p_2(y) \]
  - For example, with \(p_1 = \mathcal{B}(\frac{1}{1+e^{-1}}), p_2 = \mathcal{B}(1/2)\) \[ p(y;x)\propto(\frac{1}{1+e^{-1}})^{xy}(\frac{e^{-1}}{1+e^{-1}})^{x(1-y)}(1/2)^{(1-x)} \\ \frac{p(y=1;x)}{p(y=0;x)}=e^x \] so that \(p(\cdot;x)\) is \(\mathcal{B}(\frac{1}{1+e^{-x}})\)
- Tilting \[ p(y;x)=\frac{p(y)e^{xy}}{Z(x)} \\ \ln p(y;x)=xy - \ln Z(x) + \ln p(y) \]
  - For example, if \(p(y)\) is \(\mathcal{N}(0,1)\), then \(p(y;x)\) is \(\mathcal{N}(x,1)\)
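The geometric-mean and tilting constructions can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the integration grid, and the test values of x are assumptions made for illustration.

```python
import math

def bernoulli_pmf(p, y):
    """Bernoulli(p) probability of y in {0, 1}."""
    return p if y == 1 else 1.0 - p

def geometric_mixture(x, p1=1.0 / (1.0 + math.exp(-1.0)), p2=0.5):
    """P(y = 1) under the normalized p1(y)^x * p2(y)^(1-x) on y in {0, 1}."""
    w = [bernoulli_pmf(p1, y) ** x * bernoulli_pmf(p2, y) ** (1.0 - x)
         for y in (0, 1)]
    return w[1] / (w[0] + w[1])  # divide by Z(x) = w[0] + w[1]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tilted_standard_normal_mean(x, lo=-12.0, hi=12.0, n=20001):
    """Mean of p(y) e^{xy} / Z(x) with p = N(0, 1), via a Riemann sum on a grid."""
    h = (hi - lo) / (n - 1)
    ys = [lo + i * h for i in range(n)]
    w = [math.exp(-0.5 * y * y + x * y) for y in ys]  # unnormalized tilted density
    z = sum(w)
    return sum(y * wi for y, wi in zip(ys, w)) / z

# The geometric mean of B(1/(1+e^-1)) and B(1/2) should be B(sigmoid(x)).
gaps = [abs(geometric_mixture(x) - sigmoid(x)) for x in (-2.0, 0.0, 0.5, 3.0)]
# Tilting N(0, 1) by e^{xy} should shift the mean to x.
tilted_mean = tilted_standard_normal_mean(1.5)
```

Here `gaps` should be zero up to rounding, and `tilted_mean` should be close to 1.5, matching the two examples in the list above.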
Linear exponential family
- Definition: \(\lambda(x)=x\), i.e. \(\ln p(y;x)=x\ t(y) - \alpha(x)+\beta(y)\)
- Properties: \(\dot{\alpha}(x)=\mathbb{E}[t(y)], \ \ \dot{\dot{\alpha}}(x)=\mathbb{E}[t^2(y)]-\mathbb{E}[t(y)]^2=Var(t(y)) = J_y(x)\)
Proof: \[ \begin{align} Z(x) &= e^{\alpha(x)}=\int e^{x t(y)+\beta(y)}dy \\ \frac{\partial}{\partial x}Z(x) &= e^{\alpha(x)}\cdot \dot\alpha(x) = \int t(y)e^{xt(y)+\beta(y)}dy \\ \dot{\alpha}(x) &= \int t(y)p(y;x)dy = \mathbb{E}[t(y)] \end{align} \]

Differentiating once more and using \(\frac{\partial}{\partial x}p(y;x)=p(y;x)\left(t(y)-\dot{\alpha}(x)\right)\): \[ \dot{\dot{\alpha}}(x)=\int t(y)\cdot p(y;x)\cdot (t(y)-\dot{\alpha}(x))dy = \mathbb{E}[t^2(y)]-\mathbb{E}[t(y)]^2 \\ J_y(x) = \mathbb{E}\left[-\frac{\partial^2}{\partial x^2} \ln p(y;x)\right]=\dot{\dot{\alpha}}(x) \]
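The two identities for the log-partition function can be sanity-checked with finite differences. The sketch below assumes the Bernoulli family in its natural parameterization, \(\ln p(y;x)=xy-\ln(1+e^x)\) with \(t(y)=y\); the function names and the step size `h` are choices made here for illustration.

```python
import math

def alpha(x):
    """Log-partition function of the Bernoulli family with natural parameter x."""
    return math.log1p(math.exp(x))

def moments(x):
    """Mean and variance of t(y) = y under p(y; x) = exp(x*y - alpha(x))."""
    probs = {y: math.exp(x * y - alpha(x)) for y in (0, 1)}
    mean = sum(y * p for y, p in probs.items())
    var = sum((y - mean) ** 2 * p for y, p in probs.items())
    return mean, var

def derivatives(x, h=1e-4):
    """Central finite-difference estimates of alpha'(x) and alpha''(x)."""
    d1 = (alpha(x + h) - alpha(x - h)) / (2 * h)
    d2 = (alpha(x + h) - 2 * alpha(x) + alpha(x - h)) / (h * h)
    return d1, d2

# Compare derivatives of alpha with the moments of t(y) at a few points.
checks = [(derivatives(x), moments(x)) for x in (-1.0, 0.0, 2.0)]
```

For each entry of `checks`, the first derivative should agree with the mean and the second derivative with the variance, up to finite-difference error.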
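Exponential families and efficient estimators (efficient statistics)
- Necessary condition: if an efficient estimator exists, then the distribution can be written in exponential-family form with natural statistic \(t(y)=\hat{x}_{eff}(y)\) and \[ \lambda(x)=\int^x J_y(u)du, \ \ \ \alpha(x)=\int^x u J_y(u) du \]
Proof: \[ \begin{align} \hat {x}_{eff}(y) &= x+\frac{1}{J_y(x)}\frac{\partial}{\partial x}\ln p(y;x) \\ \frac{\partial}{\partial x}\ln p(y;x) &= J_y(x)\hat{x}_{eff}(y) - x J_y(x) \\ \ln p(y;x) &= \int^x J_y(u)du \cdot \hat{x}_{eff}(y) - \int^x u J_y(u) du + \beta(y) \end{align} \] where \(\beta(y)\) is the constant of integration with respect to x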
- Sufficient condition: for a linear exponential family, if \(J_y(x)\) does not depend on x, i.e. \(J_y(x)\) equals a constant, then an efficient estimator exists
Proof: let \(J_y(x)=J\). Then \[ \dot{\dot{\alpha}}(x)=J, \ \ \ \dot{\alpha}(x)=Jx-c \\ \hat x_{eff}(y) = x + \frac{1}{J}\frac{\partial}{\partial x}\ln p(y;x) = x + \frac{1}{J} (t(y)-\dot{\alpha}(x)) = x + \frac{1}{J}(t(y)-Jx+c)=\frac{t(y)}{J}+\frac{c}{J} \] The right-hand side does not depend on x, so it is a valid estimator and hence efficient. Moreover, since \[ \frac{\partial}{\partial x}\ln p(y;x)|_{x=\hat x_{ML}} = 0 = t(y) - \dot{\alpha}(x)|_{x=\hat x_{ML}} \] we have \[ \hat x_{eff}(y) = c/J + \frac{1}{J}\dot{\alpha}(x)|_{x=\hat x_{ML}} = \hat x_{ML}(y) \]
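As a concrete instance of the sufficient condition (an illustrative example added here, assuming a single observation with known variance \(\sigma^2\)), take \(y\sim\mathcal{N}(x,\sigma^2)\):

\[ \ln p(y;x)=x\cdot\frac{y}{\sigma^2}-\frac{x^2}{2\sigma^2}-\frac{y^2}{2\sigma^2}-\ln\sqrt{2\pi\sigma^2} \\ t(y)=\frac{y}{\sigma^2}, \ \ \ \alpha(x)=\frac{x^2}{2\sigma^2}, \ \ \ \dot{\dot{\alpha}}(x)=\frac{1}{\sigma^2}=J, \ \ \ \dot{\alpha}(x)=Jx \ \ (c=0) \\ \hat x_{eff}(y)=\frac{t(y)}{J}+\frac{c}{J}=y \]

The estimator \(\hat x_{eff}(y)=y\) is unbiased with variance \(\sigma^2=1/J\), so it attains the CRB, and it coincides with the ML estimate.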

2. Sufficient statistics

2.1 Non-Bayesian case

Definition: \(t(y)\) is a sufficient statistic for the family \(p_{\mathsf{y}}(\cdot;x)\) if \(p(y|t(y);x)\) does not depend on x

Theorem 1 (likelihood characterization):

\(t(y)\) is sufficient w.r.t. \(p(y;x)\) \(\iff \ \frac{p_{y}(y;x)}{p_t(t(y);x)}\) does not depend on x, for all x and y

Proof: omitted.

Theorem 2 (Neyman factorization theorem):

\(t(y)\) is sufficient w.r.t. \(p(y;x)\) \(\iff\) there exist functions \(a(\cdot,\cdot)\) and \(b(\cdot)\) such that \(p(y;x)=a\left(t(y),x\right) \cdot b(y)\)

Proof: omitted.
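The definition and the factorization theorem can be illustrated on N i.i.d. Bernoulli(x) samples with \(t(y)=\sum_i y_i\), where \(p(y;x)=x^{t(y)}(1-x)^{N-t(y)}\) factors with \(a(t,x)=x^t(1-x)^{N-t}\) and \(b(y)=1\). The sketch below is an illustration added here (the sample `y` and the values of x are arbitrary choices); it enumerates all sequences by brute force and checks that \(p(y|t(y);x)\) does not change with x.

```python
from itertools import product
from math import comb

def joint(y, x):
    """p(y; x) for i.i.d. Bernoulli(x) samples y = (y_1, ..., y_N)."""
    k = sum(y)
    return x ** k * (1 - x) ** (len(y) - k)

def conditional_given_sum(y, x):
    """p(y | t(y); x) with t(y) = sum(y), computed by brute-force enumeration."""
    k, n = sum(y), len(y)
    p_t = sum(joint(z, x) for z in product((0, 1), repeat=n) if sum(z) == k)
    return joint(y, x) / p_t

y = (1, 0, 1, 1, 0)
values = [conditional_given_sum(y, x) for x in (0.1, 0.5, 0.9)]
# Given the sum, every arrangement is equally likely, whatever x is.
expected = 1 / comb(len(y), sum(y))
```

All entries of `values` should equal `expected`, i.e. one over the number of sequences with the same sum, independently of x.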

Minimal sufficient statistic: a sufficient statistic \(t^*\) is minimal if, for every other sufficient statistic t, there exists a function \(g(\cdot)\) such that \(t^*=g(t)\)
Complete: \(t^*\) is complete if, for every function \(\phi(\cdot)\), \(E[\phi(t^*(y))]=0 \ \ \forall x \iff \phi(\cdot) \equiv 0\)

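Theorem: a complete sufficient statistic is minimal (complete \(\Longrightarrow\) minimal)

Proof: suppose t is complete and s is minimal, so that \(s=g(t)\) for some g. By the law of total expectation, \(E[t]=E\left[E\left[t|s\right]\right]\)

Since s is sufficient, the conditional expectation does not depend on x, so \(E[t|s]=f(s)=f(g(t))=\tilde{f}(t)\)

Take \(\phi(t)=t-\tilde{f}(t)\); then \(E[\phi(t)] = 0\) for all x

By the definition of completeness, \(\phi(t)\equiv0 \Longrightarrow t = \tilde{f}(t)=f(s)\)

Hence t is a function of the minimal statistic s, so t is also minimal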

2.2 Bayesian case

Definition: \(t(y)\) is a sufficient statistic for the joint distribution \(p_{\mathsf{y,x}}(\cdot,\cdot)\) if \(p_{\mathsf{y|t,x}}(y|t(y),x)=p_\mathsf{y|t}(y|t(y))\), i.e. it does not depend on x

Theorem (belief characterization):

\(t(y)\) is sufficient w.r.t. \(p(y,x)\) \(\iff \ p(x|y)=p(x|t(y))\), for all x and y

Proof: omitted.

Theorem (Neyman factorization theorem):

\(t(y)\) is sufficient w.r.t. \(p(y,x)\) \(\iff \ p(y|x)=p(t(y)|x)\cdot p(y|t(y))\), for all x and y

Proof: omitted.
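The belief characterization says the posterior depends on the data only through \(t(y)\). A minimal sketch of this, assuming i.i.d. Bernoulli(x) samples with \(t(y)=\sum_i y_i\) and a hypothetical discrete prior on a small grid of x values (grid, weights, and sample sequences are all made up for illustration):

```python
import math

def likelihood(y, x):
    """p(y | x) for i.i.d. Bernoulli(x) samples, multiplied sample by sample."""
    return math.prod(x if yi == 1 else 1.0 - x for yi in y)

def posterior(y, grid, prior):
    """p(x | y) on a discrete grid of x values, via Bayes' rule."""
    w = [p * likelihood(y, x) for x, p in zip(grid, prior)]
    z = sum(w)
    return [wi / z for wi in w]

grid = [0.1, 0.3, 0.5, 0.7, 0.9]    # hypothetical support of the prior
prior = [0.1, 0.2, 0.4, 0.2, 0.1]   # hypothetical prior weights (sum to 1)

post_a = posterior((1, 1, 0, 0, 1), grid, prior)  # t(y) = 3
post_b = posterior((0, 1, 1, 1, 0), grid, prior)  # t(y) = 3, different order
post_c = posterior((0, 0, 0, 1, 0), grid, prior)  # t(y) = 1
```

`post_a` and `post_b` should coincide up to rounding because the two sequences share the same sum, while `post_c` differs because its sum is different.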

3. Conjugate priors

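Idea: given a model \(p_\mathsf{y|x}\), look for a family of priors \(p_\mathsf{x}\) such that the induced posterior \(p_\mathsf{x|y}\) is also in this family
Definition: a family of distributions \(q(\cdot;\theta)\) is conjugate to a model \(p_{y|x}\) if
- \(p_{y|x}(y_1,...,y_N|x) \propto q(x;\theta)\) as a function of x, for some \(\theta\) depending on \(y_1,...,y_N\)
- \(q(x;\theta_1)q(x;\theta_2)\propto q(x;\theta_3)\), i.e. the family is closed under multiplication
Theorem: if, for every sample size N, the joint distribution \(p^N_{y|x}\) has a sufficient statistic whose dimension does not depend on N, then a conjugate prior family exists for the model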
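A standard instance is the Beta family as a conjugate prior for the Bernoulli model: the sufficient statistic \(\sum_i y_i\) has dimension one for every N, and a Beta(a, b) prior leads to a Beta(a + k, b + N - k) posterior after observing k ones in N samples. The sketch below checks this proportionality numerically; the function names, hyperparameters, and evaluation points are assumptions made for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def update(a, b, y):
    """Posterior hyperparameters after observing Bernoulli samples y."""
    k = sum(y)
    return a + k, b + len(y) - k

def conjugacy_ratios(a, b, y, xs=(0.2, 0.5, 0.7)):
    """prior(x) * likelihood(x) / posterior(x); constant in x iff proportional."""
    a_post, b_post = update(a, b, y)
    k, n = sum(y), len(y)
    return [beta_pdf(x, a, b) * x ** k * (1 - x) ** (n - k)
            / beta_pdf(x, a_post, b_post) for x in xs]

ratios = conjugacy_ratios(2.0, 3.0, [1, 0, 1, 1])
```

The entries of `ratios` should all be equal, which is what \(p(x|y)\propto p(x)\,p(y|x)\) staying inside the Beta family means; the update itself only needs the count `k` and the sample size.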


https://glooow1024.github.io/2020/02/03/statistic/SI_Ch3_ExponentialFamily/

Author: Glooow

Published: February 3, 2020
