Wednesday 20 February 2013

Bayesian network

The posterior probability is the probability of the parameters given the evidence: \(P(\theta|X)\).
The likelihood is the probability of the evidence given the parameters: \(P(X|\theta)\).

A maximum a posteriori (MAP) estimate is a mode of the posterior distribution. It can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.

The maximum likelihood estimate of \(\theta\) is $$\hat\theta_{ML}(x) = \underset{\theta}{\arg\max}\,f(x|\theta)$$
If \(g(\theta)\) is a prior distribution over \(\theta\), the MAP estimate is $$\hat{\theta}_{MAP}(x) = \underset{\theta}{\arg\max}\,f(x|\theta)\,g(\theta)$$
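To make the contrast concrete, here is a minimal sketch in Python. It assumes a Bernoulli (coin-flip) likelihood and a Beta prior, both hypothetical choices for illustration; under these assumptions the ML and MAP estimates have well-known closed forms.

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails.
x = np.array([1, 1, 1, 0, 1, 0, 1, 1])
n, k = len(x), x.sum()

# ML estimate: argmax_theta f(x|theta) for a Bernoulli likelihood is k/n.
theta_ml = k / n

# MAP estimate with a Beta(a, b) prior g(theta) (Beta is conjugate to
# the Bernoulli): argmax_theta f(x|theta)g(theta) = (k+a-1)/(n+a+b-2).
a, b = 2.0, 2.0  # hypothetical prior parameters
theta_map = (k + a - 1) / (n + a + b - 2)

print(f"ML:  {theta_ml:.3f}")   # 0.750
print(f"MAP: {theta_map:.3f}")  # (6+1)/(8+2) = 0.700, pulled toward the prior
```

The prior acts exactly like a regularizer here: the MAP estimate is shrunk toward the prior mean relative to the pure ML estimate.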

Conjugate prior

In Bayesian probability theory, if the posterior distributions \(p(\theta|x)\) are in the same family as the prior probability distribution \(p(\theta)\), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood. For example, the Gaussian family is conjugate to itself with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian.

The posterior distribution of a parameter \(\theta\) given some data \(x\) is $$p(\theta|x) = \frac{p(x|\theta)p(\theta)}{\int{p(x|\theta)p(\theta)d\theta}}$$ A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior.
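As an illustration of that closed form, the following sketch (hypothetical numbers, assuming a Gaussian prior over the mean and a Gaussian likelihood with known variance, i.e. the self-conjugacy example above) computes the posterior directly, with no need to evaluate the integral in the denominator.

```python
import numpy as np

# Hypothetical data assumed drawn from N(mu, sigma^2) with known sigma^2.
x = np.array([4.8, 5.1, 5.6, 4.9, 5.3])
sigma2 = 1.0                 # known likelihood variance
mu0, tau0_2 = 0.0, 10.0      # Gaussian prior over the mean: N(mu0, tau0^2)

# Gaussian prior x Gaussian likelihood => Gaussian posterior over the mean,
# with closed-form precision and mean:
n = len(x)
post_var = 1.0 / (1.0 / tau0_2 + n / sigma2)
post_mean = post_var * (mu0 / tau0_2 + x.sum() / sigma2)

print(f"posterior: N({post_mean:.3f}, {post_var:.3f})")
```

The posterior precision is the sum of the prior precision and the data precision, and the posterior mean is the precision-weighted average of the prior mean and the sample sum.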

The Dirichlet distribution is the conjugate prior of the categorical distribution and the multinomial distribution. The Dirichlet distribution with parameters \(\alpha_1, \dots, \alpha_K > 0\) has a probability density function given by $$f(x_1, \dots, x_{K-1};\alpha_1, \dots, \alpha_K) \propto \prod_{i = 1}^K x_i^{\alpha_i-1}$$ for \(x_i > 0\) and \(\sum_{i=1}^K x_i = 1\). This makes it suitable as a prior distribution for the parameter \(\boldsymbol{\theta}\) of a multinomial distribution, where \(\theta_i = x_i\).
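A minimal sketch of this conjugacy, assuming hypothetical counts over three categories: the posterior is again a Dirichlet whose parameters are simply the prior parameters plus the observed counts.

```python
import numpy as np

# Dirichlet(alpha) prior over the parameters of a 3-category
# categorical/multinomial model (alpha values are hypothetical).
alpha = np.array([1.0, 1.0, 1.0])

# Observed category counts from the data.
counts = np.array([5, 2, 9])

# Conjugacy: the posterior is Dirichlet(alpha_i + count_i) -- the update
# is pure bookkeeping, with no integration required.
alpha_post = alpha + counts

# Posterior mean of theta_i is alpha_post_i / sum(alpha_post).
theta_mean = alpha_post / alpha_post.sum()
print(theta_mean)  # approximately [0.316 0.158 0.526]
```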
