Wednesday 20 February 2013

Bayesian network

The posterior probability is the probability of the parameters given the evidence: \(P(\theta|X)\).
The likelihood is the probability of the evidence given the parameters: \(P(X|\theta)\).

A maximum a posteriori (MAP) estimate is a mode of the posterior distribution. It can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.

The maximum likelihood estimate of \(\theta\) is $$\hat\theta_{ML}(x) = \underset{\theta}{\arg\max}\,f(x|\theta)$$
If \(g(\theta)\) is a prior distribution over \(\theta\), the MAP estimate is $$\hat{\theta}_{MAP}(x) = \underset{\theta}{\arg\max}\,f(x|\theta)\,g(\theta)$$
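To make the contrast concrete, here is a minimal sketch in Python. It assumes a Bernoulli (coin-flip) likelihood and a Beta prior, both hypothetical choices for illustration; under these assumptions the ML and MAP estimates have well-known closed forms.

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails.
x = np.array([1, 1, 1, 0, 1, 0, 1, 1])
n, k = len(x), x.sum()

# ML estimate: argmax_theta f(x|theta) for a Bernoulli likelihood is k/n.
theta_ml = k / n

# MAP estimate with a Beta(a, b) prior g(theta) (Beta is conjugate to
# the Bernoulli): argmax_theta f(x|theta)g(theta) = (k+a-1)/(n+a+b-2).
a, b = 2.0, 2.0  # hypothetical prior parameters
theta_map = (k + a - 1) / (n + a + b - 2)

print(f"ML:  {theta_ml:.3f}")   # 0.750
print(f"MAP: {theta_map:.3f}")  # (6+1)/(8+2) = 0.700, pulled toward the prior
```

The prior acts exactly like a regularizer here: the MAP estimate is shrunk toward the prior mean relative to the pure ML estimate.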

Conjugate prior

In Bayesian probability theory, if the posterior distributions \(p(\theta|x)\) are in the same family as the prior probability distribution \(p(\theta)\), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood. For example, the Gaussian family is conjugate to itself with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian.

The posterior distribution of a parameter \(\theta\) given some data \(x\) is $$p(\theta|x) = \frac{p(x|\theta)p(\theta)}{\int{p(x|\theta)p(\theta)d\theta}}$$ A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior.
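As an illustration of that closed form, the following sketch (hypothetical numbers, assuming a Gaussian prior over the mean and a Gaussian likelihood with known variance, i.e. the self-conjugacy example above) computes the posterior directly, with no need to evaluate the integral in the denominator.

```python
import numpy as np

# Hypothetical data assumed drawn from N(mu, sigma^2) with known sigma^2.
x = np.array([4.8, 5.1, 5.6, 4.9, 5.3])
sigma2 = 1.0                 # known likelihood variance
mu0, tau0_2 = 0.0, 10.0      # Gaussian prior over the mean: N(mu0, tau0^2)

# Gaussian prior x Gaussian likelihood => Gaussian posterior over the mean,
# with closed-form precision and mean:
n = len(x)
post_var = 1.0 / (1.0 / tau0_2 + n / sigma2)
post_mean = post_var * (mu0 / tau0_2 + x.sum() / sigma2)

print(f"posterior: N({post_mean:.3f}, {post_var:.3f})")
```

The posterior precision is the sum of the prior precision and the data precision, and the posterior mean is the precision-weighted average of the prior mean and the sample sum.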

The Dirichlet distribution is the conjugate prior of the categorical distribution and the multinomial distribution. The Dirichlet distribution with parameters \(\alpha_1, \dots, \alpha_K > 0\) has a probability density function given by $$f(x_1, \dots, x_{K-1};\alpha_1, \dots, \alpha_K) \propto \prod_{i = 1}^K x_i^{\alpha_i-1}$$ for \(x_i > 0\) and \(\sum_{i=1}^K x_i = 1\). This makes it suitable as a prior distribution for the parameter \(\boldsymbol{\theta}\) of a multinomial distribution, where \(\theta_i = x_i\).
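A minimal sketch of this conjugacy, assuming hypothetical counts over three categories: the posterior is again a Dirichlet whose parameters are simply the prior parameters plus the observed counts.

```python
import numpy as np

# Dirichlet(alpha) prior over the parameters of a 3-category
# categorical/multinomial model (alpha values are hypothetical).
alpha = np.array([1.0, 1.0, 1.0])

# Observed category counts from the data.
counts = np.array([5, 2, 9])

# Conjugacy: the posterior is Dirichlet(alpha_i + count_i) -- the update
# is pure bookkeeping, with no integration required.
alpha_post = alpha + counts

# Posterior mean of theta_i is alpha_post_i / sum(alpha_post).
theta_mean = alpha_post / alpha_post.sum()
print(theta_mean)  # approximately [0.316 0.158 0.526]
```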
