The likelihood is the probability (or probability density) of the evidence X given the parameters θ, P(X|θ), regarded as a function of θ with X held fixed.
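Here is a minimal sketch. It assumes the evidence X is a sequence of independent coin flips and θ is the probability of heads (a Bernoulli model). The data and the function name `bernoulli_likelihood` are hypothetical. Evaluating P(X|θ) at several values of θ, with the flips fixed, shows that the likelihood is a function of the parameter.

```python
import math

def bernoulli_likelihood(flips, theta):
    """P(X | theta) for independent 0/1 flips, where theta = P(flip == 1)."""
    # Each flip contributes theta if it is 1 and (1 - theta) if it is 0.
    return math.prod(theta if f == 1 else 1.0 - theta for f in flips)

# Hypothetical data: 7 heads (1) and 3 tails (0).
flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]

for theta in (0.3, 0.5, 0.7, 0.9):
    print(f"theta={theta:.1f}  P(X|theta)={bernoulli_likelihood(flips, theta):.6f}")
```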
A maximum a posteriori probability (MAP) estimate is a mode of the posterior distribution. It can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to maximum likelihood (ML), but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.
The maximum likelihood estimate of θ is $\hat{\theta}_{\mathrm{ML}}(x) = \arg\max_{\theta} f(x \mid \theta)$.
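The argmax can be approximated numerically. The sketch below assumes the same hypothetical Bernoulli coin-flip model and scans θ over a grid. It maximizes the log-likelihood instead of f(x|θ) itself, which is safe because the logarithm is monotone and leaves the argmax unchanged. For this model the well-known closed form is k/n, the fraction of heads, so the grid result can be checked against it. The function names are illustrative only.

```python
import math

def log_likelihood(flips, theta):
    """log f(x | theta) for independent Bernoulli flips; logs avoid underflow."""
    k = sum(flips)            # number of ones (heads)
    n = len(flips)
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def ml_estimate_grid(flips, steps=10_000):
    """Approximate argmax_theta f(x | theta) by scanning theta on (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]  # open interval, avoids log(0)
    return max(grid, key=lambda t: log_likelihood(flips, t))

flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]   # hypothetical: 7 heads out of 10
print(ml_estimate_grid(flips))      # grid approximation of the MLE
print(sum(flips) / len(flips))      # closed-form Bernoulli MLE: k / n
```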
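If g(θ) is a prior distribution over θ, the MAP estimate is $\hat{\theta}_{\mathrm{MAP}}(x) = \arg\max_{\theta} f(x \mid \theta)\, g(\theta)$. The normalizing constant of the posterior does not depend on θ, so it can be dropped from the maximization.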
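To show the regularizing effect, the sketch below assumes a Beta(a, b) prior g(θ) on the coin's heads probability. The post does not name a specific prior, so this choice and all function names are assumptions. With a = b = 1 the prior is uniform and the MAP estimate equals the ML estimate. With a = b = 5 the prior favours θ near 0.5 and pulls the estimate toward it. The closed form (k + a − 1)/(n + a + b − 2) is the mode of the resulting Beta posterior. It relies on the Beta prior being conjugate to the Bernoulli likelihood.

```python
import math

def log_prior_beta(theta, a, b):
    """Unnormalized log of a Beta(a, b) prior g(theta); the constant does not affect the argmax."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta)

def map_estimate_grid(flips, a, b, steps=10_000):
    """Approximate argmax_theta f(x | theta) g(theta) for Bernoulli data and a Beta prior."""
    k, n = sum(flips), len(flips)

    def log_objective(t):
        log_lik = k * math.log(t) + (n - k) * math.log(1.0 - t)
        return log_lik + log_prior_beta(t, a, b)

    grid = [(i + 0.5) / steps for i in range(steps)]  # open interval, avoids log(0)
    return max(grid, key=log_objective)

def map_estimate_closed_form(flips, a, b):
    """Mode of the Beta(a + k, b + n - k) posterior; valid when a + k > 1 and b + n - k > 1."""
    k, n = sum(flips), len(flips)
    return (k + a - 1) / (n + a + b - 2)

flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]     # hypothetical: 7 heads out of 10
print(map_estimate_grid(flips, a=1, b=1))   # uniform prior: MAP coincides with the MLE
print(map_estimate_grid(flips, a=5, b=5))   # prior centred on 0.5 pulls the estimate toward 0.5
print(map_estimate_closed_form(flips, a=5, b=5))  # 11/18, about 0.611
```

The prior acts like extra pseudo-observations (a − 1 heads and b − 1 tails). Its influence therefore fades as n grows, which is the sense in which MAP regularizes ML.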
Conjugate prior
In Bayesian probability theory, if the posterior distribution p(θ|x) is in the same family as the prior probability distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood. For example, the Gaussian family is conjugate to itself with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean ensures that the posterior distribution is also Gaussian.
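As a minimal sketch of the Gaussian example, assume the observations are drawn from a Gaussian with unknown mean and known noise variance, and put a Gaussian prior on that mean. The posterior over the mean is then Gaussian again, and updating reduces to arithmetic on a mean and a variance: precisions (inverse variances) add, and the posterior mean is a precision-weighted average. The data and the function name `gaussian_mean_posterior` are hypothetical.

```python
def gaussian_mean_posterior(data, prior_mean, prior_var, noise_var):
    """Posterior N(mean, var) over the unknown mean of a Gaussian with known noise_var,
    given a Gaussian prior N(prior_mean, prior_var) on that mean."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var   # precisions add
    post_var = 1.0 / post_precision
    # Precision-weighted combination of the prior mean and the data.
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

data = [2.1, 1.9, 2.4, 2.0]  # hypothetical observations
mean, var = gaussian_mean_posterior(data, prior_mean=0.0, prior_var=1.0, noise_var=0.5)
print(mean, var)
```

The posterior mean lies between the prior mean (0) and the sample mean (2.1), and it moves closer to the sample mean as more data arrive.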
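The posterior distribution of a parameter θ given some data x is

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}$$

The integral in the denominator often has no closed form. A conjugate prior is an algebraic convenience: it gives a closed-form expression for the posterior, so the integral never has to be evaluated explicitly.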
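The sketch below checks Bayes' rule numerically against the conjugate shortcut. It assumes Bernoulli data with a Beta(a, b) prior, a pair that is conjugate, so the posterior should be Beta(a + k, b + n − k), where k is the number of heads in n flips. The code approximates the denominator integral with a midpoint rule on a grid and compares the result with that closed-form density. The data, prior parameters, and function names are hypothetical.

```python
import math

def beta_pdf(t, a, b):
    """Beta(a, b) density, normalized with the Beta function via log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1.0 - t))

def numeric_posterior(flips, a, b, steps=20_000):
    """Evaluate p(theta | x) on a grid by dividing p(x | theta) p(theta)
    by a midpoint-rule approximation of the integral in the denominator."""
    k, n = sum(flips), len(flips)
    grid = [(i + 0.5) / steps for i in range(steps)]
    unnorm = [t**k * (1 - t)**(n - k) * beta_pdf(t, a, b) for t in grid]
    evidence = sum(unnorm) / steps          # approximates the integral over theta
    return grid, [u / evidence for u in unnorm]

flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]   # hypothetical: 7 heads out of 10
a, b = 2.0, 2.0                           # hypothetical Beta prior
grid, post = numeric_posterior(flips, a, b)
k, n = sum(flips), len(flips)
# Conjugacy: the posterior should be Beta(a + k, b + n - k) without any integration.
max_err = max(abs(p - beta_pdf(t, a + k, b + n - k)) for t, p in zip(grid, post))
print(max_err)  # small, limited only by the grid approximation
```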
The Dirichlet distribution is the conjugate prior of the categorical and multinomial distributions. The Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_K > 0$ has probability density function

$$f(x_1, \ldots, x_{K-1}; \alpha_1, \ldots, \alpha_K) \propto \prod_{i=1}^{K} x_i^{\alpha_i - 1}$$

for $x_i > 0$ and $\sum_{i=1}^{K} x_i = 1$. Its support is exactly the set of K-dimensional probability vectors. This makes it a natural prior for the parameter θ of a multinomial distribution, with $\theta_i = x_i$.
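Because of conjugacy, updating a Dirichlet prior with multinomial counts amounts to adding the observed count of each category to the matching αi. The sketch below shows that update together with the posterior mean, αi / Σα, and the posterior mode (MAP estimate), (αi − 1)/(Σα − K), which requires every αi > 1. The prior, the counts, and the function names are hypothetical.

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts)."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length K")
    return [a + c for a, c in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """E[theta_i] = alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

def dirichlet_mode(alpha):
    """MAP estimate (alpha_i - 1) / (sum(alpha) - K); valid when every alpha_i > 1."""
    if any(a <= 1 for a in alpha):
        raise ValueError("mode formula requires all alpha_i > 1")
    total, k = sum(alpha), len(alpha)
    return [(a - 1) / (total - k) for a in alpha]

prior = [2.0, 2.0, 2.0]   # hypothetical symmetric prior over K = 3 categories
counts = [10, 3, 7]       # hypothetical observed category counts
posterior = dirichlet_posterior(prior, counts)
print(posterior)                  # [12.0, 5.0, 9.0]
print(dirichlet_mean(posterior))
print(dirichlet_mode(posterior))  # MAP estimate of theta
```

With the symmetric prior αi = 2, the MAP estimate (ci + 1)/(n + K) is add-one (Laplace) smoothing of the ML estimate ci / n. This is a concrete instance of MAP acting as a regularized ML.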