Wednesday, 27 February 2013
Ray Kurzweil
Ray Kurzweil is a prolific inventor who has worked in fields like OCR, text-to-speech synthesis, and speech recognition. He did his undergraduate degree at MIT. In this talk, he emphasized the exponential growth of information technology. Everyone can see that information technology is growing very fast: just in the short span of my life so far, my daily life has gone from being almost devoid of electronic devices to one in which computers, smartphones, tablets, etc. are indispensable. However, I hadn't thought much about the actual rate of growth before this talk. Grasping the concept of exponential growth is almost eye-opening. It makes me feel very excited, because the range of possibilities that technology can achieve is hard to fathom. It's exciting to live in this age, and I also feel an urgency in my work: I have to work fast, otherwise my work will be outdated very soon.
Labels: computer science, video
Sunday, 24 February 2013
Vim cheat sheet
Compound command
| Compound command | Equivalent in longhand | Effect |
|---|---|---|
| C | c$ | change from the cursor to the end of the line |
| s | cl | change the character under the cursor |
| S | ^C | change the entire line (from the first non-blank character) |
| I | ^i | insert before the first non-blank character of the line |
| A | $a | append at the end of the line |
| o | A<CR> | open a new line below and insert |
| O | ko | open a new line above and insert |
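For example, with the cursor in the middle of a line, pressing C deletes from the cursor to the end of the line and leaves you in insert mode, exactly as typing c$ would; likewise, O is just k (move up one line) followed by o (open a new line below), which ends up inserting on a new line above the original one.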
Labels: vim
Wednesday, 20 February 2013
Bayesian network
The posterior probability is the probability of the parameters given the evidence: \(P(\theta|X)\).
Likelihood is the probability of the evidence given the parameters: \(P(X|\theta)\).
A maximum a posteriori probability (MAP) estimate is a mode of the posterior distribution. It can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to maximum likelihood (ML), but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.
The maximum likelihood estimate of \(\theta\) is $$\hat\theta_{ML}(x) = \underset{\theta}{\arg\max}f(x|\theta)$$
If \(g(\theta)\) is a prior distribution over \(\theta\), the MAP estimate is $$\hat{\theta}_{MAP} = \underset{\theta}{\arg\max}f(x|\theta)g(\theta)$$
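As a concrete illustration (my own sketch, not part of the notes above): for a coin with bias \(\theta\), a Bernoulli likelihood, and a conjugate Beta(\(a, b\)) prior, both estimates have closed forms, so the regularizing effect of the prior is easy to see.

```python
# Sketch (assumed toy example): ML vs MAP point estimates of a coin bias theta,
# assuming a binomial likelihood and a Beta(a, b) prior.

def ml_estimate(heads, flips):
    # argmax_theta P(X | theta): the sample proportion
    return heads / flips

def map_estimate(heads, flips, a=2.0, b=2.0):
    # argmax_theta P(X | theta) g(theta): mode of the Beta(a + heads, b + tails) posterior
    return (heads + a - 1) / (flips + a + b - 2)

print(ml_estimate(9, 10))    # 0.9
print(map_estimate(9, 10))   # 0.8333..., pulled toward the prior mean 0.5
```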
Conjugate prior
In Bayesian probability theory, if the posterior distributions \(p(\theta|x)\) are in the same family as the prior probability distribution \(p(\theta)\), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood. For example, the Gaussian family is conjugate to itself with respect to a Gaussian likelihood function: if the likelihood function is Gaussian, choosing a Gaussian prior over the mean will ensure that the posterior distribution is also Gaussian.
The posterior distribution of a parameter \(\theta\) given some data x is $$p(\theta|x) = \frac{p(x|\theta)p(\theta)}{\int{p(x|\theta)p(\theta)d\theta}}$$ A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior.
The Dirichlet distribution is the conjugate prior of the categorical distribution and the multinomial distribution. The Dirichlet distribution with parameters \(\alpha_1, \dots, \alpha_K > 0\) has a probability density function given by $$f(x_1, \dots, x_{K-1};\alpha_1, \dots, \alpha_K) \propto \prod_{i = 1}^Kx_i^{\alpha_i-1}$$ for \(x_i > 0\) and \(\sum x_i = 1\). This makes it suitable to be a prior distribution for a model parameter \(\boldsymbol{\theta}\) of a multinomial distribution, where \(\boldsymbol{\theta}_i = x_i\).
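A minimal sketch of the conjugate update (my own toy counts, not from the notes): observing counts \(n_1, \dots, n_K\) under a multinomial likelihood with a Dirichlet(\(\alpha_1, \dots, \alpha_K\)) prior gives a Dirichlet(\(\alpha_1 + n_1, \dots, \alpha_K + n_K\)) posterior.

```python
# Sketch: Dirichlet-multinomial conjugate update with made-up numbers.
alpha = [1.0, 1.0, 1.0]   # assumed symmetric Dirichlet prior over 3 categories
counts = [5, 2, 0]        # observed category counts (toy data)

# Conjugacy: the posterior is again a Dirichlet, with parameters alpha + counts.
posterior = [a + n for a, n in zip(alpha, counts)]
print(posterior)          # [6.0, 3.0, 1.0]

# Posterior mean as a point estimate of the multinomial parameter theta.
total = sum(posterior)
print([p / total for p in posterior])  # [0.6, 0.3, 0.1]
```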
Labels: ml
Linear algebra review
Homogeneous systems
\begin{align}
b_{11}x_1 + b_{12}x_2 + \dots + b_{1n}x_n &= 0 \\
b_{21}x_1 + b_{22}x_2 + \dots + b_{2n}x_n &= 0 \\
&\vdots \\
b_{p1}x_1 + b_{p2}x_2 + \dots + b_{pn}x_n &= 0
\end{align}
Properties:
- Has at least one solution, the trivial solution \([0, 0, \dots, 0]\).
- If it has a non-zero solution, then it has infinitely many solutions (see the example below).
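For example (my own illustration), the single homogeneous equation \(x_1 + x_2 = 0\) has the trivial solution \((0, 0)\) and the non-zero solution \((1, -1)\); scaling the latter gives \((t, -t)\) for every scalar \(t\), so there are indeed infinitely many solutions.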
Inverse of a matrix
A square matrix \(A\) has an inverse if and only if any (equivalently, all) of the following holds; a quick numerical check follows the list:
- \(\det(A)\neq 0\)
- The reduced row echelon form of \(A\) is the identity matrix.
- \(A\) has full rank.
- The homogeneous equation \(Ax=0\) has only the trivial solution \(x=0\).
- The equation \(Ax = b\) has a unique solution for every \(b\).
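A quick numerical sanity check of these conditions (my own sketch with NumPy; the matrix is an arbitrary invertible example):

```python
# Sketch: checking the equivalent invertibility conditions for a toy matrix.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(np.isclose(np.linalg.det(A), 0))            # False: det(A) != 0
print(np.linalg.matrix_rank(A) == A.shape[0])     # True: full rank
print(np.linalg.solve(A, np.zeros(2)))            # [0. 0.]: Ax = 0 has only the trivial solution
print(np.linalg.solve(A, np.array([1.0, 2.0])))   # unique solution of Ax = b: [0.2 0.6]
```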
Eigenvalues and eigenvectors
An eigenvector of a matrix is a non-zero vector that is transformed by the matrix to a scalar multiple of itself.
The eigenvalue equation for a matrix \(A\) is \(Av-\lambda v = 0\), which is equivalent to \((A - \lambda I)v = 0\).
This equation has non-zero solutions if and only if \(\det(A - \lambda I) = 0\), i.e. \((A - \lambda I)\) is singular and not invertible.
[Embedded video: Showing that an eigenbasis makes for good coordinate systems]
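A quick numerical check of the eigenvalue equation (my own NumPy sketch with an arbitrary matrix):

```python
# Sketch: verify Av = lambda * v for each eigenpair returned by NumPy.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, v in zip(eigenvalues, eigenvectors.T):   # columns of `eigenvectors` are the eigenvectors
    print(np.allclose(A @ v, lam * v))            # True for every pair
```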
Theorems
Number of eigenvalues of a matrix
Suppose that A is a square matrix of size n with distinct eigenvalues \(\lambda_1, \lambda_2, \lambda_3,\dots,\lambda_k\). Then \(\sum_{i=1}^k\alpha_A(\lambda_i) = n\), where \(\alpha_A(\lambda_i)\) is the algebraic multiplicity of \(\lambda_i\).
Maximum number of eigenvalues of a matrix
Suppose that A is a square matrix of size n. Then A cannot have more than n distinct eigenvalues.
Spectral theorem
Consider a Hermitian map A on a finite-dimensional real or complex inner product space V endowed with a positive definite Hermitian inner product. The Hermitian condition means \(\langle Ax, y \rangle = \langle x, Ay \rangle\) for all \(x, y \in V\).
An equivalent condition is that \(A^* = A\), where \(A^*\) is the Hermitian conjugate of A. When A is identified with a Hermitian matrix (one which is equal to its own conjugate transpose), the matrix of \(A^*\) can be identified with its conjugate transpose. If A is a real matrix, this is equivalent to \(A^T = A\) (that is, A is a symmetric matrix).
Theorem. There exists an orthonormal basis of V consisting of eigenvectors of A. Each eigenvalue is real.
So in less precise terms, and considering only real numbers: if A is a symmetric matrix, its eigenvectors form an orthonormal basis.
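A small numerical illustration of the real, symmetric case (my own sketch; `numpy.linalg.eigh` is NumPy's routine for Hermitian/symmetric matrices):

```python
# Sketch: for a real symmetric matrix, the eigenvalues are real and the
# eigenvectors (columns of Q) form an orthonormal basis, so Q^T Q = I.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, Q = np.linalg.eigh(A)

print(eigenvalues)                                     # [1. 3.], all real
print(np.allclose(Q.T @ Q, np.eye(2)))                 # True: orthonormal eigenvectors
print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, A))  # True: A = Q diag(lambda) Q^T
```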
Principal component analysis
One of the applications involving eigenvalues and eigenvectors is PCA. It transforms a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible).
PCA is used in the eigenface technique for face recognition.
One question one might ask is why the eigenvectors of the covariance matrix \(\textbf{V}\) of the data are the principal components. By definition, the first principal component \(\textbf{w}\) has the largest possible variance. If we project all the data onto this direction, the variance of the projected data is \(\textbf{w}^T\textbf{Vw}\). So we want to choose a unit vector \(\textbf{w}\) that maximizes this variance. Note that we need to constrain the maximization, otherwise the objective function has no maximum. The constraint is that \(\textbf{w}\) is a unit vector, so that \(\textbf{w}^T\textbf{w} = 1\). To do constrained optimization, we use a Lagrange multiplier. The Lagrange function is thus
\begin{align}L(\textbf{w}, \lambda) &= \textbf{w}^T\textbf{Vw} - \lambda(\textbf{w}^T\textbf{w} - 1)\\
\frac{\partial L}{\partial \textbf{w}} &= 2\textbf{Vw} - 2\lambda\textbf{w} = 0 \\
\textbf{Vw} &= \lambda\textbf{w} \\
\textbf{AA}^T\textbf{w} &= \lambda\textbf{w}
\end{align}
where the last line writes \(\textbf{V} = \textbf{AA}^T\) for a centered data matrix \(\textbf{A}\) (up to a constant factor). This means the maximizing vector is the eigenvector of \(\textbf{V}\) with the largest eigenvalue. The principal component is also a linear combination of the original variables.
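A minimal sketch of this in code (my own toy data, not from the post): the covariance eigenvector with the largest eigenvalue gives the first principal component, and the variance of the projected data equals that eigenvalue.

```python
# Sketch: first principal component via the eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0],
                                          [0.0, 0.5]])   # correlated toy data
X = X - X.mean(axis=0)                                   # center the data

V = np.cov(X, rowvar=False)                              # covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(V)
w = eigenvectors[:, np.argmax(eigenvalues)]              # unit eigenvector with the largest eigenvalue

projected = X @ w
print(np.var(projected, ddof=1), eigenvalues.max())      # projected variance equals the top eigenvalue
```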
Singular value decomposition
Singular value decomposition can be expressed as $$M = U\Sigma V^*$$ The columns of \(V\) are eigenvectors of \(M^*M\). If M is positive semi-definite, the eigenvalue decomposition of M is the same as its singular value decomposition. However, the eigenvalue decomposition and the singular value decomposition differ for all other matrices M.
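A small numerical check of this relationship (my own sketch with an arbitrary matrix): the singular values of \(M\) are the square roots of the eigenvalues of \(M^*M\).

```python
# Sketch: singular values of M are square roots of the eigenvalues of M^T M,
# and M factors as U Sigma V^* (here V^* is returned as Vt).
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
eigenvalues, _ = np.linalg.eigh(M.T @ M)

print(np.allclose(np.sort(s**2), np.sort(eigenvalues)))  # True
print(np.allclose(U @ np.diag(s) @ Vt, M))               # True: M = U Sigma V^*
```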
Labels: linear algebra, math
Thursday, 14 February 2013
Running Matlab in Awesome
Today I tried to run Matlab R2012a 64-bit under Awesome, but I only got gray windows. After googling around, I found that the X window integration in most modern Java virtual machines shows gray windows when used with a non-re-parenting window manager such as Awesome. According to this post, Sun JVM 7 shows this symptom for some Java applications, while OpenJDK may work. So I changed the Matlab Java directory to point to OpenJDK's JRE, and it works.
I found the solution from this forum post.
export MATLAB_JAVA="/usr/lib/jvm/java-7-openjdk-amd64/jre"
To run Matlab from the command line, use:
matlab -desktop
To check the environment variables used by Matlab, use:
matlab -n
Labels: matlab
Tuesday, 5 February 2013
Hand pose descriptor
Left column: hand pose after aligning the major axis with the x axis; right column: visualization of the cylindrical descriptor
The saturation of the color in the descriptor is proportional to the value of each bin. The hue of the color reflects the depth of the annulus section.
Frame 261: currently classified as class 1 using mclust (k means clustering) R package
Frame 341: currently classified as class 2
Frame 261:
Frame 333:
Frame 341:
Labels: research