
Dimensionality reduction

When carrying out statistical analysis on high-dimensional data sets (e.g. templates or images), it is useful, and often essential, to be able to reduce the dimensionality of the data set. This page explains how to do this with shape (i.e. template) data; the same basic idea applies to colour (PCI) models.

The components of the ASM model are arranged in order of variance explained: the component that explains the most variance is first, the component that explains the next most variance is second, and so on. Consequently, the ASM model can be truncated to the first n components that together explain the desired amount of variance.

It is assumed that you have already created an ASM (or PCI) model and used it to analyse a set of templates (or images).

Method

First, we need to find the amount of variance explained by each component. Open the .txt file in a spreadsheet program (to open it in Excel you may need to rename it to a .csv file). You should get two columns. The first, labelled ‘Eigenvector’, simply numbers the eigenvectors in order from 0 to the number of faces minus 1. The second, labelled ‘Eigenvalue’, is the eigenvalue of that particular component. The proportions of variance explained must sum to one, so divide each eigenvalue by the sum of *all* eigenvalues to get the proportion of variance explained by each component.

Eigenvector Eigenvalue Variance explained
0 4184.51 =B2/SUM($B$2:$B$164)
1 1485.68 =B3/SUM($B$2:$B$164)
2 1145.03
3 689.22
4 490.41
. .
. .
161 0.34
162 0.29
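
The spreadsheet calculation above can also be sketched in Python. This is a minimal sketch, not part of the original recipe: it assumes the eigenvalue file is comma-separated with a header row containing an ‘Eigenvalue’ column, and the function names `read_eigenvalues` and `variance_explained` are hypothetical. The sample values are made up for illustration.

```python
import csv
import io


def read_eigenvalues(handle):
    """Read the 'Eigenvalue' column from a comma-separated file with a header row.

    Assumption: the file layout matches the two-column table shown above.
    """
    reader = csv.DictReader(handle)
    return [float(row["Eigenvalue"]) for row in reader]


def variance_explained(eigenvalues):
    """Divide each eigenvalue by the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Made-up eigenvalues for illustration only; a real file lists every component.
sample = io.StringIO("Eigenvector,Eigenvalue\n0,4.0\n1,3.0\n2,2.0\n3,1.0\n")
eigenvalues = read_eigenvalues(sample)
fractions = variance_explained(eigenvalues)
print(fractions)
```

With a real file, the `io.StringIO` object would be replaced by a handle from `open(path, newline="")`, where `path` points at the eigenvalue file.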

How you truncate depends on which statistics you wish to analyse; there are two possible methods. The first is to work out the cumulative variance explained for each component (i.e. the variance explained by that component and all the previous components) and truncate at the component at which the cumulative value first exceeds the desired amount of variance. Alternatively, you could truncate at the average eigenvalue and keep only those components that individually explain at least the average variance.
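
Both truncation rules can be expressed in a few lines of Python. The sketch below assumes the eigenvalues are already sorted from largest to smallest, as described earlier; the function names are hypothetical and the eigenvalues are made up for illustration.

```python
from itertools import accumulate


def components_for_cumulative_variance(eigenvalues, target):
    """Number of leading components needed for the cumulative
    proportion of variance to reach `target` (a value between 0 and 1)."""
    total = sum(eigenvalues)
    for count, running in enumerate(accumulate(eigenvalues), start=1):
        if running / total >= target:
            return count
    return len(eigenvalues)


def components_above_average(eigenvalues):
    """Number of components whose eigenvalue is at least the average eigenvalue."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for value in eigenvalues if value >= mean)


# Made-up eigenvalues, sorted from largest to smallest.
eigenvalues = [4.0, 3.0, 2.0, 1.0]
n_cumulative = components_for_cumulative_variance(eigenvalues, 0.8)
n_average = components_above_average(eigenvalues)
print(n_cumulative, n_average)
```

The two rules generally give different answers, so the choice should follow from the analysis you intend to run rather than from convenience.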

Truncating the PCA model involves editing the csv file created by the ‘Batch Analyse Shape’ menu option. In this file each column (except the first, which holds the template name) stores the weighting value for a particular component, ordered from highest variance explained (left) to lowest (right). The model can be truncated by deleting the columns after the desired number of components. For example, if you wish to keep the first 7 components, delete every component column from the 8th component onwards (the 9th column of the file, since the first column holds the template name). The remaining 7 component columns are the 7 principal components describing the face shape.
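
The column deletion described above can be done without a spreadsheet. The following is a minimal sketch, assuming the file is a plain comma-separated file whose first column holds the template name; the function name `truncate_components` and the sample rows are hypothetical.

```python
import csv
import io


def truncate_components(source, destination, keep):
    """Copy a CSV, keeping the template-name column plus the first `keep` component columns."""
    reader = csv.reader(source)
    writer = csv.writer(destination, lineterminator="\n")
    for row in reader:
        # Index 0 is the template name, so components occupy indices 1..keep.
        writer.writerow(row[: keep + 1])


# Made-up rows for illustration: a template name followed by four component weights.
source = io.StringIO(
    "face1.tem,0.5,-0.2,0.1,0.03\n"
    "face2.tem,-0.4,0.3,-0.1,0.02\n"
)
destination = io.StringIO()
truncate_components(source, destination, keep=2)
result = destination.getvalue()
print(result)
```

For real files, open the input and output with `open(..., newline="")` and write to a new file rather than overwriting the original, so the full set of components can still be recovered.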

Last modified on Mar 26, 2012, 7:33:15 PM