The basic idea of linear principal component analysis (PCA) is to decorrelate coordinates by an orthogonal linear transformation. In this paper we generalize this idea to the nonlinear case and, at the same time, drop the usual restriction to Gaussian distributions.
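As a point of reference, the following sketch (an illustrative example of our own, not code from the paper; the toy data and variable names are assumptions) shows the linear baseline: rotating centered data onto the eigenvectors of its sample covariance is an orthogonal transformation that leaves the output coordinates uncorrelated.

    import numpy as np

    rng = np.random.default_rng(0)
    # toy correlated 2-D Gaussian data
    x = rng.multivariate_normal([0.0, 0.0], [[2.0, 1.2], [1.2, 1.0]], size=5000)

    x = x - x.mean(axis=0)                  # center the data
    cov = np.cov(x, rowvar=False)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # orthonormal eigenvectors (orthogonal matrix)
    y = x @ eigvecs                         # rotate onto the principal axes
    print(np.round(np.cov(y, rowvar=False), 3))  # off-diagonal terms vanish (up to sampling noise)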
The linearity and orthogonality condition of linear PCA is replaced by the condition of volume conservation, in order to avoid spurious information generated by the nonlinear transformation. This leads us to another very general class of nonlinear transformations, called symplectic maps. Then, instead of minimizing the correlation, we minimize the redundancy measured at the output coordinates. This generalizes second-order statistics, which are adequate only for Gaussian output distributions, to higher-order statistics.
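For orientation, the standard information-theoretic identities behind this choice (elementary facts, not claims specific to this paper) can be written as follows: a volume-conserving map $\mathbf{y} = f(\mathbf{x})$ leaves the joint differential entropy unchanged, so the redundancy at the output reflects genuine statistical dependence rather than an artifact of the transformation,
\[
H(\mathbf{y}) = H(\mathbf{x}) + \mathbb{E}\!\left[\,\log\left|\det J_f(\mathbf{x})\right|\,\right],
\qquad
\left|\det J_f\right| \equiv 1 \;\Longrightarrow\; H(\mathbf{y}) = H(\mathbf{x}),
\]
\[
R(\mathbf{y}) = \sum_{i} H(y_i) - H(\mathbf{y}) \;\geq\; 0,
\]
with equality exactly when the outputs are statistically independent, i.e. $p(\mathbf{y}) = \prod_i p(y_i)$.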
The proposed paradigm implements Barlow's redundancy-reduction principle for unsupervised feature extraction. The resulting factorial representation of the joint probability distribution presumably facilitates density estimation and is applied in particular to novelty detection.
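To make the last point concrete, here is a minimal sketch (our own illustration with hypothetical names and stand-in data, not the paper's procedure): under a factorial representation the joint log-density is a sum of one-dimensional marginal log-densities, and points with low likelihood under that model are flagged as novel.

    import numpy as np
    from scipy.stats import gaussian_kde

    def fit_factorial_marginals(y_train):
        """Fit a 1-D kernel density estimate to each output coordinate."""
        return [gaussian_kde(y_train[:, i]) for i in range(y_train.shape[1])]

    def novelty_score(kdes, y):
        """Negative factorial log-likelihood; larger values indicate more novel points."""
        log_p = sum(kde.logpdf(y[:, i]) for i, kde in enumerate(kdes))
        return -log_p

    rng = np.random.default_rng(1)
    y_train = rng.standard_normal((2000, 2))        # stand-in for (approximately) independent outputs
    kdes = fit_factorial_marginals(y_train)
    y_test = np.array([[0.1, -0.2], [6.0, 6.0]])    # the second point lies far from the training data
    print(novelty_score(kdes, y_test))              # the outlier receives a larger score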