Feature extraction from any combination of sensory stimuli can be seen as the detection of statistically correlated combinations of inputs. A mathematical framework describing this fact is formulated using concepts from information theory. The key idea is to define a bijective, volume-conserving transformation, in order to ensure the transmission of all the information from inputs to outputs without spurious generation of entropy. In addition, this transformation simultaneously constrains the distribution of the outputs so that the representation is factorial, i.e., the redundancy at the output layer is minimal. We formulate this novel unsupervised learning paradigm for a linear network; in the linear case the method converges to the principal component transformation. Contrary to the ''infomax'' principle, we minimize the mutual information between the output neurons, provided that the transformation conserves the entropy in the vertical direction (from inputs to outputs).
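The linear case described above can be sketched numerically. The following is a minimal illustration, not the paper's algorithm: it assumes zero-mean Gaussian inputs and uses a direct eigendecomposition (rather than a learning rule) to obtain the principal component transformation. An orthogonal matrix has unit Jacobian determinant, so the map is volume-conserving and transmits all the input entropy; rotating into the eigenbasis of the input covariance decorrelates the outputs, which for Gaussian inputs means zero mutual information between output neurons, i.e., a factorial representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated zero-mean Gaussian inputs (2-D for illustration).
n = 10_000
A = np.array([[2.0, 1.0], [0.5, 1.5]])
x = rng.standard_normal((n, 2)) @ A.T  # inputs with covariance A A^T

# Principal component transformation: rotate into the eigenbasis
# of the empirical input covariance.
C = np.cov(x, rowvar=False)
eigvals, W = np.linalg.eigh(C)  # columns of W are orthonormal eigenvectors
y = x @ W                       # transformed outputs

# Volume conservation: |det W| = 1, so the transformation neither
# generates nor destroys entropy from inputs to outputs.
print(abs(np.linalg.det(W)))

# Factorial outputs: the output covariance W^T C W is diagonal, so
# the off-diagonal correlation (and, for Gaussians, the mutual
# information between output neurons) vanishes.
Cy = np.cov(y, rowvar=False)
print(abs(Cy[0, 1]))
```

Since the eigenbasis diagonalizes the empirical covariance exactly, the off-diagonal entry of the output covariance is zero up to floating-point precision.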