It has previously been shown that measured physicochemical properties are
useful in the selection of building blocks for combinatorial chemistry as well
as for investigating the scope and limitations of organic reactions.
However, measured physicochemical properties are available only for small
subsets of reagents, starting materials or building blocks; it is therefore
necessary to use calculated descriptors, and it is essential that these
descriptors are relevant. The objective was to investigate whether three different
descriptor data sets contained similar information about chemical
structure, with the main aim of determining whether calculated descriptors
contain information similar to that in experimental data. A total of 205 heterogeneous
primary amines were characterized using three different data sets of
molecular descriptor variables. The first set consisted of four
physicochemical variables compiled from the literature and from chemical
catalogues of commercially available compounds. From these four descriptors
together with molecular
weight, three additional descriptors could be calculated, giving a total of
eight descriptor variables in the first data set. The second data set
consisted of 81 calculated molecular descriptor variables relating to size,
connectivity, atom count, topology and electrotopology indices. The third
data set consisted of 10 semi-empirical (AM1) variables. All calculated
variables were generated using the software Tsar 3.11. The descriptor
variable sets were compared using principal component analysis (PCA) and
partial least squares projections to latent structures (PLS). The results
show that the different descriptor sets do contain similar latent
information and that the different types of calculated variables do correlate
well with the experimental data, making them suitable for use in, for
example, combinatorial library design. Copyright (C) 2000 John Wiley & Sons, Ltd.
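
The idea of checking whether two descriptor blocks carry similar latent information can be illustrated with a minimal sketch. It is not the authors' procedure: it uses only the first principal component (found by power iteration) and a Pearson correlation between score vectors, with no PLS step and no Tsar output. The data are synthetic, and the names `autoscale`, `first_pc_scores`, `pearson`, `measured` and `calculated` are hypothetical.

```python
import math
import random


def autoscale(block):
    """Mean-center each column and scale it to unit variance."""
    n = len(block)
    scaled_cols = []
    for col in zip(*block):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def first_pc_scores(block, iterations=200):
    """Scores on the first principal component, via power iteration."""
    x = autoscale(block)
    p = len(x[0])
    w = [1.0 / math.sqrt(p)] * p  # starting guess for the loading vector
    for _ in range(iterations):
        t = [sum(row[j] * w[j] for j in range(p)) for row in x]  # t = X w
        w = [sum(x[i][j] * t[i] for i in range(len(x))) for j in range(p)]  # X^T t
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(row[j] * w[j] for j in range(p)) for row in x]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic example (assumption): one hidden property drives both blocks.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(50)]
measured = [[2.0 * s + random.gauss(0, 0.3),
             -1.5 * s + random.gauss(0, 0.3),
             s + random.gauss(0, 0.3)] for s in latent]
calculated = [[s + random.gauss(0, 0.3),
               3.0 * s + random.gauss(0, 0.3),
               -s + random.gauss(0, 0.3),
               0.5 * s + random.gauss(0, 0.3)] for s in latent]

# The sign of a principal component is arbitrary, so compare |r|.
r = pearson(first_pc_scores(measured), first_pc_scores(calculated))
print(f"|r| between first-component scores: {abs(r):.2f}")
```

A high absolute correlation between the score vectors suggests that the two blocks describe the same underlying variation; a full comparison would examine several components and relate the blocks with PLS.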