Proteomics databases have become indispensable for the daily work of
molecular biologists, but the same cannot yet be said for carbohydrate
applications. One obvious reason is that existing carbohydrate data
collections are only rarely annotated and are not cross-linked to other
resources.
A generally accepted linear, canonical description of carbohydrates that
can be readily processed by computers would serve as a unique and
unambiguous database access key and thereby enable efficient automatic
cross-linking of distributed carbohydrate data collections. Various ways
of deriving a canonical notation are discussed. They can be divided into
approaches that require the structure description alone and alternatives
that exploit the fact that a preferred graph direction (non-reducing to
reducing
end) exists within the structure. To open a fruitful discussion among
glycoscientists, a possible solution is presented in which the reducing
monosaccharide unit is selected as the graph root and linkage information
is used to define the priority of the various branches. A Web interface
(http://www.dkfz.de/spec/linucs/) has been created that directly converts
the commonly used extended representation of complex carbohydrates into
the preferred canonical description or into its inverted form.
(C) 2001 Elsevier Science Ltd. All rights reserved.
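
The idea of rooting the graph at the reducing monosaccharide and ordering
branches by linkage information can be illustrated with a minimal sketch.
Everything named here is hypothetical: the Residue class, the canonical
function, and the bracket layout of the output string are assumptions made
for illustration and are not claimed to match the notation produced by the
Web interface. The sketch assumes that branch priority is given simply by
the ascending linkage position on the parent residue.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Residue:
    """One monosaccharide unit in a tree rooted at the reducing end (hypothetical model)."""
    name: str                  # residue label, e.g. "b-D-Manp"
    parent_pos: int = 0        # position on the parent this unit is linked to; 0 marks the root
    anomeric_pos: int = 1      # linking carbon on this unit
    children: List["Residue"] = field(default_factory=list)


def canonical(res: Residue) -> str:
    """Serialize a residue and its branches as one linear string.

    Branches are visited in ascending order of the linkage position on the
    parent (an assumed priority rule), so the result does not depend on the
    order in which the branches were entered.
    """
    link = f"({res.parent_pos}+{res.anomeric_pos})" if res.parent_pos else ""
    ordered = sorted(res.children, key=lambda child: child.parent_pos)
    inner = "".join(canonical(child) for child in ordered)
    return f"[{link}][{res.name}]{{{inner}}}"


# The same branched structure entered with its two branches in different orders.
a = Residue("b-D-Manp", children=[
    Residue("a-D-Manp", parent_pos=6),
    Residue("a-D-Manp", parent_pos=3),
])
b = Residue("b-D-Manp", children=[
    Residue("a-D-Manp", parent_pos=3),
    Residue("a-D-Manp", parent_pos=6),
])

print(canonical(a))
print(canonical(a) == canonical(b))
```

Because both inputs serialize to the same string, that string could be used
as a lookup key across data collections, which is the role the canonical
description is meant to play.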