Classification of human tumors according to their primary anatomical site
of origin is fundamental for the optimal treatment of patients with cancer.
Here we describe the use of large-scale RNA profiling and supervised
machine learning algorithms to construct a first-generation molecular
classification scheme for carcinomas of the prostate, breast, lung, ovary,
colorectum, kidney, liver, pancreas, bladder/ureter, and gastroesophagus,
which collectively account for approximately 70% of all cancer-related
deaths in the United States. The classification scheme was based on
identifying gene subsets whose expression typifies each cancer class, and
we quantified the extent to which these genes are characteristic of a
specific tumor type by accurately and confidently predicting the anatomical
site of tumor origin for 90% of 175 carcinomas, including 9 of 12
metastatic lesions. The predictor gene subsets include those whose
expression is typical of specific types of normal epithelial
differentiation, as well as other genes whose expression is elevated in
cancer. This study demonstrates the feasibility of predicting the tissue
origin of a carcinoma in the context of multiple cancer classes.