Relating amino acid sequence to phenotype: Analysis of peptide-binding data

Authors

Segal, MR; Cummings, MP; Hubbard, AE

Citation

M.R. Segal et al., Relating amino acid sequence to phenotype: Analysis of peptide-binding data, BIOMETRICS, 57(2), 2001, pp. 632-642

Subject categories

Biology, Multidisciplinary

Journal title

BIOMETRICS

ISSN journal

0006-341X

Volume

57

Issue

2

Year of publication

2001

Pages

632 - 642

Database

ISI

SICI code

0006-341X(200106)57:2<632:RAASTP>2.0.ZU;2-C

Abstract

We illustrate data analytic concerns that arise in the context of relating genotype, as represented by amino acid sequence, to phenotypes (outcomes). The present application examines whether peptides that bind to a particular major histocompatibility complex (MHC) class I molecule have characteristic amino acid sequences. However, the concerns identified and addressed are considerably more general. It is recognized that simple rules for predicting binding based solely on preferences for specific amino acids in certain (anchor) positions of the peptide's amino acid sequence are generally inadequate and that binding is potentially influenced by all sequence positions as well as between-position interactions. The desire to elucidate these more complex prediction rules has spawned various modeling attempts, the shortcomings of which provide motivation for the methods adopted here. Because of (i) this need to model between-position interactions, (ii) amino acids constituting a highly (20) multilevel unordered categorical covariate, and (iii) there frequently being numerous such covariates (i.e., positions) comprising the sequence, standard regression/classification techniques are problematic due to the proliferation of indicator variables required for encoding the sequence position covariates and attendant interactions. These difficulties have led to analyses based on (continuous) properties (e.g., molecular weights) of the amino acids. However, there is potential information loss in such an approach if the properties used are incomplete and/or do not capture the mechanism underlying association with the phenotype. Here we demonstrate that handling unordered categorical covariates with numerous levels and accompanying interactions can be done effectively using classification trees and recently devised bump-hunting methods. We further tackle the question of whether observed associations are attributable to amino acid properties as well as addressing the assessment and implications of between-position covariation.
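
The proliferation of indicator variables mentioned in the abstract can be made concrete with a small counting sketch. It is illustrative only and is not taken from the paper: it assumes reference-cell (dummy) coding, in which each 20-level position needs 19 indicators, and a hypothetical peptide length of 9 positions, which the abstract does not state. The function names are likewise made up for this example. The second function counts the distinct two-way groupings of levels that a binary tree split on one unordered categorical covariate could in principle consider.

```python
from math import comb


def indicator_counts(n_positions, n_levels=20):
    """Count dummy variables under reference-cell coding (an assumption).

    Returns (main_effects, pairwise_interactions).
    """
    per_position = n_levels - 1  # one level serves as the reference
    main_effects = n_positions * per_position
    # Each pair of positions contributes a product of their indicator sets.
    pairwise = comb(n_positions, 2) * per_position ** 2
    return main_effects, pairwise


def binary_partitions(n_levels=20):
    """Number of ways to split an unordered factor's levels into two non-empty groups."""
    return 2 ** (n_levels - 1) - 1


if __name__ == "__main__":
    # Hypothetical peptide length of 9; not stated in the abstract.
    main, pairwise = indicator_counts(n_positions=9)
    print("main-effect indicators:", main)
    print("pairwise-interaction indicators:", pairwise)
    print("candidate binary splits per position:", binary_partitions())
```

Under these assumptions, the pairwise interaction terms alone number in the thousands, which is typically far more than the number of peptides with measured binding. This is the kind of difficulty the abstract gives as motivation for tree-based methods.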