Bayesian methods for multivariate modeling of pleiotropic SNP associations and genetic risk prediction
- 1 Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- 2 Department of Medicine, Boston University School of Medicine, Boston, MA, USA
Genome-wide association studies (GWAS) have identified numerous associations between genetic loci and individual phenotypes; however, relatively few GWAS have attempted to detect pleiotropic associations, in which loci are simultaneously associated with multiple distinct phenotypes. We show that pleiotropic associations can be directly modeled via the construction of simple Bayesian networks, and that these models can be used to build single Bayesian classifiers or ensembles of classifiers that leverage pleiotropy to improve genetic risk prediction. The proposed method includes two phases: (1) Bayesian model comparison, to identify single-nucleotide polymorphisms (SNPs) associated with one or more traits; and (2) cross-validation feature selection, in which a final set of SNPs is selected to optimize prediction. To demonstrate the capabilities and limitations of the method, a total of 1600 case-control GWAS datasets with two dichotomous phenotypes were simulated under 16 scenarios, varying the association strengths of causal SNPs, the size of the discovery sets, the balance between cases and controls, and the number of pleiotropic causal SNPs. Across the 16 scenarios, prediction accuracy ranged from 50 to 90%. In the 14 scenarios that included pleiotropically associated SNPs, the pleiotropic model search and prediction methods consistently outperformed the naive model search and prediction. In the two scenarios in which there were no true pleiotropic SNPs, the differences between the pleiotropic and naive model searches were minimal. To further evaluate the method on real data, a discovery set of 1071 sickle cell disease (SCD) patients was used to search for pleiotropic associations between cerebral vascular accidents and fetal hemoglobin level. Classification was then performed on a smaller validation set of 352 SCD patients; the results suggest that including pleiotropic SNPs may slightly improve prediction, although the difference was not statistically significant.
The proposed method is robust and computationally efficient, and it provides a powerful new approach for detecting and modeling pleiotropic disease loci.
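As a rough illustration of the Bayesian model comparison phase, the sketch below scores four candidate models for a single SNP against two dichotomous phenotypes and converts the scores into posterior model probabilities. It is not the authors' implementation. The abstract does not specify the network structure, priors, or scoring function, so everything here is an assumption made for illustration: the genotype is treated as a child node of the phenotypes, the score is a Dirichlet-multinomial marginal likelihood with an equivalent sample size `ess`, the four models receive equal prior probability, and the names `log_marginal` and `model_posteriors` are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def log_marginal(genotypes, parent_cols, ess=1.0, n_geno=3):
    """Log marginal likelihood of the genotype column given binary parents.

    Assumes (for illustration only) a Dirichlet prior whose total weight
    `ess` is spread evenly over every (parent configuration, genotype) cell.
    """
    n_configs = 2 ** len(parent_cols)
    a_j = ess / n_configs             # prior weight per parent configuration
    a_jk = ess / (n_configs * n_geno)  # prior weight per cell

    # Count genotypes within each observed parent configuration.
    counts = Counter()
    for i, g in enumerate(genotypes):
        config = tuple(col[i] for col in parent_cols)
        counts[(config, g)] += 1

    score = 0.0
    for config in product((0, 1), repeat=len(parent_cols)):
        n_j = sum(counts[(config, g)] for g in range(n_geno))
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for g in range(n_geno):
            score += math.lgamma(a_jk + counts[(config, g)]) - math.lgamma(a_jk)
    return score


def model_posteriors(genotypes, pheno_a, pheno_b, ess=1.0):
    """Posterior probabilities of four candidate models, equal model priors."""
    models = {
        "null": [],                         # SNP associated with neither trait
        "A_only": [pheno_a],                # SNP associated with trait A only
        "B_only": [pheno_b],                # SNP associated with trait B only
        "pleiotropic": [pheno_a, pheno_b],  # SNP associated with both traits
    }
    log_ml = {name: log_marginal(genotypes, cols, ess)
              for name, cols in models.items()}
    top = max(log_ml.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(v - top) for name, v in log_ml.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Toy data (synthetic, not from the study): genotype coded 0/1/2,
# constructed here to depend on both phenotypes.
rows = [(a, b) for a in (0, 1) for b in (0, 1) for _ in range(50)]
toy_a = [a for a, _ in rows]
toy_b = [b for _, b in rows]
toy_geno = [a + b for a, b in rows]
toy_posteriors = model_posteriors(toy_geno, toy_a, toy_b)
```

In a workflow of the kind the abstract describes, SNPs whose posterior mass favors a non-null model would presumably be carried forward to the cross-validation feature selection phase; the threshold and the selection procedure are not given here and are left out of the sketch.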
Keywords: pleiotropy, SNP, GWAS, prediction, Bayesian
Citation: Hartley SW, Monti S, Liu C-T, Steinberg MH and Sebastiani P (2012) Bayesian methods for multivariate modeling of pleiotropic SNP associations and genetic risk prediction. Front. Gene. 3:176. doi: 10.3389/fgene.2012.00176
Received: 29 June 2012; Accepted: 20 August 2012;
Published online: 11 September 2012.
Edited by: Jielin Sun, Wake Forest University School of Medicine, USA
Reviewed by: Lili Ding, Cincinnati Children’s Hospital Medical Center, USA
Sha Tao, Van Andel Institute, USA
Riccardo Bellazzi, Università di Pavia, Italy
Copyright: © 2012 Hartley, Monti, Liu, Steinberg and Sebastiani. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
*Correspondence: Stephen W. Hartley, National Institutes of Health/National Human Genome Research Institute, 5625 Fishers Lane, Suite 5N-01, Rockville, MD 20850, USA. e-mail: email@example.com; Paola Sebastiani, Department of Biostatistics, Boston University, 801 Massachusetts Avenue, 3rd floor, Boston, MA 02118, USA. e-mail: firstname.lastname@example.org