Hierarchical Modeling and Parallelized Bayesian Inference for the Analysis of RNAseq Data

This proposal focuses on the development of hierarchical models and parallelized Bayesian inference for the analysis of RNA sequencing (RNAseq) data. Special emphasis is placed on gene expression profiling of parental inbred lines and their hybrid offspring for the discovery of key genes underlying heterosis, the genetic phenomenon otherwise known as hybrid vigor. The project will be led by a collaborative team of researchers with expertise in the analysis of high-dimensional gene expression data, Bayesian inference, bioinformatics, biology, computational methods, genetics, genomics, and statistics. The proposed research provides new tools for the analysis of the high-dimensional, low-sample-size count data generated by RNAseq technology. Hierarchical modeling allows for flexible information sharing across dimensions to extract as much information as possible from the data. Parallel methods for Bayesian inference harness the power of modern computing to produce comprehensive results in a timely manner. Specific methods will be developed for (i) the identification of genes that exhibit expression heterosis, (ii) the detection of expressed and non-expressed genes, and (iii) the discovery of differential allele usage in hybrids. These methods will provide a deeper understanding of the molecular mechanisms of heterosis and lead to the discovery of key genes whose expression patterns provide hybrids with advantages over their parents. This information can be used to efficiently predict which of thousands of possible crosses will result in top-performing hybrids. In addition to the specific methods mentioned above, hierarchical generalized linear models for the simultaneous analysis of tens of thousands of response variables will be developed. This work will permit the analysis of RNAseq data from complex designs with multiple sources of variability and will greatly extend the range of applicability for the funded research to encompass a variety of challenges in high-dimensional data analysis.
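
As a rough illustration of what information sharing across genes can look like for count data, the sketch below uses a conjugate Gamma-Poisson model. This is not the proposal's method: the Poisson likelihood, the fixed Gamma prior, the function names, and the toy counts are all assumptions chosen for illustration. In a full hierarchical model the prior parameters would themselves be learned from all genes rather than fixed.

```python
# Minimal sketch (assumed model, not the proposal's): each gene's counts are
# Poisson with a gene-specific rate, and all rates share a Gamma prior.
# With a Gamma(shape, rate) prior, the posterior mean of a gene's rate is
# (shape + sum of counts) / (rate + number of samples).

def posterior_mean_rate(counts, prior_shape, prior_rate):
    """Posterior mean of a Poisson rate under a Gamma(prior_shape, prior_rate) prior."""
    return (prior_shape + sum(counts)) / (prior_rate + len(counts))


def shrink_all(counts_by_gene, prior_shape=2.0, prior_rate=1.0):
    """Apply the same (hypothetical, fixed) prior to every gene."""
    return {
        gene: posterior_mean_rate(counts, prior_shape, prior_rate)
        for gene, counts in counts_by_gene.items()
    }


# Toy data: two genes, two samples each (hypothetical values).
toy_counts = {"geneA": [0, 0], "geneB": [10, 12]}
shrunk = shrink_all(toy_counts)

for gene, counts in toy_counts.items():
    raw_mean = sum(counts) / len(counts)
    print(gene, "raw mean:", raw_mean, "shrunken estimate:", shrunk[gene])
```

With only two samples per gene, each estimate is pulled from its raw mean toward the prior mean (here 2.0), which is the kind of stabilization that motivates hierarchical modeling when sample sizes are small.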

Public Health Relevance: The proposed work will provide medical researchers with advanced tools for studying the functions of genes in complex biological systems. The enhanced understanding of gene functions obtained with the developed tools can deepen understanding of diseases and lead to new treatments for the improvement of public health.

Duration: 09/01/2013 to 05/31/2018
Principal Investigator(s):