Development and Evaluation of Improved Strategies for Genomic Selection Via Simulations and Empirical Testing

The overall goal of the proposed project is to increase the efficiency of crop breeding programs by developing and deploying improved genomic selection strategies that rely on improvements in the selection and mating steps.
As a consequence of growing populations, changing diets, and the challenges of climate change, agricultural systems must produce more with less. “More” means greater demand for agricultural products such as food, feed, energy, and fiber. “Less” means reduced agricultural inputs (at least on a per-output basis) such as water, fertilizer, and pesticides, and a reduced environmental footprint, all on less land. A key tool for making agricultural systems more productive, sustainable, and resilient is genetic improvement via breeding.
Selection based on favorable traits (phenotypic selection) was central to the domestication of crops and has been used successfully in plant breeding for thousands of years. Recurrent selection is a special case of phenotypic selection, a type of population improvement that includes the following key steps: phenotyping (evaluation), selection, and mating. Phenotyping involves measuring trait values in a population, such as grain and biomass yield, flowering time and color. This step is often time-consuming and labor-intensive, and many phenotypes cannot be accurately scored without replicated tests. Selection involves identifying the individuals within the population that exhibit the most favorable trait values. Mating follows selection; selected lines are typically mated randomly or arbitrarily to create a new population for phenotyping and selection. Little research has been conducted on the effects of other mating strategies. The phenotyping step can be costly, time-consuming, logistically challenging, and/or destructive because it often involves a large number of individuals and many factors that can affect the measurements.
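The phenotyping, selection, and mating cycle described above can be sketched as a small simulation. This is only an illustrative toy under stated assumptions, not the project's simulation framework: lines are treated as fully inbred and coded 0/1 per locus, every locus has an equal additive effect, environmental noise is Gaussian, and all function and parameter names are hypothetical.

```python
import random


def genetic_value(genome, effects):
    """True (unobserved) genetic value: sum of additive allele effects."""
    return sum(allele * effect for allele, effect in zip(genome, effects))


def phenotype(genome, effects, noise_sd, rng):
    """Observed trait value: genetic value plus environmental noise."""
    return genetic_value(genome, effects) + rng.gauss(0.0, noise_sd)


def select(population, phenotypes, n_selected):
    """Truncation selection: keep the lines with the highest phenotypes."""
    ranked = sorted(zip(phenotypes, range(len(population))), reverse=True)
    return [population[index] for _, index in ranked[:n_selected]]


def mate(parents, n_offspring, rng):
    """Random mating: each offspring takes each locus from one of two
    randomly drawn, distinct parents (assumes unlinked loci)."""
    offspring = []
    for _ in range(n_offspring):
        first, second = rng.sample(parents, 2)
        offspring.append([rng.choice(pair) for pair in zip(first, second)])
    return offspring


def recurrent_selection(n_cycles, pop_size, n_selected, n_loci, noise_sd, seed):
    """Run phenotype -> select -> mate for n_cycles and return the mean
    true genetic value of the population at the start of each cycle."""
    rng = random.Random(seed)
    effects = [1.0] * n_loci  # assumption: equal additive effects
    population = [[rng.randint(0, 1) for _ in range(n_loci)]
                  for _ in range(pop_size)]
    mean_values = []
    for _ in range(n_cycles):
        total = sum(genetic_value(genome, effects) for genome in population)
        mean_values.append(total / pop_size)
        phenotypes = [phenotype(genome, effects, noise_sd, rng)
                      for genome in population]
        parents = select(population, phenotypes, n_selected)
        population = mate(parents, pop_size, rng)
    return mean_values


if __name__ == "__main__":
    history = recurrent_selection(n_cycles=10, pop_size=100, n_selected=20,
                                  n_loci=50, noise_sd=3.0, seed=42)
    for cycle, value in enumerate(history):
        print(f"cycle {cycle}: mean genetic value = {value:.2f}")
```

Replacing the body of `mate` with a different pairing rule is one way to experiment with the alternative mating strategies mentioned above; random mating is used here only because it is the default the text describes.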
Genomic selection (GS), which relies on efficient, high-throughput genotyping technologies (to determine the genetic composition of plants), enables breeders to predict the phenotype of plants based on their genotypes using advanced statistical models and a population of plants of known phenotypes used to “train” the prediction model. This new approach has transformed breeding because it eliminates or reduces the need for phenotyping. Even moderately accurate genomic prediction can result in substantial cost savings and improve the rate of genetic gain in breeding programs.
Previous literature has focused on three areas. The first is to improve the accuracy of phenotype prediction, upon which selection is based. The second is to define other quantitative indicators, in lieu of the predicted phenotype, that reflect the fitness of the lines from a genetic perspective and their likelihood of producing superior progeny. The third is the selection of an optimal training population that maximizes the accuracy of the prediction model when only a limited number of plants can be phenotyped. Although genomic prediction has transformed breeding, the selection and mating steps have not received equal attention and typically remain the same as in traditional phenotypic selection. Optimizing these two steps is a focus of the proposed project.
To design optimal selection and mating strategies and to understand their interactions with other aspects of breeding programs, we will make use of simulations within a framework of operations research, which deals with the application of advanced analytical methods to help make better decisions. Significantly, we will also develop user-friendly tools to allow breeders to optimize these parameters for their own breeding scenarios.
The new selection and mating strategies that we propose will be designed using advanced mathematical programming and optimization techniques, fully tailorable to reflect user preferences with respect to the trade-offs among cost, time, and probability of success.
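To make the genomic prediction step described above concrete, the following is a minimal sketch of training a model on genotyped and phenotyped plants and then ranking unphenotyped candidates. It uses ridge regression on marker genotypes, a common baseline chosen here for illustration; the proposal does not specify its statistical model. The sketch assumes genotypes coded as 0/1/2 allele counts, purely additive effects, and no intercept term. All function names, parameters, and simulated data are hypothetical.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_marker_effects(genotypes, phenotypes, ridge_penalty):
    """Ridge regression: solve (X'X + penalty * I) b = X'y for effects b."""
    n_markers = len(genotypes[0])
    xtx = [[sum(row[i] * row[j] for row in genotypes)
            + (ridge_penalty if i == j else 0.0)
            for j in range(n_markers)]
           for i in range(n_markers)]
    xty = [sum(row[i] * y for row, y in zip(genotypes, phenotypes))
           for i in range(n_markers)]
    return solve_linear_system(xtx, xty)


def predict(genotypes, effects):
    """Predicted phenotype: marker genotypes weighted by estimated effects."""
    return [sum(g * e for g, e in zip(row, effects)) for row in genotypes]


def select_top(predicted, k):
    """Indices of the k candidates with the highest predicted phenotype."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i],
                   reverse=True)
    return order[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    n_markers = 10
    # Simulated "true" effects, unknown to the breeder in practice.
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]

    def simulate_genotypes(n_plants):
        return [[rng.randint(0, 2) for _ in range(n_markers)]
                for _ in range(n_plants)]

    # Training population: genotyped and phenotyped (with noise).
    training = simulate_genotypes(50)
    training_phenotypes = [value + rng.gauss(0.0, 1.0)
                           for value in predict(training, true_effects)]
    effects = fit_marker_effects(training, training_phenotypes,
                                 ridge_penalty=1.0)

    # Candidates: genotyped only; selection uses predicted phenotypes.
    candidates = simulate_genotypes(20)
    print("selected candidates:", select_top(predict(candidates, effects), k=5))
```

The `select_top` step is plain truncation selection on predicted phenotype, which is the conventional approach the proposal aims to improve on, so it would be the natural place to substitute an optimization-based selection rule.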

Parallel Algorithms and Software for High-Throughput Sequence Assembly

High-throughput next-generation DNA sequencing technologies (NGS) are causing a major revolution in life sciences research by allowing rapid and cost-effective sampling of genomes and transcriptomes (expressed genomic sequences). Assembly of genomes and transcriptomes from billions of such randomly sampled sequences is an important problem in computational biology. While significant strides have been made, much work remains in addressing the diverse and rapidly emerging platforms, improving assembly quality, and scaling to both large-scale data sizes and large genomes.

This project will harness the power of high performance computing to develop effective solutions for sequence assembly. It will lead to the development of scalable, efficient parallel algorithms and a parallel integrated software framework for genome and transcriptome assembly. The project seeks to advance the state of the art by targeting important unsolved problems such as hybrid assembly of sequences from multiple NGS platforms, making fundamental algorithmic advances to improve assembly quality, and conducting an in-depth effort at parallel algorithms development for the entire gamut of problems that arise in connection with assembly. It will be carried out by an interdisciplinary team of investigators, in partnership with leading NGS manufacturers and academicians involved in large plant genome sequencing projects.

The project will lead to the release of a scalable parallel software package for sequence assembly that will be made available to the scientific community. Postdoctoral researchers and graduate students will be trained in computer-science-driven interdisciplinary research and in writing efficient high performance computing software. The project will influence curriculum development and will lead to educational materials in bioinformatics for next-generation sequencing.