A Scalable Framework for Visual Exploration and Hypotheses Extraction of Phenomics Data using Topological Analytics

Understanding how gene-by-environment interactions result in specific phenotypes is a core goal of modern biology and has real-world impact on areas such as crop management. Developing and managing successful crop practices is fundamentally tied to our national food security. By applying novel computational visual analytical methods, this project seeks to identify and unravel the complex web of interactions linking genotypes, environments and phenotypes. These methods will first need to be designed and developed into usable software applications that can handle large volumes of crop phenomics data. High-throughput sensing technologies collect large volumes of field data for many plant traits, such as flowering time, related to crop development and production. The maize cultivars used here come from multiple genotypes that have been grown under a variety of environmental conditions, to provide the widest range of conditions for understanding the interactions. The resulting data sets are growing quickly, both in size and complexity, but the analytical tools needed to extract knowledge and catalyze scientific discoveries have lagged significantly behind. The methodologies to be developed in this project represent a systematic attempt at bridging this rapidly widening divide. The project is inherently interdisciplinary, involving close research partnerships among computer scientists, plant scientists, and mathematicians.
The research outcomes will be tightly integrated with education using a multipronged approach that includes, among others, postdoctoral and student training (graduate and undergraduate), curriculum development for a new campus-wide interdisciplinary undergraduate degree in Data Analytics, conference tutorials for training phenomics data practitioners, and contribution to the recruitment and retention of underrepresented minorities (particularly women) in STEM fields through the Pacific Northwest Louis Stokes Alliance for Minority Participation.

This project will lead to the design and development of a new, scalable, visual analytics platform suitable for hypothesis extraction and refinement from complex phenomics data sets. The focus on hypothesis extraction is critical for phenomics data sets because much of the high-throughput sensing data collected in crop fields is generated in the absence of specifically formulated hypotheses. Extracting plausible hypotheses from the data represents an important but tedious task. To this end, this project will apply and develop new capabilities using emerging advanced algorithmic principles, particularly from algebraic topology, the branch of mathematics that studies the shape and structure of complex data. The research objectives are threefold. First, the project will employ and extend emerging algorithmic techniques from algebraic topology to decode the structure of large, complex phenomics data. Second, an interactive visual analytic platform will be developed to facilitate knowledge discovery using the extracted topological structures. Lastly, the quality and validity of the platform will be tested using real-world maize data sets and simulated inputs as testbeds. The developed framework will encode functions for scientists to delineate hypotheses of three kinds: i) genetic characterization of single complex traits; ii) genetic characterization of multiple traits that share potentially pleiotropic effects; and iii) decoding and detailed characterization of genotype-by-environment interactions, in particular, through a collaborative pilot study of maize flowering and growth traits. The expected significance of the proposed work is that biologists will be able to extract different types of testable hypotheses from plant phenomics data sets by employing a new class of visual analytic tools, and thus obtain a deeper understanding of the interactions among genotypes, environments and phenotypes.
The project is potentially transformative in two ways: i) it will introduce advanced mathematical and computational principles into mainstream phenomics data analysis; and ii) it will usher in a new era where biologists spearhead data-driven hypothesis extraction and discovery with the aid of interactive, informative, and intuitive tools. The project will have a direct impact on the state of software in phenomics for fundamental data-driven discovery. To facilitate broader community adoption, the project will integrate the tools into the CyVerse Institute and into a community phenomics software outlet. It will also lead to the development of automated scientific workflows. Project website: http://tdaphenomics.eecs.wsu.edu/
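
As a rough illustration of what "decoding the structure" of data with topological techniques can mean, the sketch below computes 0-dimensional persistence (the scales at which connected components merge) for a small point cloud. This is one standard tool from topological data analysis and is not claimed to be the method this project uses. The function name `zero_dim_persistence` and the toy `samples` values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def zero_dim_persistence(points):
    """Return the scales at which connected components merge (H0 deaths).

    Every point starts as its own component at scale 0. Edges are added in
    order of increasing length; each edge that joins two different
    components records one "death". One component survives forever, so
    n points yield n - 1 deaths.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths


# Hypothetical toy data: two well-separated groups of (trait_1, trait_2) values.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
deaths = zero_dim_persistence(samples)
```

In this toy input, the first three merges happen at a small scale (within each group) and the last merge happens at a much larger scale (between the groups). A large gap of this kind is the sort of structural signal that could prompt a hypothesis, for example that two groups of samples respond differently to an environment.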

08/01/2017 to 07/31/2020
Principal Investigator(s):