Handling Data Imbalance and G×E Interactions in On-Farm Trials Using Bayesian Hierarchical Models

Sophie Donnet

doi:10.24072/pci.mcb.100272

Recommended by Sophie Donnet, based on reviews by Pierre Druilhet and David Makowski

A recommendation of:

Bayesian joint-regression analysis of unbalanced series of on-farm trials

Michel Turbet Delof, Pierre Rivière, Julie C Dawson, Arnaud Gauffreteau, Isabelle Goldringer, Gaëlle van Frank, Olivier David (2024), HAL, ver. 2, peer-reviewed and recommended by PCI Mathematical and Computational Biology. https://hal.science/hal-04380787

Now published in Peer Community Journal.

Abstract

Participatory plant breeding (PPB) is aimed at developing varieties adapted to agroecologically based systems. In PPB, selection is decentralized in the target environments and relies on collaboration between farmers, farmers' organisations and researchers. By doing so, evaluation of new genotypes takes genotype x environment (GxE) interactions into account to select for specific adaptation. In many cases, there is little overlap among genotypes assessed from farm to farm because the farmers participating in a PPB project choose which ones to assess on their farm. In addition, on-farm trials can often generate more extreme observations than trials carried out on research stations. These features make the estimation of genotype, environment and interaction effects more difficult. This challenge is not unique to PPB, as many breeding programs use sparse testing or incomplete block designs to evaluate more genotypes; however, in PPB, genotypes are not always assigned randomly to environments. To explore methods of overcoming these challenges, this article tests various data analysis scenarios using a Bayesian approach with different models and a real wheat PPB dataset covering 11 years. Four morpho-agronomic traits were studied, representing over 1000 GxE combinations from 189 on-farm trials. This dataset was severely unbalanced, with more than 90% of GxE combinations missing. We compared various Bayesian Finlay-Wilkinson models and found that placing hierarchical distributions on model parameters and modelling residuals using a Student's t distribution jointly improved the estimates of main effects and interactions. Environment effects were the most important and explained more than 50% of the variance of observations. This statistical framework allowed us to estimate two indicators of genotype stability (one static and one dynamic) despite the severe imbalance of the data.
We found differences in mean and stability between genotype categories, with registered varieties consistently shorter (-30 cm) and containing less protein (-0.3%) than other types of varieties. The methods developed could be used for evaluation and/or selection within networks of various stakeholders such as farmers, gardeners, plant breeders or managers of genetic resource centres.
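
To make the model structure described in the abstract more concrete, the following minimal sketch simulates an unbalanced series of trials under one common form of the Finlay-Wilkinson regression, y_ij = mu + alpha_i + theta_j + beta_i * theta_j + e_ij, with effects drawn from shared normal distributions and heavy-tailed Student's t residuals. It is not the authors' code and it does not fit the model. The function names, the numbers of genotypes and environments, the variances, the degrees of freedom and the 10% retention rate are all illustrative assumptions, and the preprint's exact parameterization and priors may differ.

```python
import math
import random


def student_t(rng, df):
    """Draw one Student's t variate as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def simulate_trials(n_geno=20, n_env=30, keep_prob=0.1, df=4, seed=1):
    """Simulate sparse genotype-by-environment data (all values are assumed).

    Model: y_ij = mu + alpha_i + theta_j + beta_i * theta_j + e_ij
      alpha_i : genotype main effect
      theta_j : environment main effect
      beta_i  : genotype sensitivity to the environment effect
      e_ij    : scaled Student's t residual (heavy tails mimic extreme plots)
    """
    rng = random.Random(seed)
    mu = 100.0
    # Effects share a common distribution per family, as in a hierarchical model.
    alpha = [rng.gauss(0.0, 5.0) for _ in range(n_geno)]
    beta = [rng.gauss(0.0, 0.2) for _ in range(n_geno)]
    theta = [rng.gauss(0.0, 10.0) for _ in range(n_env)]

    obs = []
    for i in range(n_geno):
        for j in range(n_env):
            # Keep each cell with small probability, so most combinations are missing.
            if rng.random() < keep_prob:
                resid = 2.0 * student_t(rng, df)
                y = mu + alpha[i] + theta[j] + beta[i] * theta[j] + resid
                obs.append((i, j, y))
    return obs, alpha, beta, theta


def missing_fraction(obs, n_geno, n_env):
    """Share of genotype-by-environment cells that have no observation."""
    observed_cells = {(i, j) for i, j, _ in obs}
    return 1.0 - len(observed_cells) / (n_geno * n_env)


if __name__ == "__main__":
    obs, alpha, beta, theta = simulate_trials()
    print(f"observed cells: {len(obs)}")
    print(f"missing fraction: {missing_fraction(obs, 20, 30):.2f}")
```

Data simulated this way could serve as a test bed for comparing estimation strategies, since the true effects are known. Note that cells are dropped at random here, whereas the abstract stresses that in PPB genotypes are not always assigned randomly to environments.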

Keywords: decentralized participatory plant breeding; bread wheat; GxE interaction; hierarchical model; Finlay-Wilkinson model; Student's t distribution; varietal stability