Caner Ferhatoglu and Bradley Miller – North Central Soil Fertility Conference November 16-17, 2022
This study compared the effectiveness of six feature selection (FS) methods from four categories (filter, wrapper, embedded, and hybrid). The FS algorithms selected relevant covariates from a set of 1049 environmental covariates for predicting five soil fertility properties in ten fields, in combination with ten different machine learning (ML) algorithms. Model performance was compared using three metrics: R² of 10-fold cross-validation (CV), the robustness ratio (RR; developed in this study), and independent validation with Lin’s concordance correlation coefficient (IV-CCC). FS improved CV, RR, and IV-CCC compared to models built without FS for most fields and soil properties. Wrapper (BorutaShap) and embedded (Lasso-FS, Random forest-FS) methods, paired with decision-tree based ML algorithms, usually led to the optimal models. The filter-based ANOVA-FS method mostly led to overfit models, especially for fields with fewer samples. Considering RR helped identify optimal combinations of FS and ML that can improve the performance of digital soil mapping (DSM) compared to models produced from full covariate stacks. FS can help build better predictive soil models and thus better digital soil maps, which in turn can improve farm management (e.g., fertilization, liming, and manuring).
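Of the three metrics, Lin's concordance correlation coefficient is the one with a standard closed form, so a short illustration may be useful. The sketch below is not the authors' code. It assumes the usual definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²) with population (1/n) moments, and the function name `lins_ccc` and the sample values are hypothetical.

```python
from statistics import fmean


def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    Assumption: population (1/n) variances and covariance, as in the usual
    textbook definition. The abstract does not state which variant was used.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = fmean(observed)
    mean_pred = fmean(predicted)

    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / n
    cov = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    ) / n

    # The squared mean difference penalizes systematic bias between the two
    # series, which a plain correlation coefficient would ignore.
    denom = var_obs + var_pred + (mean_obs - mean_pred) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denom


if __name__ == "__main__":
    # Hypothetical values for illustration only (not data from the study).
    measured = [1.0, 2.0, 3.0, 4.0]
    biased_prediction = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated, offset by +1
    print(lins_ccc(measured, measured))
    print(lins_ccc(measured, biased_prediction))
```

Unlike R², the coefficient drops below 1 for predictions that are perfectly correlated with the observations but offset or rescaled, which is one reason it suits an independent validation set.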