Ferhatoglu, C.; Miller, B.A. Choosing Feature Selection Methods for Spatial Modeling of Soil Fertility Properties at the Field Scale. Agronomy 2022, 12, 1786. https://doi.org/10.3390/agronomy12081786
With the growing availability of environmental covariates, feature selection (FS) is becoming an essential task for applying machine learning (ML) in digital soil mapping (DSM). In this study, the effectiveness of six types of FS methods from four categories (filter, wrapper, embedded, and hybrid) were compared. These FS algorithms chose relevant covariates from an exhaustive set of 1049 environmental covariates for predicting five soil fertility properties in ten fields, in combination with ten different ML algorithms. Resulting model performance was compared by three different metrics (R2 of 10-fold cross validation (CV), robustness ratio (RR; developed in this study), and independent validation with Lin’s concordance correlation coefficient (IV-CCC)). FS improved CV, RR, and IV-CCC compared to the models built without FS for most fields and soil properties. Wrapper (BorutaShap) and embedded (Lasso-FS, Random forest-FS) methods usually led to the optimal models. The filter-based ANOVA-FS method mostly led to overfit models, especially for fields with smaller sample quantities. Decision-tree based models were usually part of the optimal combination of FS and ML. Considering RR helped identify optimal combinations of FS and ML that can improve the performance of DSM compared to models produced from full covariate stacks.