Variable selection procedures before partial least squares regression enhance the accuracy of milk fatty acid composition predicted by mid-infrared spectroscopy

J Dairy Sci. 2016 Oct;99(10):7782-7790. doi: 10.3168/jds.2016-10849. Epub 2016 Aug 10.

Abstract

Mid-infrared spectroscopy is a high-throughput technique that allows the prediction of milk quality traits on a large-scale. The accuracy of prediction achievable using partial least squares (PLS) regression is usually high for fatty acids (FA) that are more abundant in milk, whereas it decreases for FA that are present in low concentrations. Two variable selection methods, uninformative variable elimination or a genetic algorithm combined with PLS regression, were used in the present study to investigate their effect on the accuracy of prediction equations for milk FA profile expressed either as a concentration on total identified FA or a concentration in milk. For FA expressed on total identified FA, the coefficient of determination of cross-validation from PLS alone was low (0.25) for the prediction of polyunsaturated FA and medium (0.70) for saturated FA. The coefficient of determination increased to 0.54 and 0.95 for polyunsaturated and saturated FA, respectively, when FA were expressed on a milk basis and using PLS alone. Both algorithms before PLS regression improved the accuracy of prediction for FA, especially for FA that are usually difficult to predict; for example, the improvement with respect to the PLS regression ranged from 9 to 80%. In general, FA were better predicted when their concentrations were expressed on a milk basis. These results might favor the use of prediction equations in the dairy industry for genetic purposes and payment system.

Keywords: genetic algorithm; mid-infrared spectroscopy; milk fatty acid; variable selection.

MeSH terms

  • Animals
  • Fatty Acids
  • Least-Squares Analysis*
  • Milk / chemistry*
  • Phenotype
  • Spectrophotometry, Infrared

Substances

  • Fatty Acids