Claudia Scatigno, Giulia Festa, FTIR coupled with machine learning to unveil spectroscopic benchmarks in the Italian EVOO, International Journal of Food Science and Technology, Volume 57, Issue 7, July 2022, Pages 4156–4162, https://doi.org/10.1111/ijfs.15735
Abstract
Non-destructive analytical techniques coupled with classification and regression algorithms are promising tools for monitoring quality, traceability and safety in the food industry. To prevent food fraud, Italian extra virgin olive oil (EVOO) is subject to particularly close control. Here, attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) coupled with machine learning is used to study an Italian EVOO data set from 6 regions, in order to verify geographical traceability, cultivar and the repeatability of agronomical practices, and to detect EVOO adulterated with soy and corn oils. The work requires no reagents or esterification processes and uses the entire frequency range without any spectral window selection, drastically reducing time and costs. Toscana, Lazio, Puglia and Calabria are regions with well-reproducible geo-traceability, unlike Sicilia and Umbria. The model extracts spectral benchmarks in EVOO at the following vibrational modes: 3004, 2952, 2922, 2852, 1742 and 1160 cm⁻¹.

Introduction
Extra virgin olive oil (EVOO) is a crucial ingredient of the Mediterranean diet (Battino et al., 2019). Recently, EVOO consumption has increased worldwide, even outside the Mediterranean and European countries, because of its health benefits, attributed to its high content of oleic acid (the monounsaturated fatty acid C18:1) and its richness in bioactive phenolic compounds, which act as natural antioxidants (Bendini et al., 2007; Lerma-García et al., 2010). Its major fraction (98–99% by weight) consists mainly of triacylglycerols or triglycerides (TAG), together with free fatty acids (FFA) and mono- and diacylglycerols (MAG and DAG). Other fatty acid (FA) derivatives are traditionally counted among the minor compounds, which consist of sterols, aliphatic alcohols, hydrocarbons, biophenols, tocopherols and volatile compounds. Assessing the quality, geographical origin and traceability of EVOO is a priority because of its beneficial health impact, such as reducing the incidence of cardiovascular and age-associated diseases (Nocella et al., 2017), and its key role in the food industry (Varzakas, 2021). The uniqueness of some EVOO products is related to their cultivar, environment and agronomical/cultural practices (Losito et al., 2021). EVOO is expensive to produce because the cultivation and harvesting of olives are laborious and time-consuming. Producers should therefore guarantee the quality and geographical origin of EVOO, justifying its high purchase price. In this context, adulteration of EVOO with lower-quality olive oils, or with oils of a different botanical origin, is a serious problem.
Italy is one of the most important countries in the world in terms of olive oil supply and demand: the Italian olive heritage includes over 500 varieties (Piscopo et al., 2021), such as Coratina, Leccino, Nocellara and Sinopolese (typical of southern Italy) (Sicari et al., 2021), as well as Leccino-Frantoio, Carboncella and Itrana (Lazio), Moraiolo (Toscana) and Coratina (Puglia). Previous studies have investigated EVOO for its nutritional and health properties, disease relapses, cultivars and adulteration (Camposeo et al., 2021; Drira et al., 2021; Tomé-Rodríguez et al., 2021; Taiti et al., 2022) using destructive procedures such as extraction, deuteration and transesterification (Gurdeniz et al., 2007; Hirri et al., 2015; Drira et al., 2021; Mousa et al., 2021; Stilo et al., 2021). In this context, spectroscopic techniques combined with computational analyses are an emerging trend (Mohamed et al., 2018; Jamwal et al., 2021; Zaroual et al., 2021). Fourier-transform infrared spectroscopy (FTIR) followed by multivariate treatment of the spectral data is an excellent tool for classifying vegetable oils according to their botanical origin (Sota-Uba et al., 2021), for distinguishing EVOO from different geographical origins (Tapp et al., 2003; Bendini et al., 2007; Scatigno & Festa, 2021) and from different genetic varieties (Sota-Uba et al., 2021), and for authentication purposes (Mohamed et al., 2018; Jamwal et al., 2021), often focusing on particular spectral regions of interest (Jiménez-Carvelo et al., 2017). For Italian EVOO, fingerprint identification has relied on procedures such as the extraction of phenolics and sterols (Brodnjak-Vončin et al., 2005; Lerma-García et al., 2011). The first geographical discrimination for traceability purposes was recently carried out via X-ray fluorescence spectroscopy (Scatigno & Festa, 2021).
These preliminary results opened new perspectives in the identification of geomarkers, revealing different interregional elemental contributions and specific trends within the same region of origin, and even contamination by heavy metals.
Here, attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) coupled with machine learning is applied to unveil spectroscopic benchmarks in Italian EVOO from six Italian geographical regions. The samples are analysed without reagents or esterification treatments, over the entire frequency range (3890–650 cm⁻¹) and without any spectral window selection. The recorded spectra are analysed with Soft Independent Modelling of Class Analogy (SIMCA), Support Vector Machine Regression (SVMR) and Principal Component Regression (PCR) to verify geographical traceability, cultivar and the repeatability of agronomical practices, and to detect EVOO adulterated with seed oils such as soy, corn, sunflower and linseed. The procedure provides a distinct way to benchmark the olive variety and the agronomic practices across regions. Because no sample pre-treatment or spectral window selection is needed, the approach has an immediate practical impact by reducing data pre-processing time.
Materials and methods
Samples and data collection
A total of 72 samples of EVOO (monocultivar and blend), seed oils, soils and irrigation waters are employed in this study (details in Table S1 and Figure S1). They were kindly donated by consumers, by local producers and by V&P Food Group Srl, a company located in Siena (Toscana, Italy) where the oils were bottled (the botanical origin and quality grade of all the samples are guaranteed by PDO and PGI traceability). Six pure standards typical of three Italian regions (Puglia, Sicilia and Lazio) and fifteen standard mixtures are used to validate the spectroscopic benchmark data set. In addition, a Greek standard monocultivar (Koroneiki) was added to the data set because this olive variety is also grown in Italy. The main set of samples was collected from the same harvest (2020/2021) to keep the agricultural scenario as homogeneous as possible. Two additional EVOO samples (from the 2018/2019 and 2019/2020 olive harvests) and samples of irrigation water and soil from Lazio and Puglia were also collected (Table S1) to study the repeatability and reliability of agronomic and cultural practices such as processing technologies, storage conditions and cultivars. Finally, to study adulteration, seven seed oils, including sunflower, walnut, linseed, soy and corn, are included in this work. The instrumental set-ups are reported in the SI.
For the machine learning analyses, the spectral data are arranged in a 13486 × 68 matrix and its transpose, which are used in the following techniques: (i) Soft Independent Modelling of Class Analogy (SIMCA, built on a PCA model with 3 components), a supervised pattern-recognition method that assigns unknown samples to class models according to their proximity to the training samples. To associate the variables lying far from the model with specific energy ranges or vibrational modes, SIMCA is also applied to the transposed matrix. A class-membership significance level of α = 0.25 is used for the 68 × 13486 matrix and α = 0.05 for the 13486 × 68 matrix; (ii) support vector machine ν-regression (ν-SVMR, polynomial kernel of degree 3), a statistical learning method whose model depends only on a small subset of the training points (the support vectors). The correlation between predicted and reference values in validation is close to that in calibration, indicating that the model is not over-fitted at the calibration stage; (iii) finally, principal component regression (PCR, computed by singular value decomposition, SVD) is performed because the variables carry common information, as shown by the collinearity highlighted in the PCA. The Unscrambler X software (Camo Analytics, AspenTech, Bedford, MA, USA) is used for all three analyses.
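As an illustration of step (iii), PCR can be sketched in a few lines of NumPy. This is a minimal sketch, not the Unscrambler X implementation: it assumes a samples × wavenumbers absorbance matrix `X` and a numeric reference vector `y`, and the function names `fit_pcr` and `predict_pcr` are hypothetical. The centred spectra are decomposed by SVD, only the first k score vectors are kept, and y is regressed on them by ordinary least squares, which sidesteps the collinearity between neighbouring wavenumbers.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Fit a principal component regression (PCR) model via SVD (sketch).

    X: (n_samples, n_wavenumbers) absorbance matrix.
    y: (n_samples,) reference values (e.g. a numeric class or origin code).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Thin SVD of the centred data: Xc = U @ diag(S) @ Vt; rows of Vt are PCA loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T          # loadings, shape (n_wavenumbers, k)
    T = Xc @ P                       # scores, shape (n_samples, k)
    # Ordinary least squares of centred y on the k scores only;
    # discarding the remaining components removes the collinearity.
    q, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return {
        "x_mean": x_mean,
        "y_mean": y_mean,
        "loadings": P,
        "coef": P @ q,               # regression vector over wavenumbers
    }


def predict_pcr(model, X):
    """Predict reference values for new spectra with a fitted PCR model."""
    Xc = np.asarray(X, dtype=float) - model["x_mean"]
    return Xc @ model["coef"] + model["y_mean"]
```

A typical call would be `model = fit_pcr(X, y, 3)` followed by `predict_pcr(model, X_new)`; the columns of `model["loadings"]` are the kind of loading vectors shown in a PCR loading plot. The number of components (3 here) is an assumption and would normally be chosen by cross-validation.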
Results and discussion
Figure 1 shows the ATR-FTIR vibrational modes observed for common triglycerides, the principal components of fats and of all vegetable oils (Moharam & Abbas, 2010; Jović et al., 2013). The spectra follow a similar trend, with some differences in relative peak absorbance intensity, peak shape and band shifts (Figure S2). These spectral effects can be caused by factors such as the type of raw material, adulteration and thermo-oxidative processes during oil heating (Vlachos et al., 2006). Over a large number of spectra, such minimal differences can be brought out by data-extraction procedures such as machine learning algorithms, which are useful for highlighting minimal or hidden differences and pointing out the benchmarks. The entire range from 3890 to 650 cm⁻¹ is used in the ML analyses, with the spectroscopic data set arranged in a 13486 × 68 matrix and its transpose. After principal component analysis (PCA) confirmed the tendency of the samples to cluster by class, class membership is predicted with SIMCA, which also measures the influence of individual variables on the given PCA model. The modelling power plot (Fig. 2), at a class-membership significance level of 5%, points out 10 samples (amongst EVOOs and seed oils) with a modelling power larger than 0.3, meaning that these samples are important for describing the PCA model. The corresponding modelling power plot in the variable domain (ATR-FTIR wavenumbers) shows all the detected vibrational modes plotted simultaneously and weighted over the samples; in Fig. 2, each energy range is classified as 1 if the spectral features are similar or 0 if there are peak shifts or peak-shape differences. The energy ranges showing spectral differences across the data set can then be linked to specific processes.
This is the case for the 1734–1752 cm⁻¹ range, associated with the ester carbonyl (C=O) vibrational modes of the triglycerides, where a band shift reflects the differing presence of secoiridoids in the EVOO samples; this significant variance can be linked to regional provenance (Qusa et al., 2019). The 3005–3015 cm⁻¹ range is linked with the band shifts of the seed oils used as adulterants (spectra in Figure S2a). ν-SVMR separates the data set into two classes, with 34 support vectors required for the regression model (Fig. 3). The first group includes 58 samples from Toscana, Lazio, Puglia and Calabria. Within this group, 8 samples, labelled in red in Fig. 3, lie close to two seed oils, soy and corn, which is attributed to possible adulteration.
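The modelling power used in the SIMCA analysis above can be illustrated with a short NumPy sketch. It assumes the common definition MP_j = 1 - s_res_j / s_tot_j, where s_res_j is the residual standard deviation of column j after A principal components and s_tot_j is its total standard deviation; the function name `modelling_power` is hypothetical, and the degrees-of-freedom corrections applied by some implementations are deliberately omitted. Values close to 1 mean a column is well described by the PCA model.

```python
import numpy as np


def modelling_power(X, n_components):
    """Per-column SIMCA-style modelling power, MP_j = 1 - s_res_j / s_tot_j.

    Simplified sketch: plain (biased) standard deviations, no degrees-of-freedom
    corrections. Columns are scored; pass X.T to score samples instead.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T              # loadings of the A retained components
    E = Xc - (Xc @ P) @ P.T              # residuals left after A components
    s_res = np.sqrt((E ** 2).mean(axis=0))
    s_tot = np.sqrt((Xc ** 2).mean(axis=0))
    # Constant columns (zero total variance) get MP = 0 instead of dividing by zero.
    ratio = np.divide(s_res, s_tot, out=np.ones_like(s_res), where=s_tot > 0)
    return 1.0 - ratio
```

Applied to the samples × wavenumbers matrix it scores each wavenumber; applied to the transpose it scores each sample, so `np.flatnonzero(modelling_power(X.T, 3) > 0.3)` would list the samples above the 0.3 threshold used in the text.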

Figure 1. ATR-FTIR spectra of EVOO and seed oil samples: (a) 3050–2800 cm⁻¹ region; (b) 1830–650 cm⁻¹ region. The blue numbers identify the vibrational modes listed in Table S2.

Figure 2. SIMCA modelling power plot and associated class membership of the PCA model across the wavenumber ranges. The purple asterisks mark the band shifts of the seed oils (1489 and 1642 cm⁻¹).

Figure 3. ν-SVMR prediction plot, showing the reference values against the calibrated predictions. Support vectors (SV) are circled. Each EVOO is coloured according to its region of origin (see Figure S1). The root mean square error of cross-validation (RMSECV, 0.046) is close to the root mean square error of calibration (RMSEC, 0.044), indicating that the model is not over-fitted.
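The over-fitting check quoted in the caption compares the calibration error (RMSEC, computed on the samples the model was fitted to) with the cross-validation error (RMSECV, computed on samples held out from fitting). A minimal plain-Python sketch follows; it assumes leave-one-out cross-validation, since the validation scheme is not detailed here, and uses a straight-line least-squares fit as a hypothetical stand-in for the ν-SVMR model. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]
Fitter = Callable[[List[float], List[float]], Predictor]


def fit_line(x: List[float], y: List[float]) -> Predictor:
    """Ordinary least-squares straight line (hypothetical stand-in regressor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def rmse(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def rmsec_rmsecv(x: Sequence[float], y: Sequence[float],
                 fit: Fitter = fit_line) -> Tuple[float, float]:
    """Return (RMSEC, RMSECV), the latter from leave-one-out cross-validation."""
    x, y = list(x), list(y)
    # Calibration: predict the same samples the model was fitted on.
    model = fit(x, y)
    rmsec = rmse(y, [model(v) for v in x])
    # Leave-one-out: each sample is predicted by a model that never saw it.
    cv_pred = []
    for i in range(len(x)):
        m = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        cv_pred.append(m(x[i]))
    return rmsec, rmse(y, cv_pred)
```

An RMSECV only slightly above RMSEC, as with the values reported for Fig. 3, suggests that the model generalises to unseen samples, whereas an RMSECV much larger than RMSEC is the usual symptom of over-fitting. Any fitter with the same signature, for example a wrapper around a ν-SVR model, can be passed as `fit`.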
The second group consists of the 10 samples highlighted in the previous SIMCA analysis. The Itrana variety, a cultivar specific to the Gaeta area, is not representative of our Lazio samples. The other certified monocultivars are consistent with the rest of the data set, as shown by their homogeneous distribution. Spectra from the same EVOO producers in different harvests (2018–2021), such as Gargano (Puglia) and Ronciglione (Lazio), lie close together in the prediction plot, which is attributed to the repeatability of their agronomic practices.
To extract the fingerprint regions responsible for discriminating the EVOOs through their spectral differences, a regression analysis is also carried out (Fig. 4). Amongst the vibrational modes, the ester C=O stretching (ν = 1742 cm⁻¹) is the most intense. This indicates that the unsaponifiable fraction of EVOO, characterised by secoiridoids, produces a distinct spectral vibration across the whole data set, as found in the literature (Schönemann & Edwards, 2011; De Ceglie et al., 2020). Further spectral benchmarks are the antisymmetric (2922 cm⁻¹) and symmetric (2852 cm⁻¹) C–H stretching in CH2, the C–H stretching in cis-C=C (3004 cm⁻¹) and the asymmetric CH3 stretching (2952 cm⁻¹). Amongst the C–O stretching vibrations, the band at 1160 cm⁻¹ is also a benchmark. In the regression analysis, the water and soil samples are added to the initial data set to test the potential of the extraction procedure on different systems, and the regression separates these systems from the EVOO profiles.
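Reading benchmark bands off a loading plot amounts to locating the strongest extrema of a loading vector. The helper below is a hypothetical sketch, not the authors' procedure: it assumes the loading is sampled on the same wavenumber grid as the spectra, ranks local maxima of the absolute loading (so negative loadings count too), and applies an assumed minimum separation of 10 cm⁻¹ so that one broad band is not reported twice.

```python
from typing import List, Sequence


def benchmark_bands(wavenumbers: Sequence[float], loading: Sequence[float],
                    top_k: int = 6, min_separation: float = 10.0) -> List[float]:
    """Wavenumbers (cm^-1) of the top_k strongest extrema of a loading vector.

    Hypothetical helper: ranks local maxima of |loading| and skips any peak
    closer than min_separation to one already selected.
    """
    mag = [abs(v) for v in loading]
    # Local maxima of the absolute loading; the sign is ignored because
    # strongly negative loadings are benchmarks too.
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] >= mag[i - 1] and mag[i] > mag[i + 1]]
    peaks.sort(key=lambda i: mag[i], reverse=True)
    chosen: List[int] = []
    for i in peaks:
        if len(chosen) == top_k:
            break
        if all(abs(wavenumbers[i] - wavenumbers[j]) >= min_separation
               for j in chosen):
            chosen.append(i)
    # Report from high to low wavenumber, the usual FTIR axis order.
    return sorted((wavenumbers[i] for i in chosen), reverse=True)
```

Called as `benchmark_bands(wn, loading, top_k=6)` on a loading vector over the 3890–650 cm⁻¹ range, it returns six candidate bands that can then be compared with vibrational assignments such as those listed above.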

Figure 4. PCR loading plot. The triglyceride stretching modes emerge as benchmarks. In the negative profile, other spectroscopic benchmarks appear: the carbonate (CO3) ν1–ν4 modes associated with water (blue asterisks) and the Si–O–Si ν and symmetric Si–O ν modes that fingerprint the soils (red asterisks). The spectral assignments of the soils are reported in the SI (Figure S3).
Conclusions
From a spectroscopic point of view, the differences among the EVOO and vegetable oil samples are attributable to minimal changes in relative peak absorbance intensity, peak shape and band shift.
Here, a consistent data set of EVOO from six Italian regions is studied to extract benchmarks through machine learning. Three classification and regression analyses (SIMCA, ν-SVMR and PCR) are carried out to point out these minimal spectral differences, to identify consistent and reliable regional trends as well as the 'singular vectors' responsible for some groupings or deviating from the computational model, and finally to extract distinct, representative vibrational modes of the Italian EVOO fingerprint. Moreover, the methodology requires no reagents or esterification procedures and uses the entire FTIR spectrum without any spectral window selection. The SIMCA model identifies the energy ranges where even minimal differences are present, such as 3005–3015 cm⁻¹, linked with the band shifts of the seed oils used as adulterants, and 1734–1752 cm⁻¹, associated with the variance of the triglycerides in the analysed matrix. ν-SVMR discriminates geographical origin: Toscana, Lazio, Puglia and Calabria are well-reproducible regions, unlike the Sicilia and Umbria samples. Regarding agronomical practices, the Gargano (Puglia) and Ronciglione (Lazio) samples from different harvests (2018–2021) are studied, showing the reproducibility of the measurement procedures. The PCR model extracts spectral benchmarks in EVOO at the following vibrational modes: the ester C=O ν (1742 cm⁻¹), the antisymmetric (2922 cm⁻¹) and symmetric (2852 cm⁻¹) C–H stretching in CH2, the C–H ν in cis-C=C (3004 cm⁻¹), the asymmetric CH3 ν (2952 cm⁻¹) and the C–O ν (1160 cm⁻¹).
Looking ahead, this pilot study proposes the procedure as, for example, a screening method for large-scale food industries, where some parameters must be evaluated quickly. Machine learning is a promising discrimination technique because it can distinguish EVOO, or any other food, on the basis of its spectral fingerprint across a large data set.
Acknowledgments
The authors are grateful to Dr. Carlo Alberto Leccardi (QA Office and Food Safety, V&P Food Group Srl) for providing the standards, and to all the local producers and consumers who donated their EVOO for the study. C. Scatigno thanks CREF for her postdoctoral fellowship (Prot. 0000500).
Author contribution
Claudia Scatigno: Conceptualization (lead); data curation (lead); formal analysis (lead); investigation (lead); methodology (lead); project administration (supporting); software (lead); supervision (equal); validation (lead); visualization (lead); writing – original draft (lead); writing – review & editing (equal). Giulia Festa: Conceptualization (supporting); project administration (lead); supervision (supporting); visualization (supporting); writing – review & editing (equal).
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Permission to reproduce material from other sources: all contents and materials are original, and no permissions are necessary.
Ethics statement
This article does not contain any studies involving human participants performed by any of the authors.
Peer review
The peer review history for this article is available at https://publons.com/publon/10.1111/ijfs.15735.
Data availability statement
The data generated and/or analysed during the current study are available from the corresponding author on reasonable request.
References
This reference was selected to show how discrimination analysis (PCA) of vegetable oils was carried out using destructive methods (GC) and esterification processes, albeit supported by chemometric techniques. Such sample treatment has an economic and time cost.
This reference was selected to support the identification of FTIR fingerprints through techniques also used in our study (SVM and SIMCA), but combined with extraction methods, which have an economic and time cost.
This reference was selected to support the present work. In particular, the results obtained in that previous work demonstrate that advanced chemometric analyses can establish the geo-traceability of EVOO across the Italian regions using elemental markers obtained on the same data set without reagents. The repeatability of agronomic practices found there also agrees with the results obtained in this work.
This reference was added, following a reviewer's suggestion, to show once again that identifying EVOO fingerprints through discrimination and classification methods is an emerging trend in food science. Again, the discrimination methods were applied after treating the samples with reagents.