Abstract

Machine learning (ML) can accelerate biological research. However, the adoption of such tools to facilitate phenotyping based on sensor data has been limited by (i) the need for a large amount of human-annotated training data for each context in which the tool is used and (ii) phenotypes varying across contexts defined in terms of genetics and environment. This is a major bottleneck because acquiring training data is generally costly and time-consuming. This study demonstrates how an ML approach can address these challenges by minimizing the amount of human supervision needed for tool building. A case study was performed to compare ML approaches that examine images collected by an uncrewed aerial vehicle to determine the presence/absence of panicles (i.e. “heading”) across thousands of field plots containing genetically diverse breeding populations of 2 Miscanthus species. Automated analysis of aerial imagery enabled the identification of heading approximately 9 times faster than in-field visual inspection by humans. Leveraging an Efficiently Supervised Generative Adversarial Network (ESGAN) learning strategy reduced the requirement for human-annotated data by 1 to 2 orders of magnitude compared to traditional, fully supervised learning approaches. The ESGAN model learned the salient features of the data set by using thousands of unlabeled images to inform the discriminative ability of a classifier so that it required minimal human-labeled training data. This method can accelerate the phenotyping of heading date as a measure of flowering time in Miscanthus across diverse contexts (e.g. in multistate trials) and opens avenues to promote the broad adoption of ML tools.

Introduction

Artificial intelligence (AI) and machine learning (ML) present enormous opportunities for accelerating scientific discovery, especially in biological research where large-scale, complex problems are commonplace (Burke and Lobell 2017; Xie et al. 2021; Varela et al. 2022b; Wang et al. 2023). However, advanced AI/ML methods require substantial amounts of annotated data for training purposes and are highly context dependent; i.e. they do not perform reliably in contexts beyond those covered by the training data (Paullada et al. 2021; Wang et al. 2022). Biological research is an especially challenging use case for image classification problems because the appearance and function (i.e. phenotype) of an organism are the result of complex interactions between genotype, natural environment, and human intervention (Hayes et al. 2023). For example, the performance of a particular crop depends on variation in genotype, growing conditions, and management practices (Zhao et al. 2022; Cooper et al. 2023). This contrasts with everyday objects, which can take a variety of forms, but are fixed in time and space such that a given object (e.g. a teapot) will not change shape, texture, or color depending on the location and time at which it is imaged under standardized conditions. The potential for AI- and ML-enabled approaches to be applied to biological research has been demonstrated across many scales from cells to organs, organisms, communities, and ecosystems. However, this high contextual diversity means existing AI/ML tools will need to be retrained—at considerable cost and effort—for each new biological context in which they are to be used (Moen et al. 2019; Hesami et al. 2022), limiting their adoption. The increasing spatial, temporal, and spectral resolution of sensors, along with cloud-based computational processing, is enhancing our ability to supply more sensor data for phenotyping (Lin et al. 2023). However, the greater spatial and temporal resolution of sensor data will only be fully exploited if it is matched by greater resolution in the human-annotated data used for training ML tools that find associations between the 2 data sets. Existing methods to minimize human effort in the production of annotated data include data augmentation (Shorten and Khoshgoftaar 2019), pseudo-labeling, and label propagation (Iscen et al. 2019; Gan et al. 2022). Meanwhile, active learning (Nagasubramanian et al. 2021), transfer learning (TL) (Tran et al. 2019), and semi-supervised learning (van Engelen and Hoos 2020) can reduce the need for annotated data in the training process. Recent research efforts to overcome these challenges include developing AI techniques that more rapidly generalize from few examples (Parnami and Lee 2022). However, while each of these approaches can deliver valuable benefits, the necessity for further innovation to address the intertwined issues of training effort and context dependency is clear (Alzubaidi et al. 2021; Sapoval et al. 2022; Ahmed et al. 2023).

This study proposes a generative adversarial learning strategy as an alternative to traditional supervised learning and TL, aiming to minimize the human-based supervision required for a computer vision tool. This involves exploiting the ability of a generative adversarial network (GAN) to learn the salient features of the data from unlabeled images captured with an aerial platform. It is anticipated that allowing the model to learn the underlying latent space representation of the data will enhance the model's discriminative ability in a classification task with minimal human assistance. A key feature of this approach is using a coinformative learning strategy between the unsupervised and supervised classifiers within the GAN. This is intended to allow learning of the salient features of the large, unlabeled image set to be complemented by the use of a smaller pool of labeled images to efficiently achieve the classification task. We describe this approach as an Efficiently Supervised GAN (ESGAN).

A case study of the proposed approach is performed by classifying thousands of diverse, field-grown Miscanthus genotypes as having produced panicles, or not, on a given date in a time course of imagery collected by an uncrewed aerial vehicle (UAV, or uncrewed aerial system, or drone). Biomass and valuable chemical compounds from dedicated bioenergy crops are expected to play a central role in the provision of more sustainable energy and bioproducts (Somerville et al. 2010; Martinez-Feria et al. 2022; Eckardt et al. 2023). Miscanthus sacchariflorus and Miscanthus sinensis are crossed to produce very productive, sterile hybrids (Heaton et al. 2008). Flowering time is a key trait influencing productivity and adaptation of Miscanthus to different growing regions (Jensen et al. 2011; Clifton-Brown et al. 2019). Flowering time in Miscanthus, like many other grass crops, can be assessed in terms of “heading date,” i.e. when panicles are outwardly visible in 50% of the culms that reach the top of the canopy (Li et al. 2006; Crowell et al. 2016; Clark et al. 2019; Wu et al. 2019). Repetitive visual inspections of thousands of individuals grown in extensive field trials are very labor intensive (Clark et al. 2019). Repeated assessment of a crop trial to determine when in a seasonal time course a panicle is first observed then allows estimation of heading date. Increasing the frequency with which the crop is assessed increases the precision of heading date estimates but also increases labor, which has motivated the development of ML-enabled remote sensing tools to identify reproductive organs and to assess whether plants have reached developmental milestones (Wu et al. 2019; Liu et al. 2020; Fan et al. 2022). In Miscanthus, a fully supervised 3D convolutional neural network (CNN) has been used to assess heading date from UAV images (Varela et al. 2022b). However, the common challenges of context dependency, demand for substantial training data, and limited generalization ability remain if such tools are to be widely adopted (Rasmussen et al. 2022). Therefore, this is just one of many phenotypes for which reducing the demand for manual supervision of ML tools will be valuable.

This study tests the ability of ESGAN to classify aerial images of individual plants of M. sacchariflorus and M. sinensis on the basis of panicles being visible or not, i.e. the most repeated and labor-demanding step in heading date determination. The performance of ESGAN is compared to various popular algorithms based on the fully supervised learning (FSL) paradigm and traditional TL with varying degrees of complexity, including K-nearest neighbor (KNN), random forest (RF), custom CNN, and ResNet-50. This analysis was repeated as the number of annotated images provided to train a given model was reduced from 3,137 (100%) to 32 (1%), while simultaneously providing ESGAN with access to the complete set of unannotated images (i.e. n = 3,137). The objective is to understand the trade-offs between predictive ability and the level of dependence on manual annotation for each of the algorithms. In addition, we test how ESGAN exploits its unique generative and adversarial learning strategy to leverage its own predictive ability. Finally, class activation visualization (Selvaraju et al. 2020) is used to understand how ESGAN exploits the information in the images to maximize its predictive ability.

Results

Benchmarks and ESGAN algorithms evaluation

As a baseline, all 5 model types were able to correctly classify whether plants had reached heading or not when provided with the full (100%) training data set of 3,137 images (Fig. 1, A, B, and H). The convolutional-based CNN, ResNet-50, and ESGAN models all performed well (overall accuracy [OA] = 0.89 to 0.92, F1 score = 0.87 to 0.90) and outperformed the tabular methods KNN and RF (OA = 0.78 to 0.79, F1 score = 0.73 to 0.76).

Figure 1.
Evaluation of heading detection in testing data. Performance of benchmarks and ESGAN algorithms under an increasing number of annotated samples via OA A) and F1 score B) metrics. Error bars represent the Sd of performance metrics after 3 training and testing iterations. Performance evaluation using ROC analysis is also presented for the same models under the same conditions C to H). See Materials and methods for explanation of metrics.

All model types demonstrated some reduction in the ability to detect heading accurately as the amount of annotated training data was reduced, but to very different degrees. For ESGAN, the penalty for reducing the number of annotated images used for training down to 1% of available data (32 images) was negligible in terms of OA (decline from 0.89 to 0.87), F1 score (decline from 0.87 to 0.85), and receiver operating characteristic (ROC) analysis (Fig. 1). TL using ResNet-50 was the next most robust method, maintaining performance as annotated training data were reduced to 10% (314 images), before being heavily penalized as the amount of annotated training data declined further (Fig. 1). CNN performed at an intermediate level, maintaining performance as annotated training data were reduced to 30% (941 images), before being heavily penalized as the amount of annotated training data declined further (Fig. 1). KNN and RF were less sensitive than CNN and ResNet-50 to reductions in the amount of annotated training data provided, but this only partially compensated for the poorer baseline performance of KNN and RF (Fig. 1).

When the amount of annotated data was most restricted (1% of data available for training), ESGAN's performance (OA = 0.87 to 0.89, F1 score = 0.85 to 0.87) was substantially better than all other models (OA = 0.43 to 0.75, F1 score = 0.16 to 0.72) (Fig. 1, A and B). This also agreed with ROC analysis, where ESGAN was the most effective model for correctly identifying the 2 image classes when fewer than hundreds of annotated images were available for training (Fig. 1, C and D).

Understanding the predictive improvement of ESGAN

The ability of ESGAN to accurately determine heading of plants from aerial imagery can be explained by the synergistic contributions of ESGAN's generator and discriminator submodels. The ability of the ESGAN generator to improve the visual representations of “fake” images was notable during the training process (Fig. 2, A, B, D, and E). The initial attempts of the ESGAN generator to generate images produced very noisy and unrealistic representations of Miscanthus plants (Fig. 2, A and D). It was notable that the ESGAN generator submodel progressively learned to better match the RGB color intensity and spatial distribution of pixels of the real images, turning them into very realistic representations of plants (Fig. 2, B and E). This improvement was in agreement with the increasing performance of the ESGAN discriminator (Fig. 2, C and F) across the successive minibatch steps of training, where the ability of this submodel to identify plants with panicles consistently improved regardless of whether very few (e.g. 32 images, Fig. 2C) or many (Fig. 2F) annotated training images were provided.

Figure 2.
Visual representation of “fake” images generated by the ESGAN generator during modeling implementation at early (400) A, D) and advanced (9,800) B, E) training steps. Evaluation of heading detection by the ESGAN discriminator-supervised classifier at early (400) and advanced (9,800) training steps under limited (1%) C) and large (80%) F) number of annotated samples.

Explaining ESGAN's learning via Grad-CAM

Since the ultimate goal of this study was to maximize the ability of the ESGAN discriminator-supervised classifier to accurately determine the heading status of plants, gaining insights and interpretability about the learning process of this classifier was a key component of the analysis. When interpreting the learning process of this model via Gradient-weighted Class Activation Mapping (Grad-CAM), which highlights the parts of an image that contribute the most to a model's decision (Selvaraju et al. 2020), it was notable that the model successfully focused on plant pixels versus background pixels and varied its activation levels depending on the class of image being considered. For plants without visible panicles (Fig. 3A), higher activation regions (yellow) were located over the green areas of the plant, especially the upper leaves, while lower leaves and background regions (i.e. soil) were assigned a lower (blue) activation level (Fig. 3C), meaning they were less informative. For the class of plant that had reached heading (Fig. 3B), higher activation was particularly noticeable over the regions of panicles (i.e. silver-white objects) of the plants, while the model assigned a lower activation level to vegetative tissues (Fig. 3D).

Figure 3.
Visualization of example real RGB images and Grad-CAM activation maps. Example preheading A) plant class and the corresponding activation map C) extracted from the ESGAN D supervised classifier. Example postheading B) plant class and the corresponding activation map D). Activation levels in the images are represented on a 0 to 255 scale.

Evaluation of labor requirements for ESGAN versus human evaluation of heading status

The combined Miscanthus breeding trials studied here featured 3,040 plots, including 12,400 individual plants at the time of establishment (1 per plot for M. sacchariflorus and 10 per plot for M. sinensis). Heading status of each plant was assessed on 3 occasions. Visual inspection by humans walking through the trials, including recording of data on an electronic device, required approximately 10.5 person-seconds per plant or 36 person-hours in total on each occasion that phenotyping was performed (Table 1). By comparison, the time demand could be reduced >8-fold to 4.33 person-hours in total, or ∼1.2 s per plant, when acquiring images by UAV and analyzing them with ESGAN (Table 1). This reduction in time commitment reduces labor requirements below the threshold where, weather permitting, a single person could maximize the accuracy of heading date estimates by performing phenotyping on a daily basis.

Table 1.

Description of activities and time required to phenotype the heading status of Miscanthus breeding trials by traditional visual inspection on the ground versus UAV imaging plus analysis by ESGAN

Visual inspection by humans on the ground
    In-field evaluation and data recording on an electronic device: 36 h
    Total: 36 h

UAV imaging and ESGAN analysis
    Flight planning and execution: 1 h 20 min
    Image processing: 2 h 40 min
    Image chip generation and ESGAN predictive inference: 20 min
    Total: 4 h 20 min

Data correspond to the effort required to phenotype the 3 trials (3,040 plots) in this study on 1 occasion, i.e. at a single point in a seasonal time course.

Before ESGAN can be deployed to analyze UAV imagery, it must be trained on human-annotated images. The number of training images annotated by in-field, human phenotyping needed to maximize how accurately plants were classified as having reached heading or not was substantially fewer for ESGAN (∼32 images) than for TL by ResNet-50 (∼314) or a traditional, fully supervised CNN (∼941 images). Based on the average time to phenotype each plant, this means that the time required to collect annotation data in each new context in which a model would be used decreases by an order of magnitude for ESGAN relative to TL and CNN (Fig. 4A).

Figure 4.
Evaluation of A) time for acquiring annotation data for training for models that accurately classify images (OA > 0.85) and B) training time for each model relative to the number of annotated samples in the training data.

In addition, the training time for ESGAN varied from ∼750 to 900 s depending on the number of annotated samples analyzed. This was 3- to 4-fold slower than for other learning methods (Fig. 4B). However, this increase in computational time is small compared to the gains in efficiency with respect to fieldwork (Fig. 4A).

Discussion

This study successfully demonstrated that an ESGAN approach can substantially reduce the amount of human-annotated training data needed to accurately perform an image classification task. Only tens of human-annotated images were needed to achieve high levels of accuracy in detecting plants that had reached heading, or not, even when the problem was presented in the challenging context of a large population of Miscanthus genotypes, which feature a wide diversity of visual appearance both before and after heading. By contrast, hundreds of human-annotated images were needed to train a TL tool (ResNet-50), and thousands of annotated images were needed to train a fully supervised CNN. Meanwhile, KNN and RF were not able to classify images with high levels of accuracy, even when provided with thousands of training images. These findings highlight how a generative and adversarial learning strategy can provide an efficient solution to the common problem of needing large amounts of annotated training data for high-performing, fully supervised deep learning (DL) approaches. This is a particularly significant discovery for the many potential applications of computer vision, such as high-throughput phenotyping in crop breeding, where frequent retraining of a DL model is needed to cope with strong context dependency of outcomes. The time required to acquire imagery by UAV and perform analysis with the ESGAN tool was ∼8-fold less than the time required for people to visually assess and record the heading status of Miscanthus while walking through the field trials. The time required to train any of the ML models is trivial relative to the time required for data acquisition. Combined with reducing the requirement for training data by 1 to 2 orders of magnitude by using ESGAN versus FSL or TL, this represents a major reduction in the effort needed to develop and use custom-trained ML models for phenotyping heading date in trials involving other locations, breeding populations, or species. For the Miscanthus breeding program at UIUC, the reduction in labor on each occasion the heading status of the breeding trials is assessed, from 36 to 4.33 person-hours, creates the opportunity to increase the frequency of assessment from once per week to once every 2 or 3 d, and thereby increase the accuracy of heading date estimates (Varela et al. 2022a).

The power of ESGAN is valuable to research in the biological science domain, particularly at the intersection of remote sensing, precision agriculture, and plant breeding. The integration of automated data collection based on noncontact sensors and ESGAN can provide a cost-effective solution for exploiting large volumes of unannotated inputs, which can be collected at relatively low cost using remote sensing platforms. It can reduce dependence on large annotation data sets while achieving performance equivalent to traditional FSL approaches. Making these advances in a highly productive perennial grass, such as Miscanthus, is particularly important and challenging because these crops are more difficult to phenotype: highly segregating outbred populations in which each individual is genetically unique, and voluminous perennial plants that grow larger each year, make field screening by humans on the ground more difficult and time-consuming than in annual, short-stature crops (Varela et al. 2022b). Implementing this ESGAN-enabled strategy may allow breeders to grow and evaluate larger populations in more locations as a means to accelerate crop improvement (Lewandowski et al. 2016) but at lower cost given the reduced dependence on manual annotation. It will be interesting to test how the ESGAN approach can be transferred to assess heading in other important crops including maize (Zea mays), sorghum (Sorghum bicolor), rice (Oryza sativa), wheat (Triticum aestivum), and switchgrass (Panicum virgatum), which also have panicles visible at the top of the canopy. The focus would shift to supplying a reduced number of high-quality, strategically chosen annotations, while relying on the generative and adversarial element of the ESGAN to reduce the gap in predictive ability instead of depending on the large data collection campaigns required for robust FSL implementations.

Previous studies have reported the use of high spatial resolution remote sensing imagery to detect reproductive organs in plants using pixel-wise classification (Kurtulmuş and Kavdir 2014; Kumar et al. 2021) and morphological operations (Zhao et al. 2021). While these studies demonstrated relevant advances for rapid detection of maize tassels and sorghum panicles, they heavily relied on manual supervision, for example, to determine optimal features for the supervised classifier or morphological filtering. This makes these approaches more highly context dependent, meaning they would require continuous retraining when exposed to new cases (i.e. a different geographic location or year of the breeding trial in which environmental conditions alter crop appearance), which presents important challenges for successful scalability (Zhou et al. 2017). More recently, convolutional-based deep learning has been successfully implemented to detect reproductive organs of plants via classification (Zan et al. 2020) and object detection (Liu et al. 2020). The convolutional operation of the algorithm enables the model to fully exploit the information in the image, as both the intensity of the signal and the spatial arrangement of pixels can be informative features to characterize the target trait (Yamashita et al. 2018). This also enables fully automated feature extraction that provides a clear advantage over traditional machine learning, which heavily relies on expert knowledge and manual feature engineering to identify meaningful features (O’Mahony et al. 2020). Nevertheless, the cost of creating a large volume of annotated data sets can be operationally and financially prohibitive in numerous applications (Weinstein 2018; Lotter et al. 2021; Sager et al. 2021). Previous efforts to reduce dependency on training data have successfully integrated remote and proximal sensing with TL for plant species recognition (Letsoin et al. 2022), seedling detection (Tseng et al. 2022), and disease detection (Kamal et al. 2019). For example, the use of TL significantly reduced the computational time and resources needed when compared to implementing custom CNNs from scratch (Jha et al. 2019). While TL has proven an effective approach, the similarity between the original and target tasks can bring challenges for successful transferability; thus, additional data are often needed (Letsoin et al. 2022) to improve the generalization ability of TL. Moreover, the relationship between transferability, data annotation size, and predictive ability of TL has not been extensively tested in plant science applications. Therefore, the implementation of robust analytical approaches that directly address this challenge and accurately determine differences in the phenological characteristics of individuals in a population with diverse origins can be of significant interest for a challenging task that is typically done by manual inspection (Dong et al. 2021) and requires copious amounts of manual annotation for robust FSL implementations (Varela et al. 2022a).

ESGAN clearly outperformed FSL models when only tens of training images were provided. Overall, this highlights the particular ability of ESGAN to exploit unannotated imagery to produce meaningful improvements for more accurate determination of the heading status under minimal annotation. This can be attributed to ESGAN's ability to effectively enrich the latent space representation, which is crucial for classifiers in the discriminator to accurately distinguish between target classes and outperform other convolutional-based benchmark models. A previous study also used a GAN for discrimination of crops versus weeds (Khan et al. 2021) in high spatial resolution aerial imagery, but the reductions in demand for human-annotated training data were significantly less than achieved with ESGAN. This difference may reflect the particular design of the ESGAN discriminator. ESGAN benefits from using 2 CNNs (supervised and unsupervised classifiers) that share weights, allowing synergistic feature matching even when annotations are severely restricted. Specifically, the architecture design and training sequence of ESGAN ensure that weight updates in 1 classifier affect the other (Fig. 7B), facilitating feature matching. This design and sequence of steps during training enable the model to synergistically exploit both types of data sources (i.e. annotated and unannotated), providing a clear advantage over the FSL and traditional TL strategies. Even though the main goal of this study was to maximize discriminator-supervised classifier performance, the generative component of the algorithm showed a significant improvement in the quality of the visual representation of Miscanthus plants during the learning process (Fig. 2, A, B, D, and E). This allowed synergistic gains in the performance of the ESGAN discriminator and ESGAN generator as gradient updates and loss function information passed between submodels.

The dependence of the CNN model on large volumes of annotated images was strong. This was not surprising and agreed with previous studies noting the importance of large and high-quality data sets for optimal performance of FSL algorithms (Wang et al. 2021). This constraint was also evident, although to a lesser degree, when using the TL strategy. This demonstrated that the TL strategy was capable of exploiting prior knowledge, but the dependence on annotated images was consistently larger than for ESGAN.

The generally poorer performance of tabular-based algorithms (KNN and RF) versus CNN-based methods (ESGAN, TL, and CNN) suggested that the use of spectral features summarized over an area of interest via statistical descriptors was a suboptimal solution for the effective determination of heading status at the image level. This can in part be explained by the fact that inflorescences of the plants appeared as small silver-colored objects in the images, and it is logical to argue that spectral descriptors extracted and summarized over an area of interest (i.e. the whole image chip) into a single tabular value may only partially capture these patterns in the image. The capacity of convolutional-based approaches to automatically map spatially dependent patterns in the images and use them as informative features (Wei et al. 2019) can explain their superior ability.

Grad-CAM showed that the algorithm prioritized information gain from areas of the image occupied by inflorescences and vegetative tissue as a means to differentiate each class without the need for manual supervision to identify regions of interest. This further reduces the degree of expert supervision needed during implementation of the analysis. This is particularly important in biological systems, such as crop breeding, where high levels of phenotypic diversity from genetic and environmental sources occur, which would otherwise limit the broad application of existing AI tools.

By reducing the dependence on manual annotation, the traditional requirement for exhaustive field-wide ground surveying to determine heading dynamics can also be alleviated. This has important implications for optimizing the operational costs associated with phenotyping trials. Rather than conducting comprehensive surveys of the entire field at each round of evaluations, ground surveying could focus on representative sections to optimize the operational cost. Complementing these targeted ground surveys with aerial surveys would further enhance temporal coverage by better distributing the associated operational cost, without compromising accuracy in heading status predictions, and would reduce the cost of capturing finer temporal dynamics. ESGAN's strong predictive performance even with reduced data availability suggests that this hybrid approach could maintain high levels of accuracy across time points, offering a practical, cost-efficient, and scalable alternative for large-scale phenotyping in agricultural research.

Conclusion

The generative-discriminative nature of the proposed ESGAN approach, which is characterized by adversarial and synergistic training of neural networks, presents a promising avenue for reducing the dependence on human supervision and enhancing the model's capacity for generalization through more efficient incorporation of contextual information. In the case study presented, this meant that heading detection in plants could be effectively determined using high-spatial-resolution aerial imagery with only a very limited number (tens) of human-annotated training images. This represents a significant potential reduction in manual annotation for ESGAN compared to traditional FSL, with a negligible penalty in performance. These outcomes are valuable for designing future strategies to optimize the integration of manual field screening efforts and aerial data collection. More broadly, this work could address the need for advanced modeling techniques that deliver robust accuracy while reducing the operational cost of collecting time-consuming annotated data for many computer vision problems in plant science applications.

Materials and methods

Field trials

Data were collected from 3 Miscanthus diversity trials located at the University of Illinois Energy Farm, Urbana (40.06722°N, 88.19583°W). The trials were planted in the spring of 2019. This study focused on the second year (2020) of their establishment, which is the first growing season in which the Miscanthus trials are typically phenotyped. The broader aims of the breeding program include assessment of overwintering survival and evaluation of germplasm adapted to a wide range of latitudes and environments. Since plants that were lost to lethal winter temperatures were randomly distributed within the trials, all locations were phenotyped by humans and UAV imaging regardless of survival status. Not all germplasm experienced the photoperiod necessary to undergo the vegetative-to-reproductive transition and reach heading at this location.

The M. sacchariflorus trial included 2,000 entries as single-plant plots in 4 blocks, each block including 58 genetic backgrounds (half-sib families) (Varela et al. 2022b). The trial was 79 m long × 97 m wide, and each plot (plant) was 1.83 × 1.83 m in size.

One M. sinensis trial included germplasm from South Japan while the other included germplasm from Central Japan (Clark et al. 2015). Each of these 2 trials included 2 blocks, with 130 plots per block. Each plot contained seedlings from a single half-sib family, with 10 plants at a spacing of 0.91 m, requiring transplant of 10,400 individuals in total at the start of the trial. In the M. sinensis South Japan trial, there were 124 families, and in the Central Japan trial, there were 117 families. Therefore, a few families were planted in more than 1 plot per block to avoid leaving empty space. The size of the field that included both of the M. sinensis trials was 115 m long × 121 m wide.

Trait of interest and ground truthing

Every plant in both single-plant and multi-plant plots was phenotyped individually through observation on the ground by an expert evaluator to determine if it had produced panicles or not. A plant was considered to have reached heading once 50% of the culms contributing to canopy height had panicles emerged ≥1 cm beyond the flag leaf sheath. Data were recorded to separately track plants that died or never reached heading. Examples of plants with emerging panicles imaged at the ground level and by UAV are shown in Fig. 5. The M. sacchariflorus trial was inspected on days of the year (DOYs) 248, 262, and 276 and the M. sinensis trials on DOYs 245, 265, and 280. This matched the dates of UAV data collection as closely as possible (i.e. depending on optimal weather conditions).

Figure 5.
Example cases of a plant with emerging inflorescences imaged from the ground A), plants not yet heading B, C), and plants after heading D, E) in UAV imagery collected in the 2020 season.

Aerial data acquisition and imagery preprocessing

A Matrice 600 Pro hexacopter (DJI, Shenzhen, China) UAV equipped with a Gremsy T1 gimbal (Gremsy, Ho Chi Minh, Vietnam) mounted with a multispectral RedEdge-M sensor (MicaSense, Seattle, WA, USA) was utilized for aerial data collection. The sensor included 5 spectral bands in the blue (465 to 485 nm), green (550 to 570 nm), red (663 to 673 nm), rededge (712 to 722 nm), and near-infrared (820 to 860 nm) regions of the electromagnetic spectrum. Flights were conducted 3 times (DOYs 247, 262, and 279) in the season corresponding to the period when most inflorescences emerge. The aerial data were collected under clear sky conditions around ±1 h from solar noon to ensure consistent reflectance values across days of data collection. The flight altitude was 20 m above ground level, resulting in a ground sampling distance of 0.8 cm/pixel. Flight settings included 90% forward and 80% side overlapping during data acquisition to ensure high-quality image stitching during postprocessing steps. Ten black and white square panels (70 cm × 70 cm) were distributed in the trials as ground control points (GCPs). A real-time kinematic survey was done using a Trimble R8 global navigation satellite system integrated with the CORS-ILUC local station to survey the GCPs to ensure consistent spatial extraction of the image chips between days of data collection. A MicaSense calibration panel was imaged on the ground before and after each of the flights for spectral calibration of the images via an empirical procedure (Poncet et al. 2019). Images were imported into Metashape version 1.7.4 (Agisoft, St. Petersburg, Russia) to generate calibrated surface reflectance multispectral orthophotos. Image processing and analysis were performed with an i9-12900H processor (14 cores), 32 GB RAM, and an NVIDIA GeForce RTX 3080 GPU with 16 GB of memory. The orthophotos from each of the 3 sampling dates were resampled to a common 0.8-cm/pixel resolution and stacked into a 3-band RGB (i.e. red, green, blue bands) raster stack object. Further steps in the analysis consider only the RGB bands of the multispectral sensor for the following reasons: (i) RGB has proven to be highly sensitive and competitive with the rededge and near-infrared spectral regions of the electromagnetic spectrum for monitoring heading in Miscanthus (Varela et al. 2022b). (ii) The use of RGB bands allowed testing of TL as a potential alternative approach in the analysis. Image chips for each plot/plant were generated by clipping the stacked orthophoto objects using a polygonal shapefile that includes each plot polygon of the trials. The resulting image chips containing the RGB bands for the 3 dates were further split into single-date matrix arrays in Python for further analysis. The size of the image chips was 108 pixels × 108 pixels × 3 RGB bands per date.
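
To make the chip-generation step concrete, the following is a minimal Python sketch of clipping plot-level image chips from a calibrated orthomosaic using a plot-boundary shapefile. The file names, and the use of rasterio and geopandas rather than the exact tooling of the original pipeline, are assumptions for illustration only.

```python
# Minimal sketch of plot-level image chip extraction (file names are hypothetical).
import geopandas as gpd
import numpy as np
import rasterio
from rasterio.mask import mask

plots = gpd.read_file("plot_polygons.shp")            # one polygon per plot
chips = []
with rasterio.open("orthomosaic_rgb.tif") as src:     # calibrated RGB orthomosaic
    plots = plots.to_crs(src.crs)                     # match coordinate reference systems
    for geom in plots.geometry:
        # Clip the raster to the plot polygon; crop=True trims to the polygon's bounding box
        data, _ = mask(src, [geom], crop=True)
        chip = np.moveaxis(data[:3], 0, -1)           # (bands, H, W) -> (H, W, 3)
        chips.append(chip)
```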

Data set

After accounting for plants that died due to lethal winter temperatures or never reached heading, a subset of 1,309 genetically diverse plants was identified for which ground truth data and UAV imagery were available on each of the 3 sampling dates during the growing season. This resulted in a data set of 3,921 instances of single-plant images and associated heading status.

Algorithms

KNN and RF

KNN is an extensively used algorithm for pattern classification (Weinberger et al. 2009). The proximity distance between individuals is used to determine class discrimination in a population. The core concept is that the closer the individuals are in the feature space, the higher the probability of belonging to the same class (Short and Fukunaga 1981). The advantages of this method are its reduced number of parameters and fast computation, while its downsides are sensitivity to irrelevant features and the difficulty of determining the optimal value of the number-of-neighbors parameter. In our study, after preliminary experimentation, the number of neighbors was set to 10.

RF is a versatile nonparametric algorithm that has been broadly used in classification tasks (Belgiu and Drăguţ 2016). It exploits bagging and feature randomness to build an ensemble of trees in which prediction by committee tends to be more accurate than any of the individual trees (Breiman 2001). RF is straightforward to use and requires simple hyperparameter tuning to deliver high predictive performance. Another advantage of this algorithm is that it does not assume a normal distribution of the data or any particular form of association between the predictors and the response variable (Probst et al. 2019). Furthermore, as an ensemble of trees, RF is highly capable of managing overfitting. For implementation, the number of estimators and maximum tree depth parameters were optimized via the GridSearchCV function in Python.

The KNN and RF algorithms require tabular features as inputs. Tabular features were generated from the image chips using NumPy statistical functions in Python. The statistical descriptors median, range, Sd, and the 75th, 95th, and 99th percentiles were used to extract tabular feature values from the RGB bands of each image chip (structural and multispectral bands did not contribute additional explanatory power in a prior assessment of heading in Miscanthus from UAV images; Varela et al. 2022b). This process generated a total of 18 features that were then used as inputs to the algorithms to determine the heading status of each plant (i.e. at the image chip level).
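
As an illustration of this feature-extraction and modeling step, the sketch below computes the 6 descriptors per RGB band (18 features per chip) and fits the KNN and RF baselines with scikit-learn. The variable names (`chips`, `labels`) and the GridSearchCV grid values are hypothetical; only the number of neighbors (10) and the tuned parameters (number of estimators, maximum depth) follow the text.

```python
# Sketch of tabular feature extraction and the KNN/RF baselines.
# Assumes `chips` is a list of (H, W, 3) RGB arrays and `labels` a matching
# array of 0/1 heading annotations (names are hypothetical).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

def chip_features(chip):
    """Six statistical descriptors per RGB band -> 18 features per chip."""
    feats = []
    for b in range(3):
        band = chip[..., b].astype(float)
        feats += [np.median(band), np.ptp(band), np.std(band),
                  np.percentile(band, 75), np.percentile(band, 95),
                  np.percentile(band, 99)]
    return feats

X = np.array([chip_features(c) for c in chips])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

knn = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)

rf_grid = GridSearchCV(RandomForestClassifier(),
                       {"n_estimators": [100, 300, 500],
                        "max_depth": [5, 10, None]})   # grid values are illustrative
rf = rf_grid.fit(X_train, y_train).best_estimator_
```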

Custom CNN

CNN is a deep learning technique successfully utilized for image analysis (Teuwen and Moriakov 2020). The architecture of the algorithm consists of a series of hidden layers that map the input images to output values (Yamashita et al. 2018). The core component of the algorithm is the convolution operation, where a set of trainable kernels is applied to the input image to automatically generate a set of spatial features that best describe the target predictor. The model learns basic features in the first layers and more complex feature representations at deeper layers iteratively (i.e. via gradient loss and backpropagation) (Szegedy et al. 2015). The typical architecture of the algorithm includes a backbone feature generator and a classifier or regressor head. In this study, the backbone feature extractor of the custom CNN included 6 convolutional layers, all with maximum pooling and batch normalization; convolutional layers 2, 4, and 6 additionally included 40% feature dropout. The backbone was completed by a flattening layer, a fully connected layer, batch normalization, and 50% feature dropout. Finally, the classifier head of the CNN included a sigmoid activation layer that delivers predictions as normalized probability values (i.e. with panicles or without panicles) for each image chip. After preliminary experimentation, the number of features in each layer was set to 32, 32, 64, 64, 128, and 128. Zero padding, a stride of 1 with no overlapping, and the rectified linear unit (ReLU) activation function were also considered in the architecture design. The CNN's kernel filter size was set to 3 pixels × 3 pixels × 3 RGB bands, and max pooling was set to 2 following each convolution. Binary cross entropy was utilized as the loss function of the classifier head of the neural network.
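
A minimal Keras sketch of a CNN following this description is given below. The exact ordering of pooling, batch normalization, and dropout within each block and the size of the fully connected layer are assumptions; the published repository holds the authoritative architecture.

```python
# Hedged sketch of the custom CNN described above (layer ordering and dense size assumed).
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(72, 72, 3)):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for i, filters in enumerate([32, 32, 64, 64, 128, 128], start=1):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
        x = layers.BatchNormalization()(x)
        if i in (2, 4, 6):                          # extra 40% dropout on layers 2, 4, 6
            x = layers.Dropout(0.4)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)     # fully connected layer (size assumed)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # probability of visible panicles
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```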

ResNet-50

TL is a deep learning technique that exploits stored knowledge gained while solving 1 problem that can then be applied to solve a different but related task (Yang 2008). This prior knowledge is stored in large neural networks and then transferred to solve a target task (Ribani and Marengoni 2019). This implies several advantages over training a custom CNN from scratch, e.g. reduction in computational resources and latency for delivering predictions (Jha et al. 2019), and a boost in predictive ability over the target task (Wang et al. 2018). Deep neural networks trained on the ImageNet data set have reported state-of-the-art performance in TL applications (Kornblith et al. 2019). ResNet-50 is a deep 50-layer neural network specifically designed to exploit residual connections between convolutional layers and trained on the large ImageNet data set (Deng et al. 2009). This ensures that weights learned from previous layers do not vanish during backpropagation (Noh 2021), which is an advantageous design feature that enables the use of a larger number of (i.e. deeper) layers in the architecture of the network. For implementation, we followed the steps suggested by Kornblith et al. (2019), where the strategy is as follows: (i) remove the original head of the pretrained neural network, (ii) add a custom binary classifier head, and (iii) fine-tune the top 5 layers while keeping bottom layers frozen. ResNet-50's pretrained weights and biases were imported from Keras (Chollet 2015a). The original image chips were resampled to a 128-pixel size to fit the input image size of the ResNet-50 network.
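
The following is a hedged Keras sketch of this transfer learning setup: an ImageNet-pretrained ResNet-50 backbone with a new binary head and only the top layers left trainable. The pooling choice, head size, and optimizer settings are illustrative assumptions.

```python
# Sketch of ResNet-50 transfer learning: freeze all but the top 5 layers, add a binary head.
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.ResNet50(include_top=False, weights="imagenet",
                                   input_shape=(128, 128, 3), pooling="avg")
for layer in base.layers[:-5]:                  # keep only the top 5 layers trainable
    layer.trainable = False

inputs = keras.Input(shape=(128, 128, 3))
x = keras.applications.resnet50.preprocess_input(inputs)
x = base(x)
outputs = layers.Dense(1, activation="sigmoid")(x)   # custom binary classifier head
model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```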

ESGAN

The core concept of GAN involves training deep generative networks based on game theory (Goodfellow et al. 2020). The model contains 2 CNN submodels: (i) a generator (G) and (ii) a discriminator (D) that are trained in an adversarial manner. Both G and D are trained to optimize the results, where the goal of G is to mislead D, and the goal of D is to distinguish between fake images generated by G and real images collected with the UAV. GANs have been successfully implemented for image generation (Ahmad et al. 2022), augmentation (Sandfort et al. 2019), and classification (He et al. 2017) tasks. Adapting GANs to the semi-supervised context by forcing the discriminator network to output class labels was discussed in Odena (2016). We explore this direction in the context of plant science applications. During the training process, the data generated by the generator (G) are used to train the discriminator (D). This process enables D not only to distinguish between real and fake data but also to identify whether a plant has reached the heading stage (Fig. 6). Therefore, it is expected that D can learn features that allow for the discrimination of images with plants prior to, or after, heading using much less human-annotated training data than in FSL.

Figure 6.
Diagram of ESGAN and data workflow including the generator (G) and discriminator (D) submodels utilized to assess flowering status.

ESGAN was implemented following the steps suggested in Salimans et al. (2016), while creating separate classifiers for the supervised and unsupervised D (Fig. 7). First, the D supervised classifier is implemented to infer the 2 classes (i.e. plants prior to heading or after heading) from real images using a Softmax activation function. Then, the supervised D produces predictive outputs for each image (i.e. between 0 and 1), which represent the normalized probability of the image belonging to the 2 image classes. The D unsupervised classifier is implemented by taking D supervised prior to Softmax activation (i.e. the D supervised backbone feature extractor) and reusing its feature extraction layer weights. It then calculates the normalized sum of exponential outputs (i.e. between 0 and 1) via a custom function, which represents the probability of the image being real or fake. This means that updates to one of the classifier models will impact both models.
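
A minimal Keras sketch of this paired-classifier design is given below: both classifiers share one convolutional backbone and one pre-softmax logits layer, the supervised head applies Softmax over the 2 classes, and the unsupervised head converts the same logits into a real/fake probability via the normalized sum of exponentials, D(x) = Z(x)/(Z(x) + 1), with Z(x) the sum of exponentiated logits (Salimans et al. 2016). The backbone filter counts and optimizer settings are assumptions.

```python
# Hedged sketch of the shared-weight supervised/unsupervised discriminator pair.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_discriminators(input_shape=(72, 72, 3), n_classes=2):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in [32, 64, 128, 128]:                 # 4 convolutional blocks (counts assumed)
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.4)(x)
    logits = layers.Dense(n_classes)(x)                # shared pre-softmax logits

    # Supervised head: class probabilities (pre- vs post-heading).
    d_sup = keras.Model(inputs, layers.Softmax()(logits))
    d_sup.compile(optimizer=keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    # Unsupervised head: D(x) = Z/(Z + 1), Z = sum(exp(logits)) (Salimans et al. 2016).
    def real_fake(z):
        s = tf.reduce_sum(tf.exp(z), axis=-1, keepdims=True)
        return s / (s + 1.0)
    d_unsup = keras.Model(inputs, layers.Lambda(real_fake)(logits))
    d_unsup.compile(optimizer=keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
    return d_sup, d_unsup
```

Because the two models reference the same backbone and logits layer, a weight update made through either classifier is immediately reflected in the other, which is the mechanism described above.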

The supervised loss function (LD supervised) is defined as the negative log probability of the correct label y given an annotated real input x. LD supervised focuses on correctly classifying input images to their given labels.

L_{D\ \mathrm{supervised}} = -\,\mathbb{E}_{x,y \sim p_{\mathrm{data}}}\left[\log p_{\mathrm{model}}(y \mid x,\ y < K+1)\right] \quad (1)

The unlabeled-image loss terms constitute the unsupervised loss function (LD unsupervised). p_model(y = K + 1|x) represents the probability that x is fake (as in a traditional GAN), corresponding to 1 − D(x) in the GAN architecture. X_u denotes the unannotated data samples. The first term of LD unsupervised classifies the unannotated real images into one of the K classes; the second term classifies the images generated by G as class K + 1 (fake).

L_{D\ \mathrm{unsupervised}} = -\,\mathbb{E}_{x \sim X_u}\left[\log\left(1 - p_{\mathrm{model}}(y = K+1 \mid x)\right)\right] - \mathbb{E}_{x \sim G}\left[\log p_{\mathrm{model}}(y = K+1 \mid x)\right] \quad (2)

The classifiers are trained with gradient descent by minimizing LD supervised and LD unsupervised. D weights are stochastically updated at each training step by their gradient (Equation 3) with respect to the losses in Equations (1) and (2).

\theta_D \leftarrow \theta_D - \eta\, \nabla_{\theta_D}\, \frac{1}{m} \sum_{i=1}^{m}\left(L_{D\ \mathrm{supervised}} + L_{D\ \mathrm{unsupervised}}\right) \quad (3)

For all m samples in a minibatch, σ(x)j = p_model(y = j|x) (SoftMax function) was applied at the output of D supervised. After some preliminary experimentation, a 72-pixel image size was used as input for the CNN and ESGAN, given the negligible penalty in predictive performance and the significant saving in computational time.

Figure 7.
Diagram of ESGAN architecture. Components include: G A) and D B) submodels with the corresponding inputs (left vector and cuboid), hidden layers (center cuboids and vectors), and outputs (i.e. right cuboid as fake image in G and class predictions in D).

Balanced sampling between annotated and unannotated images at each minibatch iteration was used to ensure consistent performance of ESGAN during training. G initializes with a latent vector (Fig. 7A, orange vector) as input, which is then reshaped (Fig. 7A, green cuboid) and upscaled through 2 deconvolution (i.e. transpose convolution) operations (Fig. 7A, blue cuboids) into a fake image (Fig. 7A, yellow cuboid) that matches the size of the real images as the output of G. D takes both real and fake 72 × 72 × 3 images as input (Fig. 7B, orange cuboid), followed by 4 convolutional operations with max pooling layers of size 2, a flattening layer, and 40% feature dropout. The size of the convolutional kernel was 3 × 3, and the Leaky ReLU activation function was applied to all layers of G and D, except for the output of G, which used the Tanh function. The standard Adam optimizer with a learning rate of 0.0001 was employed in the G and D submodels. Each classifier could assign the input data a label y from the K = 2 classes (plants with or without panicles) or classify it as a fake sample (the K + 1 class).
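
The generator side of this architecture can be sketched as follows; the latent dimension and intermediate channel counts are assumptions, while the reshape, the 2 transpose convolutions up to 72 × 72 × 3, the Leaky ReLU activations, and the Tanh output follow the description above.

```python
# Hedged sketch of the generator: latent vector -> reshape -> 2 transpose convolutions
# -> 72 x 72 x 3 "fake" image with Tanh output (latent size and channel counts assumed).
from tensorflow import keras
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    inputs = keras.Input(shape=(latent_dim,))
    x = layers.Dense(18 * 18 * 128)(inputs)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Reshape((18, 18, 128))(x)
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same")(x)   # 18 -> 36
    x = layers.LeakyReLU(0.2)(x)
    outputs = layers.Conv2DTranspose(3, 3, strides=2, padding="same",
                                     activation="tanh")(x)             # 36 -> 72
    return keras.Model(inputs, outputs)
```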

Algorithm implementation and metrics

KNN and RF were implemented using the Scikit-learn library, while CNN, ResNet-50, and ESGAN were implemented in Keras, both in Python version 3.9.16. Each model fitting was iterated 3 times using a random training and testing partition to ease the convergence of the models’ prediction metrics. The number of image chips with the corresponding ground truth data was 3,921; 2,021 images came from the M. sacchariflorus trial and 1,900 came from the M. sinensis trials. The full data set was split (80:20) into training and testing data sets. The training data set was split further (80:20) into training and validation data sets. The validation data set was used to optimize the models’ performance and prevent overfitting during training. CNN and ResNet-50 were trained for up to 300 epochs, while ESGAN was trained for up to 1,000 epochs. Early stopping strategies were incorporated in these 3 models to prevent overfitting, optimize performance, and reduce computational time. The test data set was utilized to expose the models to unseen data to evaluate their generalization ability. As the number of annotated images used for training was altered to generate the 8 sample size cases (Fig. 1, A and B), the number of test images was held constant at the equivalent of 20% of the full data set. This ensured that all models were evaluated on the same test set of the same size. Training was implemented in batches.

Description of the batch training loop of ESGAN is as follows (a minimal code sketch is provided after the list):

  • 1) train D with frozen G weights (i.e. nontrainable):

    • a) train supervised classifier with annotated images, and determine LD supervised, gradient computation, and D weights update.

    • b) train unsupervised classifier with real images, and determine LD unsupervised, gradient computation, and D weights update.

    • c) train unsupervised classifier with fake images, and determine LD unsupervised, gradient computation, and D weights update.

  • 2) train G with frozen D weights (i.e. nontrainable):

    • a) feed D with G fake images.

    • b) extract features and calculate G loss.

    • c) update G weights by their gradient.
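
The following Python sketch illustrates one pass of this loop; the composite `gan` model (the generator chained to a frozen unsupervised discriminator) and all variable names are hypothetical placeholders, and the released repository contains the authoritative implementation.

```python
# Minimal sketch of the ESGAN training loop above (names are hypothetical).
import numpy as np
from tensorflow import keras

def build_gan(g, d_unsup):
    """Composite model used in step 2: G is trained through a frozen unsupervised D."""
    d_unsup.trainable = False      # honored when `gan` is compiled; d_unsup itself,
    gan = keras.Model(g.input, d_unsup(g.output))  # compiled earlier, still updates in step 1
    gan.compile(optimizer=keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
    return gan

def train_step(g, d_sup, d_unsup, gan, x_lab, y_lab, x_unlab, batch=32, latent_dim=100):
    half = batch // 2
    # 1a) supervised classifier on annotated real images
    i = np.random.randint(0, len(x_lab), half)
    d_sup.train_on_batch(x_lab[i], y_lab[i])
    # 1b) unsupervised classifier on unannotated real images (target = real)
    j = np.random.randint(0, len(x_unlab), half)
    d_unsup.train_on_batch(x_unlab[j], np.ones((half, 1)))
    # 1c) unsupervised classifier on generated images (target = fake)
    z = np.random.normal(size=(half, latent_dim))
    d_unsup.train_on_batch(g.predict(z, verbose=0), np.zeros((half, 1)))
    # 2) generator update: D frozen, fakes pushed toward the "real" label
    z = np.random.normal(size=(batch, latent_dim))
    gan.train_on_batch(z, np.ones((batch, 1)))
```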

The OA, F1 score, and ROC curve analysis were utilized as performance metrics of the models on classifying heading status (i.e. with or without visible panicles). The OA and F1 score metrics are described in Equations (4) and (5):

\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN} \quad (4)

F1 = \frac{2\,TP}{2\,TP + FP + FN} \quad (5)

In Equations (4) and (5), true positive (TP) is defined as plants with panicles correctly classified as plants with panicles. True negative (TN) is defined as plants without panicles correctly classified as plants without panicles. False positive (FP) is defined as plants without panicles (i.e. ground truth) incorrectly classified as plants with panicles (i.e. positive class). False negative (FN) is defined as plants with panicles (i.e. ground truth) incorrectly classified as plants without panicles (i.e. negative class).

ROC analysis is especially useful for assessing models where the output is a probability score that can be thresholded to produce binary decisions. The technique involves plotting the ROC curve, which is a graphical representation of a classifier's diagnostic ability between TP rate and FP rate at various threshold settings. The area under the ROC curve quantifies the overall ability of the classifier to discriminate between positive and negative classes (Fawcett 2006).
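
For illustration, these metrics can be computed with scikit-learn as in the sketch below; `model`, `X_test`, and `y_test` are hypothetical placeholders for a trained classifier with a single-probability output and the held-out test data.

```python
# Sketch of the evaluation metrics using scikit-learn (variable names are hypothetical).
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve

y_prob = model.predict(X_test).ravel()      # predicted heading probabilities
y_pred = (y_prob >= 0.5).astype(int)        # thresholded binary decisions

oa = accuracy_score(y_test, y_pred)         # overall accuracy, Equation (4)
f1 = f1_score(y_test, y_pred)               # F1 score, Equation (5)
auc = roc_auc_score(y_test, y_prob)         # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)   # points of the ROC curve
```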

Grad-CAM is a technique used in deep learning to visualize which parts of an image contribute the most to a model's decision. It highlights the regions of an input image that were most important for making a specific prediction (Selvaraju et al. 2020). The technique was implemented to improve the interpretation of the ESGAN D supervised classifier's learning process. The visualization technique highlights the importance of different regions of the image in the output prediction by projecting back the weights of the output layer onto the convolutional feature maps (Zhou et al. 2016). As recommended by Chollet (2015b), the following steps were used to generate the class activation maps. First, the ESGAN D supervised classifier mapped the input image to the activations of the last convolution layer as well as the output predictions. The gradient of the predicted value for the input image with respect to the activations of the last convolution layer was computed. Each image channel in the feature map array was weighted by how important this channel was with regard to the predicted value, and then all the channels were summed to generate the corresponding activation map array. The Grad-CAM activation map provided a measure of how strongly portions of the image contributed to the predictions made by the ESGAN D supervised classifier, visualized in a 0 to 255 scale map array.
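
A hedged sketch of this Grad-CAM computation, following the cited Keras example, is shown below; the model variable and the name of the last convolutional layer are hypothetical.

```python
# Sketch of Grad-CAM for a Keras classifier (model and layer name are hypothetical).
import numpy as np
import tensorflow as tf
from tensorflow import keras

def grad_cam(model, image, last_conv_layer_name):
    # Model mapping the input image to the last conv activations and the prediction.
    grad_model = keras.Model(model.inputs,
                             [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        top_class = tf.argmax(preds[0])
        score = preds[:, top_class]
    # Gradient of the predicted score with respect to the last conv feature map.
    grads = tape.gradient(score, conv_out)
    pooled = tf.reduce_mean(grads, axis=(0, 1, 2))          # channel importance weights
    heatmap = tf.reduce_sum(conv_out[0] * pooled, axis=-1)  # weighted channel sum
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return np.uint8(255 * heatmap)                          # rescale to the 0 to 255 range
```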

Acknowledgments

We thank Tim Mies at the Energy Farm at University of Illinois for technical assistance.

Author contributions

S.V. and A.D.B.L. conceived the study, interpreted the data, and wrote the manuscript. E.S., X.Z., and J.N. established, maintained, and collected the ground truthing data in the field trials. J.R. and D.A. collected the aerial data. S.V. implemented the data pipeline in the study.

Funding

This work was funded by the DOE Center for Advanced Bioenergy and Bioproducts Innovation (U.S. Department of Energy, Office of Science, Biological and Environmental Research Program under award number DE-SC0018420), Artificial Intelligence for Future Agricultural Resilience, Management, and Sustainability Institute (Agriculture and Food Research Initiative (AFRI) grant no. 2020-67021-32799/project accession no.1024178 from the USDA National Institute of Food and Agriculture), and a generous gift from Tito's Handmade Vodka. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Department of Energy.

Data availability

The data sets and code used in the current study are publicly available from the online Illinois Data Bank at https://doi.org/10.13012/B2IDB-8462244_V2 and the GitHub repository https://github.com/pixelvar79/ESGAN-Flowering-Detection-paper.

References

Ahmad W, Ali H, Shah Z, Azmat S. A new generative adversarial network for medical images super resolution. Sci Rep. 2022:12(1):9533.
Ahmed SF, Alam MDSB, Hassan M, Rozbu MR, Ishtiak T, Rafa N, Mofijur M, Shawkat Ali ABM, Gandomi AH. Deep learning modelling techniques: current progress, applications, advantages, and challenges. Artif Intell Rev. 2023:56(11):13521–13617.
Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021:8(1):53.
Belgiu M, Drăguţ L. Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens. 2016:114:24–31.
Breiman L. Random forests. Mach Learn. 2001:45(1):5–32.
Burke M, Lobell DB. Satellite-based assessment of yield variation and its determinants in smallholder African systems. Proc Natl Acad Sci U S A. 2017:114(9):2189–2194.
Chollet F. Keras applications API. Keras 2015a [accessed 2024 Jan 6]. https://keras.io/api/applications/.
Chollet F. Grad-CAM class activation visualization. Keras 2015b [accessed 2024 Jan 6]. https://keras.io/examples/vision/grad_cam/.
Clark LV, Dwiyanti MS, Anzoua KG, Brummer JE, Ghimire BK, Głowacka K, Hall M, Heo K, Jin X, Lipka AE, et al. Biomass yield in a genetically diverse Miscanthus sinensis germplasm panel evaluated at five locations revealed individuals with exceptional potential. Glob Change Biol Bioenergy. 2019:11(10):1125–1145.
Clark LV, Stewart JR, Nishiwaki A, Toma Y, Kjeldsen JB, Jørgensen U, Zhao H, Peng J, Yoo JH, Heo K, et al. Genetic structure of Miscanthus sinensis and Miscanthus sacchariflorus in Japan indicates a gradient of bidirectional but asymmetric introgression. J Exp Bot. 2015:66(14):4213–4225.
Clifton-Brown J, Schwarz K-U, Awty-Carroll D, Iurato A, Meyer H, Greef J, Gwyn J, Mos M, Ashman C, Hayes C, et al. Breeding strategies to improve Miscanthus as a sustainable source of biomass for bioenergy and biorenewable products. Agronomy. 2019:9(11):673.
Cooper M, Powell O, Gho C, Tang T, Messina C. Extending the breeder's equation to take aim at the target population of environments. Front Plant Sci. 2023:14:1129591.
Crowell S, Korniliev P, Falcão A, Ismail A, Gregorio G, Mezey J, McCouch S. Genome-wide association and high-resolution phenotyping link Oryza sativa panicle traits to numerous trait-specific QTL clusters. Nat Commun. 2016:7(1):10527.
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway (NJ): IEEE; 2009. p. 248–255.
Dong H, Clark LV, Jin X, Anzoua K, Bagmet L, Chebukin P, Dzyubenko E, Dzyubenko N, Ghimire BK, Heo K, et al. Managing flowering time in Miscanthus and sugarcane to facilitate intra- and intergeneric crosses. PLoS One. 2021:16(1):e0240390.
Eckardt NA, Ainsworth EA, Bahuguna RN, Broadley MR, Busch W, Carpita NC, Castrillo G, Chory J, DeHaan LR, Duarte CM, et al. Climate change challenges, plant science solutions. Plant Cell. 2023:35(1):24–66.
Fan J, Zhou J, Wang B, de Leon N, Kaeppler SM, Lima DC, Zhang Z. Estimation of maize yield and flowering time using multi-temporal UAV-based hyperspectral data. Remote Sens (Basel). 2022:14(13):3052.
Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006:27(8):861–874.
Gan Y, Zhu H, Guo W, Xu G, Zou G. Deep semi-supervised learning with contrastive learning and partial label propagation for image data. Knowl Based Syst. 2022:245:108602.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial networks. Commun ACM. 2020:63(11):139–144.
Hayes BJ, Chen C, Powell O, Dinglasan E, Villiers K, Kemper KE, Hickey LT. Advancing artificial intelligence to help feed the world. Nat Biotechnol. 2023:41(9):1188–1189.
He Z, Liu H, Wang Y, Hu J. Generative adversarial networks-based semi-supervised learning for hyperspectral image classification. Remote Sens (Basel). 2017:9(10):1042.
Heaton EA, Dohleman FG, Long SP. Meeting US biofuel goals with less land: the potential of Miscanthus. Glob Chang Biol. 2008:14(9):2000–2014.
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol. 2022:106(9–10):3507–3530.
Iscen A, Tolias G, Avrithis Y, Chum O. Label propagation for deep semi-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach (CA): IEEE; 2019. p. 5070–5079.
Jensen E, Farrar K, Thomas-Jones S, Hastings A, Donnison I, Clifton-Brown J. Characterization of flowering time diversity in Miscanthus species. Glob Change Biol Bioenergy. 2011:3(5):387–400.
Jha D, Choudhary K, Tavazza F, Liao W-K, Choudhary A, Campbell C, Agrawal A. Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning. Nat Commun. 2019:10(1):5316.
Kamal KC, Yin Z, Li B, Ma B, Wu M. Transfer learning for fine-grained crop disease classification based on leaf images. In: Proceedings of the 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS). New York (NY): IEEE; 2019. p. 1–5.
Khan S, Tufail M, Khan MT, Khan ZA, Iqbal J, Alam M. A novel semi-supervised framework for UAV based crop/weed classification. PLoS One. 2021:16(5):e0251008.
Kornblith S, Shlens J, Le QV. Do better ImageNet models transfer better? In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach (CA): IEEE; 2019. p. 2656–2666.
Kumar A, Desai SV, Balasubramanian VN, Rajalakshmi P, Guo W, Balaji Naik B, Balram M, Desai UB. Efficient maize tassel-detection method using UAV based remote sensing. Remote Sens Appl. 2021:23:100549.
Kurtulmuş F, Kavdir İ. Detecting corn tassels using computer vision and support vector machines. Expert Syst Appl. 2014:41(16):7390–7397.
Letsoin SMA, Purwestri RC, Rahmawan F, Herak D. Recognition of sago palm trees based on transfer learning. Remote Sens (Basel). 2022:14(19):4932.
Lewandowski I, Clifton-Brown J, Trindade LM, van der Linden GC, Schwarz K-U, Müller-Sämann K, Anisimov A, Chen C-L, Dolstra O, Donnison IS, et al. Progress on optimizing Miscanthus biomass production for the European Bioeconomy: results of the EU FP7 project OPTIMISC. Front Plant Sci. 2016:7:1620.
Li JZ, Huang XQ, Heinrichs F, Ganal MW, Röder MS. Analysis of QTLs for yield components, agronomic traits, and disease resistance in an advanced backcross population of spring barley. Genome. 2006:49(5):454–466.
Lin Y, Li S, Duan S, Ye Y, Li B, Li G, Lyv D, Jin L, Bian C, Liu J. Methodological evolution of potato yield prediction: a comprehensive review. Front Plant Sci. 2023:14:1214006.
Liu Y, Cen C, Che Y, Ke R, Ma Y, Ma Y. Detection of maize tassels from UAV RGB imagery with faster R-CNN. Remote Sens (Basel). 2020:12(2):338.
Lotter W, Diab AR, Haslam B, Kim JG, Grisot G, Wu E, Wu K, Onieva JO, Boyer Y, Boxerman JL, et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat Med. 2021:27(2):244–249.
Martinez-Feria RA, Basso B, Kim S. Boosting climate change mitigation potential of perennial lignocellulosic crops grown on marginal lands. Environ Res Lett. 2022:17(4):044004.
Moen E, Bannon D, Kudo T, Graf W, Covert M, Van Valen D. Deep learning for cellular image analysis. Nat Methods. 2019:16(12):1233–1246.
Nagasubramanian K, Jubery T, Fotouhi Ardakani F, Mirnezami SV, Singh AK, Singh A, Sarkar S, Ganapathysubramanian B. How useful is active learning for image-based plant phenotyping? Plant Phenome J. 2021:4(1):e20020.
Noh S-H. Performance comparison of CNN models using gradient flow analysis. Informatics. 2021:8(3):53.
Odena A. Semi-supervised learning with generative adversarial networks. arXiv 1606.01583, 5 June 2016, preprint: not peer reviewed.
O’Mahony N, Campbell S, Carvalho A, Harapanahalli S, Hernandez GV, Krpalkova L, Riordan D, Walsh J. Deep learning vs. traditional computer vision. In: Arai K, Kapoor S, editors. Advances in computer vision. Cham: Springer International Publishing; 2020. p. 128–144.
Parnami A, Lee M. Learning from few examples: a summary of approaches to few-shot learning. arXiv 2203.04291, 7 March 2022, preprint: not peer reviewed.
Paullada A, Raji ID, Bender EM, Denton E, Hanna A. Data and its (dis)contents: a survey of dataset development and use in machine learning research. Patterns. 2021:2(11):100336.
Poncet AM, Knappenberger T, Brodbeck C, Fogle M, Shaw JN, Ortiz BV. Multispectral UAS data accuracy for different radiometric calibration methods. Remote Sens (Basel). 2019:11(16):1917.
Probst P, Wright MN, Boulesteix A. Hyperparameters and tuning strategies for random forest. WIREs Data Mining Knowl Discov. 2019:9:e1301.
Rasmussen CB, Kirk K, Moeslund TB. The challenge of data annotation in deep learning—a case study on whole plant corn silage. Sensors. 2022:22(4):1596.
Ribani R, Marengoni M. A survey of transfer learning for convolutional neural networks. In: 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T). Rio de Janeiro, Brazil: IEEE; 2019. p. 47–57.
Sager C, Janiesch C, Zschech P. A survey of image labelling for computer vision applications. J Business Anal. 2021:4(2):91–110.
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X, Chen X. Improved techniques for training GANs. Adv Neural Inf Process Syst. 2016:29:2234–2242.
Sandfort V, Yan K, Pickhardt PJ, Summers RM. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019:9(1):16884.
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, et al. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun. 2022:13(1):1728.
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vis. 2020:128(2):336–359.
Short R, Fukunaga K. The optimal distance measure for nearest neighbor classification. IEEE Trans Inf Theory. 1981:27(5):622–627.
Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019:6(1):60.
Somerville C, Youngs H, Taylor C, Davis SC, Long SP. Feedstocks for lignocellulosic biofuels. Science. 2010:329(5993):790–792.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proc IEEE Conf Comput Vis Pattern Recognit. Los Alamitos (CA): IEEE Computer Society; 2015. p. 1–9.
Teuwen J, Moriakov N. Convolutional neural networks. In: Handbook of medical image computing and computer assisted intervention. London (UK): Elsevier; 2020. p. 481–501.
Tran AT, Nguyen CV, Hassner T. Transferability and hardness of supervised classification tasks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea. New York (NY): IEEE; 2019. p. 1395–1405.
Tseng H-H, Yang M-D, Saminathan R, Hsu Y-C, Yang C-Y, Wu D-H. Rice seedling detection in UAV images using transfer learning and machine learning. Remote Sens (Basel). 2022:14(12):2837.
van Engelen JE, Hoos HH. A survey on semi-supervised learning. Mach Learn. 2020:109(2):373–440.
Varela S, Pederson TL, Leakey ADB. Implementing spatio-temporal 3D-convolution neural networks and UAV time series imagery to better predict lodging damage in Sorghum. Remote Sens (Basel). 2022a:14(3):733.
Varela S, Zheng X, Njuguna JN, Sacks EJ, Allen DP, Ruhter J, Leakey ADB. Deep convolutional neural networks exploit high-spatial- and -temporal-resolution aerial imagery to phenotype key traits in Miscanthus. Remote Sens (Basel). 2022b:14(21):5333.
Wang AX, Tran C, Desai N, Lobell D, Ermon S. Deep transfer learning for crop yield prediction with remote sensing data. In: Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies. New York (NY): Association for Computing Machinery; 2018. p. 1–5.
Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, Chandak P, Liu S, Van Katwyk P, Deac A, et al. Scientific discovery in the age of artificial intelligence. Nature. 2023:620(7972):47–60.
Wang J, Lan C, Liu C, Ouyang Y, Qin T, Lu W, Chen Y, Zeng W, Yu P. Generalizing to unseen domains: a survey on domain generalization. IEEE Trans Knowl Data Eng. 2022:35:8052–8072.
Wang P, Fan E, Wang P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit Lett. 2021:141:61–67.
Wei X, Yu X, Liu B, Zhi L. Convolutional neural networks and local binary patterns for hyperspectral image classification. Eur J Remote Sens. 2019:52(1):448–462.
Weinberger KQ, Blitzer J, Saul L. Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res. 2009:10:207–244.
Weinstein BG. A computer vision for animal ecology. J Anim Ecol. 2018:87(3):533–545.
Wu G, Miller ND, de Leon N, Kaeppler SM, Spalding EP. Predicting Zea mays flowering time, yield, and kernel dimensions by analyzing aerial images. Front Plant Sci. 2019:10:1251.
Xie J, Fernandes SB, Mayfield-Jones D, Erice G, Choi M, Lipka AE, Leakey ADB. Optical topometry and machine learning to rapidly phenotype stomatal patterning traits for maize QTL mapping. Plant Physiol. 2021:187(3):1462–1480.
Yamashita R, Nishio M, Do RKG, Togashi K. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018:9(4):611–629.
Yang Q. An introduction to transfer learning. In: Tang C, Ling CX, Zhou X, Cercone NJ, Li X, editors. Advanced data mining and applications. Berlin, Heidelberg: Springer; 2008. p. 1.
Zan X, Zhang X, Xing Z, Liu W, Zhang X, Su W, Liu Z, Zhao Y, Li S. Automatic detection of maize tassels from UAV images by combining random forest classifier and VGG16. Remote Sens (Basel). 2020:12(18):3049.
Zhao Y, Zheng B, Chapman SC, Laws K, George-Jaeggli B, Hammer GL, Jordan DR, Potgieter AB. Detecting Sorghum plant and head features from multispectral UAV imagery. Plant Phenomics. 2021:2021:9874650.
Zhao Z, Wang E, Kirkegaard JA, Rebetzke GJ. Novel wheat varieties facilitate deep sowing to beat the heat of changing climates. Nat Clim Chang. 2022:12(3):291–296.
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV. Piscataway (NJ): IEEE; 2016. p. 2921–2929. https://doi.org/10.1109/CVPR.2016.319.
Zhou L, Pan S, Wang J, Vasilakos AV. Machine learning on big data: opportunities and challenges. Neurocomputing (Amst). 2017:237:350–361.

Author notes

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (https://dbpia.nl.go.kr/plphys/pages/General-Instructions) is: Sebastian Varela (sv79@illinois.edu).

Conflict of interest statement. A patent on ESGAN has been filed by the University of Illinois Urbana-Champaign with A.D.B.L. and S.V. as inventors. The authors declare no conflict of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.