Abstract

Machine learning (ML) can accelerate biological research. However, the adoption of such tools to facilitate phenotyping based on sensor data has been limited by (i) the need for a large amount of human-annotated training data for each context in which the tool is used and (ii) phenotypes varying across contexts defined in terms of genetics and environment. This is a major bottleneck because acquiring training data is generally costly and time-consuming. This study demonstrates how an ML approach can address these challenges by minimizing the amount of human supervision needed for tool building. A case study was performed to compare ML approaches that examine images collected by an uncrewed aerial vehicle to determine the presence/absence of panicles (i.e. “heading”) across thousands of field plots containing genetically diverse breeding populations of 2 Miscanthus species. Automated analysis of aerial imagery enabled the identification of heading approximately 9 times faster than in-field visual inspection by humans. Leveraging an Efficiently Supervised Generative Adversarial Network (ESGAN) learning strategy reduced the requirement for human-annotated data by 1 to 2 orders of magnitude compared to traditional, fully supervised learning approaches. The ESGAN model learned the salient features of the data set by using thousands of unlabeled images to inform the discriminative ability of a classifier so that it required minimal human-labeled training data. This method can accelerate the phenotyping of heading date as a measure of flowering time in Miscanthus across diverse contexts (e.g. in multistate trials) and opens avenues to promote the broad adoption of ML tools.

Introduction

Artificial intelligence (AI) and machine learning (ML) present enormous opportunities for accelerating scientific discovery, especially in biological research where large-scale, complex problems are commonplace (Burke and Lobell 2017; Xie et al. 2021; Varela et al. 2022b; Wang et al. 2023). However, advanced AI/ML methods require substantial amounts of annotated data for training purposes and are highly context dependent; i.e. they do not perform reliably in contexts beyond those covered by the training data (Paullada et al. 2021; Wang et al. 2022). Biological research is an especially challenging use case for image classification problems because the appearance and function (i.e. phenotype) of an organism are the result of complex interactions between genotype, natural environment, and human intervention (Hayes et al. 2023). For example, the performance of a particular crop depends on variation in genotype, growing conditions, and management practices (Zhao et al. 2022; Cooper et al. 2023). This contrasts with everyday objects, which can take a variety of forms, but are fixed in time and space such that a given object (e.g. a teapot) will not change shape, texture, or color depending on the location and time at which it is imaged under standardized conditions. The potential for AI- and ML-enabled approaches to be applied to biological research has been demonstrated across many scales from cells to organs, organisms, communities, and ecosystems. However, this high contextual diversity means existing AI/ML tools will need to be retrained—at considerable cost and effort—for each new biological context in which they are to be used (Moen et al. 2019; Hesami et al. 2022), limiting their adoption. The increasing spatial, temporal, and spectral resolution of sensors, along with cloud-based computational processing, is enhancing our ability to supply more sensor data for phenotyping (Lin et al. 2023). However, the greater spatial and temporal resolution of sensor data will only be fully exploited if it is matched by greater resolution in the human-annotated data used for training ML tools that find associations between the 2 data sets. Existing methods to minimize human effort in the production of annotated data include data augmentation (Shorten and Khoshgoftaar 2019), pseudo-labeling, and label propagation (Iscen et al. 2019; Gan et al. 2022). Meanwhile, active learning (Nagasubramanian et al. 2021), transfer learning (TL) (Tran et al. 2019), and semi-supervised learning (van Engelen and Hoos 2020) can reduce the need for annotated data in the training process. Recent research efforts to overcome these challenges include developing AI techniques that more rapidly generalize from few examples (Parnami and Lee 2022). However, while each of these approaches can deliver valuable benefits, the necessity for further innovation to address the intertwined issues of training effort and context dependency is clear (Alzubaidi et al. 2021; Sapoval et al. 2022; Ahmed et al. 2023).

This study proposes a generative adversarial learning strategy as an alternative to traditional supervised learning and TL, aiming to minimize the human-based supervision required for a computer vision tool. This involves exploiting the ability of a generative adversarial network (GAN) to learn the salient features of the data from unlabeled images captured with an aerial platform. It is anticipated that allowing the model to learn the underlying latent space representation of the data will enhance the model's discriminative ability in a classification task with minimal human assistance. A key feature of this approach is using a coinformative learning strategy between the unsupervised and supervised classifiers within the GAN. This is intended to allow learning of the salient features of the large, unlabeled image set to be complemented by the use of a smaller pool of labeled images to efficiently achieve the classification task. We describe this approach as an Efficiently Supervised GAN (ESGAN).

A case study of the proposed approach is performed by classifying thousands of diverse, field-grown Miscanthus genotypes as having produced panicles, or not, on a given date in a time course of imagery collected by an uncrewed aerial vehicle (UAV, or uncrewed aerial system, or drone). Biomass and valuable chemical compounds from dedicated bioenergy crops are expected to play a central role in the provision of more sustainable energy and bioproducts (Somerville et al. 2010; Martinez-Feria et al. 2022; Eckardt et al. 2023). Miscanthus sacchariflorus and Miscanthus sinensis are crossed to produce very productive, sterile hybrids (Heaton et al. 2008). Flowering time is a key trait influencing productivity and adaptation of Miscanthus to different growing regions (Jensen et al. 2011; Clifton-Brown et al. 2019). Flowering time in Miscanthus, like many other grass crops, can be assessed in terms of “heading date,” i.e. when panicles are outwardly visible in 50% of the culms that reach the top of the canopy (Li et al. 2006; Crowell et al. 2016; Clark et al. 2019; Wu et al. 2019). Repetitive visual inspections of thousands of individuals grown in extensive field trials are very labor intensive (Clark et al. 2019). Repeated assessment of a crop trial to determine when in a seasonal time course a panicle is first observed then allows estimation of heading date. Increasing the frequency with which the crop is assessed increases the precision of heading date estimates but also increases labor, which has motivated the development of ML-enabled remote sensing tools to identify reproductive organs and to assess whether plants have reached developmental milestones (Wu et al. 2019; Liu et al. 2020; Fan et al. 2022). In Miscanthus, a fully supervised 3D convolutional neural network (CNN) has been used to assess heading date from UAV images (Varela et al. 2022b). However, the common challenges of context dependency, demand for substantial training data, and limited generalization ability remain if such tools are to be widely adopted (Rasmussen et al. 2022). Therefore, this is just one of many phenotypes for which reducing the demand for manual supervision of ML tools will be valuable.

This study tests the ability of ESGAN to classify aerial images of individual plants of M. sacchariflorus and M. sinensis on the basis of panicles being visible or not, i.e. the most repeated and labor-demanding step in heading date determination. The performance of ESGAN is compared to various popular algorithms based on the fully supervised learning (FSL) paradigm and traditional TL with varying degrees of complexity, including K-nearest neighbor (KNN), random forest (RF), custom CNN, and ResNet-50. This analysis was repeated as the number of annotated images provided to train a given model was reduced from 3,137 (100%) to 32 (1%), while simultaneously providing ESGAN with access to the complete set of unannotated images (i.e. n = 3,137). The objective is to understand the trade-offs between predictive ability and the level of dependence on manual annotation for each of the algorithms. In addition, we test how ESGAN exploits its unique generative and adversarial learning strategy to leverage its own predictive ability. Finally, class activation visualization (Selvaraju et al. 2020) is used to understand how ESGAN exploits the information in the images to maximize its predictive ability.

Results

Benchmarks and ESGAN algorithms evaluation

As a baseline, all 5 model types were able to correctly classify whether plants had reached heading or not when provided with the full (100%) training data set of 3,137 images (Fig. 1, A, B, and H). The convolutional-based CNN, ResNet-50, and ESGAN models all performed well (overall accuracy [OA] = 0.89 to 0.92, F1 score = 0.87 to 0.90) and outperformed the tabular methods KNN and RF (OA = 0.78 to 0.79, F1 score = 0.73 to 0.76).

Figure 1.
Evaluation of heading detection in testing data. Performance of benchmarks and ESGAN algorithms under an increasing number of annotated samples via OA A) and F1 score B) metrics. Error bars represent the Sd of performance metrics after 3 training and testing iterations. Performance evaluation using ROC analysis is also presented for the same models under the same conditions C to H). See Materials and methods for explanation of metrics.

All model types demonstrated some reduction in the ability to detect heading accurately as the amount of annotated training data was reduced, but to very different degrees. For ESGAN, the penalty for reducing the number of annotated images used for training down to 1% of available data (32 images) was negligible in terms of OA (decline from 0.89 to 0.87), F1 score (decline from 0.87 to 0.85), and receiver operating characteristic (ROC) analysis (Fig. 1). TL using ResNet-50 was the next most robust method, maintaining performance as annotated training data were reduced to 10% (314 images), before being heavily penalized as the amount of annotated training data declined further (Fig. 1). CNN performed at an intermediate level, maintaining performance as annotated training data were reduced to 30% (941 images), before being heavily penalized as the amount of annotated training data declined further (Fig. 1). KNN and RF were less sensitive than CNN and ResNet-50 to reductions in the amount of annotated training data provided, but this only partially compensated for the poorer baseline performance of KNN and RF (Fig. 1).

When the amount of annotated data was most restricted (1% of data available for training), ESGAN's performance (OA = 0.87 to 0.89, F1 score = 0.85 to 0.87) was substantially better than all other models (OA = 0.43 to 0.75, F1 score = 0.16 to 0.72) (Fig. 1, A and B). This also agreed with ROC analysis, where ESGAN was the most effective model for correctly identifying the 2 image classes when fewer than hundreds of annotated images were available for training (Fig. 1, C and D).

Understanding the predictive improvement of ESGAN

The ability of ESGAN to accurately determine heading of plants from aerial imagery can be explained by the synergistic contributions of ESGAN's generator and discriminator submodels. The ability of the ESGAN generator to improve the visual representations of “fake” images was notable during the training process (Fig. 2, A, B, D, and E). The initial attempts of the ESGAN generator to generate images produced very noisy and unrealistic representations of Miscanthus plants (Fig. 2, A and D). It was notable that the ESGAN generator submodel progressively learned to better match the RGB color intensity and spatial distribution of pixels of the real images, turning them into very realistic representations of plants (Fig. 2, B and E). This improvement was in agreement with the increasing performance of the ESGAN discriminator (Fig. 2, C and F) across the successive minibatch steps of training, where the ability of this submodel to identify plants with panicles consistently improved regardless of whether very few (e.g. 32 images, Fig. 2C) or many (Fig. 2F) annotated training images were provided.

Figure 2.
Visual representation of “fake” images generated by the ESGAN generator during modeling implementation at early (400) A, D) and advanced (9,800) B, E) training steps. Evaluation of heading detection by the ESGAN discriminator-supervised classifier at early (400) and advanced (9,800) training steps under limited (1%) C) and large (80%) F) number of annotated samples.

Explaining ESGAN's learning via Grad-CAM

Since the ultimate goal of this study was to maximize the ability of the ESGAN discriminator-supervised classifier to accurately determine the heading status of plants, gaining insights and interpretability about the learning process of this classifier was a key component of the analysis. When interpreting the learning process of this model via Gradient-weighted Class Activation Mapping (Grad-CAM), which highlights the parts of an image that contribute the most to a model's decision (Selvaraju et al. 2020), it was notable that the model successfully focused on plant pixels versus background pixels and varied its activation levels depending on the class of image being considered. For plants without visible panicles (Fig. 3A), higher activation regions (yellow) were located over the green areas of the plant, especially the upper leaves, while lower leaves and background regions (i.e. soil) were assigned a lower (blue) activation level (Fig. 3C), meaning they were less informative. For the class of plant that had reached heading (Fig. 3B), higher activation was particularly noticeable over the regions of panicles (i.e. silver-white objects) of the plants, while the model assigned a lower activation level to vegetative tissues (Fig. 3D).

Figure 3.
Visualization of example real RGB images and Grad-CAM activation maps. Example preheading A) plant class and the corresponding activation map C) extracted from the ESGAN D supervised classifier. Example postheading B) plant class and the corresponding activation map D). Activation levels in the images are represented on a 0 to 255 scale.

Evaluation of labor requirements for ESGAN versus human evaluation of heading status

The combined Miscanthus breeding trials studied here featured 3,040 plots, including 12,400 individual plants at the time of establishment (1 per plot for M. sacchariflorus and 10 per plot for M. sinensis). Heading status of each plant was assessed on 3 occasions. Visual inspection by humans walking through the trials, including recording of data on an electronic device, required approximately 10.5 person-seconds per plant or 36 person-hours in total on each occasion that phenotyping was performed (Table 1). By comparison, the time demand could be reduced >8-fold to 4.33 person-hours in total, or ∼1.2 s per plant, when acquiring images by UAV and analyzing them with ESGAN (Table 1). This reduction in time commitment reduces labor requirements below the threshold where, weather permitting, a single person could maximize the accuracy of heading date estimates by performing phenotyping on a daily basis.

Table 1.

Description of activities and time required to phenotype the heading status of Miscanthus breeding trials by traditional visual inspection on the ground versus UAV imaging plus analysis by ESGAN

Visual inspection by humans on the ground
    In-field evaluation and data recording on an electronic device: 36 h
    Total: 36 h

UAV imaging and ESGAN analysis
    Flight planning and execution: 1 h 20 min
    Image processing: 2 h 40 min
    Image chip generation and ESGAN predictive inference: 20 min
    Total: 4 h 20 min

Data correspond to the effort required to phenotype the 3 trials (3,040 plots) in this study on 1 occasion, i.e. at a single point in a seasonal time course.

Before ESGAN can be deployed to analyze UAV imagery, it must be trained on human-annotated images. The number of training images annotated by in-field, human phenotyping needed to maximize how accurately plants were classified as having reached heading or not was substantially fewer for ESGAN (∼32 images) than for TL by ResNet-50 (∼314) or a traditional, fully supervised CNN (∼941 images). Based on the average time to phenotype each plant, this means that the time required to collect annotation data in each new context in which a model would be used decreases by an order of magnitude for ESGAN relative to TL and CNN (Fig. 4A).

Figure 4.
Evaluation of A) time for acquiring annotation data for training for models that accurately classify images (OA > 0.85) and B) training time for each model relative to the number of annotated samples in the training data.

In addition, the training time for ESGAN varied from ∼750 to 900 s depending on the number of annotated samples analyzed. This was 3- to 4-fold slower than for other learning methods (Fig. 4B). However, this increase in computational time is small compared to the gains in efficiency with respect to fieldwork (Fig. 4A).

Discussion

This study successfully demonstrated that an ESGAN approach can substantially reduce the amount of human-annotated training data needed to accurately perform an image classification task. Only tens of human-annotated images were needed to achieve high levels of accuracy in detecting plants that had reached heading, or not, even when the problem was presented in the challenging context of a large population of Miscanthus genotypes, which feature a wide diversity of visual appearance both before and after heading. By contrast, hundreds of human-annotated images were needed to train a TL tool (ResNet-50), and thousands of annotated images were needed to train a fully supervised CNN. Meanwhile, KNN and RF were not able to classify images with high levels of accuracy, even when provided with thousands of training images. These findings highlight how a generative and adversarial learning strategy can provide an efficient solution to the common problem of needing large amounts of annotated training data for high-performing, fully supervised deep learning (DL) approaches. This is a particularly significant discovery for the many potential applications of computer vision, such as high-throughput phenotyping in crop breeding, where frequent retraining of a DL model is needed to cope with strong context dependency of outcomes. The time required to acquire imagery by UAV and perform analysis with the ESGAN tool was ∼8-fold less than the time required for people to visually assess and record the heading status of Miscanthus while walking through the field trials. The time required to train any of the ML models is trivial relative to the time required for data acquisition. Combined with reducing the requirement for training data by 1 to 2 orders of magnitude by using ESGAN versus FSL or TL, this represents a major reduction in the effort needed to develop and use custom-trained ML models for phenotyping heading date in trials involving other locations, breeding populations, or species. For the Miscanthus breeding program at UIUC, the reduction in labor on each occasion the heading status of the breeding trials is assessed, from 36 to 4.33 person-hours, creates the opportunity to increase the frequency of assessment from once per week to once every 2 or 3 d, and thereby increase the accuracy of heading date estimates (Varela et al. 2022a).

The power of ESGAN is valuable to research in the biological science domain, particularly at the intersection of remote sensing, precision agriculture, and plant breeding. The integration of automated data collection based on noncontact sensors and ESGAN can provide a cost-effective solution for exploiting large volumes of unannotated inputs, which can be collected at relatively low cost using remote sensing platforms. It can reduce dependence on large annotation data sets while achieving performance equivalent to traditional FSL approaches. Making these advances in a highly productive perennial grass, such as Miscanthus, is particularly important and challenging because these crops are more difficult to phenotype: highly segregating outbred populations in which each individual is genetically unique, and voluminous perennial plants that grow larger each year, make field screening by humans on the ground more difficult and time-consuming than in annual, short-stature crops (Varela et al. 2022b). Implementing this ESGAN-enabled strategy may allow breeders to grow and evaluate larger populations in more locations as a means to accelerate crop improvement (Lewandowski et al. 2016) but at lower cost given the reduced dependence on manual annotation. It will be interesting to test how the ESGAN approach can be transferred to assess heading in other important crops including maize (Zea mays), sorghum (Sorghum bicolor), rice (Oryza sativa), wheat (Triticum aestivum), and switchgrass (Panicum virgatum), which also have panicles visible at the top of the canopy. The focus would shift to supplying a reduced number of high-quality, strategically chosen annotations, while relying on the generative and adversarial element of the ESGAN to reduce the gap in predictive ability instead of depending on the large data collection campaigns required for robust FSL implementations.

Previous studies have reported the use of high spatial resolution remote sensing imagery to detect reproductive organs in plants using pixel-wise classification (Kurtulmuş and Kavdir 2014; Kumar et al. 2021) and morphological operations (Zhao et al. 2021). While these studies demonstrated relevant advances for rapid detection of maize tassels and sorghum panicles, they heavily relied on manual supervision, for example, to determine optimal features for the supervised classifier or morphological filtering. This makes these approaches more highly context dependent, meaning they would require continuous retraining when exposed to new cases (i.e. a different geographic location or year of the breeding trial in which environmental conditions alter crop appearance), which presents important challenges for successful scalability (Zhou et al. 2017). More recently, convolutional-based deep learning has been successfully implemented to detect reproductive organs of plants via classification (Zan et al. 2020) and object detection (Liu et al. 2020). The convolutional operation of the algorithm enables the model to fully exploit the information in the image, as both the intensity of the signal and the spatial arrangement of pixels can be informative features to characterize the target trait (Yamashita et al. 2018). This also enables fully automated feature extraction that provides a clear advantage over traditional machine learning, which heavily relies on expert knowledge and manual feature engineering to identify meaningful features (O’Mahony et al. 2020). Nevertheless, the cost of creating a large volume of annotated data sets can be operationally and financially prohibitive in numerous applications (Weinstein 2018; Lotter et al. 2021; Sager et al. 2021). Previous efforts to reduce dependency on training data have successfully integrated remote and proximal sensing with TL for plant species recognition (Letsoin et al. 2022), seedling detection (Tseng et al. 2022), and disease detection (Kamal et al. 2019). For example, the use of TL significantly reduced the computational time and resources needed when compared to implementing custom CNNs from scratch (Jha et al. 2019). While TL has proven an effective approach, the similarity between the original and target tasks can bring challenges for successful transferability; thus, additional data are often needed (Letsoin et al. 2022) to improve the generalization ability of TL. Moreover, the relationship between transferability, data annotation size, and predictive ability of TL has not been extensively tested in plant science applications. Therefore, the implementation of robust analytical approaches that directly address this challenge and accurately determine differences in the phenological characteristics of individuals in a population with diverse origins can be of significant interest for a challenging task that is typically done by manual inspection (Dong et al. 2021) and requires copious amounts of manual annotation for robust FSL implementations (Varela et al. 2022a).

ESGAN clearly outperformed FSL models when only tens of training images were provided. Overall, this highlights the particular ability of ESGAN to exploit unannotated imagery to produce meaningful improvements for more accurate determination of the heading status under minimal annotation. This can be attributed to ESGAN's ability to effectively enrich the latent space representation, which is crucial for classifiers in the discriminator to accurately distinguish between target classes and outperform other convolutional-based benchmark models. A previous study also used a GAN for discrimination of crops versus weeds (Khan et al. 2021) in high spatial resolution aerial imagery, but the reductions in demand for human-annotated training data were significantly less than achieved with ESGAN. This difference may reflect the particular design of the ESGAN discriminator. ESGAN benefits from using 2 CNNs (supervised and unsupervised classifiers) that share weights, allowing synergistic feature matching even when annotations are severely restricted. Specifically, the architecture design and training sequence of ESGAN ensure that weight updates in 1 classifier affect the other (Fig. 7B), facilitating feature matching. This design and sequence of steps during training enable the model to synergistically exploit both types of data sources (i.e. annotated and unannotated), providing a clear advantage over the FSL and traditional TL strategies. Even though the main goal of this study was to maximize discriminator-supervised classifier performance, the generative component of the algorithm showed a significant improvement in the quality of the visual representation of Miscanthus plants during the learning process (Fig. 2, A, B, D, and E). This allowed synergistic gains in the performance of the ESGAN discriminator and ESGAN generator as gradient updates and loss function information passed between submodels.

The dependence of the CNN model on large volumes of annotated images was strong. This was not surprising and agreed with previous studies noting the importance of large and high-quality data sets for optimal performance of FSL algorithms (Wang et al. 2021). This constraint was also evident, although to a lesser degree, when using the TL strategy. This demonstrated that the TL strategy was capable of exploiting prior knowledge, but the dependence on annotated images was consistently larger than for ESGAN.

The generally poorer performance of tabular-based algorithms (KNN and RF) versus CNN-based methods (ESGAN, TL, and CNN) suggested that the use of spectral features summarized over an area of interest via statistical descriptors was a suboptimal solution for the effective determination of heading status at the image level. This can in part be explained by the fact that inflorescences of the plants appeared as small silver-colored objects in the images, and it is logical to argue that spectral descriptors extracted and summarized over an area of interest (i.e. the whole image chip) into a single tabular value may only partially capture these patterns in the image. The capacity of convolutional-based approaches to automatically map spatially dependent patterns in the images and use them as informative features (Wei et al. 2019) can explain their superior ability.

Grad-CAM showed that the algorithm prioritized information gain from areas of the image occupied by inflorescences and vegetative tissue as a means to differentiate each class without the need for manual supervision to identify regions of interest. This further reduces the degree of expert supervision needed during implementation of the analysis. This is particularly important in biological systems, such as crop breeding, where high levels of phenotypic diversity from genetic and environmental sources occur, which would otherwise limit the broad application of existing AI tools.

By reducing the dependence on manual annotation, the traditional requirement for exhaustive field-wide ground surveying to determine heading dynamics can also be alleviated. This has important implications for optimizing the operational costs associated with phenotyping trials. Rather than conducting comprehensive surveys of the entire field at each round of evaluations, ground surveying could focus on representative sections to optimize the operational cost. Complementing these targeted ground surveys with aerial surveys would further enhance temporal coverage by better distributing the associated operational cost, without compromising accuracy in heading status predictions, and would reduce the cost of capturing finer temporal dynamics. ESGAN's strong predictive performance even with reduced data availability suggests that this hybrid approach could maintain high levels of accuracy across time points, offering a practical, cost-efficient, and scalable alternative for large-scale phenotyping in agricultural research.

Conclusion

The generative-discriminative nature of the proposed ESGAN approach, which is characterized by adversarial and synergistic training of neural networks, presents a promising avenue for reducing the dependence on human supervision and enhancing the model's capacity for generalization through more efficient incorporation of contextual information. In the case study presented, this meant that heading detection in plants could be effectively determined using high-spatial-resolution aerial imagery with only a very limited number (tens) of human-annotated training images. This represents a significant potential reduction in manual annotation for ESGAN compared to traditional FSL, with a negligible penalty in performance. These outcomes are valuable for designing future strategies to optimize the integration of manual field screening efforts and aerial data collection. More broadly, this work could address the need for advanced modeling techniques that deliver robust accuracy while reducing the operational cost of collecting time-consuming annotated data for many computer vision problems in plant science applications.

Materials and methods

Field trials

Data were collected from 3 Miscanthus diversity trials located at the University of Illinois Energy Farm, Urbana (40.06722°N, 88.19583°W). The trials were planted in the spring of 2019. This study focused on the second year (2020) of their establishment, which is the first growing season in which the Miscanthus trials are typically phenotyped. The broader aims of the breeding program include assessment of overwintering survival and evaluation of germplasm adapted to a wide range of latitudes and environments. Since plants that were lost to lethal winter temperatures were randomly distributed within the trials, all locations were phenotyped by humans and UAV imaging regardless of survival status. Not all germplasm experienced the photoperiod necessary to undergo the vegetative-to-reproductive transition and reach heading at this location.

The M. sacchariflorus trial included 2,000 entries as single-plant plots in 4 blocks, each block including 58 genetic backgrounds (half-sib families) (Varela et al. 2022b). The trial was 79 m long × 97 m wide, and each plot (plant) was 1.83 × 1.83 m in size.

One M. sinensis trial included germplasm from South Japan while the other included germplasm from Central Japan (Clark et al. 2015). Each of these 2 trials included 2 blocks, with 130 plots per block. Each plot contained seedlings from a single half-sib family, with 10 plants at a spacing of 0.91 m, requiring transplant of 10,400 individuals in total at the start of the trial. In the M. sinensis South Japan trial, there were 124 families, and in the Central Japan trial, there were 117 families. Therefore, a few families were planted in more than 1 plot per block to avoid leaving empty space. The size of the field that included both of the M. sinensis trials was 115 m long × 121 m wide.

Trait of interest and ground truthing

Every plant in both single-plant and multi-plant plots was phenotyped individually through observation on the ground by an expert evaluator to determine if it had produced panicles or not. A plant was considered to have reached heading once 50% of the culms contributing to canopy height had panicles emerged ≥1 cm beyond the flag leaf sheath. Data were recorded to separately track plants that died or never reached heading. Examples of plants with emerging panicles imaged at the ground level and by UAV are shown in Fig. 5. The M. sacchariflorus trial was inspected on days of the year (DOYs) 248, 262, and 276 and the M. sinensis trials on DOYs 245, 265, and 280. This matched the dates of UAV data collection as closely as possible (i.e. depending on optimal weather conditions).

Figure 5.
Example cases of a plant with emerging inflorescences imaged from the ground A), plants not yet heading B, C), and plants after heading D, E) in UAV imagery collected in the 2020 season.

Aerial data acquisition and imagery preprocessing

A Matrice 600 Pro hexacopter (DJI, Shenzhen, China) UAV equipped with a Gremsy T1 gimbal (Gremsy, Ho Chi Minh, Vietnam) mounted with a multispectral RedEdge-M sensor (MicaSense, Seattle, WA, USA) was utilized for aerial data collection. The sensor included 5 spectral bands in the blue (465 to 485 nm), green (550 to 570 nm), red (663 to 673 nm), rededge (712 to 722 nm), and near-infrared (820 to 860 nm) regions of the electromagnetic spectrum. Flights were conducted 3 times (DOYs 247, 262, and 279) in the season corresponding to the period when most inflorescences emerge. The aerial data were collected under clear sky conditions around ±1 h from solar noon to ensure consistent reflectance values across days of data collection. The flight altitude was 20 m above ground level, resulting in a ground sampling distance of 0.8 cm/pixel. Flight settings included 90% forward and 80% side overlapping during data acquisition to ensure high-quality image stitching during postprocessing steps. Ten black and white square panels (70 cm × 70 cm) were distributed in the trials as ground control points (GCPs). A real-time kinematic survey was done using a Trimble R8 global navigation satellite system integrated with the CORS-ILUC local station to survey the GCPs to ensure consistent spatial extraction of the image chips between days of data collection. A MicaSense calibration panel was imaged on the ground before and after each of the flights for spectral calibration of the images via an empirical procedure (Poncet et al. 2019). Images were imported into Metashape version 1.7.4 (Agisoft, St. Petersburg, Russia) to generate calibrated surface reflectance multispectral orthophotos. Image processing and analysis were performed with an i9-12900H processor (14 cores), 32 GB RAM, and an NVIDIA GeForce RTX 3080 GPU with 16 GB of memory. The orthophotos from each of the 3 sampling dates were resampled to a common 0.8-cm/pixel resolution and stacked into a 3-band RGB (i.e. red, green, blue bands) raster stack object. Further steps in the analysis consider only the RGB bands of the multispectral sensor for the following reasons: (i) RGB has proven to be highly sensitive and competitive with the rededge and near-infrared spectral regions of the electromagnetic spectrum for monitoring heading in Miscanthus (Varela et al. 2022b). (ii) The use of RGB bands allowed testing of TL as a potential alternative approach in the analysis. Image chips for each plot/plant were generated by clipping the stacked orthophoto objects using a polygonal shapefile that includes each plot polygon of the trials. The resulting image chips containing the RGB bands for the 3 dates were further split into single-date matrix arrays in Python for further analysis. The size of the image chips was 108 pixels × 108 pixels × 3 RGB bands per date.
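
To make the chip-generation step concrete, the following is a minimal Python sketch of clipping plot-level image chips from a calibrated orthomosaic using a plot-boundary shapefile. The file names, and the use of rasterio and geopandas rather than the exact tooling of the original pipeline, are assumptions for illustration only.

```python
# Minimal sketch of plot-level image chip extraction (file names are hypothetical).
import geopandas as gpd
import numpy as np
import rasterio
from rasterio.mask import mask

plots = gpd.read_file("plot_polygons.shp")            # one polygon per plot
chips = []
with rasterio.open("orthomosaic_rgb.tif") as src:     # calibrated RGB orthomosaic
    plots = plots.to_crs(src.crs)                     # match coordinate reference systems
    for geom in plots.geometry:
        # Clip the raster to the plot polygon; crop=True trims to the polygon's bounding box
        data, _ = mask(src, [geom], crop=True)
        chip = np.moveaxis(data[:3], 0, -1)           # (bands, H, W) -> (H, W, 3)
        chips.append(chip)
```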

Data set

After accounting for plants that died due to lethal winter temperatures or never reached heading, a subset of 1,309 genetically diverse plants was identified for which ground truth data and UAV imagery were available on each of the 3 sampling dates during the growing season. This resulted in a data set of 3,921 instances of single-plant images and associated heading status.

Algorithms

KNN and RF

KNN is an extensively used algorithm for pattern classification (Weinberger et al. 2009). The proximity distance between individuals is used to determine class discrimination in a population. The core concept is that the closer the individuals are in the feature space, the higher the probability of belonging to the same class (Short and Fukunaga 1981). The advantages of this method are its reduced number of parameters and fast computation, while its downsides are sensitivity to irrelevant features and the difficulty of determining the optimal value of the number-of-neighbors parameter. In our study, after preliminary experimentation, the number of neighbors was set to 10.

RF is a versatile nonparametric algorithm that has been broadly used in classification tasks (Belgiu and Drăguţ 2016). It exploits bagging and feature randomness to build an ensemble of trees in which prediction by committee tends to be more accurate than any of the individual trees (Breiman 2001). RF is straightforward to use and requires simple hyperparameter tuning to deliver high predictive performance. Another advantage of this algorithm is that it does not assume a normal distribution of the data or any particular form of association between the predictors and the response variable (Probst et al. 2019). Furthermore, as an ensemble of trees, RF is highly capable of managing overfitting. For implementation, the number of estimators and maximum tree depth parameters were optimized via the GridSearchCV function in Python.

The KNN and RF algorithms require tabular features as inputs. Tabular features were generated from the image chips using NumPy statistical functions in Python. The statistical descriptors median, range, Sd, and the 75th, 95th, and 99th percentiles were used to extract tabular feature values from the RGB bands of each image chip (structural and multispectral bands did not contribute additional explanatory power in a prior assessment of heading in Miscanthus from UAV images; Varela et al. 2022b). This process generated a total of 18 features that were then used as inputs to the algorithms to determine the heading status of each plant (i.e. at the image chip level).
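
As an illustration of this feature-extraction and modeling step, the sketch below computes the 6 descriptors per RGB band (18 features per chip) and fits the KNN and RF baselines with scikit-learn. The variable names (`chips`, `labels`) and the GridSearchCV grid values are hypothetical; only the number of neighbors (10) and the tuned parameters (number of estimators, maximum depth) follow the text.

```python
# Sketch of tabular feature extraction and the KNN/RF baselines.
# Assumes `chips` is a list of (H, W, 3) RGB arrays and `labels` a matching
# array of 0/1 heading annotations (names are hypothetical).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

def chip_features(chip):
    """Six statistical descriptors per RGB band -> 18 features per chip."""
    feats = []
    for b in range(3):
        band = chip[..., b].astype(float)
        feats += [np.median(band), np.ptp(band), np.std(band),
                  np.percentile(band, 75), np.percentile(band, 95),
                  np.percentile(band, 99)]
    return feats

X = np.array([chip_features(c) for c in chips])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

knn = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)

rf_grid = GridSearchCV(RandomForestClassifier(),
                       {"n_estimators": [100, 300, 500],
                        "max_depth": [5, 10, None]})   # grid values are illustrative
rf = rf_grid.fit(X_train, y_train).best_estimator_
```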

Custom CNN

CNN is a deep learning technique successfully utilized for image analysis (Teuwen and Moriakov 2020). The architecture of the algorithm consists of a series of hidden layers that map the input images to output values (Yamashita et al. 2018). The core component of the algorithm is the convolution operation, where a set of trainable kernels is applied to the input image to automatically generate a set of spatial features that best describe the target predictor. The model learns basic features in the first layers and more complex feature representations at deeper layers iteratively (i.e. via gradient loss and backpropagation) (Szegedy et al. 2015). The typical architecture of the algorithm includes a backbone feature generator and a classifier or regressor head. In this study, the backbone feature extractor of the custom CNN included 6 convolutional layers, all with maximum pooling and batch normalization; convolutional layers 2, 4, and 6 additionally included 40% feature dropout. The backbone was completed by a flattening layer, a fully connected layer, batch normalization, and 50% feature dropout. Finally, the classifier head of the CNN included a sigmoid activation layer that delivers predictions as normalized probability values (i.e. with panicles or without panicles) for each image chip. After preliminary experimentation, the number of features in each layer was set to 32, 32, 64, 64, 128, and 128. Zero padding, a stride of 1 with no overlapping, and the rectified linear unit (ReLU) activation function were also considered in the architecture design. The CNN's kernel filter size was set to 3 pixels × 3 pixels × 3 RGB bands, and max pooling was set to 2 following each convolution. Binary cross entropy was utilized as the loss function of the classifier head of the neural network.
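
A minimal Keras sketch of a CNN following this description is given below. The exact ordering of pooling, batch normalization, and dropout within each block and the size of the fully connected layer are assumptions; the published repository holds the authoritative architecture.

```python
# Hedged sketch of the custom CNN described above (layer ordering and dense size assumed).
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(72, 72, 3)):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for i, filters in enumerate([32, 32, 64, 64, 128, 128], start=1):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
        x = layers.BatchNormalization()(x)
        if i in (2, 4, 6):                          # extra 40% dropout on layers 2, 4, 6
            x = layers.Dropout(0.4)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)     # fully connected layer (size assumed)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # probability of visible panicles
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```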

ResNet-50

TL is a deep learning technique that exploits stored knowledge gained while solving 1 problem that can then be applied to solve a different but related task (Yang 2008). This prior knowledge is stored in large neural networks and then transferred to solve a target task (Ribani and Marengoni 2019). This implies several advantages over training a custom CNN from scratch, e.g. reduction in computational resources and latency for delivering predictions (Jha et al. 2019), and a boost in predictive ability over the target task (Wang et al. 2018). Deep neural networks trained on the ImageNet data set have reported state-of-the-art performance in TL applications (Kornblith et al. 2019). ResNet-50 is a deep 50-layer neural network specifically designed to exploit residual connections between convolutional layers and trained on the large ImageNet data set (Deng et al. 2009). This ensures that weights learned from previous layers do not vanish during backpropagation (Noh 2021), which is an advantageous design feature that enables the use of a larger number of (i.e. deeper) layers in the architecture of the network. For implementation, we followed the steps suggested by Kornblith et al. (2019), where the strategy is as follows: (i) remove the original head of the pretrained neural network, (ii) add a custom binary classifier head, and (iii) fine-tune the top 5 layers while keeping bottom layers frozen. ResNet-50's pretrained weights and biases were imported from Keras (Chollet 2015a). The original image chips were resampled to a 128-pixel size to fit the input image size of the ResNet-50 network.
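
The following is a hedged Keras sketch of this transfer learning setup: an ImageNet-pretrained ResNet-50 backbone with a new binary head and only the top layers left trainable. The pooling choice, head size, and optimizer settings are illustrative assumptions.

```python
# Sketch of ResNet-50 transfer learning: freeze all but the top 5 layers, add a binary head.
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.ResNet50(include_top=False, weights="imagenet",
                                   input_shape=(128, 128, 3), pooling="avg")
for layer in base.layers[:-5]:                  # keep only the top 5 layers trainable
    layer.trainable = False

inputs = keras.Input(shape=(128, 128, 3))
x = keras.applications.resnet50.preprocess_input(inputs)
x = base(x)
outputs = layers.Dense(1, activation="sigmoid")(x)   # custom binary classifier head
model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```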

ESGAN

The core concept of GAN involves training deep generative networks based on game theory (Goodfellow et al. 2020). The model contains 2 CNN submodels: (i) a generator (G) and (ii) a discriminator (D) that are trained in an adversarial manner. Both G and D are trained to optimize the results, where the goal of G is to mislead D, and the goal of D is to distinguish between fake images generated by G and real images collected with the UAV. GANs have been successfully implemented for image generation (Ahmad et al. 2022), augmentation (Sandfort et al. 2019), and classification (He et al. 2017) tasks. Adapting GANs to the semi-supervised context by forcing the discriminator network to output class labels was discussed in Odena (2016). We explore this direction in the context of plant science applications. During the training process, the data generated by the generator (G) are used to train the discriminator (D). This process enables D not only to distinguish between real and fake data but also to identify whether a plant has reached the heading stage (Fig. 6). Therefore, it is expected that D can learn features that allow for the discrimination of images with plants prior to, or after, heading using much less human-annotated training data than in FSL.

Figure 6.
Diagram of ESGAN and data workflow including the generator (G) and discriminator (D) submodels utilized to assess flowering status.

ESGAN was implemented following the steps suggested in Salimans et al. (2016), while creating separate classifiers for the supervised and unsupervised D (Fig. 7). First, the D supervised classifier is implemented to infer the 2 classes (i.e. plants prior to heading or after heading) from real images using a Softmax activation function. Then, the supervised D produces predictive outputs for each image (i.e. between 0 and 1), which represent the normalized probability of the image belonging to the 2 image classes. The D unsupervised classifier is implemented by taking D supervised prior to Softmax activation (i.e. the D supervised backbone feature extractor) and reusing its feature extraction layer weights. It then calculates the normalized sum of exponential outputs (i.e. between 0 and 1) via a custom function, which represents the probability of the image being real or fake. This means that updates to one of the classifier models will impact both models.
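
A minimal Keras sketch of this paired-classifier design is given below: both classifiers share one convolutional backbone and one pre-softmax logits layer, the supervised head applies Softmax over the 2 classes, and the unsupervised head converts the same logits into a real/fake probability via the normalized sum of exponentials, D(x) = Z(x)/(Z(x) + 1), with Z(x) the sum of exponentiated logits (Salimans et al. 2016). The backbone filter counts and optimizer settings are assumptions.

```python
# Hedged sketch of the shared-weight supervised/unsupervised discriminator pair.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_discriminators(input_shape=(72, 72, 3), n_classes=2):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in [32, 64, 128, 128]:                 # 4 convolutional blocks (counts assumed)
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.4)(x)
    logits = layers.Dense(n_classes)(x)                # shared pre-softmax logits

    # Supervised head: class probabilities (pre- vs post-heading).
    d_sup = keras.Model(inputs, layers.Softmax()(logits))
    d_sup.compile(optimizer=keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    # Unsupervised head: D(x) = Z/(Z + 1), Z = sum(exp(logits)) (Salimans et al. 2016).
    def real_fake(z):
        s = tf.reduce_sum(tf.exp(z), axis=-1, keepdims=True)
        return s / (s + 1.0)
    d_unsup = keras.Model(inputs, layers.Lambda(real_fake)(logits))
    d_unsup.compile(optimizer=keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
    return d_sup, d_unsup
```

Because the two models reference the same backbone and logits layer, a weight update made through either classifier is immediately reflected in the other, which is the mechanism described above.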

The supervised loss function (LD supervised) is defined as the negative log probability of the correct label y given an annotated real input x. LD supervised focuses on correctly classifying input images to their given labels.

L_{D\ \mathrm{supervised}} = -\,\mathbb{E}_{x,y \sim p_{\mathrm{data}}}\left[\log p_{\mathrm{model}}(y \mid x,\ y < K+1)\right] \quad (1)

The unlabeled-image loss terms constitute the unsupervised loss function (LD unsupervised). p_model(y = K + 1|x) represents the probability that x is fake (as in a traditional GAN), corresponding to 1 − D(x) in the GAN architecture. X_u denotes the unannotated data samples. The first term of LD unsupervised classifies the unannotated real images into one of the K classes; the second term classifies the images generated by G as class K + 1 (fake).

L_{D\ \mathrm{unsupervised}} = -\,\mathbb{E}_{x \sim X_u}\left[\log\left(1 - p_{\mathrm{model}}(y = K+1 \mid x)\right)\right] - \mathbb{E}_{x \sim G}\left[\log p_{\mathrm{model}}(y = K+1 \mid x)\right] \quad (2)

The classifiers are trained with gradient descent by minimizing LD supervised and LD unsupervised. D weights are stochastically updated at each training step by their gradient (Equation 3) with respect to the losses in Equations (1) and (2).

\theta_D \leftarrow \theta_D - \eta\, \nabla_{\theta_D}\, \frac{1}{m} \sum_{i=1}^{m}\left(L_{D\ \mathrm{supervised}} + L_{D\ \mathrm{unsupervised}}\right) \quad (3)

For all m samples in a minibatch, σ(x)j = p_model(y = j|x) (SoftMax function) was applied at the output of D supervised. After some preliminary experimentation, a 72-pixel image size was used as input for the CNN and ESGAN, given the negligible penalty in predictive performance and the significant saving in computational time.

Figure 7.
Diagram of ESGAN architecture. Components include: G A) and D B) submodels with the corresponding inputs (left vector and cuboid), hidden layers (center cuboids and vectors), and outputs (i.e. right cuboid as fake image in G and class predictions in D).

Balanced sampling between annotated and unannotated images at each minibatch iteration was used to ensure consistent performance of ESGAN during training. G initializes with a latent vector (Fig. 7A, orange vector) as input, which is then reshaped (Fig. 7A, green cuboid) and upscaled through 2 deconvolution (i.e. transpose convolution) operations (Fig. 7A, blue cuboids) into a fake image (Fig. 7A, yellow cuboid) that matches the size of the real images as the output of G. D takes both real and fake 72 × 72 × 3 images as input (Fig. 7B, orange cuboid), followed by 4 convolutional operations with max pooling layers of size 2, a flattening layer, and 40% feature dropout. The size of the convolutional kernel was 3 × 3, and the Leaky ReLU activation function was applied to all layers of G and D, except for the output of G, which used the Tanh function. The standard Adam optimizer with a learning rate of 0.0001 was employed in the G and D submodels. Each classifier could assign the input data a label y from the K = 2 classes (plants with or without panicles) or classify it as a fake sample (the K + 1 class).
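
The generator side of this architecture can be sketched as follows; the latent dimension and intermediate channel counts are assumptions, while the reshape, the 2 transpose convolutions up to 72 × 72 × 3, the Leaky ReLU activations, and the Tanh output follow the description above.

```python
# Hedged sketch of the generator: latent vector -> reshape -> 2 transpose convolutions
# -> 72 x 72 x 3 "fake" image with Tanh output (latent size and channel counts assumed).
from tensorflow import keras
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    inputs = keras.Input(shape=(latent_dim,))
    x = layers.Dense(18 * 18 * 128)(inputs)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Reshape((18, 18, 128))(x)
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same")(x)   # 18 -> 36
    x = layers.LeakyReLU(0.2)(x)
    outputs = layers.Conv2DTranspose(3, 3, strides=2, padding="same",
                                     activation="tanh")(x)             # 36 -> 72
    return keras.Model(inputs, outputs)
```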

Algorithm implementation and metrics

KNN and RF were implemented using the Scikit-learn library, while CNN, ResNet-50, and ESGAN were implemented in Keras, both in Python version 3.9.16. Each model fitting was iterated 3 times using a random training and testing partition to ease the convergence of the models’ prediction metrics. The number of image chips with the corresponding ground truth data was 3,921; 2,021 images came from the M. sacchariflorus trial and 1,900 came from the M. sinensis trials. The full data set was split (80:20) into training and testing data sets. The training data set was split further (80:20) into training and validation data sets. The validation data set was used to optimize the models’ performance and prevent overfitting during training. CNN and ResNet-50 were trained for up to 300 epochs, while ESGAN was trained for up to 1,000 epochs. Early stopping strategies were incorporated in these 3 models to prevent overfitting, optimize performance, and reduce computational time. The test data set was utilized to expose the models to unseen data to evaluate their generalization ability. As the number of annotated images used for training was altered to generate the 8 sample size cases (Fig. 1, A and B), the number of test images was held constant at the equivalent of 20% of the full data set. This ensured that all models were evaluated on the same test set of the same size. Training was implemented in batches.

Description of the batch training loop of ESGAN is as follows (a minimal code sketch is provided after the list):

  • 1) train D with frozen G weights (i.e. nontrainable):

    • a) train supervised classifier with annotated images, and determine LD supervised, gradient computation, and D weights update.

    • b) train unsupervised classifier with real images, and determine LD unsupervised, gradient computation, and D weights update.

    • c) train unsupervised classifier with fake images, and determine LD unsupervised, gradient computation, and D weights update.

  • 2) train G with frozen D weights (i.e. nontrainable):

    • a) feed D with G fake images.

    • b) extract features and calculate G loss.

    • c) update G weights by their gradient.
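
The following Python sketch illustrates one pass of this loop; the composite `gan` model (the generator chained to a frozen unsupervised discriminator) and all variable names are hypothetical placeholders, and the released repository contains the authoritative implementation.

```python
# Minimal sketch of the ESGAN training loop above (names are hypothetical).
import numpy as np
from tensorflow import keras

def build_gan(g, d_unsup):
    """Composite model used in step 2: G is trained through a frozen unsupervised D."""
    d_unsup.trainable = False      # honored when `gan` is compiled; d_unsup itself,
    gan = keras.Model(g.input, d_unsup(g.output))  # compiled earlier, still updates in step 1
    gan.compile(optimizer=keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
    return gan

def train_step(g, d_sup, d_unsup, gan, x_lab, y_lab, x_unlab, batch=32, latent_dim=100):
    half = batch // 2
    # 1a) supervised classifier on annotated real images
    i = np.random.randint(0, len(x_lab), half)
    d_sup.train_on_batch(x_lab[i], y_lab[i])
    # 1b) unsupervised classifier on unannotated real images (target = real)
    j = np.random.randint(0, len(x_unlab), half)
    d_unsup.train_on_batch(x_unlab[j], np.ones((half, 1)))
    # 1c) unsupervised classifier on generated images (target = fake)
    z = np.random.normal(size=(half, latent_dim))
    d_unsup.train_on_batch(g.predict(z, verbose=0), np.zeros((half, 1)))
    # 2) generator update: D frozen, fakes pushed toward the "real" label
    z = np.random.normal(size=(batch, latent_dim))
    gan.train_on_batch(z, np.ones((batch, 1)))
```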

The OA, F1 score, and ROC curve analysis were utilized as performance metrics of the models on classifying heading status (i.e. with or without visible panicles). The OA and F1 score metrics are described in Equations (4) and (5):

\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN} \quad (4)

F1 = \frac{2\,TP}{2\,TP + FP + FN} \quad (5)

In Equations (4) and (5), true positive (TP) is defined as plants with panicles correctly classified as plants with panicles. True negative (TN) is defined as plants without panicles correctly classified as plants without panicles. False positive (FP) is defined as plants without panicles (i.e. ground truth) incorrectly classified as plants with panicles (i.e. positive class). False negative (FN) is defined as plants with panicles (i.e. ground truth) incorrectly classified as plants without panicles (i.e. negative class).

ROC analysis is especially useful for assessing models where the output is a probability score that can be thresholded to produce binary decisions. The technique involves plotting the ROC curve, which is a graphical representation of a classifier's diagnostic ability between TP rate and FP rate at various threshold settings. The area under the ROC curve quantifies the overall ability of the classifier to discriminate between positive and negative classes (Fawcett 2006).
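
For illustration, these metrics can be computed with scikit-learn as in the sketch below; `model`, `X_test`, and `y_test` are hypothetical placeholders for a trained classifier with a single-probability output and the held-out test data.

```python
# Sketch of the evaluation metrics using scikit-learn (variable names are hypothetical).
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve

y_prob = model.predict(X_test).ravel()      # predicted heading probabilities
y_pred = (y_prob >= 0.5).astype(int)        # thresholded binary decisions

oa = accuracy_score(y_test, y_pred)         # overall accuracy, Equation (4)
f1 = f1_score(y_test, y_pred)               # F1 score, Equation (5)
auc = roc_auc_score(y_test, y_prob)         # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)   # points of the ROC curve
```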

Grad-CAM is a technique used in deep learning to visualize which parts of an image contribute the most to a model's decision. It highlights the regions of an input image that were most important for making a specific prediction (Selvaraju et al. 2020). The technique was implemented to improve the interpretation of the ESGAN D supervised classifier's learning process. The visualization technique highlights the importance of different regions of the image in the output prediction by projecting back the weights of the output layer onto the convolutional feature maps (Zhou et al. 2016). As recommended by Chollet (2015b), the following steps were used to generate the class activation maps. First, the ESGAN D supervised classifier mapped the input image to the activations of the last convolution layer as well as the output predictions. The gradient of the predicted value for the input image with respect to the activations of the last convolution layer was computed. Each image channel in the feature map array was weighted by how important this channel was with regard to the predicted value, and then all the channels were summed to generate the corresponding activation map array. The Grad-CAM activation map provided a measure of how strongly portions of the image contributed to the predictions made by the ESGAN D supervised classifier, visualized in a 0 to 255 scale map array.
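
A hedged sketch of this Grad-CAM computation, following the cited Keras example, is shown below; the model variable and the name of the last convolutional layer are hypothetical.

```python
# Sketch of Grad-CAM for a Keras classifier (model and layer name are hypothetical).
import numpy as np
import tensorflow as tf
from tensorflow import keras

def grad_cam(model, image, last_conv_layer_name):
    # Model mapping the input image to the last conv activations and the prediction.
    grad_model = keras.Model(model.inputs,
                             [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        top_class = tf.argmax(preds[0])
        score = preds[:, top_class]
    # Gradient of the predicted score with respect to the last conv feature map.
    grads = tape.gradient(score, conv_out)
    pooled = tf.reduce_mean(grads, axis=(0, 1, 2))          # channel importance weights
    heatmap = tf.reduce_sum(conv_out[0] * pooled, axis=-1)  # weighted channel sum
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return np.uint8(255 * heatmap)                          # rescale to the 0 to 255 range
```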

Acknowledgments

We thank Tim Mies at the Energy Farm at University of Illinois for technical assistance.

Author contributions

S.V. and A.D.B.L. conceived the study, interpreted the data, and wrote the manuscript. E.S., X.Z., and J.N. established, maintained, and collected the ground truthing data in the field trials. J.R. and D.A. collected the aerial data. S.V. implemented the data pipeline in the study.

Funding

This work was funded by the DOE Center for Advanced Bioenergy and Bioproducts Innovation (U.S. Department of Energy, Office of Science, Biological and Environmental Research Program under award number DE-SC0018420), Artificial Intelligence for Future Agricultural Resilience, Management, and Sustainability Institute (Agriculture and Food Research Initiative (AFRI) grant no. 2020-67021-32799/project accession no.1024178 from the USDA National Institute of Food and Agriculture), and a generous gift from Tito's Handmade Vodka. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Department of Energy.

Data availability

The data sets and code used in the current study are publicly available from the online Illinois Data Bank at https://doi.org/10.13012/B2IDB-8462244_V2 and the GitHub repository https://github.com/pixelvar79/ESGAN-Flowering-Detection-paper.

References

Ahmad W, Ali H, Shah Z, Azmat S. A new generative adversarial network for medical images super resolution. Sci Rep. 2022:12(1):9533.
Ahmed SF, Alam MDSB, Hassan M, Rozbu MR, Ishtiak T, Rafa N, Mofijur M, Shawkat Ali ABM, Gandomi AH. Deep learning modelling techniques: current progress, applications, advantages, and challenges. Artif Intell Rev. 2023:56(11):13521–13617.
Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021:8(1):53.
Belgiu M, Drăguţ L. Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens. 2016:114:24–31.
Breiman L. Random forests. Mach Learn. 2001:45(1):5–32.
Burke M, Lobell DB. Satellite-based assessment of yield variation and its determinants in smallholder African systems. Proc Natl Acad Sci U S A. 2017:114(9):2189–2194.
Chollet F. Keras applications API. Keras 2015a [accessed 2024 Jan 6]. https://keras.io/api/applications/.
Chollet F. Grad-CAM class activation visualization. Keras 2015b [accessed 2024 Jan 6]. https://keras.io/examples/vision/grad_cam/.
Clark LV, Dwiyanti MS, Anzoua KG, Brummer JE, Ghimire BK, Głowacka K, Hall M, Heo K, Jin X, Lipka AE, et al. Biomass yield in a genetically diverse Miscanthus sinensis germplasm panel evaluated at five locations revealed individuals with exceptional potential. Glob Change Biol Bioenergy. 2019:11(10):1125–1145.
Clark LV, Stewart JR, Nishiwaki A, Toma Y, Kjeldsen JB, Jørgensen U, Zhao H, Peng J, Yoo JH, Heo K, et al. Genetic structure of Miscanthus sinensis and Miscanthus sacchariflorus in Japan indicates a gradient of bidirectional but asymmetric introgression. J Exp Bot. 2015:66(14):4213–4225.
Clifton-Brown J, Schwarz K-U, Awty-Carroll D, Iurato A, Meyer H, Greef J, Gwyn J, Mos M, Ashman C, Hayes C, et al. Breeding strategies to improve Miscanthus as a sustainable source of biomass for bioenergy and biorenewable products. Agronomy. 2019:9(11):673.
Cooper M, Powell O, Gho C, Tang T, Messina C. Extending the breeder's equation to take aim at the target population of environments. Front Plant Sci. 2023:14:1129591.
Crowell S, Korniliev P, Falcão A, Ismail A, Gregorio G, Mezey J, McCouch S. Genome-wide association and high-resolution phenotyping link Oryza sativa panicle traits to numerous trait-specific QTL clusters. Nat Commun. 2016:7(1):10527.
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway (NJ): IEEE; 2009. p. 248–255.
Dong H, Clark LV, Jin X, Anzoua K, Bagmet L, Chebukin P, Dzyubenko E, Dzyubenko N, Ghimire BK, Heo K, et al. Managing flowering time in Miscanthus and sugarcane to facilitate intra- and intergeneric crosses. PLoS One. 2021:16(1):e0240390.
Eckardt NA, Ainsworth EA, Bahuguna RN, Broadley MR, Busch W, Carpita NC, Castrillo G, Chory J, DeHaan LR, Duarte CM, et al. Climate change challenges, plant science solutions. Plant Cell. 2023:35(1):24–66.
Fan J, Zhou J, Wang B, de Leon N, Kaeppler SM, Lima DC, Zhang Z. Estimation of maize yield and flowering time using multi-temporal UAV-based hyperspectral data. Remote Sens (Basel). 2022:14(13):3052.
Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006:27(8):861–874.
Gan Y, Zhu H, Guo W, Xu G, Zou G. Deep semi-supervised learning with contrastive learning and partial label propagation for image data. Knowl Based Syst. 2022:245:108602.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial networks. Commun ACM. 2020:63(11):139–144.
Hayes BJ, Chen C, Powell O, Dinglasan E, Villiers K, Kemper KE, Hickey LT. Advancing artificial intelligence to help feed the world. Nat Biotechnol. 2023:41(9):1188–1189.
He Z, Liu H, Wang Y, Hu J. Generative adversarial networks-based semi-supervised learning for hyperspectral image classification. Remote Sens (Basel). 2017:9(10):1042.
Heaton EA, Dohleman FG, Long SP. Meeting US biofuel goals with less land: the potential of Miscanthus. Glob Chang Biol. 2008:14(9):2000–2014.
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol. 2022:106(9–10):3507–3530.
Iscen A, Tolias G, Avrithis Y, Chum O. Label propagation for deep semi-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach (CA): IEEE; 2019. p. 5070–5079.
Jensen E, Farrar K, Thomas-Jones S, Hastings A, Donnison I, Clifton-Brown J. Characterization of flowering time diversity in Miscanthus species. Glob Change Biol Bioenergy. 2011:3(5):387–400.
Jha D, Choudhary K, Tavazza F, Liao W-K, Choudhary A, Campbell C, Agrawal A. Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning. Nat Commun. 2019:10(1):5316.
Kamal KC, Yin Z, Li B, Ma B, Wu M. Transfer learning for fine-grained crop disease classification based on leaf images. In: Proceedings of the 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS). New York (NY): IEEE; 2019. p. 1–5.
Khan S, Tufail M, Khan MT, Khan ZA, Iqbal J, Alam M. A novel semi-supervised framework for UAV based crop/weed classification. PLoS One. 2021:16(5):e0251008.
Kornblith S, Shlens J, Le QV. Do better ImageNet models transfer better? In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach (CA): IEEE; 2019. p. 2656–2666.
Kumar A, Desai SV, Balasubramanian VN, Rajalakshmi P, Guo W, Balaji Naik B, Balram M, Desai UB. Efficient maize tassel-detection method using UAV based remote sensing. Remote Sens Appl. 2021:23:100549.
Kurtulmuş F, Kavdir İ. Detecting corn tassels using computer vision and support vector machines. Expert Syst Appl. 2014:41(16):7390–7397.
Letsoin SMA, Purwestri RC, Rahmawan F, Herak D. Recognition of sago palm trees based on transfer learning. Remote Sens (Basel). 2022:14(19):4932.
Lewandowski I, Clifton-Brown J, Trindade LM, van der Linden GC, Schwarz K-U, Müller-Sämann K, Anisimov A, Chen C-L, Dolstra O, Donnison IS, et al. Progress on optimizing Miscanthus biomass production for the European Bioeconomy: results of the EU FP7 project OPTIMISC. Front Plant Sci. 2016:7:1620.
Li JZ, Huang XQ, Heinrichs F, Ganal MW, Röder MS. Analysis of QTLs for yield components, agronomic traits, and disease resistance in an advanced backcross population of spring barley. Genome. 2006:49(5):454–466.
Lin Y, Li S, Duan S, Ye Y, Li B, Li G, Lyv D, Jin L, Bian C, Liu J. Methodological evolution of potato yield prediction: a comprehensive review. Front Plant Sci. 2023:14:1214006.
Liu Y, Cen C, Che Y, Ke R, Ma Y, Ma Y. Detection of maize tassels from UAV RGB imagery with faster R-CNN. Remote Sens (Basel). 2020:12(2):338.
Lotter W, Diab AR, Haslam B, Kim JG, Grisot G, Wu E, Wu K, Onieva JO, Boyer Y, Boxerman JL, et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat Med. 2021:27(2):244–249.
Martinez-Feria RA, Basso B, Kim S. Boosting climate change mitigation potential of perennial lignocellulosic crops grown on marginal lands. Environ Res Lett. 2022:17(4):044004.
Moen E, Bannon D, Kudo T, Graf W, Covert M, Van Valen D. Deep learning for cellular image analysis. Nat Methods. 2019:16(12):1233–1246.
Nagasubramanian K, Jubery T, Fotouhi Ardakani F, Mirnezami SV, Singh AK, Singh A, Sarkar S, Ganapathysubramanian B. How useful is active learning for image-based plant phenotyping? Plant Phenome J. 2021:4(1):e20020.
Noh S-H. Performance comparison of CNN models using gradient flow analysis. Informatics. 2021:8(3):53.
Odena A. Semi-supervised learning with generative adversarial networks. arXiv 1606.01583, 5 June 2016, preprint: not peer reviewed.
O’Mahony N, Campbell S, Carvalho A, Harapanahalli S, Hernandez GV, Krpalkova L, Riordan D, Walsh J. Deep learning vs. traditional computer vision. In: Arai K, Kapoor S, editors. Advances in computer vision. Cham: Springer International Publishing; 2020. p. 128–144.
Parnami A, Lee M. Learning from few examples: a summary of approaches to few-shot learning. arXiv 2203.04291, 7 March 2022, preprint: not peer reviewed.
Paullada A, Raji ID, Bender EM, Denton E, Hanna A. Data and its (dis)contents: a survey of dataset development and use in machine learning research. Patterns. 2021:2(11):100336.
Poncet AM, Knappenberger T, Brodbeck C, Fogle M, Shaw JN, Ortiz BV. Multispectral UAS data accuracy for different radiometric calibration methods. Remote Sens (Basel). 2019:11(16):1917.
Probst P, Wright MN, Boulesteix A. Hyperparameters and tuning strategies for random forest. WIREs Data Mining Knowl Discov. 2019:9:e1301.
Rasmussen CB, Kirk K, Moeslund TB. The challenge of data annotation in deep learning—a case study on whole plant corn silage. Sensors. 2022:22(4):1596.
Ribani R, Marengoni M. A survey of transfer learning for convolutional neural networks. In: 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T). Rio de Janeiro, Brazil: IEEE; 2019. p. 47–57.
Sager C, Janiesch C, Zschech P. A survey of image labelling for computer vision applications. J Business Anal. 2021:4(2):91–110.
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X, Chen X. Improved techniques for training GANs. Adv Neural Inf Process Syst. 2016:29:2234–2242.
Sandfort V, Yan K, Pickhardt PJ, Summers RM. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019:9(1):16884.
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, et al. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun. 2022:13(1):1728.
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int J Comput Vis. 2020:128(2):336–359.
Short R, Fukunaga K. The optimal distance measure for nearest neighbor classification. IEEE Trans Inf Theory. 1981:27(5):622–627.
Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019:6(1):60.
Somerville C, Youngs H, Taylor C, Davis SC, Long SP. Feedstocks for lignocellulosic biofuels. Science. 2010:329(5993):790–792.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proc IEEE Conf Comput Vis Pattern Recognit. Los Alamitos (CA): IEEE Computer Society; 2015. p. 1–9.
Teuwen J, Moriakov N. Convolutional neural networks. In: Handbook of medical image computing and computer assisted intervention. London (UK): Elsevier; 2020. p. 481–501.
Tran AT, Nguyen CV, Hassner T. Transferability and hardness of supervised classification tasks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea. New York (NY): IEEE; 2019. p. 1395–1405.
Tseng H-H, Yang M-D, Saminathan R, Hsu Y-C, Yang C-Y, Wu D-H. Rice seedling detection in UAV images using transfer learning and machine learning. Remote Sens (Basel). 2022:14(12):2837.
van Engelen JE, Hoos HH. A survey on semi-supervised learning. Mach Learn. 2020:109(2):373–440.
Varela S, Pederson TL, Leakey ADB. Implementing spatio-temporal 3D-convolution neural networks and UAV time series imagery to better predict lodging damage in Sorghum. Remote Sens (Basel). 2022a:14(3):733.
Varela S, Zheng X, Njuguna JN, Sacks EJ, Allen DP, Ruhter J, Leakey ADB. Deep convolutional neural networks exploit high-spatial- and -temporal-resolution aerial imagery to phenotype key traits in Miscanthus. Remote Sens (Basel). 2022b:14(21):5333.
Wang AX, Tran C, Desai N, Lobell D, Ermon S. Deep transfer learning for crop yield prediction with remote sensing data. In: Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies. New York (NY): Association for Computing Machinery; 2018. p. 1–5.
Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, Chandak P, Liu S, Van Katwyk P, Deac A, et al. Scientific discovery in the age of artificial intelligence. Nature. 2023:620(7972):47–60.
Wang J, Lan C, Liu C, Ouyang Y, Qin T, Lu W, Chen Y, Zeng W, Yu P. Generalizing to unseen domains: a survey on domain generalization. IEEE Trans Knowl Data Eng. 2022:35:8052–8072.
Wang P, Fan E, Wang P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit Lett. 2021:141:61–67.
Wei X, Yu X, Liu B, Zhi L. Convolutional neural networks and local binary patterns for hyperspectral image classification. Eur J Remote Sens. 2019:52(1):448–462.
Weinberger KQ, Blitzer J, Saul L. Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res. 2009:10:207–244.
Weinstein BG. A computer vision for animal ecology. J Anim Ecol. 2018:87(3):533–545.
Wu G, Miller ND, de Leon N, Kaeppler SM, Spalding EP. Predicting Zea mays flowering time, yield, and kernel dimensions by analyzing aerial images. Front Plant Sci. 2019:10:1251.
Xie J, Fernandes SB, Mayfield-Jones D, Erice G, Choi M, Lipka AE, Leakey ADB. Optical topometry and machine learning to rapidly phenotype stomatal patterning traits for maize QTL mapping. Plant Physiol. 2021:187(3):1462–1480.
Yamashita R, Nishio M, Do RKG, Togashi K. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018:9(4):611–629.
Yang Q. An introduction to transfer learning. In: Tang C, Ling CX, Zhou X, Cercone NJ, Li X, editors. Advanced data mining and applications. Berlin, Heidelberg: Springer; 2008. p. 1.
Zan X, Zhang X, Xing Z, Liu W, Zhang X, Su W, Liu Z, Zhao Y, Li S. Automatic detection of maize tassels from UAV images by combining random forest classifier and VGG16. Remote Sens (Basel). 2020:12(18):3049.
Zhao Y, Zheng B, Chapman SC, Laws K, George-Jaeggli B, Hammer GL, Jordan DR, Potgieter AB. Detecting Sorghum plant and head features from multispectral UAV imagery. Plant Phenomics. 2021:2021:9874650.
Zhao Z, Wang E, Kirkegaard JA, Rebetzke GJ. Novel wheat varieties facilitate deep sowing to beat the heat of changing climates. Nat Clim Chang. 2022:12(3):291–296.
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV. Piscataway (NJ): IEEE; 2016. p. 2921–2929. https://doi.org/10.1109/CVPR.2016.319.
Zhou L, Pan S, Wang J, Vasilakos AV. Machine learning on big data: opportunities and challenges. Neurocomputing (Amst). 2017:237:350–361.

Author notes

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (https://dbpia.nl.go.kr/plphys/pages/General-Instructions) is: Sebastian Varela (sv79@illinois.edu).

Conflict of interest statement. A patent on ESGAN has been filed by the University of Illinois Urbana-Champaign with A.D.B.L. and S.V. as inventors. The authors declare no conflict of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.