Yunqin Zhang, Deqin Xiao *, Youfu Liu, Huilin Wu
a College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, Guangdong, China
b Guangzhou National Modern Agricultural Industry Science and Technology Innovation Center, Guangzhou 511458, Guangdong, China
Keywords: Improved Faster R-CNN; Rice spike detection; Rice spike count; Developmental stage identification
ABSTRACT Spike development directly affects the yield and quality of rice. We describe an algorithm for automatically identifying multiple developmental stages of rice spikes (AI-MDSRS) that transforms the automatic identification of multiple developmental stages of rice spikes into the detection of rice spikes of diverse maturity levels. Rice spikes are dense and small, and their scales vary greatly across growth and development stages, posing challenges for their effective and accurate detection. We describe a rice spike detection model based on an improved faster regions with convolutional neural network (Faster R-CNN). The model incorporates the following optimization strategies: first, Inception-ResNet-v2 replaces VGG16 as the feature extraction network; second, a feature pyramid network (FPN) replaces single-scale feature maps to fuse with the region proposal network (RPN); third, region of interest (RoI) alignment replaces RoI pooling, and distance-intersection over union (DIoU) is used as the standard for non-maximum suppression (NMS). The performance of the proposed model was compared with that of the original Faster R-CNN and YOLOv4 models. The mean average precision (mAP) of the rice spike detection model was 92.47%, a substantial improvement on the original Faster R-CNN model (40.96% mAP) and 3.4% higher than that of the YOLOv4 model, experimentally indicating that the model is more accurate and reliable. The identification results of the model for the heading-flowering, milky maturity, and full maturity stages were within two days of the results of manual observation, fully meeting the needs of agricultural activities.
Accurate observation of the developmental stages of rice spikes (DSRS) can guide precise management and control aimed at achieving high rice quality and yield [1]. Observation of rice spike developmental stages has long been performed manually, requiring observers to conduct on-site sampling and observation [2]. However, this process is time-consuming, labor-intensive, and inefficient. Moreover, owing to the similarity among plants and the subjectivity of observers, it is very difficult to count rice spikes accurately over a large area. For these reasons, the accuracy of rice spike development records is limited [3]. There is an urgent need to develop an automatic identification algorithm for DSRS.
With the maturation and popularization of computer vision technology, an increasing number of researchers have continuously observed a small area of a field that can represent the entire field to yield the DSRS. They acquire image sequences, examine the morphological features of rice spikes in the images, and identify DSRS based on the evolution of these morphological features. Bai et al. [4] collected front-down view images in a rice field, used color information to extract the coverage of the rice spike region, and judged whether the rice spikes had entered the heading stage based on the change in coverage. Cao et al. [5] used front-down view images collected in a rice field, used color information to extract the rice spike region, calculated the spike curvature in the rice spike angle detection area, and determined whether the rice spikes had entered the milky maturity (MM) stage based on the spike curvature. Soontranon et al. [6] used a vegetation index to monitor the growth stage of rice on a small scale based on the shape model fitting (SMF) method and roughly divided the growth period of rice into seedling, tillering, heading, and harvest stages using vegetation index graph analysis. However, current automatic identification methods for DSRS features address only a single critical stage and require segmentation of the rice spike area, an operation that is greatly affected by windy weather or complex scenarios and is not practical.
Rice spikes exhibit varying morphological features such as size, shape, and color during the growth stages from heading to harvest. During the heading-flowering (HF) stage, rice spikes are dotted with small white glume flowers; at the MM stage, there are no glume flowers, but the spikes are bent, drooping, or divergent; and the spikes turn yellow at the full maturity (FM) stage. According to the Specifications for Agrometeorological Observation-Rice [2], a developmental stage can be recognized when 10% of plant individuals have entered the stage [7]. Thus, the automatic identification of multiple developmental stages of rice spikes (AI-MDSRS) can be transformed into the detection of rice spikes at multiple developmental stages by identifying sufficient rice spikes. Automatic counting of spikes using a spike target detection algorithm allows effective and efficient identification of rice developmental stages.
There have been some advances in automatically counting plant spikes based on digital images. The methods fall into two main categories: segmentation methods based on color [8] and texture [9], and classification methods based on pixel-level color feature candidate regions [10] and superpixel fusion to generate candidate regions [11]. Although these methods can detect plant spikes, their accuracy still requires improvement.
Deep learning is a new type of high-precision target detection method that is widely used in agricultural applications [12]. These applications include detection and counting of corn kernels [13] and rice spikes and kernels, plant leaf identification and counting [14], and wheat spike detection and counting [15,16]. There have also been advances in research on rice spike counting. Duan et al. [17] collected rice plant images from multiple angles and proposed a method for automatically determining spike number in potted rice plants. Xu et al. [18] proposed a robust rice spike-counting algorithm based on deep learning and multiscale hybrid windows, enhancing rice spike features to detect and count small-scale rice spikes in large-area scenes. However, existing automatic counting algorithms for rice spikes are limited to detecting spikes at a single developmental stage, and there has been no further application research based on actual scenarios. No study has developed a rice spike detection algorithm spanning multiple developmental stages and carried the results through to developmental stage identification.
To address the multi-scale problem of rice spike detection at different developmental stages, especially the detection of small rice spike targets, a new automatic identification algorithm for MDSRS based on an improved Faster R-CNN is proposed in this paper. Based on the improved Faster R-CNN model, the algorithm automatically extracts image features at different developmental stages of rice spikes and accurately detects the corresponding developmental stage of rice spikes using real-time rice image sequences acquired by ground monitoring equipment.
The rice variety used in this study was JinNongSiMiao. We used a rice growth monitoring station to take rice images at regular intervals in two rice planting scenarios: potted and field rice (Fig. 1A).
In the pot scenario, rice was planted in pots approximately 40 cm in diameter, and each pot was divided into three holes for transplanting seedlings with a hole spacing of 10-12 cm, which met planting standards for interval and spacing. Nine pots were enclosed in a monitoring area to simulate a block of three rows and three columns in the field environment. Because of the height limitation of the greenhouse, cameras 1 and 2 (DS-2DC4420IW-D, Hikvision), which were fixed on a beam 2.5 m above the ground, captured images of the potted rice from an overhead perspective. The resolution of the images was 1920 × 1080 pixels. There was negligible difference between the images taken by the two cameras; only nine pots of rice were visible under each camera, and the size of the monitoring area was approximately 1.2 m². Two cameras were set up to enlarge the dataset, a measure beneficial for improving the performance of the trained model. In addition, during testing, this is equivalent to repeating the test for the same period, verifying the reliability of the model and reducing test error. In the large field scenario, the cameras (DS-2DC7423IW-A, Hikvision) of field plots 1 and 2 were mounted on a crossbar 2.5 m above the ground, and the images were front-lower views with a resolution of 1280 × 720 pixels. The actual areas S of the images taken by the two field cameras were estimated from reference discs as 9.89 and 2.43 m² (Fig. 1B, C).
In the study area, double-season rice, including early and late rice, is grown annually. Images were acquired at regular intervals by the rice growth monitoring station for the rice monitoring areas in the greenhouse pots and field. The images were collected hourly from 8:00 to 17:00 every day. The image sequences acquired in the above two scenarios from 2019 to 2021 are presented here, and the critical developmental stages of rice spikes were recorded by professionals (Table S1). Sequences 1-8 are image sequences of potted rice, where markers I and II in the camera column indicate that the transplanting stages of rice in the two camera monitoring areas are different and identical, respectively. Sequences 9-16 are image sequences of rice in the field, covering two different field plots.
To identify the three key developmental stages of rice spikes, according to the differing morphological features of rice spikes in different growth and developmental stages, we used the annotation software "LabelImg" to manually label three types of rice spikes at different maturity levels by recording the coordinates of the smallest enclosing rectangle of each spike: the HF stage (rice spike maturity level ripe 1), with small white glume flowers dotted on the rice spikes; the MM stage (ripe 2), with no glume flowers and rice spikes bending and drooping, with some diverging; and the FM stage (ripe 3), with rice spikes turning yellow (Fig. S1A-C).
Image sequences collected in greenhouse pots and fields from 2019 to 2021 were divided. Deep learning networks were trained using four image sequences from 2019 to build rice spike target detection models for multiple developmental stages in the two scenarios. The remaining four image sequences were used to verify the effectiveness of the automatic identification algorithm for MDSRS. During the 2019 trial, rice spike images with high similarity were screened out, and 1600 original images were obtained in the greenhouse potted scenario; of these, 573 images were obtained for the HF stage (class I), 584 for the MM stage (class II), and 443 for the FM stage (class III). A total of 720 original images were obtained for the field scenarios, including field plots 1 and 2, of which 168 were obtained for the HF stage (class I), 112 for the MM stage (class II), and 80 for the FM stage (class III). To construct a multi-class rice spike target detection model based on deep learning, we randomly divided these datasets into training, validation, and test sets in proportions of 0.60, 0.25, and 0.15, respectively, for training, validating, and testing the rice spike target detection model (Table S2).
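The random 0.60/0.25/0.15 split described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code; the function and variable names are our own.

```python
import random

def split_dataset(items, train=0.60, val=0.25, test=0.15, seed=0):
    # Shuffle and slice into train/validation/test subsets using the
    # 0.60/0.25/0.15 proportions from the paper; names are illustrative.
    assert abs(train + val + test - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# e.g. the 1600 greenhouse images -> 960 / 400 / 240
train_set, val_set, test_set = split_dataset(range(1600))
```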
Fig. 1. Rice image acquisition sites and actual monitoring-area estimation. (A) Experimental site of rice image acquisition in two scenarios. (B) Area of the rice monitoring area in the field. Field plot 1 monitoring area S1 = S0/L1, where the proportion of reference disc pixels in the field plot 1 image is L1 = 1/356.52; field plot 2 monitoring area S2 = S0/L2, where the proportion of reference disc pixels in the field plot 2 image is L2 = 1/87.65. The reference disc area is S0 = 277.45 cm².
The AI-MDSRS algorithm consists of three main parts: (1) expansion of image data, (2) construction and training of a rice spike detection model, and (3) establishment of a correlation between image spike density and developmental data based on the rice spike detection model (Fig. 2). The following subsections present a specific description of the AI-MDSRS algorithm.
During trials, the image data obtained are often much less abundant than those required to meet the training requirements of deep learning models. To solve this problem, it is generally necessary to perform image enhancement and data expansion on the training-set samples. The larger the scale of the data and the higher their quality, the better the generalization ability [21].
Owing to the uncertainty of weather changes, image lighting variation is relatively large. In this study, a contrast transform was used to process the original images to enrich the training set of rice spike images and avoid overfitting (Fig. S2). In the HSV (hue, saturation, value) color space of the image, the saturation S and luminance V components were changed while keeping the hue H constant, and the S and V components of each pixel were subjected to an exponential operation (an exponential factor between 0.25 and 1.5, with an adjustment step of 0.25) to increase the illumination variation. In addition, data expansion is a common method for extending the variability of the training data by artificially scaling up the dataset via transformations that preserve labels. In this study, we used three typical data-augmentation techniques to expand the dataset: translation, rotation, and random cropping (Fig. S2).
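The exponential (gamma-style) operation on the S and V channels can be sketched as below. This is a minimal numpy sketch assuming an HSV array already normalized to [0, 1]; the function name is ours, and a real pipeline would convert from and back to RGB (e.g. with OpenCV).

```python
import numpy as np

def adjust_sv(hsv, gamma):
    # Exponential transform on the S and V channels of an HSV image,
    # leaving hue untouched -- the contrast augmentation described above.
    # `hsv` is a float array in [0, 1] with shape (H, W, 3).
    out = hsv.copy()
    out[..., 1] = out[..., 1] ** gamma   # saturation
    out[..., 2] = out[..., 2] ** gamma   # value / luminance
    return out

# six augmented copies, stepping the exponent by 0.25 as in the paper
gammas = np.arange(0.25, 1.51, 0.25)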
The essence of automatic rice spike counting is the identification and localization of rice spikes, an operation consistent with the target recognition and localization solved by target detection, thus transforming the rice spike-counting problem into a rice spike target-detection problem.
Image target detection methods can be divided into two categories. The first is the two-stage target detection method based on candidate regions, which first identifies candidate frames of interest and then performs category score prediction and bounding box regression for the targets in the candidate bounding boxes. The second is the one-stage target detection method, which achieves target localization while predicting the target category and belongs to the class of integrated convolutional network detection methods. Faster R-CNN [19] is representative of two-stage target detection networks, characterized by low rates of recognition error and missed recognition. One-stage target detection networks such as YOLOv4 [20] are notable for their speed, but their accuracy is slightly lower than that of two-stage networks. In the present study, the Faster R-CNN network model was selected to identify and localize rice spikes in order to reduce rice spike counting error.
Fig. 2. General structure of the AI-MDSRS algorithm.
The Faster R-CNN network structure is divided into three parts: a convolutional network for feature extraction, a region proposal network (RPN) for generating candidate boxes, and a detection subnetwork. The VGG16 network adopted by Faster R-CNN and the settings of the anchor box specifications are biased toward large targets, imposing constraints on the detection of small targets. The features of small targets are sparse and easily lost, and their feature extraction differs from that of large targets, making the algorithm unsuitable for multiscale target-detection problems. In this study, considering the difficulty that the otherwise high-precision Faster R-CNN network has in detecting small targets [22], together with the small, dense rice spikes and the large span of rice spike scales across growth stages, we improved the Faster R-CNN network accordingly (Fig. S3). Specific improvement strategies are described in the following subsections.
3.3.1. Inception-ResNet-v2 feature extraction network
Replacing the feature extraction structure is the most common way to improve Faster R-CNN networks, for example by substituting newer networks such as ResNet [23] and DenseNet [24] or replacing the backbone with lightweight networks such as MobileNet [25] and SqueezeNet [26] for mobile applications. To address the detection problems posed by the scale differences of rice spikes and by the smaller rice spikes in images, this paper truncates the original Inception-ResNet-v2 network at the Inception-ResNet-C module and uses it as the feature extraction network for Faster R-CNN. The stem module is shown in Fig. S4A.
(1) Inception structure.
The Inception network (GoogLeNet) [27] is a milestone in the development of convolutional neural network (CNN) classifiers. Before the emergence of Inception, the most popular CNNs simply stacked more convolutional layers to improve accuracy by deepening the network. However, this also incurs huge computational cost and overfitting problems. GoogLeNet is characterized by the use of an Inception structure, as shown in Fig. S4B. First, several convolutional or pooling operations are performed in parallel on the input using 1 × 1, 3 × 3, and 5 × 5 convolutional kernels to extract several kinds of information from the input image. Feature fusion is then performed using concatenation operations to yield better image representations. Targets whose information is more globally distributed favor larger convolutional kernels, whereas targets whose information is more locally distributed favor smaller convolutional kernels. This problem is solved by concatenating filters of different sizes in the same layer, widening the network. To reduce the computational cost of the larger (5 × 5) convolution kernel, a 1 × 1 convolution was added to the later Inception structure to reduce dimensionality. The 1 × 1 convolution also adds nonlinearity while maintaining the original structure, so that the deeper the network, the more high-dimensional the image features it represents.
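The widening-by-concatenation idea can be illustrated at the shape level. This is a sketch under our own assumptions, with the convolutions stubbed out as channel projections: the point is only that parallel branches over the same input concatenate on the channel axis.

```python
import numpy as np

def inception_block(x, n1, n3, n5, pool_proj):
    # Shape-level sketch of an Inception block: parallel 1x1 / 3x3 / 5x5
    # convolutions and a pooling branch over the same input, concatenated
    # on the channel axis. The convolutions are stubbed as zero-valued
    # channel projections; only the shapes are meaningful here.
    h, w, _ = x.shape
    branches = [np.zeros((h, w, c)) for c in (n1, n3, n5, pool_proj)]
    return np.concatenate(branches, axis=-1)

# channel widths echo the classic GoogLeNet 3a block (illustrative only)
y = inception_block(np.zeros((28, 28, 192)), 64, 128, 32, 32)
```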
(2) ResNet module.
Increasing the depth of a network can cause gradients to vanish or explode and can even reduce accuracy. For this reason, the residual network was proposed. Its core aim is to solve the degradation problem caused by increasing network depth, making it feasible to improve network performance by simply adding layers. The residual structure is illustrated in Fig. S4C, where the input tensor is x and the learned residual function is F(x) = H(x) - x. When model accuracy reaches saturation, the training goal of the redundant network layers is to drive the residual toward zero, that is, F(x) = 0, achieving an identity mapping so that training accuracy does not degrade as the network deepens.
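The identity-mapping argument above is small enough to demonstrate directly; a minimal sketch (names ours):

```python
import numpy as np

def residual_block(x, F):
    # H(x) = F(x) + x: the shortcut connection means a redundant layer
    # only has to learn F(x) -> 0 for H to reduce to the identity.
    return F(x) + x

x = np.array([1.0, 2.0, 3.0])
# a "redundant" layer whose residual is zero gives back the input unchanged
identity_out = residual_block(x, lambda t: np.zeros_like(t))
```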
3.3.2. Feature pyramid networks in the RPN
This study used a feature pyramid network (FPN) instead of single-scale feature maps to adapt to the RPN, which shares the C2-C5 convolutional layers with the improved Faster R-CNN detection network. Regions of interest (RoIs) and region scores were obtained on all feature maps through the RPN and FPN. The regions with the highest scores were used as the candidate regions for the various types of rice spikes. The prediction feature layers [P2, P3, P4, P5, P6] in the top-down transmission module of the improved Faster R-CNN model have receptive fields of multiple scales and can detect rice spike targets at multiple scales.
As shown in Fig. S4D, a network head was attached to each layer of the feature pyramid. It was implemented as a 3 × 3 convolutional layer followed by two 1 × 1 convolutions for classification and regression. Because the head slides densely over each position of each pyramid layer, multi-scale anchor boxes are not required at any specific layer. Instead, a single-scale anchor box is assigned to each layer. In this study, according to the particular scales of the target rice spikes, the anchor box scales corresponding to the prediction feature layers [P2, P3, P4, P5, P6] were set to {16², 32², 64², 128², 256²}, and with three aspect ratios {1:1, 2:1, 1:2}, there were 15 anchor boxes in the pyramid.
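The per-level anchor layout (one scale per layer, three aspect ratios, 15 anchor shapes in total) can be sketched as follows. This is an illustrative sketch, not the authors' implementation; function names are ours.

```python
import numpy as np

def anchors_for_level(scale, ratios=(1.0, 2.0, 0.5)):
    # Single-scale anchors (w, h) for one pyramid level: one base scale
    # per level and three aspect ratios h:w, area preserved per ratio.
    out = []
    for r in ratios:
        w = scale / np.sqrt(r)
        h = scale * np.sqrt(r)
        out.append((w, h))
    return out

scales = [16, 32, 64, 128, 256]          # base scales for P2..P6
pyramid = {f"P{i + 2}": anchors_for_level(s) for i, s in enumerate(scales)}
```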
(1) RoI Align module.
The RoI pooling layer in Faster R-CNN maps the candidate frames generated by the RPN onto the feature map output by the shared convolution layers, obtains the RoIs of the candidate regions on the shared feature map, and generates fixed-size RoIs. This process requires two quantization operations (rounding floating-point numbers), which cause the candidate boxes to deviate from the originally regressed positions, so that the RoI mapping from feature space back to the original image carries a large deviation, affecting detection accuracy. Because the rice spikes in this study are small and become dense as they grow and develop, the extraction accuracy of the RoI is particularly critical for such small and dense targets. In this study, we used RoI Align to cancel the quantization operation, using bilinear interpolation to obtain pixel values at floating-point coordinates and thereby solve the region mismatch caused by the two quantizations in the RoI pooling operation [27].
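The bilinear interpolation at the heart of RoI Align can be sketched as below. This is a minimal sketch of the sampling step only (not the full pooled-bin averaging); the function name is ours.

```python
import numpy as np

def bilinear(feat, y, x):
    # Sample a 2-D feature map at a floating-point (y, x) coordinate --
    # the interpolation that lets RoI Align skip the two rounding
    # operations of RoI pooling.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0]
            + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0]
            + dy * dx * feat[y1, x1])

feat = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
center = bilinear(feat, 0.5, 0.5)   # average of the four neighbours
```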
When RoI Align is used with an FPN, RoIs of different scales must be assigned to the pyramid layers, and the pyramid layer of the most suitable size is selected to extract each RoI feature block. Taking a 224 × 224 pixel input image as an example, an RoI of width w and height h (on the input image) is assigned to pyramid layer Pk according to formula (1):

k = ⌊k0 + log2(√(wh)/224)⌋ (1)

where k is the feature pyramid layer, k0 represents the target layer to which an RoI with w × h = 224² is mapped, and k0 was set to 5 in this study. Formula (1) indicates that if the scale of the RoI becomes smaller (for example, 1/2 of 224), it is mapped to a finer layer (here, k = 4).
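Formula (1), with the levels clipped to the available range P2-P6, can be sketched as follows (function name ours; the clipping bounds are an assumption matching the five prediction layers):

```python
import math

def roi_level(w, h, k0=5, k_min=2, k_max=6):
    # Assign an RoI of size (w, h) on the input image to a pyramid level:
    # k = floor(k0 + log2(sqrt(w*h) / 224)), clipped to the layers P2..P6.
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(k_min, min(k, k_max))
```

For example, a 224 × 224 RoI maps to P5, a 112 × 112 RoI to the finer P4, and a tiny spike-sized RoI is clipped to P2.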
(2) Non-maximum suppression method.
Non-maximum suppression (NMS) is a necessary post-processing step in target detection. In the original NMS, the intersection over union (IoU) indicator is used to suppress redundant bounding boxes and retain the most accurate bounding box; the calculation is shown in formulas (2) and (3). Because the IoU-NMS method considers only the overlapping area, it often causes false suppression, particularly when the ground-truth box contains the bounding box. In this study, we used distance-IoU (DIoU) as the standard for NMS. The DIoU-NMS method addresses the problems of IoU by considering not only the overlap area but also the centroid distance, somewhat increasing the speed and accuracy of bounding-box regression; its calculation is shown in formulas (4) and (5). In Fig. S5, A and B show two bounding boxes (blue) of the same size with the same IoU; the IoU-NMS method cannot distinguish their intersections with the ground-truth box (red). For this case, C and D compute the difference between the minimum bounding rectangle (yellow) and the union (yellow block) of the intersection, adding a measurement of the intersection scale to distinguish the relative positions. When the ground-truth box contains the bounding box, the DIoU-NMS method directly measures the Euclidean distance between the center points of the two boxes, as shown in E. In particular, when the distances between the center points of the two bounding boxes are equal, the scale information of the aspect ratios of the bounding boxes is considered, as shown in F and G.
IoU = |B ∩ B^gt| / |B ∪ B^gt| (2)

where B and B^gt denote the bounding and ground-truth boxes, respectively.
s_i = { s_i, IoU(M, B_i) < ε; 0, IoU(M, B_i) ≥ ε } (3)

where s_i is the classification confidence, ε is the conventional NMS threshold, and M is the box with the highest confidence level.
DIoU = IoU − ρ²(b, b^gt)/c² (4)

where b and b^gt denote the centroids of B and B^gt, respectively, ρ(·) is the Euclidean distance, and c is the diagonal length of the smallest box enclosing B and B^gt.
s_i = { s_i, IoU(M, B_i) − ρ²(b, b^gt)/c² < ε; 0, otherwise } (5)

where s_i is the classification confidence, ε is the DIoU-NMS threshold, and M is the box with the highest confidence level.
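The DIoU penalty and the greedy suppression loop described by formulas (4) and (5) can be sketched as below. This is a minimal numpy sketch under the standard DIoU formulation, not the authors' code; boxes are assumed to be [x1, y1, x2, y2] and names are ours.

```python
import numpy as np

def diou(box, boxes):
    # DIoU between one box and an array of boxes: IoU minus the squared
    # centre distance over the squared diagonal of the smallest enclosing box.
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter)
    # squared distance between centroids (rho^2 in Eq. (4))
    cb = np.array([(box[0] + box[2]) / 2, (box[1] + box[3]) / 2])
    cs = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                   (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    rho2 = ((cs - cb) ** 2).sum(axis=1)
    # squared diagonal of the smallest enclosing box (c^2 in Eq. (4))
    ex1 = np.minimum(box[0], boxes[:, 0]); ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2]); ey2 = np.maximum(box[3], boxes[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou - rho2 / c2

def diou_nms(boxes, scores, eps=0.5):
    # Greedy NMS: keep the highest-score box M, suppress boxes whose
    # DIoU with M reaches the threshold, repeat on the remainder.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[diou(boxes[i], boxes[rest]) < eps]
    return keep
```

A fully overlapping duplicate is suppressed (DIoU = 1), while a distant box has a negative DIoU and survives, which is the behaviour the centroid-distance term is meant to provide.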
3.3.3. Offline training of the model
Before the rice spike detection network was trained, it was initialized and pre-trained on the ImageNet dataset and then fine-tuned on our own dataset. The experimental environment was an Intel i9-9900K CPU at 3.40 GHz with 64 GB RAM and an NVIDIA 2080 Ti GPU, running the Ubuntu 16.04 LTS operating system, with CUDA version 10.0, TensorFlow 1.14.0 as the deep learning framework, and Python 3.7 as the programming language. The network parameters for the training phase are shown in Table S3.
3.3.4. Evaluation indicators of the model
The accuracy curve defined in formula (6) was drawn to evaluate the performance of the training model. To verify the generalizability of the trained model, its precision and recall rates were evaluated using formulas (7) and (8). The accuracy of multiclass target detection was evaluated with the mean average precision (mAP), which measures the quality of the trained model over all categories. The mAP is the mean of the average precisions (AP) over all categories, as shown in formula (9).
Accuracy = (TP + TN) / (TP + TN + FP + FN) (6)

P = TP / (TP + FP) (7)

R = TP / (TP + FN) (8)

In formulas (6)-(8), TP is the number of correctly detected rice spikes, FP is the number of incorrectly detected rice spikes, FN is the number of missed rice spikes or incorrectly detected backgrounds, and TN is the number of correctly detected backgrounds.
mAP = (1/n) Σ_i AP_i (9)

where AP is the area under the precision-recall curve and n is the number of categories.
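Formulas (7)-(9) can be sketched directly (function names ours; the AP values here are placeholders, not results from the paper):

```python
def precision_recall(tp, fp, fn):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN).
    return tp / (tp + fp), tp / (tp + fn)

def mean_average_precision(ap_per_class):
    # mAP: mean of the per-class average precisions.
    return sum(ap_per_class) / len(ap_per_class)
```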
Based on the rice spike detection model, to reduce target detection error by detecting multiple times, n images corresponding to n time points are collected online every day and input into the trained rice spike detection model. We first estimated the number of rice spikes of each maturity in every image collected daily and then calculated the daily spike density for each image class using formula (10).
Ω_i = Σ_{j=1}^{n} Num_j(bbox_i) / (n × S) (10)

where n is the number of images collected online in a day, S is the actual area of the camera monitoring area, bbox represents the bounding box obtained from the target detection network, Num_j(bbox_i) represents the number of the ith type of rice spikes detected in the jth image, and Ω_i is the density of the ith type of image spike.
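The daily density of formula (10) reduces to the mean per-image count divided by the monitored ground area; a minimal sketch (function name ours):

```python
def spike_density(counts_per_image, area_m2):
    # Daily spike density for one maturity class: the mean detected count
    # over the n images collected that day, divided by the monitored
    # ground area S in square metres.
    n = len(counts_per_image)
    return sum(counts_per_image) / (n * area_m2)
```

For instance, daily counts of 10, 20, and 30 spikes over a 2 m² monitoring area give a density of 10 spikes per square metre.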
To establish the correlation between image spike density and the development date of rice spikes, fitted curves of daily changes in spike density for rice spike images of each maturity were plotted for multiple developmental stages. When a spike density curve showed a rapid upward trend for the first time, the arrival of the corresponding developmental stage could be determined. Two commonly used indicators were used to evaluate the goodness of curve fitting: the coefficient of determination (R²) and the root mean square error (RMSE), calculated using formulas (11) and (12):

R² = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)² (11)

RMSE = √((1/m) Σ_i (y_i − ŷ_i)²) (12)

where y_i and ŷ_i are the observed and fitted spike densities, ȳ is the mean of the observed values, and m is the number of observations.
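The two goodness-of-fit indicators, formulas (11) and (12), can be sketched together (function name ours):

```python
import math

def r2_rmse(y_true, y_pred):
    # Coefficient of determination and root mean square error for
    # evaluating the fitted spike-density curves.
    m = len(y_true)
    mean = sum(y_true) / m
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / m)
```

A perfect fit gives R² = 1 and RMSE = 0; R² falls and RMSE grows as the fitted curve departs from the observations.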
Loss values and accuracy changed during model training (Fig. S6). The loss value decreased as the training epochs increased: during the first 30 epochs it decreased rapidly, and after 80 epochs it remained stable in the range of 0.05 to 0.1. After 204 epochs, training was stopped and the model had converged. As the loss decreased, the accuracy of the model on both the training and validation sets increased with the number of training epochs: rapidly in the first 30 epochs, more slowly from 40 to 60 epochs, and stabilizing after 80 epochs. After training was completed, the accuracy of the model reached 97.55% and 96.68% on the training and validation sets, respectively. In summary, the trends of the training loss and of the training- and validation-set accuracies over epochs reflect the performance of the rice spike detection model.
Based on the original Faster R-CNN network model, this study proposes targeted improvement strategies to optimize the rice spike detection model. First, the advanced Inception-ResNet-v2 network replaced VGG16 as the backbone of Faster R-CNN (strategy 1); second, the FPN replaced the single-scale feature map and was merged with the RPN to generate candidate regions at different scales (strategy 2); then, in the detection sub-network, the RoI Align module replaced the RoI pooling quantization operation (strategy 3) and DIoU replaced IoU as the indicator for NMS (strategy 4). This study verified the optimization effect on model performance by adding the improvement strategies one by one (Table 1).
Table 1 Optimization effect of rice spike detection model performance.
First, data enhancement not only expanded the number of data samples but also increased the diversity of the data, allowing the network model to be better trained and greatly increasing the detection accuracy, by 12.92% mAP. Second, replacing the VGG16 network with the advanced Inception-ResNet-v2 network increased the detection accuracy of the model by 5.82% mAP. This is because the Inception module of the Inception-ResNet-v2 network replaces the fully connected layer with a global mean pooling layer to reduce the number of parameters and parallelizes convolutional kernels of different sizes to capture receptive fields of different sizes, while residual connections provide shortcuts in the model; thus, deeper networks can be trained to produce better performance. Third, the detection accuracy of the Faster R-CNN model fused with the FPN improved greatly, by 13.73% mAP, and the detection accuracy for each of the three types of rice spikes was substantially improved, especially for the smaller target rice spikes (ripe 1), reflecting the importance of multi-scale feature fusion for detecting rice spikes of diverse sizes. Fourth, in the detection sub-network, using the RoI Align module instead of the RoI pooling quantization operation was beneficial for increasing the extraction accuracy of the RoI, and using DIoU instead of IoU as the indicator for NMS was beneficial for increasing the regression speed and accuracy of the bounding boxes; these increased the model detection accuracy by 2.74% and 5.18% mAP, respectively.
To further verify the superiority of the rice spike detection model, its detection accuracy was compared with that of the YOLOv4 model (Table 2).
Because the feature extraction layer of YOLOv4 adopts a feature pyramid down-sampling structure and the mosaic data enhancement method during training, it showed good results (89.07% mAP) for the detection of small target rice spikes. In view of the particularity of the dataset, optimization strategies were designed to improve the Faster R-CNN network, resulting in 92.47% mAP for rice spike detection, a large improvement over the original Faster R-CNN model without the improvement strategies (40.96% mAP) and 3.40% higher than the YOLOv4 model. In addition, on the test set, although the average detection speed of the YOLOv4 model is about 4.6 times that of the rice spike detection model developed in this study, the average detection time of the rice spike detection model is about 0.2 s, a waiting time with negligible impact on user experience.
This study compares the detection results of the rice spike detection model, the original Faster R-CNN model, and the YOLOv4 model in two scenarios: greenhouse pot (Fig. S7) and field (Fig. S8) samples. Because of the large image size, selected details of the detection results of the three models were compared and analyzed. First, for the identification of small target rice spikes, the original Faster R-CNN model has a severe spike leakage problem (vs 1, vs 6), and both the original Faster R-CNN model and the YOLOv4 model misidentify leaves when the background is similar in color to the rice spikes, whereas the rice spike detection model avoids this problem (vs 2 & vs 4, vs 6 & vs 7). Second, when rice spikes are more than 50% occluded, both the original Faster R-CNN model and the YOLOv4 model ignore them, whereas the rice spike detection model fully detects them (vs 1 & vs 3, vs 6 & vs 7). Thus, partial occlusion of rice spikes by background leaves and partial overlap among rice spikes do not affect the detection accuracy of the rice spike detection model. Third, when spikes are smaller and brighter in color, the YOLOv4 model incorrectly identifies them as background, whereas the rice spike detection model still detects them correctly (vs 3, vs 7). The detection boxes of the rice spike detection model also surround the rice spikes more tightly and have higher confidence scores (vs 5). For clearer comparison of the three models, the examples given for the large-field scenario omit the confidence scores of the rice spike detection results to avoid occlusion.
Both scenarios in this study employed natural lighting conditions, and time-series images spanning the growth phases of the rice spike had to be acquired periodically. Owing to the uncertainty of weather conditions, the brightness and sharpness of the acquired images differed between sunny and cloudy days and between morning and afternoon hours. To investigate whether diverse lighting conditions affected the accuracy of the rice spike detection model, the 348 images in the test set were divided into strong and weak lighting groups based on variation in lighting intensity (Table 3). The recall rate (R) under weak lighting conditions was slightly lower than that under strong lighting conditions, but the difference was not significant. The detection precision (P) was as high as 97.21% for both groups, indicating that lighting conditions had little effect on the detection precision of the model.
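The per-group precision and recall reported above follow the standard definitions from matched detections. A minimal sketch, assuming aggregate true-positive (TP), false-positive (FP), and false-negative (FN) counts are available per lighting group (the counts below are illustrative, not the paper's data):

```python
# Hypothetical sketch: computing precision (P) and recall (R) per lighting
# group from aggregate TP/FP/FN counts obtained by matching detections to
# ground-truth rice spike boxes.

def precision_recall(tp, fp, fn):
    """Return (precision, recall) from aggregate TP/FP/FN counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

# Illustrative aggregate counts for the two lighting groups (made-up numbers).
groups = {
    "strong": {"tp": 1925, "fp": 55, "fn": 120},
    "weak":   {"tp": 1830, "fp": 52, "fn": 150},
}
for name, c in groups.items():
    p, r = precision_recall(c["tp"], c["fp"], c["fn"])
    print(f"{name}: P = {p:.2%}, R = {r:.2%}")
```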
Table 2 Comparison of the detection accuracy of the rice spike detection model in this study with that of the YOLOv4 model.
Table 4 Comparison results of automatic identification (AI) and manual recording (MR) of rice spikes with multiple developmental stages.
Fig. 3. Curves of image spike density versus development days for each type of rice spike image. (A) Image sequence of camera 1 for the 2020 late rice potted scenario. (B) Image sequence of field 2 for the 2020 late rice field scenario. (C) Image sequence of field 1 for the 2021 early rice field scenario. Ripe 1 denotes rice spikes at the heading-flowering stage, ripe 2 those at the milky maturity stage, and ripe 3 those at the full maturity stage.
Fig. 4. Variation in the daily increment of spike density for each type of rice spike image. (A) Image sequence of camera 1 for the 2020 late rice potted scenario. (B) Image sequence of field 2 for the 2020 late rice field scenario. (C) Image sequence of field 1 for the 2021 early rice field scenario. Ripe 1 denotes rice spikes at the heading-flowering stage, ripe 2 those at the milky maturity stage, and ripe 3 those at the full maturity stage.
Because running detection on all the daily rice images would require much processing time, three images were collected online each day at 9:00 AM (under good lighting conditions), one at each of three moments spaced 10 min apart. The images were input into the rice spike detection model for automatic counting of the various rice spike types, and the daily spike density of each image type was calculated by averaging over the three images.
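The daily averaging step above can be sketched as follows, assuming per-image spike counts from the detector and a monitored ground area S (introduced later in the paper as the normalizer for image spike density); the function name and values are illustrative:

```python
# Sketch of the assumed workflow: three images taken 10 min apart at 9:00 AM
# are run through the detector, the per-image spike counts are averaged, and
# the mean count is divided by the monitored area S to give the daily density.

def daily_spike_density(counts, area_s):
    """Average the per-image spike counts and normalize by monitored area S."""
    return (sum(counts) / len(counts)) / area_s

# e.g. three 9:00 AM detections of one spike type on one day (illustrative):
density = daily_spike_density([42, 45, 48], area_s=1.5)
print(density)  # mean count 45 over 1.5 units of area -> 30.0
```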
The curves of detected image spike density for each image type, and their relationship with development date, are presented for the image sequences of camera 1 in the 2020 late rice potted scenario (Fig. 3A), field plot 2 in the 2020 late rice field scenario (Fig. 3B), and field plot 1 in the 2021 early rice field scenario (Fig. 3C). First, the fitted spike densities of the various image types agreed well with the true values, with high coefficients of determination and low root mean square errors, indicating that the fitted curves captured the variation in spike density with development date. Second, for the three types of rice spikes with differing maturity levels across seasons and monitoring areas, image spike density followed the same trend with development date: ripe 1 and ripe 2 followed a bell-shaped variation pattern and ripe 3 an S-shaped pattern.
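A minimal sketch of such curve fitting and goodness-of-fit evaluation is shown below. The paper does not state its exact fitting functions; a Gaussian (bell-shaped, for ripe 1/ripe 2) and a logistic (S-shaped, for ripe 3) are assumed here for illustration, with synthetic data standing in for the real density series:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(t, a, mu, sigma):
    """Bell-shaped density curve (assumed form for ripe 1 / ripe 2)."""
    return a * np.exp(-((t - mu) ** 2) / (2 * sigma ** 2))

def logistic(t, k, t0, r):
    """S-shaped saturating density curve (assumed form for ripe 3)."""
    return k / (1.0 + np.exp(-r * (t - t0)))

# Synthetic daily densities over 30 development days (illustrative only).
t = np.arange(30, dtype=float)
rng = np.random.default_rng(0)
y_bell = gaussian(t, 30.0, 15.0, 4.0) + rng.normal(0.0, 0.5, t.size)

popt, _ = curve_fit(gaussian, t, y_bell, p0=[25.0, 12.0, 5.0])
y_fit = gaussian(t, *popt)

# Goodness of fit: coefficient of determination R^2 and RMSE.
ss_res = float(np.sum((y_bell - y_fit) ** 2))
ss_tot = float(np.sum((y_bell - y_bell.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot
rmse = float(np.sqrt(ss_res / t.size))
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```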
The initial developmental stage of rice spikes can be identified once a sufficient number of rice spikes is detected. To determine the onset of the HF, MM, and FM stages, experimental analysis of the fitted image spike density curves for the three maturity categories yielded the following rule: if the spike density of a given type in that day's image increased by more than 2 times relative to the previous day, the fitted spike density curve showed a rapid upward trend for the first time, and that date was taken as the beginning of the developmental stage corresponding to that rice spike type. The variation in the daily increment of spike density for the three rice spike types in the above image sequences is shown in Fig. 4A-C, with the threshold marked by a red line.
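The stated decision rule can be expressed directly on a daily density series. A minimal sketch, with an illustrative function name and illustrative data:

```python
# Sketch of the stated rule: a stage begins on the first day whose image spike
# density exceeds 2x the previous day's density, i.e. the point where the
# fitted curve first rises steeply.

def stage_onset_day(densities, ratio=2.0, eps=1e-6):
    """Return the index of the first day whose density exceeds `ratio` times
    the previous day's density (eps guards against zero baselines),
    or None if the threshold is never crossed."""
    for day in range(1, len(densities)):
        if densities[day] > ratio * (densities[day - 1] + eps):
            return day
    return None

# Illustrative daily densities for one spike type: onset at day 3 (0.2 -> 0.9).
print(stage_onset_day([0.1, 0.15, 0.2, 0.9, 2.5, 4.0]))  # -> 3
```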
Seven image sequences from the remaining two scenarios in 2020-2021 were used to test the automatic identification of MDSRS. The identification results obtained with the proposed automatic identification algorithm and by manual recording are shown in Table 4. The comparison was used to verify the effectiveness of the proposed algorithm in identifying multiple developmental stages of rice spikes.
The automatic identification results for the three rice spike developmental stages were compared with manual observations (Table 4). The maximum error at each stage was no more than two days, indicating that the automatic identification algorithm reliably reported the early developmental stages of rice spikes.
In conclusion, the rice spike detection model based on improved Faster R-CNN is reliable and accurate. Its mAP reached 92.47%, a large improvement over the original Faster R-CNN model (40.96% mAP) and 3.4% higher than the YOLOv4 model. The model showed only small fluctuations when detecting various rice spikes under diverse lighting conditions, and its detection accuracy was as high as 98.01%. Compared with manual observation, the model yielded only a 0.7-1.1 day error in identifying the initiation dates of the HF, MM, and FM stages of rice. However, the present results were obtained with only one rice variety and await validation with more varieties. We characterized the image spike density of the various rice spike types by introducing the monitoring area S to estimate the developmental stages of rice spikes of diverse maturities. To increase the generalizability of the algorithm, spike images from multiple planting densities should also be collected.
CRediT authorship contribution statement
Yuanqin Zhang: Methodology, Software, Validation, Writing - original draft, Writing - review & editing. Deqin Xiao: Conceptualization, Supervision, Project administration, Funding acquisition. Youfu Liu: Software, Visualization. Huilin Wu: Validation, Formal analysis.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the Key-Area Research and Development Program of Guangdong Province (2019B020214005) and Agricultural Research Project and Agricultural Technology Promotion Project of Guangdong (2021KJ383).
Appendix A. Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.06.004.