Temporal sequence Object-based CNN (TS-OCNN) for crop classification from fine resolution remote sensing image time-series

2022-10-12 09:31:34

The Crop Journal, 2022, Issue 5

Huapeng Li, Yajun Tian, Ce Zhang, Shuqing Zhang, Peter M. Atkinson

a Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130012, Jilin, China

b Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK

c Faculty of Science and Technology, Lancaster University, Lancaster LA1 4YR, UK

Keywords: Convolutional neural network; Multi-temporal imagery; Object-based image analysis (OBIA); Crop classification; Fine spatial resolution imagery

ABSTRACT Accurate crop distribution mapping is required for crop yield prediction and field management. Due to rapid progress in remote sensing technology, fine spatial resolution (FSR) remotely sensed imagery now offers great opportunities for mapping crop types in great detail. However, within-class variance can hamper attempts to discriminate crop classes at fine resolutions. Multi-temporal FSR remotely sensed imagery provides a means of increasing crop classification accuracy, although current methods do not exploit the available information fully. In this research, a novel Temporal Sequence Object-based Convolutional Neural Network (TS-OCNN) was proposed to classify agricultural crop type from FSR image time-series. An object-based CNN (OCNN) model was adopted in the TS-OCNN to classify images at the object level (i.e., segmented objects or crop parcels), thus maintaining the precise boundary information of crop parcels. The combination of image time-series was first utilized as the input to the OCNN model to produce an ‘original’ or baseline classification. Then the single-date images were fed automatically into the deep learning model scene-by-scene in order of image acquisition date to increase successively the crop classification accuracy. By doing so, the joint information in the FSR multi-temporal observations and the unique individual information from the single-date images were exploited comprehensively for crop classification. The effectiveness of the proposed approach was investigated using multitemporal SAR and optical imagery, respectively, over two heterogeneous agricultural areas. The experimental results demonstrated that the newly proposed TS-OCNN approach consistently increased crop classification accuracy, and achieved the greatest accuracies (82.68% and 87.40%) in comparison with state-of-the-art benchmark methods, including the object-based CNN (OCNN) (81.63% and 85.88%), object-based image analysis (OBIA) (78.21% and 84.83%), and standard pixel-wise CNN (79.18% and 82.90%). The proposed approach is the first known attempt to explore simultaneously the joint information from image time-series with the unique information from single-date images for crop classification using a deep learning framework. The TS-OCNN, therefore, represents a new approach for agricultural landscape classification from multi-temporal FSR imagery. Besides, it is readily generalizable to other landscapes (e.g., forest landscapes), with a wide application prospect.

1. Introduction

Accurate information about cropland distribution is very important for estimation of crop production [1], management of farmland [2] and assessment of crop-associated environmental impacts [3]. Besides, information on crop distribution is needed to support agrarian policy actions associated with agro-environmental measurements [4]. Remote sensing has become a popular means of monitoring crops owing to its unique advantages over traditional field survey methods, such as its synoptic view and repeat acquisition capability [5-8]. Moreover, crop distribution maps generated from remote sensing imagery are consistent and comparable, which is especially beneficial to long-term analysis of cropping systems [9]. The spatial resolution of remotely sensed imagery has a great influence on crop classification detail and accuracy. Mulla [10] demonstrated that a spatial resolution finer than 10 m is generally required for precision agriculture since agricultural landscapes are usually narrow, highly fragmented and heterogeneous. Fortunately, fine spatial resolution (FSR) remotely sensed imagery from sensors onboard both satellite (e.g., RapidEye) and airborne [e.g., Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR)] platforms is now commercially available, which offers huge opportunities for detailed and accurate crop monitoring and mapping [11,12]. However, crop classification from FSR imagery is very challenging because of the large intra-class variance [13], which is mainly due to variation in local elevation and relief, flow accumulation, soil composition, and management practice [5].

The use of multitemporal imagery over single-date imagery is a major means of increasing the accuracy of crop classification. This is because seasonal differences in the growth processes of different crop types can provide useful discrimination information, e.g., corn and soybean have different senescence phases [8]. Several previous studies have demonstrated the advantages of image time-series over single-date images for crop mapping and classification. Wardlow and Egbert [5] used a decision tree to classify crop classes in the state of Kansas with multi-temporal images spanning the growing season and achieved classification accuracies greater than 80%. Jiao et al. [14] showed that the time of image acquisition is crucial to crop classification accuracy. Zhong et al. [15] differentiated corn and soybean using the random forest algorithm and found that input sets containing phenological metrics achieved the greatest classification accuracy. Li et al. [8] illustrated that crop types could be completely separated from each other with UAVSAR time-series spanning the whole growing season. Most of these studies classified crop types with phenological metrics or temporal features extracted from image time-series using threshold-based methods (e.g., time of peak biomass [16]) or predefined models (e.g., Fourier transform [17]). In essence, the extracted features are hand-crafted via feature engineering, which relies heavily on user experience and domain knowledge. Consequently, these manual feature engineering models might be effective for some specific tasks, but it is hard to generalize them to other applications. In addition, the spatial configurations of crop fields can be very difficult to hand-code into features using predefined rules or models [13]. Therefore, advanced expert-free data-driven models are urgently needed to extract features from image time-series automatically.

Recently, deep learning, a breakthrough technology in the field of computer vision and machine learning, has attracted increasing attention due to its capability to learn representative features in an end-to-end fashion [18]. The Convolutional Neural Network (CNN), as one of the most popular network architectures, has achieved impressive, state-of-the-art results in a variety of domains, such as speech recognition [19], object detection [20] and visual recognition [21]. Owing to its superiority in high-level spatial feature representations, the CNN has demonstrated great potential and achieved great success in a diverse set of remote sensing applications, such as change detection [22], urban functional zone division [23] and image classification [24]. Recently, efforts have been devoted to increasing the accuracy of crop classification using CNNs [25,26]. For example, Zhong et al. [27] proposed a novel one-dimensional CNN framework for crop classification based on multi-temporal remotely sensed images. Sidike et al. [13] constructed a deep Progressively Expanded Network (dPEN), which allows the network to go deeper for accurate heterogeneous agricultural landscape mapping using WorldView-3 imagery. Ji et al. [28] designed a 3D CNN embedded with a channel attention module to classify crop types from multi-temporal fine-resolution satellite sensor images. Li et al. [29] presented a hybrid CNN-transformer approach to mine temporal patterns from image time-series for crop classification. In most of the above-mentioned studies, the multitemporal remotely sensed images were stacked and used directly as the input to the CNNs. A major drawback of this is that the unique and useful information about crop differentiation from single-date observations might be ignored. Besides, the pixel-wise CNN was adopted in these studies, which often generates blurred boundaries between crop fields due to the requirement for an input window or patch [26,30]. These two issues greatly impair the accuracy of CNN-based crop classification from multi-temporal images.

The purpose of this research was to develop an approach that is able to learn fully discriminative features from image time-series automatically. A novel Temporal Sequence Object-based CNN (TS-OCNN) for crop classification was proposed, in which the combination of multi-temporal imagery was first used as the input to the CNN and then the single-date images in the time-series were fed automatically into the deep CNN model scene by scene following a forward temporal sequence (FTS) from early to late acquisition. In the TS-OCNN, an object-based CNN (OCNN) was adopted to classify crop types at the object level to maintain the precise crop parcel boundaries. The effectiveness of the proposed TS-OCNN approach was tested on two crop-rich agricultural areas, with FSR multitemporal SAR and optical images, respectively.

2. The proposed temporal sequence Object-based CNN method

2.1. Convolutional neural network (CNN)

The CNN, a variant of the multilayer feed-forward neural network, involves a cascade of convolutional and pooling layers which are able to learn features at deep and abstract levels [21]. By using convolutional filters, each convolutional layer transforms its input into an output that is used as the input of the next layer. An activation function is applied to the output of each convolutional layer to introduce non-linearity. A pooling layer is designed to further generalize the convolved features by reducing the resolution of the input [31]. A fully connected layer is placed on top of the last pooling layer. The parameters (i.e., weights and biases) of the CNN are optimized using stochastic gradient descent.
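The convolution, activation and pooling steps described above can be sketched in a few lines of NumPy. This is only an illustration of the forward pass for a single band and a single filter: the function names, the 3 × 3 averaging kernel and the toy input are assumptions, and no training is shown.

```python
import numpy as np


def conv2d(x, kernel, bias=0.0):
    """'Valid' 2-D cross-correlation of a single-band image with one filter."""
    windows = np.lib.stride_tricks.sliding_window_view(x, kernel.shape)
    return (windows * kernel).sum(axis=(-2, -1)) + bias


def relu(x):
    """Activation applied to the convolved output to add non-linearity."""
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Non-overlapping max-pooling, which reduces the spatial resolution."""
    rows = x.shape[0] // size * size
    cols = x.shape[1] // size * size
    blocks = x[:rows, :cols].reshape(rows // size, size, cols // size, size)
    return blocks.max(axis=(1, 3))


# Toy input (assumption): a 6 x 6 single-band image and a 3 x 3 averaging filter.
x = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0
feature_map = relu(conv2d(x, kernel))   # shape (4, 4)
pooled = max_pool(feature_map, size=2)  # shape (2, 2)
print(pooled.shape)
```

In a real CNN many such filters are learned per layer and the pooled maps feed the next layer; the sketch only shows how one layer changes the shape and content of its input.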

2.2. Object-based convolutional neural networks (OCNN)

The object-based CNN was originally proposed to solve the complex land use classification task [32]. Similar to the standard pixel-wise CNN (PCNN), the OCNN is trained with labelled patches. However, unlike the PCNN, which predicts the label of each pixel across the entire image, the OCNN places an image patch at the centroid of each object to classify the segmented objects [26]. Essentially, the OCNN framework is a hybrid method combining the OBIA and CNN techniques, which not only significantly increases the computational efficiency, but also maintains the precise boundary information of each object (e.g., crop parcel) [30]. The predictions made by the OCNN for each segmented object form the final thematic classification map.
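The centroid-patch idea can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `centroid_patches`, the band-first array layout, the reflect padding at image borders and the toy arrays are all assumptions.

```python
import numpy as np


def centroid_patches(image, labels, patch_size):
    """Cut one patch per segmented object, centred on the object's centroid.

    image  : (bands, rows, cols) array
    labels : (rows, cols) integer array of object ids
    Returns {object_id: (bands, patch_size, patch_size) array}.
    """
    half = patch_size // 2
    # Assumption: reflect-pad so that patches near the image border stay full-sized.
    padded = np.pad(image, ((0, 0), (half, half), (half, half)), mode="reflect")
    patches = {}
    for obj_id in np.unique(labels):
        rows, cols = np.nonzero(labels == obj_id)
        r = int(round(rows.mean()))
        c = int(round(cols.mean()))
        # Pixel (r, c) of the original image sits at (r + half, c + half) in the
        # padded image, so a window starting at (r, c) is centred on it.
        patches[int(obj_id)] = padded[:, r:r + patch_size, c:c + patch_size]
    return patches


# Toy data (assumption): a 2-band 5 x 6 image split into two objects.
image = np.arange(2 * 5 * 6, dtype=float).reshape(2, 5, 6)
labels = np.zeros((5, 6), dtype=int)
labels[:, 3:] = 1
patches = centroid_patches(image, labels, patch_size=4)
print({k: v.shape for k, v in patches.items()})
```

Each object then needs a single CNN prediction. For scale, using figures reported in Section 3, S1 contains 3474 × 2250 = 7,816,500 pixels but only 3040 segmented objects, so object-level prediction evaluates roughly 2,570 times fewer patches than dense pixel-wise prediction.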

2.3. Temporal sequence Object-based convolutional neural network (TS-OCNN)

Suppose n scenes of multi-temporal remotely sensed images covering the study area are available, with m classes to be classified. Let $M = (M_1, M_2, \ldots, M_i, \ldots, M_n)$ denote the set of multitemporal images, where $M_i$ is the i-th image in the temporal sequence and n is the total number of multi-temporal images. Note that the multi-temporal images have the same spatial extent and spatial resolution so that they can be overlaid spatially at the pixel level. Let $O = (o_1, o_2, \ldots, o_j, \ldots, o_u)$ be the set of segmented objects from M, where $o_j$ is the j-th object and u is the total number of objects. Let $T = (t_1, t_2, \ldots, t_k, \ldots, t_v)$ be the set of training samples, where $t_k$ is the k-th sample and v is the total number of samples. Note that T is employed to train the OCNN model, which estimates the classification probability per object at each iteration of the iterative process.

The proposed TS-OCNN method is designed to explore fully the distinctive and useful information hidden in image time-series for crop classification. The basic assumption of the TS-OCNN is that the multi-temporal remotely sensed images are correlated with each other, and that the classification result (X) of the i-th image in the temporal sequence is conditional upon the output of the (i-1)-th image, which formulates a Markov process as follows:

$$P(X)_i = P\left(X_i \mid X_{i-1}\right) \tag{1}$$

where i denotes the number of ‘iterations’ within the Markov process and $P(X)_i$ represents the classification probabilities of the i-th iteration.

The general procedure of the TS-OCNN approach is illustrated in Fig. 1, in which crop classifications are refined gradually along the temporal sequence of the image time-series. The methodology of the TS-OCNN is detailed below.

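In order to exploit the joint information in the image time-series, the multitemporal images are first combined spatially ($M_{\mathrm{stack}}$) and used as the input of the original OCNN ($\mathrm{OCNN}_{\mathrm{ori}}$). The training process of the model can be represented as follows:

$$\mathrm{OCNN}_{\mathrm{ori}} = \mathrm{Train}\left(M_{\mathrm{stack}}, T\right) \tag{2}$$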

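The original classification probabilities $P(X)_{\mathrm{ori}}$ are calculated with the trained OCNN model as follows:

$$P(X)_{\mathrm{ori}} = \mathrm{OCNN}_{\mathrm{ori}}.\mathrm{Predict}\left(M_{\mathrm{stack}}, O\right) \tag{3}$$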

On the basis of the original classification (ORC), to further exploit the unique individual information from single-date images, images in the time-series are fed into the OCNN model scene-by-scene. That is, a new image scene in the time-series is fed into the model at each iteration, and the number of iterations is, thus, equal to the number of multitemporal images. Specifically, at the i-th step, where i ≥ 1, the classification probabilities of the previous iteration, $P(X)_{i-1}$ (taken as $P(X)_{\mathrm{ori}}$ if i = 1), and the i-th image ($M_i$) in the temporal sequence are spatially combined as conditional probabilities for image classification as:

$$M_i^{\mathrm{new}} = \mathrm{Combine}\left(P(X)_{i-1}, M_i\right) \tag{4}$$

where Combine represents a function that combines the i-th image $M_i$ with the probabilities generated at the previous iteration. In other words, the function combines spatially the bands contained in $P(X)_{i-1}$ produced at the previous iteration with those in the i-th image ($M_i$) as the input for the current iteration.
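As a minimal sketch of what such a Combine step could look like, the probability bands and the image bands can be concatenated along the band axis. The band-first array layout and the function name `combine` are assumptions; the paper does not give an implementation.

```python
import numpy as np


def combine(prev_probs, image):
    """Stack the class-probability bands of the previous iteration with the
    bands of the current single-date image (band-first layout assumed)."""
    if prev_probs.shape[1:] != image.shape[1:]:
        raise ValueError("probabilities and image must share the same spatial extent")
    return np.concatenate([prev_probs, image], axis=0)


# Toy example: 10 class-probability bands (as for the 10 classes in S1) and a
# 3-band single-date image (HH, HV, VV) on a small 4 x 5 grid.
probs = np.full((10, 4, 5), 0.1)
image = np.zeros((3, 4, 5))
stacked = combine(probs, image)
print(stacked.shape)  # (13, 4, 5)
```

The combined dataset therefore has as many bands as there are classes plus the bands of the single-date image, which is why the images must share the same extent and resolution.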

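With the newly generated dataset from Eq. (4), denoted $M_i^{\mathrm{new}}$, as input, the OCNN model at the i-th iteration ($\mathrm{OCNN}_i$) is trained with the training samples (T) as follows:

$$\mathrm{OCNN}_i = \mathrm{Train}\left(M_i^{\mathrm{new}}, T\right) \tag{5}$$

Note that the OCNN model is trained from scratch at each iteration. The trained OCNN model is subsequently used to predict the classification probabilities as:

$$P(X)_i = \mathrm{OCNN}_i.\mathrm{Predict}\left(M_i^{\mathrm{new}}, O\right) \tag{6}$$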

Based on Eq. (6), the probability of each segmented object being assigned to each class is predicted within each iteration. The spatial extent of $P(X)_i$ is equal to that of any image in M, and the dimension of $P(X)_i$ (i.e., the number of bands) equals the number of classes, with each band corresponding to the probabilities of a specific class.
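One way to picture this probability raster is to broadcast each object's class probabilities to all of its pixels. The sketch below assumes that object ids run from 0 to u-1 and that per-object probabilities are held in an array of shape (objects, classes); both conventions and the function name are illustrative assumptions.

```python
import numpy as np


def probabilities_to_raster(object_probs, labels):
    """object_probs: (n_objects, n_classes); labels: (rows, cols) with ids 0..n_objects-1.

    Returns an (n_classes, rows, cols) raster in which every pixel carries the
    class probabilities of the object it belongs to.
    """
    return np.moveaxis(object_probs[labels], -1, 0)


# Toy example (assumption): three objects, two classes, on a 2 x 3 grid.
labels = np.array([[0, 0, 1],
                   [2, 2, 1]])
object_probs = np.array([[0.7, 0.3],
                         [0.2, 0.8],
                         [0.5, 0.5]])
raster = probabilities_to_raster(object_probs, labels)
print(raster.shape)  # (2, 2, 3): one band per class, same extent as the image
```

Because the raster shares the extent of the input images, it can be stacked directly with the next single-date image in the sequence.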

The thematic classification ($TC_i$) of each iteration can be acquired using the corresponding probabilities ($P(X)_i$) as:

$$TC_i = \arg\max\left(P(X)_i\right) \tag{7}$$

where arg max is a function classifying each object as the class with the maximum membership, and i denotes the number of iterations.

Fig. 1. The proposed TS-OCNN methodology.

A total of n classification maps (where n denotes the total number of multitemporal images) were generated over the iterations. The classification accuracy of the thematic classification was assessed at each iteration, and the classification with the highest accuracy amongst the n classifications was selected as the final thematic classification ($TC_{\mathrm{final}}$) of the TS-OCNN approach.
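The whole iterative procedure can be summarized in a short, runnable sketch. To keep it self-contained, the OCNN is replaced by a simple nearest-class-mean classifier operating on per-object feature vectors; that stand-in, the names `NearestMean` and `ts_ocnn`, the synthetic data and the scoring function are all assumptions, so the sketch shows the control flow only and not the authors' model.

```python
import numpy as np


class NearestMean:
    """Stand-in for the OCNN (an assumption made to keep the sketch runnable):
    a nearest-class-mean classifier that outputs class probabilities."""

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([x[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, x):
        # Squared distance from every object to every class mean.
        d = ((x[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        e = np.exp(-(d - d.min(axis=1, keepdims=True)))
        return e / e.sum(axis=1, keepdims=True)


def ts_ocnn(dates, train_idx, y_train, score):
    """dates: list of (n_objects, n_features) arrays ordered by acquisition date.

    Returns (best_iteration, best_labels, scores); scores[0] belongs to the
    original classification obtained from the stacked time-series.
    """
    stacked = np.concatenate(dates, axis=1)                 # joint information
    model = NearestMean().fit(stacked[train_idx], y_train)
    probs = model.predict_proba(stacked)                    # original probabilities
    maps = [model.classes_[probs.argmax(axis=1)]]
    for m_i in dates:                                       # forward temporal sequence
        x = np.concatenate([probs, m_i], axis=1)            # previous probabilities + date i
        model = NearestMean().fit(x[train_idx], y_train)    # retrained from scratch
        probs = model.predict_proba(x)
        maps.append(model.classes_[probs.argmax(axis=1)])   # thematic classification
    scores = [score(labels) for labels in maps]
    best = 1 + int(np.argmax(scores[1:]))                   # best of the n iterations
    return best, maps[best], scores


# Synthetic data (assumption): 40 objects, 2 classes, 3 acquisition dates.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
dates = [rng.normal(size=(40, 2)) + shift * y[:, None] for shift in (0.5, 1.0, 2.0)]
train_idx = np.arange(0, 40, 2)
best, labels, scores = ts_ocnn(dates, train_idx, y[train_idx],
                               lambda lab: float((lab == y).mean()))
print(best, [round(s, 3) for s in scores])
```

In the sketch the score is computed over all objects for brevity; in the paper accuracy is assessed on independent test polygons.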

3. Study materials

3.1. Study area and data

The Sacramento Valley, lying in the north of California, USA, was selected as a case study area. The Valley is an important agricultural area in the state of California [33] and accounts for about one quarter of the state’s organic hectarage. Two typical agricultural sites (S1 and S2) within the Valley with distinctive heterogeneous crop compositions were intentionally selected to investigate the effectiveness of the proposed method (Fig. 2).

In S1, four scenes of fully polarimetric L-band Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) images were captured in 2011 on June 16, July 20, August 29, and October 3. The UAVSAR data selected are in the ground-range projected (GRD, georeferenced) format, with a fine spatial resolution of 5 m and a spatial extent of 3474 × 2250 pixels. Three linear polarizations (namely HH, HV and VV) were extracted from the original datasets and used for crop classification. S1 is a mixture of fruit crops, summer crops and forage crops, and consists of 10 crop classes, namely walnut (Juglans nigra), almond (Amygdalus communis L.), alfalfa (Lotus corniculatus L.), grass, clover (Trifolium repens L.), winter wheat (denoted as wheat hereafter, Triticum aestivum L.), corn (Zea mays L.), sunflower (Helianthus annuus L.), tomato (Lycopersicon esculentum Mill.), and pepper (Capsicum annuum L.) (Table S1).

In S2, three scenes of optical RapidEye images were acquired in 2016 on May 30, July 10, and September 7. Each scene consists of five bands: blue, green, red, red edge, and near infrared. The RapidEye images were Level 3A ortho products, with sensor, radiometric and geometric corrections already applied. The spatial size of each image is 3222 × 2230 pixels with a fine spatial resolution of 5 m. A total of nine crop types were identified throughout this area, including walnut, almond, fallow, alfalfa, wheat, corn, sunflower, tomato, and cucumber (Cucumis sativus L.) (Table S1).

Sample points were acquired using a stratified random scheme according to the Cropland Data Layer (CDL). The CDL is generated by the United States Department of Agriculture (USDA), and has been employed widely as the ground reference dataset in a wide range of crop classification studies [25,26,34-37] because of its very high accuracy [38]. To collect representative samples, crop parcels within the two study areas with an area larger than 5 ha were selected and delineated manually according to the CDL datasets [8]. To ensure that training, validation and test data are completely independent, the digitized polygons were split randomly into a 50% subset for training and validation (for hyperparameter tuning) and the other 50% for testing. A stratified random sampling scheme was adopted to collect training and validation sample points from training and validation polygons, respectively. To ensure that the CNNs can learn sufficiently representative features from the input, the training sample size was kept above an average of 200 per crop category. In total, 2560 and 1900 training sample points were collected for S1 and S2, respectively (Table S1). To evaluate the classifications comprehensively, wall-to-wall assessment was used for both sites. In other words, all pixels within the testing polygons were utilized for accuracy assessment.
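The polygon-level split followed by stratified point sampling can be sketched with the Python standard library. The data structures (polygons as `(id, crop)` tuples, pixels as coordinate tuples), the function names and the fixed seeds are assumptions for illustration; the paper does not describe how the split was implemented.

```python
import random
from collections import defaultdict


def split_polygons(polygons, train_fraction=0.5, seed=0):
    """Randomly split polygons so that no polygon contributes to both subsets.

    polygons: list of (polygon_id, crop_class) tuples.
    """
    rng = random.Random(seed)
    shuffled = polygons[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def stratified_sample(pixels_by_polygon, polygons, per_class, seed=0):
    """Draw up to `per_class` random sample points for each crop class,
    using only pixels that fall inside the given polygons."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for poly_id, crop in polygons:
        pool[crop].extend(pixels_by_polygon[poly_id])
    return {crop: rng.sample(px, min(per_class, len(px))) for crop, px in pool.items()}


# Toy data (assumption): 10 polygons of two crops, 50 pixels each.
polygons = [(i, "corn" if i % 2 else "wheat") for i in range(10)]
pixels_by_polygon = {i: [(i, k) for k in range(50)] for i in range(10)}
train_polys, test_polys = split_polygons(polygons)
samples = stratified_sample(pixels_by_polygon, train_polys, per_class=20)
print({crop: len(points) for crop, points in samples.items()})
```

Splitting at the polygon level, rather than at the pixel level, is what keeps neighbouring pixels of the same parcel from appearing in both the training and the test data.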

3.2. TS-OCNN model architecture and parameters

3.2.1. Image segmentation

Remotely sensed imagery first needs to be segmented into objects since the proposed TS-OCNN is implemented on segmented objects. Herein, a multi-resolution segmentation (MRS) algorithm was used to obtain the segmented objects [39]. Note that the combination of multitemporal images was employed as the input of the MRS algorithm in each experiment (study site). The “scale” parameter is the most critical in MRS since it determines directly the average size of the segmented objects [40]. Through cross-validation, the scale parameters were optimized as 30 and 180 for S1 and S2, respectively, to acquire slightly over-segmented results, with the Shape and Compactness parameters set to 0.2 and 0.7, respectively, in both study sites. A total of 3040 and 3876 objects were acquired for S1 and S2, respectively.

Fig. 2. Locations of the two study sites with the remotely sensed images.

Fig. 3. Overall accuracy of the proposed TS-OCNN plotted against iteration for S1 and S2. The number of iterations denotes the position of the image in the time-series sequence. The position zero indicates the original classification (ORC) of the TS-OCNN (see Section 2.3). The dashed line represents the baseline accuracy of the TS-OCNN.

3.2.2. Model architecture and parameters

In the presented TS-OCNN, a standard CNN classifier was adopted to classify each segmented object by taking the centroid of each object as the convolutional point [25,26]. Several hyperparameters need to be optimized for the CNN within the TS-OCNN to maximize classification accuracy. In order to test the transferability of the proposed method, the model architecture and parameters were optimized through cross-validation in S1 and applied directly to S2, as detailed below.

The architecture of the CNN applied in the TS-OCNN was designed to be similar to AlexNet [21], with nine hidden layers alternating between convolution, max-pooling, and batch normalization (Fig. S1). Small filters were applied in the convolutional layers (5 × 5 for the first layer and 3 × 3 for the remaining layers), and the number of filters was tuned to 32 to learn deep feature representations. As suggested by Langkvist et al. [41], the input patch size of the network was chosen from {16 × 16, 24 × 24, 32 × 32, 40 × 40 and 48 × 48}, and 32 × 32 was found to be the optimal size. To tackle the overfitting problem, dropout was applied before the dense layer with an optimized rate of 0.3. Besides, the number of epochs was set to 500 to allow the network to converge.

3.2.3. Benchmark methods and parameter settings

To evaluate the effectiveness of the proposed approach, three typical methods were benchmarked, including traditional object-based image analysis (OBIA), standard pixel-wise CNN (PCNN), and standard object-based CNN (OCNN). Note that the multitemporal remotely sensed images were stacked directly together and used as the input of the three benchmarks in each experiment. To make a fair comparison, the architecture, as well as the hyperparameters (filter size, dropout value, etc.) of the two CNN-based comparators (i.e., PCNN and OCNN), were kept the same as the CNN model within the TS-OCNN (denoted as CNN_TS-OCNN hereafter). The parameters of the benchmark methods are detailed as follows:

OBIA: The OBIA was implemented on the segmented objects acquired by the MRS algorithm in Section 3.2.1. A range of hand-crafted features were acquired from the objects, including spectral features (mean and standard deviation) and texture variables (mean and variance). A parameterized SVM classifier was then adopted to classify the segmented objects with these hand-coded feature representations.

PCNN: The standard pixel-wise CNN classifies each pixel across the entire image using densely overlapping patches. The most important parameter that determines directly the classification accuracy of the PCNN is the input patch size. Herein, the input window size was cross-validated over a range of CNN window sizes, including {8 × 8, 16 × 16, 24 × 24, 32 × 32, and 40 × 40}, and 24 × 24 was found to be the optimal input patch size. The other parameters were identical to those of the CNN_TS-OCNN.

OCNN: Like the proposed TS-OCNN, the OCNN was implemented based on the segmented objects. The difference between the OCNN and the TS-OCNN is the usage of the multi-temporal images. Herein, the multitemporal images were spatially combined and used as the input to the CNN model. The input patch size was set to 32 × 32, chosen from {16 × 16, 24 × 24, 32 × 32, 40 × 40 and 48 × 48} through trial and error. The other parameters were kept the same as for the CNN_TS-OCNN.
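For the OBIA benchmark above, the per-object spectral features (mean and standard deviation of each band) can be computed efficiently with `numpy.bincount`. The sketch covers only these two spectral features; the texture variables and the SVM classifier are omitted, and the function name and array layout are assumptions.

```python
import numpy as np


def object_features(image, labels):
    """Per-object mean and standard deviation of every band.

    image  : (bands, rows, cols) array
    labels : (rows, cols) integer array of object ids
    Returns (object_ids, features) with features of shape (n_objects, 2 * bands):
    the band means first, then the band standard deviations.
    """
    ids, inverse = np.unique(labels.ravel(), return_inverse=True)
    counts = np.bincount(inverse)
    flat = image.reshape(image.shape[0], -1)
    means = np.stack([np.bincount(inverse, weights=b) / counts for b in flat], axis=1)
    mean_sq = np.stack([np.bincount(inverse, weights=b * b) / counts for b in flat], axis=1)
    # Population standard deviation; clip tiny negative values from rounding.
    stds = np.sqrt(np.maximum(mean_sq - means ** 2, 0.0))
    return ids, np.hstack([means, stds])


# Toy example (assumption): one band, two objects.
image = np.array([[[1.0, 3.0, 10.0],
                   [1.0, 3.0, 10.0]]])
labels = np.array([[0, 0, 1],
                   [0, 0, 1]])
ids, feats = object_features(image, labels)
print(ids, feats)
```

Each row of `feats` is the hand-crafted description of one object, which a classifier such as an SVM would then take as input.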

4. Crop classification results

4.1. TS-OCNN classification accuracies

Figure 3 illustrates how the overall accuracy (OA) of the TS-OCNN varies with iteration in both study areas. Note that the number of iterations is equal to the number of images in the image time-series, so there are four and three iterations in S1 and S2, respectively. It can be seen that the classification accuracy of the TS-OCNN in S1 started at 81.63%, then kept increasing with iteration, and reached the highest accuracy of 82.68% at iteration 3, followed by a slight decrease at iteration 4. The classification generated at iteration 3 was therefore selected as the final thematic classification ($TC_{\mathrm{final}}$) for S1. Similarly, the classification accuracy increased gradually through iteration in S2. Specifically, the accuracy started from 85.88%, then increased rapidly along the iterative process (i.e., temporal sequence), and reached a maximum of 87.40% at the third iteration. As a result, the classification of the last iteration was chosen as the final thematic classification for S2.

4.2. TS-OCNN classification results

To illustrate visually how the temporal sequence increased the classification accuracy through iteration, the crop classifications produced by the TS-OCNN in S1 and S2 are shown in Fig. 4A and B, respectively. For each site, three typical subsets at different iterations (1, 2 and 3) are illustrated, with the correct and incorrect class allocations highlighted with yellow and red circles, respectively.

Fig. 4. Typical image subsets of the TS-OCNN classification in S1 (A) and S2 (B). ORC denotes the original classification of the TS-OCNN. Correct and incorrect classifications are highlighted using yellow and red circles, respectively.

For S1, with spatially stacked multitemporal images, the TS-OCNN failed to distinguish grass from alfalfa, as shown by the red circles in the original classification (ORC) (Fig. 4A-a). Besides, parts of alfalfa and walnut were misidentified as clover and almond, respectively (Fig. 4A-b, c), because of their highly similar spectra. These misclassifications were gradually corrected through iteration as the single-date images were fed into the model successively (yellow circles in iterations 1-3). For example, the confusion between alfalfa and grass was rectified step-by-step, and the grass was correctly identified at iteration 3 (yellow circle in Fig. 4A-a). Similarly, the classification errors for walnut and alfalfa were also resolved by iteration, as illustrated by the yellow circles in Fig. 4A-b, c. In short, the proposed TS-OCNN achieved a desirable result by capturing the unique information contained in the single-date images.

Similar to S1, the crop classification results of S2 were refined gradually throughout the temporal sequence. Originally, wheat and sunflower were falsely identified as cucumber and tomato, respectively, as demonstrated by the red circles in the original classification (Fig. 4B-a, b). Besides, classification errors were also found at the boundaries between crop parcels. Such misclassifications were rectified with increasing iteration (Fig. 4B). For example, the crop parcel misclassified as cucumber was rectified to wheat at iteration 2. At the same time, the misclassifications between wheat and almond, and between sunflower and tomato, were also corrected gradually, and were eliminated completely at iteration 3. Moreover, the classification errors near parcel boundaries were removed through the process.

4.3. Benchmark comparison for the crop classification

Accuracy assessment: To assess quantitatively the effectiveness of the classification methods, the proposed TS-OCNN was compared with a range of benchmarks using the overall accuracy (OA), Kappa coefficient (κ), and per-class mapping accuracy. Table 1 shows the accuracy of crop classification for both S1 and S2. As can be observed from the table, the proposed TS-OCNN consistently produced the most accurate results, with an OA of 82.68% and 87.40% for S1 and S2, higher than for the OCNN (81.63% and 85.88%), OBIA (78.21% and 84.83%), and PCNN (79.18% and 82.90%). Similarly, the TS-OCNN produced the greatest Kappa coefficients, reaching 0.80 and 0.85 for S1 and S2, greater than those of the OCNN (0.78 and 0.83), OBIA (0.75 and 0.82), and PCNN (0.76 and 0.79), respectively. Besides, a McNemar test designed for pair-wise comparison further revealed that a significant increase in crop mapping accuracy was achieved by the presented TS-OCNN approach over the OCNN, OBIA, and PCNN, with z-values of 82.17, 158.25, and 123.19 in S1 and 142.88, 117.16, and 192.97 in S2, respectively (Table S2).
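For reference, the three measures named above can be computed as in the following sketch. The overall accuracy and Kappa coefficient follow their standard definitions from a confusion matrix. The paper does not state which form of the McNemar statistic was used, so the z-form without continuity correction shown here is an assumption, as are the function names and toy inputs.

```python
import math


def overall_accuracy(cm):
    """cm[i][j]: number of pixels of reference class i mapped to class j."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(len(cm))) / n
    pe = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))) / (n * n)
    return (po - pe) / (1 - pe)


def mcnemar_z(reference, pred_a, pred_b):
    """z > 0 means classifier A is right more often where the two disagree."""
    a_only = sum(a == r and b != r for r, a, b in zip(reference, pred_a, pred_b))
    b_only = sum(a != r and b == r for r, a, b in zip(reference, pred_a, pred_b))
    if a_only + b_only == 0:
        return 0.0
    return (a_only - b_only) / math.sqrt(a_only + b_only)


# Toy inputs (assumption): a two-class confusion matrix and ten test pixels.
cm = [[40, 10],
      [5, 45]]
reference = [0] * 10
pred_a = [0] * 10            # classifier A: all correct
pred_b = [0] * 6 + [1] * 4   # classifier B: four errors
print(overall_accuracy(cm), kappa(cm), mcnemar_z(reference, pred_a, pred_b))
```

Because the statistic grows with the number of pixels on which the two classifiers disagree, wall-to-wall assessment over all test pixels can produce large z-values even for modest differences in OA.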

Table 1 Classification accuracy comparison between pixel-wise CNN, OBIA, object-based CNN, and the proposed TS-OCNN in S1 and S2.

The effectiveness of the proposed approach was further evaluated with the per-class mapping accuracy. As shown in Table 1, the TS-OCNN achieved the greatest accuracy for most of the crop categories in both study sites. For S1, the most remarkable increase in accuracy can be seen for grass (79.31%), considerably greater than for the comparators, including the OCNN (75.72%), the OBIA (58.38%) and the PCNN (74.56%). Similarly, the accuracies of almond, alfalfa and walnut achieved by the TS-OCNN were also markedly higher than for the benchmarks. For other crop categories such as clover, corn, and tomato, the TS-OCNN achieved only slight increases compared with the benchmarks, with an average accuracy increase between 1% and 3%. With respect to S2, satisfactory classification accuracy was achieved for most crop classes by the TS-OCNN, with the accuracies of six classes (walnut, alfalfa, wheat, corn, sunflower, and tomato) being higher than 85%. The most notable increase in accuracy was observed for alfalfa with an accuracy of 94.71%, greater by 11.01%, 9.71%, and 2.61% in comparison with the OCNN, OBIA, and PCNN, respectively (Table 1). Another large increase in accuracy was found for cucumber, with an average accuracy increase of 6.34%. Besides, a moderate accuracy increase was seen for walnut, almond, and fallow, with an average increase of around 3%-4%. Other crop classes, including wheat, corn, sunflower, and tomato, demonstrated only slight increases in average accuracy (<2.20%) relative to the comparators.

Classification results: To evaluate visually the superiority of the proposed approach, the classifications of the TS-OCNN were compared with those of the benchmarks in S1 (Fig. 5A) and S2 (Fig. 5B), respectively. Clearly, the classification results produced with the pixel-wise CNN (PCNN) were severely affected by salt-and-pepper noise. For example, several speckle errors existed within the crop parcels, including alfalfa, tomato, and walnut (Fig. 5A-b, c). Besides, classification errors along the boundaries between crop parcels were found, as demonstrated by Fig. 5B-c. In contrast, the two object-based methods (OBIA and OCNN) reduced significantly the salt-and-pepper noise, and achieved smooth classification results with precise boundary information. However, severe confusion between alfalfa and wheat, and between almond and walnut, was found in the OBIA classifications. The OCNN was superior to the OBIA in differentiating crop classes with similar spectra. For example, alfalfa and wheat were more accurately distinguished from each other, as shown in Fig. 5A-a, b. Nevertheless, misclassifications between alfalfa and grass in S1, and between sunflower and tomato in S2, still existed in the OCNN classifications. Making use of the temporal sequence, the proposed TS-OCNN approach corrected most of the aforementioned classification errors while maintaining precise crop boundaries.

5. Discussion

Agro-ecosystems are affected greatly by both human activities and natural conditions (e.g., climate change), making them highly complex and heterogeneous. As a result, classifying crop types accurately from fine spatial resolution (FSR) remotely sensed imagery remains a great challenge, even for state-of-the-art deep learning-based approaches [13,28]. Multitemporal remotely sensed images have been used widely to increase crop mapping accuracy. Prior studies have tended to explore the joint information of multi-temporal observations, which are usually stacked together and fed into predefined models for crop classification [27,28,42]. However, few studies have explored the utility of mining the individual information from single-date images. In principle, a certain crop type might be easily and accurately distinguished from others at some point during the growing season [43]. For example, rice can be identified using imagery collected at the stage of flooding and rice transplanting [44]. Such unique individual information is complementary to the joint information captured from the whole image time-series and is potentially of great importance for differentiating crop classes across heterogeneous agricultural landscapes. In the proposed TS-OCNN, individual information about crop discrimination from each image was extracted and integrated automatically and gradually into the joint information, thus mining more comprehensively the latent information in image time-series for crop classification. To further illustrate the contribution of single-date imagery to crop classification, the TS-OCNN was implemented with each single-date image separately in the UAVSAR experiment. As can be seen from Fig. S2, the accuracies of the TS-OCNN with single-date images (numbers in red in Fig. S2) were generally better than that of the OCNN (81.63%), demonstrating the unique value of single-date images to crop mapping. Moreover, the accuracies with a single image (numbers in red in Fig. S2) were inferior to that with all temporal images (82.68%), indicating that the TS-OCNN makes thorough use of the temporal images.

Fig. 5. Typical image subsets of classification maps produced by PCNN, OBIA, OCNN, and the presented TS-OCNN in S1 (A) and S2 (B).

This research demonstrates that it is difficult for the standard pixel-wise CNN (PCNN) to achieve desirable crop classification results from FSR imagery. In PCNN, an input patch is adopted to learn features to identify each pixel across the entire image, which usually leads to severe geometric distortions (e.g., enlarged crop parcels) [45]. Unlike the standard CNN, the proposed TS-OCNN is built and implemented on segmented objects, leading to increases in crop classification accuracy. By using image segmentation, the salt-and-pepper noise that negatively affects crop classification results was eliminated. More importantly, the TS-OCNN avoids mislabeling pixels falling near field edges (where misclassification occurs relatively often), thus maintaining the precise boundaries of crop parcels. Our results are consistent with previous research [26,30], highlighting the significance of object-based image analysis for complex remote sensing classification from FSR imagery. It should be noted that object-based image classification methods depend on the quality of the image segmentation results since they are implemented on segmented objects [30]. For image segmentation algorithms, the scale is a key parameter as it directly determines the size of the segmented objects [40]. Taking the UAVSAR experiment as an example, we illustrated how the results of image segmentation affect the classification accuracy of object-based methods (OCNN and TS-OCNN) (Fig. 6). The object-based methods achieved the greatest accuracy with a scale value of 30 (producing a small amount of over-segmentation), superior to that with values of 20 and 40. Nevertheless, the proposed approach consistently outperformed OCNN, demonstrating the robustness of the TS-OCNN in object-based image classification.
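The object-level labelling discussed above can be illustrated with a minimal sketch. It assumes a segmentation map of integer object IDs and one predicted class per object. The function name `paint_objects`, the toy object IDs and the class assignments are hypothetical and are not taken from the authors' implementation; the sketch shows only why pixels near a parcel edge inherit the label of their parcel rather than being classified independently.

```python
def paint_objects(segments, object_labels):
    """Assign every pixel the class predicted for the object it belongs to."""
    return [[object_labels[obj_id] for obj_id in row] for row in segments]


# Toy 3 x 4 segmentation map: each value is a (hypothetical) object ID.
segments = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 2],
]

# One predicted class per segmented object (hypothetical predictions).
object_labels = {1: "corn", 2: "alfalfa", 3: "grass"}

label_map = paint_objects(segments, object_labels)
for row in label_map:
    print(row)
```

Because the label is decided once per object, no isolated pixel can differ from its neighbours within the same parcel, which is the mechanism by which salt-and-pepper noise disappears. The quality of the result then rests on the segmentation itself, as noted in the text.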

While the TS-OCNN greatly increased the overall crop classification accuracy compared with the benchmark comparator (i.e., OCNN), the increases in accuracy for individual crop categories are even more impressive. Generally, the increases in accuracy achieved for small biomass crops with short stems and narrow leaves (e.g., alfalfa and clover) were greater than those for large biomass crops with long stems and large leaves (e.g., corn and sunflower). For example, the TS-OCNN increased the accuracies of grass in S1 (from 75.72% to 79.31%) and cucumber in S2 (from 63.21% to 67.09%) by around 4 percentage points, even though the resulting accuracies were still relatively low (less than 80%). In contrast, the accuracies of corn and sunflower increased only slightly (by less than 1%) at both sites. It is likely that small biomass crops with relatively weak signals [8] are more difficult to identify than large biomass crops (which can be classified relatively easily), and are thus difficult to distinguish accurately without utilizing the unique individual information available in single-date images. In other words, the accuracy of small biomass crops benefits more from increases in the dimensionality of the observational data than that of large biomass crops, which is in accord with previous findings [46].
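The per-class gains quoted above can be checked with a short calculation. The sketch below uses only the four accuracies reported in the text; the dictionary layout and the helper name `gain_points` are illustrative choices.

```python
# Per-class accuracies (%) reported in the text: (OCNN, TS-OCNN).
reported = {
    "grass (S1)": (75.72, 79.31),
    "cucumber (S2)": (63.21, 67.09),
}


def gain_points(before, after):
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(after - before, 2)


gains = {crop: gain_points(b, a) for crop, (b, a) in reported.items()}
for crop, gain in gains.items():
    print(f"{crop}: +{gain} percentage points")
```

The differences are 3.59 and 3.88 percentage points, respectively, which is what "around 4" refers to here.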

A forward temporal sequence (FTS) was adopted in the proposed TS-OCNN. That is, multi-temporal images were fed into the deep learning model scene-by-scene in the order of image acquisition date (i.e., starting with the earliest and moving towards the latest). In fact, several strategies are available for ordering the temporal sequence. Examples include the random temporal sequence (i.e., selecting an image randomly at each iteration without considering image acquisition date) and the backward temporal sequence (i.e., starting with the latest and moving towards the earliest). However, we found that the FTS was superior to the others in increasing crop classification accuracy. This may be attributed to the fact that temporally adjacent images are correlated with each other [47]; for a pair of temporally adjacent images, the classification result of the later acquisition usually builds upon that of the earlier acquisition.
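The three ordering strategies can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the names `order_scenes`, `refine_sequentially` and `refine_step` are hypothetical, the acquisition dates are toy values, and the random strategy is assumed to visit each scene exactly once (a shuffle), which the text does not specify.

```python
import random
from datetime import date


def order_scenes(scenes, strategy="forward", seed=0):
    """Order (acquisition_date, scene_id) pairs by the chosen strategy."""
    by_date = sorted(scenes, key=lambda scene: scene[0])
    if strategy == "forward":      # earliest acquisition first
        return by_date
    if strategy == "backward":     # latest acquisition first
        return by_date[::-1]
    if strategy == "random":       # assumption: each scene used once
        shuffled = list(by_date)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown strategy: {strategy}")


def refine_sequentially(baseline, scenes, refine_step, strategy="forward"):
    """Start from the baseline result and update it scene by scene.

    refine_step is a hypothetical callable standing in for one
    classification update with a single-date image.
    """
    result = baseline
    for _acquired, scene_id in order_scenes(scenes, strategy):
        result = refine_step(result, scene_id)
    return result


# Toy scenes with made-up dates, deliberately listed out of order.
scenes = [
    (date(2020, 7, 1), "s3"),
    (date(2020, 5, 1), "s1"),
    (date(2020, 6, 1), "s2"),
]

# A stand-in refine_step that only records the order of the updates.
trace = refine_sequentially([], scenes, lambda result, sid: result + [sid])
print(trace)
```

With the forward strategy, each update follows the one for the preceding acquisition, which matches the explanation in the text that the result for a later image builds upon that for an earlier one.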

6.Conclusions

This paper presented a new approach for crop classification from multi-temporal FSR remotely sensed imagery. In the proposed TS-OCNN, a CNN model was adopted to classify agricultural landscapes into crop classes at the object (crop parcel) level, thus maintaining the precise boundary information of crop parcels. The combination of image time-series was first utilized as the input to a CNN model to produce an ‘original’ classification result, and then single-date images from the image time-series were fed into the deep learning model scene-by-scene in the order of image acquisition date to increase the crop classification accuracy gradually. As such, the joint information (sequential relationship) of the multi-temporal observations as well as the individual information from each image in the time-series were explored fully and utilized for crop classification. The experimental results on two heterogeneous agricultural areas with two types of FSR imagery demonstrated that the proposed TS-OCNN consistently achieved the most accurate classification results in comparison with state-of-the-art benchmarks. Specifically, the TS-OCNN markedly increased the classification accuracies of small biomass crops (e.g., forage crops) that were very difficult to identify because of their indistinct remote-sensing spectra. We therefore conclude that the newly presented TS-OCNN is an effective approach for crop classification from multi-temporal FSR remotely sensed imagery. Moreover, the TS-OCNN is readily generalizable to other landscapes (e.g., forest and wetland) and thus has wide application prospects.

CRediT authorship contribution statement

Huapeng Li: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Yajun Tian: Formal analysis, Investigation, Validation, Visualization. Ce Zhang: Methodology, Software, Writing - review & editing. Shuqing Zhang: Writing - original draft, Writing - review & editing. Peter M. Atkinson: Writing - original draft, Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA28070503), the National Key Research and Development Program of China (2021YFD1500100), the Open Fund of State Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (20R04), and Land Observation Satellite Supporting Platform of National Civil Space Infrastructure Project (CASPLOS-CCSI).

The OCNN approach was developed during a PhD studentship “Deep Learning in massive area, multi-scale resolution remotely sensed imagery” (EAA7369), sponsored by Lancaster University and Ordnance Survey (the national mapping agency of Great Britain). Ordnance Survey owns the intellectual property arising from the project, together with a US patent pending: ‘Object Based Convolutional Neural Network’ (US application number 16/156044). Lancaster University wishes to thank Ordnance Survey for permission to publish this paper and for the supply of aerial imagery and the supporting geospatial data which facilitated the PhD (which is protected as Crown copyright).

Appendix A.Supplementary data

Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.07.005.
