Identification of the Citrus Greening Disease Using Spectral and Textural Features Based on Hyperspectral Imaging
MA Hao1,2,3, JI Hai-yan1*, Won Suk Lee3
1. Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China
2. College of Agricultural Engineering, Henan University of Science and Technology, Luoyang 471003, China
3. Department of Agricultural and Biological Engineering, University of Florida, Gainesville, FL 32611, USA
Abstract: This paper discusses the application of spectral and textural features to identifying the early stage of citrus greening disease (Huanglongbing, HLB). A total of 176 hyperspectral images of citrus leaves (60 healthy, 60 HLB-infected and 56 zinc-deficient) were captured with a near-ground hyperspectral imaging system. Regions of interest (ROI) were extracted manually from the symptomatic areas in the images, and the average reflectance spectrum of each ROI, ranging from 396 to 1 010 nm, was used as the sample spectrum. The dimensionality of the sample spectra was reduced with principal component analysis (PCA) and the successive projections algorithm (SPA). Classification models were built with the original spectra and with two sets of candidate variables: the first four PCs from PCA and a set of wavelengths (630.5, 679.4, 749.4 and 899.9 nm) selected by SPA. With a least squares support vector machine (LS-SVM) classifier, the models built with the candidate variables from PCA and SPA performed better, achieving average accuracies of 89.7% and 87.4%, respectively. In addition, two groups of textural features, extracted from the gray images at the four selected wavelengths based on the gray-level histogram and the gray-level co-occurrence matrix (GLCM), were used in the classifier. The first ten features ranked by SPA markedly improved classification, achieving accuracies of 100%, 93.3% and 92.9% for the three classes, respectively. These results indicate that identifying HLB using image textural features at selected wavelengths is feasible, and they provide a basis for developing a portable HLB detection system based on multispectral imaging.
Keywords: Citrus greening; Hyperspectral imaging; Classification; Textural features; Successive projections algorithm
Citrus greening disease (Huanglongbing, HLB), which originated in China, is a widespread and devastating citrus disease. So far it has been found in more than 50 countries and regions across Asia, the Americas and Africa[1]. The disease causes huge damage to citrus industries, spreads rapidly and is currently incurable. It was first found in Florida in September 2005 and spread to all 34 citrus-producing counties in Florida in less than five years[2]. As a direct consequence, the citrus planting area and yield decreased by more than 10%[3].
Citrus greening is a bacterial disease transmitted by a tiny insect, the Asian citrus psyllid. Its characteristic symptoms usually include blotchy, chlorotic mottling of leaves, yellow shoots, and small, misshapen or lopsided fruit[4]. The disease spreads rapidly as the insect population grows. Currently, the only effective ways to reduce damage are to control the insect and remove infected trees. Therefore, an effective, fast and timely detection method is critically needed.
HLB detection methods fall into two categories: direct and indirect. Direct methods are usually laboratory based, such as serological testing and polymerase chain reaction (PCR)[5]. Their sampling and analysis processes are labor intensive, expensive and time consuming. Therefore, indirect methods such as machine vision, which can provide substantial information on the nature and attributes of objects quickly and non-invasively[6], have become increasingly popular.
Spectroscopy has been used for plant disease detection for many years. Based on near-infrared and mid-infrared spectroscopy, Mishra et al.[7] and Sankaran et al.[8-9] conducted research to identify HLB infection. Color imaging[10] and fluorescence imaging[11] are also common approaches to HLB detection. However, for detecting and monitoring plant diseases[12], imaging systems are preferable to non-imaging systems because, in addition to being non-invasive, they capture the spatial distribution of symptoms.
Hyperspectral and multispectral imaging, which integrate imaging and spectroscopy, have high potential for improving the accuracy of disease detection. Kumar et al.[13] used airborne hyperspectral images covering 457 to 921 nm to monitor the spread of HLB in a citrus grove. Pourreza et al.[14] evaluated textural features extracted from polarized images at 591 nm for distinguishing HLB-symptomatic leaves from zinc-deficient and normal leaves. In China, Tian et al.[15] developed a diagnosis method for cucumber diseases, and Xie et al.[16] studied the early detection of tomato early blight.
This study aimed to determine a set of optimal variables, including spectral and textural features, for distinguishing HLB-symptomatic leaves from normal ones. The classification results obtained with the candidate variables were also evaluated as groundwork for developing and applying a portable HLB detection system based on multispectral imaging. The specific objectives of this research were to:
(1) select a set of optimal wavelengths with the successive projections algorithm (SPA) and assess their classification capability by comparison with principal component analysis (PCA);
(2) extract textural features from the gray image at each selected wavelength based on the gray-level histogram and the gray-level co-occurrence matrix (GLCM);
(3) evaluate the performance of these image textural features for citrus greening identification using a least squares support vector machine (LS-SVM) classifier.
1.1 Hyperspectral image acquisition
The leaf samples used in this research were collected at the Citrus Research and Education Center (CREC, Lake Alfred, Florida, USA) on March 7, 2013. A total of 176 leaves were collected from three common citrus varieties (Hamlin, Tangelo and Valencia): 60 healthy, 60 HLB-infected and 56 zinc-deficient. The leaves were sealed in plastic bags, labeled with their class and taken back to the laboratory in a cooler to prevent moisture loss.
A ground-based hyperspectral imaging system was used to acquire citrus leaf images under natural outdoor sunlight. The system consisted of four main parts. An imaging spectrograph (V10E, Specim, Finland) served as the optical acquisition component. A drive motor and an encoder (Omron E6B2, Omron Corporation, Japan) controlled the tilting motion of the imaging spectrograph. An image acquisition unit (NI-PCIe 6430 and NI-6036E, National Instruments Inc., Austin, TX, USA) converted and transferred the images to a laptop computer (E6500, Dell, TX, USA).
The imaging spectrograph operated as a line-scan CCD camera, capturing an image of 1 × 1 312 pixels at a time. In this study, 700 lines were collected for each image. The spatial resolution was 0.2 mm, and each image covered a scene of about 14.0 cm × 26.2 cm. Each pixel contained a spectrum from 396 to 1 010 nm in 388 bands, with a spectral interval of about 1.59 nm. Each image, containing both spatial and spectral data, therefore had dimensions of 700 × 1 312 × 388.
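The acquisition geometry can be checked with a few lines of arithmetic. This sketch assumes the 388 band centers are evenly spaced between 396 and 1 010 nm (the real spectrograph calibration may differ slightly); it reproduces the roughly 1.59 nm band interval, the 14.0 cm × 26.2 cm scene and the cube dimensions quoted above.

```python
import numpy as np

# Acquisition parameters quoted in the text
n_lines, n_pixels, n_bands = 700, 1312, 388   # scan lines, pixels per line, spectral bands
pixel_mm = 0.2                                # spatial resolution in mm per pixel

# Assumption: band centers evenly spaced from 396 nm to 1010 nm
wavelengths = np.linspace(396.0, 1010.0, n_bands)
band_step_nm = wavelengths[1] - wavelengths[0]            # 614 / 387, about 1.59 nm

scene_cm = (n_lines * pixel_mm / 10.0, n_pixels * pixel_mm / 10.0)  # about (14.0, 26.2)
cube_shape = (n_lines, n_pixels, n_bands)
n_values = n_lines * n_pixels * n_bands                   # values in one hyperspectral cube

print(f"band step {band_step_nm:.2f} nm, scene {scene_cm[0]:.1f} x {scene_cm[1]:.1f} cm")
print(cube_shape, n_values)
```

Each cube therefore holds about 356 million values, which is one reason the analysis works on small ROIs rather than whole images.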
1.2 Image processing and region of interest (ROI) extraction
To obtain reflectance images, white standards were used as optical references. In this study, three polytetrafluoroethylene standard disks (Labsphere Inc., North Sutton, NH, USA) with a diameter of 5.1 cm and reflectances of 50%, 75% and 99%, respectively, were used. The 99% disk was used to set the camera lens aperture so as to avoid over-exposure. A dark image was also collected with the lens cover closed to remove dark current noise. The images were then converted into reflectance images by quadratic regression based on the remaining two standard disks.
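The text does not spell out the regression. One plausible reading, sketched here purely as an assumption, is that the dark image supplies a 0% reference, so that together with the 50% and 75% disks each band has three (raw count, reflectance) pairs that fix a quadratic calibration curve. The function name and the numbers in the toy example are hypothetical.

```python
import numpy as np

def calibrate_quadratic(raw, dark, disk50, disk75):
    """Convert a raw cube (rows, cols, bands) to reflectance, band by band.

    dark, disk50, disk75: mean raw counts per band, each of shape (bands,).
    Assumption: the dark image is the 0 % reference, giving three points
    (dark, 50 %, 75 %) that fix a quadratic count-to-reflectance curve per band.
    """
    raw = np.asarray(raw, dtype=float)
    targets = np.array([0.0, 0.50, 0.75])          # known reflectance of each reference
    refl = np.empty_like(raw)
    for b in range(raw.shape[-1]):
        counts = np.array([dark[b], disk50[b], disk75[b]], dtype=float)
        coeffs = np.polyfit(counts, targets, deg=2)  # exact fit through the 3 points
        refl[..., b] = np.polyval(coeffs, raw[..., b])
    return refl

# Toy example with two bands (all numbers are made up for illustration)
dark = np.array([100.0, 120.0])
disk50 = np.array([1100.0, 900.0])
disk75 = np.array([1600.0, 1300.0])
cube = np.stack([np.full((2, 3), 1100.0), np.full((2, 3), 1300.0)], axis=-1)
refl = calibrate_quadratic(cube, dark, disk50, disk75)
print(refl[0, 0])   # band 0 sits at the 50 % disk level, band 1 at the 75 % disk level
```

A linear dark/white correction would be the simpler alternative; the quadratic form only makes sense when at least three reference levels are available.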
Because the large image size increases computational complexity and time, regions of interest of 200 × 120 pixels were extracted manually from the images and used as the dataset in this study. Each ROI was selected within the symptomatic area of the leaf to ensure identification accuracy. The average reflectance spectrum of each ROI was used as the spectrum of the sample. A total of 176 ROIs were obtained; half were used as the training set and the other half as the validation set. This step was performed in ENVI (Version 5.0, Exelis, Boulder, CO, USA).
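ROI averaging and the half-and-half split can be sketched as follows. The paper does not state how the halves were drawn; the class-wise random split below is an assumption (it gives 30 healthy, 30 HLB-infected and 28 zinc-deficient validation samples), and so is the ROI orientation of 200 rows by 120 columns. Function names are hypothetical.

```python
import numpy as np

def roi_mean_spectrum(cube, row0, col0, height=200, width=120):
    """Average reflectance spectrum of a rectangular ROI in a (rows, cols, bands) cube."""
    roi = cube[row0:row0 + height, col0:col0 + width, :]
    return roi.reshape(-1, roi.shape[-1]).mean(axis=0)

def stratified_half_split(labels, seed=0):
    """Split sample indices into training and validation halves within each class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, valid = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        half = len(idx) // 2
        train.extend(idx[:half])
        valid.extend(idx[half:])
    return np.sort(np.array(train)), np.sort(np.array(valid))

# 60 healthy (0), 60 HLB-infected (1) and 56 zinc-deficient (2) samples
labels = np.array([0] * 60 + [1] * 60 + [2] * 56)
train_idx, valid_idx = stratified_half_split(labels)
print(len(train_idx), len(valid_idx), np.bincount(labels[valid_idx]))
```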
1.3 Textural feature extraction
Image texture is a visual characteristic that reflects the homogeneity of an image; it contains important information about the arrangement of surface structures and their relationship with the surrounding environment[17]. Under disease or nutrient-deficiency stress, plants usually suffer physiological degradation such as moisture loss, reduced photosynthesis and growth inhibition. This in turn changes leaf color, texture and even spectral characteristics, which can serve as a basis for disease detection.
In this study, two groups of textural features were extracted from the ROI images based on the gray-level histogram and the gray-level co-occurrence matrix (GLCM). The first six features, listed in Table 1, were extracted from the normalized histogram H(i), where i is the gray level, ranging from 0 to L (L = 255 in this study). The gray-level histogram is a common image processing tool that describes the statistical distribution of pixel gray levels in an image. Features derived from the histogram are invariant to rotation, translation and scale. However, the histogram provides no information about the spatial positions of pixels in the two-dimensional image. Therefore, the GLCM was also used to extract two-dimensional statistical information.
Table 1 Features extracted from normalized histogram of gray level
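The contents of Table 1 are not reproduced in this text. As an assumption, the sketch below computes six commonly used first-order statistics of the normalized histogram H(i), i = 0, ..., L with L = 255: mean, variance, skewness, kurtosis, uniformity and entropy. Gray mean, variance and skewness are named later in the paper; the other three are illustrative guesses, and the function name is hypothetical.

```python
import numpy as np

def histogram_features(gray, L=255):
    """First-order texture features from the normalized gray-level histogram H(i).

    Assumed feature set (Table 1 is not reproduced here): mean, variance,
    skewness, kurtosis, uniformity and entropy. `gray` holds integers in 0..L.
    """
    g = np.asarray(gray).astype(np.int64).ravel()
    if g.min() < 0 or g.max() > L:
        raise ValueError("gray levels must lie in 0..L")
    H = np.bincount(g, minlength=L + 1) / g.size    # normalized histogram, sums to 1
    i = np.arange(L + 1, dtype=float)
    mean = np.sum(i * H)
    var = np.sum((i - mean) ** 2 * H)
    std = np.sqrt(var)
    skew = np.sum((i - mean) ** 3 * H) / std**3 if std > 0 else 0.0
    kurt = np.sum((i - mean) ** 4 * H) / var**2 if var > 0 else 0.0
    nz = H[H > 0]
    return {
        "mean": mean,
        "variance": var,
        "skewness": skew,
        "kurtosis": kurt,
        "uniformity": np.sum(H**2),
        "entropy": -np.sum(nz * np.log2(nz)),       # in bits
    }

print(histogram_features(np.array([[0, 255], [0, 255]])))
```

In practice the gray image would be a single band of the ROI rescaled to integers 0 to 255 before calling the function.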
Table 2 lists nine features extracted from the normalized GLCM g(i,j), where i and j are the indices of the GLCM elements and N is the number of gray levels (N = 256 in this study). The GLCM describes texture through the joint probability of the gray levels of two pixels separated by a given offset, so it reflects both the brightness distribution and the relative positions of pixels with the same or similar brightness. Since it was proposed in 1973, the GLCM has been widely applied and extended for textural feature extraction. In total, 15 textural features were extracted from each principal component (PC) image and from each selected-wavelength image of the ROI in this study.
Table 2 Features extracted from normalized GLCM matrices
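Table 2 is likewise not reproduced. The sketch below builds a symmetric, normalized GLCM g(i,j) for one pixel offset (one pixel to the right, an assumption since the offsets and directions are not stated) with N = 256 levels and computes nine common Haralick-style features. GLCM contrast, uniformity and mean are named later in the paper; the other six are assumptions, and both function names are hypothetical.

```python
import numpy as np

def glcm(gray, dr=0, dc=1, N=256):
    """Symmetric, normalized gray-level co-occurrence matrix for an offset (dr, dc) >= 0."""
    g = np.asarray(gray).astype(np.int64)
    rows, cols = g.shape
    ref = g[: rows - dr, : cols - dc]        # reference pixels
    nbr = g[dr:, dc:]                        # neighbours at the given offset
    P = np.zeros((N, N))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1.0)
    P = P + P.T                              # count each pair in both directions
    return P / P.sum()

def glcm_features(P):
    """Nine texture features from a normalized GLCM (assumed feature set)."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    nz = P[P > 0]
    corr = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j) if sd_i * sd_j > 0 else 0.0
    return {
        "contrast": np.sum((i - j) ** 2 * P),
        "dissimilarity": np.sum(np.abs(i - j) * P),
        "homogeneity": np.sum(P / (1.0 + (i - j) ** 2)),
        "uniformity": np.sum(P**2),          # angular second moment
        "entropy": -np.sum(nz * np.log2(nz)),
        "mean": mu_i,
        "variance": sd_i**2,
        "correlation": corr,
        "max_probability": P.max(),
    }

print(glcm_features(glcm(np.array([[0, 0], [1, 1]]))))
```

Combined with the six histogram statistics, this gives the 15 features per image mentioned in the text; several offsets are often averaged for rotation robustness, which is omitted here.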
1.4 Feature selection methods and classifier
Principal component analysis (PCA) is a multivariate statistical method that efficiently reduces data dimensionality while retaining as much of the information in the original data as possible. The principal components (PCs) are mutually uncorrelated, which helps eliminate the influence of redundant information in high-dimensional datasets. In this study, the PCs were ranked by their contributions (percentages of explained variance).
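A minimal PCA sketch via the singular value decomposition of mean-centered spectra: the contribution of each PC is its share of the total variance, and the cumulative contribution is the quantity reported later (e.g. 97.8% for the first four PCs of the spectra). The function name and the random stand-in data are illustrative only.

```python
import numpy as np

def pca(X, n_components):
    """PCA of a (samples, variables) matrix via SVD of the mean-centered data.

    Returns PC scores, the contribution (explained-variance ratio) of each
    retained PC, and the cumulative contribution.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    contrib = s**2 / np.sum(s**2)                 # singular values come sorted, largest first
    scores = Xc @ Vt[:n_components].T             # projections onto the leading PCs
    return scores, contrib[:n_components], np.cumsum(contrib)[:n_components]

# Random stand-in data shaped like the spectra (176 samples x 370 bands)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(176, 370))
scores, contrib, cumulative = pca(spectra, n_components=4)
print(scores.shape, cumulative[-1])
```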
The successive projections algorithm (SPA) is a relatively recent variable selection method for spectral data. It is a forward selection method, and the details of the algorithm can be found in a previous study[18]. Starting from one wavelength, it calculates the projections of the other wavelengths onto the subspace orthogonal to the last selected one, adds the wavelength with the largest projection, and repeats this step until a specified number N (N
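The projection step can be sketched as follows, in the general scheme of Araújo et al.[18]. This simplified version builds the chain for a single starting wavelength only; the complete algorithm repeats it for every possible start and chooses the chain, and its length, by validation error, which is omitted here. The function name is hypothetical.

```python
import numpy as np

def spa_chain(X, k0, n_vars, tol=1e-9):
    """Successive projections: indices of n_vars columns of X, starting from column k0.

    X has shape (samples, wavelengths). n_vars must not exceed the rank of X.
    """
    Xp = np.asarray(X, dtype=float).copy()
    if np.linalg.norm(Xp[:, k0]) <= tol:
        raise ValueError("starting column is all zeros")
    selected = [k0]
    for _ in range(n_vars - 1):
        xk = Xp[:, selected[-1]].copy()            # last selected (already projected) column
        # Project every column onto the subspace orthogonal to xk
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                     # never reselect a column
        if norms.max() <= tol:
            raise ValueError("n_vars exceeds the rank of X")
        selected.append(int(np.argmax(norms)))     # largest remaining projection
    return selected

# Column 1 is nearly collinear with column 0, so SPA prefers column 2 next
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 1.0]])
print(spa_chain(X, k0=0, n_vars=3))   # [0, 2, 1]
```

The example shows why SPA suppresses collinearity: a wavelength almost parallel to one already chosen has only a small projection left and is picked last.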
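The least squares support vector machine (LS-SVM) classifier was used to evaluate the performance of the selected variables. LS-SVM is an extension of the SVM. Like the SVM, it is based on structural risk minimization and maps low-dimensional, nonlinear data into a high-dimensional feature space where a linear model can be fitted, thereby solving nonlinear modeling problems. By replacing the inequality constraints of the standard SVM with equality constraints and a squared-error loss, LS-SVM reduces training to solving a set of linear equations instead of a quadratic programming problem, which lowers computational complexity and time when modeling small, nonlinear and high-dimensional datasets.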
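The linear-system view of LS-SVM can be made concrete with a short sketch. It assumes an RBF kernel K(x, z) = exp(-||x - z||²/σ²) and a one-versus-rest coding of the three classes; the paper states neither the kernel, the hyperparameters (γ, σ²) nor the multi-class scheme, so all are assumptions. Because the targets are ±1, fitting the LS-SVM dual system to the labels is equivalent to the LS-SVM classifier formulation. The class name is hypothetical and is not the LS-SVMlab toolbox API.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """K(x, z) = exp(-||x - z||^2 / sigma2) for all rows of A and B."""
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

class LSSVMOneVsRest:
    """Multi-class LS-SVM sketch: one binary LS-SVM (+1/-1 targets) per class."""

    def __init__(self, gamma=10.0, sigma2=1.0):
        self.gamma, self.sigma2 = gamma, sigma2   # regularization and kernel width

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        n = len(self.X)
        # Dual system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; t]
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(self.X, self.X, self.sigma2) + np.eye(n) / self.gamma
        T = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        rhs = np.vstack([np.zeros((1, len(self.classes))), T])
        sol = np.linalg.solve(A, rhs)             # one column per class, one linear solve
        self.b, self.alpha = sol[0], sol[1:]
        return self

    def decision_function(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X, self.sigma2)
        return K @ self.alpha + self.b

    def predict(self, X):
        return self.classes[np.argmax(self.decision_function(X), axis=1)]

# Toy usage with three well-separated clusters standing in for the leaf classes
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + 0.3 * rng.normal(size=(20, 2)) for c in centers])
y = np.repeat(["healthy", "HLB", "zinc"], 20)
model = LSSVMOneVsRest(gamma=10.0, sigma2=1.0).fit(X, y)
print(model.predict(np.array([[0.1, -0.2], [9.8, 0.3]])))
```

In a real application γ and σ² would be tuned, for example by cross-validation on the training set.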
2.1 Sample spectra and spectral characteristics analysis
Fig.1 shows examples of citrus leaves from the three classes (healthy, HLB-infected and zinc-deficient) in the dataset. Fig.1(a) shows three healthy leaves at different growth stages: the top one is a young healthy leaf with a light green color, and the other two are mature healthy leaves with a dark green color. Healthy leaves have a smooth texture without spots, whereas HLB-infected and zinc-deficient leaves show similar mottling, as in Fig.1(b) and (c). Leaves at an advanced stage of HLB tend to turn yellowish with blotches on the blades, while zinc-deficient leaves tend to show symmetrical mottles between the veins. In the early stage, however, the classes, especially HLB-infected and zinc-deficient leaves, are truly hard to distinguish by either human vision or spectroscopy. Fig.2 shows the average reflectance spectra of three typical leaves, one from each class.
Fig.1 Images of leaves from the three classes in the dataset
The spectra used in this study covered 410 to 1 000 nm (370 bands); bands at both ends were discarded because of noise. Overall, the reflectance curves of the three leaf classes were very similar: absorption occurred mainly in the 400 to 700 nm region, reflectance was high in the 750 to 1 000 nm region, and there was a reflectance peak at 550 nm and a strong absorption band at 680 nm. Nevertheless, the different symptoms produced some spectral variation. Compared with healthy leaves, the spectra of HLB-infected and zinc-deficient leaves had a higher reflectance peak at 550 nm and lower reflectance in the 800~1 000 nm region. However, the spectra of zinc-deficient and HLB-infected leaves were very close, which makes these two classes difficult to separate.
Fig.2 Average reflectance spectra of three classes (Healthy, HLB infected and Zinc deficiency) of citrus leaves
2.2 Variable selection and comparison of classification results
Fig.3 shows the selection of wavelength variables using SPA. In Fig.3(a), the red square marks the final number of selected variables; a total of 14 variables were selected in this study. Their band positions are marked by black squares in Fig.3(b). The selected variables cluster into four groups, so a further selection was made to avoid over-fitting. The variables were ranked by their contribution, defined as the ratio of the RMSEP of the model built with the other 13 variables to the RMSEP of the model built with all 14 variables. The ranking by contribution is shown in Table 3.
Fig.3 Wavelength variables selection using SPA
Table 3 Ranking of selected variables based on the values of contribution
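The contribution measure can be sketched as a leave-one-variable-out loop. The paper does not name the model behind the RMSEP values; multiple linear regression on numeric class codes is assumed here, as MLR is the model used in the original SPA formulation. Function names and the synthetic data are hypothetical.

```python
import numpy as np

def rmsep(X_train, y_train, X_val, y_val):
    """RMSEP of a multiple linear regression (with intercept) on a validation set."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pred = np.column_stack([np.ones(len(X_val)), X_val]) @ coef
    return np.sqrt(np.mean((y_val - pred) ** 2))

def rank_by_contribution(X_train, y_train, X_val, y_val):
    """Contribution_j = RMSEP without variable j / RMSEP with all variables."""
    full = rmsep(X_train, y_train, X_val, y_val)
    contrib = np.array([
        rmsep(np.delete(X_train, j, axis=1), y_train,
              np.delete(X_val, j, axis=1), y_val) / full
        for j in range(X_train.shape[1])
    ])
    order = np.argsort(contrib)[::-1]   # larger ratio: dropping the variable hurts more
    return order, contrib

# Synthetic check: the response depends strongly on x0, weakly on x1, not on x2
rng = np.random.default_rng(2)
Xt, Xv = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
yt = 3.0 * Xt[:, 0] + 0.5 * Xt[:, 1] + 0.1 * rng.normal(size=100)
yv = 3.0 * Xv[:, 0] + 0.5 * Xv[:, 1] + 0.1 * rng.normal(size=100)
order, contrib = rank_by_contribution(Xt, yt, Xv, yv)
print(order, np.round(contrib, 2))
```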
The third column of Table 3 lists the band numbers ranked by contribution. Some selected bands were very close to each other, such as bands 2~4, whose wavelength interval was only 1.6 nm. In practice, a wavelength interval of more than 10 nm is needed when developing a multispectral imaging system. Therefore, similar bands were merged into one, and the final selected variables, shown in the fifth column, are 749.4, 899.9, 630.5 and 679.4 nm.
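The merging rule can be sketched as a greedy pass over the bands in contribution order, keeping a band only if it lies at least 10 nm from every band already kept. The 10 nm threshold comes from the text; the greedy rule and the input list are hypothetical, since the full contents of Table 3 are not reproduced here.

```python
def merge_close_bands(ranked_nm, min_gap_nm=10.0):
    """Keep bands in rank order, skipping any within min_gap_nm of a band already kept."""
    kept = []
    for w in ranked_nm:
        if all(abs(w - k) >= min_gap_nm for k in kept):
            kept.append(w)
    return kept

# Hypothetical ranked list (not the paper's Table 3)
ranked = [749.4, 751.0, 752.6, 899.9, 630.5, 898.3, 679.4, 632.1]
print(merge_close_bands(ranked))   # [749.4, 899.9, 630.5, 679.4]
```

Keeping the highest-ranked band of each cluster preserves the contribution order, which matches the order in which the final four wavelengths are reported.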
Fig.4 Distribution of samples from the three classes
Principal component analysis was also used to process the spectral data. The cumulative contribution of the first four PCs was 97.8%. The distribution of the three classes in the space of the first three PCs is shown in Fig.4(a). PCA clustered the samples well, especially in separating healthy leaves from the other two classes. However, HLB-infected and zinc-deficient samples still overlapped in some areas, indicating that spectral information alone was not enough to separate the three classes completely.
Table 4 Results of classification for three classes of citrus leaves in the validation set using spectral variables
Model        Input variables   Healthy/%   HLB/%   Zinc-deficient/%   Average/%
LS-SVM       370               90.0        76.7    75.0               80.6
PCA-LS-SVM   4                 96.7        86.7    85.7               89.7
SPA-LS-SVM   4                 96.7        83.3    82.1               87.4
LS-SVM classifiers were built based on the original spectra, the four PCs and the four selected wavelengths. The classification results for the three classes of citrus leaves are shown in Table 4. Reducing the dimensionality with PCA and SPA markedly improved the identification accuracy, giving average accuracies of 89.7% and 87.4%, respectively, compared with 80.6% for the full spectra. This indicates that it is feasible to identify HLB using a few selected variables in place of the whole spectrum.
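The "Average" column in Table 4 appears to be the unweighted mean of the three class accuracies. Assuming the validation set holds half of each class (30 healthy, 30 HLB-infected and 28 zinc-deficient leaves), the reported percentages correspond to whole numbers of correct predictions, as this check shows; the counts are inferred, not reported in the paper.

```python
def class_accuracies(correct, totals):
    """Per-class accuracy (%), their unweighted mean, and pooled accuracy, to 0.1 %."""
    per_class = [100.0 * c / t for c, t in zip(correct, totals)]
    macro = sum(per_class) / len(per_class)
    pooled = 100.0 * sum(correct) / sum(totals)
    return [round(p, 1) for p in per_class], round(macro, 1), round(pooled, 1)

totals = (30, 30, 28)   # assumed validation counts: healthy, HLB, zinc-deficient
# Inferred numbers of correct predictions per class for each model in Table 4
for name, correct in [("LS-SVM", (27, 23, 21)),
                      ("PCA-LS-SVM", (29, 26, 24)),
                      ("SPA-LS-SVM", (29, 25, 23))]:
    print(name, class_accuracies(correct, totals))
```

Pooled accuracy would give 80.7%, 89.8% and 87.5%, so the reported averages (80.6%, 89.7%, 87.4%) are consistent with the unweighted mean rather than the pooled value.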
2.3 Textural feature extraction
The normalized images were processed with PCA, and the cumulative contribution of the first four PCs was nearly 100%. An example of the reshaped gray images of the first four PCs for an HLB-symptomatic sample is shown in Fig.5. The images were segmented using k-means clustering to help identify the spatial distribution of symptomatic areas.
Fig.5 Example of gray images of the first four PCs for an HLB-symptomatic sample, segmented with k-means clustering; green and red indicate the healthy and symptomatic areas of the sample
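The segmentation step can be sketched as two-cluster k-means (Lloyd's algorithm) on the pixel values of one PC image. The number of clusters, the quantile initialization and the use of gray values alone are assumptions; which cluster corresponds to symptomatic tissue still has to be decided by inspection, as in Fig.5. The function name is hypothetical.

```python
import numpy as np

def kmeans_segment(image, k=2, n_iter=100):
    """Cluster pixel values of a 2-D gray image into k groups with Lloyd's k-means.

    Centers start at evenly spaced quantiles; labels are ordered so that
    cluster 0 has the lowest mean value.
    """
    v = np.asarray(image, dtype=float).ravel()
    centers = np.quantile(v, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        new = np.array([v[labels == c].mean() if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)                    # relabel clusters by increasing center
    labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
    return rank[labels].reshape(np.shape(image)), centers[order]

# Toy PC image: dark left half, bright right half
img = np.zeros((4, 4))
img[:, 2:] = 10.0
labels, centers = kmeans_segment(img)
print(labels)
```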
As Fig.5 shows, the useful spatial information was contained in the second principal component. Textural features were therefore extracted from the gray-level histogram and GLCM of the second-PC image, giving 15 features per sample. The distribution of the three classes based on the first three PCs of these textural features is shown in Fig.4(b). Compared with Fig.4(a), the overlap between HLB-infected and zinc-deficient samples in Fig.4(b) was much larger. The classification results using the textural features of the PC images, shown in Table 5, were less accurate than those using spectral reflectance. This suggests that, for HLB-symptomatic and zinc-deficient leaves, the spectral variation was more prominent than the textural variation; in terms of classification, spectral features performed better than textural features.
Gray images at the four wavelengths selected by SPA were also used to extract textural features. The useful spatial information was contained in the images at 630.5 and 679.4 nm, while the gray-level information was contained in the images at 749.4 and 899.9 nm. A total of 60 textural features (15 per image) were extracted from the four gray images based on the gray-level histogram and GLCM. The normalized textural features were then selected by SPA. The first ten features ranked by contribution were: gray mean at 749.4, 899.9 and 630.5 nm; gray variance and GLCM uniformity at 679.4 nm; gray skewness at 899.9 nm; GLCM contrast at 630.5 and 679.4 nm; and GLCM uniformity and GLCM mean at 630.5 nm. The classification results for the three classes based on these selected textural features are listed in Table 5.
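Assembling and normalizing the 60 textural features before SPA can be sketched as below. The paper does not say which normalization was used; min-max scaling of each feature across samples is an assumption (z-scores would be an equally plausible choice), and the function name and random stand-in data are hypothetical.

```python
import numpy as np

def assemble_and_normalize(per_wavelength):
    """Stack per-wavelength feature blocks (samples x 15 each) and min-max scale each column."""
    F = np.hstack([np.asarray(block, dtype=float) for block in per_wavelength])
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant features map to 0 instead of dividing by 0
    return (F - lo) / span

# Four selected wavelengths x 15 features each -> 60 columns (random stand-in data)
rng = np.random.default_rng(0)
blocks = [rng.random((176, 15)) for _ in range(4)]
F = assemble_and_normalize(blocks)
print(F.shape)   # (176, 60)
```

Scaling matters here because gray means (0 to 255) and GLCM uniformity (0 to 1) differ by orders of magnitude, which would otherwise dominate the SPA projections.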
Table 5 Results of classification for three classes of citrus leaves in the validation data set using textural variables
Model        Input variables   Healthy/%   HLB/%   Zinc-deficient/%   Average/%
PCA-LS-SVM   4                 93.3        80.0    82.1               85.1
SPA-LS-SVM   10                100         93.3    92.9               95.4
Compared with Table 4, the identification accuracies based on the textural features of the PC images decreased slightly (an average of 85.1% versus 89.7% for the spectral PCs). By contrast, the accuracies based on the textural variables selected by SPA improved to 100%, 93.3% and 92.9% for healthy, HLB-infected and zinc-deficient leaves, respectively. This is because the PC images mainly contributed spatial texture to the model, whereas the gray images at the selected wavelengths contributed both spectral and spatial texture information. The gray-level histogram provided the gray information of the image, with the gray mean reflecting the average spectral reflectance, while textural features such as GLCM uniformity and GLCM contrast captured much of the spatial distribution of pixel gray levels, which together yielded better classification.
2.4 Discussion
The successive projections algorithm (SPA) was introduced in this study for variable selection. Four wavelength variables were selected with SPA: 749.4, 899.9, 630.5 and 679.4 nm, in order of their contribution to the classification model. They improved the average prediction accuracy from 80.6% to 87.4%, similar to the result using PCA. The 749.4 nm band lies in the red-edge region, which is sensitive to plant growth stage, and 630.5 and 679.4 nm lie in the red band, which is closely related to chlorophyll absorption. Overall, the selected wavelengths indirectly reflect plant growth status.
The classification capability of spectral features and spatial textural features was also assessed in this study. Comparing the results based on the spectral PCs with those based on the textural features of the PC images suggests that stresses such as disease and nutrient deficiency first cause physiological changes in the plant (spectral variation) and only later changes in surface structure (textural variation). Therefore, for identifying early-stage HLB, spectral features performed better than spatial textural features. Nevertheless, combining spectral and textural information has greater potential for disease identification.
In this study, spectral and textural features were evaluated for identifying HLB-symptomatic citrus leaves with hyperspectral imaging. Zinc deficiency, a stress whose visual and spectral symptoms resemble those of HLB, was included as a third class to test identification performance. Features containing both spectral and spatial information showed good potential for HLB identification, indicating that a portable, low-cost multispectral imaging system for disease detection is worth developing.
[1] Bove J M. Journal of Plant Pathology, 2006, 88(1): 7.
[2] Li X, Lee W S, Li M, et al. Computers and Electronics in Agriculture, 2012, 83: 32.
[3] Choi D, Lee W S, Ehsani R. St. Joseph, Mich.: ASABE, 2013.
[4] Wang N, Trivedi P. Phytopathology, 2013, 103(7): 652.
[5] Li W, Abad J A, French-Monar R D, et al. J. Microbiol. Methods, 2009, 78: 59.
[6] Lorente D, Aleixos N, Gómez-Sanchis J, et al. Food Bioprocess Technol., 2012, 5: 1121.
[7] Mishra A, Ehsani R, Albrigo G, et al. St. Joseph, Mich.: ASABE, 2007.
[8] Sankaran S, Ehsani R. Crop Prot., 2011, 30(11): 1508.
[9] Sankaran S, Mishra A, Maja J M, et al. Computers and Electronics in Agriculture, 2011, 77(2): 127.
[10] Kim D G, Burks T F, Schumann A W, et al. Agric. Eng. Intl. (CIGR Ejournal), 2009, 9, Manuscript 1194.
[11] Pereira F M V, Milori D M B P, Pereira-Filho E R, et al. Computers and Electronics in Agriculture, 2011, 79: 90.
[12] Mahlein A K, Oerke E C, et al. Eur. J. Plant Pathol., 2012, 133: 197.
[13] Kumar A, Lee W S, Ehsani R, et al. International Conference on Precision Agriculture, 2010.
[14] Pourreza A, Lee W S, Ehsani R, et al. Computers and Electronics in Agriculture, 2015, 110: 221.
[15] Tian Y W, Li T L, Zhang L, et al. Transactions of the Chinese Society of Agricultural Engineering, 2010, 26(5): 202.
[16] Xie C Q, Wang J Y, Feng L, et al. Spectrosc. Spectr. Anal., 2013, 33(6): 1603.
[17] Rodriguez-Galiano V F, Chica-Olmo M, Abarca-Hernandez F, et al. Remote Sensing of Environment, 2012, 121: 93.
[18] Araújo M C U, Saldanha T C B, Galvão R K H, et al. Chemometrics and Intelligent Laboratory Systems, 2001, 57(2): 65.
*Corresponding author
CLC number: O433; Document code: A
Identification of Citrus Huanglongbing Using Spectral and Textural Features Based on Hyperspectral Imaging
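Abstract: The application of spectral and textural features based on hyperspectral imaging to the identification of early citrus Huanglongbing (HLB) was discussed. Hyperspectral images of 176 citrus leaves were acquired with a near-ground hyperspectral imaging system as experimental samples, including 60 healthy leaves, 60 HLB-infected leaves and 56 zinc-deficient leaves. The lesion area in each leaf hyperspectral image was manually selected as the sample region of interest (ROI), and its average spectral reflectance was calculated and used as the reflectance spectrum of the sample, covering 396~1 010 nm. The sample spectra were reduced in dimension by principal component analysis (PCA) and the successive projections algorithm (SPA), respectively, and classification models were then built with a least squares support vector machine (LS-SVM) classifier. Compared with the original spectra, the models built with the first four principal components selected by PCA and with a set of optimal wavelengths selected by SPA (630.5, 679.4, 749.4 and 899.9 nm) had better classification ability, with average prediction accuracies for the three classes of citrus leaves of 89.7% and 87.4%, respectively. In addition, 6 gray-level histogram textural features and 9 gray-level co-occurrence matrix textural features were extracted from each gray image at the four selected wavelengths to build classification models again. The 10 textural features selected by SPA further improved classification, with identification accuracies of 100%, 93.3% and 92.9% for the three classes of citrus leaves. The experimental results show that hyperspectral images containing both spectral and spatial textural information have great potential for identifying citrus HLB.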
Keywords: Citrus Huanglongbing; Hyperspectral imaging; Classification; Textural features; Successive projections algorithm
Foundation item: The Citrus Research and Development Council, USA; National Natural Science Foundation for Young Scholars of China (31301240)
10.3964/j.issn.1000-0593(2016)07-2344-07
Received: 2015-04-20; accepted: 2015-08-17
Biography: MA Hao (born 1985), lecturer, College of Agricultural Engineering, Henan University of Science and Technology; e-mail: mah85@cau.edu.cn. *Corresponding author; e-mail: instru@cau.edu.cn