Recognition of Rosa roxburghii fruit in natural environment based on improved Faster-RCNN
Yan Jianwei1,2,3, Zhao Yuan1, Zhang Lewei1, Su Xiaodong1, Liu Hongyun1, Zhang Fugui1,3※, Fan Weiguo2, He Lin1,4
(1. College of Mechanical Engineering, Guizhou University, Guiyang 550025, China; 2. Engineering Technology Research Center of Rosa roxburghii of National Forestry and Grassland Administration, Guiyang 550025, China; 3. Guizhou Engineering Research Center for Intelligent Equipment of Mountain Agriculture, Guiyang 550025, China; 4. Liupanshui Normal University, Liupanshui 553004, China)
To achieve fast and accurate recognition of Rosa roxburghii fruit in the natural environment, this paper proposes a recognition method based on an improved Faster RCNN, designed around the characteristics of the fruit. The convolutional neural network uses bilinear interpolation, adopts the alternating optimization training strategy of Faster RCNN, and replaces region-of-interest pooling (ROI Pooling) with the region-of-interest alignment (ROI Align) feature-aggregation method, which makes the target bounding boxes in the detection results more precise. By comparing the precision-recall performance of the VGG16, VGG_CNN_M1024 and ZF network models trained under the Faster RCNN framework, the VGG16 model was finally selected. Its recognition precision for the 11 classes of Rosa roxburghii fruit was 94.00%, 90.85%, 83.74%, 98.55%, 96.42%, 98.43%, 89.18%, 90.61%, 100.00%, 88.47% and 90.91%, respectively, with an average precision of 92.01%. The model was tested on 300 images of Rosa roxburghii fruit taken at random in the natural environment and not used in training, with recall, precision and the F1 score as the three evaluation indices. The results show that, over the 11 fruit classes, the recall of the model trained with the improved algorithm ranges from 81.40% to 96.93%, the precision from 85.63% to 95.53%, and the F1 score from 87.50% to 94.99%. The average detection speed reaches 0.2 s per image. The proposed algorithm therefore recognizes Rosa roxburghii fruit under natural conditions with high accuracy and in real time.
convolutional neural network; Faster RCNN; machine vision; deep learning; Rosa roxburghii fruit; target recognition
Rosa roxburghii is widely distributed in warm temperate and subtropical regions. In China it grows mainly in Guizhou, Yunnan and Sichuan, with Panxian and Longli in Guizhou having the richest resources, the most varieties and the highest yield [1].
In recent years, convolutional neural networks (CNN) have been widely applied to target recognition and detection. Sun Shipeng et al. [2] used machine vision for non-destructive detection of black spot and fruit-shrink disease in winter jujube, with classification accuracies of 89.6% and 99.4% respectively, but the method relies heavily on colour components and performs poorly on winter jujube against complex backgrounds. Fu Longsheng et al. [3] proposed a deep learning model based on the LeNet convolutional neural network for recognizing images of multi-cluster kiwifruit; it achieved high recognition rates in real time on field images, but performed poorly on strongly reflective and overlapping fruit. Sun Yunyun et al. [4] applied the classic AlexNet model to image recognition of tea plant diseases, with an average test accuracy of 90% and correct discrimination rates of 85%, 90% and 85%, showing that convolutional neural networks are efficient and feasible for crop recognition. Przybyło et al. [5] used a convolutional neural network to recognize the viability of oak acorns from colour images of their sections; its accuracy (85%) was comparable to or slightly higher than manual assessment (84%), improving working efficiency. Xia Weiwei et al. [6] proposed an improved convolutional-neural-network algorithm for cervical cancer cell image recognition that reduced the error rate from 4.74% to about 4.38%, showing that neural networks also have important applications in medicine. Target recognition is shifting from traditional machine learning to neural networks: traditional algorithms depend heavily on target colour and therefore achieve low accuracy against complex backgrounds, whereas convolutional neural networks learn the specific features of the target from large amounts of training data and can recognize and locate it precisely. Convolutional neural networks have mature applications in handwritten character recognition [7-9], face recognition [10-14], activity recognition [15-21] and vehicle detection [22-23], and are widely used to recognize fruits such as apple [24-26], kiwifruit [3,27] and citrus [28]; however, no literature has yet applied neural networks to the recognition of Rosa roxburghii fruit.
目前,刺梨果實(shí)采摘是刺梨生產(chǎn)中最耗時(shí)、耗力的環(huán)節(jié),其投入的勞力約占生產(chǎn)過程50%~70%。刺梨果實(shí)的采摘人工成本高、勞動(dòng)強(qiáng)度大、采摘效率低[29]。刺梨果實(shí)自身重力較小,且枝梗較硬,使得刺梨花苞朝向各個(gè)方向,且刺梨果實(shí)顏色特征與其枝條和葉片相近,這對(duì)實(shí)現(xiàn)自然環(huán)境下刺梨果實(shí)的識(shí)別和定位帶來了困難。
本文結(jié)合自然環(huán)境下刺梨果實(shí)的生長特征,對(duì)Faster RCNN框架下的VGG16網(wǎng)絡(luò)的結(jié)構(gòu)和參數(shù)進(jìn)行了調(diào)整、改進(jìn)和優(yōu)化,通過對(duì)刺梨數(shù)據(jù)集的訓(xùn)練,最終得到一個(gè)基于改進(jìn)的卷積神經(jīng)網(wǎng)絡(luò)的刺梨果實(shí)識(shí)別模型,該模型能夠高效快速地識(shí)別自然環(huán)境下的刺梨果實(shí),以實(shí)現(xiàn)對(duì)刺梨果實(shí)進(jìn)行高精度、快速的識(shí)別。
本文刺梨果實(shí)圖像采集于貴州省龍里縣谷腳鎮(zhèn)茶香村刺梨產(chǎn)業(yè)示范園區(qū),品種為貴龍5號(hào)。2018年8月8日下午采集未成熟時(shí)期刺梨果實(shí)圖像1 500幅,天氣晴朗;2018年9月20日下午采集成熟時(shí)期刺梨果實(shí)圖像1 600幅,天氣晴朗;共采集自然環(huán)境下刺梨果實(shí)原始圖像3 100幅。本文所用圖像采用尼康(Nikon)D750單反相機(jī)多角度近距離(2 m以內(nèi))進(jìn)行拍攝,原始圖像格式為.JPEG,分辨率為6 016×4 016像素。自然環(huán)境下的刺梨果實(shí)圖像樣本示例如圖1所示。
圖1 自然環(huán)境下的刺梨果實(shí)圖像樣本示例
From the 3 100 images taken, 2 000 were selected and the number of recognition classes was set to 11. To avoid under-fitting, where a class with too few training samples cannot be classified accurately, and over-fitting, where a class with too many samples dominates feature learning and causes misclassification, the numbers of sample images in the different classes were kept as balanced as possible.
Using Photoshop CS6, the 2 100 original images of 6 016×4 016 pixels were cropped into multiple 500×500-pixel samples fully containing fruit. With ACDSee, the cropped samples were flipped vertically and rotated by 45°, 90° and 270° to augment the image data set, and were batch-renamed in the form 2018_000001.jpg, giving 8 475 samples after processing. Labels were then made for 8 175 of these samples with labelImg in the PASCAL VOC2007 data-set format.
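The flip-and-rotate augmentation described above can be sketched in Python. This is a minimal illustration using nested lists in place of real image files; the 45° rotation, which needs interpolation, is left to a real image library such as Pillow.

```python
def flip_vertical(img):
    """Flip an image (stored as a list of rows) top-to-bottom."""
    return img[::-1]

def rotate90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Generate the augmented variants used to expand the data set:
    the original, a vertical flip, and 90/270-degree rotations."""
    r90 = rotate90(img)
    r270 = rotate90(rotate90(r90))
    return [img, flip_vertical(img), r90, r270]
```

Applied to every cropped 500×500 sample, such transforms multiply the sample count without new field photography, which is how 8 475 samples were obtained from the crops.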
According to the natural growing conditions of the fruit (maturity, isolated or adjacent, occluded or not), the fruit images were divided into 11 classes: 1g0csnot, 1g0csyes, 1g1csnot, 1g1csyes, 2g0csnot, 2g0csyes, 2g1csnot, 2g1csyes, ng0csnot, ng1csnot and ng1csyes. Here 1g, 2g and ng denote that the number of fruits in an adjacent unit is 1, 2 or 3 and above; 0cs denotes immature fruit and 1cs mature fruit (pure yellow counts as mature, everything else as immature); yes denotes fruit occluded by leaves or branches over more than 1/4 but less than 3/4 of its area, and not denotes no occlusion, occlusion of less than 1/4, or mutually overlapping fruit not occluded by branches or leaves. For groups of two or of three and more fruits, the group counts as immature if any fruit is immature, and as occluded if any fruit is occluded by more than 1/2 or a branch or leaf crosses the whole fruit. The classification of the fruit images is illustrated in Fig. 2.
Note: 1g0csnot, for example, denotes an isolated immature fruit without occlusion, where g denotes adjacency and the preceding digit the number of adjacent fruits, 0cs denotes immature and 1cs mature (pure yellow counts as mature, everything else as immature), and yes and not denote occluded and unoccluded respectively.
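The naming scheme above can be captured by a small helper. `cili_label` is a hypothetical function name for illustration; note that the combination ng0csyes does not occur among the paper's 11 classes.

```python
def cili_label(n_adjacent, mature, occluded):
    """Build the class label used in the paper, e.g. '1g0csnot':
    <n>g   : 1, 2 or 'n' (3 or more) fruits in the adjacent unit
    <m>cs  : 0cs = immature, 1cs = mature (pure yellow)
    yes/not: occluded by leaves/branches or not."""
    g = 'n' if n_adjacent >= 3 else str(n_adjacent)
    cs = '1cs' if mature else '0cs'
    occ = 'yes' if occluded else 'not'
    return f"{g}g{cs}{occ}"
```

For example, an isolated immature unoccluded fruit maps to 1g0csnot, and a cluster of four mature occluded fruits to ng1csyes.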
The experiments were run on a 64-bit Ubuntu 16.04 system with the Caffe framework. Camera: Nikon D750 with an AF-S NIKKOR 24-120 mm f/4G ED VR zoom lens. Computer: desktop with a GeForce GTX 1060 GPU (6 GB of video memory), an Intel(R) Core(TM) i7-8700K CPU at 3.70 GHz and a 250 GB disk; the programming language was Python.
This paper takes Faster RCNN as the base framework for Rosa roxburghii fruit detection and recognition and, according to the image characteristics of the fruit, improves and optimizes the key structural parameters and training strategies of the VGG16, VGG_CNN_M1024 and ZF models under this framework so that they recognize Rosa roxburghii fruit images better.
Faster RCNN consists of two parts: a feature-extraction network and RPN + Fast RCNN. The input image first passes through the feature-extraction network; the resulting feature map is fed to the region proposal network (RPN), which generates proposal boxes (proposals), and these are passed on to the Fast RCNN detection head [30]. The main structure of each network consists of convolutional layers, activation layers, pooling layers, an RPN layer, an ROI Align layer and fully connected layers, as described below.
2.2.1 Convolutional layer (Conv layer)
Faster RCNN accepts input images of arbitrary size. The output size of a convolutional layer is given by Eq. (1):

outputsize = (inputsize + 2×pad − kernel_size)/stride + 1    (1)

where outputsize is the size of the image output to the next layer, inputsize the size of the image input to the layer, kernel_size the size of the convolution kernel, pad the number of padding pixels, and stride the step with which the kernel slides over the image.
由于在卷積層圖像的每一個(gè)像素點(diǎn)都有一個(gè)新值,所以卷積層不會(huì)改變圖像的大小。
2.2.2 Activation layer (ReLU layer)
The rectified linear unit (ReLU) is used as the activation function because it converges very quickly.
2.2.3 Pooling layer
Max pooling (Max-pooling) is used in the pooling layers; to some extent it reduces the feature-extraction error caused by the mean-estimate shift that convolutional-layer parameter errors introduce. The convolutional, activation and pooling layers together extract the feature map of the input image.
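A minimal illustration of the 2×2 max pooling with stride 2 used here, over a feature map stored as a nested list:

```python
def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 over a 2-D list: each output
    value is the maximum of a non-overlapping 2x2 block."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]
```

Each application halves both spatial dimensions, which is how a 500×500 input becomes 250×250 after the first pooling layer.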
2.2.4 RPN (region proposal network)
The RPN takes an image of arbitrary size as input and outputs a set of proposal boxes (object proposals), each with an objectness score. Because it extracts candidate boxes directly on the feature map, it greatly speeds up training.
2.2.5 Region-of-interest alignment (ROI Align)
ROI Align is a region feature-aggregation method proposed in the Mask-RCNN framework that removes the region mis-alignment caused by the two quantization steps in ROI Pooling [31]. It uses bilinear interpolation to obtain the image values at sampling points with floating-point coordinates, turning the whole feature-aggregation process into a continuous operation and thereby solving the mis-alignment problem.
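The bilinear interpolation that ROI Align uses at floating-point sampling coordinates can be sketched as follows, in pure Python on a 2-D list standing in for a feature map:

```python
def bilinear_sample(fmap, y, x):
    """Sample a 2-D feature map at a float coordinate (y, x) by
    bilinear interpolation, as ROI Align does instead of rounding
    the coordinate to the nearest cell (as ROI Pooling would)."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(fmap) - 1)
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bot = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bot * dy
```

Because the sampled value varies continuously with (y, x), no quantization error is introduced, which is exactly the property that sharpens the detected boxes.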
2.2.6 Fully connected layer (FC layer)
In a fully connected layer every neuron is connected by weights to every neuron of the adjacent layer; it performs the fully connected operation on the feature map output by the ROI Align layer.
Weights were initialized from a model pre-trained on ImageNet, and the alternating optimization training strategy was used to train the VGG16, VGG_CNN_M1024 and ZF models.
VGG16 has 13 convolutional layers, 13 activation layers and 4 pooling layers and is a large network suited to data sets with many classes and many samples; VGG_CNN_M1024 has 5 convolutional layers, 5 activation layers and 2 pooling layers and is a medium network; ZF also has 5 convolutional layers, 5 activation layers and 2 pooling layers and is a small network suited to few classes and small data sets.
The parameters of the VGG16, VGG_CNN_M1024 and ZF networks were set as follows: the total number of training iterations over the four stages was 280 000; the initial learning rate was 0.001; the batch size was 128 images; the stepsize of both RPN stages was 60 000 with a maximum of 80 000 iterations; the stepsize of both Fast RCNN stages was 40 000 with a maximum of 60 000 iterations; the momentum factor was 0.9 and the weight decay 0.000 5. The precision-recall (PR) curves of the models trained with VGG16, VGG_CNN_M1024 and ZF are shown in Fig. 3. The VGG16 curve is the best, with recall closest to 1, indicating that among the three models the target boxes detected by the VGG16-trained model overlap most with the boxes drawn during labelling.
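The solver settings above can be collected in one place. This is a sketch: the decay factor `gamma = 0.1` is an assumption (the py-faster-rcnn default), since the paper states only the stepsizes and the initial rate, and `step_lr` mimics Caffe's "step" learning-rate policy.

```python
# Hyperparameters as stated in the paper; gamma is an assumed default.
SOLVER = {
    "base_lr": 0.001, "gamma": 0.1, "momentum": 0.9,
    "weight_decay": 0.0005, "batch_size": 128,
    "rpn_stepsize": 60000, "rpn_max_iter": 80000,
    "frcnn_stepsize": 40000, "frcnn_max_iter": 60000,
}

def step_lr(base_lr, gamma, stepsize, it):
    """Learning rate at iteration `it` under Caffe's "step" policy:
    the rate is multiplied by gamma every `stepsize` iterations."""
    return base_lr * gamma ** (it // stepsize)
```

With these values the RPN stages train at 0.001 for the first 60 000 iterations and at a ten-times-smaller rate afterwards.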
Fig.3 PR curves of the VGG16, VGG_CNN_M1024 and ZF models
The per-class training precision of the three network models is given in Table 1, and their training performance is compared in Table 2.
Table 1 shows that among the three models VGG16 achieves the highest average precision, with a minimum of 0.837 4 and a maximum of 1.000 0; VGG16 was therefore used for training.
From the 8 175 labelled samples, 6 540 (80%) were randomly selected as the training-validation set (trainval) and the remaining 20% as the test set; 80% of trainval was used as the training set and the remaining 20% as the validation set. A further 300 images that took no part in training were used to verify the final model.
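The nested 80/20 split can be sketched as follows; `split_dataset` is a hypothetical helper operating on a list of sample identifiers:

```python
import random

def split_dataset(samples, seed=0):
    """80/20 trainval/test split, then 80/20 train/val inside
    trainval, after shuffling with a fixed seed."""
    rng = random.Random(seed)
    s = samples[:]
    rng.shuffle(s)
    n_trainval = int(len(s) * 0.8)
    trainval, test = s[:n_trainval], s[n_trainval:]
    n_train = int(len(trainval) * 0.8)
    return trainval[:n_train], trainval[n_train:], test
```

For 8 175 samples this gives 6 540 trainval images (5 232 train + 1 308 validation) and 1 635 test images, matching the proportions stated above.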
Table 1 Training precision of each class under the three network models
Table 2 Comparison of training performance of the three network models
The network takes the Rosa roxburghii samples directly as input; normalization scales each sample to 500×500 pixels. In the convolutional layers the normalized image is padded (pad = 1, i.e. one ring of zero pixels around the image) so that the input and output matrices keep the same size, and ReLU is used as the activation function. After pooling (down-sampling) with a 2×2 kernel the image becomes 250×250 pixels. Training uses mini-batch stochastic gradient descent. After 13 convolutional layers, 13 ReLU layers and 4 pooling layers, a 31×31-pixel feature map is produced in which every feature point corresponds to a 16×16-pixel region of the original image. Whereas RCNN generates detection boxes by selective search [8], here the RPN replaces it and greatly speeds up detection-box generation.
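The 500 → 31 feature-map size follows from Eq. (1): every 3×3/pad-1 convolution preserves the size and each of the four 2×2/stride-2 pooling layers stated in the paper roughly halves it. A sketch under those assumptions:

```python
def vgg16_feature_size(size):
    """Trace the spatial size through the network described above:
    3x3/pad-1 convolutions keep the size; each of the four 2x2
    stride-2 pooling layers applies Eq. (1) with kernel 2, pad 0,
    stride 2, i.e. size -> (size - 2) // 2 + 1."""
    for _ in range(4):
        size = (size - 2) // 2 + 1
    return size
```

The trace is 500 → 250 → 125 → 62 → 31, and 2^4 = 16 gives the 16×16-pixel region of the original image that each feature point covers.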
Softmax loss (the loss of the Softmax regression classifier) was used to compare network performance. The structure of the Faster RCNN network based on VGG16 is shown in Fig. 4.
Testing on the 300 fruit images not used in training showed that detection is fastest for 500×500-pixel images, with an average speed of 0.2 s per image. Comparing the classes and numbers of fruit actually present in the 300 test images with the detected results gives the recognition accuracy.
Fig.4 Structure of the Faster RCNN network based on VGG16
The improved convolutional neural network was trained to recognize Rosa roxburghii fruit as follows:
1) Pre-process the original fruit images and classify them by growing condition, keeping the numbers of images per class as close as possible;
2) Crop uniform 500×500-pixel samples as required by the improved network's input, expand the sample count by rotation, mirroring and so on, and build the training sample set;
3) Train with cross-validation using the VGG16, VGG_CNN_M1024 and ZF models; after comparison, use VGG16, which trains with higher precision, for the final model. The input samples are scaled to a fixed 500×500 pixels, ReLU activation and max-pooling down-sampling are used, and the feature map is extracted by 13 convolutional layers + 13 ReLU layers + 4 pooling layers; a 3×3 convolution in the RPN then generates foreground anchors and bounding-box regression offsets, from which the proposal boxes are computed;
4) Update the network parameters by back-propagation, adjusting and improving them;
5) Extract the proposal regions from the feature map by region feature aggregation, and feed them to the fully connected layers and the softmax classifier, obtaining the average precision (AP) of each class and the mean average precision (mAP) over all classes. The geometric centre of each bounding box, i.e. the approximate centroid of the fruit, is finally obtained, achieving accurate recognition and localization of the fruit.
The loss curves of the four training stages (stage1_rpn, stage1_fast_rcnn, stage2_rpn, stage2_fast_rcnn) under the improved VGG16 model are shown in Fig. 5.
Fig.5 Loss curves of each training stage under the improved VGG16 network model
The four loss-iteration curves show that stage1_rpn and stage2_rpn are the region proposal network (RPN) stages, which generate large numbers of detection boxes; their low loss indicates that most generated boxes overlap well with the labelled target boxes. stage1_fast_rcnn and stage2_fast_rcnn are the losses of the Fast RCNN training stages: the boxes generated by the RPN stages are classified as background or target, and the boxes containing targets are returned together with the region-of-interest information; stage2_rpn then generates detection boxes again and stage2_fast_rcnn continues the discrimination on those data. The stage1_fast_rcnn and stage2_fast_rcnn curves converge after about 40 000 iterations with a loss of about 0.05, a satisfactory training result.
This study provides the recognition, class and position information needed for intelligent harvesting of Rosa roxburghii fruit. Since the picking end-effector tolerates a positioning error radius of 10 mm, recognizing the greater part of a fruit is sufficient. A detection is counted as correct when the detected box overlaps more than about 3/4 of the fruit and the class marked on the box matches the actual class of the fruit.
The model is evaluated by recall (R), precision (P) and the F1 score; the F1 score is computed by Eq. (2):

F1 = 2PR/(P + R)    (2)

where P is the precision, %, P = correct detections/(correct detections + detections wrongly accepted as correct), and R is the recall, %, R = correct detections/(correct detections + detections wrongly rejected as incorrect).
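Eq. (2) and the two definitions can be sketched directly, writing correct detections as true positives (TP), wrong acceptances as false positives (FP) and wrong rejections as false negatives (FN):

```python
def precision_recall_f1(tp, fp, fn):
    """Eq. (2): P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1
```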
Detection on fruit images of different sizes showed that 500×500-pixel images take the least time, about 0.2 s per image on average. The model was verified on the 300 untrained 500×500-pixel images, covering all 11 fruit classes; the detection results for these images are given in Table 3.
Table 3 Model detection results
As Table 3 shows, the F1 score, the balance point of precision and recall that takes both into account, is used to measure the quality of the recognition model. The F1 scores are all above 87.50%, with a maximum of 94.99%, indicating that the trained model detects well. Detection results of the models trained before and after the improvement are shown in Fig. 6.
Fig. 6 shows that the model trained with region-of-interest pooling (ROI Pooling) detects Rosa roxburghii fruit with considerable deviation, while after the change to the region-of-interest alignment (ROI Align) method the precision of the detection boxes improves markedly. A very small number of fruits are missed because they are too small, heavily occluded or blurred, and a small number are assigned the wrong class, partly because of human error in labelling and partly because the data set is not large enough. The result images also show that the model can detect targets in relatively dark light.
Note: the insets show the recognition of individual fruits.
As there has been no previous research on image recognition of Rosa roxburghii fruit, this paper compares the recognition and detection results of the Faster RCNN (ZFNet) and LeNet networks on other fruits such as kiwifruit and apple with those of the proposed Faster RCNN (VGG16) algorithm, to verify its recognition accuracy and speed. The comparison is given in Table 4.
Table 4 Comparison of fruit-recognition algorithms based on convolutional neural networks
The comparison of accuracy and recognition time for kiwifruit, apple and Rosa roxburghii in Table 4 shows that the proposed Faster RCNN (VGG16) algorithm recognizes Rosa roxburghii fruit with high precision, reaching 95.16%, and recognizes single fruits faster, about 0.20 s per fruit on average, 0.07 s faster than the method of Fu et al. [32]. The algorithm also recognizes fruit well under both weak and strong illumination and is suitable for effective detection in complex orchard environments, meeting the requirements of automated recognition, localization and picking of Rosa roxburghii fruit.
1) To enable automated picking of Rosa roxburghii fruit, this paper established a field fruit-recognition method based on convolutional neural networks. After adjusting and optimizing the structures and parameters of the VGG16, VGG_CNN_M1024 and ZF models under the Faster RCNN framework and comparing them, VGG16 was selected as suitable for training the fruit-recognition model. The trained model recognizes fruit under natural conditions at a high rate and provides a data basis for fruit picking.
2) Replacing the ROI Pooling of the Faster RCNN framework with ROI Align, the region feature-aggregation method proposed in Mask RCNN, improved the detection precision of the model. The algorithm recognizes fruit in images at an average speed of 0.2 s per fruit, with F1 scores between 87.50% and 94.99%, meeting the requirements of recognition for picking.
By using a convolutional neural network for deep extraction of Rosa roxburghii fruit image features, this work lays a foundation for intelligent recognition and picking of the fruit and opens a new direction for research on its automated picking technology.
[1] Tang Ling, Chen Yueling, Wang Dian, et al. The research status and the development prospect of Rosa roxburghii Tratt products[J]. Food Industry, 2013, 34(1): 175-178. (in Chinese with English abstract)
[2] Sun Shipeng, Li Rui, Xie Hongqi, et al. Detection of winter jujube diseases based on machine vision[J]. Journal of Agricultural Mechanization Research, 2018(9): 183-188. (in Chinese with English abstract)
[3] Fu Longsheng, Feng Yali, Elkamil Tola, et al. Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(2): 205-211. (in Chinese with English abstract)
[4] Sun Yunyun, Jiang Zhaohui, Dong Wei, et al. Image recognition of tea plant disease based on convolutional neural network and small samples[J]. Jiangsu Journal of Agricultural Sciences, 2019, 35(1): 48-55. (in Chinese with English abstract)
[5] Przyby?o J, Jab?oński M. Using deep convolutional neural network for oak acorn viability recognition based on color images of their sections[J]. Computers and Electronics in Agriculture, 2019, 156: 409-499.
[6] Xia Weiwei, Xia Zhelei. An improved algorithm for cervical cancer cell image recognition based on convolution neural networks[J]. Journal of China University of Metrology, 2018, 29(4): 439-444. (in Chinese with English abstract)
[7] Mane D T, Kulkarni U V. Visualizing and understanding customized convolutional neural network for recognition of handwritten marathi numerals[J]. Procedia Computer Science, 2018, 132: 1123-1137.
[8] Rabby A S A, Haque S, Abujar S, et al. Using convolutional neural network for bangla handwritten recognition[J]. Procedia Computer Science,2018, 143: 603-610.
[9] Trivedi A, Srivastava S, Mishra A, et al. Hybrid evolutionary approach for devanagari handwritten numeral recognition using convolutional neural network[J]. Procedia Computer Science,2018, 125: 525-532.
[10] Li Ya, Wang Guangrun, Nie Lin, et al. Distance metric optimization driven convolutional neural network for age invariant face recognition[J]. Pattern Recognition, 2018, 75: 51-62.
[11] O'Toole A J, Castillo C D, Parde C J, et al. Face space representations in deep convolutional neural networks[J]. Trends in Cognitive Sciences, 2018, 22(9): 794-809.
[12] Jiao Licheng, Zhang Sibo, Li Lingling, et al.A modified convolutional neural network for face sketch synthesis[J]. Pattern Recognition, 2018, 76: 125-136.
[13] Banerjee S, Das S. Mutual variation of information on transfer-CNN for face recognition with degraded probe samples[J]. Neurocomputing,2018, 310: 299-315.
[14] Yang Meng, Wang Xing, Zeng Guohang, et al. Joint and collaborative representation with local adaptive convolution feature for face recognition with single sample per person[J]. Pattern Recognition, 2017, 66: 117-128.
[15] Aminikhanghahi S, Cook D J. Enhancing activity recognition using CPD-based activity segmentation[J]. Pervasive and Mobile Computing, 2019, 53: 75-89.
[16] Hassan M M, Uddin M Z, Mohamed A, et al. A robust human activity recognition system using smartphone sensors and deep learning[J]. Future Generation Computer Systems, 2018, 81: 307-313.
[17] Nweke H F, Teh Y W, Al-Garadi M A, et al. Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges[J]. Expert Systems with Applications, 2018, 105: 233-261.
[18] San-Segundo R, Blunck H, Moreno-Pimentel J, et al. Robust human activity recognition using smartwatches and smartphones[J]. Engineering Applications of Artificial Intelligence, 2018, 72: 190-202.
[19] Ignatov A. Real-time human activity recognition from accelerometer data using convolutional neural networks[J]. Applied Soft Computing,2018, 62: 915-922.
[20] Zhang Hui, Du Yu, Ning Shurong, et al. Pedestrian detection method based on Faster RCNN[J]. Transducer and Microsystem Technologies, 2019, 38(2): 147-149. (in Chinese with English abstract)
[21] Li Zongmin, Xing Minmin, Liu Yujie, et al. Pedestrian object detection based on Faster RCNN and similarity measurement[J]. Journal of Graphics, 2018, 39(5): 901-908. (in Chinese with English abstract)
[22] Zhang Qi, Hu Guangdi, Li Yusheng, et al. Binocular vision vehicle detection method based on improved Fast-RCNN[J]. Journal of Applied Optics, 2018, 39(6): 832-838. (in Chinese with English abstract)
[23] Shi Kaijing, Bao Hong, Xu Binxin, et al. Forward vehicle detection method of intelligent vehicle in road based on Faster RCNN[J]. Computer Engineering, 2018, 44(7): 36-41. (in Chinese with English abstract)
[24] Che Jinqing, Wang Fan, Lv Jidong, et al. Separation and recognition method for overlapped apple fruits[J]. Jiangsu Journal of Agricultural Sciences, 2019, 35(2): 469-475. (in Chinese with English abstract)
[25] Cheng Hongfang, Zhang Chunyou. Research on apple image recognition technology based on improved LeNet convolution neural network in natural scene[J]. Food and Machinery, 2019, 35(3): 155-158. (in Chinese with English abstract)
[26] Park K, Hong Y K, Kim G H, et al. Classification of apple leaf conditions in hyper-spectral images for diagnosis of Marssonina blotch using mRMR and deep neural network[J]. Computers and Electronics in Agriculture, 2018, 148: 179-187.
[27] Zhan Tianwen, He Dongjian, Shi Shilian. Recognition of kiwifruit in field based on Adaboost algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(23): 140-146. (in Chinese with English abstract)
[28] Bi Song, Gao Feng, Chen Junwen, et al. Detection method of citrus based on deep convolution neural network[J]. Transactions of the Chinese Society for Agricultural Machinery, 2019, 50(5): 182-186. (in Chinese with English abstract)
[29] Lei Qian, Yang Yongfa. Design of a portable variable-diameter spherical Rosa roxburghii picker[J]. Forestry Machinery and Woodworking Equipment, 2017, 45(3): 26-28. (in Chinese with English abstract)
[30] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[31] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
[32] Fu Longsheng, Feng Yali, Majeed Yaqoob, et al. Kiwifruit detection in field images using Faster R-CNN with ZFNet[J]. IFAC-Papers OnLine, 2018, 51(17): 45-50.
Recognition of Rosa roxburghii fruit in natural environment based on improved Faster RCNN
Yan Jianwei1,2,3, Zhao Yuan1, Zhang Lewei1, Su Xiaodong1, Liu Hongyun1, Zhang Fugui1,3※, Fan Weiguo2, He Lin1,4
(1. College of Mechanical Engineering, Guizhou University, Guiyang 550025, China; 2. Engineering Technology Research Center of Rosa roxburghii of National Forestry and Grassland Administration, Guiyang 550025, China; 3. Guizhou Engineering Research Center for Intelligent Equipment of Mountain Agriculture, Guiyang 550025, China; 4. Liupanshui Normal University, Liupanshui 553004, China)
Rosa roxburghii is widely distributed in the warm temperate and subtropical zones, mainly in Guizhou, Yunnan, Sichuan and other places in China; Panxian and Longli in Guizhou have the most abundant Rosa roxburghii resources, the most varieties and the highest yield. The harvesting of Rosa roxburghii fruit is the most time-consuming and labor-consuming work in its production, and its labor input accounts for 50%-70% of the production process. Hand-picking of the fruit involves high cost, high labor intensity and low picking efficiency. In recent years, convolutional neural networks have been widely used in target recognition and detection; however, there is no relevant literature on the application of neural networks to Rosa roxburghii fruit recognition. In this paper, in order to realize rapid and accurate identification of Rosa roxburghii fruit in the natural environment, according to the characteristics of the fruit, the structures and parameters of the VGG16, VGG_CNN_M1024 and ZF network models under the framework of Faster RCNN were optimized and compared. The convolutional neural network adopted the bilinear interpolation method and the alternating optimization training strategy of Faster RCNN, and the ROI Pooling in the network was improved to the ROI Align regional feature-aggregation method, which makes the target rectangular boxes in the detection results more accurate. The VGG16 network model was finally selected. Of 8 175 samples, 6 540 (80%) were selected randomly as the training-validation set (trainval) and the remaining 20% as the test set; 80% of trainval served as the training set and the remaining 20% as the validation set, and a further 300 samples that took no part in training were used to test the final model. The recognition accuracy of the network model for the 11 classes of Rosa roxburghii fruit was 94.00%, 90.85%, 83.74%, 98.55%, 96.42%, 98.43%, 89.18%, 90.61%, 100.00%, 88.47% and 90.91%, respectively; the average recognition accuracy was 92.01%.
The results showed that the recognition model trained by the improved algorithm had a lowest recall of 81.40% and a highest of 96.93%, a lowest precision of 85.63% and a highest of 95.53%, and a lowest F1 score of 87.50% and a highest of 94.99%. Faster RCNN (VGG16 network) has high recognition accuracy for Rosa roxburghii fruit, reaching 95.16%. Single fruits are recognized quickly, each in about 0.2 s on average, 0.07 s faster than the method of Fu et al. In this paper, a Faster RCNN recognition network model for Rosa roxburghii fruit based on improved VGG16 is proposed, which is suitable for training the fruit-recognition model. The proposed algorithm recognizes the fruit well under both weak and strong illumination conditions and is suitable for effective recognition and detection of Rosa roxburghii fruit in complex orchard environments. This paper is the first study to use a convolutional neural network for deep extraction of Rosa roxburghii fruit image features. The method achieves a high recognition rate and good real-time performance under natural conditions, can meet the requirements of automatic identification and positioning for picking, and lays a foundation for intelligent identification and picking of Rosa roxburghii fruit.
convolutional neural network; Faster RCNN; machine vision; deep learning; Rosa roxburghii fruit; target recognition
10.11975/j.issn.1002-6819.2019.18.018
TP391.41
A
1002-6819(2019)-18-0143-08
Yan Jianwei, Zhao Yuan, Zhang Lewei, Su Xiaodong, Liu Hongyun, Zhang Fugui, Fan Weiguo, He Lin. Recognition of Rosa roxburghii fruit in natural environment based on improved Faster RCNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(18): 143-150. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2019.18.018 http://www.tcsae.org
Received: 2019-03-26
Revised: 2019-08-25
Funding: Guizhou University cultivation project (Qiankehe Platform Talents [2017]5788); Guizhou higher-education engineering research center construction project (Qianjiaohe KY [2017]015); Guizhou science and technology plan project (Qiankehe Platform Talents [2019]5616)
Yan Jianwei, PhD, associate professor; research interest: intelligent agricultural technology and equipment. Email: jwyan@gzu.edu.cn
Zhang Fugui, PhD, professor; research interest: agricultural mechanization technology. Email: zhfugui@vip.163.com
Member of the Chinese Society of Agricultural Engineering: Yan Jianwei (E041201018S)