
·Agricultural Information and Electrical Technology·

Recognizing weeds in maize fields using shifted window Transformer network

Wang Can, Wu Xinhui※, Zhang Yanqing, Wang Wenjun

(College of Agricultural Engineering, Shanxi Agricultural University, Taigu 030801, China)

To address the poor accuracy and real-time performance of crop and weed recognition in real, complex field scenes, the susceptibility to overlap and occlusion, and the difficulty of obtaining large amounts of pixel-level annotated data, this study proposes an efficient recognition method based on the Shifted Window Transformer (Swin Transformer) network, which rapidly segments weeds on the basis of semantic segmentation of the crop. A maize semantic segmentation model was first established by introducing a Swin Transformer backbone and adopting the Unified Perceptual Parsing Network as its efficient semantic segmentation framework. The Swin Transformer backbone was then improved by adjusting the network parameters to generate four improved models, and the optimal model structure was determined through a combined comparison of accuracy and speed. Finally, based on the morphological segmentation of maize, an improved combination of image morphological operations was established to recognize and segment all weed regions in real time. The test results show that, among the four improved models, Swin-Tiny-UN achieves the best accuracy-speed balance, with a mean intersection over union of 94.83%, a mean pixel accuracy of 97.18%, and an inference speed of 18.94 frames/s. For video data simulating practical application, the average correct detection rate is 95.04% and the average detection time per frame is 5.51×10⁻² s. The method achieves accurate real-time recognition and fine segmentation of maize and weeds, and can provide a theoretical reference for the development of intelligent weeding equipment.

crops; object recognition; image segmentation; semantic segmentation; maize; weed recognition

0 Introduction

Weeds are one of the main factors affecting crop growth at the seedling stage. Timely weeding reduces nutrient competition, stunted development, and pest and disease problems, and is a necessary measure for ensuring stable and increased crop yields [1]. Chemical control remains the dominant method of field weeding, and the excessive use of herbicides and other inputs brings a series of problems such as agricultural non-point source pollution, pesticide residues in crops, and increased herbicide resistance of weeds [2-5]. To reduce weeding inputs, field weeding equipment based mainly on precision spraying, mechanical weeding, and electric-shock weeding has been widely studied [6-10]. At the current stage of smart agriculture, intelligent field weeding equipment is an important component in building the operating equipment systems of unmanned farms [11-12]. To achieve precise weeding without human participation, intelligent weeding equipment must first recognize crops and weeds quickly and accurately.

Machine-vision-based recognition methods are widely accepted in research on crop and weed recognition. Machine learning models [13-14] are built to classify hand-crafted feature vectors extracted from images, thereby recognizing crop and weed images. Miao et al. [15] recognized spinach and weeds using image sub-block reconstruction combined with a support vector machine model. The feature patterns of such shallow machine learning methods are relatively fixed, and their generalization ability and environmental adaptability are poor. In recent years, recognition methods based on deep convolutional neural networks have been widely applied in related research [16-17]. Sun et al. [18] proposed a multi-scale feature fusion model combining dilated convolution and global pooling to recognize multiple weed species. Zhao et al. [19] solved the species recognition of maize seedlings and six associated weeds with an improved DenseNet model. Jiang et al. [20] built a graph convolutional neural network to recognize three crops and their associated weeds. These methods achieve image-level classification of crops and weeds, but cannot recognize and locate targets of different classes within the same image. To solve this problem, object detection methods based on deep learning have been widely adopted [21]. Peng et al. [22] fused Faster R-CNN with a feature pyramid network and proposed an efficient method for detecting weeds in cotton fields against complex backgrounds. Meng et al. [23] proposed a multibox detector based on lightweight convolution and a feature information fusion mechanism to recognize maize and its associated weeds. However, such methods perform poorly when crops and weeds overlap and occlude each other, and when the generated detection anchor boxes overlap over large areas they cannot further separate the different target regions. Some researchers have therefore turned to deep-learning-based semantic segmentation for crop and weed recognition. Wang et al. [24] built an encoder-decoder semantic segmentation network and recognized weeds in sugar beet fields by fusing near-infrared and RGB enhanced images. Sun et al. [25] fused near-infrared and visible images and constructed a multi-channel depth-wise separable convolution model to recognize sugar beet and weeds. Khan et al. [26] built the CED-Net segmentation model and tested it on four datasets. Wang et al. [27] recognized and segmented maize seedlings and weeds in the field with an improved dual-attention semantic segmentation model. These studies show that semantic segmentation can obtain the region segmentation of crops and weeds while recognizing them, but the pixel-level annotation required for training is laborious, data samples are difficult to obtain, and real-time performance is weak.

To solve the above problems and enhance recognition accuracy and real-time performance, this paper proposes a recognition method based on the Shifted Window Transformer (Swin Transformer) network. A maize semantic segmentation model is first established for real, complex field scenes. The model is based on the advanced Swin Transformer backbone and adopts the Unified Perceptual Parsing Network (UperNet) as an efficient semantic segmentation framework. The Swin Transformer structure is improved to generate a segmentation model with the best accuracy-speed balance. The training data of the model require no additional manual pixel annotation of the various weeds, so the difficulty of obtaining samples is greatly reduced. A simple and effective weed recognition algorithm is then proposed by combining improved image morphological operations, which segments all weed regions in real time on the basis of the segmented maize morphological regions. The proposed method can recognize maize and weed targets under overlap and occlusion and obtain a fine segmentation of their respective regions. The method is evaluated on images and videos of complex field scenes, with the aim of achieving higher accuracy and real-time performance and providing a theoretical basis for the development of intelligent weeding equipment.

1 Materials and Methods

1.1 Dataset generation

This study takes field images of maize at the seedling stage as the research object. To ensure the generalization of the dataset, the collected images cover differences in environment, illumination, and growth. Images were collected at five different maize fields, representing different field environments during operation. Post-emergence weeding of maize is generally carried out between the 2- and 5-leaf stages, so real plots on which no weeding was performed were selected and images were collected three times during this period. Each collection was divided into three time slots (07:00-09:00, 10:00-12:00, and 15:00-17:00), representing different crop growth stages and illumination conditions in practical application. Images were taken from a vertical, top-down view with the device 50-60 cm above the ground, the height varying randomly to represent image scale changes that may be caused by terrain fluctuation during actual operation. A total of 1 000 field images of seedling-stage maize were collected, covering various real, complex situations including overlapping targets.

The resolution of the collected images was uniformly adjusted to 512×512 pixels, and manual annotation was performed with Labelme (v4.5.6). Only the maize target regions were annotated; everything else was treated as background. Using polygon annotation, points were densely placed by hand along the contour of each maize target and connected into a closed polygon fitting the boundary, as shown in Fig.1b. All pixels inside the polygon were labeled as the maize class, and all pixels outside were automatically defined as the background class, producing the label shown in Fig.1c. In crop-weed recognition tasks, conventional pixel-level annotation requires labeling weed pixels in the same way in addition to the maize regions, as shown in Fig.1d. When an image contains large numbers of weeds, the number of objects to be annotated multiplies. Comparing Fig.1c and 1d shows that the annotation used in this paper involves fewer objects and fewer label classes, greatly reducing the manual annotation workload.

Note: Fig.1c contains maize and background labels; Fig.1d contains maize, weed, and background labels.

The dataset was generated in PASCAL VOC 2012 format and divided into a training set (700 images), a validation set (200 images), and a test set (100 images) at a ratio of 7:2:1, with no duplicate data among the subsets.
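The polygon annotations described above are stored by Labelme as JSON files and are rasterized into per-pixel label masks when the PASCAL VOC style dataset is generated. The following is a minimal sketch of that conversion; the label name "maize" and the helper itself are illustrative assumptions, not the authors' actual preprocessing code.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

def labelme_to_mask(json_path, size=(512, 512), target_label="maize"):
    """Rasterize Labelme polygon annotations into a 0/1 mask.

    Pixels inside any polygon whose label matches `target_label` become 1
    (maize); everything else stays 0 (background).
    """
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    mask = Image.new("L", size, 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann.get("shapes", []):
        if shape.get("label") == target_label and shape.get("shape_type") == "polygon":
            pts = [tuple(p) for p in shape["points"]]
            draw.polygon(pts, outline=1, fill=1)
    return np.array(mask, dtype=np.uint8)
```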

1.2 Data augmentation

To train the models more fully and further improve semantic segmentation accuracy, the training set was expanded by data augmentation [28]. The methods used were: 1) color jitter, randomly adjusting the saturation, brightness, contrast, and sharpness of the image; 2) random cropping, cropping the image with a square window of random size (larger than 128×128 pixels) and resizing it back to the original size by bilinear interpolation; 3) random rotation, rotating the image by an arbitrary random angle and filling the empty pixels. These methods expanded the training set to 15 times its original size.
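A hedged sketch of the three augmentations applied to an image-mask pair is shown below, assuming a recent torchvision (0.10 or later, which provides adjust_sharpness); the jitter ranges and fill values are illustrative assumptions rather than the settings actually used in the study. Geometric operations are applied identically to image and mask, while photometric jitter touches the image only.

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def augment_pair(img, mask, min_crop=128):
    """One random augmentation pass over a PIL (image, mask) pair."""
    w, h = img.size
    # 1) Colour jitter: saturation, brightness, contrast, sharpness (image only).
    for fn in (TF.adjust_saturation, TF.adjust_brightness,
               TF.adjust_contrast, TF.adjust_sharpness):
        img = fn(img, random.uniform(0.7, 1.3))
    # 2) Random square crop (side larger than min_crop), resized back to the
    #    original size: bilinear for the image, nearest for the label mask.
    side = random.randint(min_crop + 1, min(w, h))
    left, top = random.randint(0, w - side), random.randint(0, h - side)
    img = TF.resized_crop(img, top, left, side, side, [h, w],
                          interpolation=InterpolationMode.BILINEAR)
    mask = TF.resized_crop(mask, top, left, side, side, [h, w],
                           interpolation=InterpolationMode.NEAREST)
    # 3) Random rotation by an arbitrary angle, empty corners filled with 0.
    angle = random.uniform(0.0, 360.0)
    img = TF.rotate(img, angle, fill=0)
    mask = TF.rotate(mask, angle, fill=0)
    return img, mask
```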

1.3 Maize semantic segmentation model based on Swin Transformer

To accurately recognize maize seedlings and their morphological regions in field scene images, the maize semantic segmentation model proposed in this paper is based on the advanced Swin Transformer backbone and adopts UperNet as an efficient recognition and segmentation framework. To further improve the accuracy and speed of the model, comparative experiments on different improved variants of the Swin Transformer are conducted to find the best accuracy-speed balance and determine the optimal model structure.

1.3.1 Swin Transformer backbone

The Swin Transformer is a backbone network for vision tasks built entirely on self-attention [29]. Its semantic segmentation performance surpasses backbones with convolutional neural network (CNN) architectures [30]. Compared with the plain Transformer architecture: 1) it constructs hierarchical feature representations, enabling dense pixel-level prediction; 2) it adopts a shifted-window self-attention mechanism, which markedly strengthens its modeling ability; 3) self-attention is computed locally within non-overlapping windows of the partitioned image while allowing cross-window connections, which speeds up model inference.

The basic form of the Swin Transformer backbone, denoted Swin-Base, is established with the structure shown in Fig.2. The network first splits the input image into non-overlapping patches through a patch partition layer. The patch size is 4×4, and the feature dimension of each patch is 4×4×3=48 (raw-value features). The choice of patch size is related to the output feature size of each network stage. At the input resolution used in this paper, a 2×2 partition yields raw features of only 12 dimensions, containing little local information, and doubles the output feature size of every stage, increasing the computational load and easily causing GPU memory overflow; partitions from 3×3 and 5×5 up to 7×7 do not give integer feature map sizes at every stage, and the rounding involved loses or alters part of the feature information; an 8×8 partition leaves an output feature size of only 8×8 elements in the final stage, an insufficient feature resolution for the input image size. Stage 1 of the network consists of a linear embedding layer and Swin Transformer blocks. The linear embedding layer projects the raw features of each patch to dimension 128, and the Swin Transformer blocks keep the number of patches at 128×128 (the feature map size). In these blocks, W-MSA and SW-MSA are multi-head self-attention (MSA) [31] with regular and shifted window partitioning, respectively; MLP is a 2-layer multi-layer perceptron with the GELU nonlinearity; layer normalization (LN) is applied before every MSA and MLP module, and each has a residual connection. The blocks use shifted window partitioning with a batching approach that cyclically shifts features towards the top-left. A batched window may consist of non-adjacent sub-windows of the feature map, and a masking mechanism restricts the self-attention computation to within each sub-window. Self-attention is computed following the method in [32].

Note: LN denotes layer normalization; W-MSA and SW-MSA denote multi-head self-attention modules with regular and shifted window configurations, respectively; MLP denotes the multi-layer perceptron; ⊕ denotes element-wise summation.
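Two pieces of machinery distinguish the Swin block from a plain Transformer block: partitioning the feature map into non-overlapping windows for local self-attention, and the cyclic shift that SW-MSA applies before (and reverses after) attention so that the shifted windows can be batched together. The fragment below sketches both in PyTorch in the spirit of the reference implementation; it is illustrative only and not the authors' code.

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows of shape
    (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

B, H, W, C, ws = 1, 8, 8, 4, 4
feat = torch.randn(B, H, W, C)

# Cyclic shift towards the top-left before SW-MSA; attention is then computed
# inside each window (with masking across sub-window boundaries), and the
# shift is rolled back afterwards.
shifted = torch.roll(feat, shifts=(-ws // 2, -ws // 2), dims=(1, 2))
windows = window_partition(shifted, ws)            # 4 windows of shape (4, 4, C)
restored = torch.roll(shifted, shifts=(ws // 2, ws // 2), dims=(1, 2))
assert torch.equal(restored, feat)                 # the shift is exactly reversible
```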

1.3.2 Improved Swin Transformer variants

To further increase inference speed, the model structure is improved on the basis of Swin-Base. Important hyper-parameters of the architecture are selected for sensitivity experiments, with adjustable levels set for each parameter under the premise of a reasonable network. With other variables controlled, the effect of adjusting each parameter on recognition accuracy and speed is examined. Each experiment is initialized with fixed pre-trained weights and repeated 3 times. The mean intersection over union (mIoU) on the validation set and the number of frames processed per second (FPS) are used as the accuracy and speed indicators. The results are shown in Table 1.

Table 1 Parameter sensitivity experiments

Note: mIoU is mean intersection over union; FPS is frames per second. Same below.

Table 1 shows that adjusting the shifted-window size has a large effect on mIoU, with a range of 8.46 percentage points, indicating that a local receptive field that is too large or too small prevents the model from maintaining high accuracy; the range in inference speed is 0.41 frames/s, so speed is insensitive to this parameter. Adjusting the downsampling ratio affects FPS, with a range of 7.04 frames/s, but the mIoU range reaches 14.49 percentage points, indicating that model accuracy is more sensitive to this parameter: a larger downsampling ratio causes the feature resolution to drop too quickly and produces aliasing that sharply reduces mIoU, while a smaller ratio fails to achieve the feature pooling effect and reduces generalization ability and hence accuracy. These two parameters are therefore left unchanged to preserve model accuracy. The number of hidden-layer channels determines the dimension of the feature maps; adjusting the feature dimension has a large effect on FPS, with a range of 5.49 frames/s, while the mIoU range is 2.56 percentage points, which is relatively insensitive. The number of heads supervises the predictive capability of the features; the ranges of mIoU and FPS are 1.05 percentage points and 4.86 frames/s, respectively, so inference speed is the more sensitive. The Swin Transformer block is the core of the network; adjusting the number of blocks affects both mIoU and FPS, with ranges of 5.41 percentage points and 8.77 frames/s, respectively, FPS being the more sensitive. The inference speed of the model is therefore sensitive to the latter three parameters, while the accuracy sensitivity is relatively weak.

Based on the sensitivity analysis, this paper keeps the shifted-window size and downsampling ratio of Swin-Base unchanged and changes the model size by adjusting the number of hidden-layer channels, the number of heads, and the number of blocks; between adjacent variants no parameter is adjusted across more than one level, so as to speed up inference while keeping accuracy as stable as possible. To generate sufficient variant models for comparative experiments, the model size is scaled geometrically, halving the computational complexity at each step. The variant models Swin-Small, Swin-Tiny, and Swin-Nano are generated, whose network size and computational complexity are about 1/2, 1/4, and 1/8 of Swin-Base, respectively. In addition, to make the model comparison more comprehensive, the variant Swin-Large is set following the same geometric relationship between the computational complexity of adjacent models, with a network size and computational complexity about twice those of Swin-Base. The parameter settings of the four variants and the basic model thus cover the full adjustable range of the sensitivity experiments. The structural hyper-parameters of each model are listed in Table 2.

Table 2 Structural hyper-parameters of the improved Swin Transformer variants

Note: Swin-Base is the basic network of Swin Transformer established in this paper; Swin-Large, Swin-Small, Swin-Tiny and Swin-Nano are improved variants generated on Swin-Base. Same below.

For all model experiments, the query dimension of each head is set to 32 and the expansion ratio of each MLP is set to 4. In addition, the classic ResNet-101 backbone is constructed as the baseline for comparative experiments.
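Since the per-head query dimension is fixed at 32 and the MLP expansion ratio at 4, each variant is fully described by its hidden-channel width and per-stage block counts, with head counts following from the channel width. The sketch below only illustrates this relationship; the depth and width values are hypothetical examples, and the actual per-variant settings are those listed in Table 2.

```python
QUERY_DIM = 32   # per-head query dimension, fixed for all models
MLP_RATIO = 4    # MLP expansion ratio, fixed for all models

def describe_variant(embed_dim, depths):
    """Derive the stage-wise channel widths and head counts of a Swin variant
    from its hidden-layer channels and block counts (channels double per stage)."""
    dims = [embed_dim * 2 ** i for i in range(len(depths))]
    heads = [d // QUERY_DIM for d in dims]
    return {"dims": dims, "depths": list(depths), "heads": heads, "mlp_ratio": MLP_RATIO}

# Swin-Base embeds patches to 128 channels (Section 1.3.1); the block counts here
# and the smaller example variant are hypothetical illustrations only.
swin_base = describe_variant(128, (2, 2, 18, 2))
smaller = describe_variant(64, (2, 2, 6, 2))   # "halve the cost at each step" spirit
print(swin_base["heads"])  # [4, 8, 16, 32]
```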

1.3.3 UperNet semantic segmentation framework

This paper adopts the UperNet unified perceptual parsing network [33] as the framework for semantic segmentation; the constructed model structure is shown in Fig.3. The feature extractor of this segmentation architecture is a Feature Pyramid Network (FPN) based on the Swin Transformer backbone. It uses the multi-level feature representations obtained by the Swin Transformer as the corresponding pyramid levels, with a top-down FPN architecture with lateral connections, and a downsampling ratio consistent with the Swin Transformer. The Pyramid Pooling Module (PPM) [34] is placed before the top-down branch of the FPN and connected to stage 4 of the Swin Transformer network; the PPM provides an effective global prior feature representation and is highly compatible with the FPN structure. This architecture cooperates effectively with the hierarchical feature representations obtained by the Swin Transformer and achieves better semantic segmentation by fusing high-, mid-, and low-level semantic information. The feature fusion module resizes all the levels output by the FPN to the same size by bilinear interpolation and then applies a convolutional layer to fuse the features from different levels. The segmentation head is attached to the fused feature map, and each classifier is preceded by a separate convolutional layer. All additional non-classifier convolutional layers have 512-channel outputs with batch normalization [35] and the ReLU [36] activation function. The model output is the class mask generated from the pixel-wise classification predictions, from which the segmentation map is obtained. Fine-grained inference on maize field images is thereby achieved, obtaining a fine segmentation of the target regions while recognizing the targets.

Fig.3 UperNet semantic segmentation framework
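The last stage of the framework, resizing every pyramid level to a common resolution by bilinear interpolation, fusing them with a 512-channel convolution (batch normalization and ReLU), and attaching the pixel classifier, can be sketched as follows. This is a simplified stand-in that omits the PPM and the lateral FPN connections; the channel counts are assumptions consistent with the 512-channel convention stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseAndClassify(nn.Module):
    """Minimal sketch of UperNet's fusion head: resize all FPN levels to the
    finest resolution, concatenate, fuse with a 3x3 conv (BN + ReLU, 512
    channels), then predict per-pixel class scores."""

    def __init__(self, in_channels=(512, 512, 512, 512), fused=512, num_classes=2):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(sum(in_channels), fused, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(fused),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(fused, num_classes, kernel_size=1)

    def forward(self, feats):
        # feats: list of FPN outputs ordered coarse to fine.
        size = feats[-1].shape[-2:]
        ups = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
               for f in feats]
        fused = self.fuse(torch.cat(ups, dim=1))
        return self.classifier(fused)   # upsampling to input resolution happens outside
```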

1.4 Weed recognition algorithm based on the semantic segmentation results

As plants, maize seedlings and weeds have similar color characteristics in field images. This increases the difficulty of recognizing and segmenting them, but, by the same property, segmenting all plant regions from the image is easy. Based on the fine maize segmentation produced by the semantic segmentation model, image morphological processing can be used to quickly segment all weeds from the full plant region. Refining the implementation details of this idea to improve weed segmentation, this paper proposes a weed recognition and segmentation algorithm based on the semantic segmentation results, whose flow is shown in Fig.4: 1) excess-green segmentation of the original image: the normalized excess-green component is computed as the gray value and binarized with Otsu's method to extract a segmentation mask containing all plant regions; 2) removal of the maize regions from the mask: after the maize mask inferred by the semantic segmentation model is slightly dilated to correct its boundary, the pixels of the plant mask at the positions of the maize mask are set to 0, producing a segmentation mask containing only the weed regions; 3) optimization of the weed mask: a morphological closing is applied to eliminate possible small holes, area filtering removes noise regions, and finally the mask is dilated to refine the region shape; 4) the weed mask and the weed segmentation map are obtained.
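A compact OpenCV sketch of the four steps is given below. The structuring-element size, the area-filter threshold, and the single dilation iteration are illustrative assumptions; the paper does not list the exact values.

```python
import cv2
import numpy as np

def segment_weeds(bgr, maize_mask, min_area=50):
    """Sketch of the post-processing chain in Fig.4 (thresholds are illustrative).

    bgr        : original field image (uint8, BGR as loaded by OpenCV)
    maize_mask : binary maize mask predicted by the segmentation model (0/255)
    """
    # 1) Excess-green feature + Otsu threshold -> mask of all vegetation.
    b, g, r = cv2.split(bgr.astype(np.float32) / 255.0)
    exg = 2 * g - r - b
    exg = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, plants = cv2.threshold(exg, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 2) Remove the maize region: lightly dilate the predicted maize mask to
    #    correct its boundary, then zero those pixels in the vegetation mask.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    maize = cv2.dilate(maize_mask, kernel, iterations=1)
    weeds = cv2.bitwise_and(plants, cv2.bitwise_not(maize))

    # 3) Optimise the weed mask: closing to fill small holes, area filtering to
    #    drop noise blobs, and a final dilation to smooth the region shape.
    weeds = cv2.morphologyEx(weeds, cv2.MORPH_CLOSE, kernel)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(weeds, connectivity=8)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            weeds[labels == i] = 0
    weeds = cv2.dilate(weeds, kernel, iterations=1)
    return weeds   # 4) weed mask; overlay on the image to get the segmentation map
```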

1.5 Model training and performance evaluation

1.5.1 Experimental platform

All models were trained and tested on the experimental platform built for this study, ensuring consistent comparison conditions. The main hardware configuration of the platform: an AMD R5 3600X CPU at 3.8 GHz, 64 GB of RAM, and an NVIDIA GeForce RTX 2080Ti GPU with 11 GB of video memory. The main software environment: the Ubuntu 20.04 operating system, the PyTorch 1.6 deep learning framework, the CUDA 10.2 general-purpose parallel computing architecture, the cuDNN 8.0.4 GPU acceleration library for deep neural networks, the Python 3.8 programming language, and the OpenCV 4.5.1 computer vision library.

Fig.4 Flowchart of the weed recognition algorithm

1.5.2 Training strategy

Model training is end-to-end: the input is the original image and the output is the corresponding recognition and segmentation map, with no human intervention in between. The Swin Transformer backbone and the semantic segmentation framework are combined into a whole model through an encoder-decoder structure and trained together. The backbone acts as the encoder, responsible for feature transformation and extraction, and needs no supervision signal for training other than the pre-training dataset and the target dataset. The semantic segmentation framework acts as the decoder, reconstructing and fusing the features output by the backbone and producing the classification predictions from them.

The model is trained by transfer learning. The layers of the backbone are initialized with weights pre-trained on the ImageNet-1K dataset [37] and are then adjusted on our dataset together with the weights of the semantic segmentation framework so that the model converges faster. The fine-tuning procedure is: 1) initialize the parameter weights of each layer of the segmentation framework with a random seed, giving random initial values; 2) freeze the pre-trained backbone weights, set the learning rate of the framework layers to 10 times the standard setting described below, and train the model on the target dataset to adjust the framework weights quickly; 3) train the whole model on the target dataset with both backbone and framework at the standard learning rate below, adjusting all parameter weights simultaneously by back-propagation. ImageNet-1K is a large-scale image set commonly used for transfer learning, whose effectiveness has been confirmed in many existing studies [21]. For the task in this paper, its rich categories (including a broad plant class) and image count pre-train the backbone parameters directly to a relatively optimal weight space and greatly improve the generalization of the extracted features. On this basis, fine-tuning the weights with the class characteristics of the target data makes it easier to reach the global optimum, balancing feature discriminability and generalization. To further guarantee the fine-tuning effect, the target training data are expanded on a large scale by the data augmentation described above, so that the network learns sufficient target features and fully adapts to our task. Preliminary experiments showed that the pixel recognition accuracy of a model fine-tuned in this way is 3%-5% higher than with single-stage training.
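The staged fine-tuning amounts to switching between two optimizer set-ups: the pre-trained backbone frozen with the framework trained at 10 times the standard learning rate, then everything unfrozen at the standard rate. The attribute names backbone and decode_head below are assumptions in the style of common segmentation toolboxes; the sketch only illustrates the recipe.

```python
from torch.optim import AdamW

BASE_LR = 6e-5   # standard learning rate used in Section 1.5.2

def stage2_optimizer(model):
    """Stage 2: freeze the pre-trained backbone, train the segmentation
    framework quickly at 10x the standard learning rate."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    return AdamW(model.decode_head.parameters(), lr=10 * BASE_LR, weight_decay=0.01)

def stage3_optimizer(model):
    """Stage 3: unfreeze everything and fine-tune all weights jointly."""
    for p in model.parameters():
        p.requires_grad = True
    return AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.01)
```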

Considering physical memory and learning efficiency, the training batch size is set to 2 images and the total number of iterations to 20 000. During training, the model uses the AdamW optimizer [38], a scheduler with linear learning-rate decay, and a linear warm-up of 1 500 iterations. The learning rate at the current iteration is updated as

$\eta_t = \eta_0 \left(1 - \dfrac{t}{T}\right)^{p}$

where $\eta_0$ is the initial learning rate, $t$ is the current iteration, $T$ is the decay period, i.e. the total number of iterations, and $p$ is the polynomial decay exponent (power). The initial learning rate and the polynomial decay exponent are set to 6×10⁻⁵ and 1, respectively. A weight decay of 0.01 and a momentum of 0.9 are used, and the lower bound of the learning rate is 0. All models are trained with these standard settings.
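A small helper reproducing the schedule, a 1 500-iteration linear warm-up followed by polynomial decay (power 1, i.e. linear) to zero over the 20 000 iterations, might look as follows. Exactly how the warm-up is combined with the decay is an assumption; only the decay formula and the settings above come from the paper.

```python
def poly_lr(iteration, base_lr=6e-5, total_iters=20000,
            warmup_iters=1500, power=1.0, min_lr=0.0):
    """Learning rate at a given iteration: linear warm-up, then polynomial decay."""
    if iteration < warmup_iters:
        return base_lr * (iteration + 1) / warmup_iters
    progress = (iteration - warmup_iters) / max(1, total_iters - warmup_iters)
    return max(min_lr, base_lr * (1.0 - progress) ** power)

# lr rises to 6e-5 over the first 1 500 iterations, then decays linearly to 0.
print(poly_lr(0), poly_lr(1500), poly_lr(20000))
```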

      采用交叉熵?fù)p失函數(shù)(Cross-Entropy Loss)衡量訓(xùn)練過程中模型對(duì)于像素類別的預(yù)測(cè)概率分布和真實(shí)標(biāo)簽類別概率分布之間的距離,具體計(jì)算方法如下:

1.5.3 Evaluation metrics

To evaluate model performance, mIoU and mean pixel accuracy (mPA) are used as quantitative indicators of recognition and segmentation quality, and FPS is used to evaluate inference speed.
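Both metrics can be computed from a per-class confusion matrix accumulated over the evaluation images; the sketch below shows the calculation on a toy two-class (background/maize) matrix with made-up counts.

```python
import numpy as np

def miou_mpa(conf):
    """mIoU and mPA from a (num_classes x num_classes) confusion matrix whose
    rows are ground-truth classes and columns are predicted classes."""
    conf = conf.astype(np.float64)
    tp = np.diag(conf)
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)   # per-class IoU
    pa = tp / conf.sum(axis=1)                              # per-class pixel accuracy
    return float(np.mean(iou)), float(np.mean(pa))

conf = np.array([[9500,  200],    # background pixels: 9500 right, 200 called maize
                 [ 150, 1150]])   # maize pixels: 150 missed, 1150 right
print(miou_mpa(conf))
```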

2 Results and Analysis

2.1 Training performance of different models

Semantic segmentation models based on the different improved backbones are built under the UperNet framework and denoted Swin-Large-UN, Swin-Base-UN, Swin-Small-UN, Swin-Tiny-UN, Swin-Nano-UN, and ResNet-101-UN. The loss curves of the different models during training are shown in Fig.5.

Note: UN denotes the UperNet framework.

No loss curve could be obtained for Swin-Large-UN because its large structure caused GPU memory overflow and training could not be completed; even if it could reach a higher accuracy level, it would not meet the application requirements of this study and is therefore excluded first. In Fig.5, Swin-Base-UN, Swin-Small-UN, and Swin-Tiny-UN all train and converge well; the average loss at the final iteration is about 7.74×10⁻³, lower than that of ResNet-101-UN (0.01). Their loss curves are essentially the same, indicating that Swin-Small-UN and Swin-Tiny-UN retain the data-learning capability of Swin-Base-UN and that the improvements do not affect training performance. The training loss of Swin-Nano-UN stalls and fluctuates after dropping quickly to about 0.70, with a final value of 0.72, two orders of magnitude higher than that of Swin-Base-UN and the others. This indicates that the depth and parameter count of this model are insufficient for our task: its ability to fit and generalize the data features is inadequate and it has difficulty converging to the global optimum.

2.2 Validation performance of different models

To eliminate the influence of randomness, the training performance of the models was verified in 5 repeated runs. In each run the random seed used to generate the parameter weights of each model's segmentation framework was changed, and training was then completed as described above. During training, each model was evaluated on the validation set every 2 000 iterations, using mIoU and mPA as indicators and averaging over the 5 runs. The variation of each indicator with iterations is shown in Fig.6.

Fig.6 Validation performance of different models

Fig.6a shows that, compared with ResNet-101-UN, the mIoU of Swin-Base-UN, Swin-Small-UN, and Swin-Tiny-UN is higher to varying degrees at the first evaluation and is 4.32, 4.51, and 3.08 percentage points higher, respectively, at the final evaluation, indicating that all three outperform ResNet-101-UN in region recognition and segmentation on the validation set. Swin-Nano-UN performs on par with the baseline. Fig.6b shows that, compared with ResNet-101-UN, the mPA of Swin-Base-UN, Swin-Small-UN, Swin-Tiny-UN, and Swin-Nano-UN is higher at the first evaluation and is 5.56, 5.29, 4.86, and 2.15 percentage points higher, respectively, at the final evaluation, indicating that all our models have better pixel recognition accuracy on the validation set than ResNet-101-UN, although Swin-Nano-UN is weaker than the other variants.

Among the variants, Swin-Small-UN and Swin-Tiny-UN reach validation performance closer to Swin-Base-UN in both mIoU and mPA, indicating that these two improved variants achieve training results close to the basic model with a more compact structure, which meets the goal of the model improvement. The validation performance of Swin-Nano-UN is less satisfactory compared with the basic model, with a larger loss of accuracy. The validation results show that Swin-Small-UN and Swin-Tiny-UN can replace Swin-Base-UN in our task, and their modeling capability far exceeds that of the ResNet-101-UN model based on the traditional backbone.

2.3 Test results of different models

All trained models were tested on the test set to examine their practical generalization performance and speed; the results are shown in Table 3. The best model structure is determined from the combined comparison of mIoU, mPA, and inference speed.

Table 3 Test performance of different models

Table 3 shows that, compared with ResNet-101-UN, the mIoU of Swin-Base-UN, Swin-Small-UN, and Swin-Tiny-UN is 3.98, 3.93, and 3.27 percentage points higher, their mPA is 5.23, 4.68, and 4.71 percentage points higher, and their inference speed is 0.98%, 7.16%, and 24.36% higher, respectively. These three models therefore generalize better to our task and outperform the traditional model comprehensively in region segmentation accuracy, pixel recognition accuracy, and inference speed. Compared with ResNet-101-UN, the inference speed of Swin-Nano-UN is 31.85% higher, but its mIoU is 1.24 percentage points lower, so its practical generalization on the main accuracy indicator does not exceed the traditional model.

Among the four models constructed in this paper, Swin-Base-UN has the highest mIoU and mPA but is the slowest. The accuracy of Swin-Small-UN is closest to Swin-Base-UN, but its speed gain is limited. Swin-Tiny-UN achieves an effective increase in inference speed (18.94 frames/s), 23.15% and 16.05% faster than Swin-Base-UN and Swin-Small-UN, respectively. Although its recognition and segmentation accuracy is not the highest of the four models (94.83%), its mIoU and mPA are only 0.71 and 0.52 percentage points lower than those of the most accurate model, Swin-Base-UN, a small gap. Swin-Nano-UN reaches the fastest inference speed, 6.02% higher than Swin-Tiny-UN, but its mIoU and mPA are 4.51 and 3.82 percentage points lower; its main accuracy indicators only reach about 90%, a large gap.

In our task the aim is to increase inference speed as much as possible while maintaining model accuracy. The gain of Swin-Tiny-UN in inference speed clearly outweighs its slight loss in accuracy, whereas reducing the parameter count further, to Swin-Nano-UN, leaves the model with insufficient practical generalization ability and greatly reduced accuracy. Swin-Tiny-UN therefore achieves the best accuracy-speed balance and is the optimal model structure.

2.4 Recognition and segmentation results

To examine the practical segmentation performance of the best model on maize field images, inference was performed on the test set images and the maize seedling segmentation masks were visualized on the original images to obtain segmentation maps. The recognition and segmentation results for some sample images are shown in Fig.7.

Note: the circled regions indicate the differences between the segmentation maps of ResNet-101-UN and Swin-Tiny-UN.

In Fig.7a, each original image is a real maize field scene. Comparing Fig.7b and 7c shows no obvious difference between the segmentation of Swin-Tiny-UN and the ground truth, and it is also essentially the same as that of Swin-Base-UN, which is therefore not shown. This indicates that the best model, Swin-Tiny-UN, reaches the same recognition and segmentation quality as before the improvement, at a higher inference speed. Segmentation errors are concentrated mainly on individual pixels at target boundaries and have little effect on the segmentation result, so the segmentation maps differ from the ground truth only at a pixel level that is difficult to observe. Further comparison of Fig.7c and 7d shows that ResNet-101-UN tends to mis-segment the central regions and leaf tips of maize seedlings, which would cause misjudgments in the weed segmentation algorithm. The above analysis indicates that the Swin-Tiny-UN model can recognize and segment targets accurately in complex field scenes and performs better than the traditional model both globally and locally, providing a strong guarantee for weed recognition and segmentation.

Based on the inference of the best model, Swin-Tiny-UN, weed segmentation maps are further generated with the proposed algorithm, and the recognition and segmentation of all targets are shown in Fig.8.

Fig.8 Weed recognition and segmentation results

Fig.8 shows that the proposed algorithm can effectively recognize weed regions while segmenting the boundaries of all targets, and performs well on the different test images. While the maize morphological regions are preserved intact, the segmentation masks cover essentially all weed regions, and even some small weeds in the images are recognized and their regions segmented. The resulting weed and maize regions keep their respective shapes without overlapping each other, effectively solving the problem of overlapping targets whose boundaries are hard to segment precisely in complex field scenes. In addition, the algorithm is simple and efficient, has essentially no impact on model inference speed, and offers good real-time performance. Accurate and fast weed recognition and segmentation in complex field scenes is thereby achieved.

2.5 Video test results

To further examine the performance of the method in practical application, tests were run on video data collected in the field while simulating the movement of a working machine. The video resolution is 768×432 pixels. A frame in which more than 90% of the pixels are correctly segmented is counted as a correct frame, and the proportion of correct frames in the total number of frames is the correct detection rate. The statistics are given in Table 4.

Table 4 Video detection performance

Table 4 shows that the average correct detection rate of the videos reaches 95.04% and the average detection time per frame is 5.51×10⁻² s, indicating that the method can accurately detect the video stream during mobile field operation while maintaining good real-time synchronization. Detection results for some frames extracted from the video are shown in Fig.9.

Note: starting from a random position in the video stream, 6 frames were selected in sequence at random intervals of 15 to 30 frames.

Comparing the video frames before and after detection in Fig.9 shows that the method effectively recognizes and segments the maize seedlings and weed targets in each frame; the detection is little affected by video jitter and is essentially consistent with the image test results, so it can be used for real-time detection during mobile operation.
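The bookkeeping behind Table 4, counting a frame as correct when at least 90% of its pixels are segmented correctly and averaging the per-frame detection time, can be sketched as follows; the function and its inputs are hypothetical.

```python
def video_stats(frame_correct_ratio, frame_seconds, threshold=0.90):
    """Correct detection rate and mean per-frame detection time for one test video.

    frame_correct_ratio : fraction of correctly segmented pixels in each frame
    frame_seconds       : wall-clock detection time of each frame, in seconds
    """
    correct = sum(r >= threshold for r in frame_correct_ratio)
    rate = correct / len(frame_correct_ratio)
    mean_time = sum(frame_seconds) / len(frame_seconds)
    return rate, mean_time
```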

2.6 Comparison with related studies

Compared with [15], the model in this study learns end-to-end, obtaining recognition and segmentation results directly from the raw input image without manual feature design and extraction, and has stronger practical generalization ability. Compared with [18-20], this study does not merely classify images containing a single type of crop or weed; instead, it recognizes and segments the crop and weed regions in complex field images containing multiple classes of targets, which is closer to practical application. Compared with [22-23], in addition to object detection and recognition, this study also solves the problem of further segmentation when targets overlap, achieving accurate recognition and fine segmentation of crops and weeds in complex field images. Compared with [24-25], this study needs no image fusion or additional near-infrared data and achieves better results with ordinary image data alone; for the same pixel-level recognition and segmentation objective, the mIoU of our model is 5.92 and 7.25 percentage points higher, respectively. Compared with [26], the mIoU of our model is 11.39 percentage points higher. Compared with [27], the inference speed of this study is 19.12% higher at the same image resolution and on the same hardware. In summary, this study has certain advantages over current related research.

3 Conclusions

To effectively recognize and segment maize and weeds in complex field environments and to explore a recognition method with stronger practical applicability, this study proposed a field maize and weed segmentation method based on a Swin Transformer unified perceptual parsing network, which accurately recognizes the targets and obtains a fine segmentation of their morphological regions in real time.

1) The model introduces a Swin Transformer backbone and adopts the efficient UperNet semantic segmentation framework; the Swin Transformer structure is improved to generate four variant models with different performance. Through a combined comparative analysis of the performance of the basic model Swin-Base-UN and its four variants in training, validation, and testing, Swin-Tiny-UN is determined to be the model achieving the best accuracy-speed balance, with mIoU and mPA of 94.83% and 97.18%, respectively, 3.27 and 4.71 percentage points higher than the ResNet-101-UN model, and an inference speed of 18.94 frames/s. The proposed model outperforms the traditional semantic segmentation model comprehensively in region segmentation accuracy, pixel recognition accuracy, and inference speed.

2) Based on the segmented maize morphological regions, an improved combination of image morphological operations is established to recognize and segment all weed regions in real time. The method segments weeds on the basis of the maize segmentation, so the training data contain no weed pixel annotations, alleviating the difficulty of obtaining large amounts of pixel-level annotated data for semantic segmentation. The image segmentation results show that the method can recognize and segment maize and weeds in complex field scenes accurately and quickly, with little influence from target overlap. Compared with the method previously proposed by this group, the inference speed is 19.12% higher while the accuracy is improved.

3) For video stream data simulating mobile field operation, the average correct detection rate of the method reaches 95.04%, with an average detection time per frame of 5.51×10⁻² s, indicating that the method can recognize and detect maize and weeds in the video stream during field operation with good accuracy and real-time synchronization under practical application conditions.

      [1] Machleb J, Peteinatos G G, Kollenda B L, et al. Sensor-based mechanical weed control: Present state and prospects[J]. Computers and Electronics in Agriculture, 2020, 176: 105638.

[2] Duan Xiaohe, Han Jianguo, Ba Jinlei, et al. The current situation and development trend of chemical weeding in corn fields[J]. Horticulture and Seedlings, 2019, 39(8): 54-56. (in Chinese with English abstract)

      [3] Gaines T A, Busi R, Küpper A. Can new herbicide discovery allow weed management to outpace resistance evolution?[J]. Pest Management Science, 2021, 77(7): 3036-3041.

      [4] Saha D, Cregg B M, Sidhu M K. A review of non-chemical weed control practices in Christmas tree production[J]. Forests, 2020, 11(5): 554.

      [5] Muola A, Fuchs B, Laihonen M, et al. Risk in the circular food economy: glyphosate-based herbicide residues in manure fertilizers decrease crop yield[J]. Science of the Total Environment, 2021, 750: 141422.

[6] Sun Junliang, Yan Yinfa, Li Fade, et al. Research progress and analysis of intelligent weeding robot[J]. Journal of Chinese Agricultural Mechanization, 2019, 40(11): 73-80. (in Chinese with English abstract)

      [7] Gerhards R, Andujar S D, Hamouz P, et al. Advances in site‐specific weed management in agriculture: A review[J]. Weed Research, 2022, 62(2): 123-133.

      [8] Fennimore S A, Cutulle M. Robotic weeders can improve weed control options for specialty crops[J]. Pest Management Science, 2019, 75(7): 1767-1774.

      [9] Bauer M V, Marx C, Bauer F V, et al. Thermal weed control technologies for conservation agriculture: A review[J]. Weed Research, 2020, 60(4): 241-250.

      [10] Raja R, Nguyen T T, Slaughter D C, et al. Real-time robotic weed knife control system for tomato and lettuce based on geometric appearance of plant labels[J]. Biosystems Engineering, 2020, 194: 152-164.

[11] Li Daoliang, Li Zhen. System analysis and development prospect of unmanned farming[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(7): 1-12. (in Chinese with English abstract)

      [12] Wang T, Xu X, Wang C, et al. From smart farming towards unmanned farms: A new mode of agricultural production[J]. Agriculture, 2021, 11(2): 145.

      [13] Wang A, Zhang W, Wei X. A review on weed detection using ground-based machine vision and image processing techniques[J]. Computers and Electronics in Agriculture, 2019, 158: 226-240.

      [14] Bakhshipour A, Jafari A. Evaluation of support vector machine and artificial neural networks in weed detection using shape features[J]. Computers and Electronics in Agriculture, 2018, 145: 153-160.

[15] Miao Ronghui, Yang Hua, Wu Jinlong, et al. Weed identification of overlapping spinach leaves based on image sub-block and reconstruction[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(4): 178-184. (in Chinese with English abstract)

      [16] Yu J, Sharpe S M, Schumann A W, et al. Deep learning for image-based weed detection in turfgrass[J]. European Journal of Agronomy, 2019, 104: 78-84.

      [17] Quan L, Jiang W, Li H, et al. Intelligent intra-row robotic weeding system combining deep learning technology with a targeted weeding mode[J]. Biosystems Engineering, 2022, 216: 13-31.

[18] Sun Jun, He Xiaofei, Tan Wenjun, et al. Recognition of crop seedling and weed recognition based on dilated convolution and global pooling in CNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(11): 159-165. (in Chinese with English abstract)

[19] Zhao Hui, Cao Yuhang, Yue Youjun, et al. Field weed recognition based on improved DenseNet[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(18): 136-142. (in Chinese with English abstract)

      [20] Jiang H, Zhang C, Qiao Y, et al. CNN feature based graph convolutional network for weed and crop recognition in smart farming[J]. Computers and Electronics in Agriculture, 2020, 174: 105450.

      [21] Hasan A S M M, Sohel F, Diepeveen D, et al. A survey of deep learning techniques for weed detection from images[J]. Computers and Electronics in Agriculture, 2021, 184: 106067.

[22] Peng Mingxia, Xia Junfang, Peng Hui. Efficient recognition of cotton and weed in field based on Faster R-CNN by integrating FPN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(20): 202-209. (in Chinese with English abstract)

[23] Meng Qingkuan, Zhang Man, Yang Xiaoxia, et al. Recognition of maize seedling and weed based on lightweight convolution and feature fusion[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(12): 238-245, 303. (in Chinese with English abstract)

      [24] Wang A, Xu Y, Wei X, et al. Semantic segmentation of crop and weed using an encoder-decoder network and image enhancement method under uncontrolled outdoor illumination[J]. IEEE Access, 2020, 8: 81724-81734.

[25] Sun Jun, Tan Wenjun, Wu Xiaohong, et al. Real-time recognition of sugar beet and weeds in complex backgrounds using multi-channel depth-wise separable convolution model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(12): 184-190. (in Chinese with English abstract)

      [26] Khan A, Ilyas T, Umraiz M, et al. Ced-net: crops and weeds segmentation for smart farming using a small cascaded encoder-decoder architecture[J]. Electronics, 2020, 9(10): 1602.

[27] Wang Can, Wu Xinhui, Zhang Yanqing, et al. Recognition and segmentation of maize seedlings in field based on dual attention semantic segmentation network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(9): 211-221. (in Chinese with English abstract)

      [28] Takahashi R, Matsubara T, Uehara K. Data augmentation using random image cropping and patching for deep CNNs[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 30(9): 2917-2931.

      [29] Gao L, Liu H, Yang M, et al. STransFuse: Fusing swin transformer and convolutional neural network for remote sensing image semantic segmentation[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 10990-11003.

      [30] Zhou H Y, Lu C, Yang S, et al. ConvNets vs. Transformers: Whose Visual Representations are More Transferable?[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 2230-2238.

      [31] Xiao X, Zhang D, Hu G, et al. CNN–MHSA: A Convolutional Neural Network and multi-head self-attention combined approach for detecting phishing websites[J]. Neural Networks, 2020, 125: 303-312.

      [32] Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.

      [33] Xiao T, Liu Y, Zhou B, et al. Unified perceptual parsing for scene understanding[C]//Proceedings of the European Conference on Computer Vision (ECCV). Switzerland: Springer, 2018: 418-434.

      [34] Long X, Zhang W, Zhao B. PSPNet-SLAM: A Semantic SLAM Detect Dynamic Object by Pyramid Scene Parsing Network[J]. IEEE Access, 2020, 8: 214685-214695.

      [35] Wang J, Li S, An Z, et al. Batch-normalized deep neural networks for achieving fast intelligent fault diagnosis of machines[J]. Neurocomputing, 2019, 329: 53-65.

      [36] Jiang X, Pang Y, Li X, et al. Deep neural networks with elastic rectified linear units for object recognition[J]. Neurocomputing, 2018, 275: 1132-1139.

      [37] Morid M A, Borjali A, Del Fiol G. A scoping review of transfer learning research on medical image analysis using ImageNet[J]. Computers in Biology and Medicine, 2021, 128: 104115.

      [38] Touvron H, Cord M, Douze M, et al. Training data-efficient image transformers & distillation through attention[C]// International Conference on Machine Learning. New York: PMLR, 2021: 10347-10357.

      Recognizing weeds in maize fields using shifted window Transformer network

      Wang Can, Wu Xinhui※, Zhang Yanqing, Wang Wenjun

(College of Agricultural Engineering, Shanxi Agricultural University, Taigu 030801, China)

Weeds are one of the main factors affecting the growth of crops at the seedling stage. Timely weeding is a necessary measure to ensure crop yield. Intelligent field weeding equipment is also a promising component of the unmanned farm system at the current stage of intelligent agriculture. Effective recognition of crops and weeds is in high demand to promote the development of intelligent weeding equipment. Previous research focused mainly on object detection and semantic segmentation using deep learning. A great challenge still remains in the performance of target detection when crops and weeds overlap in complex field images. The reason is that the different target areas cannot be further divided when the generated anchor boxes overlap over a large area. Pixel-level annotation is also required to train semantic segmentation, and such data samples are not easy to obtain. The weak real-time performance is also not conducive to practical application. In this study, an improved model was proposed using the shifted window Transformer (Swin Transformer) network, in order to enhance the accuracy and real-time performance of crop and weed recognition. The specific procedure was as follows. 1) A semantic segmentation model of corn was established for the real and complex field scene. The backbone of the model was the Swin Transformer architecture, denoted Swin-Base. The full self-attention mechanism was adopted to significantly enhance the modeling ability of the Swin Transformer using the shifted window division configuration. Self-attention was calculated locally within the non-overlapping windows of the partitioned image, while cross-window connections were allowed. The computational complexity of the backbone presented a linear relationship with the image size, thereby elevating the inference speed of the model. The hierarchical feature representation was constructed through the Swin Transformer for the dense prediction of the model at the pixel level. 2) The Unified Perceptual Parsing Network (UperNet) was used as an efficient semantic segmentation framework. Its feature extractor was the Feature Pyramid Network (FPN) using the Swin Transformer backbone. The multi-level features obtained by the Swin Transformer were used by the FPN to represent the corresponding pyramid levels. An effective global prior feature representation was added by the Pyramid Pooling Module (PPM). Better semantic segmentation performance was achieved using the fusion of the hierarchical semantic information. The Swin Transformer backbone and the UperNet framework were combined into one model through the encoder-decoder structure, denoted Swin-Base-UN. 3) The structure of the Swin-Base backbone was improved to enhance the inference speed. The number of network parameters and the computational cost were reduced by decreasing the number of hidden-layer channels, heads, and Swin Transformer blocks. Four improved models were thereby generated, namely Swin-Large-UN, Swin-Small-UN, Swin-Tiny-UN, and Swin-Nano-UN, whose model size and computational complexity were about 2, 1/2, 1/4, and 1/8 times those of Swin-Base-UN, respectively. 4) Taking the segmented corn morphological regions as the basis, an improved combination of image morphological operations was established to recognize and segment all the weed regions in real time. The segmentation of corn was thus used to segment the weeds.
The weed pixel annotation was removed from the training data of the model, so the difficulty of obtaining large amounts of pixel-level annotation data for semantic segmentation was alleviated. A comparison was made on the performance of all models in training, validation, and testing. Consequently, Swin-Tiny-UN was determined as the best model, achieving the optimal balance between accuracy and speed. Specifically, the mean Intersection over Union (mIoU) and mean Pixel Accuracy (mPA) on the test set were 94.83% and 97.18%, respectively, which increased by 3.27 and 4.71 percentage points, respectively, compared with ResNet-101-UN using a traditional Convolutional Neural Network (CNN) backbone. The inference speed of the model reached 18.94 frames/s. The best semantic segmentation model was superior to the traditional one in terms of region segmentation accuracy, pixel recognition accuracy, and inference speed. The image segmentation results showed that the improved model can accurately recognize and segment maize and weeds in complex field scenes. The average correct detection rate of the improved model was 95.04% for the video stream data in the process of field work, and the average detection time per frame was 5.51×10⁻² s. Consequently, the improved model can be expected to detect the corn and weeds in the process of field work, indicating higher accuracy and real-time performance under practical application conditions. The findings can provide a strong reference for the development of intelligent weeding equipment.

      crops; object recognition; image segmentation; semantic segmentation; maize; weed recognition

DOI: 10.11975/j.issn.1002-6819.2022.15.014

CLC number: TP274; TP391.41

Document code: A

Article ID: 1002-6819(2022)-15-0133-10

Wang Can, Wu Xinhui, Zhang Yanqing, et al. Recognizing weeds in maize fields using shifted window Transformer network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(15): 133-142. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2022.15.014 http://www.tcsae.org

Received: 2022-02-23

Revised: 2022-07-23

Supported by the Shanxi Province Basic Research Program (202103021223147), the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2020L0134), and the Science and Technology Innovation Fund of Shanxi Agricultural University (2018YJ44)

Wang Can, Ph.D., Associate Professor, research interests: key technologies and applications of intelligent agricultural equipment. Email: wangcan8206@163.com

Wu Xinhui, Ph.D., Associate Professor, research interests: agricultural biomechanics and intelligent agricultural machinery. Email: wuxinhui0321@163.com
