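Rapid identification of producing area of coffee bean based on terahertz spectroscopy and support vector machine
Hu Xiaohua1, Liu Wei2,3※, Liu Changhong3, Qian Yunhui3
(1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China; 2. Laboratory of Machine Vision and Intelligent Control, Hefei University, Hefei 230601, China; 3. School of Food Science and Engineering, Hefei University of Technology, Hefei 230009, China)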
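Terahertz time-domain spectroscopy was combined with a support vector machine to identify coffee beans from 3 typical producing areas. Coffee bean samples from Ethiopia, Costa Rica, and Indonesia were pressed into pellets, their time-domain and frequency-domain spectra were acquired in terahertz transmission mode, and the frequency-domain spectra were analyzed by principal component analysis. A support vector machine (SVM) identification model with parameters optimized by particle swarm optimization (PSO) was then constructed; its overall recognition accuracy for coffee beans from the different producing areas reached 95%. The results show that terahertz spectroscopy, as a new detection tool, can be combined with pattern recognition to identify the producing area of coffee beans. The work offers a rapid and accurate method for safety testing and origin tracing of agricultural products and foods that have no distinct characteristic absorption peaks in the terahertz band.
spectroscopy; models; support vector machine; coffee bean; terahertz; particle swarm optimization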
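Coffee is one of the world's 3 major beverage crops and ranks first among them in output, sales, and consumption. In recent years China's coffee imports have grown rapidly, at an average annual rate above 10%, and coffee has become an important bulk imported consumer good. Coffee beans are the main raw material for coffee. They are grown mainly in tropical developing countries of Latin America, Africa, and Asia, such as Indonesia, Ethiopia, Brazil, Colombia, and Costa Rica. Beans from different producing areas differ considerably in appearance and color, aroma, and internal chemical composition, and these differences are an important factor in coffee quality[1-3]. At present the producing area is identified mainly by manual sensory evaluation or chemical analysis[4-6], which are cumbersome, subjective, and inefficient. How to identify the producing area of coffee beans quickly and accurately, so as to guarantee coffee quality and regulate the coffee market, is therefore one of the pressing problems facing China's coffee industry.
Terahertz (THz) radiation refers to electromagnetic waves with frequencies of 0.1 to 10 THz. Studies have shown that the transitions between vibrational and rotational energy levels of many organic macromolecules (DNA, proteins, etc.) lie in the THz band, so THz spectra carry rich physical, chemical, and conformational information about the object under test[7-11]. In recent years terahertz time-domain spectroscopy (THz-TDS) has developed rapidly as a new nondestructive testing technique. With its strong penetration, good safety, high sensitivity, and wide dynamic range, it has shown clear technical advantages and broad application prospects in food safety testing and agricultural product quality control[12-19]. However, most current THz research on agricultural products and foods targets single chemical components that have characteristic absorption peaks. In complex biological systems without such peaks, the spectral features tend to be spread over certain band ranges, which makes the features high-dimensional and uncertain. The use of THz-TDS to test agricultural products and foods as complex biological systems is therefore still at an exploratory stage.
To address the rapid identification of coffee bean producing areas, this paper uses a THz time-domain spectroscopy system to acquire the time-domain and frequency-domain spectra of coffee bean samples from typical producing areas in the THz band, reduces the dimensionality of the spectral features by principal component analysis (PCA), optimizes the model parameters with particle swarm optimization (PSO), and builds a THz-based identification model with a support vector machine (SVM). The aim is to provide a new method for rapid identification of coffee bean producing areas and to explore the use of THz detection in agricultural products and foods.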
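1.1 Experimental apparatus and principle
The instrument was a TAS7500TS HF1 THz spectroscopy system (Advantest Co., Ltd, Japan); its optical path is shown schematically in Figure 1. Measurements were made in transmission mode. The emitted laser pulse is split by the beam splitter (CBS) into a pump beam and a probe beam. The pump beam strikes a photoconductive antenna on a gallium arsenide (GaAs) substrate and excites THz radiation; the probe beam and the THz pulse are focused together onto a zinc telluride (ZnTe) electro-optic crystal. Along the way the THz pulse is absorbed and dispersed, which changes its amplitude and phase, so the THz wave focused onto the detection crystal carries the sample information. By scanning the relative time delay between the THz pulse and the probe laser pulse, the system samples the electric field of the THz pulse through the electro-optic effect on the probe beam and thus obtains the THz time-domain waveform of the sample; a fast Fourier transform (FFT) then gives the frequency-domain signal. The TAS7500TS HF1 covers 0.1 to 4 THz with a spectral resolution of 7.6 GHz; the laser has an average power of 20 mW, a center wavelength of 1 550 nm, a pulse width of 50 fs, and a repetition rate of 50 MHz±200 Hz. Experiments were run at room temperature (25 ℃). Throughout the experiment an air compressor pump was used to dry the measurement space, which reduces the influence of atmospheric moisture on the measurements and improves the signal-to-noise ratio.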
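1.2 Experimental materials
Coffee beans from 3 typical producing areas (Ethiopia, Costa Rica, and Indonesia), supplied by the Starbucks Hefei branch, were used as test samples. All samples were stored in a desiccator (sealed and protected from light). Programs were written in Matlab 2011a. Before the experiment, 40 samples from each of the 3 producing areas (Ethiopia, Indonesia, and Costa Rica), 120 in total, were randomly drawn as the calibration set, and the remaining 20 per producing area, 60 in total, formed the prediction set. All beans were ground with a grinder, the powder was passed through a sieve with a 0.074 mm aperture, and the sieved powder was pressed with a tablet press at 10 MPa into pellets about 1 mm thick and 13 mm in diameter, internally uniform and with parallel upper and lower faces. Sixty pellets were made for each kind of coffee bean.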
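The random division into calibration and prediction sets described above can be sketched as follows. This is a minimal illustration rather than the authors' Matlab code: the function name `stratified_split`, the integer label coding (0, 1, 2), and the random seed are assumptions made for the example.

```python
import numpy as np

def stratified_split(labels, n_train_per_class, seed=0):
    """Draw n_train_per_class samples per class at random; the rest form the test set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them into two parts
        idx = rng.permutation(np.flatnonzero(labels == cls))
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:])
    return np.array(train_idx), np.array(test_idx)

# 3 producing areas x 60 pellets; the 0/1/2 coding is hypothetical
labels = np.repeat([0, 1, 2], 60)
train_idx, test_idx = stratified_split(labels, n_train_per_class=40)
print(len(train_idx), len(test_idx))  # 120 60
```

Splitting per class keeps the 40/20 ratio identical for every producing area, which a single shuffle over all 180 samples would not guarantee.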
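1.3 Spectrum acquisition and analysis
Before the experiment the TAS7500TS HF1 was warmed up for half an hour; a steel background plate served as the system calibration plate, and the micrometer screw was adjusted to find the best focus. Each pressed pellet was placed on the polyethylene sample stage of the TAS7500TS HF1 and scanned to obtain its transmission spectrum. To reduce measurement error, every sample was measured 3 times at different positions and the average was taken as its spectral signal. The time-domain spectra of the 3 kinds of coffee bean samples and the steel background are shown in Figure 2.
Figure 2a shows that the THz time-domain signals of the 3 samples differ clearly from that of the background plate in both amplitude and phase, whereas the differences among the samples are relatively small. In transmitted amplitude, Costa Rican beans are higher overall than Ethiopian and Indonesian beans, which are comparable to each other. In phase, relative to the reference background, the peak for Costa Rican beans appears at about 16.3 ps (the smallest delay), for Indonesian beans at about 16.9 ps, and for Ethiopian beans at about 17.3 ps (the largest delay). Applying a fast Fourier transform (FFT) to the time-domain signals gives the THz transmission frequency-domain spectra shown in Figure 2b. The effective spectral region of the THz signal lies within 0.2 to 1.5 THz. The spectra of the 3 kinds of beans follow the same trend; the transmitted energy differs at different frequency points, but the curves cross at some frequencies. Like many complex biological materials, coffee beans have no distinct characteristic absorption peaks[20-21]. The 0.2 to 1.5 THz range contains 171 frequency points. Such high-dimensional THz features carry rich information, but components that are only weakly related, or unrelated, to sample quality can impair modeling. Principal component analysis was therefore applied first to reduce the dimensionality of the THz spectral features of the beans from different producing areas and to assess the discrimination qualitatively.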
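The step from a time-domain trace to band-limited frequency-domain features can be sketched as below. The pulse is synthetic and only illustrative; the number of time samples `N` and the pulse shape are assumptions, and the sampling interval is chosen so that the FFT bin spacing equals the stated 7.6 GHz resolution. Under that assumption the 0.2 to 1.5 THz window contains exactly the 171 frequency points quoted in the text.

```python
import numpy as np

DF = 7.6e9               # frequency spacing in Hz (the stated spectral resolution)
N = 2048                 # number of time samples (assumed)
DT = 1.0 / (DF * N)      # sampling interval that yields a bin spacing of DF

t = np.arange(N) * DT
# Synthetic single-cycle pulse centred near 16.3 ps (shape is illustrative only)
t0, width = 16.3e-12, 0.3e-12
pulse = -(t - t0) / width * np.exp(-((t - t0) ** 2) / (2 * width ** 2))

# One-sided FFT of the real-valued trace and the matching frequency axis
spectrum = np.fft.rfft(pulse)
freqs = np.fft.rfftfreq(N, d=DT)
amplitude = np.abs(spectrum)

# Keep only the effective band, 0.2 to 1.5 THz
band = (freqs >= 0.2e12) & (freqs <= 1.5e12)
features = amplitude[band]
print(features.size)  # 171
```

The count follows from the grid: bins fall at multiples of 7.6 GHz, the first inside the band is 27 × 7.6 GHz = 205.2 GHz and the last is 197 × 7.6 GHz = 1 497.2 GHz, giving 197 − 27 + 1 = 171 points.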
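1.4 Identification analysis of coffee bean producing area based on principal components
Principal component analysis is a multivariate statistical method for analyzing correlations among variables; through an orthogonal transformation it converts a set of possibly correlated variables into a set of linearly uncorrelated ones. The new variables reduce the number of variables while retaining as much of the original feature information as possible. PCA was applied to the 0.2 to 1.5 THz spectral data of all 180 coffee bean samples from the 3 producing areas; the three-dimensional score plot of the first 3 principal components is shown in Figure 3.
Fig. 3 Three-dimensional principal component analysis plot of coffee bean samples from 3 producing areas
Figure 3 shows that the 3 kinds of coffee beans cluster fairly well: Ethiopian and Indonesian beans do not overlap, whereas Costa Rican beans overlap with both, which is related to the absence of characteristic absorption peaks in the THz band and the broad spread of the spectral features. Figure 3 also shows that the first 3 principal components have a cumulative contribution rate of 68.75% (31.22%, 30.31%, and 7.22% for PC1, PC2, and PC3, respectively), so they cannot fully capture the information in the effective THz band. To find the optimal number of principal components for THz-based identification of the producing area, following reference [22], partial least squares discriminant analysis (PLS-DA) was applied with the first 3, 4, …, 50 principal components (the first 50 have a cumulative contribution rate of 98.96%). Identification accuracy rose as the number of components increased from 3 to 20 and began to fall beyond 20. The first 20 principal components (cumulative contribution rate 96.36%) were therefore selected as the input features for modeling.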
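How scores and contribution rates come out of PCA can be sketched with a singular value decomposition. This is an illustrative sketch, not the authors' implementation: the data matrix below is random noise standing in for the 180 spectra × 171 frequency points, and the function name `pca_scores` is an assumption.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return PCA scores and the contribution rate of every component.

    X has one row per sample and one column per variable.
    """
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)              # contribution rate of each PC
    scores = Xc @ Vt[:n_components].T            # project onto the leading PCs
    return scores, ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 171))                  # placeholder for the real spectra
scores, ratio = pca_scores(X, n_components=20)
cumulative = np.cumsum(ratio)                    # cumulative contribution rate
print(scores.shape)  # (180, 20)
```

The cumulative contribution rate is simply the running sum of the individual rates; for the values reported in the text, 31.22% + 30.31% + 7.22% = 68.75%.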
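2.1 Support vector machine
The support vector machine is a supervised machine learning method grounded in statistical learning theory for finite samples. It maps the input variables through a nonlinear mapping into a high-dimensional feature space and constructs the optimal separating hyperplane there, which handles small-sample, nonlinear, high-dimensional, and local-minimum problems well[23-25]. SVM regression uses a nonlinear mapping function $\varphi(\cdot)$ to map the data into a high-dimensional feature space and performs linear regression in that space. Following the structural risk minimization (SRM) principle, the learning process is converted into a convex optimization problem, which in its standard $\varepsilon$-insensitive form reads
$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{n}\left(\xi_{i}+\xi_{i}^{*}\right)\quad \text{s.t.}\quad \begin{cases} y_{i}-w^{\mathrm{T}}\varphi(x_{i})-b\le \varepsilon+\xi_{i}\\ w^{\mathrm{T}}\varphi(x_{i})+b-y_{i}\le \varepsilon+\xi_{i}^{*}\\ \xi_{i},\ \xi_{i}^{*}\ge 0,\quad i=1,\dots,n \end{cases} \tag{1}$$
where $C$ is the boundary parameter. With the radial basis function (RBF) as the kernel function,
$$K(x_{i},x_{j})=\exp\left(-\frac{\|x_{i}-x_{j}\|^{2}}{2\sigma^{2}}\right) \tag{2}$$
where $\sigma$ is the kernel width parameter, $\sigma>0$.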
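For an SVM with the RBF kernel, 2 parameters need to be optimized, the boundary parameter $C$ and the kernel parameter $\sigma$, and both strongly affect the classification performance of the SVM[25]. $C$ sets the trade-off between structural risk and training error and is related to the tolerable error; $\sigma$ reflects how complex the distribution of the samples is in the high-dimensional feature space and determines the complexity of the linear decision surface. With cross validation (CV), a grid search can find the highest prediction accuracy in the CV sense, that is, the global optimum on that grid, but the process is time-consuming. Particle swarm optimization is based on swarm intelligence: the search is guided by the collective intelligence that arises from cooperation and competition among the particles. To search for the best $C$ and $\sigma$ over a wider range and more efficiently, this paper adopts SVM modeling with PSO-based parameter search.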
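The role of the kernel parameter can be illustrated by computing an RBF kernel matrix for a few points. The sketch assumes the common parameterization $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$; other toolboxes write the same kernel with a parameter $g = 1/(2\sigma^2)$, so the convention here is an assumption, as are the function name `rbf_kernel` and the toy points.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma^2))."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    sq = np.maximum(sq, 0.0)   # guard against tiny negative values from round-off
    return np.exp(-sq / (2.0 * sigma ** 2))

# Three toy points in two dimensions
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_narrow = rbf_kernel(X, X, sigma=0.5)   # small sigma: similarity decays quickly
K_wide = rbf_kernel(X, X, sigma=2.0)     # large sigma: distant points still look similar
print(np.round(K_narrow, 3))
print(np.round(K_wide, 3))
```

A small $\sigma$ makes the kernel matrix approach the identity, so the model can fit each training sample separately; a large $\sigma$ smooths the decision surface. This is why $\sigma$ has to be tuned together with $C$.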
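2.2 Particle swarm optimization
Particle swarm optimization[26] is a swarm intelligence algorithm with strong global search ability. In a $D$-dimensional search space, $m$ particles form a population $\{Z_1, Z_2, \dots, Z_m\}$, where the position of each particle, $Z_i = \{Z_{i1}, Z_{i2}, \dots, Z_{iD}\}$, represents a potential solution of the problem, and the fitness of each particle is computed from the objective function. Each particle then searches the solution space iteratively, continually adjusting its position to find new solutions[27]. In each iteration the particle updates its velocity $V_i = \{V_{i1}, V_{i2}, \dots, V_{iD}\}$ and position $Z_i$ according to Eqs. (4) and (5):
$$V_i^{k+1} = \omega V_i^{k} + c_1 r_1 \left(P_i^{k} - Z_i^{k}\right) + c_2 r_2 \left(P_g^{k} - Z_i^{k}\right) \tag{4}$$
$$Z_i^{k+1} = Z_i^{k} + V_i^{k+1} \tag{5}$$
where $k$ is the iteration index, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the learning factors, $r_1$ and $r_2$ are random numbers uniformly distributed in $[0, 1]$, $P_i$ is the best position found so far by particle $i$, and $P_g$ is the best position found so far by the whole swarm.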
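The velocity and position updates can be sketched on a simple test problem. This is a minimal illustration under stated assumptions: the objective (a two-dimensional sphere function), the swarm size, zero initial velocities, clipping to the bounds, and the linearly decreasing inertia weight are all choices made for the example, and `pso_minimize` is a hypothetical name.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_particles=30, n_iter=100,
                 c1=1.5, c2=1.5, w_start=0.9, w_end=0.4, seed=0):
    """Minimize f over a box with a basic inertia-weight PSO."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    Z = rng.uniform(lower, upper, size=(n_particles, dim))   # positions
    V = np.zeros((n_particles, dim))                         # velocities (zero start, assumed)
    P = Z.copy()                                             # personal best positions
    P_fit = np.array([f(z) for z in Z])
    g = np.argmin(P_fit)
    G, G_fit = P[g].copy(), P_fit[g]                         # global best
    for k in range(n_iter):
        # Inertia weight decreasing linearly from w_start to w_end (assumed schedule)
        w = w_start - (w_start - w_end) * k / max(n_iter - 1, 1)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (P - Z) + c2 * r2 * (G - Z)    # velocity update
        Z = np.clip(Z + V, lower, upper)                     # position update, kept in bounds
        fit = np.array([f(z) for z in Z])
        better = fit < P_fit                                 # particles that improved
        P[better], P_fit[better] = Z[better], fit[better]
        g = np.argmin(P_fit)
        if P_fit[g] < G_fit:
            G, G_fit = P[g].copy(), P_fit[g]
    return G, G_fit

best, best_val = pso_minimize(lambda z: np.sum(z ** 2), [-5, -5], [5, 5])
print(best, best_val)
```

Replacing the sphere function with a function that trains a classifier for a candidate parameter pair and returns its error turns the same loop into a parameter search.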
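2.3 Support vector machine classification model based on PSO parameter optimization
The SVM classification model with PSO-optimized parameters is built in the following steps.
1) Take the first 20 principal components obtained by PCA of the THz spectra of the 3 classes of coffee beans as the feature vector for identifying the producing area, and set the search range of the SVM parameters and the initial PSO settings, such as population size, learning factors, inertia weight, and maximum number of iterations.
2) Initialize the swarm. Randomly generate values of the boundary parameter $C$ and the kernel width parameter $\sigma$ as the initial position of each particle, and randomly initialize the velocity of each particle.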
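3) Compute the current fitness of each particle. The fitness function is defined as in Eq. (6); training on the training samples gives the number of correctly classified samples for each particle, from which its fitness value is computed.
$$F = 1 - \frac{T}{N} \tag{6}$$
where $T$ and $N$ are the number of correctly classified samples and the total number of samples, respectively.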
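4) Compare the current fitness $F(Z_i)$ of each particle with that particle's own best fitness $F(P_i)$; if $F(Z_i) < F(P_i)$, set $F(P_i) = F(Z_i)$ and take the current position as the best position of that particle.
5) Compare the best fitness $F(P_i)$ of each particle with the best fitness of the whole swarm, $F(P_g)$; if $F(P_i) < F(P_g)$, set $F(P_g) = F(P_i)$ and take the corresponding position as the best position of the swarm.
6) Update the velocity and position of each particle with the PSO evolution equations (4) and (5), which yields new SVM parameters.
7) Check whether the maximum number of iterations has been reached. If so, stop the search and return the current best SVM parameters $C$ and $\sigma$; otherwise go to step 3).
8) Substitute the optimal parameters into the SVM model and classify the test sample set.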
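The first 20 principal components of the 0.2 to 1.5 THz frequency-domain spectra were used as the input feature vector, and the model parameters were optimized by PSO. The swarm parameters were initialized first. Following references [28-30], the population size was set to 50 and the learning factors to $c_1 = c_2 = 1.5$; the variable inertia weight had a starting value $\omega_{start} = 0.9$ and a final value $\omega_{end} = 0.4$; the search ranges were $C \in [2^{-2}, 2^{2}]$ and $\sigma \in [2^{-2}, 2^{2}]$ with a step of $2^{0.5}$; and the maximum number of iterations was 100. The optimal SVM parameters found by PSO were $C$ = 1.393 66 and $\sigma$ = 0.01.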
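The search range and the fitness evaluation for one candidate parameter pair can be sketched as below. To stay dependency-free, a regularized least-squares kernel classifier stands in for the SVM solver, so this is not the authors' model; the synthetic 20-dimensional data, the hold-out validation split, the fitness written as a misclassification rate, and all function names are assumptions. The sketch enumerates the coarse grid implied by the range $[2^{-2}, 2^{2}]$ with step $2^{0.5}$, the exhaustive baseline that a PSO search is meant to replace.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def predict(X_tr, y_tr, X_te, C, sigma, n_classes=3):
    """Regularized least-squares kernel classifier (a stand-in for the SVM solver)."""
    Y = -np.ones((len(y_tr), n_classes))
    Y[np.arange(len(y_tr)), y_tr] = 1.0                    # one-vs-rest targets
    K = rbf_kernel(X_tr, X_tr, sigma)
    alpha = np.linalg.solve(K + np.eye(len(y_tr)) / C, Y)  # C acts as inverse regularization
    return np.argmax(rbf_kernel(X_te, X_tr, sigma) @ alpha, axis=1)

def fitness(C, sigma, X_tr, y_tr, X_val, y_val):
    """Misclassification rate 1 - T/N on the validation samples (lower is better)."""
    T = np.sum(predict(X_tr, y_tr, X_val, C, sigma) == y_val)
    return 1.0 - T / len(y_val)

# Synthetic stand-in for 20 principal-component features of 3 classes
rng = np.random.default_rng(0)
centers = 2.0 * np.eye(3, 20)

def make_set(n_per_class):
    y = np.repeat(np.arange(3), n_per_class)
    return centers[y] + 0.3 * rng.normal(size=(3 * n_per_class, 20)), y

X_tr, y_tr = make_set(40)
X_val, y_val = make_set(20)

# Candidate values 2^-2, 2^-1.5, ..., 2^2 for both parameters
grid = 2.0 ** np.arange(-2.0, 2.5, 0.5)
results = [(fitness(C, s, X_tr, y_tr, X_val, y_val), C, s)
           for C in grid for s in grid]
best_fit, best_C, best_sigma = min(results)
print(len(grid), len(results))  # 9 81
print(best_fit, best_C, best_sigma)
```

The grid has 9 values per parameter and therefore 81 model fits; a swarm search evaluates the same `fitness` function but moves through the range continuously instead of visiting fixed grid nodes.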
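To verify the advantage of the PSO-SVM classifier, it was compared with the least squares support vector machine (LS-SVM)[31] and the back propagation neural network (BPNN)[32]; the results are given in Table 1. All 3 algorithms identify the producing area with an accuracy above 80%, which indicates that beans from different producing areas differ clearly in the THz band and that THz spectroscopy can be used to identify the producing area of coffee beans. Among the 3 models, the support vector machines clearly outperform BPNN, and the SVM with PSO-optimized parameters outperforms LS-SVM. The best PSO-SVM model reaches an accuracy of 100% on the calibration set and 95% on the prediction set. The weaker performance of BPNN may be because BPNN learning needs a larger number of training samples and high-dimensional input features degrade the accuracy of network training. PSO optimizes the SVM parameters over a continuous range, and the SVM itself copes well with small samples and high-dimensional features, so the search can arrive at parameters that give better classification accuracy and hence the best identification model.
Table 1 Comparison of classification results of the 3 modeling methods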
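This paper used terahertz time-domain spectroscopy to study the rapid identification of coffee beans from different producing areas. Beans from 3 typical producing areas, Ethiopia, Indonesia, and Costa Rica, were selected as samples. A transmission THz spectroscopy system was used to acquire the time-domain and frequency-domain spectra of the pressed coffee bean pellets, principal component analysis was used to reduce and extract the spectral features, and particle swarm optimization was used to tune the support vector machine parameters, giving an identification model for the producing area based on THz spectral features. The proposed method identified the producing area with an accuracy of 100% on the calibration set and 95% on the prediction set, better than the BPNN and LS-SVM algorithms. The study shows that THz spectroscopy can be used for rapid identification of coffee beans from different producing areas, and that a PSO-optimized SVM combined with THz spectroscopy yields a satisfactory identification model. The work provides a new method for identifying the producing area of coffee beans and suggests an approach for applying THz spectroscopy to other complex agricultural products and foods.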
[1] Hu Shuangfang, Wei Yaxi, Xing Jingjing, et al. Correlation analysis between chemical components and sensory quality of coffee[J]. Science and Technology of Food Industry, 2013, 34(24): 125-129. (in Chinese with English abstract)
[2] Gu Wenjia, Li Zhaojie. A survey report on the quality of roasted coffee in China[J]. Quality and Standardization, 2014(12): 35-37. (in Chinese with English abstract)
[3] Semmelroch P, Laskawy G, Blank I, et al. Determination of potent odorants in roasted coffee by stable isotope dilution assay[J]. Flavour & Fragrance Journal, 1995, 10(1): 1-7.
[4] Piccino S, Boulanger R, Descroix F, et al. Aromatic composition and potent odorants of the “specialty coffee” brew “Bourbon Pointu” correlated to its three trade classifications[J]. Food Research International, 2014, 61(61): 264-271.
[5] He Yuqin, Hu Rongsuo, Zhang Haide, et al. Characteristic aroma detection of coffee at different roasting degree based on electronic nose[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2015, 31(18): 247-255. (in Chinese with English abstract)
[6] Cho J S, Bae H J, Cho B K, et al. Qualitative properties of roasting defect beans and development of its classification methods by hyperspectral imaging technology[J]. Food Chemistry, 2017, 220: 505-509.
[7] Chen T, Li Z, Yin X, et al. Discrimination of genetically modified sugar beets based on terahertz spectroscopy[J]. Spectrochimica Acta Part A Molecular & Biomolecular Spectroscopy, 2016, 153: 586-590.
[8] Lian F, Ge H, Xia S, et al. Identification of wheat quality using THz spectrum[J]. Optics Express, 2014, 22(10): 12533-12544.
[9] Gente R, Busch S F, Stübling E M, et al. Quality control of sugar beet seeds with THz time-domain spectroscopy[J]. IEEE Transactions on Terahertz Science & Technology, 2016, 6(5): 754-756.
[10] Yang Jingqi, Li Shaoxian, Zhao Hongwei, et al. Terahertz study of L-asparagine and its monohydrate[J]. Acta Physica Sinica, 2014, 63(13): 105-111. (in Chinese with English abstract)
[11] Liu J, Li Z. The terahertz spectrum detection of transgenic food[J]. Optik - International Journal for Light and Electron Optics, 2014, 125(23): 6867-6869.
[12] Gowen A A, O’Sullivan C, O’Donnell C P. Terahertz time domain spectroscopy and imaging: Emerging techniques for food process monitoring and quality control[J]. Trends in Food Science & Technology, 2012, 25(1): 40-46.
[13] Liu W, Liu C, Chen F, et al. Discrimination of transgenic soybean seeds by terahertz spectroscopy[J]. Scientific Reports, 2016, doi: 10.1038/srep35799.
[14] Liu W, Liu C, Hu X, et al. Application of terahertz spectroscopy imaging for discrimination of transgenic rice seeds with chemometrics[J]. Food Chemistry, 2016, 210: 415-421.
[15] Qin J Y, Ying Y B, Xie L J. The detection of agricultural products and food using terahertz spectroscopy: A Review[J]. Applied Spectroscopy Reviews, 2013, 48(6): 439-457.
[16] Redo-Sanchez A, Laman N, Schulkin B, et al. Review of terahertz technology readiness assessment and applications[J]. Journal of Infrared, Millimeter, and Terahertz Waves, 2013, 34(9): 500-518.
[17] Xie Lijuan, Xu Wendao, Ying Yibin, et al. Advancement and trend of terahertz spectroscopy technique for non-destructive detection[J]. Transactions of the Chinese Society for Agricultural Machinery, 2013, 44(7): 246-255. (in Chinese with English abstract)
[18] Su T F, Zhao G Z, Ren T B, et al. Characterizations of physico-chemical changes of corn biomass by steam explosion[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2015, 31(6): 253-256.
[19] Shen Xiaochen, Li Bin, Li Xia, et al. Identification of transgenic and non-transgenic cotton seed based on terahertz range spectroscopy[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(Supp.1): 288-292. (in Chinese with English abstract)
[20] Ge H, Jiang Y, Lian F, et al. Characterization of wheat varieties using terahertz time-domain spectroscopy[J]. Sensors, 2014, 15(6): 12560-12572.
[21] Liu J, Li Z, Hu F, et al. A THz spectroscopy nondestructive identification method for transgenic cotton seed based on GA-SVM[J]. Optical and Quantum Electronics, 2015, 47(2): 313-322.
[22] Hao Yong, Sun Xudong, Gao Rongjie, et al. Application of visible and near infrared spectroscopy to identification of navel orange varieties using SIMCA and PLS-DA methods[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2010, 26(12): 373-377. (in Chinese with English abstract)
[23] Vapnik V N. An overview of statistical learning theory[J]. IEEE Transactions on Neural Networks, 1999, 10(10): 988-999.
[24] Burges C J C. A Tutorial on Support Vector Machines for Pattern Recognition[J]. Data Mining and Knowledge Discovery, 1998, 2(2): 121-167.
[25] V David S A. Advanced support vector machines and kernel methods[J]. Neurocomputing, 2003, 55(1/2): 5-20.
[26] Jiao Youquan, Zhao Lixi, Deng Ou, et al. Calculation of live tree timber volume based on particle swarm optimization and support vector regression[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(20): 160-167. (in Chinese with English abstract)
[27] Venter G, Sobieszczanski-Sobieski J. Particle swarm optimization[J]. AIAA Journal, 2013, 41(8): 129-132.
[28] Liu Wei, Wang Jianping, Liu Changhong, et al. Lycopene content prediction based on support vector machine with particle swarm optimization[J]. Transactions of the Chinese Society for Agricultural Machinery, 2012, 43(4): 143-147. (in Chinese with English abstract)
[29] Liu Xiaofeng, Chen Tong. Study on convergence analysis and parameter choice of particle swarm optimization[J]. Computer Engineering and Applications, 2007, 43(9): 14-17. (in Chinese with English abstract)
[30] Shi Y, Eberhart R C. Parameter selection in particle swarm optimization[C]// Proceedings of the 7th International Conference on Evolutionary Programming (EP '98). 1998: 591-600.
[31] Borin A, Ferrão M F, Mello C, et al. Least-squares support vector machines and near infrared spectroscopy for quantification of common adulterants in powdered milk[J]. Analytica Chimica Acta, 2006, 579(1): 25-32.
[32] Dai H, MacBeth C. Effects of learning parameters on learning procedure and performance of a BPNN[J]. Neural Networks, 1997, 10(8): 1505-1521.
Rapid identification of producing area of coffee bean based on terahertz spectroscopy and support vector machine
Hu Xiaohua1, Liu Wei2,3※, Liu Changhong3, Qian Yunhui3
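(1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China; 2. Laboratory of Machine Vision and Intelligent Control, Hefei University, Hefei 230601, China; 3. School of Food Science and Engineering, Hefei University of Technology, Hefei 230009, China)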
Coffee is a very popular beverage in many countries. Coffee beans from different producing areas differ in flavour and functional properties, so identifying the producing area is important for assuring coffee bean quality. The feasibility of a rapid and precise method for determining the producing area of coffee beans was examined using a terahertz (THz) time-domain spectroscopy system (TAS7500TS HF1, Advantest Co., Ltd, Japan). Coffee bean samples from 3 typical producing areas (Ethiopia, Costa Rica, and Indonesia) were collected and pressed into pellets for THz measurement. A total of 180 pellet samples (3 classes, 60 pellets each) were randomly divided into a calibration set (40 pellets per class) and a prediction set (20 pellets per class). The TAS7500TS system was operated in transmission mode. Before the experiment, dry air was injected until the relative humidity fell below 3% to reduce absorption of the THz waves by water vapour in the air. The parameters of the THz system were as follows: frequency range 0.1 to 4 THz, resolution 7.6 GHz, pulse width less than 50 fs, and average power 20 mW. For each sample, the THz time-domain spectrum was measured 3 times at different positions and the average was taken. The frequency-domain spectra were obtained by fast Fourier transform (FFT). Principal component analysis (PCA) of the frequency-domain spectral data was performed to examine the qualitative differences among the 3 classes of coffee beans using the first 3 score vectors. The 3 classes were largely separated in the space of the first 3 principal components (PCs), although there was some overlap among the groups, which may be because the first 3 PCs accounted for only 68.75% of the total spectral variation.
Thus, to reduce the dimension of the model features while retaining more of the information in the THz spectra, the first 20 components were selected as the spectral features for determining the producing area of coffee beans. The support vector machine (SVM), a learning algorithm used for classification and regression tasks, was used to build the identification model. Particle swarm optimization (PSO) was used to select the optimum parameters, which enlarges the search space and improves search efficiency. The identification results of the PSO-SVM were compared with those of the least squares support vector machine (LS-SVM) and the back propagation neural network (BPNN). The comparison showed that the discrimination accuracy of the PSO-SVM over all 3 classes of coffee beans reached 95% on the prediction set and 100% on the calibration set, making it the best of the 3 methods. It can be concluded that THz frequency-domain spectra can serve as important features for identifying the producing area of coffee beans. The PSO-based SVM obtains better SVM parameters and hence better identification ability than the traditional LS-SVM. The THz spectroscopy system combined with the proposed algorithm is a powerful and attractive tool for identifying the producing area of coffee beans.
spectroscopy; models; support vector machine; coffee bean; terahertz; particle swarm optimization
doi: 10.11975/j.issn.1002-6819.2017.09.040
CLC number: TP274+.3; TP391.44
Document code: A
Article ID: 1002-6819(2017)-09-0302-06
Received: 2017-02-22
Revised: 2017-04-16
Funding: National Key Research and Development Program of China (2016YFD0401104)
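Hu Xiaohua, male, born in Wuyuan, Jiangxi; mainly engaged in research on nondestructive testing with terahertz spectroscopy. School of Computer and Information, Hefei University of Technology, Hefei 230009. Email: xiaohuahu@mail.hfut.edu.cn
Liu Wei, male, born in Shouxian, Anhui; senior experimentalist, PhD; mainly engaged in research on detection technology and pattern recognition. Laboratory of Machine Vision and Intelligent Control, Hefei University, Hefei 230601. Email: lwei1524@163.com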
Hu Xiaohua, Liu Wei, Liu Changhong, Qian Yunhui. Rapid identification of producing area of coffee bean based on terahertz spectroscopy and support vector machine[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(9): 302-307. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2017.09.040 http://www.tcsae.org