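Integrated prediction model of Cauchy adaptive backtracking search and least square support vector machine
ZHANG Zhonghua, ZHAO Fuyuan*, GUO Junfeng, ZHAO Gaochang
(College of Science, Xi'an University of Science and Technology, Xi'an 710054, China) (*Corresponding author e-mail: zhaofyzanx@foxmail.com)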
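To address the premature convergence and weak local exploitation of the Backtracking Search optimization Algorithm (BSA) when it is used to optimize the kernel function parameter and the regularization parameter of the Least Square Support Vector Machine (LSSVM), an integrated prediction model named CABSA-LSSVM is proposed. First, a Cauchy population generation strategy is used to increase the diversity of the historical population, so that the algorithm is less likely to be trapped in a local optimum. Then, an adaptive mutation factor strategy adjusts the mutation scale coefficient to balance the algorithm's global exploration and local exploitation. Finally, the improved Cauchy Adaptive Backtracking Search Algorithm (CABSA) is used to optimize the LSSVM, forming a new integrated prediction model. Numerical experiments on ten UCI datasets show that CABSA-LSSVM achieves its best regression prediction performance at a population size of 80. Compared with LSSVMs optimized by the standard BSA, the Particle Swarm Optimization (PSO) algorithm, the Artificial Bee Colony (ABC) algorithm and the Grey Wolf Optimization (GWO) algorithm, the proposed model increases the coefficient of determination by 1.21%-15.28%, reduces the prediction error by 6.36%-29.00% and reduces the running time by 5.88%-94.16%, which indicates high prediction accuracy and fast computation.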
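integrated prediction model; Backtracking Search optimization Algorithm (BSA); Least Square Support Vector Machine (LSSVM); Cauchy population generation strategy; adaptive mutation factor strategy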
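In recent years, the Least Square Support Vector Machine (LSSVM) has been widely applied in engineering, medical diagnosis, environmental science and many other research fields [1-2]. However, the choice of the kernel function parameter and the regularization parameter in LSSVM directly affects how well the model fits the samples and how well it generalizes [3]. For classification problems, researchers have proposed methods such as the Complete and efficient Random Forest method based class Noise Filtering Learning framework (CRF-NFL), the multiclass Complete Random Forest (mCRF) and the multiclass Relative Density (mRD) [4-6] to improve generalization. For regression problems, a variety of intelligent optimization algorithms have been used to search for the optimal kernel function parameter and regularization parameter, including the Genetic Algorithm (GA) [7], the Particle Swarm Optimization (PSO) algorithm [8], the Free Search (FS) algorithm [9], the Ant Colony Optimization (ACO) algorithm [10], the Artificial Bee Colony (ABC) algorithm [11], the Grey Wolf Optimization (GWO) algorithm [12] and the Backtracking Search optimization Algorithm (BSA) [3,13-14]. BSA is a population-based metaheuristic proposed in 2013 [15]. Its framework consists of five parts: population initialization, selection-I, mutation, crossover and selection-II. The algorithm has a simple structure, strong search ability and high computational efficiency. It follows the optimization framework of the classical genetic algorithm (initialization, perturbation and selection), and in its reproduction step it probabilistically remembers the previous generation's population and selects better individuals for the next generation. Since it was proposed, BSA has been widely applied to constrained optimization problems, antenna arrays, control engineering, power systems, wireless sensor networks and other areas [16].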
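This paper studies improvement strategies for BSA. It builds on two optimization mechanisms from the literature: the Cauchy population generation strategy of [19], which increases the diversity of the historical population to counter premature convergence, and the adaptive mutation factor strategy of [22], which adjusts the mutation scale coefficient to counter weak local exploitation. On this basis, a collective-intelligence-based Cauchy Adaptive Backtracking Search Optimization Algorithm (CABSA) is proposed, and CABSA is used to optimize LSSVM to form a new integrated prediction model, CABSA-LSSVM. Finally, numerical experiments on ten UCI datasets test the model's regression performance and identify the population size at which that performance is best.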
1.1.1 Population initialization
BSA generates the initial population uniformly at random within the search bounds.
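As an illustration of this step, the following minimal Python sketch draws every coordinate of every individual uniformly between per-dimension bounds, which is how the standard BSA [15] initializes its population. The function name `init_population`, the use of NumPy and the example bounds are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np


def init_population(pop_size, low, up, rng):
    """Draw pop_size individuals uniformly at random inside [low, up].

    low and up hold one bound per search dimension (hypothetical values
    in the example below); the result has shape (pop_size, dim).
    """
    low = np.asarray(low, dtype=float)
    up = np.asarray(up, dtype=float)
    # rng.uniform broadcasts the per-dimension bounds across all rows.
    return rng.uniform(low, up, size=(pop_size, low.size))


# Example: 80 candidate settings of two hyperparameters, each searched
# in an assumed range [0.01, 1000].
rng = np.random.default_rng(0)
population = init_population(80, [0.01, 0.01], [1000.0, 1000.0], rng)
```

Each row of `population` is one candidate solution; for the LSSVM tuning problem a row would hold one pair of regularization and kernel parameters.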
1.1.2 Selection-I
The selection-I stage of BSA determines the initial historical population.
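A hedged sketch of selection-I as it is defined for the standard BSA [15]: the historical population is first drawn from the same uniform distribution as the population itself, and at the start of each iteration it is replaced by the current population when a random if-then test succeeds, after which its rows are shuffled. The names `init_history` and `selection_one` are hypothetical, and this is the standard rule, not the Cauchy-based variant proposed later in the paper.

```python
import numpy as np


def init_history(pop_size, low, up, rng):
    """Initial historical population: uniform draw, independent of the population."""
    low = np.asarray(low, dtype=float)
    up = np.asarray(up, dtype=float)
    return rng.uniform(low, up, size=(pop_size, low.size))


def selection_one(pop, old_pop, rng):
    """One selection-I update (standard BSA rule, assumed here).

    With a, b ~ U(0, 1): if a < b the historical population is redefined
    as a copy of the current population; its rows are then shuffled.
    """
    a, b = rng.random(2)
    if a < b:
        old_pop = pop.copy()
    # Generator.permutation returns a shuffled copy along axis 0 (rows).
    return rng.permutation(old_pop, axis=0)
```

Because the function returns a new array, neither `pop` nor the previous historical population is modified in place.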
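An if-then rule is applied at the start of each iteration to decide whether the historical population is redefined. In the standard BSA [15], the historical population oldP is set to the current population P when a < b, where a and b are drawn from U(0, 1), and oldP is then randomly permuted.

1.1.3 Mutation

1.1.4 Crossover

1.1.5 Selection-II

1.2 Least square support vector machine
LSSVM converts the convex quadratic programming problem solved by the Support Vector Machine (SVM) into a system of linear equations, that is, it replaces the inequality constraints with equality constraints, which improves both the speed and the accuracy of the solution [1]. The LSSVM regression problem is formulated as a constrained optimization problem, and a radial basis kernel function satisfying the Mercer condition is introduced.

2 Cauchy adaptive backtracking search algorithm
To improve the global performance of BSA, this paper proposes CABSA, which is based on collective intelligence and combines a Cauchy population generation strategy with an adaptive mutation factor strategy. It has two main components: 1) a multi-population method using the Cauchy population generation strategy, which improves the search ability of BSA without losing the diversity of the historical population; 2) an adaptive parameter designed with the adaptive mutation factor strategy, which balances the global exploration and local exploitation of BSA.

2.1 Cauchy population generation strategy
The standard BSA generates the historical population with a uniform random mechanism. Under this mechanism, as the number of iterations grows, the information in the historical population and in the current population gradually becomes the same, so the diversity of the historical population decreases and the algorithm easily falls into a local optimum [19].
To increase the diversity of the historical population and thereby enlarge the search space, the population is mutated with a Cauchy-distributed scale coefficient to generate a new historical population; the coefficient is derived from the Cauchy distribution function. The improved historical population generation equation increases the diversity of the historical population and produces better new populations, so that the algorithm escapes local optima quickly and obtains a more accurate optimal solution.

2.2 Adaptive mutation factor strategy

3 Integrated prediction model
When the integrated prediction model performs regression prediction, the output error of the LSSVM is used as the fitness value of CABSA. The flow of CABSA-LSSVM is shown in Fig. 1. In step 3) of the procedure, the adaptive mutation factor strategy is applied and the search iterates following the BSA procedure, updating the best individuals of the population and their fitness values.
Fig. 1 Flowchart of CABSA-LSSVM

4 Numerical experiments
To verify the regression prediction performance of CABSA-LSSVM, ten UCI datasets are selected for numerical experiments; the datasets are described in Table 1.
Table 1 Basic information of the UCI datasets
The experiments use the LS-SVMlab v1.8 toolbox developed at KU Leuven (http://www.esat.kuleuven.be/sista/lssvmlab), running on 64-bit Windows 7 with an Intel Core i5-5200U CPU @ 2.20GHz and Matlab2013a. Because each run randomly generates the training and test sets for every optimization scheme, each reported optimum is only an approximation of the optimal solution. To make the results more convincing, all reported results are averages over 30 independent runs.
The experiments have two parts: 1) the effect of population size on the regression prediction performance of CABSA-LSSVM is studied to find the population size at which the model performs best; 2) with the population size and the maximum number of iterations fixed, the regression prediction performance of CABSA-LSSVM is compared with that of standard BSA-LSSVM, PSO-LSSVM, ABC-LSSVM and GWO-LSSVM.

4.1 Performance tests with different population sizes
Population size is an important parameter of the model. Its effect on prediction performance is easily overlooked, and there is currently no established method for choosing it. Previous work shows that an overly large population increases the running time and slows the iteration, whereas an overly small population reduces the diversity of individuals and tends to make the model converge prematurely to a suboptimal solution, so the global optimum may not be found [24]. Studying the effect of population size on model performance is one focus of this paper, so experiments were carried out to analyze how different population sizes affect the prediction accuracy and computation speed of CABSA-LSSVM.
Fig. 2 Relationship between population size and prediction error and running time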
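To make the population-size study concrete, the sketch below tunes the two LSSVM hyperparameters with a plain BSA loop and records the mean validation error and mean running time for several population sizes. Everything in it is an assumption made for illustration: it is written in Python with NumPy rather than the LS-SVMlab and Matlab setup used in the paper; the LSSVM is the standard linear-system formulation with kernel exp(-||x - z||^2 / (2 sigma^2)); the optimizer is a simplified standard BSA after [15] (not CABSA, whose update equations are not reproduced here) with a simplified crossover map and clipping as boundary control; and the data are a synthetic noisy sine curve, not a UCI dataset.

```python
import time
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix; the 2*sigma2 scaling is an assumed convention."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))


def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(X_train, b, alpha, sigma2, X_new):
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


def fitness(params, X_tr, y_tr, X_va, y_va):
    """Validation mean squared error for log10-encoded (gamma, sigma2)."""
    gamma, sigma2 = 10.0 ** params
    b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma2)
    pred = lssvm_predict(X_tr, b, alpha, sigma2, X_va)
    return float(np.mean((pred - y_va) ** 2))


def bsa_minimize(f, low, up, pop_size, iters, rng):
    """Simplified standard BSA loop (assumption: not the paper's CABSA)."""
    dim = len(low)
    pop = rng.uniform(low, up, (pop_size, dim))
    old = rng.uniform(low, up, (pop_size, dim))
    fit = np.array([f(p) for p in pop])
    for _ in range(iters):
        if rng.random() < rng.random():           # selection-I: if a < b
            old = pop.copy()
        old = rng.permutation(old, axis=0)
        mutant = pop + 3.0 * rng.standard_normal() * (old - pop)
        mask = rng.random((pop_size, dim)) < rng.random()  # simplified crossover
        trial = np.clip(np.where(mask, mutant, pop), low, up)
        trial_fit = np.array([f(t) for t in trial])
        better = trial_fit < fit                  # selection-II: greedy
        pop[better] = trial[better]
        fit[better] = trial_fit[better]
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])


def sweep(pop_sizes, runs=3, iters=10, seed=0):
    """Mean validation error and mean running time per population size."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-3.0, 3.0, (60, 1))           # synthetic data (assumption)
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)
    X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]
    f = lambda p: fitness(p, X_tr, y_tr, X_va, y_va)
    low, up = np.array([-2.0, -2.0]), np.array([3.0, 2.0])  # log10 bounds
    results = {}
    for n in pop_sizes:
        errs, times = [], []
        for _ in range(runs):
            t0 = time.perf_counter()
            _, err = bsa_minimize(f, low, up, n, iters, rng)
            times.append(time.perf_counter() - t0)
            errs.append(err)
        results[n] = (float(np.mean(errs)), float(np.mean(times)))
    return results
```

Calling `sweep([20, 40, 80, 120], runs=30)` would mirror the paper's protocol of averaging over 30 independent runs. The hyperparameters are searched on a log10 scale so that a single pair of bounds covers several orders of magnitude; that encoding is a design choice of this sketch, not something stated in the paper.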
As Fig. 2 shows, as the population size increases, the running time on the ten UCI datasets rises with fluctuations, while the prediction error first decreases and then increases. At a population size of 80, the prediction error is smallest on the Computer Hardware, Real Estate Valuation, Concrete Compressive Strength, Airfoil Self-Noise, Bias Correction of Numerical Prediction Model Temperature Forecast and Combined Cycle Power Plant datasets, and it is relatively small on the Daily Demand Forecasting Orders, Concrete Slump Test, Auto MPG (given as 自動手脈 in the source) and Boston Housing datasets, with relatively high prediction accuracy and relatively fast computation. The experimental results therefore indicate that CABSA-LSSVM has good regression prediction performance at a population size of 80.

4.2 Regression prediction performance test
To further demonstrate the regression prediction performance of the improved CABSA-LSSVM, its convergence curves are compared with those of standard BSA-LSSVM, PSO-LSSVM, ABC-LSSVM and GWO-LSSVM; the results are shown in Fig. 3.
Table 2 Optimal parameters of different prediction models
Fig. 3 Comparison of the convergence curves of the models
As Fig. 3 shows, CABSA-LSSVM is better than the other four prediction models in both convergence speed and convergence accuracy. To measure regression prediction performance, several performance indicators are introduced. Running time represents the computation speed of a model: the shorter the running time, the faster the model.
Table 3 Performance indicators of different prediction models
The correlation between predicted and true values on the test sets of the ten UCI datasets is shown in Fig. 4. On all ten datasets, the values predicted by CABSA-LSSVM are linearly related to the true values and lie close to the fitted line, which shows that the predictions are very close to the true values and that the model predicts well.
Fig. 4 Correlation between the predicted and true values of CABSA-LSSVM

5 Conclusion
To address the premature convergence and weak local exploitation of BSA in optimizing the kernel function parameter and regularization parameter of LSSVM, this paper proposes CABSA-LSSVM, a model based on collective intelligence. The model defines a Cauchy population generation equation to increase population diversity and an adaptive mutation equation to adjust the search step size, which balances the global exploration and local exploitation of BSA; the resulting algorithm is then used to obtain the optimal LSSVM parameters. Numerical experiments on ten UCI datasets compare the regression performance of five prediction models. The results show that the model's regression prediction performance is best at a population size of 80 and that the model has clear advantages in prediction accuracy and computation speed. Future work will further explore tuning strategies for the model and apply it to multi-label classification problems.

References
[1] TONG Y T. Study of least square support vector regression[D]. Jinhua: Zhejiang Normal University, 2018: 2-17.
[2] GUO X C. Study on least square support vector algorithms and their applications[D]. Changchun: Jilin University, 2008: 16-20.
[3] TIAN Z D. Backtracking search optimization algorithm-based least square support vector machine and its applications[J]. Engineering Applications of Artificial Intelligence, 2020, 94: No.103801.
[4] XIA S Y, WANG G Y, CHEN Z Z, et al. Complete random forest based class noise filtering learning for improving the generalizability of classifiers[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(11): 2063-2078.
[5] XIA S Y, CHEN B Y, WANG G Y, et al. mCRF and mRD: two classification methods based on a novel multiclass label noise filtering learning framework[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021(Early Access): 1-15.
[6] KIM H Y, WON C H. Forecasting the volatility of stock price index: a hybrid model integrating LSTM with multiple GARCH-type models[J]. Expert Systems with Applications, 2018, 103: 25-37.
[7] ZENDEHBOUDI A. Implementation of GA-LSSVM modelling approach for estimating the performance of solid desiccant wheels[J]. Energy Conversion and Management, 2016, 127: 245-255.
[8] CHAMKALANI A, ZENDEHBOUDI S, BAHADORI A, et al. Integration of LSSVM technique with PSO to determine asphaltene deposition[J]. Journal of Petroleum Science and Engineering, 2014, 124: 243-253.
[9] TIAN Z D, LI S J, WANG Y H, et al. A prediction method based on wavelet transform and multiple models fusion for chaotic time series[J]. Chaos, Solitons and Fractals, 2017, 98: 158-172.
[10] LIU W C, ZHANG X R. Research on the supply chain risk assessment based on the improved LSSVM algorithm[J]. International Journal of u- and e- Service, Science and Technology, 2016, 9(8): 297-306.
[11] JAIN S, BAJAJ V, KUMAR A. Efficient algorithm for classification of electrocardiogram beats based on artificial bee colony-based least-squares support vector machines classifier[J]. Electronics Letters, 2016, 52(14): 1198-1200.
[12] YANG A L, LI W D, YANG X. Short-term electricity load forecasting based on feature selection and least squares support vector machines[J]. Knowledge-Based Systems, 2019, 163: 159-173.
[13] DONG C Y. Research on state recognition and prediction method of milling tool wear[D]. Wuhan: Huazhong University of Science and Technology, 2017: 34-54.
[14] CAI L G, LI H B, YANG C B, et al. Tool wear state recognition model based on modified variational mode decomposition and LS-SVM with the adaptive backtracking search algorithm[J]. Journal of Beijing University of Technology, 2021, 47(1): 10-23.
[15] CIVICIOGLU P. Backtracking search optimization algorithm for numerical optimization problems[J]. Applied Mathematics and Computation, 2013, 219(15): 8121-8144.
[16] WANG H L. Modification research on the exploitation capability of the backtracking search optimization algorithm based on natural inspiration[D]. Jingzhou: Yangtze University, 2018: 2-12.
[17] CHEN D B, ZOU F, LU R Q, et al. Learning backtracking search optimisation algorithm and its application[J]. Information Sciences, 2017, 376: 71-94.
[18] DUAN H B, LUO Q N. Adaptive backtracking search algorithm for induction magnetometer optimization[J]. IEEE Transactions on Magnetics, 2014, 50(12): No.6001206.
[19] WEI F T, SHI Y P, SHI K. Backtracking search optimization algorithm with combined mutation strategy[J]. Computer Engineering and Applications, 2020, 56(9): 41-47.
[20] CHEN D B, ZOU F, LU R Q, et al. Backtracking search optimization algorithm based on knowledge learning[J]. Information Sciences, 2019, 473: 202-226.
[21] ZHOU J X, YE H, JI X Y, et al. An improved backtracking search algorithm for casting heat treatment charge plan problem[J]. Journal of Intelligent Manufacturing, 2019, 30(3): 1335-1350.
[22] HU S, XIAO Z H, RAO Q, et al. Time series forecasting based on echo state network optimized by improved backtracking search optimization algorithm[J]. Computer Systems and Applications, 2020, 29(1): 236-243.
[23] ZHAO W T, WANG L J, YIN Y L, et al. Sequential quadratic programming enhanced backtracking search algorithm[J]. Frontiers of Computer Science, 2018, 12(2): 316-330.
[24] LIU X X. A research on population size impaction on the performance of genetic algorithm[D]. Baoding: North China Electric Power University, 2010: 13-23.

This work is partially supported by the National Natural Science Foundation of China (11201277).
ZHANG Zhonghua, born in 1977 in Xixian, Henan, Ph. D., professor. His research interests include pattern recognition, biomathematics.
ZHAO Fuyuan, born in 1995 in Chengde, Hebei, M. S. candidate. Her research interests include computational simulation, intelligent algorithms, biometrics.
GUO Junfeng, born in 1996 in Linfen, Shanxi, M. S. candidate. Her research interests include grey prediction, biometrics.
ZHAO Gaochang, born in 1965 in Dali, Shaanxi, professor. His research interests include pattern recognition, computational simulation, intelligent algorithms.
CLC number: TP301.6; TP181. Document code: A. Article ID: 1001-9081(2022)06-1829-08. DOI: 10.11772/j.issn.1001-9081.2021040577. Dates: 2021-04-14; 2021-06-11; 2021-06-11.