李南云 王旭光 吴华强 何青林
Abstract: Concerning the problem that, in complex and non-cooperative situations, the number of feature matching pairs and the feature matching accuracy in video stitching cannot simultaneously meet the requirements of subsequent image stabilization and stitching, a method was proposed which builds a matching model for accurate feature matching after scoring the feature points with a grayscale tower. Firstly, the phenomenon that similar gray levels merge after gray-level compression was exploited to build a grayscale tower and score the feature points. Then, the feature points with high scores were selected to build a matching model based on position information. Finally, guided by the positioning of the matching model, regional block matching was performed to avoid interference from global feature points and large-error noisy matches, and the feature matching pair with the smallest error was chosen as the final matching result. In addition, in a moving video stream, a mask built from the information of adjacent frames allows regional feature extraction, and the matching model can be selectively inherited by subsequent frames to save computation time. Experimental results show that with the proposed Matching Model based on Grayscale Tower score (MMGT), the feature matching accuracy is about 95%, and the number of feature matching pairs in the same frame is nearly 10 times that obtained with RANdom Sample Consensus (RANSAC). The method balances the number of matches against the matching accuracy, produces no large-error matches, and is robust to changes of environment and illumination.
Key words: feature extraction; feature matching; video stitching; grayscale tower; matching model; block matching
CLC number: TP391.4
Document code: A
0 Introduction
With the rapid development of image processing in recent years, image and video processing have found ever wider application. Static image stitching alone can no longer satisfy current needs: a static image cannot show how each object in a real scene changes over time, yet this temporal information matters greatly in both civilian and defense applications. Research on video stitching has therefore deepened in recent years.
Feature extraction and matching, a crucial part of image stitching [1], are also widely used in image recognition [2], 3D model construction [3], and other fields. The quality of the extracted feature points directly affects the feature matching result, and the matching result in turn determines the final accuracy of subsequent stitching, recognition, and model construction. Feature matching pairs are thus the cornerstone of these image processing pipelines; the main difficulties at present are the number of matching pairs, the matching accuracy, and the matching time.
Traditional feature extraction methods mainly include Scale Invariant Feature Transform (SIFT) [4], Speeded Up Robust Features (SURF) [5], ORB (Oriented FAST and Rotated BRIEF) [6], BRISK (Binary Robust Invariant Scalable Keypoints) [7], BRIEF (Binary Robust Independent Elementary Features) [8], and FAST (Features from Accelerated Segment Test). The descriptors of SIFT and SURF are scale- and rotation-invariant and achieve high matching accuracy, but they are computationally expensive and inefficient. ORB, BRISK, and FAST instead build binary descriptors by comparing the pixels around a feature point, and the correlation between two feature points is measured by the Hamming distance [9] between their descriptors.
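As a concrete illustration (not part of the original experiments), the following minimal sketch extracts ORB binary descriptors and matches them by Hamming distance with OpenCV; the file names are hypothetical.

```python
# Minimal sketch: ORB binary descriptors matched by Hamming distance.
import cv2

# Hypothetical input frames, loaded as 8-bit grayscale images.
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matcher with the Hamming norm, the distance suited to binary descriptors.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
```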
Feature matches are usually screened by K-Nearest Neighbors (KNN) matching or brute-force matching, followed by RANdom Sample Consensus (RANSAC) [10] to select the inlier set. RANSAC based on a global homography treats the image as a whole: it keeps the matches on the dominant plane of the image and discards those on secondary planes and parallax planes. That is, when parallax exists, the captured image can be divided into several planes according to scene depth, and RANSAC keeps only the matches of the depth plane with the largest proportion; in principle this screening loses the information of the other planes, so its registration ability is insufficient under large parallax. KNN matching can avoid the differences between matching pairs caused by parallax and obtain matching results directly; however, because illumination and image content vary between frames, a fixed KNN threshold causes large matching errors, so KNN cannot be applied to video stitching directly. Neither method can avoid the irregular false matches caused by noise.
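Continuing the sketch above, the two screening schemes described in this paragraph can be written as follows; the ratio 0.75 and the reprojection threshold 5.0 are common defaults, not values from the paper.

```python
# Minimal sketch: KNN ratio test, then RANSAC homography screening of inliers.
import numpy as np

bf = cv2.BFMatcher(cv2.NORM_HAMMING)
knn = bf.knnMatch(des1, des2, k=2)
good = [p[0] for p in knn
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

# RANSAC keeps the matches consistent with one dominant homography, which is
# exactly why matches on other depth planes are discarded.
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # needs >= 4 matches
inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
```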
To solve the problem that in video stitching the feature points are too few and too unevenly distributed to account for the global depth, this paper seeks a feature matching method that yields numerous and evenly distributed feature matching pairs.
3 Proposed algorithm
As the introduction to video stitching above shows, obtaining the camera path and the mesh transformation requires feature matching pairs that are numerous and as evenly distributed as possible, while the matching accuracy must also be guaranteed. Traditional feature matching methods can hardly meet these requirements, so this paper proposes a feature matching scheme built around a Matching Model based on Grayscale Tower score (MMGT).
3.1 Feature point scoring
When extracting features from an input image, a good extractor yields feature points that are scale- and rotation-invariant; however, the extracted points include noise, their number is huge, and, because of illumination effects, there is no complete scoring mechanism, which hurts the credibility and accuracy of the final matching result. If the feature points can be scored, the matches of highly scored points become more credible, the matching model built from them later becomes more reliable, and the final matching result improves accordingly. Introducing a scoring method based on gray-level reduction can, to a certain extent, suppress the influence of illumination and false feature points.
A grayscale image generally has 256 gray levels. In traditional feature point extraction, a Gaussian pyramid is built first; in the Difference-of-Gaussians images, each pixel is compared with its 8 neighbors in the same scale and the 18 neighbors in the two adjacent scales, and the extrema are taken as feature points, which are then scale-invariant. A feature point is a distinctive point of the image; an ideal feature point does not disappear under scale or illumination changes. In other words, when the gray levels are compressed, for example when the number of gray levels of the whole image drops from 250 to 128, adjacent gray levels merge, but a good feature point still survives.
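The merging of adjacent gray levels under compression can be made concrete with a small quantization helper; this is an illustrative sketch, and the function name and the factor-of-2 example are ours, not the paper's.

```python
# Minimal sketch: compress an 8-bit image to a smaller number of gray levels,
# so that adjacent levels merge (the phenomenon the grayscale tower exploits).
import numpy as np

def compress_gray_levels(img: np.ndarray, levels: int) -> np.ndarray:
    """Quantize an 8-bit grayscale image down to `levels` gray levels."""
    step = 256.0 / levels
    return (np.floor(img / step) * step).astype(np.uint8)

# e.g. compress_gray_levels(img, 128) halves the number of gray levels.
```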
First, the Difference-of-Gaussians pyramid is built, and the initially extracted feature point sets P(n) and P′(n) are obtained by comparing pixel values across scales. Then, for each feature point in these sets, a circle of radius R is taken around the point, and the pixels inside the circle are called illumination-related pixels. A scale-invariant illumination pyramid is then built as follows: 1) obtain the histogram of the illumination-related region around the feature point; 2) simulate illumination changes by compressing the total number of gray levels of the region at different ratios; 3) compare the gray-level descent gradient of the region on each layer of the grayscale tower, and consider the feature point to have disappeared once the gradient change falls below a threshold; 4) the more layers of the tower a feature point P survives on, the higher its score and the higher its quality. The process is shown in Fig. 5, and a code sketch of the scoring loop is given below.
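In the following hedged sketch of this scoring procedure, the tower layers, the radius R, and the disappearance threshold are illustrative assumptions, and the mean gradient magnitude stands in for the paper's gray-level descent gradient comparison.

```python
# Hedged sketch: grayscale-tower scoring of a single feature point.
import numpy as np

TOWER_LEVELS = (256, 128, 64, 32, 16)  # assumed compression layers of the tower
R = 8                                  # assumed radius of the related region
GRAD_THRESHOLD = 4.0                   # assumed disappearance threshold

def score_feature_point(gray: np.ndarray, x: int, y: int) -> int:
    """Count how many tower layers the feature point at (x, y) survives on."""
    h, w = gray.shape
    patch = gray[max(y - R, 0):min(y + R + 1, h),
                 max(x - R, 0):min(x + R + 1, w)].astype(np.float32)
    score = 0
    for levels in TOWER_LEVELS:
        step = 256.0 / levels
        quant = np.floor(patch / step) * step  # compress the gray levels
        gy, gx = np.gradient(quant)            # local gray-level gradients
        if np.mean(np.hypot(gx, gy)) < GRAD_THRESHOLD:
            break                              # the point has "disappeared"
        score += 1
    return score
```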
3.2 Construction of the matching model
The scoring step above yields the scored feature point sets P(n) and P′(n); the higher the score, the more reliable the feature point. The next step is to build a matching model from these scores, so as to increase the number and accuracy of the feature matching pairs and to avoid the large-error matches caused by noise.
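As one plausible reading of the scheme summarized in the abstract (select high-scoring anchors, then match only within the block the model points to, keeping the smallest-error pair), a hypothetical sketch might look like this; the block size, the top-K value, and the coarse-offset formulation are our assumptions, not the paper's exact model.

```python
# Hypothetical sketch: use the scores to anchor the model, then block-match.
import numpy as np

def build_model(points, scores, top_k=20):
    """Keep the top_k highest-scored points as anchors of the matching model."""
    order = np.argsort(scores)[::-1][:top_k]
    return [points[i] for i in order]

def block_match(p, candidates, offset, block=32):
    """Match p only against candidates inside the block predicted by the
    model's coarse offset; return the candidate with the smallest error."""
    px, py = p[0] + offset[0], p[1] + offset[1]
    in_block = [q for q in candidates
                if abs(q[0] - px) < block and abs(q[1] - py) < block]
    if not in_block:
        return None
    return min(in_block, key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2)
```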
5 Conclusion
This paper studied feature matching for stitching the high-altitude videos captured by a multi-node UAV network into a panoramic video in real time, and proposed a feature matching method that scores feature points with a grayscale tower and builds a matching model from them. The method effectively solves the problem that the extracted feature points are too few and too unevenly distributed, and, by exploiting the inheritance mechanism of the video stream, passes the matching model and the region of interest on to later frames, with good results. However, the method still falls short in real-time performance; future work should use multi-threading to parallelize the mutually independent steps of the model matching process and speed up matching. The grayscale tower can also be extended to other applications, although false contours must be handled carefully in image segmentation.
References
[1] IVAN S K, OLEG V P. Spherical video panorama stitching from multiple cameras with intersecting fields of view and inertial measurement unit[C]// Proceedings of the 2016 International Siberian Conference on Control and Communications. Piscataway, NJ: IEEE, 2016: 1-6.
[2] ZHANG L, HE Z, LIU Y. Deep object recognition across domains based on adaptive extreme learning machine[J]. Neurocomputing, 2017, 239: 194-203.
[3] YANG B, DONG Z, LIANG F, et al. Automatic registration of large-scale urban scene point clouds based on semantic feature points[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 113: 43-58.
[4] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[5] HSU W Y, LEE Y C. Rat brain registration using improved speeded up robust features[J]. Journal of Medical and Biological Engineering, 2017, 37(1): 45-52.
[6] RUBLEE E, RABAUD V, KONOLIGE K, et al. ORB: an efficient alternative to SIFT or SURF[C]// Proceedings of the 2011 International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2011: 2564-2571.
[7] LEUTENEGGER S, CHLI M, SIEGWART R Y. BRISK: binary robust invariant scalable keypoints[C]// Proceedings of the 2011 International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2011: 2548-2555.
[8] CALONDER M, LEPETIT V, STRECHA C, et al. BRIEF: binary robust independent elementary features[C]// Proceedings of the 11th European Conference on Computer Vision. Berlin: Springer, 2010: 778-792.
[9] MOR M, FRAENKEL A S. A hash code method for detecting and correcting spelling errors[J]. Communications of the ACM, 1982, 25(12): 935-938.
[10] SANTHA T, MOHANA M B V. The significance of real-time, biomedical and satellite image processing in understanding the objects & application to computer vision[C]// Proceedings of the 2nd IEEE International Conference on Engineering & Technology. Piscataway, NJ: IEEE, 2016: 661-670.
[11] BROWN M, LOWE D G. Automatic panoramic image stitching using invariant features[J]. International Journal of Computer Vision, 2007, 74(1): 59-73.
[12] GUO H, LIU S, HE T, et al. Joint video stitching and stabilization from moving cameras[J]. IEEE Transactions on Image Processing, 2016, 25(11): 5491-5503.
[13] 倪国强, 刘琼. 多源图像配准技术分析与展望[J]. 光电工程, 2004, 31(9): 1-6. (NI G Q, LIU Q. Analysis and prospect of multi-source image registration techniques[J]. Opto-Electronic Engineering, 2004, 31(9): 1-6.)
[14] ZARAGOZA J, CHIN T J, BROWN M S, et al. As-projective-as-possible image stitching with moving DLT[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1285-1298.
[15] 朱云芳, 叶秀清, 顾伟康. 视频序列的全景图拼接技术[J]. 中国图象图形学报, 2006, 11(8): 1150-1155. (ZHU Y F, YE X Q, GU W K. Mosaic panorama technique for videos[J]. Journal of Image and Graphics, 2006, 11(8): 1150-1155.)