
Fourier Spectrum Analysis of Thresholds in Gene Prediction Algorithms

2014-07-02 19:06:35 LIU Ping et al.
Hubei Agricultural Sciences, 2014, No. 6


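Abstract: In protein-coding-region prediction, the influence of threshold selection on the prediction results cannot be ignored. This study proposes using the normalized power spectral density as the threshold for discriminating the coding and non-coding regions of DNA sequences, with the FIR (finite impulse response) narrow pass-band filter (NPBF) as the core of the coding-region prediction algorithm. With the DNA sequence sets HMR195 and ALLSEQ as test sets and the base-level approximate correlation (AC) as the measure of prediction accuracy, the prediction results of the proposed method were compared with those of existing methods. The results showed that the new threshold gave the highest prediction accuracy, and the algorithm is simple and intuitive.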

Keywords: protein-coding-region prediction; narrow pass-band filter; normalized power spectral density; signal-to-noise ratio; approximate correlation

CLC number: TP391.9; TN713   Document code: A   Article ID: 0439-8114(2014)06-1432-04

      Analysis on Threshold Used in Gene Prediction Algorithm Based on Fourier Spectrum

      LIU Ping1,MA Yu-tao1,SUN Xue-hong1,ZHANG Cheng1,DU Yong2

      (1.School of Physics & Electrical Information Engineering/Ningxia Key Laboratory of Intelligent Sensing for Desert Information, Ningxia University,Yinchuan 750021,China;2.Department of Pediatric Surgery,General Hospital of Ningxia Medical University,Yinchuan 750004,China)

Abstract: Threshold selection in protein-coding-region prediction algorithms has an important influence on prediction accuracy. In this paper, a new threshold, the normalized value of the power spectral density, was proposed to differentiate protein coding regions from non-coding regions. Using the FIR (finite impulse response) NPBF (narrow pass-band filter) as the kernel of the prediction algorithm and the DNA sequence sets HMR195 and ALLSEQ as test sets, the prediction results of the NPBF algorithm with the new threshold were compared with those of the same algorithm using two other thresholds. The results were discussed using the approximate correlation (AC) as the base-level measure of prediction accuracy. They indicated that the proposed threshold was the best choice, giving a higher AC with less computation.

Key words: protein-coding-region prediction; narrow pass-band filter; normalized power spectral density; signal-to-noise ratio; approximate correlation

Protein-coding-region prediction provides important guidance for the annotation of DNA sequences[1-3]. Among existing prediction algorithms, the SDFT (sliding discrete Fourier transform) algorithm proposed by Tiwari et al.[4] uses the signal-to-noise ratio (RSN) as the threshold separating coding and non-coding regions; Mena-Chalco et al.[5] use the predicted non-coding ratio (PNCR) as the threshold; and Ambikairajah et al.[6] and Akhtar et al.[7] normalize the amplitude of the power spectral density (PSD) curve of a DNA sequence when plotting it. Given these two different threshold choices, it remains to be determined which one gives the better prediction results, and whether a still better threshold exists.

This study proposes using the power spectral density normalized by its maximum value (PSDN) as the threshold for distinguishing coding from non-coding regions. The FIR NPBF protein-coding-region prediction algorithm serves as the platform[8,9]; the DNA sequence sets HMR195[10] and ALLSEQ[11] serve as the test sets; and sensitivity (Sn), specificity (Sp), false positive rate (FPR), approximate correlation (AC) and correlation coefficient (CC) serve as the measures of prediction performance[11]. The prediction results obtained with RSN, PNCR and PSDN as thresholds are compared, providing a reference for threshold selection in stand-alone gene prediction algorithms.

1 Materials and Methods

1.1 Materials

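The gene sequence AB003730 (one of the DNA sequences in the HMR195 set) is used as the standard sequence for comparing the coding-region predictions obtained with the three thresholds above; the ALLSEQ and HMR195 DNA sequence sets are used to verify how generally threshold selection affects the prediction results.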

1.2 NPBF gene prediction algorithm

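The coding-region prediction algorithm based on the FIR narrow pass-band filter consists of the following steps: ① map the DNA sequence to numerical sequences (signals) with the Voss method; ② filter the numerical signals from the previous step with the FIR narrow pass-band filter to remove components whose period is not 3; ③ compute the power spectral density (PSD) of the filtered signal; ④ smooth the PSD curve with a moving-average filter and normalize its amplitude; ⑤ classify the positions of the DNA sequence into coding and non-coding regions using the non-coding ratio as the threshold, and report the prediction results with one or more accuracy measures.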
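Step ① is simple enough to show directly. The Python sketch below is a minimal illustration of the Voss indicator mapping, not the authors' implementation; the function name `voss_map` is hypothetical, and mapping any symbol other than A, T, C or G (for example N) to zero in all four indicators is an assumption.

```python
def voss_map(seq: str) -> dict[str, list[int]]:
    """Voss mapping: one binary indicator sequence x_l[n] per base l."""
    seq = seq.upper()
    # x_l[n] = 1 where base l occurs at position n, 0 elsewhere;
    # any other symbol (e.g. 'N') is 0 in all four indicators (assumption).
    return {base: [1 if s == base else 0 for s in seq] for base in "ATCG"}


x = voss_map("ATGCGA")
print(x["A"])  # [1, 0, 0, 0, 0, 1]
print(x["G"])  # [0, 0, 1, 0, 1, 0]
```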

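Using the Voss method, a DNA sequence composed of the bases adenine (A), thymine (T), cytosine (C) and guanine (G) is mapped to the numerical sequences $x_l[n]$, $l \in \{A, T, C, G\}$[1-9]. Passing these through the FIR narrow pass-band filter yields the period-3 signals $y_l[n]$, $l \in \{A, T, C, G\}$. The power spectral density of the DNA coding signal is

$$PSD[n] = \frac{1}{N} \sum_{l \in \{A, T, C, G\}} \left| y_l[n] \right|^2, \quad n = 1, \dots, L \tag{1}$$

where $N$ is the length of the FIR filter and $L$ is the length of the DNA sequence.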
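Steps ② and ③ can be sketched as follows, assuming eq. (1) sums $|y_l[n]|^2$ over the four bases and scales by $1/N$. The APNPBF itself is not specified here, so the sketch swaps in a hypothetical stand-in: a Hann-windowed complex exponential centred on the period-3 frequency $2\pi/3$. The names `bandpass_taps`, `fir_filter` and `psd`, the causal filtering and the default of 119 taps are illustrative assumptions, and indices start at 0 rather than 1.

```python
import cmath
import math


def bandpass_taps(n_taps: int) -> list[complex]:
    """Hypothetical narrow pass-band FIR: Hann window modulated to 2*pi/3.

    A stand-in for the APNPBF of refs [8,9], not a reproduction of it.
    """
    w0 = 2 * math.pi / 3
    return [(0.5 - 0.5 * math.cos(2 * math.pi * k / (n_taps - 1))) * cmath.exp(1j * w0 * k)
            for k in range(n_taps)]


def fir_filter(x: list[int], h: list[complex]) -> list[complex]:
    """Causal convolution y[n] = sum_k h[k] x[n-k]; output has the length of x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def psd(seq: str, n_taps: int = 119) -> list[float]:
    """Eq. (1) as read here: PSD[n] = (1/N) * sum over l of |y_l[n]|^2."""
    seq = seq.upper()
    h = bandpass_taps(n_taps)
    total = [0.0] * len(seq)
    for base in "ATCG":
        x = [1 if s == base else 0 for s in seq]  # Voss indicator x_l[n]
        for n, v in enumerate(fir_filter(x, h)):
            total[n] += abs(v) ** 2
    return [t / n_taps for t in total]


p3 = psd("ATG" * 100)  # strongly period-3 sequence
pa = psd("A" * 300)    # homopolymer: no period-3 component
print(p3[-1] > 100 * pa[-1])  # True
```

Once the filter window is full, the period-3 sequence produces a PSD far above that of the homopolymer, which is the contrast any coding/non-coding threshold has to exploit.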

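In simulations of the coding-region prediction algorithm, the filter output is not smooth enough; therefore, before the prediction results are evaluated, the prediction is first smoothed with a 110th-order moving-average filter. Equation (2) gives the difference equation of an $N_{ma}$th-order moving-average filter:

$$PSD_{ma}[n] = \frac{1}{N_{ma}} \sum_{i=0}^{N_{ma}-1} PSD[n-i] \tag{2}$$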
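Eq. (2) can be computed as a causal running sum. This sketch assumes that samples before the start of the sequence are zero and reads "110th-order" as a 110-sample window; both are assumptions, and `moving_average` is a hypothetical name.

```python
def moving_average(psd: list[float], n_ma: int = 110) -> list[float]:
    """Eq. (2): PSD_ma[n] = (1/N_ma) * sum_{i=0}^{N_ma-1} PSD[n-i], zeros before n=0."""
    out = []
    running = 0.0
    for n, v in enumerate(psd):
        running += v                  # add the newest sample
        if n >= n_ma:
            running -= psd[n - n_ma]  # drop the sample that left the window
        out.append(running / n_ma)
    return out


print(moving_average([0.0, 0.0, 3.0, 0.0, 0.0], 3))  # [0.0, 0.0, 1.0, 1.0, 1.0]
```

The running sum keeps the cost at O(L) regardless of the window length, which matters for a window as long as 110 samples.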

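After the moving-average power spectrum of the sequence is computed, it is normalized by its maximum value so that the results of different algorithms can be compared. The predicted non-coding ratio is then used as the threshold and swept from 1 to 99, while the filter length is also varied, to find the threshold that gives the algorithm its best prediction accuracy. Sensitivity (Sn), specificity (Sp), approximate correlation (AC) and correlation coefficient (CC) are used to evaluate how well the algorithm identifies coding regions[11]. AC serves as the measure of overall prediction accuracy and makes the results comparable with those reported in other studies; Sn and Sp serve as auxiliary measures in the study of the standard sequence.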
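The normalization, thresholding and evaluation described above might look like this. The definitions of Sn, Sp, AC and CC follow Burset and Guigo[11] (Sp = TP/(TP+FP), AC = 2(ACP − 0.5), where ACP averages whichever of the four conditional probabilities are defined). The function names and the input data are hypothetical, and calling a position coding when its PSDN is at least the threshold is an assumption.

```python
import math


def normalize_by_max(values: list[float]) -> list[float]:
    """Scale a smoothed PSD so that its maximum becomes 1 (the PSDN scale)."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else [0.0] * len(values)


def confusion(pred: list[bool], truth: list[bool]) -> tuple[int, int, int, int]:
    """Base-level counts (TP, TN, FP, FN); True means 'coding'."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum(not p and not t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(not p and t for p, t in zip(pred, truth))
    return tp, tn, fp, fn


def base_level_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sn, Sp, AC and CC as defined by Burset and Guigo [11]."""
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tp / (tp + fp) if tp + fp else float("nan")
    # ACP averages only the conditional probabilities whose denominator is non-zero.
    ratios = [num / den for num, den in
              ((tp, tp + fn), (tp, tp + fp), (tn, tn + fp), (tn, tn + fn)) if den]
    ac = (sum(ratios) / len(ratios) - 0.5) * 2
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else float("nan")
    return {"Sn": sn, "Sp": sp, "AC": ac, "CC": cc}


smoothed = [8.0, 6.0, 5.0, 1.0, 4.0, 2.0, 1.0, 0.5]  # hypothetical smoothed PSD
truth = [True] * 4 + [False] * 4                    # hypothetical annotation
pred = [v >= 0.5 for v in normalize_by_max(smoothed)]  # PSDN threshold 0.5
print(base_level_metrics(*confusion(pred, truth)))
# {'Sn': 0.75, 'Sp': 0.75, 'AC': 0.5, 'CC': 0.5}
```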

1.3 Computational cost of the three thresholds

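Prediction with RSN as the threshold requires computing the mean PSD of each sequence and then converting the RSN into a corresponding PSDN value to use as the threshold. Prediction with PNCR as the threshold in effect requires sorting the PSD of the DNA sequence and then converting the specified PNCR into a corresponding PSDN value. Prediction with PSDN as the threshold only requires choosing a PSDN value. PSDN therefore gives the prediction algorithm with the lowest computational cost.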
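To make the cost comparison concrete, the sketch below converts each kind of threshold into a cutoff on the normalized (PSDN) scale. It assumes that the RSN rule calls a position coding when PSD ≥ RSN × mean(PSD) and that PNCR is applied as a percentile of the sorted PSD; both readings and all function names are assumptions, not the exact rules of refs [4] and [5].

```python
def cutoff_from_rsn(psd: list[float], rsn: float) -> float:
    """RSN -> PSDN cutoff, assuming 'coding' means PSD >= RSN * mean(PSD).

    Needs a full O(L) pass for the mean (and one for the max).
    """
    mean = sum(psd) / len(psd)
    return rsn * mean / max(psd)


def cutoff_from_pncr(psd: list[float], pncr: float) -> float:
    """PNCR (in %) -> PSDN cutoff: about pncr% of positions fall below it.

    Needs a sort, O(L log L).
    """
    ranked = sorted(psd)
    k = min(len(ranked) - 1, int(len(ranked) * pncr / 100))
    return ranked[k] / max(psd)


def cutoff_from_psdn(psdn: float) -> float:
    """PSDN is already on the normalized scale: no per-sequence work, O(1)."""
    return psdn


psd = [float(v) for v in range(1, 11)]  # hypothetical PSD values 1.0 to 10.0
print(round(cutoff_from_rsn(psd, 1.0), 4), cutoff_from_pncr(psd, 30), cutoff_from_psdn(0.4))
# 0.55 0.4 0.4
```

With PNCR = 30, seven of the ten positions reach the cutoff, so 30% are called non-coding as intended.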

2 Results and Analysis

2.1 Implementation of the narrow pass-band filter

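The coding-region prediction experiments used APNPBF (all-phase NPBF) narrow pass-band filters with two window lengths, 119 and 599; Figure 1 shows the frequency response of the APNPBF with window length 599. For DNA sequences in the data sets shorter than 600 bp, the filter with window length 119 was used, to reduce the distortion of the prediction results caused by zero-padding or other extension of the input sequence.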
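The window-length rule above reduces to a one-line choice. In the sketch, `choose_window` encodes the rule as stated (119 for sequences shorter than 600 bp, otherwise 599), while `padding_needed` is a hypothetical helper that only illustrates why a long window on a short sequence forces padding; the actual extension scheme of the APNPBF is not described here.

```python
def choose_window(seq_len: int) -> int:
    """APNPBF window length rule of section 2.1 (lengths in bp)."""
    return 119 if seq_len < 600 else 599


def padding_needed(seq_len: int, window: int) -> int:
    """Hypothetical: samples to add if the window is longer than the sequence."""
    return max(0, window - seq_len)


for length in (450, 600, 3000):
    w = choose_window(length)
    print(length, w, padding_needed(length, w), padding_needed(length, 599))
# 450 119 0 149
# 600 599 0 0
# 3000 599 0 0
```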

2.2 Coding-region prediction results

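The three thresholds were applied to sequence AB003730. The prediction curves obtained with RSN, PNCR and PSDN as thresholds are shown in Figures 2a, 2b and 2c, respectively; the ROC curves of the corresponding predictions are shown in Figure 2d, and the upper-left corner of the ROC plot is enlarged in Figure 2e. For RSN, the ROC curve was obtained by letting RSN take 100 values from 0.08 to 8.00 in steps of 0.08 and plotting the resulting (FPR, TPR) pairs in the plane. The ROC curves for PNCR and PSDN were obtained in the same way over the ranges 1 ≤ PNCR ≤ 100 and 0.01 ≤ PSDN ≤ 1.00, both with a step of 0.01. The larger the area under the ROC curve (AUC), the better the algorithm discriminates between coding and non-coding regions.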
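The ROC construction can be sketched as a threshold sweep followed by a trapezoidal AUC. This is an illustration under assumptions: `roc_points` and `auc` are hypothetical names, a position is called coding when its score reaches the threshold, and the curve is anchored at (0, 0) and (1, 1).

```python
def roc_points(scores: list[float], truth: list[bool],
               thresholds: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) for each threshold; a position is called coding if score >= t."""
    pos = sum(truth)
    neg = len(truth) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both coding and non-coding positions")
    points = []
    for t in thresholds:
        tp = sum(s >= t and y for s, y in zip(scores, truth))
        fp = sum(s >= t and not y for s, y in zip(scores, truth))
        points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


grid = [k / 100 for k in range(1, 101)]  # PSDN grid 0.01 to 1.00, step 0.01
print(auc(roc_points([0.9, 0.6, 0.7, 0.1], [True, True, False, False], grid)))  # 0.75
```

Here 0.75 equals the fraction of (coding, non-coding) position pairs that the scores rank correctly, which is the usual interpretation of AUC.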

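Table 1 lists the best prediction results for sequence AB003730 under the three thresholds. The best prediction result is the one with the highest prediction accuracy AC, obtained at a particular threshold value within the range over which the threshold is varied. For the RSN threshold, the recommended range is 1…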
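Selecting the best threshold as defined above is an argmax of AC over the threshold grid. In this sketch `approximate_correlation` and `best_threshold` are hypothetical names, AC follows Burset and Guigo[11], and ties are resolved in favour of the first threshold on the grid, which is an assumption.

```python
def approximate_correlation(pred: list[bool], truth: list[bool]) -> float:
    """AC of Burset and Guigo [11]: 2 * (ACP - 0.5)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum(not p and not t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(not p and t for p, t in zip(pred, truth))
    # Average only the conditional probabilities with non-zero denominators.
    ratios = [n / d for n, d in ((tp, tp + fn), (tp, tp + fp), (tn, tn + fp), (tn, tn + fn)) if d]
    return (sum(ratios) / len(ratios) - 0.5) * 2


def best_threshold(psdn: list[float], truth: list[bool],
                   thresholds: list[float]) -> tuple[float, float]:
    """Return (threshold, AC) with the highest AC; max() keeps the first tie."""
    scored = [(approximate_correlation([v >= t for v in psdn], truth), t) for t in thresholds]
    best_ac, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_ac


grid = [k / 100 for k in range(1, 101)]
print(best_threshold([0.9, 0.8, 0.2, 0.1], [True, True, False, False], grid))  # (0.21, 1.0)
```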

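The three thresholds were also applied to the ALLSEQ and HMR195 DNA sequence sets; the results are given in Table 2. As Table 2 shows, PSDN as the threshold achieved the highest prediction accuracy on both ALLSEQ and HMR195, and RSN as the threshold gave better predictions than PNCR.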

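With RSN as the threshold, regions with strong coding signals are predicted as coding. This emphasizes the difference in PSD between the coding and non-coding regions of a DNA sequence, but it fails to identify coding regions whose coding signal is weak, as well as coding regions in sequences where the coding signal is strong and coding regions make up a large proportion of the total length. With PNCR as the threshold, every sequence is constrained to contain a fixed percentage of coding region, which does not match reality. PSDN as the threshold emphasizes only the strength of the periodicity of the coding regions in a DNA sequence and disregards the effect of non-coding regions and noise; of the three thresholds, it maximizes the probability that coding regions are identified.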
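The contrast between PNCR and PSDN can be demonstrated on two synthetic PSD profiles, one with a strong period-3 signal on 20% of positions and one with it on 80%. The profiles are made up for illustration and are not data from this study; the function names and the percentile reading of PNCR are assumptions.

```python
def predict_with_pncr(psd: list[float], pncr: float) -> list[bool]:
    """Call the top (100 - pncr)% of positions coding, whatever the sequence."""
    ranked = sorted(psd)
    cut = ranked[min(len(ranked) - 1, int(len(ranked) * pncr / 100))]
    return [v >= cut for v in psd]


def predict_with_psdn(psd: list[float], psdn: float) -> list[bool]:
    """Call a position coding when its max-normalized PSD reaches psdn."""
    peak = max(psd)
    return [v / peak >= psdn for v in psd]


# Hypothetical profiles with distinct values: strong signal on 20 or 80 of 100 positions.
mostly_noncoding = [1.0 + i / 1000 for i in range(20)] + [0.05 + i / 1000 for i in range(80)]
mostly_coding = [1.0 + i / 1000 for i in range(80)] + [0.05 + i / 1000 for i in range(20)]

for name, profile in (("20% coding", mostly_noncoding), ("80% coding", mostly_coding)):
    print(name, sum(predict_with_pncr(profile, 50)), sum(predict_with_psdn(profile, 0.5)))
# 20% coding 50 20
# 80% coding 50 80
```

PNCR = 50 calls exactly half of each sequence coding, whereas PSDN = 0.5 follows the actual extent of the strong period-3 signal.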

3 Summary

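This study examined the threshold problem in stand-alone gene prediction algorithms and proposed a new threshold, PSDN. The results show that PSDN as the threshold gives the best prediction accuracy and simplifies the NPBF prediction algorithm. Compared with prediction using RSN or PNCR as the threshold, it markedly improves the predictions for sequences in which coding regions make up a high proportion of the sequence length.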

References:

[1] CHEN B, JI P. Visualization of the protein-coding regions with a self adaptive spectral rotation approach[J]. Nucleic Acids Research, 2011, 39(1): e3.

[2] MEHER J, MEHER P K, DASH G. Improved comb filter based approach for effective prediction of protein coding regions in DNA sequences[J]. Journal of Signal and Information Processing, 2011, 2(2): 88-99.

[3] MA Y T, CHE J, LU X G, et al. A new algorithm for predicting protein coding regions based on the hybrid threshold[A]. The 2012 5th International Conference on Biomedical Engineering and Informatics[C]. Chongqing: IEEE Engineering in Medicine and Biology Society, 2012. 846-849.

[4] TIWARI S, RAMACHANDRAN S, BHATTACHARYA A, et al. Prediction of probable genes by Fourier analysis of genomic sequences[J]. Computer Applications in the Biosciences, 1997, 13(3): 263-270.

[5] MENA-CHALCO J P, CARRER H, ZANA Y, et al. Identification of protein coding regions using the modified Gabor-wavelet transform[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008, 5(2): 198-206.

[6] AMBIKAIRAJAH E, EPPS J, AKHTAR M. Gene and exon prediction using time-domain algorithms[A]. Proceedings of the Eighth International Symposium on Signal Processing and Its Applications[C]. Sydney, 2005. 199-202.

[7] AKHTAR M, EPPS J, AMBIKAIRAJAH E. Signal processing in sequence analysis: advances in eukaryotic gene prediction[J]. IEEE Journal of Selected Topics in Signal Processing, 2008, 2(3): 310-321.

[8] MA Y T, CHE J, GUAN X, et al. Windowed narrow pass-band filter algorithm for protein coding region prediction[J]. Journal of Data Acquisition and Processing, 2013, 28(2): 129-135. (in Chinese)

[9] MA Y T, XUAN X W, CHE J, et al. Gene prediction based on all-phase filtering theory[J]. Journal of Shanghai Jiao Tong University, 2013, 47(7): 1149-1154. (in Chinese)

[10] ROGIC S, MACKWORTH A K, OUELLETTE B F. Evaluation of gene-finding programs on mammalian sequences[J]. Genome Research, 2001, 11(5): 817-832.

[11] BURSET M, GUIGO R. Evaluation of gene structure prediction programs[J]. Genomics, 1996, 34(3): 353-367.
