Pitch-Difference Data Augmentation for Fingering Estimation
Guan Xin, Zhao Haoyue, Li Qiang
(School of Microelectronics, Tianjin University, Tianjin 300072, China)
The performance of a fingering estimation model depends not only on its architecture but also on the quantity and quality of the training data. However, annotating fingerings on musical scores requires annotators with playing experience, and the labeling process is time-consuming and labor-intensive, so existing score-fingering datasets are scarce and grow slowly. To address the poor model performance and parameter overfitting caused by limited samples, two data augmentation methods for pitch-difference fingering data of keyboard instruments are proposed. By analyzing the statistical characteristics of score-fingering data, a hidden-Markov-model-based augmentation method is proposed that exploits the mapping between keyboard instruments and fingerings, and a left-right hand mirror transformation method is proposed that exploits the physiological symmetry of the two hands. After the data generated by the two methods were added to the training set and learned by a bidirectional long short-term memory network, whose processing resembles how humans determine fingerings, the general matching rate improved by 2.24% and the highest matching rate by 3.73%. The results show that data augmentation helps the model learn note-finger features better. The "resampled dataset" generated by the HMM-based method and the "mirror-transformed dataset" generated from hand physiology were then added to training separately, and matching rates were computed on scores whose estimation results contain more than 75% single notes or more than 75% chords. The results show that resampled data reinforce the statistical characteristics of the original dataset, while mirror-transformed data supply note-finger patterns absent from the original dataset, demonstrating the effectiveness of both augmentation methods for keyboard fingering estimation.
fingering; data augmentation; statistical learning; left-right hand transformation
Automatic fingering estimation has become an emerging topic in symbolic music information processing, and research on it is developing rapidly. The collection of fingering datasets, however, is still at an early stage, and many researchers still use small private datasets[1]. Building a fingering dataset requires many professional players to annotate scores one by one, the fingering for a given score is not unique, and the collection phase is therefore long and laborious. Data augmentation based on data-driven models has become an effective way to alleviate overfitting on small datasets and improve recognition accuracy[2], yet no augmentation method has so far been designed for keyboard fingering data.
Data augmentation refers to methods that construct iterative optimization or sampling algorithms by introducing unobserved data or latent variables[3]. Its purpose is to artificially create new training data by applying transformations (e.g., stretching or compressing an audio clip) to the input while preserving the class labels. Such transformations have many potential benefits: data augmentation can encode prior knowledge about the data or task-specific invariances, act as a regularizer that makes the resulting model more robust, and provide resources for data-hungry deep learning models[4].
Fingering stems from the performer's understanding of the score and is constrained by the hand and the keyboard layout. The fingerings in score-fingering data are flexible: unlike part-of-speech tagging in natural language processing (NLP), where each word has exactly one correct label, a passage usually admits several valid fingering combinations even under fingering rules. Researchers usually obtain better fingering estimates by improving the model structure[1,5] or imposing model constraints[6-7], but the shortage of data itself is the root cause of limited parameter learning. Data augmentation can expand the dataset without extra manual labeling cost, saving considerable time and effort.
This paper proposes two data augmentation methods for keyboard fingering data: the hidden-Markov-model-based method applies to all keyboard instruments, while the left-right hand transformation method applies to keyboard instruments played with both hands, such as the piano and the electronic keyboard. The HMM-based method and the hand transformation method improve the general matching rate by 1.75% and 1.39% and the highest matching rate by 2.43% and 2.16%, respectively; using both together improves the two rates by 2.24% and 3.73%.
Data augmentation derives more representations from the original data without substantially adding new data, increasing the quantity and quality of the original data so as to approach the value of a larger dataset[8]. It helps reinforce the intrinsic characteristics of the data, reduces model overfitting, and improves generalization.
Data augmentation falls into two main categories: sample-transformation-based and sample-feature-based. The former expands existing data with transformation rules, such as geometric operations on images[9], synonym replacement in text[10], and noise injection in time series[11]. Sample-feature-based augmentation extracts the features of the data into concrete parameters through a neural network or some other model and uses those parameters to expand the data, e.g., augmentation based on neural style transfer or on generative networks.
Sample-transformation-based augmentation is the basic approach and is already widely used in computer vision and natural language processing[8,12]. It builds on the properties of the data itself. In image data, for example, position, color, and local information have little effect on the label, so data added by geometric operations, color transformations, random erasing, and similar methods[9,13-14] help the system better separate background from object and focus on color gradients and overall contours; for sequence data there are noise injection, window cropping, frequency-domain transformation, and other methods[11]. Sample-transformation-based augmentation takes different forms for different data types and lets the system down-weight or discard redundant information, thereby improving generalization.
Sample-feature-based augmentation relies on the features a system or network extracts from the data. For example, the moment exchange (MoEx) method[15] exploits feature moments (standard deviation and variance) in the network's feature space, which capture the underlying structure of an image, and augments data in feature space by randomly exchanging moments between samples; generative models such as the variational auto-encoding network (VAEN)[16] and the generative adversarial network (GAN)[17] are trained on the samples to fix their internal parameters and are then used for augmentation[18]; neural style transfer[19] generates new samples from a randomly initialized image and a defined cost function. Sample-feature-based augmentation improves model accuracy by reinforcing the effective information. The transformation-based and feature-based families are orthogonal and can be used jointly. These studies suggest exploring augmentation for piano fingering data both from the characteristics of score-fingering data and from the feature space of models suited to such data.
Data augmentation must start from the known data and use the augmentation to highlight the explicit relation between samples and labels. For the fingering estimation task, the augmentation methods here are designed from the information carried by the data combined with the logic humans follow when determining fingerings.
Combining basic knowledge of keyboard playing, the following characteristics of the pitch and fingering data can be exploited.
(1) A note-fingering dataset maps a one-dimensional note sequence to a one-dimensional fingering sequence. For keyboard instruments such as the piano, accordion, organ, and electronic keyboard, fingerings usually correspond one-to-one with the notes on the score, so keyboard fingering estimation can be treated as a sequence-to-sequence mapping.
(2) Intervals implicitly encode the relative positions of keys. Keys are laid out from low to high pitch, so each note on the score corresponds to a key, and the pitch determines which key is played. An interval is the span between two pitches[20]; in other words, the position of the next key relative to the previous one is determined by the interval in the score.
(3) For keyboard instruments played with both hands, such as the piano or the electronic keyboard, if the left and right hands play the same physical distance between keys in the same ascending/descending direction, the fingerings are opposite; if they play the same physical distance in opposite directions, the fingerings are identical. The two hands are physiologically symmetric: playing the same piece with each hand certainly yields different fingerings, but if the low-to-high order of the keys is reversed for the left hand, the key layouts seen by the two hands become symmetric and the fingerings coincide. Research shows that for keyboard instruments the absolute key positions matter little when playing a piece; it is the relative key positions that determine the fingering[21]. Combining these two points, even for different pieces, as long as the relative key positions are the same and the keyboards are mirrored for the two hands, the left- and right-hand fingerings are identical. On a real, fixed keyboard, when the span (the relative physical distance) is the same even though the absolute positions may differ, as shown in Fig. 1, a large ascending leap played by the left hand uses finger 2 to finger 1, while the same passage played by the right hand uses finger 1 to finger 2: the fingering is exchanged with the hand.
(4) For chords, the left- and right-hand fingerings are similar. As shown in Fig. 2, a chord whose keys are each separated by one white key is played with fingers 1, 3, and 5 by both hands.
Performers choose fingerings mainly for comfort and fluency, under the natural constraints of hand ergonomics and key positions. The key positions are the basis for determining fingering; at the same time, the positional relations between preceding and following keys and the states of the preceding and following fingerings also restrict the current choice.
Fig. 1 Ascending passage of the same interval played by the left and right hands
Fig. 2 Triad played by the left and right hands
Fingering is flexible: the fingering for a score is not unique, and the fingering for the same passage may vary with the tempo and rhythm of the piece. In the Hanon fingering exercises, an entire exercise may even be played with a single finger in order to train a specific technique[22]. Performers plan fingerings globally according to their own hand conditions and the musical expressiveness of the piece, designing coherent and fluent fingerings.
There are two main ways to process the data: one converts pitches into their corresponding MIDI numbers as the input for fingering estimation[1,5]; the other uses the relative positions between keys as the input[6-7]. As analyzed in Section 2.1, relative position information is more useful for fingering estimation than raw pitch, so this paper represents pitch following reference [7]. That work calls the processed data pitch-difference (PD) data; this representation contains the physical interval between adjacent pitches together with timing information, which makes it easier for the model to explore note-finger relations. In this scheme, the left- and right-hand data are processed and trained separately, with the representation defined as follows.
In the definition, $p_i$ is the MIDI number corresponding to the note's pitch, $i$ is the index of the note in the sequence, and $n$ is the number of notes at the current instant, i.e., the number of notes in the chord; when the current note is a single note, the corresponding quantity is defined to be 0.
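As an illustration, here is a minimal pure-Python sketch of the pitch-difference idea, assuming the simplest monophonic form (the difference between consecutive MIDI numbers, with the first note set to 0); the full representation in reference [7] additionally encodes chords and timing, and the function name is illustrative:

```python
def pitch_differences(midi_numbers):
    """Convert a sequence of MIDI pitch numbers into relative pitch differences.

    The first note has no predecessor, so its difference is defined as 0.
    Positive values are ascending intervals, negative values descending.
    """
    pd = [0]
    for prev, cur in zip(midi_numbers, midi_numbers[1:]):
        pd.append(cur - prev)
    return pd

# C4-E4-G4-C5 ascending arpeggio: MIDI numbers 60, 64, 67, 72
print(pitch_differences([60, 64, 67, 72]))  # → [0, 4, 3, 5]
```

The absolute MIDI numbers disappear from the representation, which matches the observation above that relative key positions, not absolute ones, determine the fingering.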
Although the augmentation methods of natural language processing cannot be applied directly to score-fingering data, some of their ideas remain instructive. For example, synonym replacement is common in part-of-speech tagging and machine translation; it augments data while preserving the semantic equivalence between two words. By analogy, score-fingering data can be augmented by preserving the equivalence of the mapping between score data and fingerings.
In the fingering annotation task, any fingering is feasible unless constrained by the notes before or after it; fingering is essentially the search for an optimal sequence of smooth state transitions. Piano performance can therefore be interpreted as generating a sequence of played notes from a sequence of finger-position states[5]. On this basis, a data augmentation method based on the hidden Markov model (HMM)[23] is proposed; the HMM has previously been applied directly to fingering estimation[5-6] with good results.
Starting from an initial fingering, the next fingering is sampled from the fingering transition probabilities, and the corresponding pitch difference is sampled from the output probabilities conditioned on that fingering; repeating these two steps generates an augmented sequence of any specified length.
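The sampling loop above can be sketched as follows; the state and observation definitions (fingerings as hidden states, pitch differences as outputs) are a simplification of the paper's setup, and all function names are illustrative:

```python
import random
from collections import Counter, defaultdict

def fit_counts(sequences):
    """Estimate HMM counts from training data.

    sequences: list of examples, each a list of (pitch_difference, finger) pairs.
    Returns initial-fingering counts, fingering-transition counts, and
    per-fingering pitch-difference emission counts.
    """
    init, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for seq in sequences:
        init[seq[0][1]] += 1
        emit[seq[0][1]][seq[0][0]] += 1
        for (_, f_prev), (d, f) in zip(seq, seq[1:]):
            trans[f_prev][f] += 1
            emit[f][d] += 1
    return init, trans, emit

def draw(counter, rng):
    """Sample one key of a Counter with probability proportional to its count."""
    items, weights = zip(*sorted(counter.items()))
    return rng.choices(items, weights=weights, k=1)[0]

def resample(init, trans, emit, length, rng):
    """Generate an augmented (pitch_difference, finger) sequence of given length."""
    f = draw(init, rng)
    seq = [(draw(emit[f], rng), f)]
    while len(seq) < length:
        # Fall back to the initial distribution if this fingering was never
        # observed with a successor in the training data.
        f = draw(trans[f], rng) if trans[f] else draw(init, rng)
        seq.append((draw(emit[f], rng), f))
    return seq

# Toy training data: one short right-hand passage.
data = [[(0, 1), (2, 2), (2, 3), (-1, 1), (2, 2)]]
init, trans, emit = fit_counts(data)
augmented = resample(init, trans, emit, 8, random.Random(0))
print(augmented)
```

Because the generated pairs are drawn from the empirical transition and emission distributions, the resampled sequences reproduce the statistical character of the original dataset, which is exactly what this method is intended to reinforce.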
From Section 2.1, the fingerings of the two hands are similar to some extent, and the finger-crossing rules are the same for both hands[6,21]. This suggests that, under certain conditions, left-hand (right-hand) fingering data can be used to train the right (left) hand after the ascending/descending direction is reversed. Because the data processing distinguishes single notes from chords, this distinction is preserved in the transformation: in essence, the sign of each pitch difference is negated while the fingering labels are kept unchanged.
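A minimal sketch of the mirror transformation on pitch-difference data, under the assumption stated above that reversing the direction of every interval while keeping the finger numbers converts one hand's data into usable data for the other hand; the function name is illustrative, and the more detailed chord handling of the paper is omitted:

```python
def mirror_hand(pd_fingering):
    """Mirror one hand's data for the other hand: negate each pitch
    difference (reversing the ascending/descending direction) and keep
    the finger numbers unchanged."""
    return [(-d, f) for d, f in pd_fingering]

# An ascending right-hand passage becomes a descending passage usable
# as left-hand training data with the same fingering.
right = [(0, 1), (2, 2), (2, 3), (1, 1)]
print(mirror_hand(right))  # → [(0, 1), (-2, 2), (-2, 3), (-1, 1)]
```

The transformation is an involution: applying it twice returns the original data, so left-to-right and right-to-left conversion use the same function.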
The piano fingering dataset PIG[1] is used as the experimental dataset to verify the effectiveness of the two proposed augmentation methods. PIG contains 150 pieces, 30 of which carry four or more different fingering annotations, for a total of 309 note-fingering samples; it is currently the only public score-fingering dataset for keyboard instruments. Statistics of the single notes and the various chord types in the pieces are listed in Table 1.
Here, consecutive chords are the notes other than single notes and chords preceded and followed by single notes, and consecutive single notes are the notes other than chords and single notes preceded and followed by chords. Table 1 shows that the score components played by the two hands differ considerably in single notes, harmonic intervals, triads, consecutive chords, and consecutive single notes. This agrees with practical experience in piano playing: the right hand usually carries the melody and most fast passages, so single notes dominate; the left hand accompanies, and the chords that add color are usually played by the left hand.
Tab. 1 Pitch composition distribution of the music database
The 150 pitch-difference-fingering examples corresponding to 30 of the PIG pieces are used as the test set, and the remaining 120 pieces, expanded to 159 samples, are used as the training set. Considering the overall data volume, the experiments use a first-order HMM to generate 50 examples of length 150-300, called the "resampled dataset". The augmented data obtained by the left-right hand transformation are called the "mirror-transformed dataset".
The evaluation criterion is agreement with the ground truth, i.e., the proportion of positions in the whole score where the model output matches the fingering label, called the matching rate:
$M = \frac{1}{\mathrm{len}} \sum_{i=1}^{\mathrm{len}} \delta(f_i, g_i)$
where $g_i$ is the manual annotation, $f_i$ is the algorithm's annotation, len is the total length of the score, and $\delta(\cdot,\cdot)$ equals 1 when its two arguments are equal and 0 otherwise.
Because one score may correspond to several fingerings, the test scores become a larger set of test samples after being paired with their different ground-truth fingering sequences. Following the evaluation in reference [1], two metrics are used: the general matching rate $M_{\mathrm{gen}}$, the average of the matching rates between the estimate and every ground-truth sequence, and the highest matching rate $M_{\mathrm{high}}$, the average over pieces of the matching rate with the ground-truth sequence most similar to the estimate.
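The two metrics can be sketched as follows (a per-piece view; the averaging over the whole corpus is omitted, and the function names are illustrative):

```python
def matching_rate(estimate, truth):
    """Fraction of positions where the estimated finger equals the label."""
    assert len(estimate) == len(truth)
    return sum(e == t for e, t in zip(estimate, truth)) / len(truth)

def general_matching_rate(estimate, truths):
    """Average agreement over all ground-truth fingering sequences of a piece."""
    return sum(matching_rate(estimate, t) for t in truths) / len(truths)

def highest_matching_rate(estimate, truths):
    """Agreement with the most similar ground-truth fingering sequence."""
    return max(matching_rate(estimate, t) for t in truths)

est = [1, 2, 3, 1]
truths = [[1, 2, 3, 2], [1, 2, 1, 2]]
print(general_matching_rate(est, truths))  # → 0.625
print(highest_matching_rate(est, truths))  # → 0.75
```

The highest matching rate is always at least the general matching rate, since the best-matching ground truth bounds the average from above.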
Pitch-difference data form a one-dimensional sequence, so a recurrent neural network (RNN)[24], which takes temporal order into account, suits the fingering annotation task better than the feed-forward and convolutional networks more commonly used for classification. However, the vanilla RNN suffers from exploding and vanishing gradients during training and cannot capture long-range dependencies, so the long short-term memory (LSTM)[25] network, developed from the RNN, is used as the basis of the verification experiments.
The main functions of the LSTM are realized by its three internal gates, as shown in Fig. 3. For the fingering estimation problem they can be made concrete as follows: the forget gate controls the influence of previous pitch differences and previously estimated fingerings on the current estimate; the input gate controls the influence of the current pitch difference; and the output gate controls whether the long-term memory of pitch differences, including the current one, and the earlier long-term memory of fingerings affect the current estimate. The forward pass of an LSTM cell is thus close to how humans determine fingerings. As noted in Section 2.2, subsequent pitch and fingering information also constrains the current fingering, so a bidirectional LSTM (Bi-LSTM) network is adopted as the experimental model.
Fig. 3 Structure of an LSTM unit
Training uses the Adam adaptive learning rate optimizer and a three-layer LSTM network. The initial learning rate is 0.005 and is multiplied by 0.8 every 10 epochs; continually lowering the learning rate helps find the minimum of the loss.
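The learning-rate schedule described above (initial rate 0.005, multiplied by 0.8 every 10 epochs) can be written independently of any framework; this is a sketch of the schedule only, not of the full training code:

```python
def learning_rate(epoch, lr0=0.005, decay=0.8, step=10):
    """Step-decay schedule: multiply the initial rate by `decay`
    once per completed block of `step` epochs."""
    return lr0 * decay ** (epoch // step)

print(round(learning_rate(0), 6))   # → 0.005
print(round(learning_rate(10), 6))  # → 0.004
print(round(learning_rate(25), 6))  # → 0.0032 (two decays applied)
```

In a framework such as PyTorch the same behavior is typically obtained with a built-in step scheduler; the closed-form version above makes the decay explicit.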
Because the left-right hand transformation generates 309 samples, about twice the number of real training samples, the real training samples are replicated to three times their original number whenever the mirror-transformed dataset is included in training, preventing the augmented data from dominating the training set and biasing the learned note-finger relations.
To demonstrate the effect of the two augmentation methods, four groups of comparison experiments were designed, all using the Bi-LSTM model; the results are listed in Table 2.
Tab. 2 Comparison results of augmentation methods
Fig. 4 Results on scores with more than 75% single notes
Fig. 5 Results on scores with more than 75% chords
On scores with more than 75% single notes, adding the resampled dataset outperforms adding the mirror-transformed dataset for the right hand, while the opposite holds for the left hand. On scores with more than 75% chords, for the right hand the mirror-transformed data outperform the resampled data and even exceed the result of training on the full dataset, whereas for the left hand the resampled data outperform the mirror-transformed data.
These results further illustrate the strengths of the two augmentation methods. Given the distribution of the samples, the resampled dataset reinforces the characteristics of the original dataset and lets its strengths play out, performing well on the single-note-heavy right-hand data and the chord-heavy left-hand data; the mirror-transformed dataset compensates for the limited fingering diversity of the original dataset, performing better on right-hand data with more than 75% chords and left-hand data with more than 75% single notes. In the statistics for chord-heavy right-hand data, when both augmented datasets are added, the influence of the resampled data on the parameters reduces the matching rate somewhat, which also confirms the large difference between left- and right-hand data.
This paper proposed two data augmentation methods for keyboard fingering estimation: one based on the statistical distribution of samples and labels in the dataset, and one based on the mirror symmetry of left- and right-hand fingerings. Their effectiveness was verified on the piano fingering dataset PIG. Given the similarity among keyboard instruments, the two methods also apply to fingering estimation for any keyboard instrument played with both hands. Resampled data reinforce the statistical characteristics of the dataset itself, and mirror-transformed data supply note-finger patterns absent from the original dataset. The experimental results also show that data augmentation helps the model integrate domain knowledge naturally, making pitch differences easier to map to fingering data.
The HMM-based method augments data by converting the statistical distribution of the data into model parameters with the Markov and independent-output properties, an idea that can also be transferred to fingering annotation for non-keyboard instruments. The matrix of fingering transition frequencies is flexible: with sufficient data it can capture longer-range fingering relations, in which case the augmented data conform better to hand ergonomics, and fragments of them may even appear in real scores.
[1] Nakamura E,Saito Y,Yoshii K. Statistical learning and estimation of piano fingering[J]. Information Sciences,2020,517:68-85.
[2] Chen Wenbing,Guan Zhengxiong,Chen Yunjie. Data augmentation method based on conditional generative adversarial networks[J]. Journal of Computer Applications,2018,38(11):3305-3311(in Chinese).
[3] Van Dyk D A,Meng X L. The art of data augmentation[J]. Journal of Computational & Graphical Statistics,2001,10(1):1-50.
[4] Dao T,Gu A,Ratner A J,et al. A kernel theory of modern data augmentation[J]. Proc Mach Learn Res,2019,97:1528-1537.
[5] Yonebayashi Y,Kameoka H,Sagayama S. Automatic piano fingering decision based on hidden Markov models with latent variables in consideration of natural hand motions[J]. IPSJ SIG Notes,2007,2007(81):179-184.
[6] Li Qiang,Li Chenxi,Guan Xin. Automatic fingering annotation for piano score via judgement-HMM and improved Viterbi[J]. Journal of Tianjin University(Science and Technology),2020,53(8):814-824(in Chinese).
[7] Guan Xin,Zhao Haoyue,Li Qiang. Estimation of playable piano fingering by pitch-difference fingering match model[EB/OL]. https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-022-00237-8,2022-04-11.
[8] Ge Yizhou,Xu Xiang,Yang Suorong,et al. Survey of data augmentation methods for sequence data[J]. Journal of Frontiers of Computer Science and Technology,2021,15(7):1207-1219(in Chinese).
[9] Chatfield K,Simonyan K,Vedaldi A,et al. Return of the devil in the details:Delving deep into convolutional nets[EB/OL].https://arxiv.org/abs/1405.3531v4,2014-10-05.
[10] Wei J,Zou K. EDA:Easy data augmentation techniques for boosting performance on text classification tasks[EB/OL]. https://arxiv.org/abs/1901.11196,2019-08-05.
[11] Wen Q,Sun L,Song X,et al. Time series data augmentation for deep learning:A survey[EB/OL]. https://arxiv.org/abs/2002.12478,2021-09-18.
[12] Simard P,Steinkraus D,Platt J C. Best practices for convolutional neural networks applied to visual document analysis[C]// Proceedings Seventh International Conference on Document Analysis and Recognition. Edinburgh,UK,2003:958-963.
[13] Lecun Y,Bottou L. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE,1998,86(11):2278-2324.
[14] Krizhevsky A,Sutskever I,Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM,2017,60(6):84-90.
[15] Li B,Wu F,Lim S N,et al. On feature normalization and data augmentation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Nashville,USA,2021:12378-12387.
[16] Zhai Zhengli,Liang Zhenming,Zhou Wei,et al. Overview of variational autoencoder models[J]. Computer Engineering and Applications,2019,55(3):1-9(in Chinese).
[17] Goodfellow I,Pouget-Abadie J,Mirza M,et al. Generative adversarial networks[J]. Advances in Neural Information Processing Systems,2014,3:2672-2680.
[18] Perez L,Wang J. The effectiveness of data augmentation in image classification using deep learning[EB/OL]. https://arxiv.org/abs/1712.04621,2019-10-13.
[19] Gatys L A,Ecker A S,Bethge M. Image style transfer using convolutional neural networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas,USA,2016:2414-2423.
[20] Hasegawa R. Tone representation and just intervals in contemporary music[J]. Contemporary Music Review,2006,25(3):263-281.
[21] Parncutt R,Sloboda J A,Clarke E F,et al. An ergonomic model of keyboard fingering for melodic fragments[J]. Music Perception,1997,14(4):341-382.
[22] Zhu Bin. On the finger-exercising issue by taking Hanon as an example[J]. Music Exploration,2009,25(4):90-92(in Chinese).
[23] Rabiner L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE,1989,77(2):257-286.
[24] Graves A,Mohamed A,Hinton G. Speech recognition with deep recurrent neural networks[C]//2013 IEEE International Conference on Acoustics,Speech and Signal Processing. Vancouver,Canada,2013:6645-6649.
[25] Hochreiter S,Schmidhuber J. Long short-term memory[J]. Neural Computation,1997,9(8):1735-1780.
Pitch-Difference Data Augmentation for Fingering Estimation
Guan Xin,Zhao Haoyue,Li Qiang
(School of Microelectronics,Tianjin University,Tianjin 300072,China)
Fingering estimation performance for keyboard instruments is affected not only by the structure of the model but also by the quantity and quality of the data. However, fingering labeling requires annotators with playing experience, and the labeling process is time-consuming and labor-intensive, which leads to the scarcity and slow growth of score-fingering datasets. Two data augmentation methods are proposed to extend the score-fingering dataset for keyboard instruments, addressing the poor model performance and parameter overfitting caused by limited datasets. By analyzing the statistical features of score-fingering data, a data augmentation method based on the hidden Markov model is proposed that exploits the mapping relationship between keyboard instruments and fingerings, and a hand mirror transformation method is proposed that exploits the physiological characteristics of the hands. A bidirectional long short-term memory network, whose processing is similar to the manual process of fingering labeling, is used for experimental verification. After the data generated by the proposed methods were added to the training set, the general and highest matching rates increased by 2.24% and 3.73%, respectively. The results show that the data augmentation strategy helps the model learn note-finger features well. The "resampling data" generated by the data augmentation method based on the hidden Markov model and the "hand mirror transformation data" based on the physiological characteristics of the hand were then added to training separately, and the matching rates of scores with more than 75% monophonic notes and more than 75% polyphonic notes in the estimation results were calculated. The results show that resampling data can enhance the statistical characteristics of the original dataset, and hand mirror transformation data can provide fingering knowledge that was not in the original dataset, demonstrating the effectiveness of the two data augmentation methods presented in this study for the task of fingering estimation for keyboard instruments.
fingering;data augmentation;statistical learning;physical transformation of hands
10.11784/tdxbz202112047
TP391
A
0493-2137(2023)02-0200-07
Received: 2021-12-29; revised: 2022-03-21.
Guan Xin(1977— ),female,PhD,associate professor,guanxin@tju.edu.cn.
Corresponding author: Li Qiang,liqiang@tju.edu.cn.
Supported by the National Natural Science Foundation of China(No.61471263),the Natural Science Foundation of Tianjin,China (No.16JCZDJC31100).
(Responsible editor: Sun Lihua)