Robust Phonetic Segmentation Using Spectral Transition measure for Non-Standard Recording Environments

Bhavik Vachhani, Chitralekha Bhat, Sunil Kopparapu

Phone level localization of mis-articulation is a key requirement for an automatic articulation error assessment system. A robust phone segmentation technique is essential to aid in real-time assessment of phone level mis-articulations of speech, wherein the audio is recorded on mobile phones or tablets. This is a non-standard recording set-up with little control over the quality of recording. We propose a novel post processing technique to aid Spectral Transition Measure(STM)-based phone segmentation under noisy conditions such as environment noise and clipping, commonly present during a mobile phone recording. A comparison of the performance of our approach and phone segmentation using traditional MFCC and PLPCC speech features for Gaussian noise and clipping is shown. The proposed approach was validated on TIMIT and Hindi speech corpus and was used to compute phone boundaries for a set of speech, recorded simultaneously on three devices - a laptop, a stationarily placed tablet and a handheld mobile phone, to simulate different audio qualities in a real-time non-standard recording environment. F-ratio was the metric used to compute the accuracy in phone boundary marking. Experimental results show an improvement of 7% for TIMIT and 10% for Hindi data over the baseline approach. Similar results were seen for the set of three of recordings collected in-house.

Knowledge Graph



Sign up or login to leave a comment