Institute of Information Science, Academia Sinica 12 July, IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.


1 Automatic Detection-based Phone Recognition on TIMIT
Hung-Shin Lee (李鴻欣), Institute of Information Science, Academia Sinica
12 July 2011 @ IIS, Academia Sinica
Based on Chen and Wang in ISCSLP'08 and Interspeech'09

2 Detection-Based ASR
Human SR: Knowledge Detection → Integration → Knowledge (Higher Level)
DB ASR: Detectors → Integrator → Results
Detected knowledge: phonological attr., prosodic attr., acoustic attr., …
Integrators: HMM, CRF, …
Results / higher-level knowledge: phone, syllable, word, sentence, semantic info, …

3 Phonological Systems
SPE (Sound Pattern of English)
–Literature: (N. Chomsky & M. Halle, 1968)
–Feature types: production-based, binary
–Feature number: 13
–Examples: anterior, nasal, round
MV (Multi-valued Feature)
–Literature: (S. King, 2000)?
–Feature types: production-based, 2-10 values
–Feature number: 6
–Examples: centrality, front-back, manner, phonation, place, roundness
GP (Government Phonology)
–Literature: (J. Harris, 1994)
–Feature types: sound structure primes, binary
–Feature number: 11

4 Phonological Feature Detection (1)
Input layer: 13 MFCCs × 9 frames (i-4 … i … i+4)
Detectors: MLP (recurrent, time-delay) with a hidden layer
Output: posterior probabilities, quantized into binary vectors
–SPE_14: 0101...01
–GP_11: 011..01
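The input and output stages of the detector on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the edge padding (repeating the first and last frame) and the 0.5 quantization threshold are assumptions, and `stack_context` and `quantize` are hypothetical helper names.

```python
from typing import List

Frame = List[float]

def stack_context(frames: List[Frame], i: int, half_width: int = 4) -> Frame:
    """Concatenate frames i-half_width .. i+half_width into one MLP input vector."""
    last = len(frames) - 1
    stacked: Frame = []
    for t in range(i - half_width, i + half_width + 1):
        t_clamped = min(max(t, 0), last)  # assumption: repeat edge frames at utterance boundaries
        stacked.extend(frames[t_clamped])
    return stacked

def quantize(posteriors: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-feature posteriors into a binary vector (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in posteriors]

# 20 dummy frames of 13 MFCCs each
frames = [[float(t)] * 13 for t in range(20)]
mlp_input = stack_context(frames, i=10)
print(len(mlp_input))                   # 9 frames * 13 MFCCs = 117
print(quantize([0.9, 0.2, 0.5, 0.49]))  # [1, 0, 1, 0]
```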

5 Phonological Feature Detection (2)
Input: 13 MFCCs × 9 frames (i-4 … i … i+4), time-delay
Detectors: one MLP per MV feature (Centrality, Front-Back, Roundness, …), 6 MV features in total
Output: per-feature quantized vectors (e.g. 01000100, 100100, 010010), concatenated into MV_29 (0100100...010)
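One plausible reading of the quantize-and-concatenate step for multi-valued features is a one-hot code of the winning value per feature. The sketch below assumes that reading; the group sizes in the example are hypothetical and do not reproduce the actual 29-dimensional MV layout.

```python
from typing import List

def one_hot_argmax(posteriors: List[float]) -> List[int]:
    """One-hot vector marking the value with the highest posterior."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return [1 if k == best else 0 for k in range(len(posteriors))]

def mv_vector(per_feature_posteriors: List[List[float]]) -> List[int]:
    """Concatenate the one-hot codes of all multi-valued feature detectors."""
    out: List[int] = []
    for posteriors in per_feature_posteriors:
        out.extend(one_hot_argmax(posteriors))
    return out

# Hypothetical outputs of three detectors with 4, 3 and 3 possible values
example = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
print(mv_vector(example))  # [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
```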

6 Conditional Random Field (CRF) Integrator
General chain CRF
Input X (phonological features): …, x_{i-1}, x_i, x_{i+1}, …
Output Y (phone): …, y_{i-1}, y_i, …
Feature functions: state feature function, transition feature function
λ_j, μ_k: feature function weight parameters
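The formula on this slide did not survive in the transcript. For orientation, a common textbook form of the linear-chain CRF is given here. The symbols s (state feature function), t (transition feature function) and Z(X) (normalizer) are introduced for illustration, and pairing λ_j with the transition functions and μ_k with the state functions follows a widespread tutorial convention; it is an assumption, not necessarily the slide's own notation.

```latex
P(Y \mid X) = \frac{1}{Z(X)} \exp\!\Big(
    \sum_{i} \sum_{j} \lambda_j \, t_j(y_{i-1}, y_i, X, i)
  + \sum_{i} \sum_{k} \mu_k \, s_k(y_i, X, i)
\Big),
\qquad
Z(X) = \sum_{Y'} \exp\!\Big(
    \sum_{i} \sum_{j} \lambda_j \, t_j(y'_{i-1}, y'_i, X, i)
  + \sum_{i} \sum_{k} \mu_k \, s_k(y'_i, X, i)
\Big)
```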

7 CRF Integrator – Training Issues
Required labels for CRF training
–Phone: y
–Phonological features: x
Detected-data trained CRF (DT CRF)
–Speech → MLP detectors → phonological features (with errors)
–Trained on these features together with the phone labels
Oracle-data trained CRF (OT CRF)
–Phone labels → mapping phones → phonological features → phonological features
–Trained on these features together with the phone labels
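The two ways of preparing training data can be sketched as below. Rows are written in the whitespace-separated column layout that CRF++ reads, with the label in the last column. The mapping table `PHONE_TO_FEATS` and its values are hypothetical and only stand in for a real phone-to-feature table.

```python
from typing import Dict, List

# Hypothetical phone -> binary feature table (values are illustrative only)
PHONE_TO_FEATS: Dict[str, List[int]] = {
    "n":  [1, 0, 1],
    "eh": [0, 1, 0],
    "s":  [0, 0, 1],
}

def make_row(feats: List[int], phone: str) -> str:
    """One training row: feature columns first, phone label last."""
    return " ".join(str(b) for b in feats) + " " + phone

def oracle_rows(phone_frames: List[str]) -> List[str]:
    """Oracle data: features are looked up from the phone labels, so they contain no errors."""
    return [make_row(PHONE_TO_FEATS[p], p) for p in phone_frames]

def detected_rows(detected_feats: List[List[int]], phone_frames: List[str]) -> List[str]:
    """Detected data: features come from the detectors and may disagree with the table."""
    if len(detected_feats) != len(phone_frames):
        raise ValueError("need one feature vector per labelled frame")
    return [make_row(f, p) for f, p in zip(detected_feats, phone_frames)]

labels = ["n", "eh", "eh", "s"]
print(oracle_rows(labels))
print(detected_rows([[1, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 1]], labels))
```

In the second call the third frame carries a detection error (all zeros instead of the table entry for "eh"), which is the kind of mismatch a DT CRF sees in training and an OT CRF does not.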

8 Experiments
Corpus: TIMIT
–No SA1, SA2
–Training set (3296 utts), Dev set (400 utts)
–Test set (1344 utts)
Phone set: TIMIT61
–Evaluation: CMU/MIT 39
Baseline
–CI-HMM
Toolkits
–Nico Toolkit (for MLP), CRF++ (for CRF)
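Scoring on the 39-phone set means folding the 61 TIMIT labels before comparing reference and hypothesis. The sketch below uses the folding commonly attributed to Lee and Hon; the slides do not list the table, so treating it as the one used here is an assumption.

```python
from typing import List

# Commonly used 61 -> 39 folding; labels not listed map to themselves
FOLD = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil", "dcl": "sil",
    "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}
DROPPED = {"q"}  # the glottal stop is removed before scoring

def fold_to_39(phones: List[str]) -> List[str]:
    """Map a TIMIT61 label sequence to the 39-phone evaluation set."""
    return [FOLD.get(p, p) for p in phones if p not in DROPPED]

print(fold_to_39(["h#", "n", "eh", "ow", "kcl", "k", "q", "ix"]))
# ['sil', 'n', 'eh', 'ow', 'sil', 'k', 'ih']
```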

9 Results (1)
Model: OT CRF, Test: OD Features
Features        Phone Corr. %   Phone Acc. %
SPE14           93.28           93.20
GP11            98.39           98.36
MV29            88.75           88.56
Model: OT/DT CRF, Test: DD Features
System          Phone Corr. %   Phone Acc. %
HMM-baseline    69.02           63.45
OT CRF, SPE14   66.19           29.68
OT CRF, GP11    69.03           31.38
OT CRF, MV29    59.24           30.33
DT CRF, SPE14   56.56           55.27
DT CRF, GP11    55.74           54.53
DT CRF, MV29    51.84           50.68
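The gap between the two columns is easier to read with the usual definitions in mind: Corr = H / N and Acc = (H − I) / N, where H is the number of correctly recognized phones, I the number of insertions, and N the number of reference phones. The slides do not define the metrics, so these standard definitions are an assumption. The sketch below computes both from a unit-cost edit-distance alignment; scoring tools may weight the error types differently.

```python
from typing import List, Tuple

def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) of a minimum-cost alignment."""
    # Each cell holds (cost, hits, subs, dels, ins)
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            c, h, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, n)        # match
            else:
                diag = (c + 1, h, s + 1, d, n)    # substitution
            c, h, s, d, n = prev[j]
            dele = (c + 1, h, s, d + 1, n)        # deletion
            c, h, s, d, n = cur[j - 1]
            ins = (c + 1, h, s, d, n + 1)         # insertion
            cur.append(min(diag, dele, ins, key=lambda cell: cell[0]))
        prev = cur
    _, h, s, d, n = prev[-1]
    return h, s, d, n

def corr_acc(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Percent correct and percent accuracy for one reference/hypothesis pair."""
    h, _, _, n = align_counts(ref, hyp)
    return 100.0 * h / len(ref), 100.0 * (h - n) / len(ref)

ref = ["n", "eh", "s", "t"]
hyp = ["n", "eh", "eh", "s", "t", "ix"]  # all reference phones found, two extra phones
print(corr_acc(ref, hyp))  # (100.0, 50.0)
```

Under these definitions, a system with high Corr but much lower Acc, as in the OT CRF rows on detected features, is one that produces many insertions.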

10 Results (2)
System Fusion
Methods                  # Systems   Phone Corr. (%)   Phone Acc. (%)
HMM baseline             1           69.02             63.45
OT: SPE+GP+MV            3           61.97             60.65
DT: SPE+GP+MV            3           52.90             52.06
OT+DT: SPE+GP+MV         6           60.81             59.20
OT: SPE+GP+MV +HMM       4           65.53             64.31
DT: SPE+GP+MV +HMM       4           59.57             58.64
OT+DT: SPE+GP+MV +HMM    7           64.22             62.59

11 System Fusion with CRF
Input X (phone sequence): …, x_{i-1}, x_i, x_{i+1}, …, taken from the SPE Sys., MV Sys., GP Sys., and HMM Sys.
Output Y (combined results, phone): …, y_{i-1}, y_i, …

12 Two Types of AFDT Imperfection
[Figure: phone labels (h# n eh ow kcl k w eh ae eh s tcl t ix n) shown against AF(A) and AF(A'), illustrating AF asynchrony and AFDT errors]

13 CRF Training (1)
Oracle Data Training: phone y → mapping table (phone → AFs) → AFs x_t
Detected Data Training: AFDT → AFs x_t (with detection errors), paired with phone y

14 CRF Training (2)
Aligned Data Training: AFDT → AF sequence → AFs x_t, paired with phone y

15 Results (3)
Condition     System   Phone Corr. (%)   Phone Acc. (%)
Upper Bound   OT CRF   98.31             98.28
Upper Bound   AT CRF   71.49             70.31
Real Case     OT CRF   70.55             34.38
Real Case     DT CRF   57.30             56.14
Real Case     AT CRF   64.87             62.32
Introducing AF asynchrony causes a 27.97 % drop in accuracy
Detection errors cause a further 7.99 % drop in accuracy
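The two quoted drops follow directly from the accuracy figures on this slide, as the short check below shows. The variable names are only labels for the table entries.

```python
# Phone accuracy (%) taken from the Results (3) slide
upper_bound_ot_acc = 98.28  # OT CRF, upper bound
upper_bound_at_acc = 70.31  # AT CRF, upper bound
real_case_at_acc = 62.32    # AT CRF, real case

asynchrony_drop = round(upper_bound_ot_acc - upper_bound_at_acc, 2)
detection_drop = round(upper_bound_at_acc - real_case_at_acc, 2)

print(asynchrony_drop)  # 27.97
print(detection_drop)   # 7.99
```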

16 AF Asynchrony Compensation
AF asynchrony is caused by context variation
We can reduce AF asynchrony by letting our systems learn context variation directly
–Long-term information
[Figure: 23-dim Mel features over a 310 ms span are split into a left context and a right context; each context passes through windows + DCTs and an MLP; labelled dimensions: 144 Dim, 72 Dim]
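The windows + DCTs step for one Mel band can be sketched as follows. Several details are assumptions and not taken from the slide: a 10 ms frame shift (so 310 ms is 31 frames), left and right halves that share the centre frame, a Hamming window, and the number of retained DCT coefficients (`n_coef`, a hypothetical parameter).

```python
import math
from typing import List, Tuple

def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n (window type is an assumption)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def dct2(x: List[float], n_coef: int) -> List[float]:
    """First n_coef coefficients of an unnormalized DCT-II."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n))
        for k in range(n_coef)
    ]

def split_context_features(trajectory: List[float], n_coef: int = 6) -> Tuple[List[float], List[float]]:
    """Window and DCT-compress the left and right halves of one band's temporal trajectory."""
    centre = len(trajectory) // 2
    halves = (trajectory[: centre + 1], trajectory[centre:])  # both halves include the centre frame
    feats = []
    for part in halves:
        window = hamming(len(part))
        feats.append(dct2([a * w for a, w in zip(part, window)], n_coef))
    return feats[0], feats[1]

# One Mel band observed over 31 frames (310 ms at an assumed 10 ms shift)
trajectory = [1.0] * 31
left, right = split_context_features(trajectory)
print(len(left), len(right))  # 6 6
```

Repeating this for every Mel band and concatenating the coefficients would give one input vector per context, which is then fed to that context's MLP.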

17 Results (4)
Test Data Type         System                             Corr    Acc
-                      CI-HMM                             69.02   63.45
-                      CD-HMM                             75.76   65.78
Detected (real case)   OT CRF (±3)                        75.24   47.97
Detected (real case)   Long Term AFDT + DT CRF (±3)       64.58   63.12
Ideal (upper bound)    Long Term AFDT + AT CRF            74.96   73.64
Ideal (upper bound)    MFCC AFDT + AT CRF (±3)            72.87   71.62
Ideal (upper bound)    Long Term AFDT + AT CRF (±3)       76.83   74.97
Detected (real case)   Long Term AFDT + AT CRF            69.83   66.97
Detected (real case)   MFCC AFDT + AT CRF (±3)            66.21   63.16
Detected (real case)   Long Term AFDT + AT CRF (±3)       71.01   67.67

18 Conclusions
A well-designed phonological feature system is important
–AF asynchrony minimization training and AF-phone synchronization could also be investigated
Oracle-trained CRF is able to retrieve more phonological information from speech
–High phone correct rate (but sensitive to detection errors)
–Helpful for combination
Detection-based ASR is promising
–The front-end detector is a major issue

19 AF and Phone Alignment Using AFDT
[Figure: phone sequence aligned with the AF sequence along the time axis t]

