1 Non-linear speech processing: overview of COST-277 current research
Nonlinear speech processing (NOLISP). Overview of COST-277 current research. Marcos Faúndez-Zanuy (faundez@eupmt.es), COST-277 Chairman

2 OUTLINE
1. Overview: what does “nonlinear” mean?
2. Organization of COST-277
3. Activity report, June 2001 to June 2003

3 OUTLINE
1. Overview: what does “nonlinear” mean?
2. Organization of COST-277
3. Activity report, June 2001 to June 2003

4 What does “non-linear” mean? (Strict sense)
The superposition principle does not hold. For a linear system, given $f(x_1) = y_1$ and $f(x_2) = y_2$, it follows that $f(a x_1) = a y_1$ and $f(x_1 + x_2) = y_1 + y_2$. A non-linear system violates at least one of these two properties.
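
The two superposition properties can be checked numerically. The sketch below is only an illustration: the FIR taps, the quantizer step and the test vectors are arbitrary values chosen here, not taken from the slides. Passing the check on one pair of inputs does not prove linearity, but failing it does prove non-linearity.

```python
def fir_filter(x, taps=(0.5, 0.3, 0.2)):
    """Linear FIR filter: y[n] = sum_k taps[k] * x[n-k], zero initial state."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]


def quantizer(x, step=0.25):
    """Uniform quantizer: rounds each sample to the nearest multiple of step."""
    return [step * round(v / step) for v in x]


def obeys_superposition(f, x1, x2, a=2.0, tol=1e-9):
    """Check f(a*x1) == a*f(x1) and f(x1+x2) == f(x1)+f(x2) for one input pair."""
    y1, y2 = f(x1), f(x2)
    scaled = f([a * v for v in x1])
    summed = f([u + v for u, v in zip(x1, x2)])
    homogeneous = all(abs(s - a * y) <= tol for s, y in zip(scaled, y1))
    additive = all(abs(s - (p + q)) <= tol for s, p, q in zip(summed, y1, y2))
    return homogeneous and additive


x1 = [0.10, -0.20, 0.30, 0.05]
x2 = [0.07, 0.11, -0.40, 0.20]
print(obeys_superposition(fir_filter, x1, x2))  # True: the filter is linear
print(obeys_superposition(quantizer, x1, x2))   # False: rounding breaks scaling
```

Even a "linear" (uniform) quantizer fails the test, which is the sense in which almost every stage of a speech system is non-linear in the strict sense.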

5 What does “non-linear” mean?
Strict sense: in reality almost “everything” is nonlinear.
– Acquisition: quantizer (linear, A-law, etc.)
– Parameterization: cepstrum
– Models: HMM, VQ

6 Non-linearities are always present
– Nonlinearities of the systems that generate the signal and/or noise
– Nonlinearities of the signal acquisition system
– Nonlinearities of the transmission channel
– Nonlinearities of the human perception mechanism

7 Classical approach
Wide sense: linear speech processing. The speech signal model consists of a pulse/noise source and a linear filter, both of which change their characteristics on a frame-by-frame basis. This approach neglects structure known to be present in the speech signal.
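
The classical source-filter model can be sketched in a few lines. This is a minimal illustration under assumptions made here: the frame length, pitch period, filter coefficients and the all-pole form of the filter are hypothetical choices, and the pulse phase is simply restarted in every frame.

```python
import random


def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Pulse train for voiced frames, white Gaussian noise for unvoiced frames."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(e, a, gain=1.0, state=None):
    """y[n] = gain*e[n] + sum_i a[i-1]*y[n-i]; state carries the last outputs."""
    past = list(state) if state else [0.0] * len(a)  # past[0] is y[n-1]
    out = []
    for sample in e:
        y = gain * sample + sum(ai * yi for ai, yi in zip(a, past))
        out.append(y)
        past = [y] + past[:-1]
    return out, past


def synthesize(frames, frame_len=160):
    """frames: list of (voiced, coefficients, gain); parameters change per frame."""
    state, speech = None, []
    for voiced, a, gain in frames:
        e = excitation(frame_len, voiced)
        y, state = all_pole_filter(e, a, gain, state)
        speech.extend(y)
    return speech


# One voiced and one unvoiced frame with hypothetical, stable filters
speech = synthesize([(True, [1.3, -0.8], 1.0), (False, [0.5, -0.3], 0.1)])
```

Everything the model can express is contained in the per-frame source type, gain and filter coefficients, which is why structure beyond that is lost.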

8 Evidence of nonlinearities
– Residue comparison
– Correlation dimension
– Higher-order statistics
– Probability density functions
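
As an illustration of one of these measures, the correlation dimension is commonly estimated from the correlation sum of delay-embedded vectors (the Grassberger-Procaccia approach). The sketch below assumes that approach; the embedding dimension, lag, radii and the test signal are arbitrary values chosen here, and the slides do not say how the measure was actually computed.

```python
import math


def delay_embed(x, dim, lag):
    """Delay vectors [x[n], x[n+lag], ..., x[n+(dim-1)*lag]]."""
    span = (dim - 1) * lag
    return [[x[n + k * lag] for k in range(dim)] for n in range(len(x) - span)]


def correlation_sum(vectors, r):
    """Fraction of distinct vector pairs closer than r (maximum-norm distance)."""
    n = len(vectors)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(u - v) for u, v in zip(vectors[i], vectors[j])) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def slope_estimate(vectors, r_small, r_large):
    """Slope of log C(r) versus log r between two radii (both C values must be > 0)."""
    c1 = correlation_sum(vectors, r_small)
    c2 = correlation_sum(vectors, r_large)
    return (math.log(c2) - math.log(c1)) / (math.log(r_large) - math.log(r_small))


# Toy signal: a sinusoid, whose trajectory in the embedding space is a closed curve
x = [math.sin(0.37 * n) for n in range(300)]
vectors = delay_embed(x, dim=3, lag=5)
print(slope_estimate(vectors, 0.1, 0.5))
```

In practice the slope is read from a range of radii where log C(r) is approximately linear in log r, and the estimate is repeated for increasing embedding dimensions.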

9 Example: Linear vs NL

10 Drawbacks with NOLISP approaches
– Lack of a unifying theory for the different nonlinear processing tools (neural nets, homomorphic, polynomial, morphological, order-statistics filters, and so on)
– High computational burden
– Well-known analysis tools are not applicable
– Usually a closed-form formulation does not exist, and iterative methods (with local-minima problems) must be used

11 What are we mainly looking for?
The replacement of the linear filter (or parts thereof) with nonlinear operators (models) should enable us to obtain an accurate description of the speech signal with a lower number of parameters. This in turn should lead to better performance of practical speech processing applications.

12 OUTLINE
1. Overview: what does “nonlinear” mean?
2. Organization of COST-277
3. Activity report, June 2001 to June 2003

13 What is COST?
Intergovernmental cooperation
– Created in 1971
– 17 scientific and technical domains
Participation
– 33 COST countries
– European Commission
– International organisations
– Organisations from non-COST countries on a mutual-benefit basis
COST Actions
– Concerted actions of nationally funded R&D

14 COST TIST: Telecommunications, Information Science and Technologies

15 COST Countries
– The fifteen EU Member States
– The EFTA Member States: Iceland, Norway, Switzerland
– Central and Eastern countries: Estonia, Latvia, Lithuania, Poland, the Czech Republic, Slovakia, Slovenia, Croatia, Romania, Bulgaria
– Other countries: Cyprus, Malta, Turkey, Hungary

16 Evolution of COST Actions
[Chart: total Actions and starting Actions per year, 1980 to 2000; vertical axis from 0 to 250]

17 What is a COST Action?
– Concerted Action
– Pan-European
– “Non-competitive” research
– R&D financed nationally
– Flexibility
– Bottom-up
– À la carte participation
– The Commission funds only coordination activities

18 COST Senior Officials (CSO)
– Responsible for the overall strategy of COST
– Decide on the launching of each individual COST Action
– Approve participation of institutes from non-COST countries
– Approve prolongation of COST Actions

19 COST Technical Committee (TC)
– Selection of new COST Actions
– Monitoring of ongoing COST Actions
– Evaluation of completed COST Actions
– Dissemination and valorisation of COST activities
– Advice to the EC on budget planning

20 Management Committee (MC)
Supervises and coordinates the implementation of the Action.
Composed of:
– A maximum of two representatives of each signatory country, who ensure the scientific coordination at national level
– One representative of any non-COST institution admitted to participate
– The Scientific Secretary
– Representatives of the Commission services
Each signatory has one vote.

21 Working Group (WG)
Small number of researchers per working group.
Working group members may be:
– Management Committee members
– Other scientists from the signatory countries

22 COST TIST
About 28 Actions and about 2000 organisations, covering basic research on:
– Antennas and Radio Propagation
– Satellite Technologies and Services
– Mobile Technologies and Services
– Optical Networking Components and Services
– Internet & Multimedia Network Services
– Speech Technologies
– Information and Computer Science
Strong relationship with the IST Programme.

23 Evolution of COST TIST Actions

24 COST TIST Research Domains & Actions
– Special Needs & User Requirements: COST 219bis, 269
– Antennas/Radio Propagation: COST 244bis, 255, 260, 261, 271
– Mobile & Personal Comm.: COST 259, 273
– Satellite Tech. & Services: COST 272
– Optical Networking: COST 265, 266, 267, 268, 270
– New Internet & Multimedia Services: COST 211 Quad, 256, 257, 263, 264, 269, 275, 279
– Speech Technologies: COST 258, 277, 278
– Information & Computer Science: COST 274, 276

25 Other COST Actions in Speech Technologies
COST 275: Biometrics-Based Recognition of People over the Internet
– Involves the use of both voice and face recognition for user authentication over the Internet
COST 278: Spoken Language Interaction in Telecommunications
– Improve knowledge of issues and problems related to spoken language interaction, including robustness and multilingual aspects
– Human-computer interaction using spoken language in a multimodal context, including dialogue theories and application evaluation

26 Relationship between COST Actions 275, 277 and 278
– 275: Biometrics-Based Recognition of People over the Internet
– 277: Non-linear Speech Processing
– 278: Spoken Language Interaction in Telecommunication
[Diagram relating the three Actions across the layers Application Fields, Interface Components and Generic Functions. Topics shown: Speaker Recognition, Speech Recognition, Natural Language Processing, Multi Modality & Data Fusion, Speech Analysis & Coding, Image Analysis & Graphics, Speech Synthesis, Dialogue]

27 Grant contracts
COST TIST support is provided through annual Grant Contracts with the coordinating organisation. The contract covers costs for:
– Secretariat (manpower to cover administration)
– Meetings (WG and MC)
– Seminars and workshops
– Short Term Scientific Missions
– Publications

28 Secretariat
– Contract management, payments
– Reimbursement of meetings
– Rebuilding of the WWW site: repository of official documents; TC and Action activities and events
– Enhancing dissemination: newsletter; central index and storage of reports for retrieval
– Links with EC (IST) and national programmes

29 Overview: COST-277
[Diagram: DISCRETE MODELS, SYNTHETIC SPEECH, HUMAN SPEECH, CODED SPEECH and WRITTEN SPEECH, linked by TtS, StT, StC and CtS and by Analysis, Synthesis, Recogn. and Coding. © ukl 2002]

30 Organization
– Chair: Marcos Faúndez
– Vice-Chair: Gernot Kubin
– Secretary: Stephen McLaughlin
– WG1: Bastiaan Kleijn
– WG2: Bojan Petek
– WG3: Stephen McLaughlin
– WG4: Gerard Chollet

31 Countries
Austria, Belgium, Czech Republic, France, Germany, Greece, Ireland, Italy, Lithuania, Portugal, Slovakia, Slovenia, Spain, Sweden, Switzerland, UK, Canada

32 Dissemination of info
– E-mail distribution list: Cost277@sitma.net
– Subscribe/unsubscribe: majordomo@sitma.net
– Website: http://www.ee.ed.ac.uk/  cost277/

33 Future meetings of the Management Committee

34 Publications and reports
– International Journal of Control and Intelligent Systems (ACTA Press), special issue on non-linear speech processing techniques and applications. Invited editor: A. Hussain (COST-277 MC member)
– Special sessions at EUSIPCO’02, IWANN’01, IWANN’03 and EUSIPCO’04 (TBC)

35 COST Actions in Speech Technologies
COST 275: Biometrics-Based Recognition of People over the Internet
– Involves the use of both voice and face recognition for user authentication over the Internet
COST 277: Nonlinear Speech Processing
COST 278: Spoken Language Interaction in Telecommunications
– Improve knowledge of issues and problems related to spoken language interaction, including robustness and multilingual aspects
– Human-computer interaction using spoken language in a multimodal context, including dialogue theories and application evaluation

36 Relationship between COST Actions 275, 277 and 278
– 275: Biometrics-Based Recognition of People over the Internet
– 277: Non-linear Speech Processing
– 278: Spoken Language Interaction in Telecommunication
[Diagram relating the three Actions across the layers Application Fields, Interface Components and Generic Functions. Topics shown: Speaker Recognition, Speech Recognition, Natural Language Processing, Multi Modality & Data Fusion, Speech Analysis & Coding, Image Analysis & Graphics, Speech Synthesis, Dialogue]

37 COST-277: A different approach
“The four classical areas of speech processing:
– Speech Recognition (Speech-to-Text, StT)
– Speech Synthesis (Text-to-Speech, TtS, and Code-to-Speech, CtS)
– Speech Coding (Speech-to-Code, StC, with CtS) and
– Speaker Verification and Identification (SV)
have all developed their own methodology almost independently from the neighboring areas. This has led to a plurality of tools and methods that are hard to integrate into any small multifunctional speech processing system (a mobile phone performing speaker verification and continuous speech recognition in addition to speech coding would need many separate processes running in parallel).”

38 Relations between different fields
[Diagram: DISCRETE MODELS, SYNTHETIC SPEECH, HUMAN SPEECH, CODED SPEECH and WRITTEN SPEECH, linked by TtS, StT, StC and CtS and by Analysis, Synthesis, Recogn. and Coding. © ukl 2002]

39 COST 277 Non-linear speech processing: PROGRESS REPORT
Period: June 2001 to June 2003

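40 Speech coding: LINEAR PREDICTION
Scalar linear prediction. AR modeling of order $P$:
$$x[n] = \sum_{i=1}^{P} a_i \, x[n-i] + e[n]$$
where $a_i$ are the scalar prediction coefficients, obtained with the Levinson-Durbin recursion, and $e[n]$ is the prediction error.
Vectorial linear prediction. AR-vector modeling of order $P$:
$$\mathbf{x}[n] = \sum_{i=1}^{P} A_i \, \mathbf{x}[n-i] + \mathbf{e}[n]$$
where $A_i$ are matrices.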
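
The Levinson-Durbin recursion mentioned on this slide can be sketched as follows. The code assumes the usual convention that the prediction of x[n] is the sum of a_i * x[n-i] for i = 1..P, and that the coefficients are obtained from the autocorrelation sequence of the frame; function names and test values are illustrative only.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n+k] for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_i a_i * r[|j-i|] = r[j], j = 1..order, in O(order^2) operations.

    Returns (a, err): a[i-1] multiplies x[n-i], err is the final error energy.
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient of stage m
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # Update the lower-order coefficients, then append the new one
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= (1.0 - k * k)
    return a, err


# Autocorrelation of a first-order AR process with coefficient 0.5
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)  # [0.5, 0.0] 0.75
```

For a first-order process the second coefficient comes out as zero, which shows how the recursion exposes the true model order through the reflection coefficients.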

41 NL SCALAR PREDICTION WITH NNET
[Diagram: neural net with input layer, hidden layer and output layer. Inputs: x[n-p], x[n-p+1], ..., x[n-1]. Output: x[n]]
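
The structure of such a predictor can be sketched as a one-hidden-layer network that maps the p previous samples to an estimate of x[n]. Only the forward pass is shown; the tanh activation, layer sizes and random (untrained) weights are assumptions made for this illustration, and the slides do not specify the network actually used.

```python
import math
import random


def make_predictor(p, hidden, seed=0):
    """Randomly initialised weights for a p-input, one-hidden-layer, one-output net."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(hidden)],
        "b_hidden": [0.0] * hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b_out": 0.0,
    }


def predict(net, past):
    """past = [x[n-1], ..., x[n-p]]; returns the network's estimate of x[n]."""
    h = [math.tanh(sum(w * v for w, v in zip(row, past)) + b)
         for row, b in zip(net["w_hidden"], net["b_hidden"])]
    return sum(w * v for w, v in zip(net["w_out"], h)) + net["b_out"]


def residual(net, x, p):
    """Prediction error e[n] = x[n] - prediction, for n = p .. len(x)-1."""
    return [x[n] - predict(net, x[n - p:n][::-1]) for n in range(p, len(x))]


net = make_predictor(p=4, hidden=3)
x = [math.sin(0.3 * n) for n in range(50)]
e = residual(net, x, p=4)
```

Replacing the tanh units with the identity reduces the network to the scalar linear predictor, so the linear model is a special case of this structure. Training (for example by backpropagation on the squared residual) is omitted here.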

42 NL VECTORIAL PREDICTION WITH NNET
[Diagram: neural net with input layer, hidden layer and output layer. Inputs: x[n-p], x[n-p+1], ..., x[n-1]. Outputs: x[n], x[n+1]]

43 ADPCM NNET PREDICTION
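
A generic ADPCM loop with a pluggable predictor can be sketched as below. It is not the coder evaluated in the slides: the 3-bit quantizer, the step-adaptation rule and the one-tap tanh predictor are hypothetical choices that only illustrate how a nonlinear predictor slots into the backward-adaptive scheme. The key point is that the predictor sees only decoded samples, so encoder and decoder stay in sync.

```python
import math


def adapt(step, q, half_levels):
    """Backward step adaptation: grow after large codes, shrink after small ones."""
    factor = 1.5 if abs(q) >= half_levels // 2 else 0.9
    return min(1.0, max(1e-3, step * factor))


def adpcm_encode(x, predictor, p, half_levels=4, step0=0.1):
    """Return the quantizer indices and the encoder's local reconstruction."""
    recon = [0.0] * p                      # start-up history shared with the decoder
    step, codes = step0, []
    for sample in x:
        pred = predictor(recon[-p:][::-1])  # uses decoded samples only
        q = max(-half_levels, min(half_levels - 1, round((sample - pred) / step)))
        codes.append(q)
        recon.append(pred + q * step)
        step = adapt(step, q, half_levels)
    return codes, recon[p:]


def adpcm_decode(codes, predictor, p, half_levels=4, step0=0.1):
    """Mirror of the encoder loop, driven by the received indices."""
    recon = [0.0] * p
    step = step0
    for q in codes:
        pred = predictor(recon[-p:][::-1])
        recon.append(pred + q * step)
        step = adapt(step, q, half_levels)
    return recon[p:]


signal = [0.5 * math.sin(0.2 * n) for n in range(200)]
nl_predictor = lambda past: math.tanh(1.2 * past[0])  # stand-in for a trained net
codes, enc_recon = adpcm_encode(signal, nl_predictor, p=1)
dec_recon = adpcm_decode(codes, nl_predictor, p=1)
```

Any callable that maps the last p decoded samples to a prediction can be passed in, so a linear predictor and a neural-net predictor can be compared with the rest of the coder unchanged.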

44 VECTORIAL NL-ADPCM RESULTS

45 Very low bit rate speech coder: demonstration

46 Broadcast news audio segmentation, classification, clustering and speech recognition: demonstration
Demo available at http://193.126.86.80

47 SPEAKER RECOGNITION
Current systems rely on low-level information in speech:
– Short-time analysis windows (20-30 ms)
– Spectral-energy based features (MFCC)
Another possibility is high-level information:
– Speaking rate
– Pitch patterns
– Word/phrase usage
– Idiosyncratic pronunciation

48 SPEAKER RECOGNITION: Possibilities of NOLISP
Low-level information:
– Non-linear predictive models instead of LPCC
– Parameters: fractal dimension, Lyapunov exponents, correlation dimension, etc.
High-level information:
– Take advantage of the other working groups. For instance, intonation is fundamental in speech synthesis and useful for speaker recognition.

49 Why use NL models?
By listening to the residual signal of an LPC analysis it is possible to identify who is speaking.
– Usually the residual signal is discarded.
– NL models offer a better fit and a whiter residual signal.
NL models can offer an improvement in coding and synthesis, so there is room for improvement in speaker recognition as well.

50 BANDWIDTH EXTENSION: An example of NL processing
A speech signal that has passed through the public switched telephone network (PSTN) generally has a frequency range limited to between 0.3 and 3.4 kHz. Bandwidth extension algorithms aim at recovering the lost low-frequency band (0 to 0.3 kHz) and/or high-frequency band (3.4 to 8 kHz) given the narrow-band speech signal.

51 SPECTRAL BAND REPLICATION
[Figure: initial and final spectra on frequency axes marked 0, f_s/8, f_s/4 and f_s/2, with an LPF stage; frequency axis f in kHz]
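
One simple way to populate an empty upper band is spectral folding: inserting a zero after every sample doubles the sampling rate and mirrors the original band around a quarter of the new sampling rate. Whether the figure on this slide shows exactly this scheme is an assumption; the sketch below only demonstrates the mirroring effect on a single tone, using a naive DFT and arbitrary sizes.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the DFT, computed directly (fine for short signals)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def zero_insert(x):
    """Double the sampling rate by inserting a zero after every sample."""
    y = []
    for v in x:
        y.extend([v, 0.0])
    return y


n = 32
narrow = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]  # tone in bin 3
wide = zero_insert(narrow)   # 64 samples at twice the sampling rate
mag = dft_magnitude(wide)    # bin 16 corresponds to a quarter of the new rate

# The tone stays in bin 3 and a mirror image appears in bin 29 = 32 - 3
print(round(mag[3], 3), round(mag[29], 3), round(mag[10], 3))
```

A practical system would then shape the replicated band, for example with a filter and an estimated spectral envelope, before adding it to the narrow-band signal.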

52 BANDWIDTH EXTENSION
Databases:
– Original full band: [0.3, 7] kHz
– Narrow band: [0.3, 3.4] kHz
– Bandwidth extended: [0.3, 7] kHz
[Diagram blocks: LPF, Bandwidth extension]

53 MIC database: DCF for several MELCEPS-l
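
For reference, DCF presumably denotes the detection cost function used in speaker verification evaluations, which weighs the miss and false-alarm rates at a given decision threshold. The sketch below uses the cost parameters conventionally associated with the NIST evaluations (C_miss = 10, C_fa = 1, P_target = 0.01); the slide does not state which values were used, so they are assumptions here, as are the example scores.

```python
def detection_cost(target_scores, impostor_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target)."""
    # A target trial scoring below the threshold is a miss
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # An impostor trial scoring at or above the threshold is a false alarm
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


targets = [2.0, 1.0, 0.2, 3.0]
impostors = [-1.0, 0.5, -0.3, 0.1]
print(detection_cost(targets, impostors, threshold=0.4))
```

Lower values are better, and sweeping the threshold gives the minimum DCF that is often reported alongside the equal error rate.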

54 Bandwidth extension
For human beings it is easier to recognize speakers using full-band signals, even though no new information is added.
Experimental results reveal that:
– The bandwidth extension algorithm does not introduce any damaging artifacts.
– With MELCEPS parameterization, the results are better than those obtained with the narrow-band signal.
