Non-linear speech processing: overview of COST-277 current research

Slide 1: Nonlinear speech processing (NOLISP). Overview of COST-277 current research. Marcos Faúndez-Zanuy, COST-277 Chairman

Slide 2: Outline
1. Overview: what does “nonlinear” mean?
2. Organization of COST; report on activity, June '01 – June '03

Slide 4: What does “non-linear” mean? (Strict sense)
The superposition principle does not hold. A system $f$ is linear if, given $f(x_1) = y_1$ and $f(x_2) = y_2$, it satisfies $f(a x_1) = a y_1$ and $f(x_1 + x_2) = y_1 + y_2$. A non-linear system violates at least one of these two conditions.
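As a hedged illustration of the superposition test on this slide, the following Python sketch checks homogeneity and additivity numerically at one test point. The helper name `satisfies_superposition` and the two example systems are assumptions introduced here, not part of the presentation. Passing at a few points does not prove linearity, whereas a single failure is enough to show non-linearity.

```python
def satisfies_superposition(f, x1, x2, a, tol=1e-9):
    """Check f(a*x1) == a*f(x1) and f(x1+x2) == f(x1)+f(x2) at one test point."""
    homogeneous = abs(f(a * x1) - a * f(x1)) <= tol
    additive = abs(f(x1 + x2) - (f(x1) + f(x2))) <= tol
    return homogeneous and additive


def gain(x):
    """Hypothetical linear system: a constant gain of 3."""
    return 3.0 * x


def hard_clip(x):
    """Hypothetical non-linear system: saturation at +/-1."""
    return max(-1.0, min(1.0, x))


if __name__ == "__main__":
    print(satisfies_superposition(gain, 1.5, -2.0, 4.0))       # gain obeys both conditions here
    print(satisfies_superposition(hard_clip, 0.75, 0.5, 4.0))  # clipping breaks homogeneity
```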

Slide 5: What does “non-linear” mean?
Strict sense: almost “everything” is really nonlinear.
Acquisition: quantizer (linear, A-law, etc.)
Parameterization: cepstrum
Models: HMM, VQ
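To make the A-law example concrete, here is a minimal sketch of the continuous A-law compression curve (without the subsequent quantization step), using the commonly cited parameter A = 87.6. The function name `a_law_compress` is an assumption made for illustration. The demo shows that the compressed value of a sum differs from the sum of the compressed values, so superposition fails.

```python
import math

A = 87.6  # commonly used A-law compression parameter


def a_law_compress(x, a=A):
    """Map x in [-1, 1] to its A-law compressed value in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))                    # linear segment near zero
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))  # logarithmic segment
    return math.copysign(y, x)


if __name__ == "__main__":
    separate = a_law_compress(0.1) + a_law_compress(0.2)
    together = a_law_compress(0.3)
    print(separate, together)  # the two values differ, so the map is not additive
```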

Slide 6: Non-linearities are always present
– Nonlinearities of the systems that generate the signal and/or noise
– Nonlinearities of the signal acquisition system
– Nonlinearities of the transmission channel
– Nonlinearities of the human perception mechanism

Slide 7: Classical approach (wide sense): linear speech processing
The speech signal model consists of a pulse/noise source and a linear filter, both of which change their characteristics on a frame-by-frame basis. This approach neglects structure known to be present in the speech signal.

Slide 8: Evidence of nonlinearities
– Residue comparison
– Correlation dimension
– Higher-order statistics
– Probability density functions
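One of the indicators listed above, higher-order statistics, can be sketched as follows. This is an illustrative example only: the function names are assumptions, and the estimators are the plain sample skewness and excess kurtosis. A linear filter driven by Gaussian noise produces a Gaussian output, for which both quantities are zero in expectation, so clearly nonzero values indicate non-Gaussianity. That is consistent with a non-linear generating mechanism but does not by itself prove one.

```python
def central_moment(x, order):
    """Sample central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x):
    """Third standardized moment; zero for any symmetric distribution."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def excess_kurtosis(x):
    """Fourth standardized moment minus 3; zero for a Gaussian."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0


if __name__ == "__main__":
    frame = [0.0, 0.0, 0.0, 4.0]  # hypothetical, strongly asymmetric frame
    print(skewness(frame), excess_kurtosis(frame))
```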

Slide 9: Example: linear vs. NL

Slide 10: Drawbacks of NOLISP approaches
– Lack of a unifying theory covering the different nonlinear processing tools (neural networks, homomorphic, polynomial, morphological, order-statistic filters, and so on)
– High computational burden
– Well-known analysis tools are not applicable
– Usually no closed-form formulation exists, so iterative methods (with local-minima problems) must be used

Slide 11: What are we mainly looking for?
Replacing the linear filter (or parts of it) with nonlinear operators (models) should give an accurate description of the speech signal with fewer parameters. This in turn should lead to better performance in practical speech processing applications.

Slide 12: Outline
1. Overview: what does “nonlinear” mean?
2. Organization of COST; report on activity, June '01 – June '03

Slide 13: What is COST?
Intergovernmental cooperation
– Created in 1971
– 17 scientific and technical domains
Participation
– 33 COST countries
– European Commission
– International organisations
– Organisations from non-COST countries on a mutual-benefit basis
COST Actions
– Concerted actions of nationally funded R&D

Slide 14: COST TIST: Telecommunications, Information Science and Technologies

Slide 15: COST countries
The fifteen EU Member States
The EFTA Member States
– Iceland
– Norway
– Switzerland
Central and Eastern countries
– Estonia
– Latvia
– Lithuania
– Poland
– Czech Republic
– Slovakia
– Slovenia
– Croatia
– Romania
– Bulgaria
Other countries
– Cyprus
– Malta
– Turkey
– Hungary

Slide 16: Evolution of COST Actions (chart of total actions and starting actions)

Slide 17: What is a COST Action?
– Concerted action
– Pan-European
– “Non-competitive” research
– R&D financed nationally
– Flexibility
– Bottom-up
– À la carte participation
– The Commission funds only coordination activities

Slide 18: COST Senior Officials (CSO)
– Responsible for the overall strategy of COST
– Decides on the launch of each individual COST Action
– Approves participation of institutes from non-COST countries
– Approves prolongation of COST Actions

Slide 19: COST Technical Committee (TC)
– Selection of new COST Actions
– Monitoring of ongoing COST Actions
– Evaluation of completed COST Actions
– Dissemination and valorisation of COST activities
– Advice to the EC on budget planning

Slide 20: Management Committee (MC)
Supervises and coordinates the implementation of the Action.
Composed of:
– A maximum of two representatives of each signatory country, who ensure scientific coordination at national level
– One representative of any non-COST institution admitted to participate
– The Scientific Secretary
– Representatives of the Commission services
Each signatory has one vote.

Slide 21: Working Group (WG)
Small number of researchers per working group.
Working group members may be:
– Management Committee members
– Other scientists from the signatory countries

Slide 22: COST TIST
~28 Actions, ~2000 organisations
Covering basic research on:
– Antennas and Radio Propagation
– Satellite Technologies and Services
– Mobile Technologies and Services
– Optical Networking Components and Services
– Internet & Multimedia Network Services
– Speech Technologies
– Information and Computer Science
Strong relationship with the IST Programme

Slide 23: Evolution of COST TIST Actions

Slide 24: COST TIST Research Domains & Actions
– Special Needs & User Requirements: COST 219bis, 269
– Antennas / Radio Propagation: COST 244bis, 255, 260, 261, 271
– Mobile & Personal Comm.: COST 259, 273
– Satellite Tech. & Services: COST 272
– Optical Networking: COST 265, 266, 267, 268, 270
– New Internet & Multimedia Services: COST 211 Quad, 256, 257, 263, 264, 269, 275, 279
– Speech Technologies: COST 258, 277, 278
– Information & Computer Science: COST 274, 276

Slide 25: Other COST Actions in Speech Technologies
COST 275: Biometrics-Based Recognition of People over the Internet
– Involves the use of both voice and face recognition for user authentication over the Internet
COST 278: Spoken Language Interaction in Telecommunications
– Improve knowledge of issues and problems related to spoken language interaction, including robustness and multilingual aspects
– Human-computer interaction using spoken language in a multimodal context, including dialogue theories and application evaluation

Slide 26: Relationship between COST Actions 275, 277 and 278
275: Biometrics-Based Recognition of People over the Internet
277: Non-linear Speech Processing
278: Spoken Language Interaction in Telecommunication
Diagram topics: Speaker Recognition, Speech Recognition, Natural Language Processing, Multi Modality & Data Fusion, Speech Analysis & Coding, Image Analysis & Graphics, Speech Synthesis, Dialogue
Diagram layers: Application Fields, Interface Components, Generic Functions

Slide 27: Grant contracts
COST TIST support is provided through annual grant contracts with a coordinating organisation.
The contract covers costs for:
– Secretariat (manpower to cover administration)
– Meetings (WG and MC)
– Seminars and workshops
– Short Term Scientific Missions
– Publications

Slide 28: Secretariat
Contract management, payments
Reimbursement of meetings
Rebuilding of the WWW site
– Repository of official documents
– TC and Action activities and events
Enhancing dissemination
– Newsletter
– Central index and storage of reports for retrieval
Links with EC (IST) and national programmes

Slide 29: Overview: COST-277
Diagram relating DISCRETE MODELS, SYNTHETIC SPEECH, HUMAN SPEECH, CODED SPEECH and WRITTEN SPEECH through the conversions TtS, StT, StC and CtS, grouped under Analysis, Synthesis, Recognition and Coding. © ukl 2002

Slide 30: Organization
Chair: Marcos Faúndez
Vice-Chair: Gernot Kubin
Secretary: Stephen McLaughlin
– WG1: Bastiaan Kleijn
– WG2: Bojan Petek
– WG3: Stephen McLaughlin
– WG4: Gerard Chollet

Slide 31: Countries
Austria, Belgium, Czech Republic, France, Germany, Greece, Ireland, Italy, Lithuania, Portugal, Slovakia, Slovenia, Spain, Sweden, Switzerland, UK, Canada

Slide 32: Dissemination of info
Distribution list: subscribe/unsubscribe
Website: cost277/

Slide 33: Future meetings of the Management Committee

Slide 34: Publications and reports
– International Journal of Control and Intelligent Systems, special issue on “Non-linear speech processing techniques and applications”, ACTA Press. Invited editor: A. Hussain (COST-277 MC member)
– Special sessions in EUSIPCO'02, IWANN'01, IWANN'03, EUSIPCO'04 (TBC)

Slide 35: COST Actions in Speech Technologies
COST 275: Biometrics-Based Recognition of People over the Internet
– Involves the use of both voice and face recognition for user authentication over the Internet
COST 277: Nonlinear speech processing
COST 278: Spoken Language Interaction in Telecommunications
– Improve knowledge of issues and problems related to spoken language interaction, including robustness and multilingual aspects
– Human-computer interaction using spoken language in a multimodal context, including dialogue theories and application evaluation

Slide 37: COST-277: A different approach
“The four classical areas of speech processing:
– Speech Recognition (Speech-to-Text, StT)
– Speech Synthesis (Text-to-Speech, TtS, and Code-to-Speech, CtS)
– Speech Coding (Speech-to-Code, StC, with CtS)
– Speaker Verification and Identification (SV)
have all developed their own methodology almost independently of the neighboring areas. This has led to a plurality of tools and methods that are hard to integrate into any small multifunctional speech processing system (a mobile phone performing speaker verification and continuous speech recognition in addition to speech coding would need many separate processes running in parallel).”

Slide 38: Relations between different fields
Diagram relating DISCRETE MODELS, SYNTHETIC SPEECH, HUMAN SPEECH, CODED SPEECH and WRITTEN SPEECH through the conversions TtS, StT, StC and CtS, grouped under Analysis, Synthesis, Recognition and Coding. © ukl 2002

Slide 39: COST277 Non-linear speech processing. PROGRESS REPORT. Period: June 2001 to June 2003

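Slide 40 (Speech coding): LINEAR PREDICTION
Scalar linear prediction: AR modeling of order $P$,
$$\hat{x}[n] = \sum_{i=1}^{P} a_i \, x[n-i],$$
where $a_i$ are the scalar prediction coefficients, obtained with the Levinson-Durbin recursion.
Vectorial linear prediction: AR-vector modeling of order $P$,
$$\hat{\mathbf{x}}[n] = \sum_{i=1}^{P} \mathbf{A}_i \, \mathbf{x}[n-i],$$
where $\mathbf{A}_i$ are matrices.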
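The Levinson-Durbin recursion mentioned on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the predictor convention x_hat[n] = sum over i of a_i * x[n-i] and an autocorrelation sequence r[0..P] supplied by the caller; the function names are chosen here for illustration and do not come from the presentation.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame (no normalisation)."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order]; return (coeffs, error power)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k  # prediction error power shrinks at each stage
    return a[1:], err


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order AR process with coefficient 0.5
    coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(coeffs, err)
```

For the autocorrelation in the demo, the recursion recovers a first coefficient of 0.5 and a second coefficient of zero, as expected for a first-order process.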

Slide 41 (Speech coding): NL SCALAR PREDICTION WITH NNET
Network diagram with an input layer, a hidden layer and an output layer. Inputs: x[n-p], x[n-p+1], ..., x[n-1]. Output: x[n].
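The structure in this diagram can be sketched as the forward pass of a small multilayer perceptron. This is an assumption-laden illustration, not the network used by the authors: the tanh activation, the single hidden layer, the linear output unit and all weight values are hypothetical, and training is omitted entirely.

```python
import math


def mlp_predict(past, hidden_weights, hidden_bias, output_weights, output_bias):
    """Predict x[n] from past samples with one tanh hidden layer and a linear output."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, past)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


if __name__ == "__main__":
    # Hypothetical weights for p = 2 inputs and 2 hidden units
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    v = [1.0, 1.0]
    once = mlp_predict([1.0, 0.0], W, b, v, 0.0)
    doubled = mlp_predict([2.0, 0.0], W, b, v, 0.0)
    print(once, doubled)  # doubling the input does not double the prediction
```

Unlike the linear predictor, scaling the input by two does not scale the output by two, which is exactly the failure of superposition that defines a non-linear model.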

Slide 42 (Speech coding): NL VECTORIAL PREDICTION WITH NNET
Network diagram with an input layer, a hidden layer and an output layer. Inputs: x[n-p], x[n-p+1], ..., x[n-1]. Outputs: x[n], x[n+1].

Slide 43 (Speech coding): ADPCM NNET PREDICTION
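Since the slide itself only names the scheme, the following is a generic sketch of an ADPCM loop with a pluggable predictor, which could be linear or a neural network. Everything specific here is an assumption: the 8-level uniform quantizer, the step adaptation factors (1.5 and 0.9), the initial step size and the function names. The point it illustrates is that the encoder predicts from past reconstructed samples, so the decoder can reproduce the same predictions from the codes alone.

```python
import math


def _adapt(step, code, half, min_step=1e-4):
    """Grow the step when the quantizer saturates, shrink it otherwise."""
    factor = 1.5 if code in (-half, half - 1) else 0.9
    return max(min_step, step * factor)


def adpcm_encode(samples, predictor, order, levels=8, step=0.1):
    """Return (codes, reconstruction) using backward-adaptive quantization."""
    half = levels // 2
    history = [0.0] * order  # past reconstructed samples, newest first
    codes, recon = [], []
    for s in samples:
        pred = predictor(history)
        code = max(-half, min(half - 1, round((s - pred) / step)))
        value = pred + code * step
        codes.append(code)
        recon.append(value)
        history = [value] + history[:-1]
        step = _adapt(step, code, half)
    return codes, recon


def adpcm_decode(codes, predictor, order, levels=8, step=0.1):
    """Rebuild the signal from the codes by mirroring the encoder state."""
    half = levels // 2
    history = [0.0] * order
    out = []
    for code in codes:
        value = predictor(history) + code * step
        out.append(value)
        history = [value] + history[:-1]
        step = _adapt(step, code, half)
    return out


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) for n in range(50)]
    first_order = lambda h: 0.9 * h[0]  # hypothetical fixed linear predictor
    codes, recon = adpcm_encode(signal, first_order, order=1)
    print(adpcm_decode(codes, first_order, order=1) == recon)
```

Swapping `first_order` for a non-linear predictor changes nothing else in the loop, which is why the predictor is passed in as a callable.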

Slide 44 (Speech coding): VECTORIAL NL-ADPCM RESULTS

Slide 45: Very low bit rate speech coder. Demonstration.

Slide 46: Broadcast news audio segmentation, classification, clustering and speech recognition. Demonstration. Demo available at

Slide 47: SPEAKER RECOGNITION
Current systems rely on low-level information in speech.
– Short analysis windows (20-30 ms)
– Spectral-energy based (MFCC)
Another possibility: high-level information
– Speaking rate
– Pitch patterns
– Word/phrase usage
– Idiosyncratic pronunciation

Slide 48: SPEAKER RECOGNITION: Possibilities of NOLISP
Low-level information:
– Non-linear predictive models instead of LPCC
– Parameters: fractal dimension, Lyapunov exponents, correlation dimension, etc.
High-level information:
– Take advantage of the other working groups. For instance, intonation is fundamental in speech synthesis and useful for speaker recognition.

Slide 49: Why use NL models?
When listening to the residual signal of an LPC analysis, it is possible to identify who is speaking.
– Usually the residual signal is discarded.
– NL models offer a better fit and a whiter residual signal.
NL models can offer an improvement in coding and synthesis, so there is room for improvement in speaker recognition as well.

Slide 50: BANDWIDTH EXTENSION: An example of NL processing
A speech signal that has passed through the public switched telephone network (PSTN) generally has a limited frequency range, between 0.3 and 3.4 kHz. Bandwidth extension algorithms aim to recover the lost low-frequency band (below 0.3 kHz) and/or high-frequency band (3.4–8 kHz) from the narrow-band speech signal.
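One classic baseline for filling the high band, spectral folding, can be sketched as follows. This is not necessarily the method used in the work reported here; it is swapped in as the simplest illustration. Inserting a zero after every sample doubles the sampling rate and mirrors the narrow-band spectrum into the new upper band. The function names and the test tone are assumptions, and a practical system would also shape the envelope of the mirrored band.

```python
import cmath
import math


def zero_insert_upsample(x):
    """Double the sampling rate by inserting a zero after every sample."""
    y = []
    for v in x:
        y.extend((v, 0.0))
    return y


def dft_magnitude(x, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x)))


if __name__ == "__main__":
    n = 32  # one frame at 8 kHz
    narrow = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]  # 1 kHz tone
    wide = zero_insert_upsample(narrow)  # 64 samples at 16 kHz, 250 Hz per bin
    for k in (4, 10, 28):
        print(k * 250, "Hz:", round(dft_magnitude(wide, k), 3))
```

In the demo, the 1 kHz tone reappears as a mirror image at 7 kHz (8 kHz minus 1 kHz), while an unrelated bin such as 2.5 kHz stays empty.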

Slide 51: SPECTRAL BAND REPLICATION
Figure: spectra from the initial to the final stage, with the frequency axis f [kHz] marked at 0, f_s/8, f_s/4 and f_s/2, and a low-pass filter (LPF) stage.

Slide 52: BANDWIDTH EXTENSION
Databases:
– Original full band: [0.3, 7] kHz
– Narrow band: [0.3, 3.4] kHz
– Bandwidth extended: [0.3, 7] kHz
Processing blocks shown: LPF, bandwidth extension

Slide 53: MIC database: DCF for several MELCEPS-l

Slide 54: Bandwidth extension
Human listeners find recognition easier with full-band signals.
No new information is added.
Experimental results reveal that:
– The bandwidth extension algorithm does not introduce any damaging artifacts
– With MELCEPS parameterization, the results are better than with the narrow-band signal