Language based Interaction

1 Language based Interaction
ICT 2008, Lyon, 26 Nov 08. Roberto Cencioni, Kimmo Rossi. Challenge 2 – Objective 2.2: Language based Interaction. DG Information Society and Media, Unit INFSO.E1: Language Technologies & Machine Translation

2 Outline
Opening remarks; FP7 ICT Call 4 – Essence; FP7 ICT Call 4 – Ingredients; Q&A; CIP ICT-PSP Call 3 – Opportunities; Q&A, close

3 Here we are
A new unit established in July 2008: Language Technologies & Machine Translation (INFSO.E1). High expectations vs. low rate of EC S&T activity in the last few years. Language is everywhere: written & spoken; documents, messages, databases, webpages, multimedia objects etc.; information as well as meta-information. But our resources are limited, so initial focus on multilingual technologies, services, applications. Two instruments in 2009: Research: FP7 ICT, call 4, Objective 2.2 – Language based Interaction; Innovation: CIP ICT-PSP, call 3, Theme 5 – Multilingual Web. Total budget of 40 Meuro.

4 Baseline
Why? New online paradigms centred around communication, collaboration, co-creation … but significant language barriers remain. EU comprises 27 countries & 23 official languages. Single European Information Space: one of the i2010 objectives. EC communication on Multilingualism (Sept ’08) calls for a broader policy framework & joint action. Purpose: support & enhance interpersonal & business communication and information access & publishing across languages.

5 A few facts
EU official languages: 23 x 22 = 506 pairs. EC MT (Systran core engine) has 18 pairs in operation & 10 more pairs at prototype stage. 60+ national, regional & minority languages within the EU. English accounts for 30% of today’s Web content (50% in 2000, 35% in 2004); Arabic, Chinese, Portuguese … growing very fast. Nearly 1.5 billion internet users worldwide (2008); c 320 million native EN speakers in the world. Basic requirements for the “digital translation market”: volume, access, personalisation; real quick, real cheap.
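The pair count on this slide treats each translation direction as its own pair, so n languages give n x (n - 1) pairs. A minimal sketch of that arithmetic follows. The function name `directed_pairs` is made up for illustration, and the coverage shares assume that the EC MT figures of 18 and 10 pairs are also counted per direction, which the slide does not state.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# Figures quoted on the slide.
EU_OFFICIAL_LANGUAGES = 23
PAIRS_IN_OPERATION = 18   # EC MT, operational
PAIRS_AT_PROTOTYPE = 10   # EC MT, prototype stage

total_pairs = directed_pairs(EU_OFFICIAL_LANGUAGES)

# Assumption: the EC MT pairs are directed pairs, like the 506 total.
share_operational = PAIRS_IN_OPERATION / total_pairs
share_with_prototypes = (PAIRS_IN_OPERATION + PAIRS_AT_PROTOTYPE) / total_pairs

print(total_pairs)                      # 506
print(f"{share_operational:.1%}")       # 3.6%
print(f"{share_with_prototypes:.1%}")   # 5.5%
```

Under that assumption, the pairs in operation cover only a few percent of the 506 possible directions.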

6 Can’t this be done elsewhere?
Indeed, EU RTD projects often exhibit multilingual features, yet approaches are too often naïve, short term, sectoral; hence a dedicated focal point: stimulating upstream research, enhancing research capacity, thus enabling more ambitious & impactful domain specific actions.

7 Research vs. Innovation division of labour
from long term foundational research (FP7) through applied research & technology development (FP7) to integration & demonstration (FP7 + PSP); infrastructure & resources (FP7 + PSP); different scale of ambition (€); different level of maturity (technology → service); different timescales & partnerships

8 I. Workprogramme R&D topics & outcomes
FP7-ICT Call

9 What technology can offer today
Machine translation & translation memory: making sense of online content; improving productivity of human translation; automatic translation of “acceptable” quality in specific domains / language pairs. Information search & retrieval: find relevant information across languages. Information extraction, filtering, categorisation, incl. summarization, routing & alert services, … for a variety of purposes, eg business intelligence. Speech technology: command & control, dictation systems; call center services, conversational systems.

10 Trends
New requirements, new approaches: from Web 1.X to Web 2.0 – we are all content producers; from static & uni-directional to dynamic, volatile, collaborative; from service to self-service, translations are needed “on the fly”; are language technologies up to the task? What happens to online content: disappearing document? Europa website: 6 million “documents”; elusive distinction between content & service; how to manage effectively multilingual content. Multilingualism on the rise in the EU (from 4 to 23 languages) and globally: English gains ground but mother tongues remain; online content becomes even more multilingual.

11 What technology might offer tomorrow
Machine translation: MT that learns from its mistakes; embedded in products/services, can cover any use context, esp. online: chats, blogs, dynamic content ...; broader coverage, fill in missing languages. Information search & retrieval: truly multilingual access to information: query in any language, content automatically translated. Website content development & management: new content is translated automatically; changes automatically applied in all language versions. Speech technology: real-time speech-to-speech translation (eg phone call, in a conference).

12 Challenges for MT
Bring MT to the users: understand what users need; novel use scenarios; communication rather than translation; better evaluation metrics. MT that learns & adapts: how to exploit feedback from users; how to use readily available “world knowledge”. Towards a paradigm shift? Inspiration from: machine learning, cognitive systems, psycho-linguistics, sociology, semantic web, data mining, new computing paradigms ...

13 FP7-ICT Call 4 at a glance
Core research exploring new avenues for machine translation (IP): ground breaking, multidisciplinary, high risk – high promise research; architectures & technologies that learn and adapt flexibly & effectively to different languages, domains & tasks; catering for new forms of language & communication (eg online communities; dynamic, volatile …). Problem oriented research for specific tasks & usage contexts (STR): online translation for the masses; translation in distributed collaborative environments; managing multilingual communication & content; automatic acquisition & annotation of language resources. Community building & networking (NOE): reinvigorate European machine translation (MT) community; build bridges between MT & MLT and other relevant disciplines; help develop & coordinate shared technical infrastructure, promote reusability & interoperability, foster evaluation.

14 Outcome a) IP
Core research. Explore new research avenues (one IP, up to 8 M): break new ground, foster a novel multi-disciplinary approach to machine translation; architectures & technologies that can learn and adapt flexibly & effectively to different languages, domains & tasks; catering for new forms of language & communication (eg online communities); high risk but high promise (accuracy, speed, scalability). Language & translation models coupled with data driven, machine learning methods; automatic acquisition & representation of linguistic facts; semantics, models of world knowledge relevant for translation; approaches inspired from social networks …

15 Outcome b) STR
Problem oriented research. A clearly defined usage context (~5 STRs, c 12 M). Online translation for the masses: wide coverage (beyond GoogleTranslate); adequate quality, suitable at least for gisting/browsing; language embedded in documents, web pages, multimedia objects … Translation in distributed environments: support non-linear collaborative interplay between authors, translators, editors/publishers & active users; innovative integration of automatic, interactive & human translation beyond current practice; technologies as well as processes & social interaction. Managing multilingual content & communication: a superset of the above addressing the development & management of online content & services, esp. their versioning & maintenance in multiple languages. Acquisition & annotation of language resources: (nearly-)automatic, high volume, high performance; mining the web as well as available repositories (eg corpora) and public information sources.

16 Outcome b) managing multilingual Web content
Methods, techniques, metrics … for developing & managing multilingual web content & services; much more than translation; significant cultural elements. Think of one big website in many languages, or several interrelated websites, one country/language each; now think of how to maintain the integrity & consistency of such resources, effectively & over a long period of time, and how to detect & repair gaps or inconsistencies. So, beyond the “translation” step: design, authoring, versioning & maintenance of (multiple, parallel, interconnected …) websites, portals or repositories in a distributed collaborative environment, possibly across organisational boundaries.

17 Outcome c) NOE
Community building & networking (1 or 2 NoEs, up to 6 M). Reinvigorate Europe’s machine translation (MT) community: bring together key players from scientific, technical & commercial circles (esp. SMEs); stimulate cross-border cooperation (teams, institutions, national initiatives); assess skills, foster training & exchanges; support smaller teams & not well-served languages; identify gaps, establish roadmap encompassing technologies, resources & applications. Build bridges between MT & MLT community and other relevant disciplines: stimulate dialogue between diverse communities; identify opportunities & bottlenecks; initiate integrative research, prepare the ground for further collaboration; explore medium to long term approaches, identify possible shifts in paradigm. Develop & coordinate shared technical infrastructure, reusability & interoperability, evaluation: infrastructural support: portal services, inventories & repositories of general interest; tools & raw/annotated datasets, their documentation; active promotion of reusability & open-source; harmonisation of representation & annotation schemes; foster widely recognized benchmarks ...
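The indicative amounts quoted for the three outcomes (IP up to 8 M, STRs c 12 M, NoEs up to 6 M) can be checked against the 26 Meuro Call 4 budget given on the practical-info slide. The sketch below only adds up the slide figures. The dictionary keys are labels chosen here for illustration, and the per-STR average is a rough derived number, not something the slides state.

```python
# Indicative amounts per outcome, in Meuro, as quoted on slides 14, 15 and 17.
indicative_meuro = {
    "a) IP, core research (one project)": 8,
    "b) STR, problem oriented research (~5 projects)": 12,
    "c) NoE, community building (1 or 2 networks)": 6,
}

CALL4_BUDGET_MEURO = 26  # "budget: 26 Meuro under Call 4" (slide 19)

total = sum(indicative_meuro.values())
print(total)                        # 26
print(total == CALL4_BUDGET_MEURO)  # True

# Rough average per STR if exactly 5 are funded (the slide says "~5").
avg_per_str = 12 / 5
print(avg_per_str)                  # 2.4
```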

18 What we don’t do
Not supported under Call 4: approaches that do not promise to deliver performance along with portability, scalability & maintainability (yes: emphasis on automation, flexibility & cost effectiveness); developments addressing immediate commercial concerns (no: adding a language pair to an existing product); proposals that do not address « language transfer » (yes: focus on mapping a source language into one or several target languages); issues covered by other Challenges and Objectives (no: HMI, interaction with robots, ambient intelligence …); topics well covered by recent & ongoing projects (no: sign languages, dialogue systems …).

19 Practical info
FP7-ICT Objective 2.2 – Language based interaction; budget: 26 Meuro under Call 4; managed by: Unit E1; EC contact: Mr Kimmo Rossi; Email: infso-e1@ec.europa.eu; inquiries: available; pre-proposals: from Dec 1st until 3 weeks before the call closing date (Apr 1st); Language Technology Days: January 2009, Luxbg; ICT Proposers’ Day: 22 January 2009, Budapest
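The pre-proposal window is stated relative to the call closing date, so the cutoff can be worked out with simple date arithmetic. This is only a sketch under two assumptions not spelled out on the slide: that "Dec 1st" means 1 December 2008 (the talk was given in November 2008), and that "3 weeks" means exactly 21 calendar days.

```python
from datetime import date, timedelta

CALL_CLOSING = date(2009, 4, 1)         # "call due to close 1 April, 2009"
PRE_PROPOSALS_OPEN = date(2008, 12, 1)  # assumption: "Dec 1st" is in 2008

# Assumption: "3 weeks before" is taken as exactly 21 calendar days.
pre_proposal_cutoff = CALL_CLOSING - timedelta(weeks=3)
window_days = (pre_proposal_cutoff - PRE_PROPOSALS_OPEN).days

print(pre_proposal_cutoff.isoformat())  # 2009-03-11
print(window_days)                      # 100
```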

20 Web sources
INFSO.E1 website (under construction): cordis.europa.eu/fp7/ict/language-technologies/.. FP7-ICT: ../fp7-call4_en.html; ICT-PSP: ../cip-psp_en.html. Events & Presentations; Call guidance notes; Background material & useful links … EC contact: Mrs Susan Fraser

21 II. Practicalities & Success Factors
FP7-ICT Call

22 LT Days
14-15 January, 2009; Luxembourg, JMO conference complex. EC presentations, sessions w/ext speakers, proposal clinics, self-presentations & posters. Agenda & registrations: cordis.europa.eu/fp7/ict/language-technologies/fp7-call4_en.html

23 Pre-proposals & Clinics
3 pages max, mail to:
Describe the problem your proposal addresses: in particular specify the intended user profile and related tasks; describe actual or prospective applications; detail data sets: source(s), typology, volume. How will the proposed project contribute to the outcomes and impacts set out in the work programme? What are the key innovations? What will be the main concrete results? What public outputs are foreseen? What impact do you expect? Describe the consortium: give partners' names or profiles and the intended skills mix. Indicate the intended instrument (if known). Indicate the scale of your ambition: what is the estimated effort (man-months)? How long will the proposed project last? What amount of EU funding are you looking for?

24 Overall approach
research for a purpose, problem driven; centred around people & tasks, data & flows; a compelling use case is as important as the underlying research; meaningful demonstrator(s); field validation & assessment; active promotion & dissemination of results beyond purely scientific circles; public outputs, public final showcase

25 Instruments IP up to 4 years, 5-8 Meuro (EU funding) NoE

26 Partnerships
Keep the consortium manageable: IPs 7-11 partners; STRs 5-7 partners; NoEs “core” partners. Select competent, committed & reliable partners; geography not an issue! Industry, SME, academia … participation as dictated by project needs; user/industrial/commercial organisations to provide a demanding problem & validation context.

27 Language coverage
Most of the work is expected to be language independent: flexibility & ease of adaptation to other languages are indeed key factors; many of the ancillary tasks & tools are language independent anyway. Project outcomes must however be validated in 3+ languages, preferably belonging to different linguistic families. Target languages are chosen & justified by the proposers, bearing in mind the following priorities (from high to low): EU official languages; nationally recognised languages; regional languages; minority languages. Non-EU world languages linked to global markets & exports can be considered as well, on a proposal by proposal basis.

28 Target industrial sectors
Look for: huge & growing data volumes; competitive pressure; high growth & innovation; international markets. Obvious candidates: ICT & media; manufacturing; process industries, eg pharmaceuticals; energy & utilities; engineering & construction; financial services …

29 Reasons for failure
RTD content: narrow scope, little or no EU dimension; lack of focus, aims too general; lack of innovation, current state of art missing. Planning: links missing between objectives & work plan; milestones missing or too general; risk factors not addressed, no contingency plans; no monitorable indicators, no metrics. Management: consortium not balanced, gaps in the skills mix; lack of integration between partners; vague management structure; weak or narrow dissemination plans; ill-defined exploitation prospects.

30 Success factors .1
Quality, Impact, Effectiveness, but also Relevance wrt. WP and Credibility. Evaluators will have access to Web sources: previous projects, teams & skills, background & reference documents …

31 Success factors .2
It’s a project, not a dissertation: problem? user? data? outputs (incl. public ones)? metrics? impact? exploitation channels?

32 Success factors .3
Preserve your credibility: select one proposal & make it win; ensure that the proposal brings out both innovation & exploitation potential; full depth of participation rather than long list of organisations with limited involvement; key individuals, expertise & achievements rather than long list of previous projects; make the proposal compelling for a busy reader (the first 5-10 pages are key!)

33 Time schedule
call due to close 1 April, 2009; evaluation & selection until end June; negotiation from mid-July on; contract awarding in December; projects due to start Q1 2010 … highly selective & demanding process

34 ICT-PSP Call Overview (subject to forthcoming adoption of WP, call budget & schedule)

35 ICT-PSP Call 3, Q1 09
ICT Policy Support Programme (PSP) within the Competitiveness & Innovation Framework Programme (CIP) (adopted in October 2006), geared towards innovation & ICT uptake: development of the Single European information space; strengthening of the internal market for ICT products and services and ICT-based products and services; stimulation of innovation through the wider adoption of and investment in ICT; ensure seamless access to ICT-based services; improve the conditions for the development of digital content, taking into account multilingualism & cultural diversity. Takes over eContentplus activities from Jan 2009.

36 “Europe’s language is Translation”
Translation & interpretation market (exc. in-house): c $15 billion; €1.1 billion for EU institutions alone (2006). Top EU-based translation company posted a revenue of $175 million in 2006. Market fragmentation: big players < 1000 employees; est. 300,000 full time salaried translators worldwide (37% in Europe). A good European base: SDL, Star, RWS, XRX, Euroscript, Logos, Moravia, VistaTEC, Semantix … ESTeam, Lucy Software … A largely untapped potential: 4x according to some companies.

37 Business world
New models: “Most companies follow the age-old translate-edit-proofread model of translation. Collaborative, web-based technologies allow translation to become more agile, faster, and better with fewer steps” (CSA Inc.). New markets: “Language Weaver is entering the three new strategic markets – Web Content, Business Intelligence and Customer Care – to provide high-volume, high-speed, and accurate automated translation solutions at a price that would have been unfathomable just a few years ago”. New approaches: “If you don't see your native language here, you can help Google create it by becoming a volunteer translator. Check out our Google in Your Language program”. And then of course: “Unfortunately for Google as a person with 7 years of translation experience myself I can tell that you will hardly ever find a translator who will agree that machine translation can be useful for anything.” (a Russian translator)

38 ICT-PSP Call 3, Theme 5: Multilingual Web
3 objectives: machine translation for the multilingual Web (pilot projects); multilingual Web content management (pilot projects); standards & best practices for the multilingual Web (thematic network). 14 Meuro in total, around 6 projects. “The duration of the pilot is expected to be 24 to 36 months within which there should be a 12-month operational phase.”

39 ICT-PSP Call 3, Theme 5: Multilingual Web
Research: no, at least not ICT research … Development/engineering: optimisation, customisation, integration … of existing (state of the art) methods, tools & services with a view to defining new approaches, offerings & practices. Demonstration: innovative combination is key; new business models, processes & services, organisational setups, usability … Evaluation along user, technical & (socio-)economic dimensions. Problem orientation: useful & useable although possibly not perfect; think ROI.

40 Scope & defs
MT as defined in the ICT-PSP workprogramme encompasses: 1. fully automatic machine translation, whatever the technology; 2. interactive computer-aided translation (eg TM); 3. a suitable combination of 1. and/or 2. with web based human translation, proof-reading & post-editing, incl. where relevant methods inspired from social networks, workflow & content management systems, … Innovative & effective combination of people, processes & technology; the end result is not science, rather more and/or better output: save time, cut cost. Emphasis on language transfer, from source language to target language(s); language input-output (e.g. speech-to-text) is not the focus. Cross-platform, multi-format content access/delivery is key.

41 Language coverage
Some of the work is expected to be language independent: flexibility & ease of adaptation to other languages are key factors; content authoring & management, collaboration & workflow … are language independent anyway. Project outcomes must be validated in 3+ languages, preferably belonging to different linguistic families. Target languages are chosen & justified by the proposers, bearing in mind the following priorities (from high to low): EU official languages; nationally recognised languages; regional languages; minority languages. Non-EU world languages linked to global markets & exports can be considered as well, on a proposal by proposal basis.

42 Cont’d
Project’s language coverage driven by the need to: address gaps & overcome barriers, e.g. cross-border communication for less-developed languages, or exploit opportunities, e.g. address emerging markets & sizeable language communities. Impact is key, so: viability, sustainability, exploitation channels, deployment prospects …; main findings must be pro-actively disseminated; some form of public showcase is mandatory. Participants should include: private or public sector content owners & aggregators; providers of language services, technology suppliers; (online) communities of interest where relevant. 6-7 partners/project, up to €2.5 million funding, up to 36 months.

43 ICT-PSP Call 3 Feb 09
3 intertwined objectives: 5.1 machine translation for the multilingual Web (projects): information access: MT and other multilingual solutions for information access & use, esp. cross-lingual search & retrieval; information publishing: MT to create, distribute and (re-)use more widely & effectively online content in a multilingual environment. 5.3 multilingual Web content management (projects): communication: multilingual Web content development & management; design, authoring, versioning & maintenance of multilingual Web sites, portals or repositories. 5.2 standards & best practices for the multilingual Web (network): conventions & best practices for multilingual Web content.

44 ICT-PSP, 5.3 multilingual Web content management
Methods, techniques, metrics … for developing & managing multilingual web content & services; much more than translation; significant cultural elements. Think of one big website in many languages, or several interrelated websites, one country/language each; now think of how to maintain the integrity & consistency of such resources, effectively & over a long period of time, and how to detect & repair gaps or inconsistencies. So, beyond the “translation” step (obj 5.1): design, authoring, versioning & maintenance of (multiple, parallel, interconnected …) websites, portals or repositories in a distributed collaborative environment, possibly across organisational boundaries, so as to turn a multi-million endeavour into a viable proposition for a much broader range of companies & administrations.

45 ICT-PSP, 5.1 machine translation for the multilingual Web
5.1 can be seen as a subset & central component of obj 5.3 (its “translation box”). Different usages: web at large, enterprise, public information repositories … Different users: teams as well as individuals, engineers as well as analysts, sales & marketing, language professionals, … you & me. Different content rich, information bound sectors, private & public. Quality depends on task & user: from raw translation & “gisting” up to error-free translation. Two important conditions: widely recognised, well argued problem, clearly identified target community; thorough validation in a given domain / for a given task: volume, metrics.

46 ICT-PSP, 5.2 standards & best practices
Thematic network: covers the same broad issues as 5.3, “the web as THE vehicle for multilingual content & services”; provides a forum for multilateral exchange of experience & consensus building. Structure & tasks to be defined by the proposers; indicative list: bring together a meaningful subset of the main stakeholders, possibly through their own groups & associations (ICT & language industries, content aggregators/distributors, e-services, multinational agencies, industry & de-jure standards bodies …); analyse current situation, identify gaps & bottlenecks; assess market failures if any, specify technical & non-technical conditions to be met and the respective actors; establish roadmap (trends, requirements, dependencies …) for further developments in the coming years; stimulate consensus & active involvement/coordination; take part in leading conferences, liaise with primary associations etc.; explore means to promote best practice (conferences, portals, publications, training …) beyond current channels; propose suitable follow-on actions.

47 ICT-PSP Instruments & Funding
Pilot B projects: min. 4 partners from 4 different countries; 50% of eligible direct costs; flat 30% overhead rate of personnel costs. Thematic networks: min. 7 partners from 7 different countries; lump sum; for 3 years and 1+10 participants: coordinator: 95 Keuro, other participants: 24 Keuro each. ec.europa.eu/information_society/activities/ict_psp/participating/index_en.htm
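For the thematic network, the lump-sum example on this slide (3 years, one coordinator plus 10 other participants) adds up as sketched below. The function name `network_lump_sum` is made up for illustration. The slide quotes figures only for the 1+10 case, so applying the same per-participant amounts to other network sizes or durations is an assumption, not something the slide confirms.

```python
COORDINATOR_KEURO = 95   # coordinator lump sum quoted on the slide
PARTICIPANT_KEURO = 24   # lump sum per other participant quoted on the slide


def network_lump_sum(other_participants: int) -> int:
    """Total lump sum in Keuro for one coordinator plus the given number
    of other participants, using the slide's 3-year example figures."""
    return COORDINATOR_KEURO + other_participants * PARTICIPANT_KEURO


# The case given on the slide: 1 coordinator + 10 participants, 3 years.
print(network_lump_sum(10))  # 335
```

So the quoted example corresponds to 335 Keuro in total for the network.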

48 Practical info
ICT-PSP Theme 5 – Multilingual Web; budget: 14 Meuro under Call 3; managed by: Unit E1; EC contact: Mr Kimmo Rossi; Email: infso-e1@ec.europa.eu; inquiries: from the call publication date (~Feb); pre-proposals: from publication until 3 weeks before the call closing date

49 Events
Language Technology Days: 14-15 Jan 2009, Luxbg; ICT-PSP Info Day: 26 Jan 2009, Brussels (tbc); Email: INFSO-E1@ec.europa.eu; URL: cordis.europa.eu/fp7/ict/language-technologies/.. FP7-ICT: ../fp7-call4_en.html; ICT-PSP: ../cip-psp_en.html

50 Thank you!

