1
Language based Interaction
ICT 2008, Lyon, 26 Nov 08
Roberto Cencioni, Kimmo Rossi
Challenge 2 – Objective 2.2: Language based Interaction
DG Information Society and Media, Unit INFSO.E1
Language Technologies & Machine Translation
2
Outline
Opening remarks
FP7 ICT Call 4 – Essence
FP7 ICT Call 4 – Ingredients
Q&A
CIP ICT-PSP Call 3 – Opportunities
Q&A, close
3
Here we are
a new unit established in July 2008: Language Technologies & Machine Translation (INFSO.E1)
high expectations vs. low rate of EC S&T activity in the last few years
language is everywhere
  written & spoken; documents, messages, databases, webpages, multimedia objects etc.; information as well as meta-information
but our resources are limited, so the initial focus is on multilingual technologies, services, applications
two instruments in 2009:
  Research: FP7 ICT, call 4, Objective 2.2 – Language based Interaction
  Innovation: CIP ICT-PSP, call 3, Theme 5 – Multilingual Web
total budget of 40 Meuro
4
Baseline
Why?
  new online paradigms centred around communication, collaboration, co-creation …
  but significant language barriers remain
  EU comprises 27 countries & 23 official languages
  single European Information Space – one of the i2010 objectives
  EC communication on Multilingualism (Sept ‘08) calls for a broader policy framework & joint action
Purpose: support & enhance
  interpersonal & business communication
  information access & publishing
across languages
5
A few facts
EU official languages: 23 x 22 = 506 pairs
  EC MT (Systran core engine) has 18 pairs in operation & 10 more pairs at prototype stage
60+ national, regional & minority languages within the EU
English accounts for 30% of today’s Web content (50% in 2000, 35% in 2004)
  Arabic, Chinese, Portuguese … growing very fast
nearly 1.5 billion internet users worldwide (2008)
  c. 320 million native EN speakers in the world
basic requirements for the “digital translation market”: volume, access, personalisation; real quick, real cheap
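The pair count on this slide treats each translation direction separately, which is why 23 languages give 23 x 22 pairs. A minimal sketch of that arithmetic follows; the language codes `L01` to `L23` are placeholders rather than real language identifiers, and the coverage figure simply adds the 18 operational and 10 prototype pairs quoted above.

```python
from itertools import permutations


def directed_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Placeholder codes standing in for the 23 official languages (assumption).
languages = [f"L{i:02d}" for i in range(1, 24)]
pairs = directed_pairs(languages)
print(len(pairs))  # 506, i.e. 23 x 22

# Pairs quoted on the slide for EC MT: 18 in operation plus 10 prototypes.
covered = 18 + 10
print(f"{covered / len(pairs):.1%}")  # 5.5%
```

Counting unordered pairs instead would halve the total to 253; the slide's figure implies that translation from A to B and from B to A are treated as separate systems.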
6
Can’t this be done elsewhere?
indeed EU RTD projects often exhibit multilingual features
yet approaches are too often naïve, short term, sectoral
hence a dedicated focal point
  stimulating upstream research
  enhancing research capacity
thus enabling more ambitious & impactful domain specific actions
7
Research vs. Innovation division of labour
from long term foundational research (FP7)
through applied research & technology development (FP7)
to integration & demonstration (FP7 + PSP)
infrastructure & resources (FP7 + PSP)
different scale of ambition (€)
different level of maturity (technology → service)
different timescales & partnerships
8
FP7-ICT Call I. Workprogramme R&D topics & outcomes
9
What technology can offer today
machine translation & translation memory
  making sense of online content
  improving productivity of human translation
  automatic translation of “acceptable” quality in specific domains / language pairs
information search & retrieval
  find relevant information across languages
information extraction, filtering, categorisation
  incl. summarization, routing & alert services, …
  for a variety of purposes, e.g. business intelligence
speech technology
  command & control, dictation systems
  call center services, conversational systems
10
Trends
new requirements, new approaches
  from Web 1.X to Web 2.0 – we are all content producers
  from static & uni-directional to dynamic, volatile, collaborative
  from service to self-service, translations are needed “on the fly”
  are language technologies up to the task?
what happens to online content
  disappearing document? Europa website: 6 million “documents”
  elusive distinction between content & service
  how to manage effectively multilingual content
multilingualism on the rise
  in the EU (from 4 to 23 languages) and globally
  English gains ground but mother tongues remain
  online content becomes even more multilingual
11
What technology might offer tomorrow
machine translation
  MT that learns from its mistakes
  embedded in products/services, can cover any use context, esp. online: chats, blogs, dynamic content ...
  broader coverage, fill in missing languages
information search & retrieval
  truly multilingual access to information: query in any language, content automatically translated
website content development & management
  new content is translated automatically
  changes automatically applied in all language versions
speech technology
  real-time speech-to-speech translation (e.g. phone call, in a conference)
12
Challenges for MT
bring MT to the users
  understand what users need
  novel use scenarios
  communication rather than translation
  better evaluation metrics
MT that learns & adapts
  how to exploit feedback from users
  how to use readily available “world knowledge”
towards a paradigm shift?
  inspiration from: machine learning, cognitive systems, psycho-linguistics, sociology, semantic web, data mining, new computing paradigms ...
13
FP7-ICT Call 4 at a glance
Core research exploring new avenues for machine translation (IP)
  ground breaking, multidisciplinary, high risk – high promise research
  architectures & technologies that learn and adapt flexibly & effectively to different languages, domains & tasks
  catering for new forms of language & communication (e.g. online communities; dynamic, volatile …)
Problem oriented research for specific tasks & usage contexts (STR)
  online translation for the masses
  translation in distributed collaborative environments
  managing multilingual communication & content
  automatic acquisition & annotation of language resources
Community building & networking (NOE)
  reinvigorate European machine translation (MT) community
  build bridges between MT & MLT and other relevant disciplines
  help develop & coordinate shared technical infrastructure, promote reusability & interoperability, foster evaluation
14
Outcome a) IP
Core research. Explore new research avenues (one IP, up to 8 M)
break new ground, foster a novel multi-disciplinary approach to machine translation
  architectures & technologies that can learn and adapt flexibly & effectively to different languages, domains & tasks
  catering for new forms of language & communication (e.g. online communities)
high risk but high promise (accuracy, speed, scalability)
  language & translation models coupled with data driven, machine learning methods
  automatic acquisition & representation of linguistic facts
  semantics, models of world knowledge relevant for translation
  approaches inspired from social networks …
15
Outcome b) STR
Problem oriented research. A clearly defined usage context (~5 STRs, c. 12 M)
online translation for the masses
  wide coverage (beyond GoogleTranslate); adequate quality, suitable at least for gisting/browsing; language embedded in documents, web pages, multimedia objects …
translation in distributed environments
  support non-linear collaborative interplay between authors, translators, editors/publishers & active users; innovative integration of automatic, interactive & human translation beyond current practice; technologies as well as processes & social interaction
managing multilingual content & communication
  a superset of the above addressing the development & management of online content & services, esp. their versioning & maintenance in multiple languages
acquisition & annotation of language resources
  (nearly-)automatic, high volume, high performance; mining the web as well as available repositories (e.g. corpora) and public information sources
16
Outcome b) managing multilingual Web content
methods, techniques, metrics … for developing & managing multilingual web content & services
much more than translation; significant cultural elements
think of one big website in many languages, or several interrelated websites, one country/language each
now think of how to maintain the integrity & consistency of such resources, effectively & over a long period of time
and how to detect & repair gaps or inconsistencies
so, beyond the “translation” step: design, authoring, versioning & maintenance of (multiple, parallel, interconnected …) websites, portals or repositories
in a distributed collaborative environment, possibly across organisational boundaries
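To make "detect gaps or inconsistencies" concrete, here is a minimal sketch of one possible check. It is not taken from the work programme: the `PageVersion` record, the `source_revision` counter and the `find_gaps` helper are all hypothetical, and the sketch assumes each language version records which revision of the source page it was produced from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageVersion:
    page_id: str
    lang: str
    source_revision: int  # revision of the source text this version reflects


def find_gaps(versions, languages, source_lang):
    """Return (missing, stale) lists of (page_id, lang) pairs."""
    by_key = {(v.page_id, v.lang): v for v in versions}
    missing, stale = [], []
    source_pages = sorted(v.page_id for v in versions if v.lang == source_lang)
    for page_id in source_pages:
        current = by_key[(page_id, source_lang)].source_revision
        for lang in languages:
            if lang == source_lang:
                continue
            version = by_key.get((page_id, lang))
            if version is None:
                missing.append((page_id, lang))  # no version in this language
            elif version.source_revision < current:
                stale.append((page_id, lang))  # lags behind the source text
    return missing, stale


versions = [
    PageVersion("home", "en", 3),
    PageVersion("home", "fr", 3),
    PageVersion("home", "de", 2),
    PageVersion("contact", "en", 1),
    PageVersion("contact", "fr", 1),
]
missing, stale = find_gaps(versions, ["en", "fr", "de"], source_lang="en")
print(missing)  # [('contact', 'de')]
print(stale)    # [('home', 'de')]
```

A real multilingual site would also need to handle pages with no single source language and cultural adaptations that are deliberately not one-to-one, which is part of why the slide calls this "much more than translation".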
17
Outcome c) NOE
Community building & networking (1 or 2 NoEs, up to 6 M)
reinvigorate Europe’s machine translation (MT) community
  bring together key players from scientific, technical & commercial circles (esp. SMEs)
  stimulate cross-border cooperation (teams, institutions, national initiatives)
  assess skills, foster training & exchanges; support smaller teams & not well-served languages
  identify gaps, establish roadmap encompassing technologies, resources & applications
build bridges between MT & MLT community and other relevant disciplines
  stimulate dialogue between diverse communities; identify opportunities & bottlenecks
  initiate integrative research, prepare the ground for further collaboration
  explore medium to long term approaches, identify possible shifts in paradigm
develop & coordinate shared technical infrastructure, reusability & interoperability, evaluation
  infrastructural support: portal services, inventories & repositories of general interest
  tools & raw/annotated datasets, their documentation
  active promotion of reusability & open-source; harmonisation of representation & annotation schemes
  foster widely recognized benchmarks ...
18
What we don’t do
Not supported under Call 4:
approaches that do not promise to deliver performance along with portability, scalability & maintainability
  yes: emphasis on automation, flexibility & cost effectiveness
developments addressing immediate commercial concerns
  no: adding a language pair to an existing product
proposals that do not address « language transfer »
  yes: focus on mapping a source language into one or several target languages
issues covered by other Challenges and Objectives
  no: HMI, interaction with robots, ambient intelligence …
topics well covered by recent & ongoing projects
  no: sign languages, dialogue systems …
19
Practical info
FP7-ICT Objective 2.2 – Language based interaction
budget: 26 Meuro under Call 4
managed by: Unit E1
EC contact: Mr Kimmo Rossi
Email: infso-e1@ec.europa.eu
inquiries: available
pre-proposals: from Dec 1st until 3 weeks before the call closing date (Apr 1st)
Language Technology Days: January 2009, Luxbg
ICT Proposers’ Day: 22 January 2009, Budapest
20
Web sources
INFSO.E1 website (under construction): cordis.europa.eu/fp7/ict/language-technologies/..
  FP7-ICT: ../fp7-call4_en.html
  ICT-PSP: ../cip-psp_en.html
Events & Presentations
Call guidance notes
Background material & useful Links …
EC contact: Mrs Susan Fraser
21
FP7-ICT Call II. Practicalities & Success Factors
22
LT Days
14-15 January, 2009
Luxembourg, JMO conference complex
EC presentations, sessions w/ ext speakers, proposal clinics, self-presentations & posters
Agenda & registrations: cordis.europa.eu/fp7/ict/language-technologies/fp7-call4_en.html
23
Pre-proposals & Clinics
3 pages max, mail to:
describe the problem your proposal addresses, in particular
  specify the intended user profile and related tasks
  describe actual or prospective applications
  detail data sets: source(s), typology, volume
how will the proposed project contribute to the outcomes and impacts set out in the work programme?
  what are the key innovations?
  what will be the main concrete results? what public outputs are foreseen?
  what impact do you expect?
describe the consortium
  give partners' names or profiles and the intended skills mix
  indicate the intended instrument (if known)
indicate the scale of your ambition
  what is the estimated effort (man-months)?
  how long will the proposed project last?
  what amount of EU funding are you looking for?
24
Overall approach
research for a purpose, problem driven
centred around people & tasks, data & flows
a compelling use case is as important as the underlying research
meaningful demonstrator(s)
field validation & assessment
active promotion & dissemination of results beyond purely scientific circles
public outputs, public final showcase
25
Instruments IP up to 4 years, 5-8 Meuro (EU funding) NoE
26
Partnerships
keep the consortium manageable:
  IPs 7-11 partners
  STRs 5-7 partners
  NoEs “core” partners
select competent, committed & reliable partners; geography not an issue!
industry, SME, academia … participation as dictated by project needs
user/industrial/commercial organisations to provide a demanding problem & validation context
27
Language coverage
most of the work is expected to be language independent
  flexibility & ease of adaptation to other languages are indeed key factors
  many of the ancillary tasks & tools are language independent anyway
project outcomes must however be validated in 3+ languages, preferably belonging to different linguistic families
target languages are chosen & justified by the proposers bearing in mind the following priorities (from high to low):
  EU official languages
  nationally recognised languages
  regional languages
  minority languages
Non-EU world languages linked to global markets & exports can be considered as well on a proposal by proposal basis
28
Target industrial sectors
look for
  huge & growing data volumes
  competitive pressure
  high growth & innovation
  international markets
obvious candidates
  ICT & media
  manufacturing
  process industries, e.g. pharmaceuticals
  energy & utilities
  engineering & construction
  financial services
  …
29
Reasons for failure
RTD content
  narrow scope, little or no EU dimension
  lack of focus, aims too general
  lack of innovation, current state of art missing
planning
  links missing between objectives & work plan
  milestones missing or too general
  risk factors not addressed, no contingency plans
  no monitorable indicators, no metrics
management
  consortium not balanced, gaps in the skills mix
  lack of integration between partners
  vague management structure
  weak or narrow dissemination plans
  ill-defined exploitation prospects
30
Success factors .1
Quality
Impact
Effectiveness
but also
  Relevance wrt. WP
  Credibility
Evaluators will have access to Web sources: previous projects, teams & skills, background & reference documents …
31
Success factors .2
It’s a project, not a dissertation:
problem? user? data? outputs (incl. public ones)? metrics? impact? exploitation channels? …
32
Success factors .3
preserve your credibility: select one proposal & make it win
ensure that the proposal brings out both innovation & exploitation potential
full depth of participation rather than long list of organisations with limited involvement
key individuals, expertise & achievements rather than long list of previous projects
make the proposal compelling for a busy reader (the first 5-10 pages are key!)
33
Time schedule
call due to close 1 April, 2009
evaluation & selection until end June
negotiation from mid-July on
contract awarding in December
projects due to start Q1 2010
… highly selective & demanding process
34
ICT-PSP Call Overview (subject to forthcoming adoption of WP, call budget & schedule)
35
ICT-PSP Call 3, Q1 09
ICT Policy Support Programme (PSP) within the Competitiveness & Innovation Framework Programme (CIP) (adopted in October 2006)
geared towards innovation & ICT uptake:
  development of the Single European information space
  strengthening of the internal market for ICT products and services and ICT-based products and services
  stimulation of innovation through the wider adoption of and investment in ICT
  ensure seamless access to ICT-based services
  improve the conditions for the development of digital content, taking into account multilingualism & cultural diversity
Takes over eContentplus activities from Jan 2009
36
“Europe’s language is Translation”
translation & interpretation market (exc. in-house): c. $15 billion; €1.1 billion for EU institutions alone (2006)
top EU-based translation company posted a revenue of $175 million in 2006
market fragmentation
  big players < 1000 employees
  est. 300,000 full time salaried translators worldwide (37% in Europe)
a good European base
  SDL, Star, RWS, XRX, Euroscript, Logos, Moravia, VistaTEC, Semantix …
  ESTeam, Lucy Software …
a largely untapped potential
  4x according to some companies
37
Business world
new models: “Most companies follow the age-old translate-edit-proofread model of translation. Collaborative, web-based technologies allow translation to become more agile, faster, and better with fewer steps” (CSA Inc.)
new markets: “Language Weaver is entering the three new strategic markets – Web Content, Business Intelligence and Customer Care – to provide high-volume, high-speed, and accurate automated translation solutions at a price that would have been unfathomable just a few years ago”
new approaches: “If you don't see your native language here, you can help Google create it by becoming a volunteer translator. Check out our Google in Your Language program”
and then of course: “Unfortunately for Google as a person with 7 years of translation experience myself I can tell that you will hardly ever find a translator who will agree that machine translation can be useful for anything.” (a Russian translator)
38
ICT-PSP Call 3, Theme 5: Multilingual Web
3 objectives:
  machine translation for the multilingual Web (pilot projects)
  multilingual Web content management (pilot projects)
  standards & best practices for the multilingual Web (thematic network)
14 Meuro in total, around 6 projects
“The duration of the pilot is expected to be 24 to 36 months within which there should be a 12-month operational phase.”
39
ICT-PSP Call 3, Theme 5: Multilingual Web
research: no, at least not ICT research …
development/engineering: optimisation, customisation, integration … of existing (state of the art) methods, tools & services with a view to defining new approaches, offerings & practices
demonstration: innovative combination is key; new business models, processes & services, organisational setups, usability …
evaluation along user, technical & (socio-)economic dimensions
problem orientation: useful & useable although possibly not perfect; think ROI
40
Scope & defs
MT as defined in the ICT-PSP workprogramme encompasses
  1. fully automatic machine translation, whatever the technology
  2. interactive computer-aided translation (e.g. TM)
  3. a suitable combination of 1. and/or 2. with web based human translation, proof-reading & post-editing, incl. where relevant methods inspired from social networks, workflow & content management systems, …
innovative & effective combination of people, processes & technology; the end result is not science, rather
  more and/or better output
  save time
  cut cost
emphasis on language transfer, from source language to target language(s)
  language input-output (e.g. speech-to-text) is not the focus
cross-platform, multi-format content access/delivery is key
41
Language coverage
some of the work is expected to be language independent
  flexibility & ease of adaptation to other languages are key factors
  content authoring & management, collaboration & workflow … are language independent anyway
project outcomes must be validated in 3+ languages, preferably belonging to different linguistic families
target languages are chosen & justified by the proposers bearing in mind the following priorities (from high to low):
  EU official languages
  nationally recognised languages
  regional languages
  minority languages
Non-EU world languages linked to global markets & exports can be considered as well on a proposal by proposal basis
42
Cont’d
project’s language coverage driven by the need to:
  address gaps & overcome barriers, e.g. cross-border communication for less-developed languages, or
  exploit opportunities, e.g. address emerging markets & sizeable language communities
impact is key, so: viability, sustainability, exploitation channels, deployment prospects …
  main findings must be pro-actively disseminated
  some form of public showcase is mandatory
participants should include
  private or public sector content owners & aggregators
  providers of language services, technology suppliers
  (online) communities of interest where relevant
6-7 partners/project, up to €2.5 million funding, up to 36 months
43
ICT-PSP Call 3 Feb 09
3 intertwined objectives:
5.1 machine translation for the multilingual Web (projects)
  information access: MT and other multilingual solutions for information access & use, esp. cross-lingual search & retrieval
  information publishing: MT to create, distribute and (re-)use more widely & effectively online content in a multilingual environment
5.3 multilingual Web content management (projects)
  communication: multilingual Web content development & management; design, authoring, versioning & maintenance of multilingual Web sites, portals or repositories
5.2 standards & best practices for the multilingual Web (network)
  conventions & best practices for multilingual Web content
44
ICT-PSP, 5.3 multilingual Web content management
methods, techniques, metrics … for developing & managing multilingual web content & services
much more than translation; significant cultural elements
think of one big website in many languages, or several interrelated websites, one country/language each
now think of how to maintain the integrity & consistency of such resources, effectively & over a long period of time
and how to detect & repair gaps or inconsistencies
so, beyond the “translation” step (obj 5.1): design, authoring, versioning & maintenance of (multiple, parallel, interconnected …) websites, portals or repositories
in a distributed collaborative environment, possibly across organisational boundaries
so as to turn a multi-million endeavour into a viable proposition for a much broader range of companies & administrations
45
ICT-PSP, 5.1 machine translation for the multilingual Web
5.1 can be seen as a subset & central component of obj 5.3 (its “translation box”)
different usages: web at large, enterprise, public information repositories …
different users: teams as well as individuals, engineers as well as analysts, sales & marketing, language professionals, … you & me
different content rich, information bound sectors, private & public
quality depends on task & user
  from raw translation & “gisting” up to error-free translation
two important conditions:
  widely recognised, well argued problem; clearly identified target community
  thorough validation in a given domain / for a given task
volume metrics
46
ICT-PSP, 5.2 standards & best practices
Thematic network covers the same broad issues as 5.3
  “the web as THE vehicle for multilingual content & services”
provides a forum for multilateral exchange of experience & consensus building
structure & tasks to be defined by the proposers, indicative list:
  bring together a meaningful subset of the main stakeholders, possibly through their own groups & associations
    ICT & language industries, content aggregators/distributors, e-services, multinational agencies, industry & de-jure standards bodies …
  analyse current situation, identify gaps & bottlenecks; assess market failures if any, specify technical & non-technical conditions to be met and the respective actors
  establish roadmap (trends, requirements, dependencies …) for further developments in the coming years
  stimulate consensus & active involvement/coordination; take part in leading conferences, liaise with primary associations etc.
  explore means to promote best practice (conferences, portals, publications, training …) beyond current channels
  propose suitable follow-on actions
47
ICT-PSP Instruments & Funding
pilot B projects:
  min. 4 partners from 4 different countries
  50% of eligible direct costs
  flat 30% overhead rate of personnel costs
thematic networks:
  min. 7 partners from 7 different countries
  lump sum; for 3 years and 1+10 participants:
    coordinator: 95 Keuro
    other participants: 24 Keuro each
ec.europa.eu/information_society/activities/ict_psp/participating/index_en.htm
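As a quick check of the thematic-network example, the lump sum for one coordinator plus ten other participants can be worked out as below. This is only a sketch: the function name is hypothetical, and it assumes the quoted amounts are totals per participant for the whole 3-year period, which the slide does not spell out.

```python
def network_lump_sum_keuro(other_participants, coordinator_keuro=95, participant_keuro=24):
    """Total lump sum in Keuro: one coordinator plus the other participants.

    Assumes the slide's figures are totals per participant for the 3 years.
    """
    return coordinator_keuro + other_participants * participant_keuro


total = network_lump_sum_keuro(10)  # the "1+10 participants" case from the slide
print(total)  # 335
```

On that reading, the example network would receive 335 Keuro in total.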
48
Practical info
ICT-PSP Theme 5 – Multilingual Web
budget: 14 Meuro under Call 3
managed by: Unit E1
EC contact: Mr Kimmo Rossi
Email: infso-e1@ec.europa.eu
inquiries: from the call publication date (~Feb)
pre-proposals: from publication until 3 weeks before the call closing date
49
Events
Language Technology Days: 14-15 Jan 2009, Luxbg
ICT-PSP Info Day: 26 Jan 2009, Brussels (tbc)
Email: INFSO-E1@ec.europa.eu
URL: cordis.europa.eu/fp7/ict/language-technologies/..
  FP7-ICT: ../fp7-call4_en.html
  ICT-PSP: ../cip-psp_en.html
50
Thank you!