1 Roberto Cencioni Kimmo Rossi Multilingual Web Theme 5 of the ICT-PSP Workprogramme DG Information Society and Media Unit INFSO.E1 Language Technologies & Machine Translation email@example.com ICT 2008, 26 Nov 08
2 Why? – new online paradigms centred around communication, collaboration, co-creation … but significant language barriers remain – EU comprises 27 countries & 23 official languages – single European Information Space – one of the i2010 objectives – EC communication on Multilingualism (Sept 08) calls for a broader policy framework & joint action Purpose: support & enhance – interpersonal & business communication – information access & publishing across languages Baseline
3 A few facts EU official languages: 23 x 22 = 506 pairs – EC MT (Systran core engine) has 18 pairs in operation & 10 more pairs at prototype stage – 60+ national, regional & minority languages within the EU English accounts for 30% of today's Web content – 50% in 2000, 35% in 2004 – Arabic, Chinese, Portuguese … growing very fast nearly 1.5 billion internet users worldwide (2008) – c. 320 million native EN speakers in the world basic requirements for the digital translation market: – volume – access – personalisation real quick, real cheap
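A minimal sketch of the pair arithmetic on this slide, assuming "pairs" means directed translation pairs (source to target, so EN to FR and FR to EN count separately). The function and variable names are illustrative and not taken from the presentation; the coverage percentages are derived from the slide's figures rather than stated on it.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# Figures quoted on the slide
official_languages = 23
pairs_in_operation = 18   # EC MT pairs in operation
pairs_at_prototype = 10   # additional pairs at prototype stage

total_pairs = directed_pairs(official_languages)

# Share of all directed pairs covered by EC MT (derived, not stated on the slide)
share_operational = pairs_in_operation / total_pairs
share_with_prototypes = (pairs_in_operation + pairs_at_prototype) / total_pairs

print(f"total directed pairs: {total_pairs}")
print(f"in operation: {share_operational:.1%}")
print(f"including prototypes: {share_with_prototypes:.1%}")
```

Under that reading, the pairs in operation cover only a few percent of the 506 possible directions, which is the gap the slide points at.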
4 Here we are a new unit established in July 2008 – Language Technologies & Machine Translation (INFSO.E1) – high expectations vs. low rate of EC S&T activity in the last few years language is everywhere – written & spoken; documents, messages, databases, webpages, multimedia objects etc; information as well as meta-information but our resources are limited, so initial focus on – multilingual technologies, services, applications two instruments in 2009: – Research: FP7 ICT, call 4 Objective 2.2 – Language based Interaction – Innovation: CIP ICT-PSP, call 3 Theme 5 – Multilingual Web total budget of 40 Meuro
5 Research vs. Innovation division of labour from – long term foundational research (FP7) through – applied research & technology development (FP7) to – integration & demonstration (FP7 + PSP) – infrastructure & resources (FP7 + PSP) different scale of ambition () different degree of maturity (technology → service) different timescales & partnerships
8 Pre-proposals & Clinics 3 pages max, mail to: INFSO-E1@ec.europa.eu describe the problem your proposal addresses, in particular – specify the intended user profile and related tasks – describe actual or prospective applications – detail data sets: source(s), typology, volume how will the proposed project contribute to the outcomes and impacts set out in the work programme? – what are the key innovations? – what will be the main concrete results? – what public outputs are foreseen? – what impact do you expect? describe the consortium – give partners' names or profiles and the intended skills mix – indicate the intended instrument (if known) indicate the scale of your ambition – what is the estimated effort (man-months)? – how long will the proposed project last? – what amount of EU funding are you looking for?
10 ICT-PSP Call 3, ~Feb 09 ICT Policy Support Programme (PSP) within the Competitiveness & Innovation Framework Programme (CIP) (adopted in October 2006) geared towards innovation & ICT uptake: – development of the Single European information space – strengthening of the internal market for ICT products and services and ICT-based products and services – stimulation of innovation through the wider adoption of and investment in ICT ensure seamless access to ICT-based services improve the conditions for the development of digital content, taking into account multilingualism & cultural diversity Takes over eContentplus activities from Jan 2009
11 translation & interpretation market (excl. in-house): – c. $15 billion; 1.1 billion for EU institutions alone (2006) – est. 300,000 full time salaried translators worldwide (37% in Europe) market fragmentation – big players < 1000 employees – top EU-based translation company posted a revenue of $175 million in 2006 a good European base – SDL, Star, RWS, XRX, Euroscript, Logos, Moravia, VistaTEC, Semantix … – ESTeam, Lucy Software … a largely untapped potential – 4x according to some companies Europe's language is Translation
12 Business world new models: "Most companies follow the age-old translate-edit-proofread model of translation. Collaborative, web-based technologies allow translation to become more agile, faster, and better with fewer steps" (CSA Inc.) new markets: "Language Weaver is entering the three new strategic markets – Web Content, Business Intelligence and Customer Care – to provide high-volume, high-speed, and accurate automated translation solutions at a price that would have been unfathomable just a few years ago" new approaches: "If you don't see your native language here, you can help Google create it by becoming a volunteer translator. Check out our Google in Your Language program" and then of course: "Unfortunately for Google, as a person with 7 years of translation experience myself I can tell that you will hardly ever find a translator who will agree that machine translation can be useful for anything." (a Russian translator)
13 ICT-PSP Call 3, Theme 5: Multilingual Web 3 objectives: – machine translation for the multilingual Web (pilot projects) – multilingual Web content management (pilot projects) – best practices & standards for the multilingual Web (thematic network) 14 Meuro in total, around 6 projects The duration of the pilot is expected to be 24 to 36 months within which there should be a 12-month operational phase.
14 ICT-PSP Call 3, Theme 5: Multilingual Web research: no, at least not ICT research … development/engineering: – configuration, optimisation, customisation, integration … of existing (state of the art) methods, tools & services with a view to defining new approaches, offerings & practices demonstration: – innovative combination is key; new business models, processes & services, organisational setups, usability … – evaluation along user, technical & (socio-)economic dimensions problem orientation: – useful & useable although possibly not perfect; think ROI
15 Scope & defs MT as defined in the ICT-PSP workprogramme encompasses 1. fully automatic machine translation, whatever the technology 2. interactive computer-aided translation (e.g. TM) 3. a suitable combination of 1. and/or 2. with web based – human translation, proof-reading & post-editing incl. where relevant methods inspired from social networks – workflow & content management systems, … innovative & effective combination of people, processes & technology; the end result is not science, rather – more and/or better output – save time – cut cost emphasis on language transfer, from source language to target language(s) – language input-output (e.g. speech-to-text) is not the focus – cross-platform, multi-format content access/delivery is key
16 Language coverage some of the work is expected to be language independent – flexibility & ease of adaptation to other languages are key factors – content authoring & management, collaboration & workflow … are language independent anyway project outcomes must be validated in 3+ languages – preferably belonging to different linguistic families target languages are chosen & justified by the proposers bearing in mind the following priorities (from high to low): 1. EU official languages 2. nationally recognised languages 3. regional languages 4. minority languages Non-EU world languages linked to global markets & exports can be considered as well – on a proposal by proposal basis
17 Cont'd project language coverage driven by the need to: – address gaps & overcome barriers e.g. cross-border communication for less-developed languages, or – exploit opportunities e.g. address emerging markets & sizeable language communities impact is key, so: viability, sustainability, exploitation channels, deployment prospects … main findings must be pro-actively disseminated some form of public showcase is mandatory participants should include – private or public sector content owners & aggregators – providers of language services, technology suppliers – (online) communities of interest where relevant 6-7 partners/project, up to 2.5 million funding, up to 36 months
18 ICT-PSP Call 3 exp. Feb 09 3 intertwined objectives: 5.1 machine translation for the multilingual Web (projects) information access: MT and other multilingual solutions for information access & use, esp. cross-lingual search & retrieval information publishing: MT to create, distribute and (re-)use more widely & effectively online content in a multilingual environment 5.3 multilingual Web content management (projects) communication: multilingual Web content development & management; design, authoring, versioning & maintenance of multilingual Web sites, portals or repositories 5.2 standards & best practices for the multilingual Web (network) conventions & best practices for multilingual Web content
19 ICT-PSP, 5.3 multilingual Web content management methods, techniques, metrics … for developing & managing multilingual web content & services – much more than translation; significant cultural elements think of – one big website in many languages, or – several interrelated websites, one country/language each now think of how to maintain the integrity & consistency of such resources, effectively & over a long period of time – and how to detect & repair gaps or inconsistencies so, beyond the translation step (obj 5.1): – design, authoring, versioning & maintenance of (multiple, parallel, interconnected …) websites, portals or repositories – in a distributed collaborative environment, possibly across organisational boundaries so as to turn a multi-million endeavour into a viable proposition for a much broader range of companies & administrations
20 ICT-PSP, 5.1 machine translation for the multilingual Web 5.1 can be seen as a subset & central component of obj 5.3 (its translation box) different usages: – web at large, enterprise, public information repositories … different users: – teams as well as individuals, engineers as well as analysts, sales & marketing, language professionals, … you & me different content-rich, information-bound sectors, private & public quality depends on task & user – from raw translation & gisting up to error-free translation two important conditions: – widely recognised, well argued problem; clearly identified target community – thorough validation in a given domain / for a given task volume metrics
21 ICT-PSP, 5.2 standards & best practices Thematic network covers the same broad issues as 5.3 – the web as THE vehicle for multilingual content & services provides a forum for multilateral exchange of experience & consensus building structure & tasks to be defined by the proposers, indicative list: – bring together a meaningful subset of the main stakeholders, possibly through their own groups & associations – ICT & language industries, content aggregators/distributors, e-services, multinational agencies, industry & de-jure standards bodies … – analyse current situation, identify gaps & bottlenecks; assess market failures if any, specify technical & non-technical conditions to be met and the respective actors – establish roadmap (trends, requirements, dependencies …) for further developments in the coming years – stimulate consensus & active involvement/coordination; take part in leading conferences, liaise with primary associations etc. – explore means to promote best practice (conferences, portals, publications, training …) beyond current channels – identify & describe suitable follow-on actions
22 ICT-PSP Instruments & Funding pilot B projects: – min. 4 partners from 4 different countries – 50% of eligible direct costs – 30% overhead rate of personnel costs thematic networks: – min. 7 partners from 7 different countries – lump sum; for 3 years and 1+10 participants: coordinator: 95 Keuro other participants: 24 Keuro each ec.europa.eu/information_society/activities/ict_psp/participating/index_en.htm
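A minimal sketch of the thematic network lump sum quoted on this slide, for the stated case of 3 years and 1+10 participants. The constant and function names are illustrative. Reusing the per-participant rate for other consortium sizes is an assumption made here for illustration only; the slide gives figures for the 1+10 case alone.

```python
# Figures from the slide (Keuro, for a 3-year network)
COORDINATOR_KEURO = 95
PARTICIPANT_KEURO = 24


def network_lump_sum_keuro(other_participants: int = 10) -> int:
    """Lump sum for one coordinator plus `other_participants` partners.

    The default (10) matches the slide's 1+10 case. Other values assume
    the same per-participant rate applies, which the slide does not state.
    """
    return COORDINATOR_KEURO + other_participants * PARTICIPANT_KEURO


total_keuro = network_lump_sum_keuro()
print(f"thematic network, 1+10 participants: {total_keuro} Keuro")
```

For the 1+10 case this gives 95 + 10 x 24 = 335 Keuro over the three years.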
23 Practical info ICT-PSP Theme 5 – Multilingual Web budget: 14 Meuro under Call 3 managed by: Unit E1 Email: INFSO-E1@ec.europa.eu EC contact: Mr Kimmo Rossi inquiries: from the call publication date (~Feb) pre-proposals: from publication until 3 weeks before the call closing date
24 Events Language Technology Days: 14-15 Jan 2009, Luxbg ICT-PSP Info Day: 26 Jan 2009, Brussels (tbc) Email: INFSO-E1@ec.europa.eu URL: cordis.europa.eu/fp7/ict/language-technologies/.. FP7-ICT: ../fp7-call4_en.html ICT-PSP: ../cip-psp_en.html