
Cooperation for Arabic Language Resources and Tools – The MEDAR Project
Bente Maegaard, Mohamed Attia, Khalid Choukri, Olivier Hamon, Steven Krauwer, Mustafa Yaseen


1. Cooperation for Arabic Language Resources and Tools – The MEDAR Project
Bente Maegaard, Mohamed Attia, Khalid Choukri, Olivier Hamon, Steven Krauwer, Mustafa Yaseen
Presented by: Bente Maegaard, University of Copenhagen, Co-ordinator of MEDAR

2. MEDAR: Background and mission
Mission: support the development of language technology, language resources and tools for the Arabic language.
This is important for the people, the economy and the culture in the Arab countries, but current efforts are too small and too fragmented.
MEDAR is funded by the European Commission and focuses on the Mediterranean area, but our scope for collaboration is much broader (all Arab countries, all continents), and we also want to include other Semitic languages in the future.

3. MEDAR partners
University of Copenhagen, Denmark (coordinator)
ELDA, France
University of Balamand, Lebanon
Al-Ahlyya Amman University, Jordan
Universiteit Utrecht, The Netherlands
ILSP - Athena, Greece
RDI, Egypt
Birzeit University, West Bank and Gaza Strip
ENSIAS, University Mohammed V Souissi, Morocco
CEA, France
CNRS, France
The Open University, United Kingdom
Université Lumière Lyon 2, France
IBM, Egypt
Sakhr, Egypt

4. MEDAR objectives and ‘streams’
1) Technical stream: survey of players, projects and products; BLARK (Basic Language Resource Kit) for Arabic; focus on multilingual tools and develop MT.
2) Roadmap stream: cooperation roadmap; network creation.
3) Dissemination stream.

5. Multilingual sub-project
Focus: machine translation (MT) English-Arabic, i.e. into Arabic.
It is important to use open source.
Education and training.

6. MT system, corpora
MOSES was chosen as the MT system because it has a wide user community, English-Arabic experiments with it already existed, and consortium partners had previous experience with it.
A basic MOSES system was developed by the University of Balamand; an enhanced system was provided by IBM Cairo and Dublin City University.
Partners collected a parallel corpus and monolingual corpora.
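Before training, a phrase-based system such as MOSES is usually given a parallel corpus from which implausible sentence pairs have been removed. The Python sketch below illustrates that kind of length-based filtering, in the spirit of the clean-corpus-n.perl script distributed with Moses. The function name `clean_parallel_corpus` and the thresholds (1 to 80 tokens per side, a length ratio of at most 9) are illustrative assumptions, not settings reported by MEDAR.

```python
def clean_parallel_corpus(pairs, min_len=1, max_len=80, max_ratio=9.0):
    """Keep sentence pairs whose token lengths look plausible for training.

    pairs: iterable of (english, arabic) strings.
    min_len must be at least 1 (it also guards the ratio division below).
    Hypothetical helper: thresholds are illustrative assumptions,
    not MEDAR's actual settings.
    """
    kept = []
    for en, ar in pairs:
        # Whitespace tokenisation is a simplification for this sketch.
        n_en, n_ar = len(en.split()), len(ar.split())
        # Drop empty or over-long sentences on either side.
        if not (min_len <= n_en <= max_len and min_len <= n_ar <= max_len):
            continue
        # Drop pairs whose lengths differ too much to be likely translations.
        if max(n_en, n_ar) / min(n_en, n_ar) > max_ratio:
            continue
        kept.append((en, ar))
    return kept
```

Counting whitespace tokens is only a rough proxy: Arabic text would typically be segmented (for example, splitting attached clitics) before such length checks are applied.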

7. Evaluation - 1
Automatic evaluation: a 10,000-word evaluation corpus, hidden in a 200,000-word masking corpus. Four human translations of it have been produced and validated.
Human evaluation.
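One way to read the masking set-up: the evaluation segments are shuffled among filler text, participants translate the whole masking corpus, and only the hidden segments are scored against the four human translations. The Python sketch below illustrates this under stated assumptions: masking at the segment level, whitespace tokens, and unsmoothed corpus-level BLEU as the automatic metric (the slide does not name the metric). The function names `mask_test_set`, `unmask` and `corpus_bleu` are hypothetical.

```python
import math
import random
from collections import Counter


def mask_test_set(test_segments, filler_segments, seed=0):
    """Hide the real test segments among filler segments (hypothetical helper).

    Returns the shuffled corpus given to participants and, for each test
    segment in its original order, its position in that corpus (kept secret).
    """
    tagged = [(i, s) for i, s in enumerate(test_segments)]
    tagged += [(None, s) for s in filler_segments]  # None marks filler
    random.Random(seed).shuffle(tagged)
    corpus = [s for _, s in tagged]
    positions = [0] * len(test_segments)
    for pos, (idx, _) in enumerate(tagged):
        if idx is not None:
            positions[idx] = pos
    return corpus, positions


def unmask(system_output, positions):
    """Extract the translations of the real test segments, in test-set order."""
    return [system_output[p] for p in positions]


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU with several references per segment.

    hypotheses: one token list per segment.
    references: one list of token lists per segment (e.g. four human translations).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: closest to the hypothesis, ties to the shorter.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            # Clip each hypothesis n-gram by its highest count in any single reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            hyp_counts = ngrams(hyp, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

In a campaign, scoring `unmask(output, positions)` for each participant's output against the four references would give one automatic score per system. For published results, an established scorer (for example the multi-bleu.perl script shipped with Moses, or sacreBLEU) would normally be preferred over a hand-rolled one.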

8. Evaluation - 2
The second evaluation campaign will take place in June. External participants have been invited and have expressed interest.

9. Resources for the community
MT systems: the baselines developed in the project will be made publicly available under the original licences (MOSES, Giza++, etc.).
Training data: through ELRA, under fair conditions.
Evaluation package: through ELRA, under fair conditions.

10. Cooperation roadmap
Roadmap concept: set goals, define the steps to get there, define a timeline.
The MEDAR roadmap covers three periods: 2010-2012, 2012-2014, 2013-2015.

11. Elements of the roadmap
Players and human resources, education
Technology and R&D
E-infrastructure: Internet penetration, mobile penetration
Market
A few examples are presented here; please refer to the booklet for more.

12. Players and human resources, education
Players need a skilled workforce, and there are not enough HLT experts. We need HLT-enabled professionals.
Typically, one could add:
linguistics, phonetics, and language or speech processing to engineers' education;
computing, machine learning, and language or speech processing to linguists' education.
Do this in collaboration with other universities in the region and with, e.g., universities in Europe or the US.

13. Players and human resources, education - 2
Staff exchange
Student grants
Participation of (more) Arab partners in EU-funded projects
MEDAR has chosen this as an area to investigate further; partners will elaborate a cooperation scheme.

14. Technology
BLARK: the basic building blocks, i.e. language resources (LRs) and tools that are reusable, can be shared with other players, and follow standards.
We need more resources and tools for Semitic languages, and they need to be shared, free or cheap. This is essential for education, research and first development.

15. Technology - 2
Driving applications: fight illiteracy through HLT (speech-enabled software, etc.); collaborate to make this happen.
Governments could introduce eGovernment, etc.
Many basic technologies are needed. Discussion is ongoing with other parties to agree what they are and, if possible, to agree on a distribution of tasks.

16. E-infrastructure: Internet users

17. Penetration rates

18. Market
Important factors: piracy (38% worldwide, 60% in the Middle East and Africa).
Fight piracy: this is ongoing.
Provide IT services, not products that can be copied.

19. Conclusions
Long-term goal of MEDAR: create better conditions for the development of language and speech technology for Arabic, in order to support the people, the culture and the economy, through collaboration and networking.
We therefore welcome all comments and invite broad cooperation, not only for Arabic but also for other Semitic languages, and also with partners outside the EU and the Mediterranean Arab countries.

20. MEDAR: Mediterranean Arabic Language and Speech Technology
Acknowledgement: all MEDAR partners.
See the full roadmap report and other information at www.medar.info.

