
Languages & the Media Berlin, November 23rd 2012 Davor Orlic Knowledge for All Foundation Ltd.




1 Languages & the Media Berlin, November 23rd 2012 Davor Orlic davor.orlic@ijs.si Knowledge for All Foundation Ltd

2
- VideoLectures.NET
  - Content, Statistics, Licenses, Partners
  - Education, MOOCs, OpenCourseWare Consortium, Opencast Matterhorn
- The idea behind
  - Who, What, Why, When?
- transLectures
  - Pillars, Current status, Results and Demo

3
WHAT IS IT?
- VideoLectures.NET is the largest free and open-access OER digital library of academic talks. The lectures are given by distinguished scholars and scientists at conferences, summer schools and workshops.
WHAT IS THE CONTENT?
- Content built up via European research projects based in Computer Science fields. Other content from OCW partners.
WHAT ARE THE STATS?
- 732 events, 10,512 authors, 13,726 lectures, 15,965 videos
- Visits: 9,626,639
- Page views: 26,011,939
- Signed-in users: 23,560
- Licenses: CC-NC-ND


6
- I enrolled in the MOOC “Intro to Databases” at Coursera in winter 2011
  - 108,000 accounts
  - 475,000 assignment submissions
  - 3,150,000 video views (heavy use of video)
- Wouldn't it be awesome if all such content and future options were multilingual?
  - Language personalisation for millions of students
  - Video, audio, papers, coursework: all multilingual

7
WHAT WAS THE REASONING?
- Huge set of HigherEd users (undergrads, MA, MSc, PhD)
- Huge collection of videos
- Videos are made of audio and video
- Audio and video are data
- Data can be harvested, changed and remixed
WHAT IF?
- We capture the audio
- Transform it into text
WHAT THEN?
- We can have subtitles, transcriptions, translations, personalisation, contextualisation, descriptions, time alignment, fragmentation and recommendations for 15,965 academic talks
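As a rough illustration of the "audio into text, then subtitles" step above, here is a minimal Python sketch that turns time-aligned transcript segments into the widely used SubRip (SRT) subtitle layout. The `Segment` type and the function names are hypothetical and chosen for this example only; the slides do not say which subtitle format or data structures transLectures uses, so this is one possible shape under those assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-aligned piece of transcript (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome to the lecture."),
        Segment(2.5, 6.0, "Today we look at databases."),
    ]
    print(to_srt(demo))
```

Because each cue keeps its start and end time, the same segment list could also feed translation or search, with the text swapped per language and the timing left unchanged.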

8
- Same for: Speech Processing, Text Analysis, Speech and Text Resources
- Most of Europe's languages are apparently unlikely to survive in the digital age. (META-NET white paper)

9
LEARNERS PREFER VIDEO?
- YouTube (78 hours uploaded per minute)
- MOOCs (3 million accounts)
INITIATIVES AROUND VIDEO?
- Open content: OCW (20,000 courses)
- Massive lecture capture system: Opencast Matterhorn project (700 universities)
- Massive portals specialized in video lectures: VLN, Polimedia (25,000 academic videos)

10
SPECS?
- Cost: 4.5 million EUR
- Project ref no.: ICT-287755
- Project acronym: transLectures
- Project full title: Transcription and Translation of Video Lectures
- Instrument: STREP
- Thematic Priority: ICT-2011.4.2 Language Technologies
- Start date / duration: 01 November 2011 / 36 months
WHO?
- Universidad Politecnica de Valencia, Xerox, Knowledge 4 All Foundation Ltd., RWTH, European Media Laboratory GmbH, Deluxe Digital Studios Ltd
- Opencast Matterhorn, VideoLectures.NET, Polimedia

11
WHAT IS THE AIM?
- To develop innovative, cost-effective solutions to produce accurate transcriptions and translations in VideoLectures.NET.
- To deploy those tools across other Matterhorn-related repositories.
- For translation, we consider the language pairs en ⇆ es, en ⇆ sl, en → fr and en → de.
WHAT IS THE IMPACT?
- A big step in making educational repositories truly accessible both to speakers of different languages and to people with disabilities.
ADDITIONAL VALUE?
- Imagine having 16,000 lectures in most of the world's languages.

12
KEYWORDS?
- language technologies, machine translation, automatic speech recognition, massive adaptation, intelligent interaction, education, video lectures, multilingualism, accessibility
WHY TRANSCRIPTION & TRANSLATION?
- There are accessibility issues that can be solved by transcription
- Non-native speakers understand better by reading than by listening
- At least 1,300 different languages have more than 100,000 native speakers
- No language is spoken by more than 20% of the world population

13
TRANSCRIPTION (EML)
- The complete transcription of English lectures took 45,000 hours of processing (2 months running in parallel)
TRANSLATION (XRCE, UPV, RWTH)
- Different segmentation strategies for transcription and translation are being considered
INTELLIGENT INTERACTION WITH USERS
- Experimental protocol to evaluate intelligent interactive approaches for users
INTEGRATION
- First steps on integrating the software into VL, Polimedia and Matterhorn
EVALUATION
- Human evaluations for the second round of evaluation


15
- Technology is good enough for transcription & translation
- We are going to develop open tools for transcription and translation
- Deploy the tools in the Opencast Matterhorn system
- Think of a business plan and ideas for a spin-off
- Provide optimisations for existing languages
- Ideally extend the language set to Chinese, Hindi and others
- Is intelligent interaction a realistic concept?
- More focus on improving English-to-Slovenian translations
- Work on building a community of students for evaluation


17
- Accuracy estimation for each transcription and translation.
- Adjustable computational behaviour.
- Output constrained to user preferences and corrections.
- Fast learning from user corrections.
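One way to picture how accuracy estimation and user corrections could fit together is sketched below in Python: each transcript segment carries an estimated confidence, the least confident segments are offered to a user first, and the user's corrections override the automatic output. The function names, the confidence values and the review budget are all hypothetical; the slides do not describe the actual transLectures interaction protocol, so this is only an assumption-laden sketch.

```python
def pick_for_review(confidences: list[float], budget: int) -> list[int]:
    """Return the indices of the `budget` least confident segments, lowest first.

    `confidences` holds one estimated accuracy score per segment (hypothetical,
    e.g. in the range 0..1). `budget` is how many segments a user agrees to check.
    """
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:max(budget, 0)]


def apply_corrections(transcript: list[str], corrections: dict[int, str]) -> list[str]:
    """Return a new transcript where user corrections replace automatic output."""
    return [corrections.get(i, text) for i, text in enumerate(transcript)]


if __name__ == "__main__":
    transcript = ["welcome to the lecture", "today we sea databases", "thank you"]
    confidences = [0.93, 0.41, 0.88]

    to_check = pick_for_review(confidences, budget=1)
    print(to_check)  # the single least confident segment

    fixed = apply_corrections(transcript, {1: "today we see databases"})
    print(fixed)
```

Ranking by confidence spends limited user effort where the automatic output is most likely wrong; a real system would presumably also feed the corrections back into its models, which this sketch does not attempt.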

18
WHAT IS IT?
- K4All is a foundation based in London (established 2010) whose goal is to carry forward the legacy of the PASCAL2 Network of Excellence (machine learning). Part of this legacy is the VideoLectures.NET website, along with strong connections to the Opencast Foundation (which creates the Matterhorn software) and the OpenCourseWare Consortium.
WHAT DOES IT DO?
- I4All: Provision and distribution of infrastructure that supports the K4All mission
- S4All: Online science video journals and conference special issues
- E4All: Organization of and access to educational material
- R4All: Research that facilitates the mission of K4All
- A4All: Ensuring accessibility for as wide an audience as possible

