Published by Brenda Turner. Modified over 9 years ago.
1
Languages & the Media, Berlin, November 23rd 2012
Davor Orlic, davor.orlic@ijs.si, Knowledge for All Foundation Ltd
2
VideoLectures.NET: Content, Statistics, Licenses, Partners
Education: MOOCs, OpenCourseWare Consortium, Opencast Matterhorn
The idea behind: Who, What, Why, When?
transLectures: Pillars, Current status, Results and Demo
3
WHAT IS IT? VideoLectures.NET is the largest free and open-access OER digital library of academic talks. The lectures are given by distinguished scholars and scientists at conferences, summer schools and workshops.
WHAT IS THE CONTENT? Content built up through European research projects in computer science fields, plus further content from OCW partners.
WHAT ARE THE STATS? 732 events, 10,512 authors, 13,726 lectures, 15,965 videos. Visits: 9,626,639. Page views: 26,011,939. Signed-in users: 23,560. Licenses: CC-NC-ND.
6
I enrolled in the MOOC “Intro to Databases” (winter 2011) at Coursera: 108,000 accounts, 475,000 assignment submissions, 3,150,000 video views (heavy use of video).
Wouldn't it be great if all such content, and future offerings, were multilingual? Language personalisation for millions of students: video, audio, papers and coursework, all multilingual.
7
WHAT WAS THE REASONING? There is a huge set of higher-education users (undergraduates, MA, MSc, PhD) and a huge collection of videos. Videos are made of audio and video; audio and video are data; and data can be harvested, changed and remixed.
WHAT IF? We capture the audio and transform it into text.
WHAT THEN? We can have subtitles, transcriptions, translations, personalisation, contextualisation, descriptions, time alignment, fragmentation and recommendations for 15,965 academic talks.
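To make the step from "audio transformed into text" to "subtitles" concrete, here is a minimal sketch, assuming a speech recogniser that returns time-aligned segments (start and end times in seconds plus text). It renders them in the common SRT subtitle format. The names `Segment`, `srt_timestamp` and `to_srt`, and the sample segments, are hypothetical illustrations, not part of VideoLectures.NET or transLectures.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical time-aligned piece of transcript (times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


# Hypothetical recogniser output for the opening of a lecture.
segments = [
    Segment(0.0, 2.5, "Welcome to this lecture."),
    Segment(2.5, 6.04, "Today we talk about machine learning."),
]
print(to_srt(segments))
```

The same time-aligned segments could, under the same assumption, also feed translation (one cue per translated segment) or fragmentation of a talk into searchable pieces.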
8
Same for: Speech Processing, Text Analysis, Speech and Text Resources.
Most of Europe's languages are apparently unlikely to survive in the digital age (META-NET white paper).
9
LEARNERS PREFER VIDEO? YouTube (78 hours of video uploaded per minute), MOOCs (3 million accounts).
INITIATIVES AROUND VIDEO? Open content: OCW (20,000 courses). Massive lecture capture system: the Opencast Matterhorn project (700 universities). Massive portals specialized in video lectures: VLN, Polimedia (25,000 academic videos).
10
SPECS?
Cost: EUR 4.5 million
Project ref. no.: ICT-287755
Project acronym: transLectures
Project full title: Transcription and Translation of Video Lectures
Instrument: STREP
Thematic priority: ICT-2011.4.2 Language Technologies
Start date / duration: 1 November 2011 / 36 months
WHO? Universidad Politecnica De Valencia, Xerox, Knowledge 4 All Foundation Ltd., RWTH, European Media Laboratory GmbH, Deluxe Digital Studios Ltd.
Opencast Matterhorn, VideoLectures.NET, Polimedia
11
WHAT IS THE AIM? To develop innovative, cost-effective solutions for producing accurate transcriptions and translations in VideoLectures.NET, and to deploy those tools across other Matterhorn-related repositories. For translation, we consider the language pairs en ⇆ es, en ⇆ sl, en → fr and en → de.
WHAT IS THE IMPACT? A big step towards making educational repositories truly accessible, both to speakers of different languages and to people with disabilities.
ADDITIONAL VALUE? Imagine having 16,000 lectures in most of the world's languages.
12
KEYWORDS? Language technologies, machine translation, automatic speech recognition, massive adaptation, intelligent interaction, education, video lectures, multilingualism, accessibility.
WHY TRANSCRIPTION & TRANSLATION? There are accessibility issues that transcription can solve. Non-native speakers understand better by reading than by hearing. There are at least 1,300 different languages with more than 100,000 native speakers, and no language is spoken by more than 20% of the world population.
13
TRANSCRIPTION (EML): the complete transcription of the English lectures took 45,000 hours (2 months running in parallel).
TRANSLATION (XRCE, UPV, RWTH): different segmentation strategies for transcription and translation are being considered.
INTELLIGENT INTERACTION WITH USERS: an experimental protocol to evaluate intelligent interactive approaches with users.
INTEGRATION: first steps towards integrating the software into VL, Polimedia and Matterhorn.
EVALUATION: human evaluation in the second evaluation round.
15
Technology is good enough for transcription & translation.
We are going to develop open tools for transcription and translation and deploy them in the Opencast Matterhorn system.
Think of a business plan and ideas for a spin-off.
Provide optimisations for existing languages; ideally, extend the language set to Chinese, Hindi and others.
Is intelligent interaction a realistic concept?
More focus on English-to-Slovenian translation to improve it.
Work on building a community of students for evaluation.
17
Accuracy estimation for each transcription and translation. Adjustable computational behaviour. Output constrained to user preferences and corrections. Fast learning from user corrections.
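The slide does not say how these features are implemented. As a rough illustration only, here is a minimal sketch assuming the recogniser attaches a confidence score (estimated probability of being correct) to every word: the mean confidence serves as an accuracy estimate, the least-confident words within a user-chosen budget are sent for review, and the output is constrained to the user's corrections. The names `Word`, `estimated_accuracy`, `words_to_review`, `apply_corrections` and the `budget` parameter are hypothetical; "fast learning from user corrections" (model adaptation) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical estimated probability the word is correct, in [0, 1]


def estimated_accuracy(words: list[Word]) -> float:
    """Estimate transcript accuracy as the mean word confidence."""
    if not words:
        return 1.0
    return sum(w.confidence for w in words) / len(words)


def words_to_review(words: list[Word], budget: float) -> list[int]:
    """Return indices of the least-confident words, at most `budget` (a fraction
    of all words), in their original order so a user can check them."""
    n = max(0, min(len(words), int(budget * len(words))))
    ranked = sorted(range(len(words)), key=lambda i: words[i].confidence)
    return sorted(ranked[:n])


def apply_corrections(words: list[Word], corrections: dict[int, str]) -> list[Word]:
    """Constrain the output to user corrections; corrected words get confidence 1.0."""
    result = []
    for i, word in enumerate(words):
        if i in corrections:
            result.append(Word(corrections[i], 1.0))
        else:
            result.append(word)
    return result


# Hypothetical recogniser output with per-word confidences.
words = [Word("the", 0.98), Word("machine", 0.91), Word("lurning", 0.42),
         Word("model", 0.88), Word("converges", 0.55)]
to_check = words_to_review(words, budget=0.4)   # the two least-confident words
fixed = apply_corrections(words, {2: "learning"})
print(to_check, [w.text for w in fixed], estimated_accuracy(fixed))
```

Spending the user's limited effort on the least-confident words is one plausible reading of "adjustable" interaction: raising `budget` trades more user time for a more accurate transcript.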
18
WHAT IS IT? K4ALL is a foundation based in London (2010) whose goal is to carry on the legacy of the PASCAL2 Network of Excellence (machine learning). Part of this legacy is the VideoLectures.NET website, together with strong connections to the Opencast Foundation (which creates the Matterhorn software) and the OpenCourseWare Consortium.
WHAT DOES IT DO?
I4All: provision and distribution of infrastructure that supports the K4A mission
S4All: online science video journals and conference special issues
E4All: organisation of and access to educational material
R4All: research that facilitates the mission of K4All
A4All: ensuring accessibility for as wide an audience as possible