Saras: Shareable Rich Media Learning Object Repositories and Management for e-Learning
Chitra Dorai, IBM T.J. Watson Research Center, New York

dorai@us.ibm.com
Saras(wati): a Sanskrit word for the flow of knowledge; also the Goddess of Learning.

Overview of e-Learning Content Management Research
- E-learning media semantic analysis for metadata generation
- SCORM- and MPEG-7-conformant asset metadata model
- Search and browse client interfaces
- Multimodal narrative structure analysis for partitioning of instructional media
- Management of learning assets of various types
- Middleware for shareable learning object repositories
- Metadata model creation from XML schema
[Figure: system architecture. Components: Content Manager asset repository, LO ingest, E-Learning Media Analyzer, SCORM/MPEG-7 data model, Search & Browse client, Learning Management System, Learning Authoring Tool. Input media: text, images, course catalogs, student assessments, audio, video. Inset: narrative structure hierarchy (DD, DN, AN, UV, IV, LF).]

Project Goals
- Develop SCORM support technologies
- Enable generic content repositories (CM v8 and DB2) to support standards-compliant e-learning, transforming them into shareable and interoperable learning object repositories
- Analyze instructional media for automated generation of SCORM/MPEG-7-compliant metadata

E-Learning and Standards
- The Department of Defense (DoD) established the Advanced Distributed Learning (ADL) initiative in 1997. ADL develops strategies for using learning and information technologies to modernize education and training on the Web, and to promote e-learning standardization.
- SCORM (Sharable Content Object Reference Model): the ADL reference model for shareable learning content objects, enabling interoperability, accessibility, and reusability of Web-based learning content.
- Content Aggregation Model: LO metadata, content packaging
- SCORM builds on several e-learning standardization efforts: AICC, IMS, IEEE LOM (a standard since June 2002), and ARIADNE.

SCORM LOM Overview
- Nine learning object metadata categories from the IEEE LOM specification: General, Lifecycle, Meta-metadata, Technical, Educational, Rights, Relation, Annotation, and Classification
- IMS XML binding specification for metadata representation
- Describes three content model components: Asset, Sharable Content Object (SCO), and Content Aggregation
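As a rough illustration of how the nine categories could frame one metadata record per content model component, the sketch below builds an empty XML skeleton with Python's standard `xml.etree.ElementTree`. The function name `lom_skeleton` and every element name it emits are hypothetical, derived from the category names on the slide; they are not the official IMS or IEEE LOM XML binding names.

```python
import xml.etree.ElementTree as ET

# The nine LOM metadata categories listed on the slide, in slide order.
LOM_CATEGORIES = [
    "General", "Lifecycle", "Meta-metadata", "Technical", "Educational",
    "Rights", "Relation", "Annotation", "Classification",
]

# The three SCORM content model components the metadata can describe.
CONTENT_MODEL_COMPONENTS = {"Asset", "SCO", "Content Aggregation"}


def lom_skeleton(component: str, title: str) -> ET.Element:
    """Build a hypothetical LOM-style XML skeleton, one child per category.

    Element names are illustrative only (lowercased category names with
    hyphens removed); they are NOT the official XML binding names.
    """
    if component not in CONTENT_MODEL_COMPONENTS:
        raise ValueError(f"unknown content model component: {component}")
    root = ET.Element("lom", {"component": component})
    for category in LOM_CATEGORIES:
        # e.g. "Meta-metadata" -> "metametadata"
        ET.SubElement(root, category.replace("-", "").lower())
    # Fill the one field this sketch knows about: a title under General.
    ET.SubElement(root.find("general"), "title").text = title
    return root


if __name__ == "__main__":
    record = lom_skeleton("SCO", "Introduction to SCORM")
    print(ET.tostring(record, encoding="unicode"))
```

A real implementation would validate against the published XML schema instead of hard-coding tag names.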

Enabling Content Repositories for e-Learning
Objective: develop middleware tools that enable content management products (IBM CM v8) and databases (DB2) to support standards-based e-learning archival and SCORM-compliant learning object metadata.
- Creation of a SCORM-compliant learning object metadata model on a repository
- Automated storage of learning objects and their metadata in the content repository
- Search and retrieval of learning objects based on their metadata
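The three middleware capabilities (a metadata model, storage with metadata, and metadata-based retrieval) can be pictured with a small in-memory stand-in. This is only a sketch: `LearningObject`, `LearningObjectRepository`, `ingest`, and `search` are hypothetical names, and nothing here uses the Content Manager v8 or DB2 APIs.

```python
from dataclasses import dataclass, field


@dataclass
class LearningObject:
    """A stored learning object: opaque content plus flat metadata fields."""
    lo_id: str
    content: bytes
    metadata: dict = field(default_factory=dict)


class LearningObjectRepository:
    """Hypothetical in-memory stand-in for a content repository.

    Illustrates the store-with-metadata / search-by-metadata pattern only.
    """

    def __init__(self, required_fields):
        # Fields every ingested object must carry (the "metadata model").
        self.required_fields = set(required_fields)
        self._objects = {}

    def ingest(self, lo: LearningObject) -> None:
        """Store an object after checking it satisfies the metadata model."""
        missing = self.required_fields - set(lo.metadata)
        if missing:
            raise ValueError(f"{lo.lo_id}: missing metadata {sorted(missing)}")
        self._objects[lo.lo_id] = lo

    def search(self, **criteria):
        """Return sorted ids of objects matching every criterion
        (case-insensitive substring match on the metadata values)."""
        hits = []
        for lo in self._objects.values():
            if all(
                str(value).lower() in str(lo.metadata.get(key, "")).lower()
                for key, value in criteria.items()
            ):
                hits.append(lo.lo_id)
        return sorted(hits)
```

For example, after ingesting an object whose `title` is "Intro to SCORM", `repo.search(title="scorm")` returns its id; a production system would push these queries down to the repository's own indexed attributes.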

E-Learning Content Management with Content Manager

Meta-data Generation Pages

Automated Instructional Media Analysis
Objectives:
- Develop technologies for standards-based e-learning content tagging that support shareable and searchable learning object repositories with rich media
- Rich instructional media analysis for automated extraction of learning objects and their metadata, enabling content-based search and browse

Problem with the State of the Art
The user seeks semantic similarity, but the [multimedia] database can only provide similarity based on data processing.
- Existing content annotation/management systems cannot ensure reliable content location and access
  - They fall far short of users' expectations: the semantic gap
  - Generic, low-level annotations characterize only the perceived content, not its meaning
  - Content organization lacks the structure needed for non-linear navigation

Our Approach to Media Semantics Analysis
New research approach: Computational Media Aesthetics, the algorithmic study of the visual and aural elements in media and the associated analysis of the principles that underlie their manipulation in the creative art of clarifying and interpreting some event for an audience.
- The best semantic grid for media interpretation is the one within which its creators work: derive meaning from the production grammar and aesthetic conventions used.
- Create tools for understanding high-level semantic constructs in a domain by interpreting the data with its maker's eye, exploiting media production methods for perceptual and interpretive guidance.
Focus areas:
- Motion picture analysis for affect and story essence using film grammar (recognized with best paper awards)
- e-Learning: multimodal algorithms to parse and structure audiovisual content for content distillation and non-linear browsing
- Multigranular media narrative segmentation to generate and annotate reusable assets
[Figure, Example 1: multimodal analysis for extracting a hierarchy of narrative structures in education/training video. Content repository, media semantic analyzer, and metadata; narrative hierarchy DD, DN, AN, UV, IV, LF.]
[Figure, Example 2: Titanic movie analysis for tempo. The ebb and flow of tempo, and the associated story elements and events, automatically deconstructed.]

Example: Narrative Structure Based Segmentation of Education and Training Videos
Problem statement: automatically structure instructional media through high-level, semantics-based video partitioning and content tagging, enabling effective segment search, access, and browse services in e-learning content management systems.
Joint work with Dinh Q. Phung and Svetha Venkatesh, Curtin University of Technology, Western Australia.

Narrative Structures Hierarchy
- Narration sections
  - On-screen narration
    - Direct narration (DN)
    - Assistive narration (AN)
  - Voice over
    - Uninterrupted VO (UV)
    - Interrupted VO (IV)
- Discussion sections (DD): dialog, interviews, ...
- Linkage sections (LF): raw footage, text, ...
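One way to make this hierarchy usable by code (for instance, to tag a classified segment with every ancestor label) is a nested mapping whose leaves are the six class codes. The sketch assumes the tree shape described on this slide; `HIERARCHY`, `leaf_paths`, and `PATHS` are illustrative names.

```python
# Narrative structure hierarchy as nested dicts; leaves are the six
# class codes used in the experiments (DD, DN, AN, UV, IV, LF).
HIERARCHY = {
    "Narration sections": {
        "On-screen narration": {
            "Direct narration": "DN",
            "Assistive narration": "AN",
        },
        "Voice over": {
            "Uninterrupted VO": "UV",
            "Interrupted VO": "IV",
        },
    },
    "Discussion sections": "DD",
    "Linkage sections": "LF",
}


def leaf_paths(tree, prefix=()):
    """Yield (class_code, path_from_root) for every leaf in the hierarchy."""
    for name, child in tree.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield child, path


# Map each class code to its full path, e.g. for multi-level tagging.
PATHS = dict(leaf_paths(HIERARCHY))

if __name__ == "__main__":
    for code, path in PATHS.items():
        print(f"{code}: {' > '.join(path)}")
```

A segment labelled IV could then be annotated as "Narration sections > Voice over > Interrupted VO", which supports browsing at any level of the hierarchy.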

Narrative Structures Hierarchy: Discussion Sections
Capture dialog, interviews, and meeting sections.

Narrative Structures Hierarchy: On-Screen Narration
- Direct narration: a clear view of a narrator speaking in the scene, dominated by the narrator's face and captured in close-up
- Assistive narration: interrupted presence of the narrator

Narrative Structures Hierarchy: Voice Overs
The audio track is dominated by the narrator's voice, but the narrator does not appear (no faces).
- Uninterrupted VO: smooth and continuous
- Interrupted VO: interrupted

Narrative Structures Hierarchy: Linkage Sections
Raw footage, superimposed text, and others.

Visual Processing
S = {f_1, f_2, …, f_N}: the sequence of frames from a shot used for face detection
- Faces are detected in the frames using CMU's face detector software
- Feature 1, how many faces: the proportion of frames in the shot that contain faces
- Feature 2, average face area: if there is a face, how big is it?
- Two frame sequences from each shot are used: a uniformly sampled sequence and a key-frame sequence
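The two face features can be computed from per-frame detector output as in the sketch below. It assumes the detector returns `(x, y, width, height)` boxes; it also assumes Feature 2 is the mean area over all detected faces, normalized by frame area. The slide does not specify either detail, and `face_features` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face


def face_features(detections: List[List[Box]], frame_w: int, frame_h: int):
    """Return (face_frame_ratio, mean_face_area) for one shot.

    detections[i] holds the face boxes found in frame f_i of S.
    Feature 1: fraction of frames containing at least one face.
    Feature 2: mean face area over all detected faces, as a fraction of
    the frame area (this normalization is an assumption); 0.0 if no faces.
    """
    n = len(detections)
    if n == 0:
        raise ValueError("shot has no frames")
    frames_with_faces = sum(1 for boxes in detections if boxes)
    areas = [w * h for boxes in detections for (_, _, w, h) in boxes]
    mean_area = sum(areas) / len(areas) / (frame_w * frame_h) if areas else 0.0
    return frames_with_faces / n, mean_area
```

Running the function once on the uniformly sampled frames and once on the key frames gives four visual features per shot.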

Audio Processing
- Classify shot audio into voice (V), no-voice (N), or a mixture of the two (M)
- Characterize the dominance of speech in the audio track of each shot: cluster the audio clips into two classes and take the larger cluster to be the speech-dominated clips
  - N = total number of audio clips within a shot
  - N_v = number of clips classified as voice-dominated
  - V_a = voice activity = N_v / N
- Is the voice consistently delivered? New voice connectivity feature: the number of contiguous speech-dominant clips, normalized by the shot length
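A minimal sketch of the two audio features, assuming each clip is summarized by one hypothetical scalar score. The clips are split with a 1-D two-means (k = 2) clustering, and the larger cluster is treated as voice-dominated, as the slide states. "Contiguous speech-dominant clips" is read here as the longest run of consecutive voice clips. That reading is an interpretation, and all function names are illustrative.

```python
def two_means_1d(values, iters=50):
    """Split 1-D values into two clusters with Lloyd's k-means (k=2).

    Returns a list of 0/1 cluster labels, one per value.
    """
    c = [min(values), max(values)]  # initialize centroids at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - c[0]) <= abs(v - c[1]) else 1 for v in values]
        new_c = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_c.append(sum(members) / len(members) if members else c[k])
        if new_c == c:  # converged
            break
        c = new_c
    return labels


def audio_features(clip_scores):
    """Return (voice_activity, voice_connectivity) for one shot.

    clip_scores: one hypothetical per-clip feature value per audio clip.
    The larger cluster is taken as the voice-dominated clips.
    """
    n = len(clip_scores)
    if n == 0:
        raise ValueError("shot has no audio clips")
    labels = two_means_1d(clip_scores)
    voice_label = 1 if labels.count(1) > labels.count(0) else 0
    is_voice = [lab == voice_label for lab in labels]
    n_v = sum(is_voice)
    # Longest run of consecutive voice-dominated clips (an interpretation).
    longest = run = 0
    for v in is_voice:
        run = run + 1 if v else 0
        longest = max(longest, run)
    return n_v / n, longest / n  # V_a = N_v / N, connectivity
```

For eight clips where six are voice-dominated and the longest unbroken voice run is three clips, this returns V_a = 0.75 and a connectivity of 0.375.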

Classification
- Decision trees serve as the machine learning classifiers for the final labeling of narrative structures
- The C4.5 algorithm is used to train and test the decision trees
- First, all six classes are learned at the first child level and the labeling accuracy is tested
- A two-level decision tree is then proposed for improved performance
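C4.5 chooses each split by gain ratio: information gain divided by split information. Split information penalizes splits that fragment the data into many small branches. The sketch below computes that criterion for one candidate split; it shows the split measure only, not a full C4.5 tree learner, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(parent, partitions):
    """C4.5 split criterion: information gain / split information.

    parent: class labels at the node.
    partitions: one label list per branch of the candidate split.
    """
    n = len(parent)
    nonempty = [p for p in partitions if p]
    info_gain = entropy(parent) - sum((len(p) / n) * entropy(p) for p in nonempty)
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in nonempty)
    return info_gain / split_info if split_info > 0 else 0.0
```

A split that separates DN shots perfectly from AN shots scores 1.0. A split that leaves each branch as mixed as the parent scores 0.0.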

Experimental Results
- Average classification accuracy is high: 91.6%
[Table: confusion matrix for the six classes]
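Overall accuracy and per-class recall and precision can be read off any confusion matrix. The six-class matrix itself is not available in this transcript, so the sketch below checks the arithmetic against the three-class group-G matrix reported on a later slide. It assumes Weka-style layout (rows are actual classes, columns are predicted classes). With that layout, 440 of the matrix's 522 instances lie on the diagonal, which matches the reported 84.3%. `confusion_metrics` is an illustrative name.

```python
def confusion_metrics(matrix, labels):
    """Overall accuracy plus per-class recall and precision.

    Rows are actual classes and columns are predicted classes
    (Weka-style layout, assumed here).
    """
    k = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    recall = {labels[i]: matrix[i][i] / sum(matrix[i]) for i in range(k)}
    precision = {}
    for j in range(k):
        col = sum(matrix[i][j] for i in range(k))
        precision[labels[j]] = matrix[j][j] / col if col else 0.0
    return correct / total, recall, precision


# Group-G matrix from the slides (a = UV, b = IV, c = LF).
G_MATRIX = [[424, 40, 18],   # actual UV
            [14, 10, 2],     # actual IV
            [7, 1, 6]]       # actual LF

if __name__ == "__main__":
    acc, rec, prec = confusion_metrics(G_MATRIX, ["UV", "IV", "LF"])
    print(f"accuracy = {acc:.1%}")  # 440 / 522
    print({c: round(r, 3) for c, r in rec.items()})
```

The per-class recall makes the slides' diagnosis concrete. Recall for UV is about 0.88, but only about 0.38 for IV and 0.43 for LF.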

Exp. Results (cont.)
- Results are very good for classes DD, DN, AN, and UV, but poor for IV and LF
- Voice-over sections in which many faces are present (meetings, parties, ...) account for most of the misclassifications
- Solution: group IV, LF, and UV into a group G and study it separately
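The grouping yields a two-level scheme. A top-level model predicts DD, DN, AN, or the merged group G. A second model is consulted only for G and resolves it into UV, IV, or LF. The sketch shows just that routing, with any callables standing in for the trained C4.5 trees. `TwoLevelClassifier`, the feature names, and the toy thresholds are hypothetical; the thresholds are arbitrary and exist only to exercise the routing.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
GROUP_G = {"UV", "IV", "LF"}


class TwoLevelClassifier:
    """Route a shot through a top-level model, then a group-G model.

    top: predicts one of DD, DN, AN, or "G" (IV, LF, and UV merged).
    sub: resolves a "G" prediction into UV, IV, or LF.
    """

    def __init__(self, top: Callable[[Features], str],
                 sub: Callable[[Features], str]):
        self.top = top
        self.sub = sub

    def predict(self, x: Features) -> str:
        label = self.top(x)
        if label == "G":
            label = self.sub(x)
            if label not in GROUP_G:
                raise ValueError(f"sub-classifier returned {label!r}")
        return label

    @staticmethod
    def relabel_for_top(y: Sequence[str]) -> List[str]:
        """Map fine labels to top-level training targets (IV/LF/UV -> G)."""
        return ["G" if label in GROUP_G else label for label in y]


# Toy stand-ins for trained trees (arbitrary thresholds, not learned).
def toy_top(x: Features) -> str:
    if x["face_ratio"] > 0.8:
        return "DN"
    if x["face_ratio"] > 0.3:
        return "AN"
    if x["voice_activity"] < 0.2:
        return "DD"
    return "G"


def toy_sub(x: Features) -> str:
    if x["voice_activity"] < 0.6:
        return "LF"
    return "UV" if x["voice_connectivity"] > 0.7 else "IV"
```

During training, the top-level tree would be fit on `relabel_for_top(y)` using all shots. The sub-tree would be fit only on shots whose true label is in G.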

Exp. Results (cont.)
Group G: 97.6%

Exp. Results (cont.)
- Over-fitting is the problem identified in G, because UV instances outnumber IV and LF instances
- To mitigate the problem, reduce the number of UV instances so that the numbers of IV, UV, and LF instances are approximately equal, and retrain with C4.5
Confusion matrix for group G (rows: actual class; columns: classified as):
          a     b     c
a = UV  424    40    18
b = IV   14    10     2
c = LF    7     1     6
Accuracy: 84.3%
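Reducing the UV instances until the three classes are roughly equal in size amounts to random undersampling of the majority class. The sketch below is one minimal way to do it. The default target, which is the size of the largest minority class, and the fixed random seed are assumptions; the slide says only that the counts should be approximately the same. `undersample` is an illustrative name.

```python
import random
from collections import defaultdict


def undersample(samples, labels, majority, target=None, seed=0):
    """Randomly drop majority-class samples so class counts are comparable.

    target: how many majority samples to keep; defaults to the size of the
    largest non-majority class. Returns (samples, labels), original order kept.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    if target is None:
        target = max(len(idx) for y, idx in by_class.items() if y != majority)
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    majority_idx = by_class[majority]
    kept = set(rng.sample(majority_idx, min(target, len(majority_idx))))
    keep = [i for i, y in enumerate(labels) if y != majority or i in kept]
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

For example, with `labels` containing many "UV" entries and fewer "IV" and "LF" entries, `undersample(features, labels, "UV")` keeps only as many UV samples as there are IV samples. The retained subset depends on the seed, so averaging over several seeds gives a fairer accuracy estimate.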

Conclusion
- Novel narrative-structure-based analysis for segmentation of education and training videos
- The hierarchical decision-tree classification system achieves an overall accuracy of 84.7%
- Focus on higher-level semantics, such as segmentation of topics
- Work is underway:
  - Map media objects to LOs
  - Algorithms to support both SCORM- and MPEG-7-compliant XML metadata

Acknowledgements
Team: Geetika Tewari (IBM TJW, currently at Harvard U.), Norman Haas (IBM TJW), Austin Schilling (IBM SWG)
