LACRO'13 workshop April 2013, Leuven, Belgium Miguel‐Angel Sicilia, Salvador Sánchez‐Alonso, Elena Garcia‐Barriocanal, Julià Minguillón, Enayat Rajabi.


1 LACRO'13 workshop, April 2013, Leuven, Belgium. Miguel‐Angel Sicilia, Salvador Sánchez‐Alonso, Elena Garcia‐Barriocanal, Julià Minguillón, Enayat Rajabi

2 Introduction; GLOBE materials; Keywords and classifications; Interlinking to Linked Open Data; Discussion and conclusion

3 A huge number of e-learning resources are available online, for free or by subscription. Several initiatives (e.g. GLOBE) aim at federating e-learning systems to unlock the educational content hidden in their repositories. The use of the IEEE LOM standard together with OAI-PMH has facilitated the deployment of such collections.

4 How well do the different metadata elements describe and categorize the resource space? IEEE LOM proposes around 50 different elements, including keyword and classification: keywords are intended for the description of topics in any existing language, while classification refers to classifying the LOs (learning objects). Some experimental studies exist on the actual use of IEEE LOM (e.g. Friesen (2004), Ochoa et al. (2011)).

6 GLOBE (Global Learning Objects Brokered Exchange) enables sharing and reuse across several learning object repositories. We harvested GLOBE through OAI-PMH and obtained around 770,000 metadata records. The most frequent language is English (as also pointed out by Ochoa (2011)), while a large number of resources have no language declared.
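The slides do not show how the harvest was implemented. As a rough sketch only, an OAI-PMH `ListRecords` harvest usually loops over pages by following the resumption token. The `fetch` callable, the base URL and the `oai_lom` metadata prefix below are assumptions for illustration, not details taken from the presentation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url, fetch, metadata_prefix="oai_lom"):
    """Yield every <record> element, following resumption tokens.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    response body as a string (for example a thin wrapper around urllib).
    `metadata_prefix` is an assumed value; repositories advertise theirs
    through the ListMetadataFormats verb.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI_NS + "record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, marks the last page
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing `fetch` in as a parameter keeps the paging logic testable without network access, and leaves retry or throttling policy to the caller.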

7
Language                          # metadata
en, english, eng, en-US, en-gb       392,682
nl                                    97,976
x-none, none, blank                   77,555
de, de-AT, de-DE                      49,807
es-EC, es                             47,816
it, ITA                               23,102
hu-HU, hu                             20,316
Is                                     8,804
ca                                     8,066
fr                                     6,414
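The language table groups several spellings of the same language into one row. A minimal sketch of such a grouping is given below, assuming that variants are reduced to their primary subtag and that a small hand-written alias table covers names such as "english". The alias table and the "undeclared" label are illustrative choices, not the authors' procedure.

```python
from collections import Counter

# Hypothetical alias table: non-standard spellings mapped to two-letter codes.
ALIASES = {"english": "en", "eng": "en", "ita": "it"}
# Values treated as "no language declared" in the table.
UNDECLARED = {"", "none", "x-none", "blank"}


def normalise_language(raw):
    """Map a raw language value to a canonical two-letter group."""
    # Lower-case, trim, and repair the Unicode hyphen (U+2010) seen in some codes.
    code = (raw or "").strip().lower().replace("\u2010", "-")
    if code in UNDECLARED:
        return "undeclared"
    code = ALIASES.get(code, code)
    # Keep only the primary subtag: "en-us" -> "en", "de_at" -> "de".
    return code.replace("_", "-").split("-")[0]


def language_counts(raw_values):
    """Count records per normalised language group."""
    return Counter(normalise_language(value) for value in raw_values)
```

For example, `language_counts(["en", "en-US", "english", "x-none"])` groups the first three values under `en` and the last under `undeclared`.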

8 There are around 5.5 million keywords in the sample (~7 keywords per resource). A large number of keywords were generated via machine translation (referenced by codes starting with “x-mt-”). Around 3.2 million keywords appear to have been produced by humans (~4 keywords per resource). Frequencies remain high for relatively large numbers of keywords per resource (beyond 15), which might be attributed to automated extraction.
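The per-resource averages follow from dividing the keyword totals by the roughly 770,000 harvested records. The sketch below reproduces that arithmetic and shows one way machine-translated keywords might be separated out, assuming each keyword is available as a hypothetical `(text, code)` pair whose code carries the "x-mt-" prefix.

```python
def is_machine_translated(code):
    """True when a keyword's code starts with the machine-translation prefix."""
    return (code or "").lower().startswith("x-mt-")


def human_keywords(keywords):
    """Keep only the (text, code) pairs not flagged as machine translated."""
    return [(text, code) for text, code in keywords if not is_machine_translated(code)]


def keywords_per_resource(n_keywords, n_resources):
    """Average number of keywords per resource."""
    return n_keywords / n_resources


# Figures reported on the slides (rounded totals).
all_avg = keywords_per_resource(5_500_000, 770_000)    # about 7.1
human_avg = keywords_per_resource(3_200_000, 770_000)  # about 4.2
```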

10 A total of ~700k classifications distributed across ~500k resources were found, with ~1 million taxon entries. About 92% of all the resources have at most two classifications, and only 187 resources have more than 10. Only 43 different classification purposes were found, with “discipline” accounting for about 60% and “Technical design” for around 18%. The latter comes from a vocabulary specific to the MACE project. Another 11% of the purposes were blank. Keywords and classifications were matched against each other for the same resources (~270k coincidences).
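The slides do not say how keywords and taxon entries were compared within a resource. One plausible reading, sketched below, counts the terms that a resource's keywords and taxon entries have in common. The trimming and case-folding step and the dictionary layout are assumptions made for illustration.

```python
def normalise(term):
    """Trim and case-fold a term before comparison (an assumed normalisation)."""
    return term.strip().casefold()


def count_coincidences(keywords_by_resource, taxa_by_resource):
    """Count terms shared by the keywords and taxon entries of the same resource.

    Both arguments are hypothetical dicts mapping a resource identifier
    to a list of strings.
    """
    total = 0
    for resource_id, keywords in keywords_by_resource.items():
        taxa = taxa_by_resource.get(resource_id, ())
        shared = {normalise(k) for k in keywords} & {normalise(t) for t in taxa}
        total += len(shared)
    return total
```

Working on sets means a term repeated within one resource is counted once. Counting every repeated pair instead would give a larger figure.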

11
Taxon_entry_lang    records
en, en-US            539568
unspecified          180643
ca                   158340
es                    88261
fr                    19734
sv                    17488
de                    13687
nl                    12954
ro                    12038
it                    10352

13 In linked open data, RDF links exposed through the web express relationships between elements. DBpedia is the central dataset, and most interlinking tools provide automated ways to interlink with it. Keywords and classifications can be approached from the perspective of external data sources. They could be linked to a large knowledge base such as DBpedia, although less than 30% of them find a match.
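To make the idea of an RDF link concrete, the sketch below builds one N-Triples statement connecting a learning resource to a DBpedia entity. The choice of `dcterms:subject` as the linking property and the example resource URI are assumptions, since the slides do not name a specific property. The DBpedia URI is derived from the label by replacing spaces with underscores, which does not guarantee that such an entity actually exists.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
# Assumed linking property; the presentation does not specify one.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def dbpedia_uri(label):
    """Build a candidate DBpedia resource URI from a label."""
    # DBpedia resource names follow Wikipedia titles: spaces become underscores.
    return DBPEDIA_RESOURCE + quote(label.strip().replace(" ", "_"))


def subject_link(resource_uri, label):
    """Return one N-Triples line linking a resource to a DBpedia entity."""
    return f"<{resource_uri}> <{DCTERMS_SUBJECT}> <{dbpedia_uri(label)}> ."
```

For instance, `subject_link("http://example.org/lo/42", "Mathematics")` yields a triple whose object is `http://dbpedia.org/resource/Mathematics`.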

14 English dominates the distribution of languages in GLOBE, with a few other languages represented. A considerable number of keywords were generated via machine translation. English again dominates the linguistic space of classifications. Classifications result in a more concise representation, as becomes evident when contrasting the more than 3 million keywords (excluding machine translation) with the 1 million classification entries.

15 The amount of coincidence with DBpedia entries is limited, and there is no significant difference between keywords and classifications, so they appear to have a similar potential for interlinking. It is important to highlight that the coincidences analysed were based on exact string matching, without any consideration of polysemy or lexical variants. It should be noted that GLOBE has to be considered a highly heterogeneous repository in several respects, as described by Ochoa (2011), including the way the metadata is created in the repositories, which ranges from automatic creation to quality-controlled internal mechanisms.
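The exact-string comparison described here can be sketched as a set intersection between the distinct terms and a collection of DBpedia labels. The function name and the in-memory label set are hypothetical, and how the labels were actually obtained is not stated in the slides.

```python
def exact_match_rate(terms, dbpedia_labels):
    """Fraction of distinct terms that equal a DBpedia label character for character."""
    distinct = set(terms)
    if not distinct:
        return 0.0
    return len(distinct & set(dbpedia_labels)) / len(distinct)


# Illustrative data only: "maths" and the padded "Algebra " are lexical
# variants that an exact comparison fails to match.
sample_terms = ["Mathematics", "maths", "Physics", "Algebra "]
sample_labels = {"Mathematics", "Physics", "Algebra"}
rate = exact_match_rate(sample_terms, sample_labels)
```

In this toy sample only two of the four terms match, which illustrates why a figure obtained without handling lexical variants or polysemy is best read as a lower bound on the interlinking potential.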
