Speeding Up Batch Alignment of Large Ontologies Using MapReduce Uthayasanker Thayasivam and Prashant Doshi Dept. of Computer Science University of Georgia.

1 Speeding Up Batch Alignment of Large Ontologies Using MapReduce Uthayasanker Thayasivam and Prashant Doshi Dept. of Computer Science University of Georgia

2 Introduction
  - Ontology: formalizes the knowledge of a domain by defining concepts and the properties that relate them

3 Introduction: Ontology Alignment

6 Problem Definition: Ontology Alignment
  The ontology alignment problem: find a set of correspondences between two ontologies O1 and O2.

7 Ontology Alignment Challenges
  - Improving the alignment quality
    - Structural & lexical disparity
  - Improving the alignment efficiency
    - Quickly producing quality alignment
  - Improving the scalability
  [Figure: efficiency / quality against ontology sizes, and efficiency / quality against resources]

8 Space of Alignments
  Match matrix over the Cartesian product of entities, with rows x_1 ... x_{|V_1|} and columns y_1 ... y_{|V_2|}:
  \[
  M = \begin{pmatrix}
  m_{11} & m_{12} & \cdots & m_{1|V_2|} \\
  m_{21} & m_{22} & \cdots & m_{2|V_2|} \\
  \vdots & \vdots & \ddots & \vdots \\
  m_{|V_1|1} & m_{|V_1|2} & \cdots & m_{|V_1||V_2|}
  \end{pmatrix}
  \]
  - Alignment types: many-to-many, one-to-many, one-to-one
  - Alignment space size: depends on the alignment type
  - Evaluating an alignment: Cartesian product of entities

9 Space of Alignments (same match matrix and alignment types as slide 8)
  [Figure: bipartite graph]
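
The size of the alignment space can be illustrated with a small enumeration. The sketch below assumes an alignment is a 0/1 match matrix with one row per entity of the first ontology and one column per entity of the second; the function names are hypothetical and the brute-force counting is only practical for tiny inputs.

```python
from itertools import product


def is_one_to_one(match):
    """True if no row and no column of the 0/1 match matrix holds more than one 1."""
    rows_ok = all(sum(row) <= 1 for row in match)
    cols_ok = all(sum(col) <= 1 for col in zip(*match))
    return rows_ok and cols_ok


def count_alignments(n1, n2, one_to_one=False):
    """Count 0/1 match matrices of shape n1 x n2 by brute-force enumeration."""
    total = 0
    for bits in product((0, 1), repeat=n1 * n2):
        # Cut the flat bit tuple into n1 rows of n2 cells each.
        match = [bits[i * n2:(i + 1) * n2] for i in range(n1)]
        if not one_to_one or is_one_to_one(match):
            total += 1
    return total


print(count_alignments(2, 2))                   # 16, i.e. 2 ** (2 * 2)
print(count_alignments(2, 2, one_to_one=True))  # 7
```

Even for two entities per ontology the unrestricted (many-to-many) space has 16 candidates, and it doubles with every additional cell of the matrix, which is why large ontologies need pruning, partitioning, or parallelism.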

10 Large Ontology Matching
  - Reduction of alignment space
    - Early pruning of dissimilar element pairs
      - aflood (Hanif and Masaki '09)
    - Partition based matching
      - Falcon-AO (Jian et al. '05)
  - Parallel matching
    - MapPSO (Bock and Hettenhausen '10)
    - VDoc+ (Zhang '12)
  [Figure: O1 partitioned into P11, P12, P13 and O2 into P21, P22, P23; 4 blocks]

11 Batch Alignment of Large Ontologies
  - Scalability is challenging
    - OAEI 2012 - Very Large Biomedical Ontology Track
      - 8 out of 21 tools completed
  - Ontology repositories (e.g., NCBO at Stanford)
    - Batch alignment of ontologies
      - New ontologies posted
      - Ontologies get updated
  Approach allows any alignment algorithm to be utilized on a MapReduce architecture

12 Contributions: Batch Alignment of Large Ontologies
  General & novel approach to speed up batch alignment of large ontologies using MapReduce
  - No impact on alignment quality for some algorithms
  - Benefits ontology repositories

13 MapReduce Framework

14 [Figure: MapReduce data flow of key -> value records from input to output]
  Key identifies a subproblem

15 MapReduce Framework
  [Figure: ontologies O1 and O2 split into parts O11, O21, O31 and O12, O22]

16 [Figure: same diagram as slide 15]

17 [Figure: same diagram as slide 15]

18 [Figure: same diagram as slide 15]

19 Mapper & Reducer Algorithms
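
The body of this slide is not preserved in the transcript, so the following is only a minimal single-process sketch of how a mapper and reducer could be arranged under the deck's stated idea that a key identifies a subproblem. The function names, the `(pair_id, index)` key layout, and the toy aligner are all assumptions made for illustration, not the authors' algorithms; a real deployment would run on Hadoop rather than in one Python process.

```python
from collections import defaultdict


def mapper(pair_id, subproblems):
    """Emit one (key, value) record per subproblem; the key identifies the subproblem."""
    for index, (part1, part2) in enumerate(subproblems):
        yield (pair_id, index), (part1, part2)


def reducer(key, values, align):
    """Run the supplied alignment algorithm on every subproblem received under this key."""
    for part1, part2 in values:
        yield key, align(part1, part2)


def run_job(batch, align):
    """Simulate the map, shuffle and reduce phases in a single process."""
    shuffled = defaultdict(list)
    for pair_id, subproblems in batch.items():
        for key, value in mapper(pair_id, subproblems):
            shuffled[key].append(value)
    results = {}
    for key in sorted(shuffled):
        for out_key, alignment in reducer(key, shuffled[key], align):
            results[out_key] = alignment
    return results


def exact_name_aligner(part1, part2):
    """Toy stand-in for a real aligner (hypothetical): match entities with equal names."""
    return sorted((name, name) for name in set(part1) & set(part2))


batch = {"pairA": [(["heart", "lung"], ["heart", "liver"]), (["cell"], ["cell"])]}
print(run_job(batch, exact_name_aligner))
```

Passing the aligner in as a plain function mirrors the deck's claim that any alignment algorithm can be plugged into the framework.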

20 Identifying Alignment Subproblems
  - Approach: Hamdi et al. 2010
    - Identify anchors: entity pairs with identical names or labels
    - Cluster concepts around the anchors, using structural neighborhood
  Entities from one cluster are predominantly in correspondence with entities in one other cluster
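
The two steps above can be sketched roughly as follows. This is not the procedure of Hamdi et al.; it assumes labels are compared after lowercasing and trimming (an added normalization), and it approximates "structural neighborhood" by assigning each concept to the nearest anchor in a breadth-first traversal. All names and the sample data are hypothetical.

```python
from collections import deque


def find_anchors(labels1, labels2):
    """Pair entities whose normalized labels are identical.

    labels1 and labels2 map an entity id to its label.
    """
    by_label = {}
    for entity, label in labels2.items():
        by_label.setdefault(label.strip().lower(), []).append(entity)
    anchors = []
    for entity, label in labels1.items():
        for other in by_label.get(label.strip().lower(), []):
            anchors.append((entity, other))
    return anchors


def cluster_around(seeds, neighbors):
    """Assign every reachable entity to its nearest seed using multi-source BFS."""
    cluster = {seed: seed for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in cluster:
                cluster[nxt] = cluster[node]
                queue.append(nxt)
    return cluster


labels1 = {"a1": "Heart", "a2": "Left Ventricle", "a3": "Lung"}
labels2 = {"b1": "heart", "b2": "Cardiac Ventricle", "b3": "Lung"}
anchors = find_anchors(labels1, labels2)
print(anchors)  # [('a1', 'b1'), ('a3', 'b3')]

neighbors1 = {"a1": ["a2"], "a2": ["a1"], "a3": []}
print(cluster_around([a for a, _ in anchors], neighbors1))
```

Each resulting cluster in one ontology, paired with the cluster around the matching anchor in the other, would form one alignment subproblem.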

21 Merging Subproblem Alignments
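
The slide's details are not preserved in the transcript. As one plausible illustration, the sketch below merges subproblem alignments by taking their union and, when the same entity pair appears more than once, keeping the higher confidence. Both the `(entity1, entity2, confidence)` tuple layout and the keep-the-maximum rule are assumptions.

```python
def merge_alignments(sub_alignments):
    """Union the correspondences of all subproblems; on duplicates keep the higher confidence."""
    merged = {}
    for alignment in sub_alignments:
        for entity1, entity2, confidence in alignment:
            key = (entity1, entity2)
            if confidence > merged.get(key, float("-inf")):
                merged[key] = confidence
    return sorted((e1, e2, c) for (e1, e2), c in merged.items())


parts = [[("a", "x", 0.9)], [("a", "x", 0.7), ("b", "y", 0.8)]]
print(merge_alignments(parts))  # [('a', 'x', 0.9), ('b', 'y', 0.8)]
```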

22 Performance Evaluation
  - Datasets
    - Conference track from OAEI (120 pairs)
    - Large ontologies from OAEI (SNOMED, NCI, ...; 5 pairs)
    - New biomedical ontology testbed (50 pairs from NCBO)
  - Algorithms: Falcon-AO, Optima+, LogMap, YAM++
  - Compare F-measure & runtime
    - Default setup on a single node
    - MapReduce setup using Hadoop (12 nodes each with 24 2GB & 2GHz Intel Xeon processors)

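23 Results – 3 Datasets
  Speedup by algorithm over the Confer., Large OAEI and Biomed datasets (the three values per row are not separated in the source):
  - Falcon: 2155
  - LogMap: 9165
  - Optima+: 1164110
  - YAM++: 4227
  [Figure panels: Conference, Large OAEI, Biomedical]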

24 Results – Large OAEI ontologies
  - Conference Track
    - No partitioning
    - No change in output
  - Other Datasets
    - LogMap & YAM++: tradeoff is in the alignment quality
    - Falcon-AO & Optima+: no change in output

  Each cell lists P R F (precision, recall, F-measure).

  Ontology pair | MapRed./Def. Falcon-AO | MapReduce LogMap | MapRed./Def. Optima+ | MapReduce YAM++ | Default LogMap | Default YAM++
  mouse, human  | 73 74 73               | 96 75 84         | 78 73 76             | 95 77 85        | 92 85 88       | 94 86 90
  STW, TheSoz   | 57 50 53               | 57 51 54         | 18 40 25             | 55 52 53        | 69 64 67       | 60 75 66
  fma, nci      | 95 81 88               | 95 83 89         | 96 83 89             | 97 84 90        | 95 86 90       | 98 85 91
  fma, snomed   | 85 63 72               | 85 63 72         | 84 61 71             | 86 63 73        | 97 66 78       | 97 70 81
  snomed, nci   | 69 58 63               | 67 58 62         | 70 58 63             | 71 58 64        | 90 64 75       | 95 60 74
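
The F values reported on this slide can be related to P and R with a short calculation. The sketch assumes F is the balanced F-measure (harmonic mean of precision and recall), which the deck does not state explicitly; the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Default LogMap on mouse, human: P = 92, R = 85
print(round(f_measure(92, 85)))  # 88
# MapReduce LogMap on mouse, human: P = 96, R = 75
print(round(f_measure(96, 75)))  # 84
```

Comparing the two rows shows the tradeoff in numbers: precision rises slightly under partitioning while recall drops, which lowers the F-measure.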

25 Speedup with # of nodes in the Hadoop cluster

26 Discussion
  - First inter-matcher parallelization approach
    - Especially using MapReduce
  - Exhibits significant speedup for batch alignment
    - Some algorithms may see a small reduction in alignment quality due to the partitioning
  - Significant speedup for a single ontology pair
    - Falcon-AO, Optima+ & YAM++
  - Any alignment algorithm can fit in our framework

27 Thank you. Questions?

28 Parallel Alignment of Large Ontologies on a Computing Cluster
  - Current divide and conquer approaches
    - Heavily rely on structure
    - Size based partitioning techniques are not effective
  - Current parallel matching algorithms
    - Parallelize the process within the algorithms
    - Do not support multi-node cluster architecture

