1 GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries. U Kang, Christos Faloutsos, Evangelos Papalexakis, Abhay Harpale. School of Computer Science, Carnegie Mellon University (CMU SCS). KDD 2012.

2 Outline: Problem Definition, Algorithm, Discoveries, Conclusions

3 Background: Tensor. Tensors (= multi-dimensional arrays) are everywhere. Example: hyperlinks and anchor texts in Web graphs. [Figure: three-way tensor with axes URL 1, URL 2, and Anchor Text (Java, C++, C#); the nonzero entries are 1.]

4 Background: Tensor. Tensors (= multi-dimensional arrays) are everywhere. Examples: sensor streams (time, location, type); predicates (subject, verb, object) in a knowledge base, such as “Barack Obama is the president of U.S.” and “Eric Clapton plays guitar”. [Figure: NELL (Never Ending Language Learner) data tensor with mode sizes labeled 26M and 48M; nonzeros = 144M.]
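
A knowledge-base tensor of this kind is almost entirely zeros, so it is normally stored as a list of nonzero coordinates rather than as a dense array. The sketch below is a toy illustration only: the two sentences come from the slide, but their split into (subject, verb, object) and the helper name `build_index` are assumptions made for the example.

```python
import numpy as np

# Toy knowledge base. The sentences are from the slide; the way they are
# split into (subject, verb, object) is an assumption for illustration.
triples = [
    ("Barack Obama", "is the president of", "U.S."),
    ("Eric Clapton", "plays", "guitar"),
]


def build_index(values):
    """Map each distinct string to an integer id, in first-seen order."""
    index = {}
    for v in values:
        index.setdefault(v, len(index))
    return index


subj_ids = build_index(s for s, _, _ in triples)
verb_ids = build_index(v for _, v, _ in triples)
obj_ids = build_index(o for _, _, o in triples)

# Coordinate (COO) storage: one (i, j, k) row per nonzero, plus its value.
coords = np.array([[subj_ids[s], verb_ids[v], obj_ids[o]] for s, v, o in triples])
values = np.ones(len(triples))
shape = (len(subj_ids), len(verb_ids), len(obj_ids))

# Dense view, feasible only because the toy tensor is tiny.
dense = np.zeros(shape)
dense[coords[:, 0], coords[:, 1], coords[:, 2]] = values
```

Only the `coords` and `values` arrays grow with the number of facts; the dense array grows with the product of the mode sizes, which is why it is not an option at NELL scale.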

5 Problem Definition. Q1: How to decompose a billion-scale tensor? This corresponds to SVD in the 2D case.
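
The SVD analogy can be made concrete with a small sketch. It assumes the decomposition in question is the CP/PARAFAC model (a sum of rank-one three-way outer products), which is the model the ALS algorithm is usually applied to; the slide itself does not name the model, and the sizes and variable names below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2D case: SVD writes a matrix as a weighted sum of rank-one outer products.
M = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = sum(s[r] * np.outer(U[:, r], Vt[r]) for r in range(len(s)))

# 3D case (assumed CP/PARAFAC model): a tensor as a sum of R rank-one
# three-way outer products of the columns of factor matrices A, B, C.
I, J, K, R = 4, 3, 5, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# The same tensor, built one rank-one component at a time.
X_loop = sum(
    np.multiply.outer(np.outer(A[:, r], B[:, r]), C[:, r]) for r in range(R)
)
```

Decomposing a tensor means going the other way: given `X`, find factor matrices `A`, `B`, `C` whose rank-one components sum to (approximately) `X`.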

6 Problem Definition. Q2: What are the important concepts and synonyms in a KB tensor? Q2.1: What are the dominant concepts in the knowledge base tensor? Q2.2: What are the synonyms of a given noun phrase? [Figure: NELL (Never Ending Language Learner) data tensor with mode sizes labeled 26M and 48M; nonzeros = 144M.]

7 Outline: Problem Definition, Algorithm, Discoveries, Conclusions

8 Algorithm: Problem Definition. Q1: How to decompose a billion-scale tensor? This corresponds to SVD in the 2D case.

9 Challenge: the Alternating Least Squares (ALS) algorithm. [Equation not preserved; its annotations name the pseudo-inverse, the Hadamard product, and the Khatri-Rao product.] How to design a fast MapReduce algorithm for ALS? [Figure: tensor with mode sizes I=26M, J=26M, K=48M.]
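
The slide's equation did not survive extraction. As a stand-in, here is a minimal sketch of the textbook ALS update for a three-way CP/PARAFAC model, A ← X(1) (C ⊙ B) (CᵀC ∗ BᵀB)†, which uses exactly the three operations the slide names. It is a small dense, single-machine illustration, not the MapReduce algorithm of the talk, and the function names are hypothetical.

```python
import numpy as np


def khatri_rao(C, B):
    """Column-wise Kronecker product: column r equals kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J, _ = B.shape
    return np.einsum("kr,jr->kjr", C, B).reshape(K * J, R)


def unfold_mode1(X):
    """Mode-1 unfolding of an I x J x K array into I x (J*K), j varying fastest."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)


def als_update_A(X, B, C):
    """One least-squares update of A with B and C held fixed."""
    gram = (C.T @ C) * (B.T @ B)  # Hadamard product of two R x R Gram matrices
    return unfold_mode1(X) @ khatri_rao(C, B) @ np.linalg.pinv(gram)


# Sanity check on an exactly rank-R tensor: the update recovers A.
rng = np.random.default_rng(1)
I, J, K, R = 6, 5, 4, 3
A_true = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
X = np.einsum("ir,jr,kr->ijk", A_true, B, C)
A_est = als_update_A(X, B, C)
```

A full ALS iteration repeats the same update for B and C with the roles of the modes rotated. Note that `khatri_rao(C, B)` has J*K rows, which is harmless here but is the object that becomes unmanageable at the sizes shown on the slide.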

10 Main Idea 1: Ordering of Computation. [Table not preserved: FLOPS of the candidate computation orderings on the NELL data, with one ordering marked "Our choice".]
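
The slide's numbers are not preserved, so the following is only a rough sketch of why the order of the matrix products matters. It assumes the quantity being computed is X(1) (C ⊙ B) P, where P is an R x R matrix, counts one multiply and one add per scalar product term, ignores the cost of forming the Khatri-Rao entries themselves, and uses a hypothetical rank R = 10. The mode sizes and nonzero count are the NELL figures from the slides.

```python
def flops_product_first(nnz, I, R):
    """(X1 @ KR) first, then the I x R result times the R x R matrix P."""
    return 2 * nnz * R + 2 * I * R * R


def flops_khatri_rao_first(nnz, J, K, R):
    """(KR @ P) first, a (J*K) x R by R x R product, then sparse X1 times it."""
    return 2 * J * K * R * R + 2 * nnz * R


I, J, K = 26_000_000, 26_000_000, 48_000_000  # NELL mode sizes from the slides
nnz = 144_000_000                             # NELL nonzeros from the slides
R = 10                                        # hypothetical rank (assumption)

cost_product_first = flops_product_first(nnz, I, R)
cost_khatri_rao_first = flops_khatri_rao_first(nnz, J, K, R)
ratio = cost_khatri_rao_first / cost_product_first
```

Under these assumptions the second ordering is dominated by the J*K*R² term, because it touches every row of the Khatri-Rao product, while the first ordering only does work proportional to the number of nonzeros and to I.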

11 Main Idea 2: Avoiding Intermediate Data Explosion. Size of intermediate data (NELL), naïve approach: 100 PB. [Figure: tensor with mode sizes I=26M, J=26M, K=48M.]
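
A back-of-the-envelope check of the 100 PB figure, assuming the naive intermediate is the fully materialized (J*K) x R Khatri-Rao product. The rank R = 10 and the 8-byte values are assumptions chosen for the estimate; the slide does not state them.

```python
J, K = 26_000_000, 48_000_000  # mode sizes from the slide
R = 10                         # hypothetical rank (assumption)
BYTES_PER_VALUE = 8            # assumed double-precision values

entries = J * K * R            # entries of a dense (J*K) x R matrix
size_pb = entries * BYTES_PER_VALUE / 1e15  # decimal petabytes
```

With these assumptions `size_pb` comes out just under 100, which is consistent with the slide's figure.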

12 Main Idea 2: Avoiding Intermediate Data Explosion. Size of intermediate data (NELL): 100 PB with the naïve approach (before), 1.5 GB with the proposed approach (after).
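
One way to see how the explosion can be avoided: the product X(1) (C ⊙ B) only ever needs the Khatri-Rao rows that line up with nonzeros of the tensor. The sketch below is a single-machine NumPy illustration of that idea under the assumption of a CP/PARAFAC model; it is not the MapReduce implementation from the talk, and `mttkrp_sparse` is a hypothetical name.

```python
import numpy as np


def mttkrp_sparse(coords, values, B, C, I):
    """Compute X1 @ khatri_rao(C, B) from the nonzeros only.

    coords: (nnz, 3) integer array of (i, j, k) indices.
    values: (nnz,) array of the corresponding tensor entries.
    """
    i, j, k = coords[:, 0], coords[:, 1], coords[:, 2]
    R = B.shape[1]
    out = np.zeros((I, R))
    for r in range(R):
        # One temporary of length nnz per column; nothing of size J*K is built.
        contrib = values * B[j, r] * C[k, r]
        out[:, r] = np.bincount(i, weights=contrib, minlength=I)
    return out


# Check against the dense computation on a small random sparse tensor.
rng = np.random.default_rng(2)
I, J, K, R, nnz = 5, 4, 3, 2, 7
flat = rng.choice(I * J * K, size=nnz, replace=False)
coords = np.column_stack(np.unravel_index(flat, (I, J, K)))
values = rng.standard_normal(nnz)
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

X = np.zeros((I, J, K))
X[tuple(coords.T)] = values
expected = np.einsum("ijk,jr,kr->ir", X, B, C)
result = mttkrp_sparse(coords, values, B, C, I)
```

The working set per column is on the order of the number of nonzeros plus one factor column. As a rough consistency check under assumed 8-byte values, (144M + 48M) x 8 bytes is about 1.5 GB, which matches the slide's figure, though the slide does not spell out how that number was derived.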

13 Experiments: GigaTensor solves a 100x larger problem. Number of nonzeros = I / 50. [Plot not preserved: GigaTensor vs. Tensor Toolbox over mode sizes I, J, K; the Tensor Toolbox curve is marked "Out of Memory" and the gap is labeled "100x".]

14 Outline: Problem Definition, Algorithm, Discoveries, Conclusions

15 Discoveries: Problem Definition. Q2: What are the important concepts and synonyms in a KB tensor? Q2.1: What are the dominant concepts in the knowledge base tensor? Q2.2: What are the synonyms of a given noun phrase? [Figure: NELL (Never Ending Language Learner) data tensor with mode sizes labeled 26M and 48M; nonzeros = 144M.]

16 A2.1: Concept Discovery. Concept discovery in the knowledge base.
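
The slide's figure is not preserved. A common way to read concepts off a CP/PARAFAC decomposition, sketched here as an assumption rather than as the talk's exact procedure, is to treat each rank-one component as one concept and list the entries with the largest values in the corresponding column of each factor matrix. The labels and numbers below are made up.

```python
import numpy as np


def top_k_per_component(F, labels, k):
    """For each column of factor matrix F, return the k labels with the largest entries."""
    order = np.argsort(-F, axis=0)[:k]  # shape (k, R): row indices, best first
    return [[labels[idx] for idx in order[:, r]] for r in range(F.shape[1])]


# Hypothetical noun phrases and a hypothetical 4 x 2 factor matrix.
labels = ["guitar", "piano", "senator", "president"]
A = np.array(
    [
        [0.9, 0.0],
        [0.8, 0.1],
        [0.1, 0.7],
        [0.0, 0.9],
    ]
)
concepts = top_k_per_component(A, labels, k=2)
```

Applying the same function to the factor matrices of the other two modes gives, for each component, the matching groups along every mode of the tensor.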

17 A2.1: Concept Discovery

18 A2.2: Synonym Discovery. Synonym discovery in the knowledge base. [Figure: columns a_1, a_2, …, a_R, with rows labeled "(Given) noun phrase", "(Discovered) synonym 1", and "(Discovered) synonym 2".]
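
The figure suggests comparing the row of a given noun phrase with the rows of other noun phrases across the columns a_1 … a_R. A minimal sketch of that idea follows, assuming cosine similarity between rows of a factor matrix as the comparison; the similarity measure, the function name, and the data are assumptions for illustration.

```python
import numpy as np


def top_synonyms(A, labels, query, k):
    """Return the k labels whose rows of A are most cosine-similar to the query's row."""
    norms = np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    unit = A / norms               # unit-length rows; guard against zero rows
    q = labels.index(query)
    sims = unit @ unit[q]          # cosine similarity of every row to the query
    sims[q] = -np.inf              # exclude the query itself
    best = np.argsort(-sims)[:k]
    return [(labels[i], float(sims[i])) for i in best]


# Hypothetical noun phrases with hypothetical R = 2 factor rows.
labels = ["phrase_0", "phrase_1", "phrase_2"]
A = np.array(
    [
        [1.0, 0.0],
        [0.9, 0.1],
        [0.0, 1.0],
    ]
)
synonyms = top_synonyms(A, labels, "phrase_0", k=2)
```

Two noun phrases score high when they load on the same components, that is, when they appear in similar contexts in the knowledge base, which is why such matches are better described as contextual synonyms.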

19 A2.2: Synonym Discovery

20 Outline: Problem Definition, Algorithm, Discoveries, Conclusions

21 Conclusion. GigaTensor: a scalable tensor decomposition algorithm for tensors with billion-length modes. Algorithm: avoids intermediate data explosion. Discoveries: concept discovery and contextual synonym detection on a KB tensor.

22 Thank you! www.cs.cmu.edu/~pegasus www.cs.cmu.edu/~ukang

