Presentation is loading. Please wait.

Presentation is loading. Please wait.

Yuan Yao Joint work with Hanghang Tong, Feng Xu, and Jian Lu Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint 1 Aug 24-27, KDD 2014.

Similar presentations


Presentation on theme: "Yuan Yao Joint work with Hanghang Tong, Feng Xu, and Jian Lu Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint 1 Aug 24-27, KDD 2014."— Presentation transcript:

1 Yuan Yao Joint work with Hanghang Tong, Feng Xu, and Jian Lu Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint 1 Aug 24-27, KDD 2014

2 Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 2

3 Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 3

4 CQA 4

5 Long-Term Impact Q: How many users will find it beneficial? 5 What is the Long-Term Impact of a Q/A post?

6 Challenges Q: Why not off-the-shell data mining algorithms? Challenge 1: Multi-aspect  C1.1. Coupling between questions and answers  C1.2. Feature non-linearity  C1.3. Posts dynamically arrive Challenge 2: Efficiency 6

7 C1.1 Coupling 7 Strong positive correlation! [Yao+@ASONAM’14] [Yao+@ASONAM’14] Y Yao, H Tong, T Xie, L Akoglu, F Xu, J Lu. Joint Voting Prediction for Questions and Answers in CQA. ASONAM 2014. FqFq Predicted Q Impact Y q Question Features FaFa Predicted A Impact Y a Answer Features Voting Consistency 1 2 3

8 C1.1 Coupling LIP-M: [Yao+@ASONAM’14] Question predictionAnswer prediction Voting ConsistencyRegularization 12 3 8

9 C1.2 Non-linearity The kernel trick (e.g., SVM)  Mercer kernel  Kernel matrix as new feature matrix 9

10 C1.3 Dynamics Solution: recursive least squares regression [Haykin2005] 10 (x 1, y 1 ) (x 2, y 2 ) … (x t, y t ) (x t+1, y t+1 ) Model t Model t+1 + [Haykin2005] S Haykin. Adaptive filter theory. 2005. Existing examples New examples Current model New model

11 This Paper Q1: how to comprehensively capture the multi-aspect in one algorithm?  Coupling, non-linearity, and dynamics Q2: how to make the long-term impact prediction algorithm efficient? 11

12 Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 12

13 Modeling Non-linearity Basic Idea: kernelize LIP-M Details - LIP-KM:  Closed-form solution: Complexity: O(n 3 ) Question predictionAnswer prediction Voting ConsistencyRegularization 12 3 13

14 Modeling Dynamics Basic idea: recursively update LIP-KM Details - LIP-KIM: Complexity: O(n 3 ) -> O(n 2 ) (matrix inverse lemma) 14

15 Modeling Dynamics THEOREM 1. Correctness of LIP-KIM. Let β 1 be the output of LIP-KM at time t+1, and β 2 be the output of LIP-KIM updated from time t to t+1, we have that β 1 = β 2. 15

16 Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 16

17 Approximation Method (1) Basic idea: compress the kernel matrix Details – LIP-KIMA:  1) Separate decomposition  2) Make decomposition reusable  3) Apply decomposition on LIP-KIM Complexity: O(n 2 ) -> O(n) (Nyström approximation) (Eigen-decomposition) (SVD on X 1 ) (Eigen-decomposition) 17

18 Approximation Method (2) Basic idea: filter less informative examples Details - LIP-KIMAA: Complexity: O(n) -> <O(n) ? (x 1, y 1 ) (x 2, y 2 ) … (x t, y t ) Model t Existing examples Current model (x t+1, y t+1 ) New examples Model t+1 + New model Filtering 18

19 LIP-KIM O(n 2 ) LIP-M (CoPs) Ridge Regression Coupling Non-linearityDynamics LIP-KI (Recursive Kernel Ridge Regression) LIP-KM O(n 3 ) LIP-IM LIP-KIMA O(n) LIP-KIMAA <O(n) K: Non-linearity I: Dynamics M: Coupling A: Approximation LIP-K (Kernel Ridge Regression) LIP-I (Recursive Ridge Regression) Summary 19

20 Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 20

21 Experiment Setup Datasets ( http://blog.stackoverflow.com/category/cc-wiki-dump/ )  Stack Overflow, Mathematics Stack Exchange Features  Content (bag-of-words) & contextual features Test setTraining set Initial setIncremental set Time 21

22 Evaluation Objectives O1: Effectiveness  How accurate are the proposed algorithms for long-term impact prediction? O2: Efficiency  How scalable are the proposed algorithms? 22

23 Effectiveness Results Comparisons with existing models. (better) Our methods 23

24 Effectiveness Results Performance gain analysis. K: Non-linearity I: Dynamics M: Coupling (lower is better) 24

25 Efficiency Results The speed comparisons. (better) LIP-KIMAA (sub-linear) Ours 25

26 Quality-Speed Balance-off Our methods (better) 26

27 Roadmap Background and Motivations Modeling Multi-aspect Computation Speedup Empirical Evaluations Conclusions 27

28 Conclusions A family of algorithms for long-term impact prediction of CQA posts  Q1: how to capture coupling, non-linearity, and dynamics?  A1: voting consistency + kernel trick + recursive updating  Q2: how to make the algorithms scalable?  A2: approximation methods Empirical Evaluations  Effectiveness: up to 35.8% improvement  Efficiency: up to 390x speedup and sub-linear scalability 28

29 Q&A Thanks ! 29 Authors: Yuan Yao, Hanghang Tong, Feng Xu, and Jian Lu Yuan Yao, yyao@smail.nju.edu.cnyyao@smail.nju.edu.cn


Download ppt "Yuan Yao Joint work with Hanghang Tong, Feng Xu, and Jian Lu Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint 1 Aug 24-27, KDD 2014."

Similar presentations


Ads by Google