Finding High-Quality Content in Social Media chenwq 2011/11/26.

1 Finding High-Quality Content in Social Media chenwq 2011/11/26

2 Authors
– Eugene Agichtein, Emory University
– Research: Intelligent Information Access Lab (IRLab)
– News: our team wins the "Best Paper" award at SIGIR 2011.

3 Abstract
– Since the early 2000s, user-generated content has become popular on the web.
– The quality of user-generated content varies drastically, from excellent to abuse and spam.
– Goal: automatically separate high-quality content from the rest.
– Graph-based framework: combine the different sources of evidence in a classification formulation.

4 Contents
1. Related work
2. CONTENT QUALITY ANALYSIS
3. MODELING CONTENT QUALITY
4. EXPERIMENT & Conclusion

5 Related work
– Link analysis in social media
– Propagating reputation
– Question/answering portals and forums
– Expert finding
– Text analysis for content quality
– Implicit feedback for ranking

6 Related work: Link analysis in social media
– G = (V, E)
– V corresponds to the users of a question/answer system
– There is a directed edge e = (u, v) ∈ E from a user u ∈ V to a user v ∈ V if user u has answered at least one question of user v
– G' = (V, E')
– PageRank, ExpertiseRank, HITS
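The answer graph defined on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the toy user names, the damping factor of 0.85, and the fixed iteration count are all assumptions. It builds G from (answerer, asker) pairs and runs a plain power-iteration PageRank on G and on its transpose.

```python
def build_answer_graph(answers):
    """Build G = (V, E): an edge u -> v exists if u answered a question of v.

    `answers` is an iterable of (answerer, asker) pairs (hypothetical input format).
    """
    graph = {}
    for answerer, asker in answers:
        graph.setdefault(answerer, set()).add(asker)
        graph.setdefault(asker, set())  # make sure every user is a vertex
    return graph


def transpose(graph):
    """Reverse every edge of the graph."""
    reversed_graph = {v: set() for v in graph}
    for u, outs in graph.items():
        for v in outs:
            reversed_graph[v].add(u)
    return reversed_graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling mass is spread uniformly."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in graph}
        for u, outs in graph.items():
            for v in outs:
                new_rank[v] += damping * rank[u] / len(outs)
        rank = new_rank
    return rank


# Toy data: alice answers bob and carol, carol answers bob.
answers = [("alice", "bob"), ("carol", "bob"), ("alice", "carol")]
g = build_answer_graph(answers)
scores = pagerank(g)
scores_transposed = pagerank(transpose(g))
```

On G, score flows from answerers to askers; on the transposed graph it flows from askers to answerers, so the most active answerer in the toy data (alice) ranks first there.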

8 CONTENT QUALITY ANALYSIS: Intrinsic content quality
As a baseline, we use textual features only, with all word n-grams up to length 5 that appear in the collection more than 3 times used as features.
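The n-gram baseline can be sketched as follows. This is only an illustration under stated assumptions: whitespace tokenization, lowercasing, and counting total occurrences (rather than document frequency) are choices made here, and the function names are hypothetical.

```python
from collections import Counter


def word_ngrams(text, max_n=5):
    """Yield all word n-grams of length 1..max_n (whitespace tokens, lowercased)."""
    tokens = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(documents, max_n=5, more_than=3):
    """Keep n-grams that occur more than `more_than` times in the collection."""
    counts = Counter()
    for doc in documents:
        counts.update(word_ngrams(doc, max_n))
    return {gram for gram, count in counts.items() if count > more_than}


def featurize(text, vocabulary, max_n=5):
    """Map a text to counts of the vocabulary n-grams it contains."""
    return Counter(g for g in word_ngrams(text, max_n) if g in vocabulary)


# Toy collection: one sentence repeated four times plus one rare text.
docs = ["how do i fix this"] * 4 + ["something else"]
vocab = build_vocabulary(docs)
features = featurize("how do i", vocab)
```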

9 CONTENT QUALITY ANALYSIS: Intrinsic content quality
– Punctuation and typos: 1. Punctuation 2. Capitalization 3. Spacing density 4. Character-level entropy 5. Spelling mistakes 6. Out-of-vocabulary words
– Syntactic and semantic: 1. Average number of syllables per word 2. Entropy of word lengths 3. Readability measures
– Grammaticality: 1. Part-of-speech sequences 2. Formality score 3. Distance between its (trigram) language model and several given language models

10 CONTENT QUALITY ANALYSIS: User relationships
– Graph of items and users: user u, question q, answer
– User-user graph: edge u → v if u has answered a question from user v

11 CONTENT QUALITY ANALYSIS: Usage statistics
– The number of clicks on some item
– The dwell time on some item

12 CONTENT QUALITY ANALYSIS: Classification framework
We cast the problem of quality ranking as a binary classification:
– support vector machines
– log-linear classifiers
– stochastic gradient boosted trees
Our goal is to discover interesting, well-formulated, and factually accurate content.
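To make the binary-classification framing concrete, here is a small log-linear classifier (logistic regression trained by stochastic gradient descent) in plain Python. It is a sketch only: the two toy features, the labels, the learning rate, and the epoch count are invented for illustration and do not come from the slides.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200, seed=0):
    """Fit weights w and bias b by SGD on the log-loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_proba(w, b, x):
    """Probability that item x is high quality (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per item: [readability score, spelling-mistake rate].
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
y = [1, 1, 0, 0]  # 1 = high quality, 0 = the rest
w, b = train_logistic(X, y)
```

Ranking by `predict_proba` turns the binary classifier into a quality ranking, which is the framing the slide describes.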

14 MODELING CONTENT QUALITY: User relationships
Our dataset, viewed as a graph, as illustrated in Figure 1

15 MODELING CONTENT QUALITY: User relationships
The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph outlined in Figure 2

16 MODELING CONTENT QUALITY: User relationships
The unique characteristics of the community question/answering domain

17 MODELING CONTENT QUALITY: User relationships
Question subtree
– Q: Features from the question being answered
– QU: Features from the asker of the question being answered
– QA: Features from the other answers to the same question

18 MODELING CONTENT QUALITY: User relationships
User subtree
– UA: Features from the answers of the user
– UQ: Features from the questions of the user
– UV: Features from the votes of the user
– UQA: Features from answers received to the user's questions
– U: Other user-based features

19 MODELING CONTENT QUALITY: User relationships
Question features

20 MODELING CONTENT QUALITY: User relationships
Implicit user-user relations
– G = (V, E), with E = E_a ∪ E_b ∪ E_v ∪ E_s ∪ E_+ ∪ E_−
– G_x = (V, E_x)
– h_x: the vector of hub scores on the vertices V
– a_x: the vector of authority scores
– p_x: the vector of PageRank scores
– p'_x: the vector of PageRank scores in the transposed graph
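The per-edge-type scores can be sketched by running a link-analysis algorithm once per subgraph G_x. The example below computes hub and authority vectors with a basic HITS iteration; the slide does not spell out what each subscript stands for, so the labels "a" and "b" are treated as opaque keys, and the user names, function names, and iteration count are assumptions.

```python
import math


def hits(vertices, edges, iterations=50):
    """Basic HITS: returns (hub, authority) dicts, each L2-normalized."""
    hub = {v: 1.0 for v in vertices}
    auth = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of vertices pointing to v.
        auth = {v: 0.0 for v in vertices}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(s * s for s in auth.values())) or 1.0
        auth = {v: s / norm for v, s in auth.items()}
        # Hub of u: sum of authority scores of vertices u points to.
        hub = {v: 0.0 for v in vertices}
        for u, v in edges:
            hub[u] += auth[v]
        norm = math.sqrt(sum(s * s for s in hub.values())) or 1.0
        hub = {v: s / norm for v, s in hub.items()}
    return hub, auth


def scores_per_edge_type(vertices, typed_edges):
    """Run HITS separately on each subgraph G_x = (V, E_x)."""
    return {x: hits(vertices, edges) for x, edges in typed_edges.items()}


vertices = {"u1", "u2", "u3"}
typed_edges = {
    "a": [("u1", "u3"), ("u2", "u3")],  # edges of type a (toy data)
    "b": [("u1", "u2")],                # edges of type b (toy data)
}
scores = scores_per_edge_type(vertices, typed_edges)
hub_a, auth_a = scores["a"]
hub_b, auth_b = scores["b"]
```

Each user then contributes one hub and one authority value per edge type as features; the PageRank vectors p_x and p'_x would be added in the same per-subgraph fashion.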

21 MODELING CONTENT QUALITY: User relationships
Implicit user-user relations

22 MODELING CONTENT QUALITY: User relationships
Content features for QA
– Purpose: to identify the most salient features for the specific tasks of question or answer quality classification
– the KL-divergence between the language models of the two texts
– their non-stopword overlap
– the ratio between their lengths
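The three question-answer features above can be approximated as follows. This is a sketch under several assumptions not stated in the slide: unigram language models with add-one smoothing over the joint vocabulary, Jaccard similarity as the overlap measure, length measured in whitespace tokens, and a tiny illustrative stopword list.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "it", "i", "do", "how"}


def tokens(text):
    return text.lower().split()


def kl_divergence(text_p, text_q, alpha=1.0):
    """KL(P || Q) between smoothed unigram models of the two texts."""
    p_counts, q_counts = Counter(tokens(text_p)), Counter(tokens(text_q))
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def non_stopword_overlap(text_a, text_b):
    """Jaccard overlap of the non-stopword vocabularies."""
    a = set(tokens(text_a)) - STOPWORDS
    b = set(tokens(text_b)) - STOPWORDS
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def length_ratio(text_a, text_b):
    """Ratio of token counts, guarding against an empty second text."""
    return len(tokens(text_a)) / max(len(tokens(text_b)), 1)


question = "how to reset router"
answer = "reset the router now"
qa_features = {
    "kl": kl_divergence(question, answer),
    "overlap": non_stopword_overlap(question, answer),
    "length_ratio": length_ratio(question, answer),
}
```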

23 MODELING CONTENT QUALITY: User relationships
Usage features for QA
– Number of item views (clicks)
– Metadata of the question: how long ago the question was posted
– Derived statistics: the expected number of views for a given category; the deviation from the expected number of views
– Other second-order statistics: the click frequency
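The derived usage statistics can be illustrated with a short sketch. The slide does not define them precisely, so the choices here are assumptions: the expected number of views is taken as the per-category mean, the deviation as a simple difference, and the click frequency as views per hour since posting. The field names are hypothetical.

```python
from collections import defaultdict


def expected_views_by_category(items):
    """Mean number of views per category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item in items:
        totals[item["category"]] += item["views"]
        counts[item["category"]] += 1
    return {c: totals[c] / counts[c] for c in totals}


def usage_features(item, expected):
    """Raw and derived usage features for one item."""
    exp = expected[item["category"]]
    age = max(item["age_hours"], 1.0)  # avoid dividing by a near-zero age
    return {
        "views": item["views"],
        "age_hours": item["age_hours"],
        "expected_views": exp,
        "deviation_from_expected": item["views"] - exp,
        "click_frequency": item["views"] / age,
    }


items = [
    {"category": "cars", "views": 10, "age_hours": 5},
    {"category": "cars", "views": 30, "age_hours": 10},
    {"category": "pets", "views": 4, "age_hours": 2},
]
expected = expected_views_by_category(items)
first = usage_features(items[0], expected)
```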

25 Experiment & Conclusions: EXPERIMENTAL SETTING
Dataset. Edges induced from the whole dataset.

26–33 MODELING CONTENT QUALITY: EXPERIMENTAL SETTING
Dataset statistics

34 Thanks for your attention!

