MOTIVATION AND CHALLENGE Big data Volume Velocity Variety Veracity Contributor Content Context Value 5 Vs of Big Data 3 Cs of Veracity.

Presentation transcript:

1

2 MOTIVATION AND CHALLENGE
- 5 Vs of Big Data: Volume, Velocity, Variety, Veracity, Value
- 3 Cs of Veracity: Contributor, Content, Context

3 ALETHIOMETER FRAMEWORK: Contributor, Content, Context

4 C1 CONTRIBUTOR

5 Contributor modalities
Reputation
- Analyse comments over time to discover sentiments and opinions towards a source.
- Measured by the number of upvotes or likes.
History
- Information about activity on different social media platforms, combined with validity data.
- Measured by the update frequency of valid posts.
Popularity
- Information about how the source's activity is followed (readings, recommendations).
- Measured by the number of friends/followers, and the number of responses.

6 Contributor modalities
Influence
- Information about activities triggered by this source (re-posts, discussions or comments).
- Measured by the number of retweets/shares and the Klout influence score.
Presence
- Information about the type of source (individual, organisation, officially verified account, fake identity, etc.) and its presence on multiple social media platforms.
- Measured by the number of accounts on different social media platforms.

7 C2 CONTENT

8 Content modalities
Reputation of linked web content
- Measured in terms of domain reputation, page rank (Google PageRank or Alexa Rank), or properties of the contributors to the content.
Provenance
- Finding the original occurrence of the content and its whole path across sources, places and time, and measuring the reputation of these sources.
Popularity
- Information about how many people are following this content.
- Measured by the number of followers, and the number of responses.

9 Content modalities
Influence
- Analyse if this content is triggering discussions or other actions in the social sphere.
- Measured by the number of retweets/shares.
Originality
- Check whether the content or parts thereof have been used in the past (e.g., reused text or images that have appeared in the past).
Authenticity
- Check whether the content has been changed with respect to its original state (e.g., changed text or attached multimedia content).
Objectivity and Diversity
- Measured by the variation of opinions found for people, content, or general entities.

10 C3 CONTEXT

11 Context modalities
Cross-checking
- Measured by the number of different reports or mentions about the same thing coming from independent sources.
Coherence
- Measurement of text coherence (e.g., Coh-Metrix) and coherence between the content and tags, attached web-links, or attached multimedia.
Proximity
- Measurement of coherence between reference location/time and publication location/time.

12 How to combine all these parameters?

13 Approach for rating of modality parameters
Rate parameters on a 5-point discrete scale, from 0 to 4
- [0, a_0) → 0, [a_0, a_1) → 1, [a_1, a_2) → 2, [a_2, a_3) → 3, [a_3, ∞) → 4.
- a_0: 20th percentile, a_1: 40th percentile, a_2: 60th percentile, a_3: 80th percentile (adjust the scale so that the ratings follow a uniform distribution).
Weight the parameter ratings, either uniformly or based on their significance, to derive a total score
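The rating scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`quintile_thresholds`, `rate`, `total_score`) are hypothetical, the percentiles are estimated with the standard library's inclusive quantile method, and the total score is assumed to be a weighted mean of the ratings, which the slide does not specify.

```python
from bisect import bisect_right
from statistics import quantiles


def quintile_thresholds(values):
    """Return (a_0, a_1, a_2, a_3): the 20th, 40th, 60th and 80th percentiles.

    Assumption: percentiles are estimated from the observed sample itself.
    """
    return tuple(quantiles(values, n=5, method="inclusive"))


def rate(value, thresholds):
    """Map a raw parameter value to a rating in 0..4.

    Bins are half-open, as on the slide: [0, a_0) -> 0, [a_0, a_1) -> 1, ...,
    [a_3, inf) -> 4. bisect_right counts the thresholds that are <= value.
    """
    return bisect_right(thresholds, value)


def total_score(ratings, weights=None):
    """Combine ratings into one score (assumed here: a weighted mean).

    ratings: dict mapping parameter name -> rating (0..4)
    weights: dict mapping parameter name -> significance; uniform if omitted
    """
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    total_weight = sum(weights[name] for name in ratings)
    return sum(ratings[name] * weights[name] for name in ratings) / total_weight


if __name__ == "__main__":
    # Hypothetical follower counts, for illustration only.
    followers = [3, 12, 40, 75, 130, 260, 900, 4200, 15000, 880000]
    cuts = quintile_thresholds(followers)
    print(cuts, [rate(v, cuts) for v in followers])
```

Because the thresholds are quantiles of the data, each rating from 0 to 4 receives roughly one fifth of the observations, which is what makes the rated scale approximately uniform even when the raw values are heavy-tailed.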

14

15 Preliminary statistical results
Parameters studied
- Number of followers
- Number of tweets
- User account age
Sample: ~10 M tweets, 5 K users
Collection period: July-September 2013

16 Empirical distributions
- Heavy-tailed distributions
- Multimodal heavy-tailed distributions with three different peaks (6.7 months, 23.3 months, 4.4 yrs)

17 Correlation coefficients
- Friends - followers: 0.1222
- Friends - tweets: 0.08
- Followers - tweets: 0.0197
Conclusion:
- All parameters are relatively independent of one another
- They need to be studied independently
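A pairwise correlation check of this kind can be sketched as follows. The slides do not say which coefficient was used, so Pearson's correlation is an assumption here, and the function names (`pearson`, `pairwise_correlations`) and the sample figures are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


if __name__ == "__main__":
    # Made-up per-user counts, for illustration only (not the study's data).
    users = {
        "friends": [120, 300, 45, 800, 60],
        "followers": [90, 15000, 30, 400, 2500],
        "tweets": [1500, 200, 9000, 60, 700],
    }
    for pair, r in pairwise_correlations(users).items():
        print(pair, round(r, 4))
```

Coefficients close to zero, like those reported on the slide, indicate little linear relationship between the parameters, which supports rating each one separately.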

18 Summary and future work
Summary
- Defined Alethiometer: a framework taking into account all aspects: Contributor, Content and Context
- Showed an approach for combining the ratings of all parameters
- Attested the relative independence of parameters and the need to consider a variety of measures (also previously emphasized in the literature)
Future work
- Investigate statistical properties of other modalities
- Extract the significance of modalities
- Study correlation between content, contributor and context modalities

19 Questions & Answers
- Find us at http://ilab.atc.gr
- Follow us @iLabATC

