
1 SDP-MARCH-Talk: Malicious Task Detection 姚大海 2013/11/24

2 papers Characterizing and Detecting Malicious Crowdsourcing Detecting Deceptive Opinion Spam Using Human Computation SmartNotes: Application of Crowdsourcing to the Detection of Web Threats

3 papers Characterizing and Detecting Malicious Crowdsourcing Detecting Deceptive Opinion Spam Using Human Computation SmartNotes: Application of Crowdsourcing to the Detection of Web Threats

4 outline
–malicious crowdsourcing
–measured datasets
–some initial results

5 malicious crowdsourcing
increasing secrecy
–jobs are becoming harder to track and detect.
–details of a task are only revealed to workers who take it on.
–worker accounts must be tied to phone numbers or bank accounts.

6 malicious crowdsourcing
behavioral signatures
–output from crowdturfing tasks is likely to display specific patterns that distinguish it from "organically" generated content.
–signatures: worker accounts (their behavior); content (bursts of content generation when tasks are first posted)
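The "bursts of content generation" signature can be sketched as a simple score: the fraction of an account's posts that land right after a task is posted. This is an illustrative toy, with an assumed one-hour window, not the paper's actual detector:

```python
from datetime import datetime, timedelta

def burst_score(post_times, task_time, window=timedelta(hours=1)):
    # Fraction of an account's posts falling within `window` after
    # the task was posted -- a crude "burst" behavioral signature.
    if not post_times:
        return 0.0
    hits = sum(1 for t in post_times if task_time <= t <= task_time + window)
    return hits / len(post_times)

task = datetime(2012, 12, 1, 12, 0)
organic = [task - timedelta(days=3), task + timedelta(days=2)]
turfer = [task + timedelta(minutes=m) for m in (2, 5, 9, 30)]
print(burst_score(organic, task))  # 0.0
print(burst_score(turfer, task))   # 1.0
```

A real detector would combine this with account-level features; the window length is a tunable assumption.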

7 malicious crowdsourcing
our methodology
–we limit our scope to campaigns that target microblogging platforms (Sina Weibo).
–first, we gather "ground truth" content generated by crowdturfers and "organic" content generated by normal users.
–second, we compare and contrast these datasets.
–our end goal is to develop detectors and test them against new crowdturfing campaigns as they arrive.

8 measured datasets
crowdturf accounts on Weibo
–downloaded full user profiles of 28947 Weibo account IDs
crowdturf campaigns
–crawled tweets, retweets and comments of 18335 campaigns
–61.5 million tweets, 118 million comments and 86 million retweets (2012.11~2013.1)

9 some initial results turkers tend to straddle the line between malicious and normal users. crowdturfing campaigns have a higher ratio of repeated users.
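The "ratio of repeated users" can be computed from per-campaign worker lists: the fraction of distinct workers who appear in more than one campaign. A minimal sketch with toy data (not the Weibo dataset):

```python
from collections import Counter

def repeated_user_ratio(campaigns):
    # campaigns: list of worker-ID lists, one per campaign.
    # Returns the fraction of distinct workers seen in >1 campaign.
    counts = Counter(w for workers in campaigns for w in set(workers))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c > 1) / len(counts)

crowdturf = [["a", "b", "c"], ["a", "b", "d"], ["a", "e"]]
organic = [["u", "v"], ["w", "x"], ["y", "z"]]
print(repeated_user_ratio(crowdturf))  # 0.4
print(repeated_user_ratio(organic))    # 0.0
```

Campaigns that hire from a small worker pool push this ratio up, which is what the slide's finding exploits.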

10 papers Characterizing and Detecting Malicious Crowdsourcing Detecting Deceptive Opinion Spam Using Human Computation SmartNotes: Application of Crowdsourcing to the Detection of Web Threats

11 outline
–introduction
–data preparation
–human assessor measurements
–writing style measurements
–classifier measurements
–hybrid measurements
–conclusion

12 introduction
review spam
–hype spam: fake positive reviews
–defaming spam: fake negative reviews
limitation of related work
–focuses on hype spam

13 data preparation
truthful reviews (for each of 8 products):
–25 highly-rated reviews
–25 low-rated reviews
fake reviews (created on AMT):
–25 highly-rated reviews
–25 low-rated reviews

14 human assessor measurements
balanced: 5 truthful and 5 deceptive reviews
random: n deceptive reviews and (10-n) truthful reviews
1. students performed better than the crowd, but not significantly.
2. detecting highly-rated reviews is easier than detecting low-rated reviews.
an assessor has a "default" belief that a review must be true.
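The "default" belief explains why the mix of reviews matters: an assessor who labels every review truthful scores very differently under the balanced and random setups. A toy illustration:

```python
def always_truthful_accuracy(n_deceptive, total=10):
    # Accuracy of an assessor who calls every review truthful --
    # the "default belief" the slide mentions.
    return (total - n_deceptive) / total

print(always_truthful_accuracy(5))  # 0.5 on the balanced mix
print(always_truthful_accuracy(2))  # 0.8 when most reviews are truthful
```

On truthful-heavy random mixes this baseline looks deceptively good, so raw accuracy alone cannot separate real skill from the default belief.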

15 writing style measurements
three linguistic metrics:
–polarity
–sentiment
–readability: ARI = 4.71*(C/W) + 0.5*(W/S) - 21.43, where C = #characters, W = #words, S = #sentences
sentiment API from text-processing.com
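The Automated Readability Index is 4.71*(characters/words) + 0.5*(words/sentences) - 21.43; a minimal sketch with naive word and sentence splitting (real tokenization would differ):

```python
import re

def ari(text):
    # Automated Readability Index:
    #   ARI = 4.71*(C/W) + 0.5*(W/S) - 21.43
    # C = characters (in words), W = words, S = sentences.
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

score = ari("The room was clean. The staff answered every question promptly.")
print(round(score, 2))  # 5.56
```

Higher ARI means the text demands a higher reading grade level, which is the readability comparison the next slide draws on.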

16 writing style measurements
–truthful reviews exhibit higher readability
–highly-rated reviews exhibit higher readability

17 classifier measurements
QuickLM language model toolkit
language model score, sentiment score, and ARI as the feature set
inputs to an SVM
our classifier outperformed our human and crowd assessors.
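The three-part feature set can be sketched as a vector builder; the `lm_score` and `sentiment_score` callables below are stand-ins for the QuickLM toolkit and the text-processing.com API, which are not reimplemented here:

```python
import re

def ari(text):
    # Automated Readability Index (see the writing-style slide).
    words = re.findall(r"[A-Za-z0-9']+", text)
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return (4.71 * sum(len(w) for w in words) / len(words)
            + 0.5 * len(words) / len(sents) - 21.43)

def feature_vector(review, lm_score, sentiment_score):
    # The slide's feature set for the SVM: a language-model score,
    # a sentiment score, and ARI readability.
    return [lm_score(review), sentiment_score(review), ari(review)]

# toy stand-ins, for illustration only
fake_lm = lambda t: -2.0 * len(t.split())   # pretend log-probability
fake_sentiment = lambda t: 0.7              # pretend positive polarity
vec = feature_vector("Great hotel. Loved every minute of it.",
                     fake_lm, fake_sentiment)
print(len(vec))  # 3
```

Each review becomes a 3-dimensional vector that any off-the-shelf SVM implementation can consume.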

18 hybrid measurements
providing students and the crowd with additional measurement data: sentiment scores and ARI scores
providing assessors with meaningful metrics is likely to improve the quality of assessment.

19 conclusion
Outlook: if the crowd workers were familiar with the relevant domain, would they outperform automatic classification?
Question: the SVM outperforms the hybrid approach, so why use the hybrid approach at all?

20 papers Characterizing and Detecting Malicious Crowdsourcing Detecting Deceptive Opinion Spam Using Human Computation SmartNotes: Application of Crowdsourcing to the Detection of Web Threats

21 outline
–introduction
–related work
–design of SmartNotes
–web scam detection technique

22 introduction
two types of cybersecurity threats
–threats created by factors outside the end user's control, such as security flaws in applications and protocols.
–threats caused by the user's actions, such as phishing.
ways to identify these websites
–statistics
–blacklists

23 introduction
our crowdsourcing approach
–users report security threats
–machine learning integrates their responses
features
–combining data from multiple sources
–combining social bookmarking with question-answering
–applying machine learning and natural-language processing

24 related work
social bookmarking
–sharing bookmarks among users
question answering
–posting questions and answering questions posed by others
safe browsing: browser extensions
web scam detection
–closely related to spam email detection
–content based

25 design of SmartNotes
user interface
–Chrome browser extension
–post a comment or ask a question
–share your notes and questions with others
–analyze the current website

26 design of SmartNotes
–read and write notes, account...
–JavaScript and the Chrome extension API
–machine learning algorithms
–collects 43 features from 11 sources

27 web scam detection technique
We need a training set of websites labeled scam or non-scam to apply our supervised machine learning technique.
approaches to constructing a training set
1. Scam queries (random)
–selected 100 domain names from each query and submitted them to AMT.
2. Web of Trust (scam)
–200 most recent discussion threads

28 web scam detection technique
3. Spam emails (scam)
–1551 spam emails from a corporate email system.
4. hpHosts (scam)
–top 100 most recently reported websites on the blacklist
5. Alexa rankings (non-scam)
–top 100 websites according to the ranking on alexa.com.

29 validation & results
–F1 score: the harmonic mean of the precision and the recall
–AUC: the area under the ROC curve
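The first validation metric, the harmonic mean of precision and recall, is the standard F1 score; a minimal sketch:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.8, 0.6), 3))  # 0.686
```

Because the harmonic mean punishes imbalance, F1 only rises when precision and recall are both high, which is why it pairs well with AUC for scam-vs-non-scam evaluation.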

30 Q&A

