Natural Language Processing Lab National Taiwan University The splog Detection Task and A Solution Based on Temporal and Link Properties Yu-Ru Lin et al.

Presentation transcript:

The splog Detection Task and A Solution Based on Temporal and Link Properties
Yu-Ru Lin et al., NEC America
TREC 2006 (Blog session)
Natural Language Processing Lab, National Taiwan University
Presenter: Chun-Yuan Teng

Splog characteristics
Machine-generated content
No value addition
  – No unique information for their readers
Hidden agenda, usually an economic goal
  – Commercial intention

Uniqueness of splogs
Dynamic content
  – Unlike web spam, a splog generates fresh content to drive traffic
Non-endorsement links
  – On the web, a hyperlink is an endorsement of the target page
  – Spammers can create hyperlinks in normal blogs, so a link in a blog is not necessarily an endorsement

Features to detect splogs
Traditional features
  – Tokenized URL, blog and post titles, homepage content, and post content
Temporal regularity
  – Temporal content regularity and temporal structural regularity
Link regularity
  – Consistency in target websites
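The "tokenized URL" feature listed above can be illustrated with a small sketch. The slides do not say how the authors tokenize URLs; the function name `tokenize_url` and the choice to split on every non-alphanumeric character are assumptions made only for illustration.

```python
import re


def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens.

    Hypothetical helper: the splitting rule (any run of characters that
    is not a letter or digit) is an assumption, not the authors' method.
    """
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


if __name__ == "__main__":
    # The example URL is made up for illustration.
    print(tokenize_url("http://cheap-pills.example.com/2006/best_deals"))
```

Tokens of this kind could then be fed to a bag-of-words classifier alongside tokens from titles and post content.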

Temporal Content Regularity
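The body of this slide did not survive in the transcript, so the authors' estimator is not shown. As a rough sketch of the idea, one could measure how similar each post is to the one before it: a blog that keeps republishing near-identical text scores close to 1. The function names and the use of Jaccard similarity over whitespace tokens are assumptions, not the method from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def content_regularity(posts):
    """Mean Jaccard similarity between consecutive posts, in [0, 1].

    Hypothetical measure: posts are given in chronological order as
    plain strings and tokenized by whitespace.
    """
    token_sets = [set(p.lower().split()) for p in posts]
    if len(token_sets) < 2:
        return 0.0
    sims = [jaccard(x, y) for x, y in zip(token_sets, token_sets[1:])]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Made-up posts that repeat most of their wording.
    repetitive = ["buy cheap pills now", "buy cheap pills today", "buy cheap pills now"]
    print(content_regularity(repetitive))
```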

Temporal Structural Regularity
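The slide content is missing here as well. One plausible way to capture regularity in posting times is the entropy of the gaps between consecutive posts: a blog that posts on a fixed schedule has low entropy, while irregular human posting spreads over many gap sizes. The function name, the one-hour bin width, and the use of Shannon entropy are all assumptions made for this sketch.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=3600):
    """Shannon entropy (in bits) of binned gaps between consecutive posts.

    Hypothetical measure: timestamps are seconds since some epoch, and
    gaps are grouped into bins of bin_seconds (an assumed bin width).
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    counts = Counter(int(g // bin_seconds) for g in gaps)
    total = len(gaps)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    daily = [0, 86400, 172800, 259200]           # one post every 24 hours
    irregular = [0, 100, 5000, 90000, 400000]    # gaps of very different sizes
    print(interval_entropy(daily), interval_entropy(irregular))
```

A low value suggests machine-like scheduling; any decision threshold would have to be tuned on labeled data.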

Link Regularity estimation
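The estimation details are not in the transcript. The earlier feature slide describes link regularity as consistency in the target website, so a simple stand-in is the share of a blog's outgoing links that point at its single most common host. The function name `target_consistency` and this particular ratio are assumptions, not the estimator used by the authors.

```python
from collections import Counter
from urllib.parse import urlparse


def target_consistency(links):
    """Fraction of outgoing links that point at the most common host.

    Hypothetical measure in [0, 1]; links without a host part are ignored.
    """
    hosts = [urlparse(u).netloc.lower() for u in links]
    hosts = [h for h in hosts if h]
    if not hosts:
        return 0.0
    top_count = Counter(hosts).most_common(1)[0][1]
    return top_count / len(hosts)


if __name__ == "__main__":
    # Made-up links: three of four point at the same host.
    links = [
        "http://shop.example.com/a",
        "http://shop.example.com/b",
        "http://shop.example.com/c",
        "http://other.example.org/",
    ]
    print(target_consistency(links))
```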

Two kinds of spam detection
Offline detection
  – Traditional measurement
Online detection
  – Detect spam online

Experimental results (offline)

Experimental results (offline)

Online indexing in blog search engine

Online test

Online test in this paper

Experimental results

Conclusion and contributions
Modeling the splog problem
  – The uniqueness of splogs
Regularity-based detection
  – Content and post time
Evaluation
  – Online evaluation