--He Xiangnan PhD student Importance Estimation of User-generated Data.

Presentation transcript:

Motivation
- User-generated data is emerging in Web 2.0 applications (YouTube, Twitter, Foursquare, etc.)
- Estimate its importance and its future popularity
- Applications:
  - Web crawling
  - Relevance ranking
  - Index selection

Related Work (Page Importance)
- Link-connectivity based
  - PageRank
- Content analysis
  - User-centric crawling
- Sitemap based
- Web-log (server-side) based
- Learning to rank

Limitations of Existing Work
- Link structure analysis is not suitable for many Web 2.0 websites:
  - Content is generated by users
  - Content is updated very frequently
  - It ignores the temporal factor, so it cannot reflect the future popularity of a webpage

Empirical Studies
- Analysis of user-posted data on YouTube, Digg, etc.:
  - Popularity changes over time: the view count per unit of time (hour/day) follows a log-normal distribution after posting.
  - Some activities influence popularity for a period of time: for example, external references from other websites and internal related-item recommendations cause a burst in view count.
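The log-normal observation above can be made concrete with a small sketch. It assumes the hourly view rate of an item is its total view count multiplied by a log-normal density in the time since posting. The parameter names `mu` and `sigma` and the helper names are illustrative assumptions, not taken from the slides.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Log-normal density at time t (t is time since posting, in hours)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def view_rate(t, total_views, mu, sigma):
    """Assumed views per hour at time t: total views times the density."""
    return total_views * lognormal_pdf(t, mu, sigma)


def peak_time(mu, sigma):
    """Mode of the log-normal, i.e. when the assumed view rate peaks."""
    return math.exp(mu - sigma ** 2)
```

Under this assumption the view rate rises quickly after posting, peaks at `peak_time(mu, sigma)`, and then decays with a long tail. A burst caused by an external reference would show up as a deviation from this curve.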

Related Work (Popularity Prediction)
- Two main types of prediction methods for user-generated data:
  - (i) A very complex model that considers various factors of a specific website. Problem: the model is too specific and does not generalize.
  - (ii) Statistical analysis over a large volume of data, training a regression model. Problem: it only reflects collective patterns and cannot be used for an individual webpage.
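As a rough illustration of approach (ii), the sketch below fits a single least-squares line that maps early view counts to later view counts across many items. Fitting on a log scale is an assumption made here for the example, and the function names are hypothetical. The slides do not say which regression the cited work uses.

```python
import math


def fit_log_linear(early_views, later_views):
    """Least-squares fit of log(later) = intercept + slope * log(early)."""
    xs = [math.log(v) for v in early_views]
    ys = [math.log(v) for v in later_views]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_later_views(early, slope, intercept):
    """Apply the fitted collective pattern to one item's early view count."""
    return math.exp(intercept + slope * math.log(early))
```

The sketch also shows the problem named in the slide: every item with the same early view count receives the same prediction, so the model captures the collective pattern but not the behavior of an individual webpage.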

Our Goal
- Based on features common to user-generated data in Web 2.0 applications, propose a general model that can roughly predict a webpage's importance and future popularity.

Our Idea
- Take page-level statistics into account:
  - Number of views / replies
  - Number of likes / dislikes
  - Creation time and comment times
- Using these common features, train a model for importance estimation
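One possible way to hold these page-level statistics and turn them into model inputs is sketched below. The `PageStats` record, its field names, and the derived features (log-scaled counts, like ratio, age in hours, comments in the last 24 hours) are assumptions chosen for illustration. The slides list the raw statistics but do not define a feature set.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageStats:
    """Hypothetical record of the page-level statistics listed above."""
    views: int
    replies: int
    likes: int
    dislikes: int
    created_at: float  # seconds since epoch
    comment_times: List[float] = field(default_factory=list)


def feature_vector(page: PageStats, now: float) -> List[float]:
    """Derive site-independent features from the raw statistics."""
    age_hours = max((now - page.created_at) / 3600.0, 1e-9)
    votes = page.likes + page.dislikes
    like_ratio = page.likes / votes if votes else 0.5  # neutral when no votes
    recent_comments = sum(1 for t in page.comment_times if now - t <= 86400)
    return [
        math.log1p(page.views),    # log scale tames heavy-tailed counts
        math.log1p(page.replies),
        like_ratio,
        age_hours,
        float(recent_comments),
    ]
```

Because each feature depends only on statistics that most Web 2.0 sites expose, the same vector could in principle be computed for a YouTube video or a Digg story.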

Examples

Methods
- Comments correlate closely with view count
- Popularity prediction:
  - View count is the popularity metric (but it is only a snapshot at the current time)
  - Comments are traces left by users and reflect users' responses; use the comment history to predict future popularity
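A minimal sketch of how these two points could be checked on collected data: `hourly_counts` turns comment timestamps into a per-hour comment history, and `pearson` measures how closely two series (for example, comment counts and view counts) move together. Both helpers and the choice of Pearson correlation are assumptions for illustration, not the method described in the slides.

```python
import math


def hourly_counts(comment_times, created_at, hours):
    """Bucket comment timestamps (seconds) into hourly counts since posting."""
    counts = [0] * hours
    for t in comment_times:
        idx = int((t - created_at) // 3600)
        if 0 <= idx < hours:  # ignore comments outside the observed window
            counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near 1 between comment counts and view counts would support using the timestamped comment history as a stand-in for the view history, which is usually available only as a single snapshot.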

Current Progress (with Shawn)
- Analyzing how the features reflect popularity
- Collecting datasets (YouTube, Digg, etc.)
- Reading related papers

Any suggestions? Thanks!