Separating the Swarm: Categorization Methods for User Sessions on the Web. Jeffrey Heer, Ed H. Chi, Palo Alto Research Center. CHI Web Behavior Patterns, 2002-04-24.

Presentation transcript:

Slide 1: Separating the Swarm: Categorization Methods for User Sessions on the Web
Jeffrey Heer, Ed H. Chi, Palo Alto Research Center (CHI Web Behavior Patterns)

Slide 2: Web Analytics: What can you measure?
Marketing: content, page traffic
Infrastructure: load testing
Site Design: user intent, usability, user experience
Want to improve site design, content, and performance.

Slide 3: The Change in Web Sites: What should you measure?
[Figure: site complexity over time, moving from page-based websites (measured by TRAFFIC) to activity-based websites (measured by USER EXPERIENCE). Labels: "Products", "Management Team", "I'd like information on used cars.", "Search for a car dealer in my neighborhood."]

Slide 4: Motivation
What are users' information goals?
Understanding the composition of web user traffic.
Strategy: Use all available data (Content, Usage, Topology) to discover user goals.
Remaining sections: System Description, Evaluation, Implications, Conclusion

Slide 5: System Description
Generate a user profile for each user session.
– How: Use access logs and site content to build a multi-featured model of user activity (multi-modal clustering).
Group user profiles into common activities such as "product browsing" and "job seeking".
– How: Apply clustering algorithms to user profiles.

Slide 6: System Description
[Diagram: Web Crawl, Access Logs, Document Model, User Sessions, User Profiles, Clustered Profiles]
Steps:
1. Process Access Logs
2. Crawl Web Site
3. Build Document Model
4. Extract User Sessions
5. Build User Profiles
6. Cluster Profiles

Slide 7: Document Model
Site is crawled.
– Pay special attention to pages in logs.
Documents are described by feature vectors:
– Content: TF.IDF-weighted keyword vector
– URL: tokenized and TF.IDF-weighted
– Inlinks: column vectors in the topology matrix
– Outlinks: row vectors in the topology matrix
Vectors are concatenated to form a single multi-modal vector P_d for each document.
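
The document model above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `tf * log(N / df)` form of TF.IDF, and splitting URLs on "/" are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns (vocabulary, TF.IDF vectors).
    Assumes the common tf * log(N / df) weighting; the slides do not specify one."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per token
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

def document_model(content, urls, links):
    """Build one concatenated multi-modal vector per document.
    content: token list per page; urls: URL string per page;
    links[i][j] == 1 if page i links to page j (topology matrix)."""
    _, content_vecs = tfidf(content)
    # Hypothetical URL tokenization: split the path on "/".
    _, url_vecs = tfidf([u.strip("/").split("/") for u in urls])
    n = len(links)
    model = []
    for d in range(n):
        outlinks = list(links[d])                   # row d of the topology matrix
        inlinks = [links[i][d] for i in range(n)]   # column d of the topology matrix
        model.append(content_vecs[d] + url_vecs[d] + inlinks + outlinks)
    return model
```

Keeping the modalities as contiguous slices of one vector makes it straightforward to weight them separately later.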

Slide 8: User Sessions
Sessions are extracted and represented by a vector s:
– For path i = A → B → D, s_i = [values missing in transcript] (for a site with 5 documents)
Different weightings can be employed in creating the session vector s:
– Frequency: number of times each page is accessed. A → B → D, s = [values missing in transcript]
– TF.IDF: hits / # paths including page
– Position: use order of pages within surfing path. A → B → D, s = [values missing in transcript]
– View Time: use time spent viewing pages. A (10s) → B (20s) → D (15s), s = [values missing in transcript]
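
As a rough sketch of these weightings, assuming a site whose five documents are labeled A to E: the exact position scheme and any normalization are not given in the slides, so `position_vector` (weight equals 1-based position in the path) is purely an illustrative assumption, and all function names are hypothetical.

```python
def frequency_vector(path, pages):
    """Number of times each page is accessed in the surfing path."""
    return [path.count(p) for p in pages]

def tfidf_vector(path, all_paths, pages):
    """Hits in this path divided by the number of paths that include the page."""
    vector = []
    for p in pages:
        containing = sum(1 for other in all_paths if p in other)
        vector.append(path.count(p) / containing if containing else 0.0)
    return vector

def position_vector(path, pages):
    """Assumed scheme: weight each page by its 1-based position in the path."""
    weights = {p: 0 for p in pages}
    for position, page in enumerate(path, start=1):
        weights[page] += position
    return [weights[p] for p in pages]

def view_time_vector(timed_path, pages):
    """Total seconds spent on each page; timed_path is [(page, seconds), ...]."""
    totals = {p: 0.0 for p in pages}
    for page, seconds in timed_path:
        totals[page] += seconds
    return [totals[p] for p in pages]
```

For example, `frequency_vector(["A", "B", "D"], ["A", "B", "C", "D", "E"])` marks pages A, B, and D with a count of 1 and leaves C and E at 0.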

Slide 9: User Profiles
User profiles are a linear combination of the viewed pages.
– "You are what you see."
[Equation labels: User Profiles = Session weights × Document Vectors]
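
A minimal sketch of this linear combination, assuming the session vector holds one weight per document and every document vector has the same length (the function name is hypothetical):

```python
def user_profile(session, doc_vectors):
    """Return the sum over documents d of session[d] * doc_vectors[d]."""
    dims = len(doc_vectors[0])
    profile = [0.0] * dims
    for weight, vec in zip(session, doc_vectors):
        for k in range(dims):
            profile[k] += weight * vec[k]
    return profile
```

Pages the user never viewed have weight 0 in the session vector and so contribute nothing to the profile.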

Slide 10: Clustering
Clustering is a form of statistical analysis that organizes data into individual clusters.
– Groupings are determined by a shared similarity.
– Similarity is defined by a computable similarity metric.
– Weights w_m specify the contribution of each modality.
Clustering proceeds by recursive bisection, using K-Means to perform the bisections [Zhao01].
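
The sketch below illustrates both ideas under stated assumptions. The similarity formula itself did not survive in the transcript, so a weighted sum of per-modality cosine similarities is assumed here. The bisection routine is a simplified stand-in, not CLUTO: always splitting the largest cluster and the deterministic seeding are illustrative choices, and all names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def multimodal_similarity(p, q, weights):
    """p, q: dicts of modality name -> vector; weights: modality name -> w_m.
    Assumed form: sum over modalities of w_m * cosine(p[m], q[m])."""
    return sum(w * cosine(p[m], q[m]) for m, w in weights.items())

def two_means(vectors, iterations=10):
    """Split vectors into two groups with a basic K-Means (k=2) on cosine similarity."""
    # Assumed seeding: the first vector and the vector least similar to it.
    seed_a = vectors[0]
    seed_b = min(vectors, key=lambda v: cosine(seed_a, v))
    centroids = [seed_a, seed_b]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in vectors:
            best = 0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            groups[best].append(v)
        if not groups[0] or not groups[1]:
            break  # degenerate split; stop early
        centroids = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    return groups

def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the largest cluster until k clusters exist (or no split is possible)."""
    clusters = [list(vectors)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)
            break
        left, right = two_means(largest)
        if not left or not right:
            clusters.append(largest)
            break
        clusters.extend([left, right])
    return clusters
```

Setting a modality's weight to 0 reduces the metric to the remaining modalities, which is one way to compare uni-modal and multi-modal schemes with the same code.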

Slide 11: [Figure annotations: User population breakdown; Detailed stats; Keywords describing user groups; Frequent documents accessed by group]

Slide 12: Clustering Results
Users reached the end of the tutorial and had nowhere to go.

Slide 13: System Evaluation
Does the system correctly infer user intentions?
[Diagram: Logs → System → User Intent Groupings, compared against User Intent]
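
One way such a comparison could be scored is sketched below. The slides do not state the exact accuracy measure used, so this example assumes a purity-style score: each cluster is labeled with its majority task group, and accuracy is the fraction of sessions that match their cluster's label. The function name is hypothetical.

```python
from collections import Counter

def clustering_accuracy(cluster_ids, true_groups):
    """Fraction of sessions whose known task group matches the majority group of their cluster."""
    by_cluster = {}
    for cid, group in zip(cluster_ids, true_groups):
        by_cluster.setdefault(cid, []).append(group)
    # For each cluster, count the sessions belonging to its most common task group.
    correct = sum(Counter(groups).most_common(1)[0][1] for groups in by_cluster.values())
    return correct / len(true_groups)
```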

Slide 14: User Study
Asked users to surf specific tasks on [URL missing in transcript]
– Captured actions using the WebQuilt proxy logger [Hong01].
– Done at their leisure.
15 unique tasks:
– Tasks developed after exploring xerox.com and reading user feedback.
– 5 task groups with 3 tasks per group.
– Products, TechSupport, Supplies, Company Info, and Jobs.
Participation:
– 21 users signed up, 18 went through, 104 usable sessions.

Slide 15: Results
340 combinations of clustering schemes.
Outlink-based schemes performed poorly (omitted).

Slide 16: Analysis: Modalities
Linear contrast shows Content is significantly different:
– (unimodal) F(1,105) = 32.51, MSE = [missing in transcript], p < [missing in transcript]
– (multimodal) F(1,35) = 33.36, MSE = [missing in transcript], p < [missing in transcript]
Content is King! Mean = 0.96, StdDev = 0.07

Slide 17: Analysis: Path Weighting
Paired t-test between time-based and non-time-based weightings:
– n = 60, t(59) = 4.85, p = 4.68e-6
– View Time: mean = 89.5%, s.d. = 12.7%
– Non-View Time: mean = 83.2%, s.d. = 12.0%
View Time is best!
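
For readers unfamiliar with the test, a paired t statistic can be computed as in this minimal sketch. The function name is hypothetical, and no study data is reproduced here; the inputs would be per-scheme accuracies with and without View Time weighting, paired scheme by scheme.

```python
import math

def paired_t(a, b):
    """Paired t-test statistic for matched samples a and b.
    Returns (t, degrees_of_freedom); the p-value lookup is left out."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of the differences
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

With n = 60 pairs, as on the slide, the degrees of freedom are 59.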

Slide 18: Observation: Multi-Modal vs. Unimodal
In practice, Multi-Modal should be more robust.
– Some pages don't have much content:
» Images, Audio, Video
» PDF, PS (if you don't have the necessary software)
– URL Tokens: all pages have URLs.
– Inlinks: don't depend on any features of a page!
In our experience, Content-based Multi-Modal Clustering retains accuracy.
Linear contrast shows no significant difference between multi-modal and uni-modal schemes: F(1,77) = 1.63, MSE = [missing in transcript], p = .21

Slide 19: Findings
Incorporating View Time improves clustering accuracy.
Though it involves extra work, extracting Content can provide very high accuracy.
Adding other modalities makes clustering more robust.
Modalities should be chosen carefully and tailored to each specific site.

Slide 20: Implications for Designers
Good design means understanding your users.
It's possible to understand trends of user activities accurately.
– Requires well-defined user tasks doable on the site.
Now you can design and tailor user experience.
– Address discovered usability issues.
– Update design to facilitate common tasks.

Slide 21: Summary: "You are what you see."
[Diagram labels: User Information Goals; Web site (Page Content, Topology); InfoScent; Clustering; Observed Usage]
Users follow the best Information Scent to accomplish their goals.

Slide 22: Future Work
Determining the number of clusters
– Currently done semi-manually
Model unstructured tasks more directly
Directly recommend design changes
Integrate with
– Clustering Visualization
– User Path Visualization
Lots of Commercial Interest, Licensing

Slide 23: Conclusion
Performed the first known user study to characterize the analytic space of session clustering techniques.
Found that session clustering can be highly accurate with respect to user intentions.
Demonstrated that our method is scalable and useful in real-world scenarios.
This should prove to be a useful tool for web designers and researchers!

Slide 24: Acknowledgements
Peter Pirolli, Stu Card, Adam Rosien, Pam Schraedley, and the UIR and Bloodhound Team at PARC.
George Karypis for the CLUTO software.
Participants in our user study.
Office of Naval Research.
Contact: Jeff Heer, Ed H. Chi