





1 Separating the Swarm: Categorization Methods for User Sessions on the Web
Jeffrey Heer, Ed H. Chi, Palo Alto Research Center
2002.04.24 – CHI Web Behavior Patterns

2 Web Analytics: What can you measure?
- Marketing: content, page traffic
- Infrastructure: load testing
- Site Design: user intent, usability, user experience
We want to improve site design, content, and performance.

3 The Change in Web Sites: What should you measure?
Over time, web sites grow in complexity, moving from page-based websites (pages such as "Products" or "Management Team", measured by TRAFFIC) to activity-based websites (tasks such as "I'd like information on used cars." or "Search for a car dealer in my neighborhood.", measured by USER EXPERIENCE).

4 Motivation
What are users' information goals?
- Understanding the composition of web user traffic.
Strategy: use all available data (content, usage, topology) to discover user goals.
Outline: System Description, Evaluation, Implications, Conclusion

5 System Description
- Generate a user profile for each user session.
  - How: use access logs and site content to build a multi-featured model of user activity (multi-modal clustering).
- Group user profiles into common activities such as "product browsing" and "job seeking".
  - How: apply clustering algorithms to the user profiles.

6 System Description
Pipeline: Web Crawl → Document Model; Access Logs → User Sessions; Document Model + User Sessions → User Profiles → Clustered Profiles
Steps:
1. Process access logs
2. Crawl web site
3. Build document model
4. Extract user sessions
5. Build user profiles
6. Cluster profiles

7 Document Model
- The site is crawled, paying special attention to pages that appear in the logs.
- Documents are described by feature vectors:
  - Content: TF.IDF-weighted keyword vector
  - URL: tokenized and TF.IDF-weighted
  - Inlinks: column vectors of the topology matrix
  - Outlinks: row vectors of the topology matrix
- The vectors are concatenated to form a single multi-modal vector $P_d$ for each document $d$.
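A minimal pure-Python sketch of how the four modalities could be built and concatenated into $P_d$. It assumes pages arrive as pre-tokenized content words and links as (source, target) pairs; the TF.IDF variant (raw count times log(N/df)), the hypothetical URL tokenizer, binary link weights, per-modality unit-length normalization, and the modality prefixes on feature names are all illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tfidf(token_lists):
    # TF.IDF variant (assumption): raw term count * log(N / document frequency)
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for tokens in token_lists]


def tokenize_url(url):
    # Hypothetical URL tokenizer: split on anything that is not a letter or digit
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def normalize(vec):
    # Assumption: scale each modality to unit length so none dominates by magnitude
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm else dict(vec)


def document_model(pages, links):
    """pages: {url: list of content words}; links: set of (source, target) pairs.
    Returns {url: P_d}, a sparse dict concatenating four modalities."""
    urls = list(pages)
    content = tfidf([pages[u] for u in urls])
    url_tokens = tfidf([tokenize_url(u) for u in urls])
    model = {}
    for i, u in enumerate(urls):
        outlinks = {t: 1.0 for s, t in links if s == u}  # row of the topology matrix
        inlinks = {s: 1.0 for s, t in links if t == u}   # column of the topology matrix
        parts = {"content": content[i], "url": url_tokens[i],
                 "in": inlinks, "out": outlinks}
        # Prefix features with their modality so concatenation cannot collide
        model[u] = {f"{m}:{k}": w for m, vec in parts.items()
                    for k, w in normalize(vec).items()}
    return model
```

Sparse dicts keep the concatenation cheap: a page with no inlinks simply contributes no `in:` features.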

8 User Sessions
- Sessions are extracted, and each is represented by a vector $s$.
  - For path $i$ = A → B → D on a site with 5 documents (A to E), $s_i = (1, 1, 0, 1, 0)$.
- Different weightings can be used to build the session vector $s$:
  - Frequency: number of times each page is accessed. A → B → D gives $s = (1, 1, 0, 1, 0)$.
  - TF.IDF: page hits divided by the number of paths that include the page.
  - Position: use the order of pages within the surfing path. A → B → D gives $s = (1, 2, 0, 3, 0)$.
  - View Time: use the time spent viewing each page. A (10 s) → B (20 s) → D (15 s) gives $s = (10, 20, 0, 15, 0)$.
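The weighting schemes on this slide can be sketched in a few lines. This is a hypothetical illustration: the function and scheme names are invented, the position rule (a page's weight is its 1-based position, and a revisit keeps its last position) is an assumption, and the TF.IDF function follows the slide's simplified "hits / # paths including page" description rather than a standard TF.IDF formula.

```python
from collections import Counter


def session_vector(path, docs, scheme="frequency", view_times=None):
    """Weight each document in `docs` for one surfing path (ordered list of pages)."""
    weights = dict.fromkeys(docs, 0)
    for position, page in enumerate(path, start=1):
        if scheme == "frequency":
            weights[page] += 1                         # one count per access
        elif scheme == "position":
            weights[page] = position                   # assumption: later pages weigh more
        elif scheme == "view_time":
            weights[page] += view_times[position - 1]  # seconds spent on this visit
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return [weights[d] for d in docs]


def tfidf_sessions(paths, docs):
    """Slide's simplified TF.IDF: hits in the path / number of paths including the page."""
    paths_with_page = Counter(page for path in paths for page in set(path))
    return [[path.count(d) / paths_with_page[d] if d in path else 0.0 for d in docs]
            for path in paths]


docs = ["A", "B", "C", "D", "E"]
print(session_vector(["A", "B", "D"], docs))                             # [1, 1, 0, 1, 0]
print(session_vector(["A", "B", "D"], docs, "position"))                 # [1, 2, 0, 3, 0]
print(session_vector(["A", "B", "D"], docs, "view_time", [10, 20, 15]))  # [10, 20, 0, 15, 0]
```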

9 User Profiles
- A user profile is a linear combination of the viewed pages' document vectors, weighted by the session weights: $UP_i = \sum_{d} s_{i,d} \, P_d$.
  - "You are what you see."
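The "you are what you see" combination amounts to a weighted sum of sparse document vectors. A small sketch, assuming session weights and document vectors are stored as dicts (the function name and the toy feature vectors below are hypothetical; the weights reuse the view-time example A 10 s, B 20 s, D 15 s):

```python
def user_profile(session_weights, doc_vectors):
    """Weighted sum of document vectors: UP = sum over d of s_d * P_d.
    session_weights: {doc: weight}; doc_vectors: {doc: {feature: value}}."""
    profile = {}
    for doc, s in session_weights.items():
        if s == 0:
            continue  # unvisited pages contribute nothing
        for feature, value in doc_vectors[doc].items():
            profile[feature] = profile.get(feature, 0.0) + s * value
    return profile


# Hypothetical toy document vectors
doc_vectors = {
    "A": {"content:copier": 1.0},
    "B": {"content:copier": 0.5, "content:toner": 0.5},
    "D": {"content:toner": 1.0},
}
print(user_profile({"A": 10, "B": 20, "D": 15}, doc_vectors))
# {'content:copier': 20.0, 'content:toner': 25.0}
```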

10 Clustering
- Clustering is a form of statistical analysis that organizes data into individual clusters.
  - Groupings are determined by shared similarity.
  - Similarity is defined by a computable similarity metric; weights $w_m$ specify the contribution of each modality $m$.
- Clustering proceeds by recursive bisection, using K-Means to perform each bisection [Zhao01].
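The authors used CLUTO [Zhao01]; what follows is only a hypothetical pure-Python sketch of the general idea of recursive bisection with 2-means under cosine similarity, not CLUTO's algorithm. The seeding rule (first member plus its least similar member), always splitting the largest cluster, and plain cosine on concatenated profile vectors are assumptions; the per-modality weights $w_m$ are not modelled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def two_means(vectors, members, iterations=20):
    """Split `members` (indices into `vectors`) into two groups with cosine 2-means."""
    first = vectors[members[0]]
    # Seeding (assumption): the first member and the member least similar to it
    far = min(members, key=lambda i: cosine(first, vectors[i]))
    centers = [first, vectors[far]]
    groups = [list(members), []]
    for _ in range(iterations):
        new_groups = [[], []]
        for i in members:
            side = 0 if cosine(vectors[i], centers[0]) >= cosine(vectors[i], centers[1]) else 1
            new_groups[side].append(i)
        if new_groups == groups or not all(new_groups):
            break  # converged, or the split collapsed onto one side
        groups = new_groups
        centers = [centroid([vectors[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the largest cluster until there are k clusters."""
    if not vectors:
        return []
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        left, right = two_means(vectors, largest)
        if not right:  # the largest cluster cannot be split; stop early
            clusters.append(largest)
            break
        clusters.extend([left, right])
    return sorted(sorted(c) for c in clusters)


profiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(bisecting_kmeans(profiles, 2))  # [[0, 1], [2, 3]]
```

Choosing k is left to the caller, which mirrors the slide deck's note that the number of clusters is currently chosen semi-manually.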

11 User population breakdown
Screenshot callouts: detailed statistics, keywords describing each user group, and the documents each group accesses most frequently.

12 Clustering Results (http://www.diamondreview.com)
Finding: users reached the end of the tutorial and had nowhere to go.

13 System Evaluation
Does the system correctly infer user intentions?
Diagram: logs are fed to the system, which produces user-intent groupings; these groupings are compared with the users' actual intent.

14 User Study
- Asked users to perform specific tasks on www.xerox.com
  - Actions captured using the WebQuilt proxy logger [Hong01]
  - Tasks done at the participants' leisure
- 15 unique tasks:
  - Developed after exploring xerox.com and reading user e-mail feedback
  - 5 task groups with 3 tasks per group: Products, TechSupport, Supplies, Company Info, and Jobs
- Participation:
  - 21 users signed up, 18 completed the study, yielding 104 usable sessions.

15 Results
340 combinations of clustering schemes were evaluated. Outlink-based schemes performed poorly and are omitted.

16 Analysis: Modalities
A linear contrast shows that Content is significantly different:
- (unimodal) F(1,105)=32.51, MSE=.005361, p<0.0001
- (multimodal) F(1,35)=33.36, MSE=.007332, p<0.0001
Content is King! Mean=0.96, StdDev=0.07

17 Analysis: Path Weighting
Paired t-test between time-based and non-time-based weightings: n=60, t(59)=4.85, p=4.68e-6
- View Time: mean=89.5%, s.d.=12.7%
- Non-View Time: mean=83.2%, s.d.=12.0%
View Time is best!
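For readers who want to see where a statistic like t(59) comes from: a paired t-test works on the per-pair differences, $t = \bar{d} / (s_d / \sqrt{n})$ with $n - 1$ degrees of freedom, so 60 matched pairs give 59. The sketch below assumes the pairs are accuracies of matched schemes with and without view time; the sample numbers are made up and do not reproduce the reported result. The p-value needs the t distribution's CDF, which the Python standard library does not provide.

```python
import math
import statistics


def paired_t(with_time, without_time):
    """Paired t statistic for matched samples; returns (t, degrees of freedom)."""
    if len(with_time) != len(without_time) or len(with_time) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(with_time, without_time)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up accuracies (percent) for three matched scheme pairs
t, df = paired_t([91, 87, 95], [90, 85, 92])
```

Here the differences are 1, 2, 3, so the mean difference is 2, the sample standard deviation is 1, and t = 2√3 ≈ 3.46 with 2 degrees of freedom.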

18 Observation: Multi-Modal vs. Unimodal
In practice, multi-modal clustering should be more robust:
- Some pages don't have much content
  - Images, audio, video
  - PDF, PS (if you don't have the necessary software)
- URL tokens: all pages have URLs.
- Inlinks: don't depend on any features of the page itself!
In our experience, content-based multi-modal clustering retains accuracy. A linear contrast shows no significant difference between multi-modal and unimodal schemes: F(1,77)=1.63, MSE=.004407, p=.21

19 Findings
- Incorporating view time improves clustering accuracy.
- Though it involves extra work, extracting content can provide very high accuracy.
- Adding other modalities makes clustering more robust.
- Modalities should be chosen carefully and tailored to each specific site.

20 Implications for Designers
Good design means understanding your users.
- It's possible to understand trends in user activities accurately.
  - Requires well-defined user tasks that can be done on the site.
- Now you can design and tailor the user experience.
  - Address discovered usability issues.
  - Update the design to facilitate common tasks.

21 Summary: "You are what you see."
Diagram: User Information Goals; Web site (Page Content, Topology); InfoScent; Clustering; Observed Usage.
Users follow the best information scent to accomplish their goals.

22 Future Work
- Determining the number of clusters
  - Currently done semi-manually
- Model unstructured tasks more directly
- Directly recommend design changes
- Integrate with
  - Clustering visualization
  - User path visualization
- Lots of commercial interest, licensing

23 Conclusion
- Performed the first known user study to characterize the analytic space of session clustering techniques.
- Found that session clustering can be highly accurate with respect to user intentions.
- Demonstrated that our method is scalable and useful in real-world scenarios.
This should prove to be a useful tool for web designers and researchers!

24 Acknowledgements
- Peter Pirolli, Stu Card, Adam Rosien, Pam Schraedley, and the UIR and Bloodhound Team at PARC
- George Karypis for the CLUTO software
- Participants in our user study
- Office of Naval Research
Contact: Jeff Heer (jheer@parc.com), Ed H. Chi (echi@parc.com)




