Adaptive Web Sites: Automatically Synthesizing Web Pages. Mike Perkowitz and Oren Etzioni. www.cs.washington.edu/homes/map/adaptive/


1 Adaptive Web Sites: Automatically Synthesizing Web Pages. Mike Perkowitz and Oren Etzioni. www.cs.washington.edu/homes/map/adaptive/

2 Adaptive Web Sites. Web sites that automatically reconfigure their organization and presentation by learning from user access patterns. (Perkowitz & Etzioni, IJCAI’97)

3 Adaptive Web Sites. Individual Customization: site learns you like sports. Group Transformation: site learns most sports lovers also read “Tank McNamara” and cross-links them.

4 Group Transformations. Our approach: history-based. Previously: simple transformations (Perkowitz & Etzioni, WWW6). Goal: change in view.

5 machines.hyperreal.org

6 Drum Machine Samples

7 [Image slide; no text]

8 Index Page Synthesis. Find groups of related documents at the site and create new pages linking to those documents. Input: web site, access log. Output: pages of links to related pages.

9 Questions. What links are on the index page? How are the contents ordered? What is the title? How are links labeled? How do we make the index comprehensive?

10 Outline. Motivation. Plausible approaches: clustering, frequent sets. Our approach: Cluster Mining (algorithm: PageGather). Evaluation.

11 Clustering (Voorhees-86, Willett-88, Rasmussen-92). Similarity metric over documents. Cluster: items close together, far from others. Algorithms: Hierarchical Agglomerative Clustering (HAC), K-means clustering.

12 Clustering. Visit: set of pages accessed by an individual. Document = page. Similarity = co-occurrence in visits. Cluster → index page contents.
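As a minimal sketch of "similarity = co-occurrence in visits", the snippet below counts how many visits contain each pair of pages. The function name `cooccurrence_counts` and the toy visits are illustrative assumptions, not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(visits):
    """Count, for each unordered page pair, the visits that contain both pages."""
    pair_counts = Counter()
    for visit in visits:
        # Sorting gives every pair a canonical order, so (a, b) and (b, a) merge.
        for a, b in combinations(sorted(set(visit)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical visits: each is the set of pages one individual accessed.
visits = [
    {"/a", "/b", "/c"},
    {"/a", "/b"},
    {"/b", "/c"},
]
counts = cooccurrence_counts(visits)
```

These raw counts could be fed to any clustering algorithm as a similarity measure; how they are normalized is a design choice the slide does not specify.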

13 Clustering: Problems. Clustering induces a partition over data. Clustering can be slow.

14 Frequent Sets (Agrawal, Imielinski, & Swami-93). Set of transactions: “basket” of items. Find all frequently-occurring itemsets. Algorithm: Apriori.

15 Frequent Sets. Visit: set of pages accessed by an individual. Item = page. Transaction = visit. Frequent set → index page contents.
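To make the mapping concrete, here is a small level-wise (Apriori-style) search that treats each visit as a transaction and each page as an item. It is a simplified sketch, not the implementation used in the experiments; `frequent_itemsets`, `min_support`, and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-sets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical visits, used as transactions.
visits = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
frequent = frequent_itemsets(visits, min_support=3)
```

With this data, lowering `min_support` from 3 to 2 admits two more itemsets, which hints at why a low minimum frequency drives up running time on real logs.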

16 Frequent Sets: Problems. “Frequent Item Problem”. Finds many similar itemsets. Low minimum frequency → high running time.

17 Idea: Cluster Mining. Find only high-quality clusters. Not a partition. Clusters may overlap.

18 The PageGather Algorithm. Graph-based representation. Nodes: pages. Edges: if P(P1|P2) and P(P2|P1) are both high. Fast and accurate.
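The graph construction can be sketched as follows, estimating P(P1|P2) as the fraction of visits containing P2 that also contain P1. The slide only says the probabilities must be "high", so the `threshold` parameter, the function name `build_graph`, and the toy visits are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_graph(visits, threshold):
    """Link two pages when both conditional probabilities reach the threshold."""
    page_counts = Counter()
    pair_counts = Counter()
    for visit in visits:
        pages = sorted(set(visit))
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))

    graph = defaultdict(set)
    for (p1, p2), both in pair_counts.items():
        p1_given_p2 = both / page_counts[p2]  # estimate of P(p1 | p2)
        p2_given_p1 = both / page_counts[p1]  # estimate of P(p2 | p1)
        # Requiring the smaller of the two to be high means both are high.
        if min(p1_given_p2, p2_given_p1) >= threshold:
            graph[p1].add(p2)
            graph[p2].add(p1)
    return dict(graph)


# Hypothetical visits and threshold.
visits = [{"/a", "/b"}, {"/a", "/b"}, {"/a", "/c"}, {"/c"}]
graph = build_graph(visits, threshold=0.6)
```

Here `/a` and `/b` are linked because P(/a|/b) = 1 and P(/b|/a) = 2/3, while `/a` and `/c` are not because P(/c|/a) = 1/3. Pages with no edges are simply absent from the returned dictionary.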

19 [Figure] Log → Visits → Co-occurrence → Graph → Clique/CC → New Page
Sample log entries shown on the slide (excerpt):
www.hyperreal.com|crawl3.atext.com|GET /robots.txt HTTP/1.0|text/html|301|1997/07/03-23:59:08|-|188|-|-|-|ArchitextSpider
www.apache.org|blizzard-ext.wise.edt.ericsson.se|GET /related_projects.html HTTP/1.0|text/html|200|1997/07/03-23:59:09|-|5047|-|-|http://www.apache.org/|Mozilla/3.01Gold (X11; I; SunOS 5.5.1 sun4u) via Harvest Cache version 3.0pl5-Solaris
www.hyperreal.org|ras87.brunnet.net|GET /raves/media/cyberia/link.gif HTTP/1.0|image/gif|200|1997/07/03-23:59:09|-|415|-|-|http://www.hyperreal.org/raves/media/cyberia/|Mozilla/4.01 [en] (Win95; I)
Example pages in the resulting cluster: /97/Winter/Final/, /97/Spring/Final/, /96/Autumn/Final/, /97/Spring/Midterm/, /96/Autumn/Midterm/
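Turning pipe-delimited log entries like those on this slide into visits might look like the sketch below. The field positions (client host second, request line third, content type fourth, status fifth) are read off the sample entries. Grouping by client host alone, and keeping only successful `text/html` requests, are simplifying assumptions; the function names are hypothetical.

```python
from collections import defaultdict


def parse_entry(line):
    """Split one pipe-delimited log entry into the fields used below."""
    fields = line.split("|")
    return {
        "client": fields[1].strip(),
        "path": fields[2].split()[1],  # request line: "GET <path> HTTP/1.0"
        "content_type": fields[3].strip(),
        "status": int(fields[4]),
    }


def group_visits(lines):
    """Map each client host to the set of HTML pages it fetched successfully."""
    visits = defaultdict(set)
    for line in lines:
        entry = parse_entry(line)
        # Assumption: images and non-200 responses are not "pages" in a visit.
        if entry["content_type"] == "text/html" and entry["status"] == 200:
            visits[entry["client"]].add(entry["path"])
    return dict(visits)


SAMPLE_LOG = [
    "www.hyperreal.com|crawl3.atext.com|GET /robots.txt HTTP/1.0|text/html|301"
    "|1997/07/03-23:59:08|-|188|-|-|-|ArchitextSpider",
    "www.apache.org|r2d2.dd.dk|GET /docs/ HTTP/1.0|text/html|200"
    "|1997/07/03-23:59:11|-|2207|-|-|http://www.apache.org/"
    "|Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)",
    "www.hyperreal.org|ras87.brunnet.net|GET /raves/media/cyberia/link.gif HTTP/1.0"
    "|image/gif|200|1997/07/03-23:59:09|-|415|-|-"
    "|http://www.hyperreal.org/raves/media/cyberia/|Mozilla/4.01 [en] (Win95; I)",
]
visits = group_visits(SAMPLE_LOG)
```

A fuller version would also split one host's requests into separate visits by time, which this sketch does not attempt.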

20 PageGather. Implement with cliques or connected components (CCs): find all candidates, return best. Clique: maximal cliques of size  k. Clique and CC versions comparable in time and performance.
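The two ways of generating candidate clusters can be sketched on a small undirected graph stored as a dictionary of neighbor sets. The clique search uses the basic Bron-Kerbosch recursion, chosen here for brevity and not necessarily what PageGather uses. No size bound or ranking of candidates is applied, since the slide does not give those details. The graph and function names are illustrative.

```python
def connected_components(graph):
    """Return the connected components of an undirected graph as sets of nodes."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def maximal_cliques(graph):
    """Return all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(clique)  # nothing can extend it, so it is maximal
            return
        for node in list(candidates):
            expand(clique | {node}, candidates & graph[node], excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical graph: a triangle a-b-c, a pendant edge c-d, and an isolated node e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
components = connected_components(graph)
cliques = maximal_cliques(graph)
```

On this graph the connected-component version yields one cluster of four pages plus the isolated page, while the clique version yields the smaller overlapping clusters {a, b, c} and {c, d}, along with {e}.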

21 Experiments. machines.hyperreal.org. Site gets ~1200 visitors/day (10k hits). Site contains ~2500 distinct documents. Training: a month of access data. Testing: ten days of data.

22 Performance Metric. Are index pages helpful to users? How well do clusters predict user navigation? Q(C) = given that a user visits one page in cluster C, how likely is she to visit any other?
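One way to read Q(C) is as the fraction of test visits touching at least one page of C that touch at least two. The sketch below computes that estimate; this reading of the metric, the function name `cluster_quality`, and the toy visits are assumptions.

```python
def cluster_quality(cluster, visits):
    """Estimate Q(C): P(visit hits >= 2 pages of C | visit hits >= 1 page of C)."""
    cluster = set(cluster)
    hits = [len(cluster & set(visit)) for visit in visits]
    at_least_one = sum(1 for h in hits if h >= 1)
    at_least_two = sum(1 for h in hits if h >= 2)
    # A cluster that no test visit touches gets quality 0 by convention here.
    return at_least_two / at_least_one if at_least_one else 0.0


# Hypothetical test visits.
test_visits = [{"a", "b"}, {"a"}, {"c"}, {"a", "b", "c"}]
q = cluster_quality({"a", "b"}, test_visits)
```

In this example three visits touch the cluster and two of them touch both pages, so the estimate is 2/3.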

23 Cluster Mining vs. Clustering. PageGather using Clique: 10 clusters, 1:05 min. HAC: 10 clusters, 48+ hours. K-means: 10 clusters, 3:35 min.

24 Cluster Mining vs. Clustering. PageGather using Clique: 10 clusters, 1:05 min. HAC: 10 clusters, 48+ hours. K-means: 10 clusters, 3:35 min. HAC*: 8 clusters, 21:55 min (threshold, less data, mining).

25 Cluster Mining vs. Clustering. PageGather using Clique: 10 clusters, 1:05 min. HAC: 10 clusters, 48+ hours. K-means: 10 clusters, 3:35 min. HAC*: 7 clusters, 293:08 min (threshold, less data, mining).

26 Cluster Mining vs. Clustering [chart: Q for the top 10 clusters]

27 Cluster Mining vs. Clustering [chart: Q for the top 10 clusters]

28 Cluster Mining vs. Clustering [chart: Q for the top 10 clusters]

29 PageGather vs. Frequent Sets. PG/Clique: 10 clusters, 1:05 min. Apriori: 10 frequent sets, 1:41 min.

30 PageGather vs. Frequent Sets [chart: Q for the top 10 clusters]

31 Contributions. Motivating problem: Web page synthesis. Method: Cluster mining (well suited for discovery of coherent sets; comparison to clustering, frequent sets). Algorithm: PageGather (graph-based, fast and accurate).

32 Clique vs. Conn-component [chart: Q for the top 10 clusters]

33 Clique vs. Conn-component. Comparable accuracy. Clique finds fewer, smaller clusters than CC. Clique: more accurate (at first). Comparable running time (in practice).

34 Future Directions. Meta-information to improve coherence. Conceptual clustering: improve coherence, naming pages. Cluster mining to generate association rules.

