
1 19-03-08

2 Web mining is the use of data mining techniques to automatically discover and extract information from Web documents/services.

3 Over 1 billion HTML pages, 15 terabytes
• Wealth of information: bookstores, restaurants, travel, malls, dictionaries, news, stock quotes, yellow and white pages, maps, markets, ...
• Diverse media types: text, images, audio, video
• Heterogeneous formats: HTML, XML, PostScript, PDF, JPEG, MPEG, MP3
• Highly dynamic: 1 million new pages each day; the average page changes in a few weeks
• Graph structure with links between pages: the average page has 7-10 links
• Hundreds of millions of queries per day

4
• E-commerce
  - generate user profiles
  - targeted advertising
  - fraud
• Network Management
  - performance management
  - fault management
• Information Retrieval

5 Web Mining
• Web Content Mining
• Web Usage Mining
• Web Structure Mining

6
• Web content mining: focuses on techniques for helping a user find documents that meet a given criterion (text mining).
• Web structure mining: aims to develop techniques that exploit the collective judgement of web page quality expressed in the form of hyperlinks.
• Web usage mining: focuses on techniques for studying user behaviour when navigating the web.
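As a rough illustration of the "collective judgement" idea behind web structure mining, the sketch below counts how many distinct pages link to each page. This is only the simplest possible link-based signal, not a method named on the slides, and the page paths and the function name `inlink_counts` are hypothetical.

```python
from collections import Counter

def inlink_counts(links):
    """Count how many distinct pages link to each target page."""
    counts = Counter()
    for source, target in set(links):   # de-duplicate repeated links
        if source != target:            # ignore self-links
            counts[target] += 1
    return counts

# Hypothetical link list: (source page, target page)
links = [
    ("/index", "/courses"),
    ("/index", "/faculty"),
    ("/faculty", "/courses"),
    ("/courses", "/index"),
]

# Pages ordered from most to least linked-to
ranking = inlink_counts(links).most_common()
```

Here `/courses` comes first because two different pages point to it.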

7 Visual Web Mining (VWM) is the application of information visualization techniques to the results of web mining, in order to further amplify the perception of extracted patterns, rules and regularities, or to visually explore new ones in the web domain.

9
• Webbot
• Integration Engine
• Data mining suite
• Link analysis suite
• Database
• VTK

10
• Global techniques
• Geometric techniques
• Feature-based techniques
Geometric and feature-based techniques have now become the most widely used visualization methods.

11 The Web Knowledge Visualization and Discovery System (WEBKVDS) is composed of two main parts:
1. FootPath: for visualizing the web structure with the different data and pattern layers.
2. Web Graph Algebra: for manipulating and operating on web graph objects for visual data mining.

12
• Web graph
• Web image
• Information layers
  - NumofVisit layer
  - LinkUsage layer
  - ViewTime layer
  - ProbUsage layer
• Pattern layers
  - Association rules
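The slide names the information layers but does not define them. The sketch below assumes that NumofVisit counts page visits, LinkUsage counts link traversals, and ProbUsage is the share of traversals leaving a page that follow a given link. Those meanings, the function name `build_layers`, and the sample sessions are all assumptions for illustration. ViewTime is left out because it would need timestamps.

```python
from collections import Counter

def build_layers(sessions):
    """Derive per-page and per-link layers from a list of page sequences."""
    num_of_visit = Counter()
    link_usage = Counter()
    for pages in sessions:
        num_of_visit.update(pages)                 # one visit per page view
        link_usage.update(zip(pages, pages[1:]))   # consecutive pages form a link

    # Assumed meaning of ProbUsage: fraction of exits from a page using this link
    out_total = Counter()
    for (source, _), count in link_usage.items():
        out_total[source] += count
    prob_usage = {
        edge: count / out_total[edge[0]] for edge, count in link_usage.items()
    }
    return {
        "NumofVisit": dict(num_of_visit),
        "LinkUsage": dict(link_usage),
        "ProbUsage": prob_usage,
    }

# Hypothetical sessions, each an ordered list of visited pages
sessions = [
    ["/index", "/courses", "/courses/db"],
    ["/index", "/faculty"],
    ["/index", "/courses"],
]
layers = build_layers(sessions)
```

Each layer is a plain dictionary keyed by page or by `(source, target)` link, so it can be attached to the nodes and edges of the same web graph.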

14 FootPath is the rendering engine of the visualization and discovery system. A web graph is displayed by first rendering the web image and then assigning visual characteristics such as colour and thickness to nodes and edges to represent data from the information layers.
• Web image rendering
• Dynamic layout
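A minimal sketch of the second rendering step, assuming a LinkUsage-style layer stored as a dictionary from link to count. It maps each count linearly to an edge width and to a blue-to-red colour. The function names, the width range, and the colour ramp are illustrative choices, not FootPath's actual rendering rules.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)

def style_edges(link_usage, min_width=1.0, max_width=6.0):
    """Assign a width and a hex colour to every edge based on its layer value."""
    if not link_usage:
        return {}
    lo, hi = min(link_usage.values()), max(link_usage.values())
    styles = {}
    for edge, count in link_usage.items():
        red = round(scale(count, lo, hi, 0, 255))
        styles[edge] = {
            "width": scale(count, lo, hi, min_width, max_width),
            # Least-used links are blue, most-used links are red
            "colour": f"#{red:02x}00{255 - red:02x}",
        }
    return styles

# Hypothetical layer values
link_usage = {("a", "b"): 1, ("a", "c"): 5, ("b", "c"): 3}
styles = style_edges(link_usage)
```

Keeping the styling separate from the layout means the same web image can be recoloured for a different layer without recomputing node positions.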

15 Web Graph Algebra is used to manipulate and produce web graphs. Variables in our algebra are web graphs.
• Operator FILTER: θ = FLT_{Layer,threshold}(α)
• Operator ADD: θ = α + β
• Operator MINUS: θ = α − β
• Operator COMMON: θ = α :: β
• Operator MINUS IN: θ = α −. β
• Operator MINUS OUT: θ = α .− β
• Operator EXCEPT: θ = α _ β
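The slide lists the operators without defining them, so the following is only a sketch under assumed semantics. A web graph is reduced to a single layer stored as a dictionary from link to value. FILTER keeps links at or above the threshold, ADD sums values, MINUS subtracts and drops links that reach zero, and COMMON keeps links present in both graphs. MINUS IN, MINUS OUT and EXCEPT are omitted because their meaning cannot be inferred from the slide.

```python
def flt(graph, threshold):
    """FILTER (assumed): keep links whose layer value is at least threshold."""
    return {edge: v for edge, v in graph.items() if v >= threshold}

def add(a, b):
    """ADD (assumed): union of links, with values summed."""
    return {edge: a.get(edge, 0) + b.get(edge, 0) for edge in a.keys() | b.keys()}

def minus(a, b):
    """MINUS (assumed): subtract b's values from a, dropping non-positive links."""
    result = {}
    for edge, value in a.items():
        diff = value - b.get(edge, 0)
        if diff > 0:
            result[edge] = diff
    return result

def common(a, b):
    """COMMON (assumed): links present in both graphs, with the smaller value."""
    return {edge: min(a[edge], b[edge]) for edge in a.keys() & b.keys()}

# Hypothetical link-usage graphs for two time periods
alpha = {("A", "B"): 5, ("A", "C"): 1, ("B", "C"): 2}
beta = {("A", "B"): 3, ("C", "D"): 4}

busy = flt(alpha, 2)           # links used at least twice in alpha
total = add(alpha, beta)       # combined usage
change = minus(alpha, beta)    # usage in alpha beyond beta
shared = common(alpha, beta)   # links used in both periods
```

Because every operator takes web graphs and returns a web graph, results can be chained, for example `flt(minus(alpha, beta), 2)`.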

16 VISUALIZATION DIAGRAMS
The figure shows a 2D visualization with Strahler coloring. It shows user access paths scattering from the first page of the website (the node in the center) to clusters of web pages corresponding to faculty pages, course home pages, etc.
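Strahler coloring assigns each node a Strahler number and colours it accordingly. The sketch below computes Strahler numbers under the standard definition for trees: a leaf gets 1, and an inner node gets the largest child value, plus one if at least two children share that largest value. It assumes the access paths have already been reduced to a tree rooted at the first page. The tree contents and the function name are hypothetical.

```python
def strahler(tree, node):
    """Return the Strahler number of node in a tree given as {parent: [children]}."""
    children = tree.get(node, [])
    if not children:
        return 1                                  # leaves have Strahler number 1
    values = sorted((strahler(tree, child) for child in children), reverse=True)
    if len(values) > 1 and values[0] == values[1]:
        return values[0] + 1                      # two branches tie for the maximum
    return values[0]                              # a single dominant branch

# Hypothetical tree of access paths rooted at the site's first page
tree = {
    "/index": ["/courses", "/faculty"],
    "/courses": ["/courses/db", "/courses/ai"],
    "/faculty": ["/faculty/smith"],
}
numbers = {
    page: strahler(tree, page)
    for page in ["/index", "/courses", "/faculty", "/courses/db"]
}
```

Nodes with higher numbers sit above more heavily branching subtrees, which is what makes the number useful as a colour key.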

17 VISUALIZATION DIAGRAM 2
A 3D visualization of web usage for a site. The cylinder-like part of the figure visualizes the web usage of surfers as they browse a long HTML document.

18 VISUALIZATION DIAGRAM 3
Right: One can observe long user sessions as strings falling off clusters. These are a special type of long session in which the user navigates a sequence of web pages that follow one another under a cluster, e.g., sections of a long document. In many cases we found web pages with many nodes connected by Next/Up/Previous hyperlinks.

19 VISUALIZATION DIAGRAM 4
The user's browsing access pattern is amplified by a different coloring. Depending on the link structure of the underlying pages, we can see the vertical access pattern of a user drilling down a cluster, which makes a cylinder shape. Users following links down a hierarchy of web pages make a cone shape, and users going up a hierarchy, e.g., back to the main page of the website, make a funnel shape.

20 VISUALIZATION DIAGRAM 5
Frequent access patterns extracted by the web mining process are visualized as a white graph on top of the embedded, colorful graph of web usage.
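The slides do not say which mining algorithm extracts the frequent access patterns. As a simple stand-in, the sketch below counts contiguous page sequences of a fixed length and keeps those that occur in at least `min_support` sessions. The function name, parameters, and sessions are hypothetical.

```python
from collections import Counter

def frequent_paths(sessions, length=2, min_support=2):
    """Return contiguous page sequences that appear in at least min_support sessions."""
    support = Counter()
    for pages in sessions:
        # A set ensures each path is counted at most once per session
        seen = {
            tuple(pages[i:i + length])
            for i in range(len(pages) - length + 1)
        }
        support.update(seen)
    return {path: count for path, count in support.items() if count >= min_support}

# Hypothetical sessions, each an ordered list of visited pages
sessions = [
    ["/index", "/courses", "/courses/db"],
    ["/index", "/courses", "/courses/ai"],
    ["/faculty", "/index", "/courses"],
]
patterns = frequent_paths(sessions)
```

The surviving paths are the links that would be drawn as the overlay graph on top of the full usage graph.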

21 VISUALIZATION DIAGRAM 6
Superimposition of web usage on top of web structure with a span tree layout. One can easily see which parts of the web site were visited by users and which parts are not frequently used. Coloring gives a visual cue to the entry and exit points of access paths.

22 The Web Knowledge Visualization and Discovery System visualizes multi-tier web graphs and, with the help of the web graph algebra, provides a powerful means for interactive visual web mining. We have yet to study interesting properties of the operators, such as commutativity, associativity, or distributivity, if coefficients are introduced later in the algebra.

23 www.cs.rpi.edu www.cs.arizona.edu madrias@umr.edu
