
Building an Intelligent Web: Theory and Practice. Pawan Lingras, Saint Mary’s University; Rajendra Akerkar, American University of Armenia and SIBER, India



4 Information Retrieval

6 Figure 2.1 Transforming a text document into a weighted list of keywords
– Create a list of words
– Remove stop words
– Stem words
– Calculate the frequency of each stemmed word
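The four steps of Figure 2.1 can be sketched in Python as below. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and `stem` is a crude suffix-stripper standing in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on"}

def stem(word):
    """Crude suffix stripping (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def weighted_keywords(text):
    # Step 1: create a list of words (lowercased alphabetic tokens).
    words = re.findall(r"[a-z]+", text.lower())
    # Step 2: remove stop words.
    words = [w for w in words if w not in STOP_WORDS]
    # Step 3: stem words.
    stems = [stem(w) for w in words]
    # Step 4: calculate the frequency of each stemmed word.
    return Counter(stems)

print(weighted_keywords("Crawlers crawled the web, crawling linked pages and links."))
```

Here "crawled" and "crawling" both reduce to "crawl", so that stem receives a weight of 2.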

8 Data mining has emerged as one of the most exciting and dynamic fields in computing science. Its driving force is the existence of petabyte-scale online archives that may hold valuable information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, the data mining software market alone is expected to exceed $10 billion within a few years. Data mining refers to a family of techniques used to detect interesting nuggets of knowledge, or relationships, in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With large databases now available to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that analyze data efficiently. The distributed nature of several databases, their size and the high complexity of many techniques present interesting computational challenges.

16 Figure 2.43 Relationship between precision and recall
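Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved. A minimal sketch of both measures, using hypothetical document IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of retrieved and relevant document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 documents retrieved, 3 of them relevant, 6 relevant in total.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d3", "d5", "d6", "d7"})
print(p, r)  # 0.75 0.5
```

Typically, retrieving more documents raises recall while lowering precision, which is why the two measures are plotted against each other.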

18 Semantic Web

19 Semantic Web
The layered language model (Berners-Lee, 2001; Broekstra et al., 2001)

22 Figure 3.4 Representing classes and instances (Noy et al., 2001)
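The distinction between classes and instances can be illustrated by storing subclass links and instance-of links separately, then inferring every class of an individual by following subclass links upward (in the spirit of `rdfs:subClassOf` and `rdf:type`). The vehicle classes and individuals below are hypothetical and are not taken from the figure.

```python
# Hypothetical class hierarchy: each class maps to its direct superclasses.
SUBCLASS_OF = {
    "SportsCar": ["Car"],
    "Car": ["Vehicle"],
    "Truck": ["Vehicle"],
    "Vehicle": [],
}

# Hypothetical instances: each individual maps to the classes it is directly typed with.
INSTANCE_OF = {
    "roadster_42": ["SportsCar"],
    "hauler_7": ["Truck"],
}

def all_classes(individual):
    """Return every class the individual belongs to, following subclass links upward."""
    found = set()
    stack = list(INSTANCE_OF.get(individual, []))
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, []))
    return found

print(sorted(all_classes("roadster_42")))  # ['Car', 'SportsCar', 'Vehicle']
```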

26 Queries 1 and 2

27 Queries 3 and 4

32 An RDF model for automobiles
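An RDF model describes resources as a set of (subject, predicate, object) triples. The sketch below stores hypothetical automobile triples as Python tuples and answers a single triple pattern, with `None` as a wildcard, loosely like one pattern of a SPARQL query. The `ex:` names and all data are invented for illustration.

```python
# Hypothetical RDF triples (subject, predicate, object) about automobiles.
TRIPLES = [
    ("ex:car1", "rdf:type", "ex:Automobile"),
    ("ex:car1", "ex:manufacturer", "ex:MakerA"),
    ("ex:car1", "ex:color", "red"),
    ("ex:car2", "rdf:type", "ex:Automobile"),
    ("ex:car2", "ex:manufacturer", "ex:MakerB"),
    ("ex:car2", "ex:color", "red"),
]

def match(pattern, triples=TRIPLES):
    """Return the triples matching an (s, p, o) pattern; None matches anything."""
    return [t for t in triples
            if all(want is None or want == got for want, got in zip(pattern, t))]

# Which subjects are red?
red = [s for s, _, _ in match((None, "ex:color", "red"))]
print(red)  # ['ex:car1', 'ex:car2']
```

A real RDF store would join several such patterns; this sketch handles only one at a time.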

35 Classification and Association

36 Data Preparation
– Database Theory
– SQL
– Data Transformation
– http://www.ecn.purdue.edu/KDDCUP/data/

37 Classification
Find a rule, a formula, or a black-box classifier for organizing data into classes, for example:
– Classify clients requesting loans into categories based on the likelihood of repayment
– Classify customers into Big or Moderate Spenders based on what they buy
– Classify customers as loyal, semi-loyal or infrequent based on the products they buy
The classifier is developed from the data in the training set.
The reliability of the classifier is evaluated on the test set.
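The training-set/test-set procedure can be sketched with a deliberately simple classifier that, for each value of one attribute, predicts the most common class seen in training. The one-attribute rule, the function names and the customer records are hypothetical; any classifier could be substituted.

```python
from collections import Counter, defaultdict

def train_one_rule(rows, attr, label="class"):
    """Map each value of `attr` to the majority class among the training rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr]][row[label]] += 1
    default = Counter(row[label] for row in rows).most_common(1)[0][0]  # for unseen values
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return lambda row: rule.get(row[attr], default)

def accuracy(classifier, rows, label="class"):
    """Fraction of rows whose predicted class equals the true class."""
    return sum(classifier(r) == r[label] for r in rows) / len(rows)

# Hypothetical customer records: spender type predicted from basket size.
data = [
    {"basket": "large", "class": "Big"},
    {"basket": "large", "class": "Big"},
    {"basket": "small", "class": "Moderate"},
    {"basket": "small", "class": "Moderate"},
    {"basket": "large", "class": "Moderate"},
    {"basket": "small", "class": "Moderate"},
    {"basket": "large", "class": "Big"},
    {"basket": "small", "class": "Big"},
]
train_rows, test_rows = data[:6], data[6:]  # develop on training data, evaluate on held-out data
clf = train_one_rule(train_rows, "basket")
print(accuracy(clf, test_rows))  # 0.5
```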

38 Classification
ID3 Algorithm
  – Numerical Illustration
  – Application to a Small E-commerce Dataset
C4.5 for Experimentation
Other approaches
  – Neural Networks
  – Fuzzy Classification
  – Rough Set Theory
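ID3 grows a decision tree by choosing, at each node, the attribute with the highest information gain: the class entropy before the split minus the size-weighted entropy of the subsets after it. The sketch computes only that selection step on a small hypothetical dataset; it does not build the full tree or include C4.5 refinements.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, label="class"):
    """Entropy of the labels minus the weighted entropy after splitting on `attr`."""
    labels = [r[label] for r in rows]
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(rows, attrs, label="class"):
    """ID3's selection step: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, label))

# Hypothetical e-commerce sessions: did the visitor buy?
rows = [
    {"visits": "many", "cart": "yes", "class": "buy"},
    {"visits": "many", "cart": "no", "class": "buy"},
    {"visits": "few", "cart": "yes", "class": "skip"},
    {"visits": "few", "cart": "no", "class": "skip"},
]
print(best_attribute(rows, ["visits", "cart"]))  # visits
```

Here `visits` separates the classes perfectly (gain of 1 bit), while `cart` gives no gain, so ID3 would split on `visits` first.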

39 Association
Market basket analysis
– determine which things go together
Transactions might reveal that
– customers who buy bananas also buy candles
– cheese and pickled onions frequently occur together in a shopping cart
Information can be used for
– arranging a physical shop or structuring the Web site
– targeted advertising campaigns
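"Things that go together" are usually quantified by support (the fraction of transactions containing an itemset) and confidence (how often the consequent appears when the antecedent does). A minimal sketch over hypothetical carts echoing the slide's examples:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Hypothetical shopping carts.
carts = [
    {"banana", "candles", "milk"},
    {"banana", "candles"},
    {"cheese", "pickled onions"},
    {"banana", "bread"},
    {"cheese", "pickled onions", "bread"},
]
print(support({"cheese", "pickled onions"}, carts))  # 0.4
print(confidence({"banana"}, {"candles"}, carts))
```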

40 Association
Apriori Algorithm
Demonstration for an E-commerce Application
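Apriori finds frequent itemsets level by level, pruning any candidate that has an infrequent subset, since every subset of a frequent itemset must itself be frequent. The sketch below is a compact version of that idea, assuming `min_support` is an absolute transaction count; the baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count the surviving candidates in one pass over the data.
        current = {}
        for c in candidates:
            n = sum(c <= t for t in transactions)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical e-commerce baskets.
baskets = [
    {"banana", "candles"},
    {"banana", "cheese", "pickled onions"},
    {"candles", "cheese", "pickled onions"},
    {"banana", "candles", "cheese", "pickled onions"},
    {"banana", "candles", "cheese"},
]
for itemset, n in sorted(apriori(baskets, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```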

41 Clustering

42 Clustering breaks a large database into different subgroups, or clusters.
Unlike classification, clustering has no predefined classes.
Records are grouped into clusters on the basis of their similarity to each other.
The data miner determines whether the clusters offer any useful insight.

44 Statistical Methods
k-means
– Numerical Example
– Implementation
Data Preparation
Clustering
Other Methods
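k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, stopping when no assignment changes. A minimal 2-D sketch, assuming the first k points serve as initial centroids (implementations often use random or k-means++ seeding instead); the data are hypothetical.

```python
from math import dist  # Python 3.8+

def kmeans(points, k, iterations=100):
    """Cluster 2-D points into k groups; returns (centroids, assignments)."""
    centroids = [tuple(p) for p in points[:k]]  # naive seeding: the first k points
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        new_assignment = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, assignment

# Hypothetical customers: (visits per month, average spend), already scaled.
points = [(1, 1), (1.5, 2), (8, 8), (9, 9.5), (1, 0.5), (8.5, 9)]
centroids, labels = kmeans(points, 2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```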

45 Neural Network Based Approaches
Kohonen Self-Organising Maps
– Numerical Demonstration
– Application to Web Data Collection
Other Neural Network Based Approaches
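A Kohonen self-organising map keeps a grid of weight vectors; for each input it finds the best-matching unit and pulls that unit and its grid neighbours toward the input, with the learning rate and neighbourhood radius shrinking over time. The sketch assumes a 1-D map, a Gaussian neighbourhood and linear decay; these choices, the parameter values and the visitor data are all hypothetical.

```python
import math
import random

def train_som(data, n_units, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D self-organising map; returns the list of unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = radius0 if radius0 is not None else n_units / 2
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # present inputs in random order
            frac = step / total
            lr = lr0 * (1 - frac)  # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks to a floor
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = min(range(n_units), key=lambda i: math.dist(weights[i], x))
            for i in range(n_units):
                # Gaussian neighbourhood on the 1-D grid around the BMU.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
            step += 1
    return weights

def best_unit(weights, x):
    """Index of the map unit that best matches input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

# Hypothetical normalised visitor features (e.g. pages per visit, time on site).
visitors = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.9), (0.85, 0.95)]
som = train_som(visitors, n_units=4)
print([best_unit(som, v) for v in visitors])
```

Similar visitors should map to the same or neighbouring units, which is what makes the trained map usable for clustering.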

46 Clustering of customers

48 Web Usage Mining

49 High-level web usage mining process (Srivastava et al., 2000)
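Preprocessing in web usage mining typically turns raw server-log entries into user sessions. The sketch assumes a commonly used heuristic: a visitor's clicks start a new session whenever the gap between consecutive requests exceeds 30 minutes. The log format (visitor, timestamp in seconds, page) and the entries are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed inactivity threshold, in seconds

def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Group (visitor, timestamp, page) log entries into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(entries, key=lambda e: (e[0], e[1])):
        by_visitor[visitor].append((ts, page))
    sessions = []
    for visitor, clicks in by_visitor.items():
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, page) in zip(clicks, clicks[1:]):
            if ts - prev_ts > timeout:
                sessions.append((visitor, current))  # gap too long: close the session
                current = []
            current.append(page)
        sessions.append((visitor, current))
    return sessions

log = [
    ("192.0.2.1", 0, "/index.html"),
    ("192.0.2.1", 120, "/products.html"),
    ("192.0.2.2", 60, "/index.html"),
    ("192.0.2.1", 4000, "/index.html"),
    ("192.0.2.1", 4100, "/cart.html"),
]
for s in sessionize(log):
    print(s)
```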

50 Applications of web usage mining (Romanko, 2006; Srivastava et al., 2000)

67 Clustering exercise

70 Classification exercise

71 Association exercise

73 Sequence Pattern Analysis of Web Logs
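A simple form of sequence pattern analysis counts how many sessions contain a given page-to-page transition and keeps those meeting a minimum support. This sketch counts only contiguous page pairs, a simplification; full sequential pattern mining algorithms such as GSP also handle longer and non-contiguous sequences. The sessions are hypothetical.

```python
from collections import Counter

def frequent_transitions(sessions, min_support):
    """Page pairs (a, b), with b directly after a, found in >= min_support sessions."""
    counts = Counter()
    for pages in sessions:
        # Count each transition once per session (session-level support).
        counts.update(set(zip(pages, pages[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}

sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products", "/checkout"],
    ["/index", "/about"],
    ["/products", "/cart", "/checkout"],
]
print(frequent_transitions(sessions, 2))
```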

77 Web Content Mining

78 Data Collection
Web Crawlers
Public Domain Web Crawlers
An Implementation of a Web Crawler
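A crawler repeatedly takes a URL from a frontier, downloads the page, extracts its links and queues the ones it has not seen. Below is a minimal breadth-first sketch using only the Python standard library (`html.parser` to extract links, `urllib.parse.urljoin` to resolve relative ones). The download function is a parameter so it can be swapped out; the demo uses a hypothetical in-memory site instead of the network. The sketch does not honour robots.txt or add politeness delays, both of which a real crawler must do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def fetch_url(url):
    """Download a page as text (no robots.txt check, no rate limiting)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def crawl(seed, max_pages=10, fetch=fetch_url):
    """Breadth-first crawl from `seed`; returns {url: [outgoing links]}."""
    frontier = deque([seed])
    seen = {seed}
    graph = {}
    while frontier and len(graph) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that cannot be downloaded
        parser = LinkParser()
        parser.feed(html)
        # Resolve relative links and drop #fragments so duplicates are detected.
        links = [urldefrag(urljoin(url, href))[0] for href in parser.links]
        graph[url] = links
        for link in links:
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return graph

# Hypothetical in-memory site used instead of real downloads.
PAGES = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="b.html#top">B</a>',
    "http://example.com/a.html": '<a href="/">home</a>',
    "http://example.com/b.html": '<a href="mailto:x@example.com">mail</a>',
}

def fake_fetch(url):
    if url not in PAGES:
        raise OSError(url)
    return PAGES[url]

print(sorted(crawl("http://example.com/", fetch=fake_fetch)))
```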

79 Architecture of a search engine (Romanko, 2006)
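Most search-engine architectures centre on an inverted index that maps each term to the documents containing it, so a query can be answered by intersecting the postings of its terms. This is a minimal sketch under simple assumptions: Boolean AND queries only, no ranking, and hypothetical documents.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Boolean AND query: IDs of documents containing every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Web crawlers download pages for the search engine",
    2: "The indexer builds an inverted index of pages",
    3: "Query processing uses the inverted index",
}
idx = build_index(docs)
print(sorted(search(idx, "inverted index")))  # [2, 3]
```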

83 Other topics in Web Content Mining
Search Engines
– How to prepare for and set up a search engine
– Types and listings of search engines (freeware, remote hosting services, commercial)
Multimedia Information Retrieval

84 Web Structure Mining

86 http://www.iprcom.com/papers/pagerank/
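A standard example of web structure mining is PageRank, in which a page is important if important pages link to it: PR(p) = (1 − d)/N + d · Σ PR(q)/L(q), summed over the pages q that link to p, where L(q) is the number of links out of q. The sketch computes this by power iteration on a hypothetical four-page graph. It assumes the commonly cited damping factor d = 0.85 and the common convention that pages without outgoing links spread their rank evenly over all pages.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new[q] += d * rank[p] / len(targets)  # p shares its rank among its links
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical link graph: page -> pages it links to.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C, which receives links from A, B and D, ends up with the highest score, while D, which nothing links to, keeps only the baseline (1 − d)/N.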

91 Index quality for different search engines (Henzinger et al., 1999)

92 Index quality per page for different search engines (Henzinger et al., 1999)
