
1 Berendt: Advanced databases, winter term 2007/08, http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/ Advanced databases – Inferring implicit/new knowledge from data(bases): Tying it all together (a start). Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science. Last update: 6 December 2007

2 Goal 1 for today: Wrap up yesterday's lecture and discussion, and prepare you for the next assignment

3 Goal 2 for today: Identify "missing links" and point to solution approaches (on the board)

4 Agenda: Naïve Bayes [remaining from yesterday]; Changing representation: LSI [remaining from yesterday]; Ont.+KDD: Apriori and taxonomies; KDD+DB: Constrained pattern mining (example: WUM); KDD+DB: Inductive databases (very brief); KDD+Ont.: Induction and Semantic Web (very brief)

5 Agenda: Naïve Bayes [remaining from yesterday]; Changing representation: LSI [remaining from yesterday]; Ont.+KDD: Apriori and taxonomies; KDD+DB: Constrained pattern mining (example: WUM); KDD+DB: Inductive databases (very brief); KDD+Ont.: Induction and Semantic Web (very brief)

6 Mining association rules. Apriori: (slides from D. Delic). Mining generalized association rules: (Karlsruhe slides)

7 Main interestingness measures of association rules
* Support of a rule A → B = no. of instances with A and B / no. of all instances
* Confidence of a rule A → B = no. of instances with A and B / no. of instances with A = support(A & B) / support(A)
* Lift of a rule A → B = support(A & B) / [support(A) * support(B)]
  * What does this measure, and in what numerical interval can it be?
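The three measures above can be sketched in a few lines of Python. This is a minimal illustration, assuming each instance is a set of items; the function names and the toy transactions are made up for the example and do not come from the lecture.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, body, head):
    """support(A & B) / support(A) for the rule body -> head."""
    both = support(transactions, set(body) | set(head))
    return both / support(transactions, body)


def lift(transactions, body, head):
    """support(A & B) / (support(A) * support(B)) for the rule body -> head."""
    both = support(transactions, set(body) | set(head))
    return both / (support(transactions, body) * support(transactions, head))


# Hypothetical toy data: four purchases, each a set of items.
transactions = [
    {"jacket", "umbrellas"},
    {"jacket", "umbrellas", "boots"},
    {"jacket"},
    {"boots"},
]

rule_support = support(transactions, {"jacket", "umbrellas"})
rule_confidence = confidence(transactions, {"jacket"}, {"umbrellas"})
rule_lift = lift(transactions, {"jacket"}, {"umbrellas"})
print(rule_support, rule_confidence, rule_lift)
```

The sketch does not guard against a body or head with zero support, which would cause a division by zero. As a hint for the question on the slide: a lift of 1 means that A and B co-occur exactly as often as expected if they were independent, and lift cannot be negative.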

8 Interestingness measures

9 Interestingness as a constraint. So we are not interested in "show me all patterns", but in "show me all patterns that are interesting = that have properties X" → constraints!

10 Examples from MINERULE
MINE RULE exemple AS
SELECT DISTINCT 1..n Item AS BODY, 1..1 Item AS HEAD, SUPPORT, CONFIDENCE
WHERE HEAD.Item = "umbrellas" // also other fields, e.g. Date
FROM Purchase
GROUP BY Tid
HAVING COUNT(*) < 6
EXTRACTING RULES WITH SUPPORT: 0.06, CONFIDENCE: 0.9
E.g., jacket flight_Dublin → umbrellas (0.08, 0.93)
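To make the constraints in this query concrete, the following Python sketch mimics them on in-memory data. It is not the MINERULE implementation: the function `mine_rules`, its parameters, and the toy purchases are hypothetical, rule bodies are enumerated by brute force, and it assumes that support is measured relative to the groups that survive the HAVING filter.

```python
from collections import Counter
from itertools import combinations


def mine_rules(purchases, head_item, min_support, min_confidence, max_items):
    """Return rules body -> head_item as (body, head, support, confidence).

    purchases maps a transaction id (Tid) to the set of items bought.
    """
    # HAVING COUNT(*) < max_items: keep only small groups (assumption: the
    # support denominator is the number of groups left after this filter).
    groups = [set(items) for items in purchases.values() if len(items) < max_items]
    if not groups:
        return []

    # Count every non-empty body that co-occurs with the required head item.
    body_and_head = Counter()
    for items in groups:
        if head_item in items:
            rest = sorted(items - {head_item})
            for size in range(1, len(rest) + 1):
                for body in combinations(rest, size):
                    body_and_head[body] += 1

    rules = []
    for body, both in body_and_head.items():
        body_count = sum(set(body) <= items for items in groups)
        rule_support = both / len(groups)
        rule_confidence = both / body_count
        # EXTRACTING RULES WITH SUPPORT: ..., CONFIDENCE: ...
        if rule_support >= min_support and rule_confidence >= min_confidence:
            rules.append((body, head_item, rule_support, rule_confidence))
    return sorted(rules)


# Hypothetical toy data keyed by Tid.
purchases = {
    1: {"jacket", "flight_Dublin", "umbrellas"},
    2: {"jacket", "flight_Dublin", "umbrellas"},
    3: {"jacket", "boots"},
    4: {"boots"},
    5: {"a", "b", "c", "d", "e", "umbrellas"},  # 6 items: removed by the HAVING filter
}

rules = mine_rules(purchases, "umbrellas", min_support=0.06,
                   min_confidence=0.9, max_items=6)
for body, head, rule_support, rule_confidence in rules:
    print(" ".join(body), "->", head, (rule_support, rule_confidence))
```

On this toy data the rule jacket → umbrellas is dropped because its confidence (2/3) is below 0.9, while the two rules whose body contains flight_Dublin pass both thresholds.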

11 Agenda: Naïve Bayes [remaining from yesterday]; Changing representation: LSI [remaining from yesterday]; Ont.+KDD: Apriori and taxonomies; KDD+DB: Constrained pattern mining (example: WUM); KDD+DB: Inductive databases (very brief); KDD+Ont.: Induction and Semantic Web (very brief)

12 The site. Business understanding / problem definition: * How do users search in this online catalog? * Which search criteria are popular? * Which are efficient? [Berendt & Spiliopoulou, VLDB Journal 2000]

13 The concept hierarchies / site ontology (excerpt): SEITE1-...LI (1st page of a list) or SEITEn-...LI (further page); LA ("Land" = country/state), SA ("Schulart" = school type), SU ("Suche" = search)

14 Sequence mining – one result pattern: successful search for a school in Germany (labels in the figure: a refinement, a repetition, a continuation). One example pattern:
select t from node a b, template a * b as t where a.url startswith "SEITE1-" and a.occurrence = 1 and b.url contains "1SCHULE" and b.occurrence = 1 and (b.support / a.support) >= 0.2
(Berendt & Spiliopoulou, VLDB J. 2000)
/liste.html?offset=920&zeilen=20&anzahl=1323&sprache=de&sw_kategorie=de&erscheint=&suchfeld=&suchwert=&staat=de&region=by&schultyp=
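The ratio b.support / a.support in this query can be read as a confidence: of the sessions that reach a first list page, what share later reaches a school page? The Python sketch below is a heavily simplified illustration of that reading, not WUM or MINT. It assumes sessions are plain lists of page names, treats all pages matching a predicate as one class (WUM binds each variable to an individual node), and uses hypothetical page names built from the concept labels on the previous slide.

```python
def template_confidence(sessions, is_a, is_b):
    """Share of sessions with a first `a` page that later reach a first `b` page.

    Simplified reading of the template a * b with a.occurrence = 1 and
    b.occurrence = 1: any number of pages may lie between a and b.
    """
    a_support = 0
    b_support = 0
    for session in sessions:
        # Position of the first page matching the `a` predicate, if any.
        a_pos = next((i for i, url in enumerate(session) if is_a(url)), None)
        if a_pos is None:
            continue
        a_support += 1
        # Position of the first page matching the `b` predicate, if any.
        b_pos = next((i for i, url in enumerate(session) if is_b(url)), None)
        if b_pos is not None and b_pos > a_pos:
            b_support += 1
    return b_support / a_support if a_support else 0.0


# Hypothetical sessions; the page names are illustrative only.
sessions = [
    ["SU", "SEITE1-LA-LI", "SEITEn-LA-LI", "1SCHULE"],
    ["SEITE1-SA-LI", "SU"],
    ["SEITE1-LA-LI", "1SCHULE", "SEITE1-LA-LI"],
    ["1SCHULE", "SEITE1-SU-LI"],
    ["SU"],
]

conf_a_b = template_confidence(
    sessions,
    is_a=lambda url: url.startswith("SEITE1-"),
    is_b=lambda url: "1SCHULE" in url,
)
print(conf_a_b, conf_a_b >= 0.2)
```

In this toy log, four sessions contain a first list page and two of them reach a school page afterwards, so the pattern would satisfy the 0.2 threshold from the query.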

15 Sequences

16 Generalized sequences, navigation patterns, hits in WUM

17 Aggregated Logs: The basic internal representation in WUM

18 The confidence measure for generalized sequences

19 Templates in the query language MINT, g-sequences, and navigation patterns

20 Interestingness measures: Support (hits) and confidence

21 Aggregated Logs, queries, and query results

22 The basic idea of the WUM algorithm

23 MINT can express 3 types of constraints ("predicates")

24 The WUM gseqm algorithm (B predicates)

25 Also for higher-order structures (graphs): Ex. MolFea

26 Agenda: Naïve Bayes [remaining from yesterday]; Changing representation: LSI [remaining from yesterday]; Ont.+KDD: Apriori and taxonomies; KDD+DB: Constrained pattern mining (example: WUM); KDD+DB: Inductive databases (very brief); KDD+Ont.: Induction and Semantic Web (very brief)

27 The basic idea (on the board)

28 Agenda: Naïve Bayes [remaining from yesterday]; Changing representation: LSI [remaining from yesterday]; Ont.+KDD: Apriori and taxonomies; KDD+DB: Constrained pattern mining (example: WUM); KDD+DB: Inductive databases (very brief); KDD+Ont.: Induction and Semantic Web (very brief)

29 (One) basic idea (on the board)

30 Next lecture: Naïve Bayes [remaining from yesterday]; Changing representation: LSI [remaining from yesterday]; Ont.+KDD: Apriori and taxonomies; KDD+DB: Constrained pattern mining (example: WUM); KDD+DB: Inductive databases (very brief); KDD+Ont.: Induction and Semantic Web (very brief); Applications

31 References and background reading; acknowledgements
* Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207-216, Washington, D.C., May 1993. http://citeseer.ist.psu.edu/agrawal93mining.html (presentation from Delic, D. (2002). Mining Association Rules with Rough Sets and Large Itemsets - A Comparative Study.)
* Ramakrishnan Srikant and Rakesh Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. http://citeseer.ist.psu.edu/srikant95mining.html (presentation from http://www.kde.cs.uni-kassel.de/lehre/ss2004/kdd/folien/4Folie_VII.3_Assoziationsregeln.pdf)
* P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002. http://citeseer.ist.psu.edu/tan02selecting.html
* MINERULE: R. Meo, G. Psaila, and S. Ceri. An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, Vol. 2 (2), pp. 195-224, 1998. http://www.springerlink.com/index/L57188431Q027L73.pdf
* WUM and the Schulweb study: Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75. http://vasarely.wiwi.hu-berlin.de/Home/berendt-spiliopoulou-vldbj00.pdf
* MolFea (esp. the example): S. Kramer, L. De Raedt, C. Helma. Molecular Feature Mining in HIV Data. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 2001.
* De Raedt, L. (2002). A perspective on inductive databases. SIGKDD Explorations, Volume 4, Issue 2, 69-77. http://owl-workshop.man.ac.uk/acceptedLong/submission_25.pdf

