2001/12/18. Discovering Robust Knowledge from Databases that Change. Authors: Chun-Nan Hsu, Craig A. Knoblock. Advisor: Dr. Hsu. Graduate: Yu-Wei Su.


1 Discovering Robust Knowledge from Databases that Change. Authors: Chun-Nan Hsu, Craig A. Knoblock. Advisor: Dr. Hsu. Graduate: Yu-Wei Su.

2 Abstract: Databases usually change over time, which can make machine-discovered knowledge inconsistent. Useful knowledge should be robust against database changes, so that it is unlikely to become inconsistent after the database changes.

3 Abstract (cont.): The paper defines this notion of robustness for databases and describes how the robustness of first-order Horn-clause rules can be estimated and applied in knowledge discovery.

4 Outline: Motivation; Objective; Terminology; Robustness of knowledge; Definitions of robustness; Estimating robustness; Templates for estimating robustness; Empirical demonstration of robustness estimation; Applying robustness in knowledge discovery; Experimental results; Conclusion and future work; Opinion.

5 Motivation: Many applications require discovered knowledge to be consistent in all database states. Most approaches to these problems assume static databases.

6 Objective: To discover robust knowledge that is unlikely to become inconsistent with new database states. To present an efficient approach to estimating and using the new measure.

7 Terminology: Robustness can be defined as the probability that the knowledge is consistent with a database state. The paper considers relational databases, which consist of a set of relations.

8 Terminology (cont.): Knowledge is expressed as Horn-clause rules that capture regularities in the data. Literals defined on database relations are referred to as database literals, and literals on built-in relations as built-in literals. Range rule → built-in literal. Relational rule → database literal.

9 Terminology (cont.): A database state at a given time t is the collection of the instances present in the database at time t. The closed-world assumption (CWA) is used to interpret the semantics of a database state. A rule is consistent with a database state if all variable instantiations that satisfy the antecedents of the rule also satisfy its consequent.
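
The consistency condition on this slide can be sketched in Python. This is only an illustration: the relation contents, the rule, and the helper name `is_consistent` are assumptions made for the example, not material from the presentation. The database state is modeled as a list of tuples, and the closed-world assumption is reflected by treating only the listed tuples as true.

```python
# Hypothetical database state for a relation geoloc(name, country, latitude).
# The values are made up for illustration.
geoloc = [
    ("Valletta", "Malta", 35.90),
    ("Mdina", "Malta", 35.89),
    ("Rome", "Italy", 41.90),
]

def is_consistent(state, antecedent, consequent):
    """A rule is consistent with a state if every tuple that satisfies
    the antecedent also satisfies the consequent."""
    return all(consequent(row) for row in state if antecedent(row))

# Illustrative range rule: geoloc(_, "Malta", lat) => lat >= 35.89
antecedent = lambda row: row[1] == "Malta"
consequent = lambda row: row[2] >= 35.89

consistent_now = is_consistent(geoloc, antecedent, consequent)

# An update that lowers a Maltese latitude below 35.89 invalidates the rule.
updated = [("Valletta", "Malta", 35.00)] + geoloc[1:]
consistent_after = is_consistent(updated, antecedent, consequent)
```

Here `consistent_now` is true and `consistent_after` is false, which is the kind of change that a robustness measure is meant to anticipate.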

10 Terminology (cont.)

11 Robustness of knowledge: Definitions of robustness; Estimating robustness; Templates for estimating robustness; Empirical demonstration of robustness estimation.

12 Definitions of robustness: Definition 1 (Robustness for all states). Given a rule r, let D be the event that a database is in a state that is consistent with r. The robustness of r is $\mathrm{Robust}_1(r) = \Pr(D)$. If all database states are equally probable, $\mathrm{Robust}_1(r) = \dfrac{\text{\# of database states consistent with } r}{\text{\# of all possible database states}}$. Two problems: the definition treats all database states as equally probable, and the number of possible database states is intractably large.
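
As a rough illustration of Definition 1, the sketch below enumerates every state of a toy database and counts the states consistent with a rule. The tuple universe, the rule, and the function names are hypothetical; the point is only to show the counting and why it does not scale.

```python
from itertools import chain, combinations

# Hypothetical universe of (country, latitude) tuples that may appear in one relation.
universe = [("Malta", 35.9), ("Malta", 34.0), ("Italy", 41.9)]

def all_states(tuples):
    """Every subset of the universe is treated as one possible database state."""
    return chain.from_iterable(
        combinations(tuples, size) for size in range(len(tuples) + 1)
    )

def consistent(state):
    # Illustrative rule: country == "Malta" => latitude >= 35.89
    return all(lat >= 35.89 for country, lat in state if country == "Malta")

states = list(all_states(universe))
# Fraction of states consistent with the rule, all states weighted equally.
robust_1 = sum(consistent(s) for s in states) / len(states)
```

With three candidate tuples there are only 8 states, but the count doubles with every additional candidate tuple, which is the intractability problem the slide points out.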

13 Definitions of robustness (cont.): Definition 2 (Robustness for accessible states). Given a rule r, let d be a database state with which r is consistent. New database states are accessible from d by performing transactions. Let t denote the event of performing a transaction on d that results in a new database state inconsistent with r.

14 Definitions of robustness (cont.): The robustness of r in accessible states from the current state d is $\mathrm{Robust}(r \mid d) = \Pr(\neg t \mid d) = 1 - \Pr(t \mid d)$.

15 Definitions of robustness (cont.): Corollary 3. If r is consistent with d, new database states are accessible from d only by performing transactions, and all transactions are equally probable, then $\mathrm{Robust}_1(r) = \mathrm{Robust}(r \mid d)$.

16 Definitions of robustness (cont.): Example. Suppose that reaching a state inconsistent with r requires deleting ten tuples from state d1 but only one tuple from state d2. Then $\mathrm{Robust}(r \mid d_1) > \mathrm{Robust}(r \mid d_2)$.

17 Estimating robustness: Laplace law of succession. Given a repeatable experiment whose outcome is one of k classes, suppose the experiment has been run n times, r of which resulted in some outcome C. The probability that the outcome of the next experiment will be C can be estimated as $\dfrac{r + 1}{n + k}$.

18 Estimating robustness (cont.): m-probability. Let r, n, and C be as in the Laplace law. Pr(C) is the prior probability that an experiment has outcome C, and m is an adjusting constant that indicates our confidence in Pr(C). The probability that the outcome of the next experiment will be C can be estimated as $\dfrac{r + m \cdot \Pr(C)}{n + m}$.

19 Estimating robustness (cont.): The Laplace law is a special case of the m-probability with Pr(C) = 1/k and m = k. The robustness of a rule is estimated from the probability of the transactions that may invalidate the rule. The task is decomposed into deriving a set of invalidating transactions and estimating the probability of those transactions.
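
The two estimators can be written as short Python functions, using their standard forms (r + 1)/(n + k) and (r + m·Pr(C))/(n + m). The function names and the example counts are illustrative assumptions; the final lines check the special-case relationship stated on this slide.

```python
def laplace(r, n, k):
    """Laplace law of succession: (r + 1) / (n + k)."""
    return (r + 1) / (n + k)

def m_probability(r, n, prior, m):
    """m-probability: (r + m * prior) / (n + m)."""
    return (r + m * prior) / (n + m)

# Hypothetical counts: 3 of 10 past experiments resulted in C, with k = 4 classes.
r, n, k = 3, 10, 4
p_laplace = laplace(r, n, k)

# Setting the prior to 1/k and m to k reproduces the Laplace estimate.
p_m = m_probability(r, n, prior=1 / k, m=k)
```

Both estimates fall back to the prior when no observations are available (n = 0), which is what makes them usable on a database with an empty transaction log.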

20 Estimating robustness (cont.): Example.

21 Estimating robustness (cont.): T1, T2, and T3 are mutually exclusive, and together they cover all possible transactions that will invalidate R2.1.
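
Because the three classes are mutually exclusive and exhaustive, their probabilities add. Assuming robustness is one minus the probability of an invalidating transaction, as in Definition 2, the estimate for R2.1 would take the following form (a sketch of the arithmetic, not a formula copied from the slide):

```latex
\mathrm{Robust}(R2.1 \mid d)
  = 1 - \Pr(T_1 \lor T_2 \lor T_3)
  = 1 - \bigl(\Pr(T_1) + \Pr(T_2) + \Pr(T_3)\bigr)
```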

22 Estimating robustness (cont.): Each transaction is decomposed into more primitive statements whose local probabilities are then estimated. The decomposition is based on a Bayesian network model.

23 Estimating robustness, example: X1: a tuple is updated. X2: a tuple of geoloc is updated. X3: a tuple of geoloc whose ?country = "malta" is updated. X4: a tuple of geoloc whose ?latitude is updated. X5: a tuple of geoloc whose ?latitude is updated to a new value less than 35.89.

24 Estimating robustness, example (cont.): X1: a tuple is updated. $t_u$ is the number of previous updates and $t$ is the total number of previous transactions. If no information is available, assume $t_u = t = 0$.

25 Estimating robustness, example (cont.): X2: a tuple of geoloc is updated. $R$ is the number of relations in the database. $t_{u,\mathrm{geoloc}}$ is the number of updates made to tuples of relation geoloc.

26 Estimating robustness, example (cont.): X3: a tuple of geoloc whose ?country = "malta" is updated. $G$ is the size of relation geoloc. $I_{a3}$ is the number of tuples in geoloc that satisfy ?country = "malta". $t_{u,a3}$ is the number of updates made to the tuples in geoloc that satisfy ?country = "malta".

27 Estimating robustness, example (cont.): X4: a tuple of geoloc whose ?latitude is updated. $A$ is the number of attributes of geoloc. $t_{u,\mathrm{geoloc},\mathrm{latitude}}$ is the number of updates made to the latitude attribute of the geoloc relation.

28 Estimating robustness, example (cont.): X5: a tuple of geoloc whose ?latitude is updated to a new value less than 35.89. Two cases are distinguished: no information available, and range information available.
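
The slides define the counts for each local probability but the formulas themselves did not survive in the transcript. The sketch below shows one plausible way to combine them, and every modeling choice in it is an assumption: X1 uses the Laplace law over three transaction types (insert, delete, update), X2 over the R relations, X4 over the A attributes, X3 uses an m-probability with prior I_a3 / G, and the value for X5 is a placeholder supplied by hand. All counts are hypothetical.

```python
import math

def laplace(r, n, k):
    return (r + 1) / (n + k)

def m_probability(r, n, prior, m):
    return (r + m * prior) / (n + m)

# Hypothetical parameters read from the schema and an empty transaction log.
t, t_u = 0, 0                  # total transactions, updates
t_u_geoloc = 0                 # updates on geoloc
t_u_a3 = 0                     # updates on geoloc tuples with ?country = "malta"
t_u_geoloc_latitude = 0        # updates on geoloc.latitude
R, G, I_a3, A = 5, 100, 4, 4   # relations, |geoloc|, matching tuples, attributes

p_x1 = laplace(t_u, t, 3)                                    # assumed k = 3 transaction types
p_x2 = laplace(t_u_geoloc, t_u, R)                           # assumed k = R
p_x3 = m_probability(t_u_a3, t_u_geoloc, I_a3 / G, m=1.0)    # assumed prior and m
p_x4 = laplace(t_u_geoloc_latitude, t_u_geoloc, A)           # assumed k = A
p_x5 = 0.5                                                   # placeholder, not from the slides

# Chain rule along the network: each factor is conditioned on the previous ones.
p_invalidating_update = math.prod([p_x1, p_x2, p_x3, p_x4, p_x5])
```

The design point this illustrates is that each factor needs only a few counts from the schema or the log, so the whole estimate can be computed without scanning the data.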

29 Templates for estimating robustness: The templates allow the system to estimate the robustness of knowledge automatically. The parameters of these equations can be evaluated by accessing the database schema or the transaction log.

30 Templates for estimating robustness (cont.)

31 Templates for estimating robustness (cont.)

32 Empirical demonstration of robustness estimation

33 Empirical demonstration of robustness estimation (cont.)

34 Empirical demonstration of robustness estimation (cont.)

35 Empirical demonstration of robustness estimation (cont.): Definition 4 (Probability of consistency). Given a rule r, a database state d, and a set of n transactions, the probability of consistency for a rule r after applying n transactions to the database state d is defined

36 Empirical demonstration of robustness estimation (cont.)

37 Applying robustness in knowledge discovery: Robustness alone is not enough to guide discovery. It should be used together with other measures of usefulness. One such measure is applicability. The goal of pruning is a discovered rule that is both highly applicable and robust.

38 Applying robustness in knowledge discovery (cont.): A rule is more applicable if it is shorter. The learning process is divided into two stages: rule construction and rule pruning. Specification of rule pruning: take as input a machine-generated rule that is consistent with a database but overly specific; remove antecedent literals of the rule so that it remains consistent but is short and robust.

39 Applying robustness in knowledge discovery (cont.): The pruner searches for a subset of antecedent literals to remove, continuing until any further removal would yield an inconsistent rule. A beam-search algorithm is used to trim the search space. Candidate rules are evaluated on two properties: robustness and length.
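
The search described on this slide can be sketched as a small beam search in Python. This is a minimal sketch under stated assumptions, not the pruner from the paper: the function `prune`, the toy consistency test, and the per-literal risk scores are all hypothetical, and in a real system the consistency check would query the database and the robustness score would come from the estimation templates.

```python
def prune(literals, is_consistent, robustness, beam_width=2):
    """Remove antecedent literals one at a time, keeping only the
    beam_width best consistent candidates at each level."""
    rank = lambda rule: (-robustness(rule), len(rule))  # robust first, then short
    beam = [frozenset(literals)]
    results = []
    while beam:
        candidates = set()
        for rule in beam:
            shorter = [rule - {lit} for lit in rule]
            still_ok = [s for s in shorter if s and is_consistent(s)]
            if still_ok:
                candidates.update(still_ok)
            else:
                # Any further removal yields an inconsistent (or empty) rule.
                results.append(rule)
        beam = sorted(candidates, key=rank)[:beam_width]
    return sorted(results, key=rank)

# Toy setup: the rule stays consistent only while literal "a" is present.
is_consistent = lambda rule: "a" in rule
# Hypothetical chance that each literal is involved in an invalidating transaction.
risk = {"a": 0.05, "b": 0.20, "c": 0.10, "d": 0.30}
robustness = lambda rule: 1.0 - sum(risk[lit] for lit in rule)

pruned = prune(["a", "b", "c", "d"], is_consistent, robustness)
```

In this toy run the search ends with the single-literal rule containing only `a`, since every other literal can be dropped without breaking consistency. The beam width trades search cost against the chance of missing a better pruned rule.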

40 Applying robustness in knowledge discovery (cont.)

41 Applying robustness in knowledge discovery (cont.): The pruner discards pruned rules that are inconsistent or that contain dangling literals. To identify an inconsistent rule, the pruner can consult the database directly. A set of literals is dangling if the variables occurring in those literals do not occur in any other literal in the rule.
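
A simplified check for dangling literals might look like the following. It handles only the single-literal case (the definition above covers sets of literals), and the rule representation, the function name, and the relation names `seaport` and `wharf` are assumptions made for illustration.

```python
def dangling_literals(consequent, antecedents):
    """Return antecedent literals none of whose variables occur in the
    consequent or in any other antecedent literal.
    Each literal is a (name, variables) pair."""
    dangling = []
    for i, (name, variables) in enumerate(antecedents):
        others = set(consequent[1])
        for j, (_, other_vars) in enumerate(antecedents):
            if j != i:
                others.update(other_vars)
        if others.isdisjoint(variables):
            dangling.append((name, variables))
    return dangling

# Hypothetical rule:
#   geoloc(?name, ?country, ?lat) and seaport(?name, ?storage)
#   and wharf(?w, ?depth)  =>  ?lat >= 35.89
consequent = (">=", ("?lat",))
antecedents = [
    ("geoloc", ("?name", "?country", "?lat")),
    ("seaport", ("?name", "?storage")),
    ("wharf", ("?w", "?depth")),
]
found = dangling_literals(consequent, antecedents)
```

Here only the `wharf` literal is reported, because `?w` and `?depth` appear nowhere else in the rule.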

42 Applying robustness in knowledge discovery (cont.): To ensure that removing a database literal L does not yield dangling literals, L must satisfy the following conditions: (1) no built-in literal in the antecedents of the rule is defined on the variables occurring in L; (2) if a variable occurring in the consequent of r also occurs in L, this variable must occur in some other database literal in the rule; (3) removing L from the rule does not disconnect existing join paths between any database literals in the rule.

43 Applying robustness in knowledge discovery (cont.)

44 Applying robustness in knowledge discovery (cont.)

45 Applying robustness in knowledge discovery (cont.)

46 Experimental results: The rule discovery system BASIL was used. The data came from two large ORACLE relational databases. A set of 123 synthesized transactions was used, consisting of 27 updates, 29 deletions, and 67 insertions.

47 Experimental results: Experiment design. Train BASIL to discover a set of rules and estimate their robustness. BASIL exhausted its search space during rule discovery and generated 355 rules. Meanwhile, BASIL estimated the robustness of the rules using another 202 sample transactions. Use the 123 transactions to generate a new database state.

48 Experimental results (cont.): Check whether highly robust rules have a better chance of remaining consistent with the data in the new database state.

49 Conclusion and future work: Formalize the notion of robustness against database changes. Apply the approach to a variety of KDD applications in database management. Improve the precision of the robustness estimation by refining the estimation templates to prevent overestimation.

50 Opinion: Apply this approach to reliability testing.

