A First Attempt towards a Logical Model for the PBMS. PANDA Meeting, Milano, 18 April 2002. National Technical University of Athens. Patterns for Next-Generation Database Systems (PANDA).


1 A First Attempt towards a Logical Model for the PBMS. PANDA Meeting, Milano, 18 April 2002. National Technical University of Athens. Patterns for Next-Generation Database Systems (PANDA).

2 P. Vassiliadis. PANDA Meeting, Milano, 18 April 2002. Overview
General Understanding of the PBMS
Mathematical Background
MetaModel: Entities and Language
The Software Engineering Perspective
Conclusions

4 General Framework
Meta-Pattern Type + Pattern Types = PBMS Catalog
Pattern Layer = PBMS Content
[Diagram: layers of the PBMS, bottom to top]
Raw Data
Pattern Layer: Cluster 1, Cluster 2, Cluster 3 (DBSCAN Cluster Algorithm); Assoc. Rule 1, Assoc. Rule 2, …, Assoc. Rule n (Ass. Rule Algorithm); Decision Tree 1 (Dec. Tree Algorithm)
Pattern Type Layer: DBSCAN Cluster Type, Association Rule Type, Decision Tree Type; each pattern belongs to its pattern type
Meta-Pattern Type Layer: Meta_Pattern Type, to which the pattern types belong; Language

5 General Idea
Meta-Pattern Type + Language, by analogy with Relation + Language
A meta-pattern type has a Name, a Condensed Expression, an Extension, and a Language.
A relation has a Name, a Schema, an Extension, and Relational Calculus.
Type level: Pattern Type | Relational Table
Name: AssociationRuleType | Buys
Condensed expression / schema: head :- body | session_id, date, item, price
Extension: ext(AssociationRuleType) | ext(Buys)
Instance: Pattern Buys(x,_,beer,_) :- Buys(x,_,pampers,_) | Tuple Buys(34,4/4/2002,beer,2)
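The analogy between a pattern and a tuple can be made concrete with a small sketch. It assumes an in-memory list of tuples standing in for ext(Buys); only the first row appears on the slide, the other rows and the helper name `sessions_with` are invented for illustration. The rule Buys(x,_,beer,_) :- Buys(x,_,pampers,_) is then checked session by session:

```python
# Hypothetical extension of the Buys relation: (session_id, date, item, price).
# Only the first row comes from the slide; the rest is made-up sample data.
buys = [
    (34, "4/4/2002", "beer", 2),
    (34, "4/4/2002", "pampers", 10),
    (35, "4/4/2002", "pampers", 10),
    (36, "5/4/2002", "milk", 1),
]


def sessions_with(relation, item):
    """Return the session ids x for which Buys(x, _, item, _) holds."""
    return {session_id for session_id, _, bought, _ in relation if bought == item}


body = sessions_with(buys, "pampers")  # sessions satisfying the rule body
head = sessions_with(buys, "beer")     # sessions satisfying the rule head
supporting = body & head               # body and head both hold
violating = body - head                # body holds, head does not
```

The pattern itself stays a single condensed expression, while `supporting` and `violating` are computed from the raw data it summarizes.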

7 Mathematical Background
Assumptions from the definition:
There exists a data space and a pattern space.
There always exist M:N relationships among data and patterns.
[Diagram: Data Space and Pattern Space]

8 Characteristics of data and pattern space
Each data item is characterized by a finite number of features $N$, where $\mathrm{dom}(x)$ is the domain of each feature.
Data space: $D^N \subseteq \mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_N)$
Proposal: all $\mathrm{dom}(x)$ are countably infinite + consider cases for $D^N$ (whether it is finite or not).
Each pattern is characterized by a finite number of features $M$.
Pattern space: $D^M \subseteq \mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_M)$
Proposal: all $\mathrm{dom}(x)$ are countably infinite + $D^M$ is clearly finite.

9 Statistical Measures
The data-pattern relationship $f_{DP}$ has:
participation measures for the relationship;
importance measures for a data item;
importance measures for a pattern.
[Diagram: Data Space and Pattern Space]

10 Statistical Measures
Richness of the representation = $\dfrac{\text{relationships captured by the condensed representation}}{\text{total number of relationships}}$
Compactness of the representation = $\dfrac{\mathrm{size}(D^M) \cdot M}{\mathrm{size}(D^N) \cdot N}$
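As a worked illustration of the two ratios, the sketch below computes them for invented sizes. The function names and all numbers are assumptions for the example, not values from the talk:

```python
def richness(captured_relationships, total_relationships):
    """Share of data-pattern relationships captured by the condensed representation."""
    return captured_relationships / total_relationships


def compactness(size_dm, m, size_dn, n):
    """(size(D^M) * M) / (size(D^N) * N): smaller values mean a more compact pattern space."""
    return (size_dm * m) / (size_dn * n)


# Made-up example: 10 patterns with 3 features each summarize
# 1000 data items with 4 features each, capturing 80 of 100 relationships.
example_richness = richness(80, 100)
example_compactness = compactness(10, 3, 1000, 4)
```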

13 Pattern Types
Intensional Description of a Pattern Type as follows:
–PID
–Explicit Relationship: $f_{DP_i}: D^N \to D_i^M$
–Relationship Expression
–Statistical Measures
Extensional Description (or Pattern Extension) of a Pattern Type: a finite set of patterns
Data extension of a Pattern Type: a countable? set of data items

14 Example
Pattern Type Intensional Description; [small part of] Pattern Type Extensional Description
PID: PID123
Explicit Relationship: $f_{DP_i}: D^N \to D_i^M$ = {(PID123,RID124),…}
Relationship Expression: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
Statistical Measures: Coverage=80%, Confidence=90%
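One possible in-memory shape for such a description is sketched below. The class name `PatternDescription` and its method are hypothetical; the field values are the ones from the example slide, with the elided part of the explicit relationship left out:

```python
from dataclasses import dataclass, field


@dataclass
class PatternDescription:
    """Hypothetical record holding the four parts listed on the slide."""
    pid: str
    # Explicit relationship stored as (pattern id, raw-data record id) pairs.
    explicit_relationship: set = field(default_factory=set)
    relationship_expression: str = ""
    statistical_measures: dict = field(default_factory=dict)

    def data_extension(self):
        """Record ids of the raw data items related to this pattern."""
        return {rid for pid, rid in self.explicit_relationship if pid == self.pid}


pattern = PatternDescription(
    pid="PID123",
    explicit_relationship={("PID123", "RID124")},
    relationship_expression="Buys(x,_,beer,_) :- Buys(x,_,pampers,_)",
    statistical_measures={"coverage": 0.80, "confidence": 0.90},
)
```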

16 Meta-Pattern Types
Intensional Description of a Meta-Pattern Type as follows:
–Name
–Condensed Expression
–[Meta]Statistical Measures
–?? Schema Attributes ??
Extensional Description of a Meta-Pattern Type: a finite set of pattern types

17 Example
Meta-Pattern Type Intensional Description; [small part of] Meta-Pattern Type Extensional Description
Name: AssociationRuleType
Condensed Expression: head :- body
[Meta]Statistical Measures: Coverage: Float[0..1], Confidence: Float[0..1]
Schema Attributes??: PID, Head, Body ??
Pattern Type Intensional Description; [small part of] Pattern Type Extensional Description
PID: PID123
Explicit Relationship: $f_{DP_i}: D^N \to D_i^M$ = {(PID123,RID124),…}
Relationship Expression: Buys(x,_,beer,_) :- Buys(x,_,pampers,_)
Statistical Measures: Coverage=80%, Confidence=90%
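The typed measures Coverage: Float[0..1] and Confidence: Float[0..1] suggest that a meta-pattern type can act as a schema that concrete measures are checked against. The sketch below is one way to read that; the class `MetaPatternType`, its `accepts` method, and the range encoding are assumptions, not part of the proposal:

```python
from dataclasses import dataclass, field


@dataclass
class MetaPatternType:
    """Hypothetical catalog entry: name, condensed expression, measure ranges."""
    name: str
    condensed_expression: str
    # Measure name -> (low, high) bounds, mirroring Float[0..1] on the slide.
    measure_ranges: dict = field(default_factory=dict)

    def accepts(self, measures):
        """True when every declared measure is present and within its range."""
        return all(
            name in measures and low <= measures[name] <= high
            for name, (low, high) in self.measure_ranges.items()
        )


association_rule_type = MetaPatternType(
    name="AssociationRuleType",
    condensed_expression="head :- body",
    measure_ranges={"coverage": (0.0, 1.0), "confidence": (0.0, 1.0)},
)
```

With this reading, the measures of PID123 (coverage 0.8, confidence 0.9) are accepted, while a coverage of 1.2 would be rejected.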

18 Which language to choose?
Relational Calculus, Datalog and Stratified Datalog?
–Powerful but not elegant for all the patterns that we might want to express…
Constraint database approach?
–We cannot guarantee a finite representation of the result for non-linear constraints…

19 Which language to choose?

20 Which language to choose?
Remove recursion?
–Cannot express interesting patterns like transitive closure…
Only linear constraints?
–Cannot express interesting patterns like cyclic clusters…
–Approximation of polynomials through sets of linear constraints? Not elegant…
Forget constraints and describe every pattern type as a simple predicate?
–Loss of all the declarative information on the nature of the pattern type…
So, what to do? Possible dead-end due to the paradigm?
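To see why recursion matters, consider the textbook Datalog program path(x,y) :- edge(x,y) and path(x,z) :- path(x,y), edge(y,z). A minimal sketch of its naive bottom-up evaluation follows; the function name and the sample edges are illustrative assumptions. The loop repeats until no new facts are derived, which is exactly what a non-recursive language cannot express for paths of unbounded length:

```python
def transitive_closure(edges):
    """Naive fixpoint evaluation of the recursive path/edge rules."""
    closure = set(edges)  # path(x, y) :- edge(x, y)
    while True:
        # path(x, z) :- path(x, y), edge(y, z)
        derived = {
            (x, z)
            for (x, y) in closure
            for (y2, z) in edges
            if y == y2
        }
        if derived <= closure:  # fixpoint reached: nothing new was derived
            return closure
        closure |= derived


paths = transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})
```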

22 How to build it?
Each of the pattern types implemented as a Class.
The different pattern types defined as specializations of a Generic Pattern Class.
Treat pattern types as predicates, with semantics computed by a computationally complete procedural language [e.g., PL/SQL, C++, …]?
–Instead of fundamental research we turn to feasibility issues…
What about behavior?

23 General Framework: How to build it?
Meta-Pattern Type + Pattern Types = PBMS Catalog
Pattern Layer = PBMS Content
[Diagram: PBMS]
Patterns: Cluster 1, Cluster 2, Cluster 3; Assoc. Rule 1, Assoc. Rule 2, …, Assoc. Rule n; Decision Tree 1
IN (instance of): Association Rule Class, Cluster Class, Decision Tree Class
ISA: Generic Class
Set of DDL/DML Languages
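The class-based option can be sketched as follows. This is only an illustration of the ISA/IN structure in the diagram: the method name `matches`, the constructor arguments, and the spherical cluster (a deliberate simplification, not DBSCAN) are all assumptions:

```python
import math


class GenericPattern:
    """Generic Pattern Class: state shared by all pattern types."""

    def __init__(self, pid, measures=None):
        self.pid = pid
        self.measures = dict(measures or {})

    def matches(self, datum):
        """Procedural semantics of the pattern; each specialization defines it."""
        raise NotImplementedError


class AssociationRule(GenericPattern):
    """ISA GenericPattern: a body item implies a head item within a session."""

    def __init__(self, pid, body_item, head_item, measures=None):
        super().__init__(pid, measures)
        self.body_item = body_item
        self.head_item = head_item

    def matches(self, session_items):
        # A session supports the rule when it contains both body and head.
        return self.body_item in session_items and self.head_item in session_items


class Cluster(GenericPattern):
    """ISA GenericPattern: a simplified spherical cluster (not DBSCAN itself)."""

    def __init__(self, pid, center, radius, measures=None):
        super().__init__(pid, measures)
        self.center = center
        self.radius = radius

    def matches(self, point):
        # A point belongs to the cluster when it lies within the radius.
        return math.dist(self.center, point) <= self.radius


# Instances (the IN relationship in the diagram).
rule = AssociationRule("PID123", "pampers", "beer", {"confidence": 0.9})
cluster = Cluster("C1", (0.0, 0.0), 1.0)
```

Here the semantics of each pattern type lives in procedural code rather than in a declarative expression, which is the trade-off the previous slide raises.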

25 Conclusions
Followed the Datalog paradigm (need for deductive capabilities) enhanced with constraints (need for elegance)
Reduced the problem to the specification of a proper language for the description of pattern types
Fundamental language limitations when constraints are considered
Dilemma:
–Change paradigm?
–Stick with this paradigm and focus on engineering issues?
–…Any other suggestions?…

26 Thank you …

27 Definitions from the minutes of Athens meeting
A pattern is a compact, semantically rich representation of raw data.
A Pattern-Based Management System (PBMS) is a system for handling (storing / processing / retrieving) patterns extracted from raw data in order to efficiently support pattern matching and to exploit pattern-related operations generating intensional information.

28 Issues around the pattern definition
A mapping from the original raw data space to a less populated (→ compact) pattern space is always possible, preserving (or documenting) as much knowledge as possible from the raw data space (→ rich in semantics).
An M:N mapping between raw data space and pattern space is permitted.
Perhaps several levels of representation / abstraction exist (different levels of granularity, multi-dimensionality, recursion, hierarchies, etc.)

29 Issues around the PBMS definition
A PBMS will cooperate with a DBMS storing raw data;
A PBMS processes different kinds of queries (because of different user needs) on raw data and returns more intuitive results to users;
A PBMS is useful in order to process those queries more efficiently than a normal DBMS would do;
A PBMS will have its own mechanisms for representing and storing its entries (patterns), posing and processing queries, efficiently retrieving its entries.

30 Query Language Issues
Given a datum, which pattern does it refer to?
Which are the data that correspond to this pattern?
Zoom-in, zoom-out a pattern.
Pattern union, difference.
Composition of patterns (i.e., if A → B and B → C, then derive A → C).
What are the values of the statistical measures for this pattern?
Which patterns fulfill a certain constraint on a statistical measure?
Which are the patterns in the PBMS catalog?
Which are the attributes or the statistical measures for this pattern type?
Which pattern types relate to a certain statistical measure?
Closed Form of the Language.
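A few of these questions can be sketched over an explicit M:N data-pattern relationship. The sketch assumes the relationship is stored as (pattern id, record id) pairs; PID123, RID124 and the 80%/90% measures come from the earlier example, while PID200, RID125, their values, and the function names are invented for illustration:

```python
# M:N relationship between patterns and raw data records.
relationship = {
    ("PID123", "RID124"),
    ("PID123", "RID125"),
    ("PID200", "RID124"),
}

# Statistical measures per pattern.
measures = {
    "PID123": {"coverage": 0.8, "confidence": 0.9},
    "PID200": {"coverage": 0.3, "confidence": 0.4},
}


def patterns_of(rid):
    """Given a datum, which patterns does it refer to?"""
    return {pid for pid, r in relationship if r == rid}


def data_of(pid):
    """Which are the data that correspond to this pattern?"""
    return {rid for p, rid in relationship if p == pid}


def patterns_where(measure, predicate):
    """Which patterns fulfill a certain constraint on a statistical measure?"""
    return {
        pid
        for pid, values in measures.items()
        if measure in values and predicate(values[measure])
    }
```

For example, `patterns_where("confidence", lambda v: v >= 0.5)` selects the patterns whose confidence is at least 50%.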

