From Prolog via Datalog to O-Telos

This lecture bridges from classical first-order logic to O-Telos, our main formalism for representing modeling languages. Key ingredients are the minimal Herbrand interpretation and the four IRDS abstraction levels (data, models, notations, notation definition). Note also that most statements about modeling languages are factual in nature (like tuples in a database). Logical formulas come into play when specifying syntactic and semantic properties.

Our running example = set interpretation logic interpretation
Employee  emp-id  emp-name  salary emp-id  ID emp-name  String salary  Integer Project  prj-id  budget prj-id  ID budget  Integer worksOn  emp-id  prj-id Our running example Employee set interpretation worksOn logic interpretation Project = forall e,p worksOn(e,p) ==> exists n,s Employee(e,n,s) exists b Project(p,b) ... Employee(e1,’Bill’,10000) Employee(e2,’Mary’,50000) Employee(e3,’John’,30000) Project(p1,400000) Project(p2,50000) worksOn(e1,p1),worksOn(e2,p1) worksOn(e2,p2) quantified formulas We can check whether the formulas generated for the example diagram are consistent with some database. However: the formulas are no Horn clauses? Or are they? The mapping to the set interpretation is not equivalent to the mapping to the logic interpretation. For example, the first clause of the set interpretation would correspond to the following logic interpretation forall e,n,s Employee(e,n,s) ==> empid(e) and emp-name(n) and salary(s) We do not further elaborate on whether or not equivalence is desirable. We shall instead focus on the logic interpretation since it will yield automatically also a set interpretation (see subject 'minimal Herbrand interpretation') in subsequent slides. facts representing some data base

The following slides are formal definitions, included only for completeness; we'll introduce the essence of the ideas by examples

Translating to Prolog (Horn clauses)
The quantified formulas so far are not all in Horn clause form. Lloyd & Topor (1984) showed how to translate a first-order logic formula into a set of Horn clauses. Consider the following examples:

Predicate logic: forall e,p worksOn(e,p) ==> exists n,s Employee(e,n,s)
Prolog:
inconsistent :- worksOn(_e,_p), NOT Employee(_e,_n,_s).

Predicate logic: forall x,y A(x,y) ==> B(x) and C(y)
Prolog:
inconsistent :- A(_x,_y), NOT B(_x).
inconsistent :- A(_x,_y), NOT C(_y).

So, in general, several Prolog clauses are generated for one predicate logic formula.
Consistency: we assume that (not inconsistent) is part of any theory that we investigate. ‘inconsistent’ is a special predicate symbol with arity 0. It may never be true!
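The splitting step can be sketched in Python for the special case shown above, where the conclusion is a conjunction of atoms: forall (A ==> B and C) holds iff there is no case of (A and not B) and no case of (A and not C), so each conjunct yields its own denial clause. The function name to_denials and the string-based clause representation are hypothetical illustrations, not the full Lloyd & Topor procedure.

```python
def to_denials(condition, conclusions):
    """Split 'forall: condition ==> c1 and ... and ck' into k denial clauses.

    condition   -- list of atoms (strings) forming the premise
    conclusions -- list of atoms (strings) joined by 'and' in the conclusion
    """
    body = ", ".join(condition)
    # One 'inconsistent' clause per conjunct: the premise holds but the conjunct fails.
    return [f"inconsistent :- {body}, NOT {atom}." for atom in conclusions]


for clause in to_denials(["A(_x,_y)"], ["B(_x)", "C(_y)"]):
    print(clause)
# inconsistent :- A(_x,_y), NOT B(_x).
# inconsistent :- A(_x,_y), NOT C(_y).
```

Existentially quantified variables of a conclusion atom, like _n and _s in Employee(_e,_n,_s), simply stay in the negated goal; they are the "problematic variables" treated on the range-restriction slides.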

Datalog-neg: a decidable subset of Prolog
Only allow function symbols with arity 0 (constants); this is analogous to 1NF in databases.
Use the 'negation-by-failure' principle.
Safeguard negation by ‘range restriction’.
Safeguard recursion by ‘stratification’.
The result is a subset of Prolog in which each derivation terminates after a finite number of steps. This is not the case with Prolog! Hence, Datalog-neg is a good candidate for implementing a method engineering system:
- semantics based on logic (actually, we consider only ‘minimal models’)
- efficient derivation of the consequences of the models is possible
- Datalog-neg is the foundation of ConceptBase
Datalog-neg is syntactically a subset of Prolog. It basically forbids terms (functions) as arguments of predicates. Through this restriction, all predicates have a finite interpretation. Specifically, the example with succ in lecture 2 is not representable in Datalog-neg. A good implementation of Datalog-neg is provided by the DES system (

Range Restriction
We now allow Horn clauses to have negated predicates (NOT: negation by failure) in their condition part, i.e.
P(x1,...,xn) :- P1(x11,...), ..., Pk(xk1,...), NOT Q1(y11,...), ..., NOT Qm(ym1,...).
Such a Horn clause is called range-restricted iff every variable xi and every variable yij also occurs in some Pr (1 ≤ r ≤ k). The variable yij (resp. xi) is thereby restricted to those values that make Pr(...,yij,...) true.
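The definition can be turned into a mechanical check. This Python sketch assumes a hypothetical clause representation: a head atom plus lists of positive and negated atoms, where an atom is a (predicate, arguments) tuple and, following the Prolog convention used on these slides, arguments starting with '_' are variables. The helper names variables and is_range_restricted are illustrative.

```python
def variables(atom):
    """Variables of an atom: arguments starting with '_' (Prolog convention)."""
    _pred, args = atom
    return {a for a in args if a.startswith("_")}


def is_range_restricted(head, positive, negated):
    """True iff every variable of the head and of the negated atoms
    also occurs in some positive condition atom."""
    bound = set().union(*(variables(a) for a in positive))
    needed = variables(head).union(*(variables(a) for a in negated))
    return needed <= bound


inconsistent = ("inconsistent", ())
works_on = ("worksOn", ("_e", "_p"))

# Original clause: _n and _s occur only in the negated atom.
print(is_range_restricted(inconsistent, [works_on],
                          [("Employee", ("_e", "_n", "_s"))]))  # False
# With domain predicates String(_n) and Integer(_s) added.
print(is_range_restricted(inconsistent,
                          [works_on, ("String", ("_n",)), ("Integer", ("_s",))],
                          [("Employee", ("_e", "_n", "_s"))]))  # True
# With the auxiliary predicate Employee1(_e).
print(is_range_restricted(inconsistent, [works_on],
                          [("Employee1", ("_e",))]))  # True
```

The three calls correspond to the clause and the two transformations on the following slides.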

Transforming to range-restricted formulas (1)
Original clause:
inconsistent :- worksOn(_e,_p), NOT Employee(_e,_n,_s).
Transformed clause, with 'domain predicates' for the problematic variables _n and _s:
inconsistent :- worksOn(_e,_p), String(_n), Integer(_s), NOT Employee(_e,_n,_s).
The range restriction allows us to consider only ‘ground’ negated goals. The resolution calculus is forced to evaluate the non-negated predicates first. After that, the ‘problematic variables’ are bound to constants. The transformed formula is NOT logically equivalent to the original formula! Instead, it is a formula corrected so that it fulfils range restriction. We assume that the domain predicates String(_x) and Integer(_x) are interpreted by finite sets (the strings and integers currently stored in the database).

Transforming to range-restricted formulas (2)
Original clause:
inconsistent :- worksOn(_e,_p), NOT Employee(_e,_n,_s).
Transformed clauses, removing the problematic variables _n and _s:
inconsistent :- worksOn(_e,_p), NOT Employee1(_e).
Employee1(_e) :- Employee(_e,_n,_s).
This is another alternative for dealing with unbound variables in negated predicates. It does not require the auxiliary domain predicates. ConceptBase realizes a mixture of both approaches. First, it adds domain predicates to make sure that every variable is positively bound. Then, it removes redundant domain predicates (those already guaranteed by the other predicates in the condition). Finally, it treats negated predicates with unbound variables as shown on this slide.
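On sets of facts, the transformed clauses are easy to evaluate, because the projection Employee1 removes the problematic variables before negation is applied. This Python sketch uses the running example's Employee and worksOn facts as tuples; is_inconsistent is a hypothetical helper name, not a ConceptBase function.

```python
# Facts of the running example: Employee(emp-id, emp-name, salary), worksOn(emp-id, prj-id).
employee = {("e1", "Bill", 10000), ("e2", "Mary", 50000), ("e3", "John", 30000)}
works_on = {("e1", "p1"), ("e2", "p1"), ("e2", "p2")}


def is_inconsistent(works_on, employee):
    # Employee1(_e) :- Employee(_e,_n,_s).   The projection drops _n and _s.
    employee1 = {e for (e, _n, _s) in employee}
    # inconsistent :- worksOn(_e,_p), NOT Employee1(_e).
    # _e is bound by worksOn, so the negated goal is ground when it is tested.
    return any(e not in employee1 for (e, _p) in works_on)


print(is_inconsistent(works_on, employee))                   # False
print(is_inconsistent(works_on | {("e4", "p1")}, employee))  # True: e4 is no employee
```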

Herbrand Interpretations
Up to now we have interpreted logical theories by relations (for the predicates) and by functions (for the function symbols). Herbrand interpretations interpret terms by themselves, i.e. each constant symbol like ‘12’ is interpreted by itself, and each variable-free term is interpreted by itself. For example, ‘12+3’ is interpreted by ‘12+3’ and not by ‘15’. Predicates like P(x,y) are interpreted by sets like {P(1,2), P(45,7), P(12+3,null), ...}. It is sufficient to consider Herbrand interpretations in the ‘proof by contradiction’ fashion of Prolog: if a theory is inconsistent with a goal query G, then this can be proven by considering only Herbrand interpretations.
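To see why function symbols make Herbrand interpretations infinite, here is a Python sketch that enumerates the ground terms up to a given nesting depth. The function herbrand_universe is a hypothetical helper, and writing 12+3 in prefix form as +(12,3) is an assumption made for readability. With the binary symbol '+', the set grows at every depth and never contains '15'; with constants only (the Datalog-neg case), it stays finite.

```python
from itertools import product


def herbrand_universe(constants, functions, depth):
    """Ground terms over `constants` and `functions` with nesting depth <= depth.

    functions -- list of (symbol, arity) pairs; terms are written in prefix form.
    """
    terms = set(constants)
    for _ in range(depth):
        new_terms = set(terms)
        for symbol, arity in functions:
            for args in product(sorted(terms), repeat=arity):
                new_terms.add(f"{symbol}({','.join(args)})")
        terms = new_terms
    return terms


print(sorted(herbrand_universe(["12", "3"], [("+", 2)], 1)))
# ['+(12,12)', '+(12,3)', '+(3,12)', '+(3,3)', '12', '3']
print(len(herbrand_universe(["12", "3"], [("+", 2)], 2)))  # 38
print(herbrand_universe(["12", "3"], [], 5) == {"12", "3"})  # True
```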

Minimal Interpretations
Def.: An interpretation M for a theory T is called minimal if no proper subset of M is an interpretation for T.
A minimal interpretation is preferred as the interpretation of a theory because it leaves out all assumptions that are not necessary to explain the theory.
Example theory T:
student(mary).
human(_x) :- student(_x).
M1 = {student(mary), human(mary)}
M2 = {student(mary), student(karl), human(mary), human(karl), human(peter)}
M2 is not minimal because M1 is a proper subset of M2 and is also an interpretation of T. M1 is minimal because no proper subset of it is an interpretation of T.
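The definition can be checked mechanically for the example theory. This Python sketch hard-codes the two clauses of T in a hypothetical helper is_model and tests minimality by trying every proper subset; that brute force is only feasible for tiny interpretations like M1 and M2.

```python
from itertools import combinations


# Theory T:  student(mary).   human(_x) :- student(_x).
def is_model(interp):
    """True iff every clause of T is true in the set of ground atoms `interp`."""
    if ("student", "mary") not in interp:
        return False
    return all(("human", x) in interp for (pred, x) in interp if pred == "student")


def is_minimal(interp):
    """A model is minimal iff no proper subset of it is also a model."""
    return is_model(interp) and not any(
        is_model(set(subset))
        for size in range(len(interp))
        for subset in combinations(interp, size)
    )


M1 = {("student", "mary"), ("human", "mary")}
M2 = {("student", "mary"), ("student", "karl"),
      ("human", "mary"), ("human", "karl"), ("human", "peter")}
print(is_model(M1), is_minimal(M1))  # True True
print(is_model(M2), is_minimal(M2))  # True False
```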

Minimal Herbrand Interpretations
Goal: define a unique interpretation of a Datalog-neg theory. This unique interpretation is the minimal set of factual consequences of the Datalog-neg theory. So, it encodes what we minimally have to agree upon when looking at a given Horn clause theory. As we map models (like ERD models) to Datalog-neg theories, the unique interpretation is what we must agree upon when analyzing a model. Unfortunately, we need to make some assumptions about the well-structuredness of the Datalog-neg theory. The first assumption is range restriction; the second one, stratification, is more tricky. Of course, a Datalog-neg theory can in general have arbitrarily many interpretations because it is a logical theory. However, we are subsequently interested only in minimal Herbrand interpretations. The restriction to Herbrand interpretations means that we cannot assign specific meanings to the constants occurring in the formulas: any constant is interpreted by itself. So, the constant '1' is just the sign '1', not the number 1. The restriction to minimal interpretations means that we do not allow interpretations that add information not encoded in the theory. The ConceptBase system deviates from these restrictions to a certain degree. In particular, it also allows terms like x+1 and even lets you define your own functions. However, this directly leads to undecidability. So, only use this capability if you really need it!

Stratification: Reason
Consider the following variable-free formula:
P or Q
which is expressed by two Horn clauses:
P :- NOT Q.
Q :- NOT P.
So, when Q is not true, P is true, and vice versa. There are two ‘minimal’ interpretations of the above theory:
M1 = {P}
M2 = {Q}
We do not want this ambiguity in Datalog! But how can we avoid it?
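A brute-force Python sketch confirms the ambiguity. It reads each clause classically (P :- NOT Q as "Q or P", Q :- NOT P as "P or Q"), enumerates all interpretations over {P, Q}, and keeps those that have no smaller model below them. The names ATOMS, satisfies and minimal are illustrative.

```python
from itertools import combinations

ATOMS = ["P", "Q"]


def satisfies(interp):
    # P :- NOT Q  is classically  Q or P;   Q :- NOT P  is classically  P or Q.
    clause1 = "Q" in interp or "P" in interp
    clause2 = "P" in interp or "Q" in interp
    return clause1 and clause2


models = [set(c) for size in range(len(ATOMS) + 1)
          for c in combinations(ATOMS, size) if satisfies(set(c))]
minimal = [m for m in models if not any(other < m for other in models)]
print([sorted(m) for m in minimal])  # [['P'], ['Q']]
```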

Stratification: Realization
Each predicate symbol gets a level number 0,1,2,... For each clause r in the theory, each positive predicate in the condition of r must have a level smaller than or equal to the level of the conclusion predicate. For each negated predicate in the condition of r, the level of the conclusion predicate must be strictly greater than the level of the negated predicate. If such an assignment of levels can be found, the theory is called stratified.
P(x1,...,xn) :- P1(x11,...), ..., Pk(xk1,...), NOT Q1(y11,...), ..., NOT Qm(ym1,...).
Levels of P1,...,Pk: smaller than or equal to the level of P. Levels of Q1,...,Qm: strictly smaller than the level of P.
Theorem: a stratified Horn clause theory always has a unique minimal Herbrand interpretation (called the perfect interpretation or perfect model).
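A level assignment can be found by repeatedly raising levels until every constraint holds. This Python sketch follows the common textbook procedure, not necessarily ConceptBase's implementation, and abstracts each clause to a hypothetical (head, positive predicates, negated predicates) triple. If a level would exceed the number of predicates, there must be a cycle through negation, so no stratification exists and the function returns None.

```python
def stratify(clauses):
    """Assign a level to every predicate, or return None if no stratification exists.

    clauses -- list of (head, positive_preds, negated_preds) triples
    """
    preds = set()
    for head, pos, neg in clauses:
        preds |= {head, *pos, *neg}
    level = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, pos, neg in clauses:
            # Positive predicates: level <= level(head); negated: level < level(head).
            required = max([level[head]] + [level[p] for p in pos]
                           + [level[q] + 1 for q in neg])
            if required > level[head]:
                if required > len(preds):
                    return None  # levels keep growing: a cycle through negation
                level[head] = required
                changed = True
    return level


print(stratify([("mortal", ["human"], []), ("alive", ["human"], ["dead"])]))
print(stratify([("P", ["Q"], ["P"])]))               # None
print(stratify([("shaves", ["male"], ["shaves"])]))  # None
```

The last two calls encode the paradoxical clauses on the next slide; in both, the head predicate depends negatively on itself, so its level would have to be smaller than itself. The predicates alive and dead in the first call are hypothetical.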

Stratification: It fails for paradoxes (but not only there)
P(_a) :- Q(_a), NOT P(_a).
P occurs negatively in the condition of the clause. Hence, P must have a level smaller than the level of the conclusion predicate, which is also P. Hence, we cannot find a stratification for this theory.
Famous example: the barber shaves all males who do not shave themselves:
shaves(barber,_x) :- male(_x), NOT shaves(_x,_x).
male(barber).
Luckily, this paradox is not stratifiable. So, intuitively, non-stratifiable theories are of a paradoxical nature. Subsequently, we will only consider stratifiable Datalog-neg theories.

Stratification: Another famous example (Russell's paradox)
Consider the set M of all sets that do not contain themselves:
M = {S | S ∉ S}
In Horn clause logic:
element(_s,M) :- set(_s), NOT element(_s,_s).
set(_s1) :- element(_s,_s1).
So, is M an element of itself? If yes: M may not contain itself. Contradiction. If no: M must contain itself. Contradiction. See also:

Lack of stratification is not always a bad thing
Unstratified Datalog-neg theories are not always paradoxical. Consider the following general definition of win positions in games*:
win(_x) :- move(_x,_y), NOT win(_y).
So, a position _x is a win position if there is a move to a position _y that is not a win position. This is clearly not stratified, but if there are no cycles in the moves, then this definition can actually compute all win positions of a finite game. ConceptBase supports the mechanism of so-called 'dynamic' (or local) stratification. It computes the interpretation of a non-stratified theory and reports a stratification error only when one is found during the computation of the interpretation. If the data (here: the facts for predicate move(_x,_y)) is acyclic, then no stratification violation will be found and the answer is correct. For the purpose of this course, we can assume the classical ('static') stratification. See also:
*) Weidong Chen, David Scott Warren: Computation of Stable Models and its Integration with Logical Query Processing. TKDE 8(5), 1996.
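For this particular rule, the effect of local (dynamic) stratification can be imitated by a depth-first evaluation that reports an error only when it actually runs into a cycle of moves. This Python sketch is an illustration under that assumption, not how ConceptBase implements dynamic stratification; win_positions and the position names are hypothetical.

```python
def win_positions(moves):
    """Evaluate  win(_x) :- move(_x,_y), NOT win(_y).  over finite move facts.

    Raises ValueError if the evaluation runs into a cycle of moves, i.e. the
    data is not locally stratified for this rule.
    """
    successors, positions = {}, set()
    for x, y in moves:
        successors.setdefault(x, []).append(y)
        positions.update((x, y))
    status, in_progress = {}, set()

    def win(x):
        if x in status:
            return status[x]
        if x in in_progress:
            raise ValueError(f"cycle through NOT win at position {x!r}")
        in_progress.add(x)
        # Evaluate all successors (no short-circuit) so every reachable cycle is detected.
        results = [not win(y) for y in successors.get(x, [])]
        in_progress.remove(x)
        status[x] = any(results)
        return status[x]

    return {p for p in positions if win(p)}


print(sorted(win_positions([("a", "b"), ("b", "c"), ("c", "d")])))  # ['a', 'c']
```

Position d has no moves, so it is not a win position; c can move to d and wins; b can only move to the win position c and loses; a wins by moving to b.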

“Bottom-up” computation of the perfect interpretation
Assume we have a stratified, range-restricted Datalog-neg theory. Then it has a unique minimal Herbrand interpretation (called the 'perfect interpretation'). This Herbrand interpretation can be computed by a so-called fixpoint algorithm:

Init: set F = {}
For all stratification levels i = 0,1,...:
  Do until no more facts are added:
    F = F ∪ {ground facts of predicates P of level i}
    For all clauses r with a conclusion predicate P of level i:
      substitute the positive predicates by matching facts of F
      evaluate the negated (NOT) predicates of r on F using negation as failure *
      apply the substitution to the conclusion predicate P and insert any such new conclusion of P into F
  end Do
end For all

It can be shown that this algorithm stops after a finite number of steps.
*) Note that the substitution in the previous step binds all variables of the negated predicates because of range restriction. Hence, we just have to check that the negated facts are not in F.

This is the algorithm that effectively computes the minimal Herbrand interpretation. So, we do not have to assemble it ourselves; we can just let the computer evaluate it. This is very much like answering a query in a database system. We have to be aware, however, that the minimal Herbrand interpretation contains just facts. The algorithm is not suitable for proving that a complex formula f1 is a logical consequence of a Datalog-neg theory. In other words, the algorithm is not meant for reasoning about Datalog-neg theories. In practical terms, we will be able to check that a given data level fulfils the constraints of a given model level. But we cannot automatically show that a given model level has at least one data level that fulfils its constraints. For example, we could define a class Employee and demand in a constraint that each employee must have at least 2 and at most 1 project. Whenever we try to enter an instance of that class, we get an error message. But ConceptBase will not tell us that the formula itself makes Employee an 'empty' concept.
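The fixpoint algorithm can be sketched in Python. Atoms are hypothetical (predicate, arguments) tuples with '_'-prefixed variables, and the stratification levels are passed in rather than computed. The example reuses the human/mortal theory from later in this lecture and adds two hypothetical predicates, dead and alive, to exercise a negated goal.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("_")


def match(atom, fact, subst):
    """Extend `subst` so that `atom` becomes the ground `fact`; None if impossible."""
    (pred, args), (fact_pred, fact_args) = atom, fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    s = dict(subst)
    for a, v in zip(args, fact_args):
        if is_var(a):
            if s.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return s


def matches(atoms, facts, subst):
    """Yield every substitution mapping all `atoms` onto facts in `facts`."""
    if not atoms:
        yield subst
        return
    for fact in facts:
        s = match(atoms[0], fact, subst)
        if s is not None:
            yield from matches(atoms[1:], facts, s)


def substitute(atom, s):
    pred, args = atom
    return (pred, tuple(s[a] if is_var(a) else a for a in args))


def perfect_model(facts, rules, level):
    """Bottom-up evaluation, stratum by stratum, of a stratified, range-restricted theory."""
    F = set()
    for i in sorted(set(level.values())):
        F |= {f for f in facts if level[f[0]] == i}
        stratum = [r for r in rules if level[r[0][0]] == i]
        while True:  # until no more facts are added at this level
            new = set()
            for head, positive, negated in stratum:
                for s in matches(positive, F, {}):
                    # Range restriction: s binds all variables of the negated atoms,
                    # so negation as failure is a simple "not in F" test.
                    if all(substitute(n, s) not in F for n in negated):
                        new.add(substitute(head, s))
            if new <= F:
                break
            F |= new
    return F


facts = {("human", ("mary",)), ("human", ("peter",)), ("dead", ("peter",))}
rules = [
    (("mortal", ("_x",)), [("human", ("_x",))], []),
    (("alive", ("_x",)), [("human", ("_x",))], [("dead", ("_x",))]),  # hypothetical
]
level = {"human": 0, "dead": 0, "mortal": 0, "alive": 1}
for fact in sorted(perfect_model(facts, rules, level)):
    print(fact)
```

New facts are collected per round and added afterwards, so the set being matched against never changes while it is iterated.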

Expressiveness of Datalog-neg
We now have Datalog-neg as a logical language whose unique interpretation can be computed by an effective algorithm. If we have a model like the running ERD example and can map it into a Datalog-neg theory, then we get its perfect interpretation semantics for free. Note, however: the semantics of some modeling languages like STD, Structured English or even Java is beyond the expressiveness of Datalog-neg. A Java program denotes a transformation of some input data to some output data. The relation computes(input,output) of an arbitrary Java program cannot in general be expressed in Datalog. Example: computing the square root of a number with a given precision. Still, we can represent some (syntactic) properties of such modeling languages, e.g. the lines of code or the program variables that occur in a Java program.

End of formal stuff ... so let's see what's behind all these definitions

Herbrand interpretation (a.k.a. Herbrand model)
In the preceding lecture we interpreted 'predicates' like LiesUpon(x,y) by tables:

LiesUpon-Table
arg1     arg2
mypen1   mydesk1

In principle, every person can have a different table in mind as the interpretation of a predicate. A Herbrand interpretation is more restricted: only those objects that occur as constants in some formula are allowed to occur in the table. This is no longer the full power of logical interpretation, since not all logical interpretations are considered. However, we can still do 'proof by contradiction': if there is a contradiction in a Herbrand interpretation, then the logical formulas are INCONSISTENT. “One counter-example is sufficient.”

How to write Herbrand interpretations
Instead of

LiesUpon-Table
arg1     arg2
mypen1   mydesk1
mypen2   mydesk1

we write

LiesUpon(mypen1,mydesk1)
LiesUpon(mypen2,mydesk1)

This is a Herbrand interpretation consisting of two tuples.

Minimal Herbrand Interpretations
A logical theory consisting of 3 formulas:
human(mary)
human(peter)
forall x human(x) ==> mortal(x)

An example Herbrand interpretation in which the above formulas are true; it contains an unnecessary fact for mortal:
{human(mary), human(peter), mortal(mary), mortal(peter), mortal(jane)}

A minimal Herbrand interpretation, which still makes all the above formulas true, but no proper subset of which would do:
{human(mary), human(peter), mortal(mary), mortal(peter)}

Luckily, this Herbrand interpretation can be computed automatically in the case of Datalog-neg!

Hence ...
If we are able to express our definitions in the framework of Datalog with negation, then we get at least one interpretation (the minimal Herbrand interpretation) for free! The minimal Herbrand interpretation is a kind of least common denominator. It makes the fewest assumptions about reality; it only states the unavoidable consequences of the definitions (e.g. the definitions in an entity-relationship diagram).
Why bother? SQL (without COUNT/SUM) is a subset of a logic with Herbrand interpretation, so any database system implements this logical interpretation. The answer to a query is the result of computing the 'Herbrand interpretation' of the query plus the current database state. Going beyond minimal Herbrand interpretations is harmful (difficult to implement); doing less is also harmful (too inexpressive). Minimal Herbrand interpretations are well understood, and there are plenty of tools based on this foundation.

Still: Why bother?
“Let's make our method definitions in natural language.” If you are precise, then your definitions may be as good as if expressed in logic. However, your conclusions about the definitions are debatable and depend on the interpretation of your language. Creating a prototypical tool implementing your method is a huge programming task.
“Let's make our method definitions in a more powerful framework.” Most 'more powerful' frameworks yield method definitions that contain undecidable expressions. A human expert (mathematician) has to take care of the consistency of the definitions.
So: Datalog with negation (with its minimal Herbrand interpretation) is just as expressive as it can be while still being 'implementable' and allowing the generation of modeling environments.

But ...
... we will feel the limitations of this framework when we talk about dynamic models like business process models ...

Syntax versus Semantics
Our Datalog-neg version of the ERD example defines the semantics we have in mind for the diagram. However, it does NOT specify which ERD diagrams are syntactically allowed.
[Figure: ERD example (model) and example data, mapped to the Datalog-neg theory of the ERD example]
Service: check that the Datalog-neg theory is consistent, i.e. that ‘inconsistent’ cannot be derived.
Question: How can we define the symbols and the general rules of the ER notation? Use abstraction based on logic!

Concrete and abstract statements
The statement at the token level is an instance of the statement at the class level, which is an instance of the statement at the meta class level, and so on. Goal: map this to logic.

The Information Resource Dictionary System (ISO IRDS 1990)
M3 Language definition level (meta meta classes): contains the facilities to define a notation (formal language for modeling/design/...).
M2 Language level (meta classes): contains the definition of notations using the facilities of the level above.
M1 Schema/Model level (classes, schemata): contains models/schemata/programs written in a certain notation.
M0 Data/Production level (tokens/data): contains example data/process traces conforming to the schemata/models of the level above.
N.B.: We sometimes use the term 'notation' as a synonym for 'language'.

The ERD Notation in IRDS levels
M3: Meta meta model. Nodes and links ... defines node and link types.
   (in, is example of)
M2: ER modeling language. EntityType, RelationshipType, role ... defines entity types etc. by distinct node and link types.
   (in, is example of)
M1: ER diagram. Employee, worksFor, Project ... defines the conceptual schema of a database in ER notation (check syntax).
   (in, is example of)
M0: Data. miller, wf-123, proj-17, proj-45 ... contains example data conforming to the schema (check semantics).

Natural language representation (1)
[Figure: Node with link connectedTo; EntityType and RelationshipType with link role]
We are interested in modeling notations that can be represented as graphs (nodes connected to each other via links). The simplified ER notation is as follows: there are two node types (entity type and relationship type) and one link type (role). EntityType is an example of Node. RelationshipType is an example of Node. The role link is an example of the connectedTo link. When we say that one IRDS level is an example of the other, this applies to the concepts contained in them!

Natural language representation (2)
[Figure: ER diagram with Employee, worksFor, Project; data miller, wf-123, proj-17, proj-45]
This ER diagram says: we have entity types “Employee” and “Project”. There is a relationship type “worksFor” which is connected to both. Employee is an example of EntityType. Project is an example of EntityType. worksFor is an example of RelationshipType. The two links of worksFor are both examples of the role link.
The data encodes: employee “miller” stands in relationship “wf-123” to “proj-17” and “proj-45”. miller is an example of Employee. proj-17 and proj-45 are examples of Project. The link between wf-123 and miller is an example of the link between worksFor and Employee. The other two links are examples of the link between worksFor and Project. We can automatically check whether the data-level facts are consistent with the ERD by using the perfect model semantics of Datalog-neg.

User types communicating via IRDS levels
Method engineers define modeling languages: they learn the language definition level (Node, connectedTo) and create notations (EntityType, RelationshipType, role).
Application engineers apply a notation to produce models: they learn the notation and create models (Employee, worksFor, Project).
Application users work according to models, e.g. interface descriptions: they learn the models and create data (miller, wf-123, proj-17, proj-45).

Now: How to build an IRDS-based method engineering system?
- Uniformly represent information from all IRDS levels.
- Creating a model (or a notation, or some example data) is done by appropriate updates to the repository.
- Checking balancing rules, checking completeness, and checking the level of agreement is done by appropriate queries to the repository.
[Figure: a repository holding the language definition level, language level, schema/model level and data/production level, accessed by updates and queries]

Representing abstract statements is not so trivial
Employee(Bill), Integer(10000), Salary(Bill,10000): “Bill has a salary of 10000 dollars.”
EntityType(Employee), Domain(Integer), EntityAttribute(Employee,Salary,Integer): “Employees have salaries.”
Node(EntityType), Node(Domain), Link(EntityType,EntityAttribute,Domain): “Entity types are described by attributes.”
This is no longer first-order logic, and thus also not Datalog-neg, since predicate symbols (like Employee) occur as terms in other predicates (like EntityType(Employee)).

O-Telos: Don’t use predicate symbols for classes!
Predicate symbols for classes (not first-order logic)  →  O-Telos facts (Datalog):
Employee(Bill)  →  In(Bill,Employee)
Integer(10000)  →  In(10000,Integer)
Salary(Bill,10000)  →  AL(Bill,salary,earns,10000)
EntityType(Employee)  →  In(Employee,EntityType)
Domain(Integer)  →  In(Integer,Domain)
EntityAttribute(Employee,Salary,Integer)  →  AL(Employee,ent_attr,salary,Integer)
Node(EntityType)  →  In(EntityType,Node)
Node(Domain)  →  In(Domain,Node)
Link(EntityType,EntityAttribute,Domain)  →  AL(EntityType,connectedTo,ent_attr,Domain)
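Because classes are now constants, a single query can check attribute typing on every IRDS level at once. This Python sketch stores the In and AL facts above as tuples, reading AL(x,m,n,y) as "x has an attribute labeled n, with value y, that is an instance of the class-level attribute m". The helper untyped_attributes is hypothetical: it reports every AL fact for which no class-level AL(c,_,m,d) exists with In(x,c) and In(y,d).

```python
IN_FACTS = {("Bill", "Employee"), ("10000", "Integer"),
            ("Employee", "EntityType"), ("Integer", "Domain"),
            ("EntityType", "Node"), ("Domain", "Node")}
AL_FACTS = {("Bill", "salary", "earns", "10000"),
            ("Employee", "ent_attr", "salary", "Integer"),
            ("EntityType", "connectedTo", "ent_attr", "Domain")}


def untyped_attributes(in_facts, al_facts):
    """AL(x,m,n,y) facts without a class-level AL(c,_,m,d) such that In(x,c) and In(y,d)."""
    result = []
    for (x, m, n, y) in sorted(al_facts):
        typed = any(label == m and (x, c) in in_facts and (y, d) in in_facts
                    for (c, _category, label, d) in al_facts)
        if not typed:
            result.append((x, m, n, y))
    return result


print(untyped_attributes(IN_FACTS, AL_FACTS))
# [('EntityType', 'connectedTo', 'ent_attr', 'Domain')]
```

The same query validates the data level against the model level (Bill's earns link) and the model level against the notation (Employee's salary link). Only the topmost fact is reported, because this fragment contains no definition of connectedTo one level higher.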

O-Telos: Infix notation for predicates
Prefix mode  →  Infix mode
In(Bill,Employee)  →  (Bill in Employee)
In(10000,Integer)  →  (10000 in Integer)
AL(Bill,salary,earns,10000)  →  (Bill salary/earns 10000)
In(Employee,EntityType)  →  (Employee in EntityType)
In(Integer,Domain)  →  (Integer in Domain)
AL(Employee,ent_attr,salary,Integer)  →  (Employee ent_attr/salary Integer)
In(EntityType,Node)  →  (EntityType in Node)
In(Domain,Node)  →  (Domain in Node)
AL(EntityType,connectedTo,ent_attr,Domain)  →  (EntityType connectedTo/ent_attr Domain)

O-Telos: Graphical representation
[Figure: the O-Telos facts drawn as a graph with nodes Node, EntityType, Domain, Employee, Integer, Bill, 10000 and links in, connectedTo, ent_attr, salary, earns]
The graph carries the same information as the O-Telos facts:
(EntityType in Node) (Domain in Node) (EntityType connectedTo/ent_attr Domain)
(Employee in EntityType) (Integer in Domain) (Employee ent_attr/salary Integer)
(Bill in Employee) (10000 in Integer) (Bill salary/earns 10000)

Instantiation and attribution in O-Telos
We need to represent data, schema, notation etc. uniformly as objects!
(x in c): “the object x is an instance of the object c”
(x m/n y), i.e. (x!n in c!m): “the object x has a relation labeled n to y; this relation is an instance of the class-level relation m” (where m is a link from class c to class d)
x!n denotes the link n of object x.

Specialization in O-Telos
(c isA d): “c is a subclass of d”
We want to achieve that any instance of c is automatically also an instance of d.*
Example: (Manager isA Employee) and (Mary in Manager) yield (Mary in Employee).
*) This is done by the predefined deductive rule
forall x,c,d (x in c) and (c isA d) ==> (x in d)
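The predefined rule can be evaluated bottom-up until no new In facts appear. In this Python sketch, instances_closure is a hypothetical helper, and the extra fact (Employee isA Person) is an assumption added only to show that chains of isA links are followed by repeated application of the same rule.

```python
def instances_closure(in_facts, isa_facts):
    """Apply  forall x,c,d (x in c) and (c isA d) ==> (x in d)  until nothing new is derived."""
    result = set(in_facts)
    while True:
        derived = {(x, d) for (x, c) in result for (sub, d) in isa_facts if c == sub}
        if derived <= result:
            return result
        result |= derived


in_facts = {("Mary", "Manager")}
isa_facts = {("Manager", "Employee"), ("Employee", "Person")}  # Person: hypothetical
print(sorted(instances_closure(in_facts, isa_facts)))
# [('Mary', 'Employee'), ('Mary', 'Manager'), ('Mary', 'Person')]
```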

Anything is an object!
Nodes as well as links that occur in the graphical representation are regarded as objects.

Object references
[Figure: the O-Telos graph from above, with every node and link labeled by its object reference]
Node, Node!connectedTo
EntityType, EntityType!ent_attr, Domain
Employee, Employee!salary, Integer
Bill, Bill!earns, 10000
Bill->Employee, (Bill!earns)->(Employee!salary), 10000->Integer

Object references are not predicates!
The object references are ‘constants’ in our logic, not predicates! They can, however, occur in predicates like
(Bill in Employee)
(Bill!earns in Employee!salary)
The ability to refer to any object (node or link) allows us to define attributes of attributes. For example, the link Employee!salary from Employee to Integer can itself have an attribute 'when' leading to Date.
