
1 Reasoning and Assessing Trust in Uncertain Information using Bayesian Description Logics
Achille Fokoue, Mudhakar Srivatsa (IBM-US), Rob Young (dstl-UK)
ITA Bootcamp, July 12, 2010

2 Sources of Uncertainty: Accuracy, Stochasticity and Beyond

3 Decision Making under Uncertainty
Coalition warfare
– Ephemeral groups (special forces, local militia, Médecins Sans Frontières, etc.) with heterogeneous trust levels respond to emerging threats
Secure information flows
– Can I share this information with an (un)trusted entity?
– Can I trust this piece of information?
[Figure: information flow in Yahoo!]

4 Limitations of traditional approaches
Coarse-grained and static access control information
– Rich security metadata [QoISN08]
– Semantic knowledgebase for situation awareness (e.g., need-to-share) [SACMAT09]
Fail to treat uncertainty as a first-class citizen
– Scalable algorithms and a meaningful query answering semantics (possible worlds model) to reason over uncertain data [submitted]
Lack of explanations
– Provide dominant justifications to decision makers [SACMAT09]
– Use justifications for estimating information credibility [submitted]
[QoISN08: IBM-US & RHUL] [SACMAT09: IBM-US, CESG & dstl] [submitted (https://www.usukitacs.com/?q=node/5401): IBM-US & CESG] [submitted: IBM-US & dstl]

5 Our approach in a nutshell
Goal: more flexible and situation-aware decision support mechanisms for information sharing
Key technical principles
– Perform late binding of decisions (flexibility)
– Shareability of, and trust in, information is expressed as logical statements over rich security metadata and a semantic KB capturing domain-specific concepts and relationships and the current state of the world
– A logical framework that supports explanations, which allow a sender to intelligently downgrade information (e.g., delete the participant list of a meeting) and allow a recipient to judge the credibility of information

6 Architecture
A Global Awareness module continually maintains and updates a knowledge base encoding, in a BDL language, the relevant state of the world for our application (e.g., the locations of allied and enemy forces)
A hybrid reasoner is responsible for making decisions on information flows
– The reasoner provides dominant explanation(s) over uncertain data that justify each decision
This architecture is replicated at every decision center
[Architecture diagram: Global Awareness, BDL KB, BDL Reasoner, Rich Metadata, Rules & Policy, Justifications]

7 DL: Semantic Knowledgebase [SACMAT09: IBM-US, CESG, dstl]
SHIN description logic (OWL)
– A very expressive, decidable subset of first-order logic
– Reasoning is intractable in the worst case, but SHER (Scalable Highly Expressive Reasoner) has good scalability characteristics in practice
A DL KB consists of:
– TBox (terminology box): a description of the concepts and relations in the domain of discourse; here, an extension of the KANI ontology
– ABox (extensional part): a description of instance information

8 Traditional approaches to deriving trust from data
Drawbacks of a pure DL-based approach [SACMAT09]
– Does not account for uncertainty
– Trust in information and sources is given, not derived from the data and the history of interactions
Limitations of traditional approaches to deriving trust in data
– They assume a pair-wise numeric (dis)similarity metric between two entities (e.g., eBay recommendations, Netflix ratings)
– Lack of support for conflicts spanning multiple entities, e.g., three sources S1, S2, S3 asserting Ax1 = all men are mortal, Ax2 = Socrates is a man, and Ax3 = Socrates is not mortal
– Lack of support for uncertainty in information

9 Bayesian Description Logics (BDL)
Challenge 1: How to scalably reason over an inconsistent and uncertain knowledgebase?
– In our experimental evaluation, BDL built on an open-source DL reasoner scaled up to 7.2 million probabilistic axioms
– Pellet (a state-of-the-art DL reasoner) broke down at 0.2 million axioms
– Pronto (a probabilistic reasoner) uses an alternative, richer formulation, but does not scale beyond a few dozen axioms
Challenge 2: What is a meaningful query answering semantics for an uncertain knowledgebase?
– The possible worlds model (concrete definition in the paper)

10 Bayesian Description Logics (BDL)
Challenge 3: How to efficiently compute justifications over uncertain data?
– Sampling
Challenge 4: How to use justifications?
– Assess the credibility of information sources (trust-based decision making)
– Intelligently transform data to make it shareable [TBD]

11 Notation: Bayesian Network
V: the set of all random variables in a Bayesian network, e.g., V = {V1, V2}
D(Vi): the set of all values that Vi can take, e.g., D(V1) = D(V2) = {0, 1}
v: an assignment of all random variables to a possible value, e.g., v = {V1 = 0, V2 = 1}
v|X (for some X ⊆ V): the projection of v onto the random variables in X, e.g., v|{V2} = {V2 = 1}
D(X) (for some X ⊆ V): the Cartesian product of the domains D(Xi) for all Xi in X
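
In code, this notation amounts to small maps and restrictions; a minimal Python sketch (the dictionary encoding is purely illustrative):

    # Variables and their domains
    V = {"V1", "V2"}
    D = {"V1": {0, 1}, "V2": {0, 1}}

    # One assignment v of every random variable to a value (a possible world)
    v = {"V1": 0, "V2": 1}

    def project(v, X):
        """v|X: restrict the assignment v to the random variables in X (X must be a subset of V)."""
        return {var: val for var, val in v.items() if var in X}

    print(project(v, {"V2"}))  # {'V2': 1}, i.e. v|{V2} = {V2 = 1}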

12 Notation: BDL
Probabilistic knowledge base K = (A, T, BN)
– BN: a Bayesian network over a set V of variables
– T = {φ : X = x}, where φ is a classical TBox axiom annotated with X = x, X ⊆ V and x ∈ D(X)
  e.g., Road ⊑ SlipperyRoad : Rain = true
– A = {ψ : X = x}, where ψ is a classical ABox axiom
– ψ : p, where p ∈ [0, 1], assigns a probability value directly to a classical axiom; it is encoded as ψ : Xnew = true, where Xnew is a new independent random boolean variable with Pr(Xnew = true) = p

13 BDL: Simplified Example
TBox:
– SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition
– Road ⊑ SlipperyRoad : Rain = true
ABox:
– Road(route9A)
– OpenedRoad(route9A) : TrustSource = true
BN has three variables: Rain, TrustSource, Source
– PrBN(TrustSource = true | Source = Mary) = 0.8
– PrBN(TrustSource = true | Source = John) = 0.5
– PrBN(Rain = true) = 0.7
– PrBN(Source = John) = 1
Informally, the probability values computed through the Bayesian network are propagated to the DL side as follows.

14 BDL: Simplified Example
Primitive event e: each assignment v of all the random variables in BN (e.g., {Rain = true, TrustSource = false, Source = John}) corresponds to a primitive event e (also called a scenario or a possible world)
Each primitive event e is associated with:
– a probability value PrBN(V = v) through BN, and
– the set Ke of classical DL axioms whose annotations are compatible with e (e.g., SlipperyRoad ⊓ OpenedRoad ⊑ HazardousCondition, Road ⊑ SlipperyRoad, Road(route9A))
Intuitively, the probability value associated with a statement φ (e.g., HazardousCondition(route9A)) is obtained by summing the probabilities of all primitive events e such that the classical KB Ke entails φ (see the full definition in the paper)
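
To make this concrete, here is a small Python sketch that enumerates the possible worlds of the example BN from the previous slide and sums the probabilities of the worlds whose classical KB Ke entails HazardousCondition(route9A); the entailment test is hand-coded for this tiny KB rather than delegated to a DL reasoner:

    from itertools import product

    # Joint distribution of the example Bayesian network.
    # Source = John with probability 1, so Mary-worlds carry probability 0.
    def pr_world(rain, trust, source):
        p_source = 1.0 if source == "John" else 0.0
        p_rain = 0.7 if rain else 0.3
        p_trust_true = {"John": 0.5, "Mary": 0.8}[source]
        p_trust = p_trust_true if trust else 1.0 - p_trust_true
        return p_source * p_rain * p_trust

    # HazardousCondition(route9A) is entailed exactly when Road ⊑ SlipperyRoad
    # is active (Rain = true) and OpenedRoad(route9A) is asserted (TrustSource = true).
    def entails_hazardous(rain, trust):
        return rain and trust

    pr = sum(
        pr_world(r, t, s)
        for r, t, s in product([True, False], [True, False], ["John", "Mary"])
        if entails_hazardous(r, t)
    )
    print(pr)  # 0.7 * 0.5 = 0.35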

15 Handling Inconsistent KBs
A simple example: KB = (T, A ∪ {⊤ ⊑ ⊥ : X = true}, BN)
– There exists a possible world (when the random variable X = true) in which the KB is inconsistent
– But there are other possible worlds (X = false) in which the KB is consistent
– What if PrBN(X = true) = 10⁻⁶? Then the probability of the KB being inconsistent is very small
Inconsistency-tolerant semantics for BDL: the degree of unsatisfiability
– Let e range over the primitive events whose associated KB Ke is inconsistent; the degree of unsatisfiability is essentially the sum of the probabilities of these primitive events
– Reason over the consistent subspaces of the KB (the reasoning / query answering semantics follow)

16 BDL: Query Answering Semantics
Given a query ψ : e
– Does the axiom ψ hold in the set of possible worlds compatible with e?
and a KB K = (T, A, BN) that is satisfiable to a degree d (> 0):
An answer (θ, pr) consists of a ground substitution θ for the variables in the query and some pr ∈ [0, 1] such that
– pr = infimum { Pr(θψ : e) | Pr is a model of K }
– See the detailed proofs in the paper
The paper shows that a prior query answering semantics for probabilistic KBs (d'Amato et al.) may be counter-intuitive
– See the counter-examples in the paper

17 Scalable Query Answering
Monte-Carlo sampling for error-bounded approximate query answering: to compute the probability pr for a ground substitution θ such that (θ, pr) satisfies a query ψ : e
– Observation 1: pr is of the form Σv PrBN(v)
  Essentially, we selectively identify possible worlds (based on the query ψ) and sum up their probabilities (obtained from BN)
– Observation 2: decision makers may not need the exact value of pr
  Say we need answers such that pr > thr. If the true probability that θ is an answer is 0.95 and thr = 0.75, then with only 25 samples we can conclude that θ is an answer with 95% confidence (for true pr = 0.85, 60 samples suffice; computing the exact pr takes 396 samples)
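
The slides do not spell out the stopping rule; one standard way to realize such error-bounded threshold answering is sequential Monte-Carlo sampling with a Hoeffding bound, as in the Python sketch below (the function names and the choice of bound are assumptions, not necessarily what PSHER implements):

    import math

    def threshold_query(sample_world, entails, thr, confidence=0.95, max_samples=100_000):
        """Decide whether pr > thr by Monte-Carlo sampling of possible worlds.

        sample_world(): draws one assignment v with probability Pr_BN(v)
        entails(v):     True iff the classical KB K_v entails the query
        Stops as soon as a Hoeffding interval of half-width
        eps = sqrt(ln(2/delta) / (2n)) separates the running estimate from thr.
        """
        delta = 1.0 - confidence
        hits = 0
        for n in range(1, max_samples + 1):
            if entails(sample_world()):
                hits += 1
            est = hits / n
            eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
            if est - eps > thr:
                return True, n    # confidently an answer
            if est + eps < thr:
                return False, n   # confidently not an answer
        return None, max_samples  # undecided within the sample budget

The farther the true probability lies from the threshold, the sooner the interval separates from thr and the fewer samples are needed, which matches the sample counts quoted above.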

18 Experimental Evaluation
SHER: a highly scalable SOUND and COMPLETE reasoner for large OWL-DL KBs
– Reasons over highly expressive ontologies
– Reasons over data in relational databases
– Highly scalable: scales to more than 60 million triples; semantically indexed 300 million triples from the medical literature
– Provides explanations
PSHER: a probabilistic extension of SHER using BDL

19 Scalability via Summarization (ISWC 2006)
[Figure: an original ABox (courses C1, C2 taught by men M1, M2, who like hobbies H1, H2, plus persons P1, P2) and its summary, in which the summary node C maps to {C1, C2}. Legend: C = Course, P = Person, M = Man, W = Woman, H = Hobby. TBox: Functional(isTaughtBy), Disjoint(Man, Woman)]
The summary mapping function f satisfies the following constraints:
– If an individual a is an explicit member of a concept C in the original ABox, then f(a) is an explicit member of C in the summary ABox.
– If a ≠ b is explicitly asserted in the original ABox, then f(a) ≠ f(b) is explicitly asserted in the summary ABox.
– If a role assertion R(a, b) exists in the original ABox, then R(f(a), f(b)) exists in the summary.
If the summary is consistent, then the original ABox is consistent (the converse is not true).
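
For illustration, here is a minimal Python sketch of one admissible summary mapping, which merges individuals with identical explicit concept sets (the slide does not fix a particular f, so this canonicalization is an assumption):

    def summarize(concepts, roles, different):
        """concepts:  dict individual -> frozenset of explicit concept names
        roles:     set of (role, a, b) assertions
        different: set of (a, b) explicit differentFrom assertions"""
        # One admissible f: map each individual to its explicit concept set,
        # so individuals with identical concept sets collapse into one node.
        f = dict(concepts)
        summary_concepts = {node: set(node) for node in set(f.values())}
        summary_roles = {(r, f[a], f[b]) for (r, a, b) in roles}
        summary_different = {(f[a], f[b]) for (a, b) in different}
        return f, summary_concepts, summary_roles, summary_different

    f, sc, sr, sd = summarize(
        {"C1": frozenset({"Course"}), "C2": frozenset({"Course"}),
         "M1": frozenset({"Man"}), "M2": frozenset({"Man"})},
        {("isTaughtBy", "C1", "M1"), ("isTaughtBy", "C2", "M2")},
        set(),
    )
    print(sr)  # both role assertions collapse to a single summary edge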

20 Results: Scalability
UOBM benchmark data set (university data set)
– PSHER has sub-linear scalability in the number of axioms
– Exact query answering (computing the exact pr for ground substitutions) is very expensive
– A state-of-the-art reasoner (Pellet) broke down on UOBM-1

21 Results: Response Time
PSHER performs well on threshold queries: 99.5% of answers were obtained within a few tens of seconds
Further enhancement: PSHER is parallelizable

22 Traditional approaches to deriving trust from data
They assume a pair-wise numeric (dis)similarity metric between two entities
– e.g., eBay recommendations, Netflix ratings
Lack of support for conflicts spanning multiple entities, e.g., three sources S1, S2, S3 asserting, respectively:
– Ax1 = all men are mortal
– Ax2 = Socrates is a man
– Ax3 = Socrates is not mortal
Lack of support for uncertainty in information

23 Can I trust this information?
At the command and control center:
– PSHER detects an inconsistency (the justifications point to SIGINT vs. agent X)
– SIGINT is deemed more trusted by the decision maker, so trust in information source X is cautiously reduced
– The decision maker weighs the severity of a possible biological attack and performs what-if analysis (What if X is compromised? What if the sensing device (SIGINT) had a minor glitch? Which information should be considered, and which should be discarded?)
Courtesy: E. J. Wright and K. B. Laskey. Credibility Models for Multi-Source Fusion. In 9th International Conference on Information Fusion, 2006.

24 Overview
Encode information as axioms in a BDL KB
Detect inconsistencies and weighted justifications using possible-worlds reasoning
Use justifications to assess trust in information sources via a trust scoring mechanism
– a weighted scheme based on prior trust (belief) in the information sources and the weight of each justification

25 Characteristics of the trust model
Security:
– robust to shilling
– robust to bad-mouthing
Scalability:
– scales with the volume of information and the number of information sources
Security-scalability trade-off:
– cost of an exhaustive justification search
– cost of a perfectly random uniform sample

26 Trust Assessment: Degree of Unsatisfiability
Probabilistic Socrates example:
– Axiom1 : p1, Axiom2 : p2, Axiom3 : p3
– 8 possible worlds (the power set of the three axioms); only one world is inconsistent: {Axiom1, Axiom2, Axiom3}
– The probability measure of a possible world is derived from the joint probability distribution of BN, e.g., Pr({Axiom1, Axiom2}) = p1 * p2 * (1 - p3)
– Degree of unsatisfiability: DU = p1 * p2 * p3
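
The same number can be obtained by brute-force enumeration, as in this short Python sketch (the probability values are illustrative):

    from itertools import product

    p = [0.9, 0.8, 0.7]  # illustrative values for p1, p2, p3

    du = 0.0
    for world in product([True, False], repeat=3):  # the 8 possible worlds
        weight = 1.0
        for present, pi in zip(world, p):
            weight *= pi if present else 1.0 - pi
        if all(world):  # {Axiom1, Axiom2, Axiom3} is the only inconsistent world
            du += weight
    print(du)  # p1 * p2 * p3 = 0.504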

27 Trust Assessment: Justification Weight
Trust value of a source S: Beta(α, β)
– α (reward): a function of the non-conflicting interesting axioms
– β (penalty): a function of the conflicting axioms
Compute the justifications of K = (A, T, BN):
– J ⊆ (A, T)
– (J, BN) is consistent to a degree d < 1
– For every J' such that J' ⊊ J, (J', BN) is consistent to degree 1
How to assign penalties to the sources involved in a justification?
– The probability measure, weight(J), of a justification J is DU((J, BN))
– Penalty(J) is proportional to weight(J)
– Penalty(J) is distributed across the sources contributing axioms to J, inversely proportionally to their previous trust values
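
A minimal Python sketch of this penalty distribution, taking a source's trust to be the mean of its Beta(α, β); the parameter values are illustrative, and the exact reward/penalty functions are left open on the slide:

    def trust(alpha, beta):
        return alpha / (alpha + beta)  # mean of Beta(alpha, beta)

    def apply_justification(params, sources_in_j, weight):
        """params: dict source -> [alpha, beta]
        sources_in_j: sources contributing axioms to the justification J
        weight: DU((J, BN)), the probability mass of the justification"""
        inv = {s: 1.0 / trust(*params[s]) for s in sources_in_j}
        norm = sum(inv.values())
        for s in sources_in_j:
            # penalty accrues to beta, inversely proportional to prior trust
            params[s][1] += weight * inv[s] / norm

    params = {"SIGINT": [9.0, 1.0], "agentX": [2.0, 2.0]}
    apply_justification(params, ["SIGINT", "agentX"], weight=0.6)
    print({s: round(trust(a, b), 3) for s, (a, b) in params.items()})
    # the less trusted source (agentX) absorbs the larger share of the penalty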

28 Security-Scalability Tradeoff
Impracticality of computing all justifications
– Exhaustive exploration of the Reiter search tree
Alternative approach: unbiased sampling
– A malicious source cannot systematically hide conflicts
Retaining only the first K nodes of the Reiter search tree is not a solution:
– The probability π(v_d) of reaching a node v_d along its path from the root is π(v_d) = ∏_i 1/|v_i|, where |v_i| is the branching factor at the i-th node on the path
Tradeoff: select node v_i with probability min(β / π(v_i), 1), for some β > 0
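
One plausible reading of this rule, sketched in Python: sample random root-to-leaf descents and accept a reached node with probability min(β/π(v), 1), which flattens the selection bias against deep nodes (the node interface is hypothetical, not SHER's actual API):

    import random

    def random_descent(root, beta=1e-3):
        """One random root-to-leaf descent through a Reiter-style search tree.

        A node v on the path is reached with probability pi(v), the product of
        1/branching-factor over its ancestors; accepting a reached node with
        probability min(beta/pi(v), 1) gives each node an overall selection
        probability of min(beta, pi(v)), so conflicts cannot systematically
        hide in rarely visited branches.
        """
        selected = []
        node, pi = root, 1.0
        while True:
            if random.random() < min(beta / pi, 1.0):
                selected.append(node)
            children = node.children()  # hypothetical node interface
            if not children:
                return selected
            node = random.choice(children)  # uniform descent step
            pi /= len(children)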

29 Experimental evaluation

30 Summary
Decision support system for secure information flows
– Uncertainty: supports inconsistent KBs and reasoning over uncertain information
– Trust: trust values are derived from the data
– Flexibility: e.g., the sensitivity of tactical information decays with space, time and external events
– Situation-awareness: e.g., encodes need-to-know based access control policies
– Support for explanations: enables intelligent information downgrading and provenance data for what-if analysis

31 THANKS!
Contact: Achille Fokoue
Email: achille@us.ibm.com

32 Scenario
Coalition: A & B
Geo-locations G = {G1, …, G4}
A's operations are described in the table

33 Summarization effectiveness
Ontology | Instances | Role Assertions | I | RA
Biopax | 261,149 | 582,655 | 81 | 583
UOBM-1 | 42,585 | 214,177 | 410 | 16,233
UOBM-5 | 179,871 | 927,854 | 598 | 35,375
UOBM-10 | 351,422 | 1,816,153 | 673 | 49,176
UOBM-30 | 1,106,858 | 6,494,950 | 765 | 79,845
NIMD | 1,278,540 | 1,999,787 | 19 | 55
ST | 874,319 | 3,595,132 | 21 | 183
I – instances after summarization; RA – role assertions after summarization

34 Filtering effectiveness
Ontology | Instances | Role Assertions | I | RA
Biopax | 261,149 | 582,655 | 38 | 98
UOBM-1 | 42,585 | 214,177 | 280 | 284
UOBM-5 | 179,871 | 927,854 | 426 | 444
UOBM-10 | 351,422 | 1,816,153 | 474 | 492
UOBM-30 | 1,106,858 | 6,494,950 | 545 | 574
NIMD | 1,278,540 | 1,999,787 | 2 | 1
ST | 874,319 | 3,595,132 | 18 | 50
I – instances after filtering; RA – role assertions after filtering

35 Refinement (AAAI 2007)
What if the summary is inconsistent?
– Either the original ABox has a real inconsistency,
– or the ABox was consistent and the summarization process introduced a fake inconsistency into the summary
Therefore, we follow a process of refinement to check for real inconsistency
– Refinement = selectively decompress portions of the summary
– Use justifications for the inconsistency to select the portion of the summary to refine (justification = a minimal set of assertions responsible for the inconsistency)
– Repeat the process iteratively until the refined summary is consistent or the justification is precise
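
A minimal Python sketch of the refinement loop described above; every helper is a placeholder for the corresponding SHER machinery, not its actual API:

    def consistent_by_refinement(abox, tbox, summarize, is_consistent,
                                 get_justification, is_precise, refine):
        """Returns True iff the original ABox is consistent (sketch).

        summarize(abox):            build the summary ABox
        is_consistent(s, tbox):     classical consistency check on the summary
        get_justification(s, tbox): minimal set of summary assertions
                                    responsible for the inconsistency
        is_precise(j, abox):        True if the justification corresponds to
                                    real assertions (a genuine inconsistency)
        refine(s, j, abox):         selectively decompress the summary around j
        """
        summary = summarize(abox)
        while not is_consistent(summary, tbox):
            j = get_justification(summary, tbox)
            if is_precise(j, abox):
                return False  # real inconsistency in the original ABox
            summary = refine(summary, j, abox)  # split summary nodes behind j
        return True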

36 Refinement: Resolving inconsistencies in a summary
TBox: Functional(isTaughtBy), Disjoint(Man, Woman)
[Figure: an original ABox and its successive summaries. The initial summary (C ↦ {C1, C2, C3}) is inconsistent; after the 1st refinement (Cx ↦ {C1, C2}, Cy ↦ {C3}) it is still inconsistent; after the 2nd refinement (Px ↦ {P1, P2}, Py ↦ {P3}) the summary is consistent. Legend: C = Course, P = Person, M = Man, W = Woman, H = Hobby]

37 Refinement: Solving Membership Queries (AAAI 2007)
TBox: Functional(isTaughtBy), Disjoint(Man, Woman)
Sample query Q: PeopleWithHobby. Assert Not(Q) and test consistency: the refined summary node Px (standing for {P1, P2}) is inconsistent under Not(Q), so the solutions are P1 and P2
[Figure: the same original ABox and refinement steps as on the previous slide, with Not(Q) asserted on the summary nodes]

38 Results: Consistency Check
Ontology | Instances | Role Assertions | Time for consistency check (s)
Biopax | 261,149 | 582,655 | 2.3
UOBM-1 | 42,585 | 214,177 | 2.9
UOBM-5 | 179,871 | 927,854 | 5.4
UOBM-10 | 351,422 | 1,816,153 | 5.1
UOBM-30 | 1,106,858 | 6,494,950 | 7.9
NIMD | 1,278,540 | 1,999,787 | 0.8
ST | 874,319 | 3,595,132 | 0.4

39 Results: Membership Query Answering
Ontology | Type Assertions | Role Assertions
UOBM-1 | 25,453 | 214,177
UOBM-10 | 224,879 | 1,816,153
UOBM-30 | 709,159 | 6,494,950

Reasoner | Dataset | Avg. Time (s) | St. Dev (s) | Range (s)
KAON2 | UOBM-1 | 21 | 11 | 8 - 37
KAON2 | UOBM-10 | 448 | 234 | 14 - 530
SHER | UOBM-1 | 4 | 4 | 2 - 24
SHER | UOBM-10 | 15 | 26 | 6 - 191
SHER | UOBM-30 | 35 | 63 | 12 - 391

