Xin Luna Dong (Google Inc.), Divesh Srivastava (AT&T), 5/2013

Conflicts on the Web
[Figure: the same flight's times reported differently by FlightView, FlightAware, and Orbitz, e.g. 6:15 PM vs. 6:22 PM, and 9:40 PM vs. 8:33 PM vs. 9:54 PM]

Copying on the Web

Data Fusion
- Data fusion resolves data conflicts and finds the truth

              S1      S2        S3     S4     S5
Stonebraker   MIT     berkeley  MIT    MIT    MS
Dewitt        MSR     msr       UWisc  UWisc  UWisc
Bernstein     MSR     msr       MSR    MSR    MSR
Carey         UCI     at&t      BEA    BEA    BEA
Halevy        Google  google    UW     UW     UW

Data Fusion
- Data fusion resolves data conflicts and finds the truth
- Naïve voting does not work well

              S1      S2        S3     S4     S5
Stonebraker   MIT     berkeley  MIT    MIT    MS
Dewitt        MSR     msr       UWisc  UWisc  UWisc
Bernstein     MSR     msr       MSR    MSR    MSR
Carey         UCI     at&t      BEA    BEA    BEA
Halevy        Google  google    UW     UW     UW

Data Fusion
- Data fusion resolves data conflicts and finds the truth
- Naïve voting does not work well (see the sketch below)
- Two important improvements
  - Source accuracy
  - Copy detection
- But WHY???

              S1      S2        S3     S4     S5
Stonebraker   MIT     berkeley  MIT    MIT    MS
Dewitt        MSR     msr       UWisc  UWisc  UWisc
Bernstein     MSR     msr       MSR    MSR    MSR
Carey         UCI     at&t      BEA    BEA    BEA
Halevy        Google  google    UW     UW     UW
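To make the failure of naïve voting concrete, here is a minimal Python sketch (mine, not from the talk) that applies majority voting to the table above, treating differently formatted strings such as "msr" and "MSR" as the same value:

```python
from collections import Counter

# Claims from the five sources S1..S5, with formats normalized (e.g. "msr" -> "MSR").
claims = {
    "Stonebraker": ["MIT", "Berkeley", "MIT", "MIT", "MS"],
    "Dewitt":      ["MSR", "MSR", "UWisc", "UWisc", "UWisc"],
    "Bernstein":   ["MSR", "MSR", "MSR", "MSR", "MSR"],
    "Carey":       ["UCI", "AT&T", "BEA", "BEA", "BEA"],
    "Halevy":      ["Google", "Google", "UW", "UW", "UW"],
}

# Naive voting: for each data item, pick the most frequent value.
for item, values in claims.items():
    value, count = Counter(values).most_common(1)[0]
    print(f"{item}: {value} ({count}/{len(values)} votes)")

# Voting picks BEA for Carey (and UWisc/UW for Dewitt/Halevy) because the
# copiers S3, S4, S5 outvote the highly accurate source S1 -- exactly why
# source accuracy and copy detection are needed.
```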

An Exhaustive but Horrible Explanation
Three values are provided for Carey's affiliation.
I. If UCI is true, then we reason as follows.
1) Source S1 provides the correct value. Since S1 has accuracy .97, the probability that it provides this correct value is .97.
2) Source S2 provides a wrong value. Since S2 has accuracy .61, the probability that it provides a wrong value is 1 - .61 = .39. If we assume there are 100 uniformly distributed wrong values in the domain, the probability that S2 provides the particular wrong value AT&T is .39/100 = .0039.
3) Source S3 provides a wrong value. Since S3 has accuracy .4, ... the probability that it provides BEA is (1 - .4)/100 = .006.
4) Source S4 either provides a wrong value independently or copies this wrong value from S3. It has probability .98 to copy from S3, so probability 1 - .98 = .02 to provide the value independently; in this case, its accuracy is .4, so the probability that it provides BEA is (1 - .4)/100 = .006.
5) Source S5 either provides a wrong value independently or copies this wrong value from S3 or S4. It has probability .99 to copy from S3 and probability .99 to copy from S4, so probability (1 - .99)(1 - .99) = .0001 to provide the value independently; in this case, its accuracy is .21, so the probability that it provides BEA is (1 - .21)/100 = .0079.
Thus, the probability of our observed data conditioned on UCI being true is .97 * .0039 * .006 * ... = 2.1*...
II. If AT&T is true, ... the probability of our observed data is 9.9*...
III. If BEA is true, ... the probability of our observed data is 4.6*...
IV. If none of the provided values is true, ... the probability of our observed data is 6.3*...
Thus, UCI has the maximum a posteriori probability of being true (its conditional probability is .91 according to Bayes' rule).
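The same arithmetic can be written down mechanically. The following Python sketch (an illustration under the slide's stated assumptions: the quoted source accuracies and copy probabilities, and 100 uniformly distributed wrong values per data item) reproduces the structure of the computation for the hypothesis that UCI is true; the slide glosses over some modeling details, so the exact figures may differ slightly from the ones above.

```python
N_WRONG = 100  # assumed number of uniformly distributed wrong values in the domain
acc = {"S1": 0.97, "S2": 0.61, "S3": 0.40, "S4": 0.40, "S5": 0.21}

# Hypothesis: UCI is Carey's true affiliation.
p_s1 = acc["S1"]                         # S1 provides the correct value UCI
p_s2 = (1 - acc["S2"]) / N_WRONG         # S2 independently provides the wrong value AT&T
p_s3 = (1 - acc["S3"]) / N_WRONG         # S3 independently provides the wrong value BEA

# S4 copies BEA from S3 with probability 0.98, or provides it independently otherwise.
p_s4 = 0.98 + (1 - 0.98) * (1 - acc["S4"]) / N_WRONG

# S5 acts independently only if it copies neither S3 nor S4: (1-0.99)*(1-0.99) = 0.0001.
p_ind = (1 - 0.99) * (1 - 0.99)
p_s5 = (1 - p_ind) + p_ind * (1 - acc["S5"]) / N_WRONG

likelihood_uci = p_s1 * p_s2 * p_s3 * p_s4 * p_s5
print(f"P(observations | UCI true) ~ {likelihood_uci:.2e}")

# Repeating the computation for "AT&T true", "BEA true", and "no provided
# value is true", then normalizing with Bayes' rule (uniform prior), gives
# the posterior of roughly 0.91 for UCI that the slide reports.
```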

A Compact and Intuitive Explanation
(1) S1, the provider of value UCI, has the highest accuracy
(2) Copying is very likely between S3, S4, and S5, the providers of value BEA

              S1      S2        S3     S4     S5
Stonebraker   MIT     Berkeley  MIT    MIT    MS
Dewitt        MSR     MSR       UWisc  UWisc  UWisc
Bernstein     MSR     MSR       MSR    MSR    MSR
Carey         UCI     AT&T      BEA    BEA    BEA
Halevy        Google  Google    UW     UW     UW

How to generate?

To Some Users This Is NOT Enough
(1) S1, the provider of value UCI, has the highest accuracy
(2) Copying is very likely between S3, S4, and S5, the providers of value BEA

              S1      S2        S3     S4     S5
Stonebraker   MIT     Berkeley  MIT    MIT    MS
Dewitt        MSR     MSR       UWisc  UWisc  UWisc
Bernstein     MSR     MSR       MSR    MSR    MSR
Carey         UCI     AT&T      BEA    BEA    BEA
Halevy        Google  Google    UW     UW     UW

WHY is S1 considered the most accurate source?
WHY is copying considered likely between S3, S4, and S5?
Iterative reasoning

A Careless Explanation
(1) S1, the provider of value UCI, has the highest accuracy
  - S1 provides MIT, MSR, MSR, UCI, Google, which are all correct
(2) Copying is very likely between S3, S4, and S5, the providers of value BEA
  - S3 and S4 share all five values and, in particular, make the same three mistakes (UWisc, BEA, UW); this is unusual for independent sources, so copying is likely

              S1      S2        S3     S4     S5
Stonebraker   MIT     Berkeley  MIT    MIT    MS
Dewitt        MSR     MSR       UWisc  UWisc  UWisc
Bernstein     MSR     MSR       MSR    MSR    MSR
Carey         UCI     AT&T      BEA    BEA    BEA
Halevy        Google  Google    UW     UW     UW

A Verbose Provenance-Style Explanation

A Compact Explanation
[Diagram: P(UCI) > P(BEA), supported by A(S1) > A(S3) and by copying between S3, S4, S5; A(S1) > A(S3) is supported by P(MSR) > P(UWisc) and P(Google) > P(UW); copying between S3, S4, S5 is supported by the observation that copying is more likely between S3, S4, S5 than between S1 and S2, as the former group shares more common values]

              S1      S2        S3     S4     S5
Stonebraker   MIT     Berkeley  MIT    MIT    MS
Dewitt        MSR     MSR       UWisc  UWisc  UWisc
Bernstein     MSR     MSR       MSR    MSR    MSR
Carey         UCI     AT&T      BEA    BEA    BEA
Halevy        Google  Google    UW     UW     UW

How to generate?

Problem and Contributions
- Explaining data-fusion decisions made by
  - Bayesian analysis (MAP)
  - iterative reasoning
- Contributions
  - Snapshot explanation: lists of positive and negative evidence considered in MAP
  - Comprehensive explanation: DAG whose child nodes represent evidence for their parent nodes
  - Keys: 1) correct; 2) compact; 3) efficient

Outline
- Motivations and contributions
- Techniques
  - Snapshot explanations
  - Comprehensive explanations
- Related work and conclusions

Explaining the Decision — Snapshot Explanation
- MAP Analysis
- How to explain?
[Figure: probability comparisons between alternative decisions in the MAP analysis]

List Explanation
- The list explanation for decision W versus an alternative decision W' in MAP analysis is of the form (L+, L-)
  - L+ is the list of positive evidence for W
  - L- is the list of negative evidence for W (positive for W')
  - Each piece of evidence is associated with a score
  - The sum of the scores of the positive evidence is higher than the sum of the scores of the negative evidence
- A snapshot explanation for W contains a set of list explanations, one for each alternative decision in the MAP analysis (see the sketch below)
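A minimal Python sketch of this (L+, L-) structure; the field and method names are mine, not the paper's.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ListExplanation:
    decision: str                      # W
    alternative: str                   # W'
    positive: List[Tuple[float, str]]  # L+: (score, evidence) supporting W
    negative: List[Tuple[float, str]]  # L-: (score, evidence) supporting W'

    def is_valid(self) -> bool:
        # By definition, positive evidence must outweigh negative evidence.
        return sum(s for s, _ in self.positive) > sum(s for s, _ in self.negative)

# A snapshot explanation for W is then one ListExplanation per alternative
# decision considered in the MAP analysis.
```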

An Example List Explanation

Score  Evidence
Pos
1.6    S1 provides a different value from S2 on Stonebraker
1.6    S1 provides a different value from S2 on Carey
1.0    S1 uses a different format from S2 although it shares the same (true) value on Dewitt
1.0    S1 uses a different format from S2 although it shares the same (true) value on Bernstein
1.0    S1 uses a different format from S2 although it shares the same (true) value on Halevy
0.7    The a priori belief is that S1 is more likely to be independent of S2

- Problems
  - Hidden evidence: e.g., the negative evidence that S1 provides the same value as S2 on Dewitt, Bernstein, Halevy
  - Long lists: #evidence in the list <= #data items + 1

Experiments on AbeBooks Data
- AbeBooks data:
  - 894 data sources (bookstores)
  - 1265*2 data items (book name and authors)
  - listings
- Four types of decisions
  I. Truth discovery
  II. Copy detection
  III. Copy direction
  IV. Copy pattern (by books or by attributes)

Length of Snapshot Explanations

Categorizing and Aggregating Evidence

Score  Evidence
Pos
1.6    S1 provides a different value from S2 on Stonebraker
1.6    S1 provides a different value from S2 on Carey
1.0    S1 uses a different format from S2 although it shares the same (true) value on Dewitt
1.0    S1 uses a different format from S2 although it shares the same (true) value on Bernstein
1.0    S1 uses a different format from S2 although it shares the same (true) value on Halevy
0.7    The a priori belief is that S1 is more likely to be independent of S2

Separating evidence
Classifying and aggregating evidence (see the sketch below)
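A minimal sketch of the classify-and-aggregate step (the category names are mine): each piece of item-level evidence is mapped to a category, and scores are summed per category, so the explanation needs only one line per category. The per-item scores here are the rounded values from the example above, so the totals may differ slightly in the last digit from the aggregated figures on the next slide.

```python
from collections import defaultdict

# (score, category) pairs for the item-level evidence listed above.
evidence = [
    (1.6, "provides a different value"),
    (1.6, "provides a different value"),
    (1.0, "same value, different format"),
    (1.0, "same value, different format"),
    (1.0, "same value, different format"),
    (0.7, "a priori independence"),
]

totals = defaultdict(float)
counts = defaultdict(int)
for score, category in evidence:
    totals[category] += score
    counts[category] += 1

for category in totals:
    print(f"{category}: score {totals[category]:.2f} over {counts[category]} item(s)")
```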

Improved List Explanation

Score  Evidence
Pos
3.2    S1 provides different values from S2 on 2 data items
3.06   Among the items for which S1 and S2 provide the same value, S1 uses different formats for 3 items
0.7    The a priori belief is that S1 is more likely to be independent of S2
Neg
0.06   S1 provides the same true value as S2 for 3 items

- Problems
  - The lists can still be long: #evidence in the list <= #categories

Length of Snapshot Explanations

Shortening by one order of magnitude

Shortening Lists
- Example: lists of scores
  - L+ = {1000, 500, 60, 2, 1}
  - L- = {950, 50, 5}
- Good shortening
  - L+ = {1000, 500}
  - L- = {950}
- Bad shortening I (no negative evidence)
  - L+ = {1000, 500}
  - L- = {}
- Bad shortening II (positive evidence only slightly stronger)
  - L+ = {1000}
  - L- = {950}

Shortening Lists by Tail Cutting
- Example: lists of scores
  - L+ = {1000, 500, 60, 2, 1}
  - L- = {950, 50, 5}
- Shortening by tail cutting
  - 5 pieces of positive evidence, and we show the top 2: L+ = {1000, 500}
  - 3 pieces of negative evidence, and we show the top 2: L- = {950, 50}
- Correctness: Score_pos >= (sum of the shown positive scores) > (sum of the shown negative scores plus the number of hidden negative items times the smallest shown negative score) >= Score_neg
- Tail-cutting problem: minimize s + t such that showing the top-s positive and top-t negative evidence satisfies the correctness condition (see the sketch below)
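A minimal sketch of tail cutting as read from this slide (not necessarily the paper's exact formulation): show the top-s positive and top-t negative scores, choosing the smallest s + t for which the shown positive evidence still provably outweighs all negative evidence, bounding each hidden negative item by the smallest shown negative score.

```python
def tail_cut(pos, neg):
    """Return (s, t): how many positive and negative items to show."""
    pos = sorted(pos, reverse=True)
    neg = sorted(neg, reverse=True)
    best = None
    for s in range(1, len(pos) + 1):
        for t in range(1, len(neg) + 1):
            shown_pos = sum(pos[:s])
            # Upper bound on the full negative score: shown items plus
            # (len(neg) - t) hidden items, each at most neg[t-1].
            neg_bound = sum(neg[:t]) + (len(neg) - t) * neg[t - 1]
            if shown_pos > neg_bound and (best is None or s + t < sum(best)):
                best = (s, t)
    return best

print(tail_cut([1000, 500, 60, 2, 1], [950, 50, 5]))  # -> (2, 2), as on the slide
```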

Shortening Lists by Difference Keeping
- Example: lists of scores
  - L+ = {1000, 500, 60, 2, 1}
  - L- = {950, 50, 5}
  - Diff(Score_pos, Score_neg) = 558
- Shortening by difference keeping
  - L+ = {1000, 500}
  - L- = {950}
  - Diff(Score_pos, Score_neg) = 550 (similar to 558)
- Difference-keeping problem: minimize s + t such that the difference computed from the shortened lists stays close to the original difference (see the sketch below)
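A matching sketch of difference keeping (again my reading of the slide, with the tolerance eps as an illustrative parameter): show the smallest s + t for which the difference between the shown positive and shown negative scores stays close to the true difference.

```python
def diff_keep(pos, neg, eps=0.05):
    """Return (s, t) such that the shown difference is within eps of the true one."""
    pos = sorted(pos, reverse=True)
    neg = sorted(neg, reverse=True)
    true_diff = sum(pos) - sum(neg)
    best = None
    for s in range(1, len(pos) + 1):
        for t in range(1, len(neg) + 1):
            shown_diff = sum(pos[:s]) - sum(neg[:t])
            if abs(shown_diff - true_diff) <= eps * abs(true_diff):
                if best is None or s + t < sum(best):
                    best = (s, t)
    return best

print(diff_keep([1000, 500, 60, 2, 1], [950, 50, 5]))  # -> (2, 1): diff 550 vs. 558
```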

A Further Shortened List Explanation

Score  Evidence
Pos (3 pieces of evidence)
3.2    S1 provides different values from S2 on 2 data items
Neg
0.06   S1 provides the same true value as S2 for 3 items

- Choosing the shortest lists generated by tail cutting and difference keeping

Length of Snapshot Explanations

Further shortening by half

Length of Snapshot Explanations
- TOP-K does not shorten much
- Thresholding on scores shortens a lot but makes a lot of mistakes
- Combining tail cutting and difference keeping is effective and correct

Outline
- Motivations and contributions
- Techniques
  - Snapshot explanations
  - Comprehensive explanations
- Related work and conclusions

Explaining the Explanation — Comprehensive Explanation

DAG Explanation
- The DAG explanation for an iterative MAP decision W is a DAG (N, E, R)
  - N: each node represents a decision and its list explanations
  - E: each edge indicates that the decision in the child node is positive evidence for that of the parent node
  - R: the root node represents decision W

Full Explanation DAG
- Problem: huge when #iterations is large
- Many repeated sub-graphs

Critical-Round Explanation DAG
- The critical round of a decision W made at Round #m is the first round, no later than Round #m, in which W is made such that the previous round made a different decision (not W) or it is Round #1.
- For each decision, show only its evidence from its critical round (see the sketch below).
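A minimal helper of mine illustrating the definition above: walk back from Round #m as long as the same decision was already made in the previous round.

```python
def critical_round(history, m):
    """history[r] is the decision made in round r (1-indexed).

    Returns the critical round of the decision history[m]: the earliest round
    r <= m such that history[r..m] all equal history[m], i.e. either r == 1
    or the decision in round r-1 was different.
    """
    w = history[m]
    r = m
    while r > 1 and history[r - 1] == w:
        r -= 1
    return r

# Example: the decision flips to "copying" in round 2 and stays; its critical round is 2.
history = {1: "no copying", 2: "copying", 3: "copying", 4: "copying"}
print(critical_round(history, 4))  # -> 2
```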

Size of Comprehensive Explanations
- Critical-round DAG explanations are significantly smaller
- Full DAG explanations can often be huge

Related Work
- Explanation for data-management tasks
  - Queries [Buneman et al., 2008] [Chapman et al., 2009]
  - Workflows [Davidson et al., 2008]
  - Schema mappings [Glavic et al., 2010]
  - Information extraction [Huang et al., 2008]
- Explaining evidence propagation in Bayesian networks [Druzdzel, 1996] [Lacave et al., 2000]
- Explaining iterative reasoning [Das Sarma et al., 2010]

Conclusions
- Many data-fusion decisions are made through iterative MAP analysis
- Explanations
  - Snapshot explanations list positive and negative evidence in the MAP analysis (also applicable to other MAP analyses)
  - Comprehensive explanations trace the iterative reasoning (also applicable to other iterative reasoning)
- Keys: correct, compact, efficient

Fusion data sets: lunadong.com/fusionDataSets.htm