From Association Analysis to Causal Discovery Prof Jiuyong Li University of South Australia.


Association analysis
Diapers -> Beer
Bread & Butter -> Milk

The birth rate is positively correlated with the stork population. Would increasing the stork population increase the birth rate?

Further evidence that causality ≠ association: Simpson's paradox
Overall        Recovered   Not recovered   Sum   Recovery rate
Drug           …           …               …     …%
No Drug        …           …               …     …%
Female         Recovered   Not recovered   Sum   Recovery rate
Drug           2           8               10    20%
No Drug        …           …               …     …%
Male           Recovered   Not recovered   Sum   Recovery rate
Drug           …           …               …     …%
No Drug        7           3               10    70%
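The reversal behind Simpson's paradox can be checked with a few lines of Python. The counts are illustrative assumptions: only the Female/Drug row (2 recovered of 10) and the Male/No Drug row (7 recovered of 10) survive in the transcript, and the other rows are hypothetical values chosen so that the paradox appears.

```python
# Hypothetical counts: (recovered, not recovered). Only the female/drug and
# male/no-drug rows come from the slide; the rest are assumed for illustration.
counts = {
    ("female", "drug"): (2, 8),
    ("female", "no drug"): (9, 21),
    ("male", "drug"): (18, 12),
    ("male", "no drug"): (7, 3),
}

def rate(treatment, sex=None):
    """Recovery rate for a treatment, overall or within one sex."""
    rows = [v for (s, t), v in counts.items()
            if t == treatment and (sex is None or s == sex)]
    recovered = sum(r for r, _ in rows)
    total = sum(r + n for r, n in rows)
    return recovered / total

for sex in (None, "female", "male"):
    label = sex or "all"
    print(label, round(rate("drug", sex), 2), round(rate("no drug", sex), 2))
```

With these assumed counts the drug looks better in the pooled data but worse within each sex, which is the inconsistency the slide points at.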

Association and Causal Relationship
Two variables X and Y.
Prob(Y | X) ≠ Prob(Y): X is associated with Y (association rules).
In general, Prob(Y | do X) ≠ Prob(Y | X).
How does Y vary when X changes?
The key question: how to estimate Prob(Y | do X)?
In association analysis, the relationship between X and Y is analysed in isolation. However, the relationship between X and Y is affected by other variables.
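One standard way to estimate Prob(Y | do X) from observational data is the adjustment formula, Prob(Y | do X) = Σ_z Prob(Y | X, z) Prob(z). The sketch below assumes that a single observed variable Z accounts for all confounding between X and Y; that assumption, the variable names and the counts are hypothetical and do not come from the slides.

```python
# Hypothetical observational counts, keyed by (z, x): (count of y=1, count of y=0).
# Z is ASSUMED to be the only confounder of X and Y.
data = {
    (0, 1): (2, 8),
    (0, 0): (9, 21),
    (1, 1): (18, 12),
    (1, 0): (7, 3),
}

def p_y_given_x(x):
    """Observational Prob(Y=1 | X=x), ignoring Z."""
    y1 = sum(a for (_, xv), (a, _b) in data.items() if xv == x)
    n = sum(a + b for (_, xv), (a, b) in data.items() if xv == x)
    return y1 / n

def p_y_do_x(x):
    """Adjustment formula: sum over z of Prob(Y=1 | X=x, Z=z) * Prob(Z=z)."""
    total = sum(a + b for a, b in data.values())
    result = 0.0
    for z in {zv for zv, _ in data}:
        a, b = data[(z, x)]
        p_z = sum(c + d for (zv, _), (c, d) in data.items() if zv == z) / total
        result += a / (a + b) * p_z
    return result

print(p_y_given_x(1), p_y_do_x(1))
```

With these counts the observational and interventional quantities differ, which is the gap between association and causation that the slide describes.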

Causal discovery 1
Randomised controlled trials
– Gold standard method
– Expensive
– Often infeasible
In a randomised controlled trial, association = causation.

Causal discovery 2
Bayesian network based causal inference
– Do-calculus (Pearl 2000)
– IDA (Maathuis et al. 2009)
– Both infer causal effects in a Bayesian network.
However
– Learning the structure of a Bayesian network is NP-hard.
– Scalability to a large number of variables is low.

Learning causal structures
PC algorithm (Spirtes, Glymour and Scheines)
– If not (A ╨ B | Z) for every conditioning set Z, there is an edge between A and B.
– The search space increases exponentially with the number of variables.
Constraint-based search
– CCC (G. F. Cooper, 1997)
– CCU (C. Silverstein et al., 2000)
– Efficiently removes non-causal relationships.
[Diagrams: the CCU and CCC structures over variables A, B and C]

Association rules
Many efficient algorithms exist.
They produce hundreds of thousands to millions of rules.
– Many are spurious.
Interpretability
– Association rules do not indicate causal effects.

Causal rules
Discover causal relationships using partial association and simulated cohort studies.
Do not rely on Bayesian network structure learning.
The discovery of causal rules also has strong theoretical support.
Discover both single causes and combined causes.
Can be discovered efficiently.
Z. Jin, J. Li, L. Liu, T. D. Le, B. Sun, and R. Wang. Discovery of causal rules using partial association. ICDM, 2012.
J. Li, T. D. Le, L. Liu, J. Liu, Z. Jin, and B. Sun. Mining causal association rules. In Proceedings of ICDM Workshop on Causal Discovery (CD), 2013.

Problem
Discover causal rules from large databases of binary variables.
[Table: example data set with columns A, B, C, D, E, F, Y, #repeats; values not preserved]
Example rules: A → Y, C → Y, BF → Y, DE → Y

Partial association test
[Diagrams over variables I, J and K]
M. W. Birch, nonzero partial association

Partial association test – an example
[Table: example data set with columns A, B, C, D, E, F, Y, G, #repeat; values not preserved]
Step 4: partial association test.
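A partial association test between X and Y across the strata defined by K can be sketched with a Mantel-Haenszel style statistic. Whether the slides use exactly this form is an assumption, since the formula did not survive in the transcript; the function name, the continuity correction of 1/2 and the 0.05 critical value are illustrative choices.

```python
def partial_association(strata):
    """Mantel-Haenszel style statistic over 2x2 tables (n11, n12, n21, n22),
    one table per observed value of the stratifying variables K."""
    num = 0.0
    var = 0.0
    for n11, n12, n21, n22 in strata:
        n = n11 + n12 + n21 + n22
        if n < 2:
            continue  # a stratum with fewer than two records adds no information
        # Observed minus expected count of the (X=1, Y=1) cell.
        num += (n11 * n22 - n12 * n21) / n
        # Variance of that cell under independence within the stratum.
        var += ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)) / (n * n * (n - 1))
    if var == 0:
        return 0.0
    return (abs(num) - 0.5) ** 2 / var

CRITICAL_005 = 3.84  # chi-square critical value, 1 degree of freedom, alpha = 0.05

associated = partial_association([(20, 10, 10, 20), (20, 10, 10, 20)])
independent = partial_association([(10, 10, 10, 10), (10, 10, 10, 10)])
print(associated > CRITICAL_005, independent > CRITICAL_005)
```

A statistic above the critical value suggests that the association persists after stratifying on K, i.e. a nonzero partial association.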

Fast partial association test
K ranges over all possible combinations of the other variables, and the number of combinations is very large.
Counting the frequencies of the combinations is also time-consuming.
Our solution:
– Sort the data and count the frequencies of the equivalence classes.
– Only use the combinations that exist in the data set.
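The idea of counting only the combinations that occur in the data can be sketched as follows. This version uses hash-based counting instead of sorting, which is a substitution made here for brevity; the function name and the record layout (one dict of 0/1 values per record) are hypothetical.

```python
from collections import Counter, defaultdict

def stratified_tables(records, x, y, others):
    """Build one 2x2 table (n11, n12, n21, n22) of x against y for each
    combination of the `others` variables that actually occurs in the data."""
    counts = Counter((tuple(r[v] for v in others), r[x], r[y]) for r in records)
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for (k, xv, yv), c in counts.items():
        # Index 0: x=1,y=1; 1: x=1,y=0; 2: x=0,y=1; 3: x=0,y=0.
        index = (0 if xv == 1 else 2) + (0 if yv == 1 else 1)
        tables[k][index] += c
    return {k: tuple(t) for k, t in tables.items()}

records = [
    {"A": 1, "B": 0, "C": 1, "Y": 1},
    {"A": 1, "B": 0, "C": 1, "Y": 1},
    {"A": 0, "B": 0, "C": 1, "Y": 0},
    {"A": 1, "B": 1, "C": 0, "Y": 0},
]
tables = stratified_tables(records, "A", "Y", ["B", "C"])
print(tables)
```

Only two of the four possible (B, C) combinations appear in this toy data, so only two tables are built; on sparse data this is where the saving comes from.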

Pruning strategies
Definition (Redundant causal rules): Assume that X ⊂ W. If X → Y is a causal rule, the rule W → Y is redundant, as it provides no new information.
Definition (Condition for testing causal rules): We only test a combined causal rule XV → Y if X and Y have a zero association and V and Y have a zero association (that is, neither passes the chi-square test in step 3).
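The two pruning definitions translate into simple set checks. This is a minimal sketch; the function names and the representation of a rule's left-hand side as a set of variable names are assumptions made for illustration.

```python
def is_redundant(rule_lhs, causal_lhs_sets):
    """W -> Y is redundant if some proper subset X of W already gives a causal rule X -> Y."""
    w = frozenset(rule_lhs)
    return any(frozenset(x) < w for x in causal_lhs_sets)

def should_test_combination(parts, zero_association):
    """Test a combined rule only if every component has a zero association with Y."""
    return all(p in zero_association for p in parts)

known_causes = [{"A"}, {"C"}]          # single causes already found (hypothetical)
zero_assoc = {"B", "D", "E", "F"}      # variables with zero association (hypothetical)

print(is_redundant({"A", "B"}, known_causes))
print(should_test_combination(["B", "F"], zero_assoc))
```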

Algorithm
[Table: example data set with columns A, B, C, D, E, F, G, Y, #repeats; values not preserved]
1. Prune the variable set (support).
2. Create the contingency table for each variable X:
        Y=1   Y=0   Total
X=1     n11   n12   n1.
X=0     n21   n22   n2.
Total   n.1   n.2   n
3. Calculate the chi-square statistic (formula not preserved). If X has a positive association with Y, go to the next step. If X has a zero association with Y, move X to a set N.
4. Partial association test. If PA(X, Y, K) is nonzero, then X → Y is a causal rule.
5. Repeat 1-4 for each variable that is a combination of variables in set N.
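Step 3 can be sketched as follows, assuming the usual Pearson chi-square statistic for a 2x2 table without continuity correction and the 0.05 critical value of 3.84. The exact statistic and threshold on the slide were not preserved, so both are assumptions, as are the function names.

```python
def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = n11 + n12 + n21 + n22
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    denom = r1 * r2 * c1 * c2
    if denom == 0:
        return 0.0  # an empty row or column: no evidence of association
    return n * (n11 * n22 - n12 * n21) ** 2 / denom

def classify(n11, n12, n21, n22, critical=3.84):
    """Label the X-Y association as 'zero', 'positive' or 'negative'."""
    if chi_square_2x2(n11, n12, n21, n22) < critical:
        return "zero"
    return "positive" if n11 * n22 > n12 * n21 else "negative"

print(chi_square_2x2(20, 10, 10, 20), classify(20, 10, 10, 20))
```

Variables labelled "positive" would proceed to the partial association test, while those labelled "zero" would go to set N as candidates for combined rules.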

Experimental evaluations
We use the Arrhythmia data set from the UCI machine learning repository.
– The task is to classify the presence or absence of cardiac arrhythmia.
The data set contains 452 records, and each record contains 279 data attributes and one class attribute.
Our results are largely consistent with the results of the CCC method.
Some rules found by CCC are removed by our method because they cannot pass the partial association test.
Our method can discover combined rules. The CCC and CCU methods are not designed to discover these rules.

Comparison with CCC and CCU

Experimental evaluations
Figure 1: Extraction Time Comparison (20K Records)
Figure 1: Extraction Time Comparison (100K Records)

Summary 1
Simpson's paradox
– Associations might be inconsistent in subsets.
Partial association test
– Tests the persistence of associations in all possible partitions.
– Statistically sound.
– Efficient on sparse data.
What else?

Cohort study 1
Defined population
– Exposed: have a disease / do not have a disease
– Not exposed: have a disease / do not have a disease
Prospective: follow up.
Retrospective: look back (historic study).

Cohort study 2
Cohorts: groups that share common characteristics but are either exposed or not exposed.
Determine how the exposure causes an outcome.
Measure: odds ratio = (a/b) / (c/d)
              Diseased   Healthy
Exposed       a          b
Not exposed   c          d
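The odds ratio from the table is a one-line computation. The confidence interval below uses the common log-scale (Woolf) approximation, which is an addition not shown on the slide; both functions assume that all four cells are positive.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d) for exposed (a diseased, b healthy)
    versus not exposed (c diseased, d healthy)."""
    return (a / b) / (c / d)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the log scale (Woolf's method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

print(odds_ratio(20, 10, 10, 20))
print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that lies entirely above 1 suggests that the exposure is associated with higher odds of the disease in the cohort.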

Limitations of cohort study
A hypothesis must be known beforehand.
Domain experts determine the control variables.
Data are collected to test the hypothesis.
Not suited to data exploration.
We need
– To start from a data set without any hypotheses.
– An automatic method to find and validate hypotheses.
– A method for data exploration.

Control variables
If we do not control covariates (especially those correlated with the outcome), we cannot determine the true cause.
Too many control variables result in too few matched cases in the data.
– How many people have the same race, gender, blood type, hair colour, eye colour, education level, …?
Irrelevant variables should not be controlled.
– Eye colour may not be relevant to the study.
[Diagram: Cause, Outcome, Other factors]

Matches
Exact matching
– Exact matches on all covariates. Infeasible.
Limited exact matching
– Exact matches on a few key covariates.
Nearest neighbour matching
– Find the closest neighbours.
Propensity score matching
– Based on the predicted probability of receiving the treatment given the covariates.
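Nearest neighbour matching on binary covariates can be sketched with a Hamming distance. This is a greedy one-to-one variant chosen for brevity; the function names and the `max_distance` cut-off are hypothetical, and setting the cut-off to 0 reduces it to exact matching.

```python
def hamming(u, v):
    """Number of covariates on which two records differ."""
    return sum(a != b for a, b in zip(u, v))

def nearest_neighbour_match(exposed, unexposed, max_distance=1):
    """Greedily pair each exposed record with its closest unused unexposed record.
    Returns (exposed index, unexposed index) pairs."""
    available = list(range(len(unexposed)))
    pairs = []
    for i, cov in enumerate(exposed):
        if not available:
            break
        j = min(available, key=lambda k: hamming(cov, unexposed[k]))
        if hamming(cov, unexposed[j]) <= max_distance:
            pairs.append((i, j))
            available.remove(j)
    return pairs

exposed = [(1, 0, 1), (0, 0, 0)]
unexposed = [(0, 0, 1), (1, 0, 1), (1, 1, 1)]
print(nearest_neighbour_match(exposed, unexposed))
print(nearest_neighbour_match(exposed, unexposed, max_distance=0))
```

Loosening the cut-off yields more matched pairs at the cost of less comparable pairs, which is the trade-off between exact and nearest neighbour matching listed above.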

Method 1
Discover causal association rules from large databases of binary variables, e.g. A → Y.
[Tables: the original data set and the fair dataset, both with columns A, B, C, D, E, F, Y; values not preserved]

Methods
[Table: fair dataset with columns A, B, C, D, E, F, Y; values not preserved]
A: exposure variable
{B, C, D, E, F}: controlled variable set.
Rows with the same colour for the controlled variable set are called matched record pairs.
Matched pairs   A=0, Y=1   A=0, Y=0
A=1, Y=1        n11        n12
A=1, Y=0        n21        n22
An association rule A → Y is a causal association rule if: [condition not preserved]
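Building matched record pairs and the pair table can be sketched as below. The condition on the slide was not preserved, so the matched-pair odds ratio n12 / n21 used here (the standard estimate from discordant pairs) is an assumption, as are the function names, the in-order pairing and the toy records.

```python
from collections import defaultdict

def matched_pairs(records, exposure, controls):
    """Pair exposed with unexposed records that agree on all controlled variables."""
    groups = defaultdict(lambda: ([], []))  # index 0: unexposed, index 1: exposed
    for rec in records:
        key = tuple(rec[c] for c in controls)
        groups[key][rec[exposure]].append(rec)
    pairs = []
    for unexposed, exposed in groups.values():
        pairs.extend(zip(exposed, unexposed))  # one-to-one, surplus records dropped
    return pairs

def pair_table(pairs, outcome):
    """Counts keyed by (outcome of exposed record, outcome of unexposed record)."""
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for exposed, unexposed in pairs:
        table[(exposed[outcome], unexposed[outcome])] += 1
    return table

rows = [(1, 0, 1), (1, 0, 1), (1, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1),
        (1, 1, 1), (1, 1, 1), (0, 1, 1), (0, 1, 0), (0, 1, 0)]
records = [{"A": a, "B": b, "Y": y} for a, b, y in rows]

table = pair_table(matched_pairs(records, "A", ["B"]), "Y")
n12, n21 = table[(1, 0)], table[(0, 1)]
print(table, n12 / n21)
```

Only discordant pairs (n12 and n21) carry information about the effect of the exposure; concordant pairs cancel out in the ratio.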

Algorithm
[Tables: example data set with columns A, B, C, D, E, F, G, Y, and the reduced data set with columns A, B, C, D, E, Y; values not preserved]
For each association rule (e.g. A → Y):
1. Remove irrelevant variables (support, local support, association).
2. Find the exclusive variables of the exposure variable (support, association), here G and F. The controlled variable set = {B, C, D, E}.
3. Find the fair dataset: search for all matched record pairs.
4. Calculate the odds ratio to decide whether the rule under test is causal.
5. Repeat 2-4 for each variable that is a combination of variables. Only consider combinations of non-causal factors.

Experimental evaluations

Figure 1: Extraction Time Comparison (20K Records). Methods compared: CAR, CCC, CCU.

Experimental evaluations

Causality – Judea Pearl
Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press.
[Table over variables X1, X2, …, Xn-1, Xn; values not preserved]

Methods
IDA – Maathuis, M. H., Colombo, D., Kalisch, M., and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4), 247–

Conclusions
Association analysis has been widely used in data mining, but associations do not indicate causal relationships.
Association rule mining can be adapted for causal relationship discovery by combining it with statistical methods.
– Partial association test
– Cohort study
These methods are efficient alternatives to causal Bayesian network based methods.
They are capable of finding combined causal factors.

Discussions
Causality and classification
– Estimate Prob(Y | do X) instead of Prob(Y | X).
Feature selection versus controlled variable selection.
Evaluation of causes
– Not classification accuracy
– Bayesian networks?

Research Collaborators
Jixue Liu, Lin Liu, Thuc Le, Jin Zhou, Bin-yu Sun

Thank you for listening. Questions, please?