
1 CCF Salon on the Application and Development of Bayesian Networks in China: BN Theory Research and Applications at HKUST (Hong Kong University of Science and Technology), 2012-05-22

2 Overview
- Early Work (1992-2002)
  - Inference: Variable Elimination
  - Inference: Local Structures
  - Others: Learning, Decision Making, Book
- Latent Tree Models (2000 - )
  - Theory and Algorithms
  - Applications
    - Multidimensional Clustering, Density Estimation, Latent Structure
    - Survey Data, Documents, Business Data
  - Traditional Chinese Medicine (TCM)
  - Extensions

3 Bayesian Networks

4 Variable Elimination
- Papers:
  - N. L. Zhang and D. Poole (1994). A simple approach to Bayesian network computations. In Proc. of the 10th Canadian Conference on Artificial Intelligence, Banff, Alberta, Canada, May 16-22.
  - N. L. Zhang and D. Poole (1996). Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5: 301-328.
- Idea
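The idea, in outline: to compute a marginal, repeatedly multiply together the factors that mention a variable and sum that variable out. Below is a minimal Python sketch of this sum-product elimination, assuming binary variables and dict-based factor tables; all names are illustrative, not from the papers.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors; a factor is (vars, table)."""
    vs = list(dict.fromkeys(f[0] + g[0]))  # union of variables, order kept
    table = {}
    for assignment in product([0, 1], repeat=len(vs)):
        a = dict(zip(vs, assignment))
        table[assignment] = (f[1][tuple(a[v] for v in f[0])]
                             * g[1][tuple(a[v] for v in g[0])])
    return (vs, table)

def sum_out(f, var):
    """Eliminate `var` from factor f by summing over its values."""
    i = f[0].index(var)
    vs = f[0][:i] + f[0][i + 1:]
    table = {}
    for assignment, val in f[1].items():
        key = assignment[:i] + assignment[i + 1:]
        table[key] = table.get(key, 0.0) + val
    return (vs, table)

def variable_elimination(factors, order):
    """Sum out the variables in `order`, one at a time."""
    for var in order:
        relevant = [f for f in factors if var in f[0]]
        if not relevant:
            continue
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f)
        factors = [f for f in factors if var not in f[0]] + [sum_out(prod, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Example: P(B) from P(A) and P(B|A), eliminating A.
fA = (['A'], {(0,): 0.6, (1,): 0.4})
fBA = (['B', 'A'], {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.8})
print(variable_elimination([fA, fBA], ['A']))  # factor over ['B']: 0.62 / 0.38
```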

5 Variable Elimination

6 Variable Elimination

7 Variable Elimination
- First BN inference algorithm presented in the standard textbooks:
- Russell & Norvig wrote on page 529: "The algorithm we describe is closest to that developed by Zhang and Poole (1994, 1996)"
- Koller and Friedman wrote: "… the variable elimination algorithm, as presented here, first described by Zhang and Poole (1994), …"
- The K&F book cites 7 of our papers

8 Local Structure

9 Local Structures: Causal Independence
- Papers:
  - N. L. Zhang and D. Poole (1996). Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5: 301-328.
  - N. L. Zhang and D. Poole (1994). Intercausal independence and heterogeneous factorization. In Proc. of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, USA, July 29-31.

10 Local Structures: Causal Independence

11 Local Structure: Context-Specific Independence
- Papers:
  - N. L. Zhang and D. Poole (1999). On the role of context-specific independence in probabilistic reasoning. IJCAI-99, 1288-1293.
  - D. Poole and N. L. Zhang (2003). Exploiting contextual independence in probabilistic inference. Journal of Artificial Intelligence Research, 18: 263-313.

12 Other Works
- Parameter Learning
  - N. L. Zhang (1996). Irrelevance and parameter learning in Bayesian networks. Artificial Intelligence, 88: 359-373.
- Decision Making
  - N. L. Zhang (1998). Probabilistic inference in influence diagrams. Computational Intelligence, 14(4): 475-497.
  - N. L. Zhang, R. Qi and D. Poole (1994). A computational theory of decision networks. International Journal of Approximate Reasoning, 11(2): 83-158. (PhD thesis)

13 Other Works

14 Overview
- Early Work (1992-2002)
  - Inference: Variable Elimination
  - Inference: Local Structures
  - Others: Learning, Decision Making
- Latent Tree Models (2000 - )
  - Theory and Algorithms
  - Applications
    - Multidimensional Clustering, Density Estimation, Latent Structure
    - Survey Data, Documents, Business Data
  - Traditional Chinese Medicine (TCM)
  - Extensions

15 Latent Tree Models: Overview
- Concept first mentioned by Pearl (1988)
- We were the first to conduct systematic research on LTMs:
  - N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. AAAI-02, 230-237.
  - N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6): 697-723.
- Early followers:
  - Aalborg University (Denmark), Norwegian University of Science and Technology
- Recent papers from:
  - MIT, CMU, USC, Georgia Tech, Edinburgh

16 Latent Tree Models
- Recent survey by French researchers:

17 Latent Tree Models (LTM)
- Bayesian networks with:
  - Rooted tree structure
  - Discrete random variables
  - Leaves observed (manifest variables)
  - Internal nodes latent (latent variables)
- Also known as hierarchical latent class (HLC) models
- Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
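In symbols (a reconstruction consistent with the parameter list above, with Y1 the root, Y's latent, X's manifest, and pa(·) the unique parent in the tree), the joint distribution factorizes as

```latex
P(X_1,\dots,X_n, Y_1,\dots,Y_m)
  \;=\; P(Y_1)\,\prod_{j=2}^{m} P\big(Y_j \mid \mathrm{pa}(Y_j)\big)
        \,\prod_{i=1}^{n} P\big(X_i \mid \mathrm{pa}(X_i)\big),
```

so the model is fully specified by one marginal at the root and one conditional table per edge.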

18 Example
- Manifest variables:
  - Math Grade, Science Grade, Literature Grade, History Grade
- Latent variables:
  - Analytic Skill, Literal Skill, Intelligence

19 Theory: Root Walking and Model Equivalence
- M1: root walks to X2; M2: root walks to X3
- Root walking leads to models that are equivalent over the manifest variables
- Implications:
  - Cannot determine edge orientation from data
  - Can only learn unrooted models

20 Regularity
- Regular latent tree models: for any latent node Z with neighbors X1, X2, …, the cardinality of Z is bounded (see the condition sketched below)
- Can focus on regular models only:
  - Irregular models can be made regular
  - Regularized models are better than the irregular originals
- The set of all such models is finite.
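A sketch of the condition, as given in the JMLR 2004 paper (reconstructed here; treat the paper's exact statement as authoritative): a latent tree model is regular if, for every latent variable Z with neighbors X1, …, Xk,

```latex
|Z| \;\le\; \frac{\prod_{i=1}^{k} |X_i|}{\max_{i} |X_i|},
```

with strict inequality when Z has exactly two neighbors (|·| denotes cardinality).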

21 Effective Dimension
- Standard dimension:
  - Number of free parameters
- Effective dimension:
  - X1, X2, …, Xn: observed variables
  - For each value of the parameter, P(X1, X2, …, Xn) is a point in a high-dimensional space
  - As the parameter value varies, these points span a manifold
  - Effective dimension: the dimension of that manifold
- Parsimonious model:
  - Standard dimension = effective dimension
  - Open question: How to test parsimony?
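One standard way to make this precise (a hedged formulation in the style of the literature on Bayesian networks with hidden variables, not a quotation from the slide): view the model as a map from parameters to joint distributions and take the maximal rank of its Jacobian:

```latex
\theta \;\mapsto\; P_\theta(X_1,\dots,X_n) \in \mathbb{R}^{\prod_i |X_i|},
\qquad
d_e \;=\; \max_{\theta}\ \operatorname{rank}
          \frac{\partial P_\theta}{\partial \theta}.
```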

22 Effective Dimension
- Paper:
  - N. L. Zhang and T. Kocka (2004). Effective dimensions of hierarchical latent class models. Journal of Artificial Intelligence Research, 21: 1-17.
- Open question: the effective dimension of an LTM with a single latent variable

23 Learning Latent Tree Models
Determine:
- Number of latent variables
- Cardinality of each latent variable
- Model structure
- Conditional probability distributions

24 Search-Based Learning: Model Selection
- Bayesian score: posterior probability P(m|D)
  - P(m|D) = P(m) ∫ P(D|m, θ) P(θ|m) dθ / P(D)
- BIC score: large-sample approximation
  - BIC(m|D) = log P(D|m, θ*) − (d/2) log N
- BICe score: uses the effective dimension d_e
  - BICe(m|D) = log P(D|m, θ*) − (d_e/2) log N
  - Effective dimensions are difficult to compute, so BICe is not practical
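As a concrete reading of the two scores above (a minimal sketch; in practice the log-likelihood at the MLE θ* would come from running EM on the latent tree model):

```python
import math

def bic(loglik_mle: float, d: int, n: int) -> float:
    """BIC(m|D) = log P(D|m, theta*) - (d/2) log N, with d free parameters."""
    return loglik_mle - 0.5 * d * math.log(n)

def bice(loglik_mle: float, d_e: int, n: int) -> float:
    """BICe: same form, with the effective dimension d_e in place of d."""
    return loglik_mle - 0.5 * d_e * math.log(n)
```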

25 Search Algorithms
- Papers:
  - T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1): 2246-2269.
  - N. L. Zhang and T. Kocka (2004). Efficient learning of hierarchical latent class models. ICTAI-2004.
- Double hill climbing (DHC), 2002: handles ~7 manifest variables
- Single hill climbing (SHC), 2004: ~12 manifest variables
- Heuristic SHC (HSHC), 2004: ~50 manifest variables
- EAST, 2011: 100+ manifest variables
- Recent fast algorithms for specific applications (a generic search skeleton is sketched below)
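For orientation only, here is the generic score-based skeleton that hill-climbing searches of this kind share; the specific operators (adding/deleting latent nodes, changing cardinalities, relocating neighbors) and the efficiency tricks that let EAST scale to 100+ variables are in the cited papers, not reproduced here:

```python
def hill_climb(initial_model, data, neighbors, score):
    """Greedy search: move to the best-scoring neighbor until no improvement.

    `neighbors(m)` yields candidate models (structural modifications of m);
    `score(m, data)` is a model-selection score such as BIC after fitting m.
    """
    current, best = initial_model, score(initial_model, data)
    while True:
        candidates = [(score(m, data), m) for m in neighbors(current)]
        if not candidates:
            return current
        top_score, top_model = max(candidates, key=lambda sm: sm[0])
        if top_score <= best:          # no neighbor improves the score
            return current
        current, best = top_model, top_score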

26 Illustration of the search process

27 Algorithms by Others
- Variable clustering methods:
  - S. Harmeling and C. K. I. Williams (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6): 1087-1097.
  - R. Mourad, C. Sinoquet, P. Leray (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics, 12:16. doi:10.1186/1471-2105-12-16.
  - Fast, but model quality may be poor
- Adaptation of evolution-tree algorithms:
  - M. J. Choi, V. Y. F. Tan, A. Anandkumar, A. S. Willsky (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12: 1771-1812.
  - Fast, has a consistency proof, but applies to special LTMs only

28 Overview
- Early Work (1992-2002)
  - Inference: Variable Elimination
  - Inference: Local Structures
  - Others: Learning, Decision Making
- Latent Tree Models (2000 - )
  - Theory and Algorithms
  - Applications
    - Multidimensional Clustering, Density Estimation, Latent Structure
    - Survey Data, Documents, Business Data
  - Traditional Chinese Medicine (TCM)
  - Extensions

29 Density Estimation
- Characteristics of LTMs:
  - Computationally very simple to work with
  - Can represent complex relationships among manifest variables
  - A useful tool for density estimation

30 Density Estimation
- New approximate inference algorithm for Bayesian networks (Wang, Zhang and Chen, AAAI-08, Exceptional Paper)
- [Figure: labels "Sample", "LTAB Algo", "sparse", "dense"]

31 Multidimensional Clustering
- Paper:
  - T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1): 2246-2269.
- Cluster analysis:
  - Grouping of objects into clusters so that objects in the same cluster are similar in some sense

32 How to Cluster Those?

33 How to Cluster Those? — Style of picture

34 How to Cluster Those? — Type of object in picture

35 How to Cluster Those? — Multidimensional clustering / multi-clustering
- How to partition the same data in multiple ways?
  - Latent tree models

36 Latent Tree Models & Multidimensional Clustering
- Model the relationships between:
  - Observed / manifest variables
    - Math Grade, Science Grade, Literature Grade, History Grade
  - Latent variables
    - Analytic Skill, Literal Skill, Intelligence
- Each latent variable gives a partition:
  - Intelligence: low, medium, high
  - Analytic skill: low, medium, high

37 ICAC Data
// 31 variables, 1200 samples
C_City: s0 s1 s2 s3               // very common, quite common, uncommon, ...
C_Gov: s0 s1 s2 s3
C_Bus: s0 s1 s2 s3
Tolerance_C_Gov: s0 s1 s2 s3      // totally intolerable, intolerable, tolerable, ...
Tolerance_C_Bus: s0 s1 s2 s3
WillingReport_C: s0 s1 s2         // yes, no, depends
LeaveContactInfo: s0 s1           // yes, no
I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
I_Effectiveness: s0 s1 s2 s3 s4   // very effective, effective, average, ineffective, very ineffective
I_Deterrence: s0 s1 s2 s3 s4      // very sufficient, sufficient, average, ...
…
-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
…
(-1 presumably marks a missing value)

38 Latent Structure Discovery
- Y2: demographic info; Y3: tolerance toward corruption
- Y4: ICAC performance; Y7: ICAC accountability
- Y5: change in level of corruption; Y6: level of corruption

39 Multidimensional Clustering
- Y2=s0: low-income youngsters; Y2=s1: women with no/low income
- Y2=s2: people with good education and good income; Y2=s3: people with poor education and average income

40 Multidimensional Clustering
- Y3=s0: people who find corruption totally intolerable (57%)
- Y3=s1: people who find corruption intolerable (27%)
- Y3=s2: people who find corruption tolerable (15%)
Interesting finding:
- Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus
- Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus
- Y3=s0: same attitude toward C-Gov and C-Bus
People who are tough on corruption are equally tough toward C-Gov and C-Bus. People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.

41 Multidimensional Clustering
Interesting finding: relationship between background and tolerance toward corruption
- Y2=s2 (good education and good income): the least tolerant; 4% find corruption tolerable
- Y2=s3 (poor education and average income): the most tolerant; 32% find corruption tolerable
- The other two classes are in between.

42 Marketing Data

43 Latent Tree Analysis of Text Data
- The WebKB data set
- 1041 web pages collected from 4 CS departments in 1997
- 336 words

44 Latent Tree Model for WebKB Data by the BI Algorithm
- 89 latent variables

45 Latent Tree Models for WebKB Data

46 [figure]

47 [figure]

48 LTM for Topic Detection
- Topic:
  - A latent state
  - A collection of documents
- A document can belong to multiple topics, each with membership up to 100%

49 LTM vs LDA for Topic Detection
- LTM:
  - Topic:
    - A latent state
    - A collection of documents
  - A document can belong to multiple topics, each with membership up to 100%
- LDA:
  - Topic:
    - A distribution over the entire vocabulary
    - The probabilities of the words add to one
  - Document:
    - A distribution over topics
    - If a document contains more of one topic, then it contains less of the other topics

50 Latent Tree Analysis Summary
- Finds meaningful facets of data
- Identifies natural clusters along each facet
- Gives a clear picture of what is in the data

51 LTM for Spectral Clustering
- Original data set
- Eigenvectors of the Laplacian matrix
- Rounding: from eigenvectors to the final partition

52 LTM for Spectral Clustering
- Rounding:
  - Determine the number of clusters
  - Determine the final partition
  - No good method available
- LTM method (contrast with the standard pipeline sketched below):
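For contrast, a sketch of the standard spectral-clustering pipeline with k-means as the usual rounding step; note that it needs the number of clusters k up front, which is exactly the gap the LTM-based rounding is meant to fill. This uses standard scipy/scikit-learn calls and is not the authors' method:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def spectral_clustering(similarity: np.ndarray, k: int) -> np.ndarray:
    """Standard spectral clustering: Laplacian eigenvectors + k-means rounding."""
    L = laplacian(similarity, normed=True)         # normalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    U = eigvecs[:, :k]                             # k smallest eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```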

53 Overview
- Early Work (1992-2002)
  - Inference: Variable Elimination
  - Inference: Local Structures
  - Others: Learning, Decision Making
- Latent Tree Models (2000 - )
  - Theory and Algorithms
  - Applications
    - Multidimensional Clustering, Density Estimation, Latent Structure
    - Survey Data, Documents, Business Data
  - Traditional Chinese Medicine (TCM)
  - Extensions

54 LTM and TCM
- Papers:
  - N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42: 229-245. (Took 8 years.)
  - N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Statistical validation of TCM theories. Journal of Alternative and Complementary Medicine, 14(5): 583-587. (Featured at TCM Wiki.)
  - 张连文, 袁世宏, 王天芳, 赵燕 et al. (2011). Latent structure analysis and syndrome differentiation of Western-medicine diseases (I): Basic principles (隐结构分析与西医疾病的辨证分型 (I): 基本原理). World Science and Technology — Modernization of Traditional Chinese Medicine (世界科学技术—中医药现代化), 13(3): 498-502.
  - 张连文, 许朝霞, 王忆勤, 刘腾飞 et al. (2012). Latent structure analysis and syndrome differentiation of Western-medicine diseases (II): Comprehensive clustering (隐结构分析与西医疾病的辨证分型 (II): 综合聚类). World Science and Technology — Modernization of Traditional Chinese Medicine (世界科学技术—中医药现代化), 14(2).

55 LTM and TCM: Objectives
- Statistical validation of TCM postulates
- [Review of a recent paper:] "I am very interested in what these authors are trying to do. They are dealing with an important epistemological problem. To go from the many symptoms and signs that patients present, to construct a consistent and other-observer identifiable constellation, is a core task of the medical practitioner. A kind of feedback occurs between what a practitioner is taught/finds listed in books, and what that practitioner encounters in the clinic. The better the constellation is understood, the more accurate the clustering of symptoms, the more consistent is the identification of syndromes among practitioners and through time. While these constellations have been worked into widely-accepted 'disease constructs' for biomedicine for some time which are widely accepted as 'real,' this is not quite as true for TCM constellations. This latent variable study is interesting not only in itself, but also as providing evidence that what TCM 'says' is so, shows up during analysis as demonstrably so."

56 LTM and TCM: Objectives
- TCM postulates explain the occurrence of symptoms:
  - When KIDNEY YANG is in deficiency, it cannot warm the body and the patient feels cold, resulting in intolerance to cold, cold limbs, …
  - Manifest variables (directly observed): feels cold, cold limbs
  - Latent variable (not directly observed): kidney yang deficiency
  - Latent structure: relationships between latent variables and manifest variables
- Statistical validation of TCM postulates

57 Latent Tree Analysis of Symptom Data
- Similar to the WebKB data:
  - Web page containing words ↔ patient having symptoms
- What will be the result of latent tree analysis?
  - Different facets of the data revealed
  - Natural clusters along each facet identified
  - Each facet involves a few symptoms
    - May correspond to a syndrome
    - Provides validation for TCM postulates
    - Provides evidence for syndrome differentiation

58 Latent Tree Model for Kidney Data
- The latent structure matches the relevant TCM postulate
- Provides validation for the TCM postulate

59 Latent Tree Model for Kidney Data
- Work reported in:
  - N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42: 229-245.
- Email from:
  - Bridie Andrews: Bentley University, Boston
  - Dominique Haughton: Bentley University, Fellow of the American Statistical Association
  - Lisa Conboy: Harvard Medical School
"We are very interested in your paper on 'Latent tree models and diagnosis in traditional Chinese medicine', and are planning to repeat your method using some data we have here on about 270 cases of 'irritable bowel syndrome' and their differing TCM diagnoses."

60 Results on Many Data Sets from the 973 Project

61 Providing Evidence of Syndrome Differentiation

62 Providing Evidence of Syndrome Differentiation
- How can latent structure analysis produce evidence for TCM syndrome diagnosis?

63 Providing Evidence of Syndrome Differentiation
- Imagine sub-typing a Western-medicine (WM) disease D from the TCM perspective
  - Expected conclusion: several syndromes occur among D patients
- This also provides a basis for distinguishing syndrome-Z patients from other D patients

64 [Figure: Picture 2]

65 Example: Model for Depression Data

66 Example: Model for Depression Data
- Evidence provided by Y8 for syndrome classification:
  - Two classes: with obstructed qi movement in the chest and diaphragm (有胸膈气机不畅), without obstructed qi movement in the chest and diaphragm (无胸膈气机不畅)
  - Sizes of the classes: 48% and 52%
  - Symptoms important for distinguishing between the two classes (in descending order of importance): suffocating sensation (憋气), shortness of breath (气短), chest tightness (胸闷), and sighing (太息). Other symptoms play little role.

67 Latent Tree Analysis of Prescription Data
- Data:
  - From Guang'anmen Hospital
  - 1287 formulae prescribed for patients with disharmony between the liver and the spleen

68 Latent Tree Model

69 Some Partitions Obtained

70 Overview
- Early Work (1992-2002)
  - Inference: Variable Elimination
  - Inference: Local Structures
  - Others: Learning, Decision Making
- Latent Tree Models (2000 - )
  - Theory and Algorithms
  - Applications
    - Multidimensional Clustering, Density Estimation, Latent Structure
    - Survey Data, Documents, Business Data
  - Traditional Chinese Medicine (TCM)
  - Extensions

71 Pouch Latent Tree Models (PLTMs)
- Probabilistic graphical models with continuous observed variables (X's) and discrete latent variables (Y's)
- Tree-structured Bayesian network, except that several observed variables can appear in the same node, called a pouch
- P(X1, X2|Y2), P(X3|Y2), …, P(X7, X8, X9|Y4): Gaussian distributions
- P(Y1), P(Y2|Y1), …, P(Y4|Y1): multinomial distributions
(The general forms of these distributions are sketched below.)
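The general forms referred to above, reconstructed from the description (a pouch of continuous variables is conditionally Gaussian given its discrete latent parent; a latent variable is multinomial given its latent parent):

```latex
P(\mathbf{X}_{\text{pouch}} \mid Y = y)
   \;=\; \mathcal{N}\!\big(\mathbf{x} \mid \boldsymbol{\mu}_y, \boldsymbol{\Sigma}_y\big),
\qquad
P\big(Y_j = y' \mid \mathrm{pa}(Y_j) = y\big) \;=\; \theta_{j,\,y'\mid y}.
```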

72 Pouch Latent Tree Models (PLTM)
- One possible PLTM for the transcript data
- PLTM generalizes the Gaussian mixture model (GMM): a GMM is a PLTM with a single pouch and a single latent variable

73 The UCI Image Data
- Each instance represents a 3x3 pixel region of an image, with 18 attributes, labeled
- Class labels were first removed and the remaining data analyzed using PLTM
- Pouches capture natural facets well:
  - From left to right: line density, edge, color, coordinates
- Latent variables represent clusterings

74 The UCI Image Data
- Y1 matches the true class partition well
- Y3: partition based on edge and color
- Y4: partition based on centroid.col
- Feature curve: normalized MI between a latent variable and the attributes
- Y2 represents a partition along the line-density facet
- Y1 represents a partition along the color facet
- Interesting finding: Y1 is strongly correlated with centroid.row

75 Overview
- Early Work (1992-2002)
  - Inference: Variable Elimination
  - Inference: Local Structures
  - Others: Learning, Decision Making
- Latent Tree Models (2000 - )
  - Theory and Algorithms
  - Applications
    - Multidimensional Clustering, Density Estimation, Latent Structure
    - Survey Data, Documents, Business Data
  - Traditional Chinese Medicine (TCM)
  - Extensions

76 Thank you! (谢谢!)

