Conditional Random Fields
Advanced Statistical Methods in NLP
Ling 572
February 9, 2012

Roadmap Graphical Models Modeling independence Models revisited Generative & discriminative models Conditional random fields Linear chain models Skip chain models 2

Preview Conditional random fields Undirected graphical model Due to Lafferty, McCallum, and Pereira, 2001 3

Preview Conditional random fields Undirected graphical model Due to Lafferty, McCallum, and Pereira, 2001 Discriminative model Supports integration of rich feature sets 4

Preview Conditional random fields Undirected graphical model Due to Lafferty, McCallum, and Pereira, 2001 Discriminative model Supports integration of rich feature sets Allows range of dependency structures Linear-chain, skip-chain, general Can encode long-distance dependencies 5

Preview Conditional random fields Undirected graphical model Due to Lafferty, McCallum, and Pereira, 2001 Discriminative model Supports integration of rich feature sets Allows range of dependency structures Linear-chain, skip-chain, general Can encode long-distance dependencies Used in diverse NLP sequence labeling tasks: Named entity recognition, coreference resolution, etc. 6

Graphical Models 7

Graphical model Simple, graphical notation for conditional independence Probabilistic model where: Graph structure denotes conditional independence b/t random variables 8

Graphical Models Graphical model Simple, graphical notation for conditional independence Probabilistic model where: Graph structure denotes conditional independence b/t random variables Nodes: random variables 9

Graphical Models Graphical model Simple, graphical notation for conditional independence Probabilistic model where: Graph structure denotes conditional independence b/t random variables Nodes: random variables Edges: dependency relation between random variables 10

Graphical Models Graphical model Simple, graphical notation for conditional independence Probabilistic model where: Graph structure denotes conditional independence b/t random variables Nodes: random variables Edges: dependency relation between random variables Model types: Bayesian Networks Markov Random Fields 11

Modeling (In)dependence Bayesian network 12

Modeling (In)dependence Bayesian network Directed acyclic graph (DAG) 13

Modeling (In)dependence Bayesian network Directed acyclic graph (DAG) Nodes = Random Variables Arc ~ directly influences, conditional dependency 14

Modeling (In)dependence Bayesian network Directed acyclic graph (DAG) Nodes = Random Variables Arc ~ directly influences, conditional dependency Arcs = Child depends on parent(s) No arcs = independent (0 incoming: only a priori) Parents of X = Pa(X) For each X, need P(X | Pa(X)) 15

Example I 16 Russell & Norvig, AIMA

Simple Bayesian Network MCBN1 ABCDE A B depends on C depends on D depends on E depends on Need: Truth table 19

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on C depends on D depends on E depends on Need: P(A) Truth table 2 20

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on D depends on E depends on Need: P(A) P(B|A) Truth table 2 2*2 21

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on A D depends on E depends on Need: P(A) P(B|A) P(C|A) Truth table 2 2*2 2*2 22

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on A D depends on B,C E depends on C Need: P(A) P(B|A) P(C|A) P(D|B,C) P(E|C) Truth table 2 2*2 2*2 2*2*2 2*2 23
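The truth-table sizes above can be checked mechanically. Below is a minimal Python sketch (my own illustration, not part of the original slides) that computes the CPT size for each node of MCBN1, assuming all variables are binary:

# Count conditional probability table (CPT) entries for a Bayesian network
# whose variables are all binary, as in MCBN1 above.
def cpt_sizes(parents):
    """parents maps each node to the list of its parent nodes."""
    # One table row per assignment of the parents, 2 entries per row.
    return {node: 2 * (2 ** len(ps)) for node, ps in parents.items()}

mcbn1 = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}
print(cpt_sizes(mcbn1))  # {'A': 2, 'B': 4, 'C': 4, 'D': 8, 'E': 4}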

Holmes Example (Pearl) Holmes is worried that his house will be burgled. For the time period of interest, there is a 10^-4 a priori chance of this happening, and Holmes has installed a burglar alarm to try to forestall this event. The alarm is 95% reliable in sounding when a burglary happens, but also has a false positive rate of 1%. Holmes' neighbor, Watson, is 90% sure to call Holmes at his office if the alarm sounds, but he is also a bit of a practical joker and, knowing Holmes' concern, might (30%) call even if the alarm is silent. Holmes' other neighbor Mrs. Gibbons is a well-known lush and often befuddled, but Holmes believes that she is four times more likely to call him if there is an alarm than not. 24

Holmes Example: Model There are four binary random variables: 25

Holmes Example: Model There are four binary random variables: B: whether Holmes' house has been burgled A: whether his alarm sounded W: whether Watson called G: whether Gibbons called B A W G 26

Holmes Example: Tables
P(B): B=#t 10^-4, B=#f 1 - 10^-4
P(A|B): B=#t: A=#t 0.95, A=#f 0.05; B=#f: A=#t 0.01, A=#f 0.99
P(W|A): A=#t: W=#t 0.90, W=#f 0.10; A=#f: W=#t 0.30, W=#f 0.70
P(G|A): P(G=#t|A=#t) = 4 * P(G=#t|A=#f) (exact values not given in the story)
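Given these tables, posterior queries can be answered by enumeration. The sketch below (my own example, using only the B, A, W part of the network; the Gibbons table is left out because its exact numbers are not given) computes P(Burglary | Watson called):

# P(B=true | W=true) by summing out the alarm A.
p_b = {True: 1e-4, False: 1 - 1e-4}        # P(B)
p_a_given_b = {True: 0.95, False: 0.01}    # P(A=true | B)
p_w_given_a = {True: 0.90, False: 0.30}    # P(W=true | A)

def joint_w_true(b):
    """P(B=b, W=true) = sum over a of P(B=b) P(A=a|B=b) P(W=true|A=a)."""
    total = 0.0
    for a in (True, False):
        p_a = p_a_given_b[b] if a else 1 - p_a_given_b[b]
        total += p_b[b] * p_a * p_w_given_a[a]
    return total

posterior = joint_w_true(True) / (joint_w_true(True) + joint_w_true(False))
print(posterior)  # roughly 0.0003: a single call from Watson barely raises the belief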

Bayes’ Nets: Markov Property Bayes’ Nets: Satisfy the local Markov property Variables: conditionally independent of non-descendants given their parents 31

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on A D depends on B,C E depends on C P(A,B,C,D,E)= 34

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on A D depends on B,C E depends on C P(A,B,C,D,E)=P(A) 35

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on A D depends on B,C E depends on C P(A,B,C,D,E)=P(A)P(B|A) 36

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on A D depends on B,C E depends on C P(A,B,C,D,E)=P(A)P(B|A)P(C|A) 37

Simple Bayesian Network MCBN1 ABCDE A = only a priori B depends on A C depends on A D depends on B,C E depends on C P(A,B,C,D,E) = P(A)P(B|A)P(C|A)P(D|B,C)P(E|C) There exist algorithms for training and inference on BNs 38

Naïve Bayes Model Bayes’ Net: Conditional independence of features given class Y f1 f2 f3 ... fk 39

Hidden Markov Model Bayesian Network where: y t depends on 42

Hidden Markov Model Bayesian Network where: y t depends on y t-1 x t 43

Hidden Markov Model Bayesian Network where: y t depends on y t-1 x t depends on y t y1 y2 y3 ... yk x1 x2 x3 ... xk 44

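In equation form, the dependencies above give the usual HMM joint factorization (reconstructed here; the slides show only the graph):

P(x_{1:T}, y_{1:T}) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)

where P(y_1 | y_0) is read as the initial state distribution P(y_1).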

Generative Models Both Naïve Bayes and HMMs are generative models 48

Generative Models Both Naïve Bayes and HMMs are generative models We use the term generative model to refer to a directed graphical model in which the outputs topologically precede the inputs, that is, no x in X can be a parent of an output y in Y. (Sutton & McCallum, 2006) State y generates an observation (instance) x 49

Generative Models Both Naïve Bayes and HMMs are generative models We use the term generative model to refer to a directed graphical model in which the outputs topologically precede the inputs, that is, no x in X can be a parent of an output y in Y. (Sutton & McCallum, 2006) State y generates an observation (instance) x Maximum Entropy and linear-chain Conditional Random Fields (CRFs) are, respectively, their discriminative model counterparts 50

Markov Random Fields aka Markov Networks Graphical representation of probabilistic model Undirected graph Can represent cyclic dependencies (vs. the DAGs in Bayesian Networks, which can represent induced dependencies) 51

Markov Random Fields aka Markov Networks Graphical representation of probabilistic model Undirected graph Can represent cyclic dependencies (vs. the DAGs in Bayesian Networks, which can represent induced dependencies) Also satisfy local Markov property: P(X | all other variables) = P(X | ne(X)), where ne(X) are the neighbors of X 52

Factorizing MRFs Many MRFs can be analyzed in terms of cliques Clique: in an undirected graph G(V,E), a clique is a subset of vertices s.t. for every pair of vertices v_i, v_j in the subset, the edge (v_i, v_j) is in E Example due to F. Xia 53

Factorizing MRFs Many MRFs can be analyzed in terms of cliques Clique: in an undirected graph G(V,E), a clique is a subset of vertices s.t. for every pair of vertices v_i, v_j in the subset, the edge (v_i, v_j) is in E Maximal clique: cannot be extended Example due to F. Xia 54

Factorizing MRFs Many MRFs can be analyzed in terms of cliques Clique: in an undirected graph G(V,E), a clique is a subset of vertices s.t. for every pair of vertices v_i, v_j in the subset, the edge (v_i, v_j) is in E Maximal clique: cannot be extended Maximum clique: largest clique in G Clique: Maximal clique: Maximum clique: Example due to F. Xia A B C E D 55

MRFs Given an undirected graph G(V,E), random vars: X Cliques over G: cl(G) Example due to F. Xia 56

MRFs Given an undirected graph G(V,E), random vars: X Cliques over G: cl(G) B C E D Example due to F. Xia 57

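The factorization these slides build toward is the standard clique decomposition (reconstructed here; the slide's own equation was part of the figure):

P(X) = \frac{1}{Z} \prod_{c \in cl(G)} \Phi_c(X_c), \qquad Z = \sum_{x} \prod_{c \in cl(G)} \Phi_c(x_c)

where each \Phi_c is a non-negative potential function over the variables in clique c.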

Conditional Random Fields Definition due to Lafferty et al., 2001: Let G = (V,E) be a graph such that Y = (Y_v), v in V, so that Y is indexed by the vertices of G. Then (X,Y) is a conditional random field in case, when conditioned on X, the random variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w != v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means that w and v are neighbors in G 59

Conditional Random Fields Definition due to Lafferty et al., 2001: Let G = (V,E) be a graph such that Y = (Y_v), v in V, so that Y is indexed by the vertices of G. Then (X,Y) is a conditional random field in case, when conditioned on X, the random variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w != v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means that w and v are neighbors in G. A CRF is a Markov Random Field globally conditioned on the observation X, and has the form: 60
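The form referred to above did not survive extraction; the standard statement, conditioning the MRF factorization globally on the observation X, is:

p(Y \mid X) = \frac{1}{Z(X)} \prod_{c \in cl(G)} \Phi_c(Y_c, X)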

Linear-Chain CRF CRFs can have arbitrary graphical structure, but.. 61

Linear-Chain CRF CRFs can have arbitrary graphical structure, but.. Most common form is linear chain Supports sequence modeling Many sequence labeling NLP problems: Named Entity Recognition (NER), Coreference 62

Linear-Chain CRF CRFs can have arbitrary graphical structure, but.. Most common form is linear chain Supports sequence modeling Many sequence labeling NLP problems: Named Entity Recognition (NER), Coreference Similar to combining HMM sequence w/MaxEnt model Supports sequence structure like HMM but HMMs can’t do rich feature structure 63

Linear-Chain CRF CRFs can have arbitrary graphical structure, but.. Most common form is linear chain Supports sequence modeling Many sequence labeling NLP problems: Named Entity Recognition (NER), Coreference Similar to combining HMM sequence w/MaxEnt model Supports sequence structure like HMM but HMMs can’t do rich feature structure Supports rich, overlapping features like MaxEnt but MaxEnt doesn’t directly support sequence labeling 64

Discriminative & Generative Model perspectives (Sutton & McCallum) 65

Linear-Chain CRFs Feature functions: In MaxEnt: f: X x Y → {0,1} e.g. f_j(x,y) = 1 if x=“rifle” and y=talk.politics.guns, 0 o.w. 66

Linear-Chain CRFs Feature functions: In MaxEnt: f: X x Y → {0,1} e.g. f_j(x,y) = 1 if x=“rifle” and y=talk.politics.guns, 0 o.w. In CRFs, f: Y x Y x X x T → R e.g. f_k(y_t, y_t-1, x, t) = 1 if y_t=V and y_t-1=N and x_t=“flies”, 0 o.w. frequently an indicator function, for efficiency 67

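As a concrete illustration of the CRF feature functions above, here is a minimal Python sketch (the function name and the toy sentence are mine; the tags follow the slide's example):

def f_k(y_t, y_prev, x, t):
    """Indicator feature: 1 if the current tag is V, the previous tag is N,
    and the current word is 'flies'; 0 otherwise."""
    return 1.0 if (y_t == "V" and y_prev == "N" and x[t] == "flies") else 0.0

x = ["time", "flies"]
print(f_k("V", "N", x, 1))  # 1.0
print(f_k("N", "N", x, 1))  # 0.0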

Linear-Chain CRFs 69

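The slide above carried the linear-chain CRF equations as images. The standard form, matching the feature functions f_k(y_t, y_t-1, x, t) and weights λ_k used in these slides, is:

p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_t, y_{t-1}, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_t, y'_{t-1}, x, t) \Big)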

Linear-chain CRFs: Training & Decoding Training: 71

Linear-chain CRFs: Training & Decoding Training: Learn λ j Approach similar to MaxEnt: e.g. L-BFGS 72

Linear-chain CRFs: Training & Decoding Training: Learn λ_j Approach similar to MaxEnt: e.g. L-BFGS Decoding: Compute label sequence that optimizes P(y|x) Can use approaches like HMM, e.g. Viterbi 73
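Because Z(x) does not depend on the label sequence, decoding only needs the unnormalized per-position scores, so HMM-style Viterbi carries over directly. A minimal sketch (names and the score interface are my own, for illustration):

def viterbi(tags, T, score):
    """score(y, y_prev, t) should return sum_k lambda_k * f_k(y, y_prev, x, t);
    y_prev is None at position 0. Returns the length-T tag sequence with the
    highest total score."""
    best = {y: score(y, None, 0) for y in tags}   # scores of length-1 prefixes
    back = []                                     # backpointers for t = 1..T-1
    for t in range(1, T):
        back.append({})
        new_best = {}
        for y in tags:
            prev, val = max(((yp, best[yp] + score(y, yp, t)) for yp in tags),
                            key=lambda pair: pair[1])
            new_best[y] = val
            back[-1][y] = prev
        best = new_best
    y = max(best, key=best.get)                   # best final tag
    path = [y]
    for pointers in reversed(back):               # follow backpointers
        y = pointers[y]
        path.append(y)
    return list(reversed(path))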

Skip-chain CRFs 74

Motivation Long-distance dependencies: 75

Motivation Long-distance dependencies: Linear chain CRFs, HMMs, beam search, etc All make very local Markov assumptions Preceding label; current data given current label Good for some tasks 76

Motivation Long-distance dependencies: Linear chain CRFs, HMMs, beam search, etc All make very local Markov assumptions Preceding label; current data given current label Good for some tasks However, longer context can be useful e.g. NER: Repeated capitalized words should get same tag 77

Skip-Chain CRFs Basic approach: Augment linear-chain CRF model with Long-distance ‘skip edges’ Add evidence from both endpoints 79

Skip-Chain CRFs Basic approach: Augment linear-chain CRF model with Long-distance ‘skip edges’ Add evidence from both endpoints Which edges? 80

Skip-Chain CRFs Basic approach: Augment linear-chain CRF model with Long-distance ‘skip edges’ Add evidence from both endpoints Which edges? Identical words, words with same stem? 81

Skip-Chain CRFs Basic approach: Augment linear-chain CRF model with Long-distance ‘skip edges’ Add evidence from both endpoints Which edges? Identical words, words with same stem? How many edges? 82

Skip-Chain CRFs Basic approach: Augment linear-chain CRF model with Long-distance ‘skip edges’ Add evidence from both endpoints Which edges? Identical words, words with same stem? How many edges? Not too many, increases inference cost 83
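One simple way to instantiate the "identical capitalized words" choice is sketched below (the pairing rule is the one named on the slide; everything else, including linking each repeat to the first occurrence only, is my own illustrative choice):

def skip_edges(tokens):
    """Return (i, j) index pairs that connect later occurrences of a
    capitalized word back to its first occurrence."""
    first_seen = {}
    edges = []
    for j, w in enumerate(tokens):
        if w[:1].isupper():
            if w in first_seen:
                edges.append((first_seen[w], j))
            else:
                first_seen[w] = j
    return edges

print(skip_edges("Speaker : John Smith . John will talk .".split()))
# [(2, 5)]: the two occurrences of "John" share a skip edge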

Skip Chain CRF Model Two clique templates: Standard linear chain template 84

Skip Chain CRF Model Two clique templates: Standard linear chain template Skip edge template 85

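Combining the two templates gives the usual skip-chain form (a reconstruction; the slide's own equation was an image), with one potential per linear-chain edge and one per skip edge:

p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \Psi_t(y_t, y_{t-1}, x) \prod_{(u,v) \in \mathcal{I}} \Psi_{uv}(y_u, y_v, x)

where \mathcal{I} is the set of skip-edge index pairs.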

Skip Chain NER Named Entity Recognition: Task: start time, end time, speaker, location In corpus of seminar announcements 88

Skip Chain NER Named Entity Recognition: Task: start time, end time, speaker, location In corpus of seminar announcements All approaches: Orthographic, gazetteer, POS features Within preceding, following 4-word window 89

Skip Chain NER Named Entity Recognition: Task: start time, end time, speaker, location In corpus of seminar announcements All approaches: Orthographic, gazetteer, POS features Within preceding, following 4-word window Skip chain CRFs: Skip edges between identical capitalized words 90

NER Features 91

Skip Chain NER Results Skip chain improves substantially on ‘speaker’ recognition; slight reduction in accuracy for times 92

Summary Conditional random fields (CRFs) Undirected graphical model Compare with Bayesian Networks, Markov Random Fields 93

Summary Conditional random fields (CRFs) Undirected graphical model Compare with Bayesian Networks, Markov Random Fields Linear-chain models HMM sequence structure + MaxEnt feature models 94

Summary Conditional random fields (CRFs) Undirected graphical model Compare with Bayesian Networks, Markov Random Fields Linear-chain models HMM sequence structure + MaxEnt feature models Skip-chain models Augment with longer distance dependencies Pros: 95

Summary Conditional random fields (CRFs) Undirected graphical model Compare with Bayesian Networks, Markov Random Fields Linear-chain models HMM sequence structure + MaxEnt feature models Skip-chain models Augment with longer distance dependencies Pros: Good performance Cons: 96

Summary Conditional random fields (CRFs) Undirected graphical model Compare with Bayesian Networks, Markov Random Fields Linear-chain models HMM sequence structure + MaxEnt feature models Skip-chain models Augment with longer distance dependencies Pros: Good performance Cons: Compute intensive 97

HW #5 98

HW #5: Beam Search Apply Beam Search to MaxEnt sequence decoding Task: POS tagging Given files: test data: usual format boundary file: sentence lengths model file Comparisons: Different topN, topK, beam_width 99

Tag Context Following Ratnaparkhi ‘96, the model uses the previous tag (prevT=tag_i-1) and the previous tag bigram (prevTwoTags=tag_i-2+tag_i-1). These are NOT in the data file; you compute them on the fly. Notes: Due to sparseness, it is possible a bigram may not appear in the model file. Skip it. These are feature functions: If you have a different candidate tag for the same word, weights will differ. 100
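A rough sketch of the decoding loop the assignment asks for, with prevT and prevTwoTags computed on the fly. The scoring interface, the sentence-start tag, and the pruning details below are placeholders of mine, not the assignment's actual specification:

def beam_decode(words, score_tags, top_n, top_k, beam_width):
    """score_tags(word, prevT, prevTwoTags) -> {tag: log_prob} from the MaxEnt model.
    Keep top_n tags per word, then prune paths to the top_k that fall within
    beam_width of the best path's log-probability."""
    beam = [([], 0.0)]                            # (tag sequence so far, log prob)
    for w in words:
        candidates = []
        for tags, lp in beam:
            prev_t = tags[-1] if tags else "BOS"
            prev_two = (tags[-2] if len(tags) > 1 else "BOS") + "+" + prev_t
            scores = score_tags(w, prev_t, prev_two)
            for tag, s in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]:
                candidates.append((tags + [tag], lp + s))
        candidates.sort(key=lambda c: -c[1])
        best_lp = candidates[0][1]
        beam = [c for c in candidates if c[1] >= best_lp - beam_width][:top_k]
    return beam[0][0]                             # highest-scoring tag sequence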

Uncertainty Real world tasks: Partially observable, stochastic, extremely complex Probabilities capture “Ignorance & Laziness” Lack relevant facts, conditions Failure to enumerate all conditions, exceptions 101

Motivation Uncertainty in medical diagnosis Diseases produce symptoms In diagnosis, observed symptoms => disease ID Uncertainties Symptoms may not occur Symptoms may not be reported Diagnostic tests not perfect False positive, false negative How do we estimate confidence? 102