Presentation on theme: "Conditional Random Fields & Table Extraction Dongfang Xu School of Information."— Presentation transcript:

1 Conditional Random Fields & Table Extraction Dongfang Xu School of Information

2 Outline
CRF
– Models Introduction
– How to model & POS tag Example
– Differences with LR & HMMs
Table Extraction
– Labels & Features
– Training & Inference
– Evaluation

3 Introduction
CRF
– Combines a classifier with sequential structure
– The previous state influences the current state
– Modeled as an undirected graph

4 How to model
Target: p(Y|X)
Modeling
– Transition function
– State function
– F(Y,X) = p(y_1|y_0) p(y_2|y_1) … p(y_n|y_{n-1}) · p(y_1|X) p(y_2|X) … p(y_n|X)
– p(Y|X) = P(Y,X) / P(X)

5 How to model
Target: p(Y|X)
Modeling
– Transition function
– State function
– F(Y,X)
– Normalize

6 How to model Target p(Y|X) Modeling CRF Model
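The modeling steps above (a score F(Y,X) built from transition and state terms, then normalized into p(Y|X)) can be sketched in a few lines of Python. All scores below are invented toy numbers, and the partition function Z(X) is computed by brute-force enumeration, which is only feasible for tiny examples:

```python
from itertools import product
import math

# Toy linear-chain CRF: 2 labels, hand-set transition and state scores.
LABELS = ["A", "B"]
trans = {("A", "A"): 0.5, ("A", "B"): -0.2,
         ("B", "A"): 0.1, ("B", "B"): 0.8}

def state_score(label, x, i):
    # A toy state feature: label "A" prefers even positions.
    return 1.0 if (label == "A") == (i % 2 == 0) else 0.0

def score(y, x):
    """Unnormalized log-score F(Y,X): state terms plus transition terms."""
    s = sum(state_score(y[i], x, i) for i in range(len(y)))
    s += sum(trans[(y[i - 1], y[i])] for i in range(1, len(y)))
    return s

def prob(y, x):
    """p(Y|X) = exp(F(Y,X)) / Z(X), with Z(X) summed over all label sequences."""
    Z = sum(math.exp(score(list(c), x)) for c in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / Z

x = ["w0", "w1", "w2"]
# Probabilities over all 2^3 label sequences sum to 1.
p = sum(prob(list(c), x) for c in product(LABELS, repeat=3))
```

In a real implementation Z(X) is computed with the forward algorithm rather than enumeration.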

7 POS tag Example
Linear-chain CRF
– scores a label sequence L for a sentence
– Feature function example

8 POS tag Example
Linear-chain CRF
– scores a label sequence L for a sentence
– Feature function example
– Build model

9 POS tag Example
Linear-chain CRF
– scores a label sequence L for a sentence
– Feature function example
– Build model
– Learn a weight for each feature
Given sentences and their associated part-of-speech labels, find the weights:
1. Maximize the conditional log-likelihood over the training set D = {[O,L]_1, [O,L]_2, …, [O,L]_n}
2. Gradient ascent until some stopping condition
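The learning recipe above can be sketched on a toy example: the gradient of the conditional log-likelihood is observed feature counts minus expected counts under the model, followed by a gradient-ascent step. The two feature functions, the sentence, and all constants below are invented for illustration, and expectations are computed by brute-force enumeration:

```python
import math
from itertools import product

LABELS = ["N", "V"]

def f0(lp, l, sent, i):   # toy state feature: word ends in "s" and is tagged V
    return 1.0 if sent[i].endswith("s") and l == "V" else 0.0

def f1(lp, l, sent, i):   # toy transition feature: N followed by V
    return 1.0 if lp == "N" and l == "V" else 0.0

FEATS = [f0, f1]

def counts(labels, sent):
    return [sum(f(labels[i - 1] if i else None, labels[i], sent, i)
                for i in range(len(sent))) for f in FEATS]

def log_score(w, labels, sent):
    return sum(wk * ck for wk, ck in zip(w, counts(labels, sent)))

def train(sent, gold, steps=200, lr=0.5):
    w = [0.0, 0.0]
    seqs = [list(c) for c in product(LABELS, repeat=len(sent))]
    for _ in range(steps):
        Z = sum(math.exp(log_score(w, s, sent)) for s in seqs)
        # gradient = observed feature counts - expected counts under the model
        exp_counts = [0.0] * len(FEATS)
        for s in seqs:
            p = math.exp(log_score(w, s, sent)) / Z
            for k, c in enumerate(counts(s, sent)):
                exp_counts[k] += p * c
        obs = counts(gold, sent)
        w = [wk + lr * (o - e) for wk, o, e in zip(w, obs, exp_counts)]
    return w

sent, gold = ["dogs", "runs"], ["N", "V"]
w = train(sent, gold)
```

After training, the gold sequence should be the highest-scoring one under the learned weights.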

10 Differences with LR and HMM
Logistic Regression
– Treat the feature sum ∑_k u_k f_k(s, i, l_i, l_{i-1}) as the linear score β_0 + β^T x
– LR is a log-linear model for classification; a CRF is a log-linear model for sequence labeling.
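For contrast with the CRF's sequence-level model, a log-linear (logistic/softmax) classifier scores a single item independently of its neighbors. A minimal sketch, with invented feature names and weights:

```python
import math

def softmax_scores(feats, weights):
    # weights: label -> {feature: weight}; score_l = sum_k w_lk * f_k
    scores = {l: sum(w.get(k, 0.0) * v for k, v in feats.items())
              for l, w in weights.items()}
    Z = sum(math.exp(s) for s in scores.values())
    return {l: math.exp(s) / Z for l, s in scores.items()}

# Invented weights: digits suggest a table line.
weights = {"table": {"has_digits": 2.0}, "nontable": {"has_digits": -1.0}}
p = softmax_scores({"has_digits": 1.0}, weights)
```

A CRF uses the same exponential/normalize form, but normalizes over whole label sequences rather than one label at a time.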

11 Differences with LR and HMM
An HMM can be written as a CRF:
1. Targets P(Y,X): a generative model. Transition probabilities become weights for transition features, and emission probabilities become weights for state features.
2. Because of data sparseness, a smoothing method is needed to cover all pairs (X,Y).

12 Differences with LR and HMM
An HMM can be written as a CRF:
1. Targets P(Y,X): a generative model
2. Because of data sparseness, a smoothing method is needed to cover all pairs (X,Y)
3. A CRF can use a large set of features, while an HMM is constrained to pairwise transitions and each x_i depends only on its current state.
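The "HMM as CRF" point can be checked numerically: setting the CRF weights to the HMM's log transition and log emission probabilities makes exp(F(Y,X)) equal the HMM joint P(Y,X). The probabilities below are invented:

```python
import math

# Toy HMM parameters (invented).
start = {"N": 0.6, "V": 0.4}
trans = {("N", "N"): 0.3, ("N", "V"): 0.7, ("V", "N"): 0.8, ("V", "V"): 0.2}
emit = {("N", "dog"): 0.5, ("N", "runs"): 0.5,
        ("V", "dog"): 0.1, ("V", "runs"): 0.9}

def hmm_joint(y, x):
    """Generative joint probability P(Y, X)."""
    p = start[y[0]] * emit[(y[0], x[0])]
    for i in range(1, len(y)):
        p *= trans[(y[i - 1], y[i])] * emit[(y[i], x[i])]
    return p

def crf_score(y, x):
    """CRF log-score with weights set to the HMM's log-probabilities."""
    s = math.log(start[y[0]]) + math.log(emit[(y[0], x[0])])
    for i in range(1, len(y)):
        s += math.log(trans[(y[i - 1], y[i])]) + math.log(emit[(y[i], x[i])])
    return s

y, x = ["N", "V"], ["dog", "runs"]
# exp(crf_score) reproduces the HMM joint probability.
```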

13 Outline
CRF
– Models Introduction
– How to model
– Differences with LR & HMMs
Table Extraction
– Labels & Features
– Training & Inference
– Evaluation
– Limitations

14 Table Extraction
CRF & TE (Pinto)
– Emphasizes the necessity of both layout and content features
– Input: plain text of government statistical reports
– Locates the table and labels each line with a tag simultaneously
– The CRF outperformed the heuristic method and gave consistent performance

15 Table Extraction
Labels and feature sets
– Labels
1. Non-table labels: non-table, blank line, separator
2. Header labels: title, super header, table header, sub header, section header
3. Data row labels: data row, section data row
4. Caption labels: table foot, table caption

16 Table Extraction
Labels and feature sets
– Labels
– Features
1. White-space features
2. Text features
3. Separator features

17 Table Extraction
Labels and feature sets
– Labels
– Features
– Feature representation
1. Binary values 0/1 for feature presence
2. Continuous features are also possible, e.g. the percentage of white space
3. Conjunctions of features to capture relationships between labels
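The three feature families and the binary-plus-continuous representation can be sketched for a single line of text. The specific features below are illustrative placeholders, not the exact feature set of Pinto et al.:

```python
def line_features(line):
    """Toy features for one text line: binary flags plus one continuous value."""
    n = max(len(line), 1)
    ws = sum(ch == " " for ch in line)
    return {
        # white-space feature
        "blank_line": 1 if line.strip() == "" else 0,
        # text feature
        "has_digits": 1 if any(ch.isdigit() for ch in line) else 0,
        # separator feature: line made only of -, =, +
        "has_separator": 1 if line.strip() and set(line.strip()) <= set("-=+") else 0,
        # continuous feature: fraction of white space on the line
        "ws_fraction": ws / n,
    }

f = line_features("Region    1990    2000")
```

Conjunctions would then be formed by pairing such features with the labels of neighboring lines.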

18 Table Extraction
Training
– Maximize the conditional log-likelihood: argmax over λ
Inference
– Different algorithms are used: forward-backward, Viterbi, etc.
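Viterbi inference finds the single highest-scoring label sequence by dynamic programming over transition and state scores. A minimal sketch with invented scores (not the paper's model), using a toy table/non-table labeling:

```python
LABELS = ["table", "nontable"]

# Invented per-position state scores for a 3-line "document".
state = {("table", 0): 1.0, ("nontable", 0): 0.2,
         ("table", 1): 0.3, ("nontable", 1): 1.1,
         ("table", 2): 0.9, ("nontable", 2): 0.1}
# Invented transition scores: staying on the same label is rewarded.
trans = {(a, b): (0.5 if a == b else -0.5) for a in LABELS for b in LABELS}

def viterbi(n):
    """Return the argmax label sequence for n positions."""
    best = {l: (state[(l, 0)], [l]) for l in LABELS}
    for i in range(1, n):
        new = {}
        for l in LABELS:
            # Pick the best previous label for ending in l at position i.
            prev, (s, path) = max(
                ((p, best[p]) for p in LABELS),
                key=lambda kv: kv[1][0] + trans[(kv[0], l)])
            new[l] = (s + trans[(prev, l)] + state[(l, i)], path + [l])
        best = new
    return max(best.values(), key=lambda sp: sp[0])[1]

path = viterbi(3)
```

With these numbers the label-consistency bonus outweighs the one weak middle line, so the whole run is labeled as table.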

19 Evaluation
Training data: 52 documents, 31,915 lines of text, 5,764 table lines.
Evaluation results (table shown on slide).

20 Evaluation
Training data: 52 documents, 31,915 lines of text, 5,764 table lines.
Evaluation results (table shown on slide).

21 Thank you! Q&A

