11/12/2018 FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via Imperative Declarations of Structure, Inference and Learning.

1 FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via Imperative Declarations of Structure, Inference and Learning
Andrew McCallum
Information Extraction & Synthesis Laboratory, Department of Computer Science, University of Massachusetts
Joint work with Aron Culotta, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, Sameer Singh.
NIPS Workshop on Probabilistic Programming, December 13, 2008

2 (Linear Chain) Conditional Random Fields
[Lafferty, McCallum, Pereira 2001]
Undirected graphical model, trained to maximize the conditional probability of the output (sequence) given the input (sequence).
[Figure: finite state model and graphical model. Output sequence y_{t-1} ... y_{t+3} (FSM states: OTHER PERSON OTHER ORG TITLE …); observations x_{t-1} ... x_{t+3} (input sequence: said Jones a Microsoft VP …). The formula image that preceded "where" on the slide is not preserved.]
Wide-spread interest, positive experimental results in many applications:
    Noun phrase, Named entity [HLT’03], [CoNLL’03]
    Protein structure prediction [ICML’04]
    IE from Bioinformatics text [Bioinformatics ‘04], …
    Asian word segmentation [COLING’04], [ACL’04]
    IE from Research papers [HLT’04]
    Object classification in images [CVPR ‘04]
Notes: A CRF is simply an undirected graphical model trained to maximize a conditional probability. First explorations with these models centered on finite state models, represented as linear-chain graphical models, with the joint probability distribution over the state sequence Y calculated as a normalized product over potentials on cliques of the graph. As is traditional in NLP and other application areas, these potentials are defined to be log-linear combinations of weights on features of the clique values. The chief excitement from an application point of view is the ability to use rich and arbitrary features of the input without complicating inference. This yielded many good results, for example… and an explosion of interest at many different conferences.
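
The formula on this slide did not survive transcription. As a reading aid, the standard linear-chain CRF form that matches the description above (a normalized product of log-linear clique potentials) can be written as follows. The symbols \lambda_k, f_k and Z(\mathbf{x}) are the usual textbook notation and are an assumption here, not taken from the slide.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})} \prod_{t} \Phi(y_{t-1}, y_t, \mathbf{x}, t),
\qquad
\Phi(y_{t-1}, y_t, \mathbf{x}, t)
  = \exp\Big( \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{t} \Phi(y'_{t-1}, y'_t, \mathbf{x}, t)
```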

3 1. Jointly labeling cascaded sequences Factorial CRFs
[Sutton, Khashayar, McCallum, ICML 2004]
[Figure: stacked label sequences: Named-entity tag, Noun-phrase boundaries, Part-of-speech, English words]
Joint prediction of part-of-speech and noun-phrase in newswire, matching accuracy with only 50% of the training data.
Inference: Loopy Belief Propagation
Notes: In some situations we want to output not a single label sequence but several, for example… Rather than performing each of these labeling tasks serially in a cascade, Factorial CRFs perform all three tasks jointly, providing matching accuracy with only 50% of the training data.

4 2. Jointly labeling distant mentions Skip-chain CRFs
[Sutton, McCallum, SRL 2004]
[Figure: “Senator Joe Green said today … Green ran for …”]
14% reduction in error on most repeated field in seminar announcements.
Inference: Tree reparameterized BP [Wainwright et al, 2002]
See also [Finkel, et al, 2005]
Notes: Skip-chain CRFs allow the model to capture dependencies among the labels of distant words, such as this occurrence of “Green”, for which there is a lot of local evidence that it is a person’s name, and this one, for which there isn’t.

5 3. Joint co-reference among all pairs Affinity Matrix CRF
[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]
“Entity resolution”, “Object correspondence”
[Figure: mentions Mr. Hill, Dana Hill, Amy Hall, Dana, she]
~25% reduction in error on co-reference of proper nouns in newswire.
Inference: Correlation clustering, graph partitioning [Bansal, Blum, Chawla, 2002]
Notes: Traditionally in NLP, co-reference has been performed by making independent coreference decisions on each pair of entity mentions. An Affinity Matrix CRF makes all coreference decisions jointly, accounting for multiple constraints.

6 CRFs Providing Current State-of-the-Art
    POS tagging, Chunking, Named Entity
    Co-reference
    (Dependency) Parsing
    Sequence Alignment
    Schema Alignment
    Semantic Role Labeling
After building many models by hand, we finally feel brave enough to build a general toolkit.
Driven “bottom-up” from our needs & experiences, not “top-down”.

7 Entity Resolution
[Figure: five mentions, each tagged “mention”: Mr. Hill, Dana Hill, Amy Hall, Dana, she]

8 Entity Resolution
[Figure: the mentions Mr. Hill, Dana Hill, Amy Hall, Dana, she partitioned into clusters; two clusters tagged “entity”]

9 Entity Resolution
[Figure: the same mentions under a different partition; two clusters tagged “entity”]

10 Entity Resolution
[Figure: the same mentions under a different partition; three clusters tagged “entity”]

11 Pairwise Affinity CRF for Co-reference
[McCallum & Wellner, 2003, ICML]
[Figure: fully connected graph over the mentions Mr. Hill, Dana Hill, Amy Hall, Dana, she; each edge labeled C or N]

12 Pairwise Affinity is not Enough
[Figure: the same mention graph with a different C/N edge labeling]

13 Pairwise Affinity is not Enough
[Figure: graph over the mentions she, she, Amy Hall, she, she; each edge labeled C or N]

14 Pairwise Comparisons Not Enough
Examples:
    ∀ mentions are pronouns?
    Entities have multiple attributes (name, , institution, location); need to measure “compatibility” among them.
    Having 2 “given names” is common, but not 4, e.g. Howard M. Dean / Martin, Dean / Howard Martin
    Need to measure size of the clusters of mentions.
    ∃ a pair of lastname strings that differ > 5?
We need to ask ∃, ∀ questions about a set of mentions.
We want first-order logic!

16 Partition Affinity CRF
[Figure: a partition containing the mentions she, she, Amy Hall, she, she]
Ask arbitrary questions about all entities in a partition with first-order logic...
... bringing together LOGIC and PROBABILITY

21 This space complexity is common in probabilistic first-order logic models

22 Don’t represent all alternatives...
[Figure: a single configuration of the mentions she, Amy Hall]

23 Don’t represent all alternatives... just one at a time
[Figure: one configuration of the mentions she, Amy Hall turned into another by a “Stochastic Jump” drawn from a “Proposal Distribution”]
Inference by MCMC, e.g. Metropolis-Hastings, like BLOG and others.
Acceptance proportional to ratio of unnormalized scores.
Only examine the differences!
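
To make the acceptance rule concrete, here is a minimal Metropolis-Hastings sketch in Scala. It is not FACTORIE code: the `World` type, the toy `logScore` model and the single-mention `propose` move are all assumptions chosen for illustration, and the proposal is symmetric so the acceptance probability reduces to the ratio of unnormalized scores.

```scala
import scala.util.Random

// Hypothetical toy, not FACTORIE code: five mentions, each assigned to one of two entities.
object MetropolisHastingsSketch {
  type World = Vector[Int] // World(i) = entity id (0 or 1) of mention i

  // Assumed toy model: unnormalized log-score adds 1 for each adjacent pair sharing an entity.
  def logScore(w: World): Double =
    w.sliding(2).count(p => p(0) == p(1)).toDouble

  // Symmetric proposal ("stochastic jump"): move one randomly chosen mention to the other entity.
  def propose(w: World, rng: Random): World = {
    val i = rng.nextInt(w.length)
    w.updated(i, 1 - w(i))
  }

  // For a symmetric proposal: min(1, ratio of unnormalized scores), computed in log space.
  def acceptProb(oldLogScore: Double, newLogScore: Double): Double =
    math.min(1.0, math.exp(newLogScore - oldLogScore))

  // One MCMC step: keep only the current world, never the full set of alternatives.
  def step(w: World, rng: Random): World = {
    val candidate = propose(w, rng)
    if (rng.nextDouble() < acceptProb(logScore(w), logScore(candidate))) candidate else w
  }

  def run(start: World, steps: Int, seed: Long): World = {
    val rng = new Random(seed)
    (1 to steps).foldLeft(start)((w, _) => step(w, rng))
  }
}
```

A call such as `MetropolisHastingsSketch.run(Vector(0, 1, 0, 1, 0), 200, 42L)` walks through 200 proposals while holding a single configuration in memory at any time.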

24 Partition Affinity CRF Experiments
B-Cubed F1 Score on ACE 2004 Noun Coreference [Culotta, Wick, Hall, McCallum, 2007]

                      Likelihood-based Training    Rank-based Training
Partition Affinity    69.2                         79.3
Pairwise              62.4                         72.5

Annotations on the slide: “Better Training” (across the columns), “Better Representation” (across the rows), “State-of-the-art”.
To our knowledge, best previously reported results: % % % (values not preserved)

25 Logic + Probability
Significant interest in this combination
    Poole, Muggleton, DeRaedt, Sato, Domingos,...
We now hypothesize that in this combination the “logic” aspect is mostly a red herring.
    Power: repeated relational structures and tied parameters.
    Logic is one way to specify these structures, but not the only one, and perhaps not the best.
In deterministic programming, Prolog was replaced by imperative languages:
    programmers have to keep the imperative solver in mind after all
    much domain knowledge is procedural anyway
Logical inference replaced by probabilistic inference.

26 Declarative Model Specification
Biggest advance in AI & ML in the last 2 decades.
Gone too far? Much domain knowledge is also procedural.
Logic + Probability → Imperative + Probability
    Rising interest: Church, Csoft,...
Our approach
    Preserve the declarative statistical semantics of factor graphs.
    Provide imperative hooks to define structure, parameterization, inference, learning.
    Efficient. Easy-to-use.

27 Our Design Goals
Represent factor graphs
    emphasis on discriminative undirected models
Scalability
    input data, output configuration, factors, tree-width
    observed data that cannot fit in memory
    exponential number of factors
Efficient discriminative parameter estimation
    sensitive to the expense of inference
Integrate declarative & procedural knowledge
    natural, easy-to-use
    upcoming slides: 3 examples of injecting imperativ-ism into factor graphs

28 FACTORIE Factor Graphs, Imperative, Extensible
Implemented as a library in Scala [Martin Odersky]
    object oriented & functional
    type inference
    lazy evaluation
    everything an object (int, float,...)
    nice syntax for creating “domain-specific languages”
    runs in JVM (complete interoperation with Java)
    “Haskell++ in a Java style”
Library, not new “little language”
    all familiar Java constructs & libraries available to you
    integrate data pre-processing & evaluation with model specification
    Scala makes syntax not too bad.
    But not as compact as a dedicated language (BLOG, MLN)

29 Scala
New variable
    var myHometown: String = ""   // explicit type; a concrete var needs an initial value
    var myAltitude = 10523.2      // type Double is inferred
New constant variable
    val myName = "Andrew"
New method
    def climb(increment: Double) = myAltitude += increment
New class
    class Skier extends Person
New trait (like Java interface with implementations)
    trait FirstAid { def applyBandage = ... }
New class with trait
    class BackcountrySkier extends Skier with FirstAid
New static object [generic]
    object GlobalSkierTable extends ArrayList[Skier]

30 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg)
    class Token(word: String) extends EnumVariable(word)
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing]

31 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq
    class Token(word: String) extends EnumVariable(word) with VarSeq
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing; label.prev and label.next marked on the Labels sequence]

32 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq
    class Token(word: String) extends EnumVariable(word) with VarSeq
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing]

33 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
    }
Avoid representing relations by indices. Do it directly with members, pointers... arbitrary data structure.
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing]

34 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing]

35 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing]

36 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token]
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing]

37 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token]
    object StateTransitionTemplate extends Template2[Label, Label]
[Figure: Labels: T F F T F F; Words: Bill loves skiing Tom loves snowshoeing]

38 Imperativ-ism #1: Jump Function
Proposal “jump function”
    Make changes to world state
    Sometimes simple, sometimes not
        Sample Gaussian with mean at old value
        Sample cluster to split, run stochastic greedy agglomerative clustering
Gibbs sampling, one variable at a time
    poor mixing
Rich jump function
    Natural place to embed domain knowledge about what variables should change in concert.

39 Imperativ-ism #1: Jump Function
Proposal “jump function”
    Make changes to world state
    Sometimes simple, sometimes not
        Sample Gaussian with mean at old value
        Sample cluster to split, run stochastic greedy agglomerative clustering
Gibbs sampling, one variable at a time
    poor mixing
Rich jump function
    Natural place to embed domain knowledge about what variables should change in concert.
    Avoid some expensive deterministic factors with property-preserving jump functions (e.g. coref transitivity, dependency parsing projectivity)

40 Key Operation: Scoring a Proposal
Acceptance probability ~ ratio of model scores. Scores of factors that didn’t change cancel.
To efficiently score:
    Proposal method runs.

41 Key Operation: Scoring a Proposal
Acceptance probability ~ ratio of model scores. Scores of factors that didn’t change cancel.
To efficiently score:
    Proposal method runs.
    Automatically build a list of variables that changed.

42 Key Operation: Scoring a Proposal
Acceptance probability ~ ratio of model scores. Scores of factors that didn’t change cancel.
To efficiently score:
    Proposal method runs.
    Automatically build a list of variables that changed.
    Find factors that touch changed variables.

43 Key Operation: Scoring a Proposal
Acceptance probability ~ ratio of model scores. Scores of factors that didn’t change cancel.
To efficiently score:
    Proposal method runs.
    Automatically build a list of variables that changed.
    Find factors that touch changed variables.
    Find other (unchanged) variables needed to calculate those factors’ scores.

45 Key Operation: Scoring a Proposal
Acceptance probability ~ ratio of model scores. Scores of factors that didn’t change cancel.
To efficiently score:
    Proposal method runs.
    Automatically build a list of variables that changed.
    Find factors that touch changed variables.
    Find other (unchanged) variables needed to calculate those factors’ scores.
How to find factors from variables & vice versa?
    In BLOG, a rich, highly-indexed data structure stores the mapping variables ↔ factors.
    But it is complex to maintain as structure changes.
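
The scoring steps listed on these slides can be sketched as follows. This is an illustrative toy, not the FACTORIE implementation: the `Factor` case class, the chain model in `chainFactors` and every function name are assumptions. The point it shows is that summing score changes over only the factors touching a changed variable gives the same difference as rescoring the whole model.

```scala
object DiffScoringSketch {
  // Hypothetical factor: indices of the variables it touches plus a log-score function.
  final case class Factor(vars: Set[Int], logScore: Vector[Int] => Double)

  // Assumed toy chain model: one agreement factor per adjacent pair of variables.
  def chainFactors(n: Int): Vector[Factor] =
    (0 until n - 1).toVector.map { i =>
      Factor(Set(i, i + 1), w => if (w(i) == w(i + 1)) 1.0 else 0.0)
    }

  // Reference implementation: score every factor in the model.
  def fullScore(fs: Seq[Factor], w: Vector[Int]): Double =
    fs.map(_.logScore(w)).sum

  // Step 2 on the slide: the list of variables that changed.
  def changed(oldW: Vector[Int], newW: Vector[Int]): Set[Int] =
    oldW.indices.filter(i => oldW(i) != newW(i)).toSet

  // Steps 3 and 4: find factors touching a changed variable and rescore only those.
  def diffScore(fs: Seq[Factor], oldW: Vector[Int], newW: Vector[Int]): Double = {
    val ch = changed(oldW, newW)
    val touched = fs.filter(f => f.vars.exists(ch.contains))
    touched.map(f => f.logScore(newW) - f.logScore(oldW)).sum
  }
}
```

In this sketch the touched factors are found by a linear scan over all factors, which is only for brevity; the talk's own approach to finding them is the subject of the next slide.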

46 Imperativ-ism #2: Model Structure
Maintain no map structure between factors and variables.
Finding factors is easy. Usually # templates < 50.
Primitive operation: given a factor template and one changed variable, find the other variables.
In the factor Template object, define imperative methods that do this:
    complete1(v1) returns (v1,v2,v3)
    complete2(v2) returns (v1,v2,v3)
    complete3(v3) returns (v1,v2,v3)
I.e., use a Turing-complete language to determine structure on the fly.
    If you want to use a data structure instead, access it in the method.
    If you want a higher-level language for specifying structure, write it in terms of this primitive.
Other nice attributes
    Easy to do value-conditioned structure. Case Factor Diagrams, etc.
    Not only avoid unrolling, don’t even allocate all factors for current config.
Avoid object look-up by index! (Hairy when objects created/destroyed)
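
As a rough illustration of the completion primitive, the sketch below defines a transition template over a pointer-linked chain of labels. The `Label` class, `chain` helper and `factorsTouching` method are hypothetical stand-ins written for this example, not FACTORIE classes; only the names `complete1` and `complete2` come from the slide.

```scala
object CompletionSketch {
  // Hypothetical stand-in for the talk's Label variable.
  final class Label(val id: Int) {
    var prev: Option[Label] = None
    var next: Option[Label] = None
  }

  // Link labels into a chain so neighbors are reached by pointer, not by index look-up.
  def chain(n: Int): Vector[Label] = {
    val labels = Vector.tabulate(n)(i => new Label(i))
    labels.zip(labels.drop(1)).foreach { case (a, b) => a.next = Some(b); b.prev = Some(a) }
    labels
  }

  // A template over (Label, Label): given one changed variable in either argument
  // position, return the complete neighbor tuples of the factors it takes part in.
  object TransitionTemplate {
    def complete1(l: Label): List[(Label, Label)] = l.next.map(n => (l, n)).toList
    def complete2(l: Label): List[(Label, Label)] = l.prev.map(p => (p, l)).toList
    def factorsTouching(l: Label): List[(Label, Label)] = complete1(l) ++ complete2(l)
  }
}
```

Using `Option` for `prev` and `next` is a design choice of this sketch so that the first and last labels yield one factor instead of a null neighbor.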

47 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token]
    object StateTransitionTemplate extends Template2[Label, Label]
[Figure: Labels and Words sequences]

49 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token] {
      def complete1(label: Label) = List((label, label.token))
    }
    object StateTransitionTemplate extends Template2[Label, Label]
[Figure: Labels and Words sequences]

50 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token] {
      def complete1(label: Label) = List((label, label.token))
      def complete2(token: Token) = throw new Error("Tokens shouldn't change")
    }
    object StateTransitionTemplate extends Template2[Label, Label]
[Figure: Labels and Words sequences]

51 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token] {
      def complete1(label: Label) = List((label, label.token))
      def complete2(token: Token) = throw new Error("Tokens shouldn't change")
    }
    object StateTransitionTemplate extends Template2[Label, Label] {
      def complete1(label: Label) = List((label, label.next))
    }

52 Example: Linear-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token] {
      def complete1(label: Label) = List((label, label.token))
      def complete2(token: Token) = throw new Error("Tokens shouldn't change")
    }
    object StateTransitionTemplate extends Template2[Label, Label] {
      def complete1(label: Label) = List((label, label.next))
      def complete2(label: Label) = List((label.prev, label))
    }

53 Imperativ-ism #3: Neighbor-Sufficient Map
“Neighbor Variables” of a factor
    Variables touching the factor
“Sufficient Statistics” of a factor
    Vector; dot product with the weights of a log-linear factor → the factor’s score
Usually confounded. Separate them.
Skip-chain NER: instead of 5x5 parameters, just 2.
    (label1, label2) → label1 == label2
[Figure: Words: Bill loves Paris Bill the painter ...; Labels: PER O LOC PER O O]
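
The parameter saving on this slide can be illustrated with a small sketch. The five-label set and the two weight values below are assumptions made up for the example (the slide only shows PER, LOC and O); the mapping `sufficient` follows the slide's `(label1, label2) → label1 == label2`.

```scala
object SkipFactorSketch {
  // Assumed 5-label set; only PER, LOC and O appear on the slide.
  val labels: Vector[String] = Vector("PER", "LOC", "ORG", "MISC", "O")

  // Confounded parameterization: one weight per pair of neighbor values.
  val fullParameterCount: Int = labels.size * labels.size

  // Sufficient statistic from the slide: do the two neighbor labels agree?
  def sufficient(l1: String, l2: String): Boolean = l1 == l2

  // Separated parameterization: two weights, indexed by the statistic (illustrative values).
  val weights: Map[Boolean, Double] = Map(true -> 1.5, false -> -0.5)

  // The factor's log-score depends on the neighbors only through the statistic.
  def score(l1: String, l2: String): Double = weights(sufficient(l1, l2))
}
```

The neighbors of the factor are still the two labels, so the factor is rescored whenever either label changes, while learning only has to estimate two numbers.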

55 Example: Skip-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token] {
      def complete1(label: Label) = List((label, label.token))
      def complete2(token: Token) = throw new Error("Tokens shouldn't change")
    }
    object StateTransitionTemplate extends Template2[Label, Label] {
      def complete1(label: Label) = List((label, label.next))
      def complete2(label: Label) = List((label.prev, label))
    }
    object SkipTemplate extends Template1[Bool] with Neighbors[Label, Label]

56 Example: Skip-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token] {
      def complete1(label: Label) = List((label, label.token))
      def complete2(token: Token) = throw new Error("Tokens shouldn't change")
    }
    object StateTransitionTemplate extends Template2[Label, Label] {
      def complete1(label: Label) = List((label, label.next))
      def complete2(label: Label) = List((label.prev, label))
    }
    object SkipTemplate extends Template1[Bool] with Neighbors[Label, Label] {
      def complete1(label: Label) =
        for (other <- label.seq; if label.token == other.token) yield (label, other)
    }

57 Example: Skip-Chain CRF for Segmentation
    class Label(isBeg: Boolean) extends Bool(isBeg) with VarSeq { val token: Token }
    class Token(word: String) extends EnumVariable(word) with VarSeq {
      val label: Label
      def longerThanSix = word.length > 6
    }
    // Factor templates
    object StateTemplate extends Template1[Label]
    object StateTokenTemplate extends Template2[Label, Token] {
      def complete1(label: Label) = List((label, label.token))
      def complete2(token: Token) = throw new Error("Tokens shouldn't change")
    }
    object StateTransitionTemplate extends Template2[Label, Label] {
      def complete1(label: Label) = List((label, label.next))
      def complete2(label: Label) = List((label.prev, label))
    }
    object SkipTemplate extends Template1[Bool] with Neighbors[Label, Label] {
      def complete1(label: Label) =
        for (other <- label.seq; if label.token == other.token) yield (label, other)
      def sufficient(l1: Label, l2: Label) = l1 == l2
    }

58 Extensibility
Many variable types provided:
    boolean, int, float, String, categorical,...
Create new ones!
    set-valued variable
    finite-state machine as a variable [JHU]
Create new factor types
    Poisson, Dirichlet,...

59 Parameter Estimation in Large State Spaces
Most methods require calculating the gradient of the log-likelihood, P(y1, y2, y3,... | x1, x2, x3,...)...
...which in turn requires “expectations of marginals,” P(y1 | x1, x2, x3,...)
But getting marginal distributions by sampling can be inefficient due to the large sample space.
Alternative: Perceptron. Approximate the gradient from the difference between the true output and the model’s predicted best output.
But even finding the model’s predicted best output is expensive.
Our Method: “Sample Rank” [Culotta, Wick, Hall, McCallum, HLT 2007]
    Learn to rank intermediate solutions:
    P(y1=1, y2=0, y3=1,... | ...) > P(y1=0, y2=0, y3=1,... | ...)
The main issue these approximations try to address is space complexity of training.

60 Ranking Intermediate Solutions Example
[Figure: a sequence of intermediate solutions numbered 1 to 5, annotated with the change in model score and in truth score at each step; one step is marked UPDATE]
    1.
    2. ∆ Model = -23, ∆ Truth = -0.2
    3. ∆ Model = 10, ∆ Truth = -0.1
    4. ∆ Model = -10, ∆ Truth = -0.1
    5. ∆ Model = 3, ∆ Truth = 0.3
Like Perceptron: proof of convergence under Marginal Separability.
Parameters must correctly rank incorrect solutions.
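
A simplified reading of rank-based training is sketched below. It is not the published SampleRank algorithm: the rule in `needsUpdate` (update when the model and the truth disagree on the sign of the change) and the perceptron-style `update` are assumptions made for illustration. The four (∆ Model, ∆ Truth) pairs are the ones listed on the slide.

```scala
object RankUpdateSketch {
  // Assumed rule: update only when model and truth disagree about which of two
  // neighboring configurations is better (the two deltas have opposite signs).
  def needsUpdate(deltaModel: Double, deltaTruth: Double): Boolean =
    deltaModel * deltaTruth < 0

  // Perceptron-style step: move weights toward the features of the truly better configuration.
  def update(w: Vector[Double], featBetter: Vector[Double],
             featWorse: Vector[Double], rate: Double): Vector[Double] =
    w.indices.toVector.map(i => w(i) + rate * (featBetter(i) - featWorse(i)))

  // (step, deltaModel, deltaTruth) for steps 2 to 5 as listed on the slide.
  val slideSteps: Vector[(Int, Double, Double)] =
    Vector((2, -23.0, -0.2), (3, 10.0, -0.1), (4, -10.0, -0.1), (5, 3.0, 0.3))

  // Steps at which the assumed rule would trigger an update.
  val stepsNeedingUpdate: Vector[Int] =
    slideSteps.collect { case (k, dm, dt) if needsUpdate(dm, dt) => k }
}
```

Under this assumed rule only step 3 qualifies, since the model score rises there while the truth score falls. Because each update compares two neighboring configurations, it needs neither marginals nor the model's best overall output.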

61 FACTORIE Summary
Factor graphs,
...scalable
    factors created on demand, only score diffs
...with imperative hooks
    jump function, override variable.set() for coordination
    model structure
    neighbor variables → sufficient statistics
...discriminative
    efficient online training by sample rank
Combine declarative & procedural knowledge

64 Logic in FACTORIE
Easy to implement first-order logic primitives imperatively...
    ForAll(collection, test)
e.g. all mentions are capitalized
    ForAll(mentions, m => m.isCapitalized)
e.g. all mentions in an entity are capitalized
    ForAll(entity.mentions, m => m.isCapitalized)
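
One way such primitives might be written in plain Scala is sketched here. The `Mention` case class and the `Exists` counterpart are assumptions added for the example; `ForAll` keeps the two-argument shape shown on the slide, and how FACTORIE itself defines it is not shown in the talk.

```scala
object LogicSketch {
  // Hypothetical mention type carrying the isCapitalized test used on the slide.
  final case class Mention(text: String) {
    def isCapitalized: Boolean = text.headOption.exists(_.isUpper)
  }

  // Universal and existential quantification over a collection.
  def ForAll[A](collection: Iterable[A], test: A => Boolean): Boolean = collection.forall(test)
  def Exists[A](collection: Iterable[A], test: A => Boolean): Boolean = collection.exists(test)

  val mentions: List[Mention] = List(Mention("Powell"), Mention("Mr. Powell"), Mention("he"))

  // The lambda parameter type is written out so the call compiles on Scala 2 as well as Scala 3.
  val allCapitalized: Boolean = ForAll(mentions, (m: Mention) => m.isCapitalized)
  val someCapitalized: Boolean = Exists(mentions, (m: Mention) => m.isCapitalized)
}
```

With the three example mentions, `allCapitalized` is false because of the pronoun and `someCapitalized` is true.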

65 Ranking vs Classification Training
Instead of training
    [Powell, Mr. Powell, he] --> YES
    [Powell, Mr. Powell, she] --> NO
...Rather...
    [Powell, Mr. Powell, he] > [Powell, Mr. Powell, she]
In general, the higher-ranked example may contain errors
    [Powell, Mr. Powell, George, he] > [Powell, Mr. Powell, George, she]

66 Outline
Information Extraction
    entity extraction
    review: entity resolution, joint inference → accuracy++!
    entity resolution + canonicalization + schema alignment
Science & Tools for Machine Learning & Inference
    new probabilistic programming language: “FACTORIE”
    quickly training extractors when schema changes: “GE”
Alignment for “Dipping”
    Learn CRF for robust alignment
Structured Topic Models
    Fast on large collections: multi-core, compacted
    Incorporate arbitrary structured data: “DMR”
    Incorporate arbitrary graph structure: “GMRF”

67 Possible Future Work
Contact: Andrew McCallum
Joint inference for information extraction, entity resolution, ...
    Probabilistic Programming Language
    Use Generalized Expectation to enable rapid training
Learning alignments between DB and Text
    “Dipping a DB into a text collection”
    Use Generalized Expectation to inject domain knowledge
Topic modeling over “structured data” + “structured graphs”
    Current work just scratches the surface
Fast topic models
    parallelism
    special methods for streaming data
Resource-bounded Information Gathering for Information Extraction

68 End of Talk

