Inducing Relations


1 Inducing Relations
Document 1: ... Boston was founded on November 17, 1630, by Puritan colonists from England ...
Document 2: ... New York City was settled by Europeans from The Netherlands in 1624 ...
Document 3: ... San Francisco was founded in 1776 by the Spanish conquerors ...
Goal: Discover types of information salient to a domain and extract short phrases representing them.

2 Application: Creating Metadata
Machine-readable access mechanism for searching, browsing, and retrieving text.
Medical Records
  Location: Rotator cuff
  Severity: Mild
  Tear Length: 2mm
Disaster Reports
  Injuries: None
  Location: Melbourne, FL
  Time: Tuesday morning

3 Application: Generating Infoboxes
Exploring the important attributes of new domains automatically.
Example infoboxes: Cambridge, Seattle

4 Regularities for Learning Relations
– Local lexical and orthographic similarity in expression of relation instances
  Examples: "… injured six people", "… injured 16 relief workers", "… four were hurt"
  Shared pattern: {number} injur*
– Recurring specific syntactic patterns in relation occurrence
  Figure labels: evoking word (injured, killed), relation phrase (six, Three)
– Similar document-level positioning of relations

5 Example of document-level positioning (for the previous slide)
Beginning of document: A strong earthquake with a magnitude of 5.6 rocked the easternmost province of Irian Jaya on Friday.
End of document: An earthquake of magnitude 6 is considered “severe,” capable of widespread damage near the epicenter.

6 Highlights of the Approach
– Novel source of supervision: declarative human knowledge about constraints
– Rich combination of information sources spanning multiple layers of linguistic analysis
– Mathematical formalism that guides unsupervised learning with human knowledge

7 Input Representation
Each potential indicator word and argument phrase is encoded with features. Each row below is a feature; each column is one candidate.
Indicators:
  is_verb      0  1  0
  earthquake   1  0  0
  hit          0  1  0
  ...
Arguments:
  has_capital  0  1  1
  is_number    0  0  0
  height       1  1  2
  ...
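A minimal sketch of how such an encoding might look. The `Token` and `Phrase` containers and the exact feature definitions are hypothetical, chosen only to mirror the feature names on the slide (`is_verb`, word identity, `has_capital`, `is_number`, `height`). POS tags and subtree heights are assumed to come from an upstream tagger and parser.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    """A candidate indicator word (hypothetical container)."""
    word: str
    pos: str  # Penn Treebank tag, assumed to be supplied by a tagger


@dataclass
class Phrase:
    """A candidate argument phrase (hypothetical container)."""
    words: List[str]
    height: int  # height of the phrase's subtree, assumed given by a parser


def _looks_numeric(word: str) -> bool:
    """True for digit strings such as '16', '6.7', or '1,200'."""
    return word.replace(",", "").replace(".", "", 1).isdigit()


def indicator_features(tok: Token, vocabulary: List[str]) -> Dict[str, int]:
    """Encode one candidate indicator as a feature-name -> value map."""
    feats = {"is_verb": int(tok.pos.startswith("VB"))}
    for w in vocabulary:
        # One binary word-identity feature per vocabulary entry.
        feats[w] = int(tok.word.lower() == w)
    return feats


def argument_features(phrase: Phrase) -> Dict[str, int]:
    """Encode one candidate argument phrase as a feature-name -> value map."""
    words = phrase.words
    return {
        "has_capital": int(any(w[:1].isupper() for w in words)),
        "is_number": int(bool(words) and all(_looks_numeric(w) for w in words)),
        "height": phrase.height,
    }
```

For example, `indicator_features(Token("hit", "VBD"), ["earthquake", "hit"])` yields a column with `is_verb` and `hit` set to 1 and `earthquake` set to 0.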

8 Output Representation
Relation instances as indicator word and argument phrase pairings.
Figure: indicator "injured" (VBN) paired with argument "six people" (NP), shown with its syntactic context (S, VP, NP nodes) plus discourse context.

9 Our Two-Pronged Approach
Model structure: a generative model of hidden indicator and argument structure
– Models local lexical and syntactic similarity
– Biases toward consistent document-level structure
Soft declarative constraints: enforced during inference via posterior regularization
– Restrict global syntactic patterns
– Enforce relation instance uniqueness

10 Generating Indicators and Arguments
For a single relation type, indicators and arguments are drawn from relation-specific feature distributions.
Legend (symbols lost in the transcript): parameters of indicator feature distributions; parameters of argument feature distributions

11 Backoff Distributions
Remaining constituents are generated from backoff feature distributions.
Legend (symbols lost in the transcript): parameters of indicator feature distributions; parameters of argument feature distributions

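12 Multiple Relations
Each relation has its own feature distributions. Constituent features are drawn from a pointwise product over all relations:
– Indicator words: either the indicator or the backoff distribution for each relation
– Argument phrases: either the argument or the backoff distribution for each relation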
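The pointwise product can be sketched as follows, assuming each distribution is a plain dictionary over the same set of feature values. The function names and the `"indicator"` / `"backoff"` keys are illustrative, not taken from the slides.

```python
from typing import Dict, List, Set


def pointwise_product(distributions: List[Dict[str, float]]) -> Dict[str, float]:
    """Multiply distributions value by value, then renormalize to sum to 1."""
    scores = {}
    for value in distributions[0]:
        score = 1.0
        for dist in distributions:
            score *= dist[value]
        scores[value] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("product has no probability mass")
    return {value: score / total for value, score in scores.items()}


def indicator_word_distribution(
    relation_params: List[Dict[str, Dict[str, float]]],
    selected_by: Set[int],
) -> Dict[str, float]:
    """Feature distribution for one candidate indicator word.

    For each relation k, use its indicator distribution if the word was
    selected as that relation's indicator, otherwise its backoff distribution.
    """
    chosen = [
        params["indicator"] if k in selected_by else params["backoff"]
        for k, params in enumerate(relation_params)
    ]
    return pointwise_product(chosen)
```

Argument phrases would be handled the same way, with argument distributions in place of indicator distributions.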

13 Selecting Relation Locations
– Relation instance locations within a document are drawn from a shared distribution
– Indicator and argument within the sentence are selected uniformly at random
Figure: four example documents (Documents 1 to 4), each shown as a sequence of two to four sentences.

14 Summary of Generative Process
1. For each relation:
   a. Draw indicator, argument, and backoff distributions
   b. Draw location distribution

15 Summary of Generative Process (continued)
2. For each document:
   a. For each relation:
      i. Select a sentence (or null)
      ii. Draw argument and indicator positions uniformly at random within the sentence
   b. For each potential indicator word:
      i. Draw indicator features from the indicator distribution if this word is selected as an indicator, otherwise from the backoff distribution
   c. For each potential argument phrase:
      i. Draw argument features
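The per-document steps (2a to 2c) can be sketched for a single relation as below. This is an illustration under simplifying assumptions, not the authors' implementation: one relation only (so no pointwise product across relations), one categorical feature per constituent, and hypothetical dictionary keys such as `"location"` and `"indicator_backoff"`. The location distribution is assumed to cover only sentence indices that exist in the document, plus `None` for "relation absent".

```python
import random


def sample_categorical(dist, rng):
    """Draw one outcome from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_document(sentences, relation, rng):
    """Sample one document's hidden structure and features.

    sentences: list of dicts with candidate "indicators" (word positions)
        and "arguments" (phrase ids).
    relation: dict with a "location" distribution over sentence indices
        (or None) and feature distributions "indicator", "indicator_backoff",
        "argument", "argument_backoff".
    """
    # Step 2a: pick a sentence (or None), then positions uniformly at random.
    chosen = None
    s = sample_categorical(relation["location"], rng)
    if s is not None:
        sent = sentences[s]
        if sent["indicators"] and sent["arguments"]:
            chosen = (s, rng.choice(sent["indicators"]), rng.choice(sent["arguments"]))

    features = {}
    for si, sent in enumerate(sentences):
        # Step 2b: indicator features come from the relation's indicator
        # distribution if the word was selected, otherwise from backoff.
        for w in sent["indicators"]:
            selected = chosen is not None and chosen[0] == si and chosen[1] == w
            dist = relation["indicator"] if selected else relation["indicator_backoff"]
            features[("indicator", si, w)] = sample_categorical(dist, rng)
        # Step 2c: argument features, handled analogously.
        for a in sent["arguments"]:
            selected = chosen is not None and chosen[0] == si and chosen[2] == a
            dist = relation["argument"] if selected else relation["argument_backoff"]
            features[("argument", si, a)] = sample_categorical(dist, rng)
    return chosen, features
```

Step 1 (drawing the distributions themselves) is left out because the priors did not survive in the transcript.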

16 Model Properties
– Lexical similarity: via features
– Recurring syntactic patterns: via features and constraints during learning
– Regularities in document-level structure: via document location distribution
– Issue: how do we break symmetry between relations? Via constraints during learning

17 Variational Inference with Declarative Constraints
– Desired posterior: over model parameters and hidden structure (relations), given observed data (words, trees)
– Optimize variational objective with mean field factorization
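The slide's equations did not survive extraction. In the standard posterior regularization setup, which the deck names as its inference method, the objective would typically take the form below. All symbols are assumptions introduced here: θ for model parameters, z for hidden relation structure, x for observed words and trees, f for constraint functions, and b for their thresholds.

```latex
\begin{aligned}
&\text{Desired posterior:} && p(\theta, z \mid x) \\
&\text{Variational objective:} && \min_{q}\ \mathrm{KL}\big(q(\theta, z)\,\|\,p(\theta, z \mid x)\big)
  \quad \text{s.t.}\quad \mathbb{E}_{q}\big[f(z)\big] \le b \\
&\text{Mean field factorization:} && q(\theta, z) = q(\theta)\, q(z)
\end{aligned}
```

A lower-bound constraint (for example, "at least 80% of relations match a pattern") fits the same form after negating both f and b.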

18 Syntactic Constraints
Constraint function: counts the number of relations that match a canonical syntactic pattern.
Threshold: fraction of relations that must match a syntactic pattern (80%).
Biases toward relations that are syntactically plausible:
– Indicator is verb and argument is object of indicator
– Indicator is noun and argument is modifier
– Indicator and argument are subject/object of same verb
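A rough sketch of checking this constraint in expectation, assuming each candidate indicator/argument pairing carries a posterior probability and a precomputed flag saying whether it matches one of the canonical patterns. The function name and input layout are hypothetical. Only the 80% threshold comes from the slide.

```python
from typing import List, Tuple


def syntactic_constraint_satisfied(
    candidates: List[Tuple[float, bool]],
    threshold: float = 0.8,
) -> bool:
    """Check whether enough expected relation mass matches a syntactic pattern.

    candidates: (posterior probability, matches_pattern) for each candidate
        indicator/argument pairing.
    """
    expected_total = sum(prob for prob, _ in candidates)
    expected_match = sum(prob for prob, matches in candidates if matches)
    if expected_total == 0:
        return True  # no relations predicted, so nothing can violate the constraint
    return expected_match / expected_total >= threshold
```

During posterior regularization the constraint shapes the posterior rather than being checked after the fact, so this function only illustrates what the constraint measures.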

19 Separation Constraints (Argument)
Constraint function: counts the number of relations whose arguments include a given word.
Threshold: no more than one relation should share the same argument word.
Encourages relations to be diverse:
– Arguments cannot be shared...

20 Separation Constraints (Indicator)
Constraint function: counts the number of relations whose indicators include a given word.
Threshold: allow some relations to share indicator words.
Encourages relations to be diverse:
– Indicators can be shared to an extent
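Both separation constraints count, for each word, how many relations use it in a given role. A simplified sketch with hard counts follows (the actual constraints are soft and enforced in expectation). The instance layout and function names are hypothetical, and the indicator bound of 2 in the usage note is a placeholder, since the slides do not give the value.

```python
from collections import defaultdict
from typing import Dict, List, Set


def relations_per_word(instances: List[dict], role: str) -> Dict[str, Set[int]]:
    """Map each word to the set of relation ids that use it in `role`."""
    usage = defaultdict(set)
    for inst in instances:
        for word in inst[role]:
            usage[word].add(inst["relation"])
    return usage


def separation_violations(instances: List[dict], role: str, max_shared: int) -> Dict[str, int]:
    """Return words used in `role` by more than `max_shared` relations."""
    usage = relations_per_word(instances, role)
    return {w: len(rels) for w, rels in usage.items() if len(rels) > max_shared}
```

With `max_shared=1` for `"argument"` this mirrors the rule that no more than one relation should use the same argument word. A looser bound for `"indicator"`, such as 2, would allow limited sharing.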

21 Experimental Setup
Experiments on two news domains:

Corpus      Number of Documents  Sentences/Document  Words/Document  Relation Types
Finance     100                  12.1                262.9           15
Earthquake  200                  9.3                 210.3           9

Example document: A strong earthquake rocked the Philippines island of Mindoro early Tuesday, killing at least two people and causing some damage, authorities said. The 3:15 am quake had a preliminary magnitude of 6.7 and was centered near Baco on northern Mindoro Island, about 75 miles south of Manila, according to the Philippine Institute of Vulcanology and Seismology. The U.S. Geological Survey in Menlo Park, Calif., put the quake's preliminary magnitude at 7.1. Gov. Rodolfo Valencia of the island's Oriental Mindoro province said two people reportedly were killed and that several buildings and bridges were damaged by the quake. Several homes near the shore reportedly were washed away by large waves, Valencia told Manila radio station DZBB. Telephone service was cut, he said. The quake swayed tall buildings in Manila. Institute spokesman Aris Jimenez said the quake occurred on the Lubang fault, one of the area's most active. A magnitude 6 quake can cause severe damage if centered under a populated area, while a magnitude 7 quake indicates a major quake capable of widespread, heavy damage.

22 Extracted Relations
Relations highlighted in the example document: Location, Magnitude, Time
A strong earthquake rocked the Philippines island of Mindoro early Tuesday, killing at least two people and causing some damage, authorities said. The 3:15 am quake had a preliminary magnitude of 6.7 and was centered near Baco on northern Mindoro Island...

23 Generic versus Domain-specific Knowledge
Generic feature representation
– Indicator: word, POS, word stem
– Argument: word, syntax label, headword of parent, dependency label to parent
Domain-specific knowledge (relation independent)
– Finance: prefer arguments with numbers
– Earthquake: prefer relations in first two sentences of each document

24 Main Results (Sentence F-score)
Figure: results for the Finance and Earthquake domains.
Systems compared:
– USP: unsupervised semantic parsing (Poon and Domingos 2009)
– CLUTO: CLUTO sentence clustering
– Mallows: Mallows content model sentence clustering (Chen et al. 2009)

25 Main Results (Token F-score)
Figure: results for the Finance and Earthquake domains.
Systems compared:
– USP: unsupervised semantic parsing (Poon and Domingos 2009)
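The results are reported as sentence-level and token-level F-scores. The slides do not spell out the matching protocol, so the sketch below assumes the simplest reading: predictions and gold annotations are sets of identifiers, and F-score is the harmonic mean of precision and recall over those sets. The identifier shapes are hypothetical.

```python
from typing import Hashable, Iterable


def f_score(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> float:
    """Harmonic mean of precision and recall over two sets of identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For a sentence-level score the identifiers could be `(document_id, sentence_index)` pairs. For a token-level score they could be `(document_id, sentence_index, token_index)` triples.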

26 Constraint Ablation Analysis
What happens as we modify declarative constraints?
– No-sep: no separation constraints
– No-syn: no syntactic constraints
– Hard-syn: always enforce syntactic constraints
Figure: results for the Finance and Earthquake domains.

27 What if we had Annotated Data?

