1 Grammar Extraction and Refinement from an HPSG Corpus
Kiril Simov
BulTreeBank Project (www.BulTreeBank.org)
Linguistic Modeling Laboratory, Bulgarian Academy of Sciences
kivs@bultreebank.org
ESSLLI'2002 Workshop on Machine Learning Approaches in Computational Linguistics
August 5 - 9, 2002

2 Plan of the Talk
- DOP model
- An HPSG corpus: definition
- Formalism for HPSG
- Extraction of an HPSG grammar from an HPSG corpus
- Refinement of an HPSG grammar
- Conclusion

3 DOP Model [Bod 1998]
- Grammar formalism for the target grammar
- Procedure for the construction of sentence analyses in the chosen grammar formalism
- Decomposition procedure, which extracts a grammar in the target grammar formalism from the structures in the corpus
- A performance model guiding the analysis of new sentences with respect to some desirable conditions

4 DOP Model (2)
Two additional unspoken assumptions are:
  – The structures in the corpus are decomposable into the grammar formalism
  – The extracted grammar should neither overgenerate nor undergenerate with respect to the training corpus
This assumption refers to the quality of the corpus

5 Corpus in a Grammar Formalism
A corpus C in a given grammatical formalism G is a sequence of analyzed sentences, where each analyzed sentence is a member of the set of structures defined as the strong generative capacity (SGC) of a grammar Γ in this grammatical formalism:
∀S. S ∈ C → S ∈ SGC(Γ)
and
∀S. S ∈ C → [?]S'. (S' ∈ [?]([?](S)) [?] S' ∈ C)

6 HPSG Corpus
- Strong generative capacity in HPSG is defined by (King 1999) and (Pollard 1999) in technically the same way
- In our work we consider the elements of the strong generative capacity in HPSG to be a special kind of feature graphs based on a logic of HPSG: King's logic, SRL

7 Feature Graphs (1)
Σ = ⟨S, F, A⟩ is a finite SRL signature.
G = ⟨N, V, ρ, T⟩ is a feature graph iff G is a directed, connected and rooted graph such that
- N is a set of nodes,
- V : N × F → N is a partial arc function,
- ρ is the root node,
- T : N → S is a total species assignment function
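The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `FeatureGraph`, the function `is_feature_graph`, and the toy species and features are hypothetical and not taken from the talk; "connected and rooted" is read here as "every node is reachable from the root"; and the appropriateness component A of the signature is not checked.

```python
from dataclasses import dataclass

@dataclass
class FeatureGraph:
    nodes: set      # N: the set of nodes
    arcs: dict      # V: partial arc function, (node, feature) -> node
    root: str       # the root node
    species: dict   # T: total species assignment, node -> species

def is_feature_graph(g, species_set, features):
    """Check the structural conditions of the definition (A is ignored)."""
    # The root is a node and T is total: every node has a species from S.
    if g.root not in g.nodes or set(g.species) != set(g.nodes):
        return False
    if any(sp not in species_set for sp in g.species.values()):
        return False
    # V may be partial, but every arc it defines must stay inside N and F.
    for (src, feat), tgt in g.arcs.items():
        if src not in g.nodes or tgt not in g.nodes or feat not in features:
            return False
    # Rooted and connected, read here as: every node is reachable from the root.
    seen, stack = {g.root}, [g.root]
    while stack:
        node = stack.pop()
        for (src, _feat), tgt in g.arcs.items():
            if src == node and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen == set(g.nodes)

# Hypothetical toy signature and graph, not taken from the talk.
SPECIES = {"sign", "verb", "noun"}
FEATURES = {"HEAD", "SUBJ"}
g = FeatureGraph(
    nodes={"n0", "n1", "n2"},
    arcs={("n0", "HEAD"): "n1", ("n0", "SUBJ"): "n2"},
    root="n0",
    species={"n0": "sign", "n1": "verb", "n2": "noun"},
)
print(is_feature_graph(g, SPECIES, FEATURES))
```

Storing V as a dictionary keyed by (node, feature) makes it a partial function by construction: a missing key simply means the arc is undefined.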

8 Feature Graphs (2)
Some notions:
- Subsumption based on isomorphism
- Unification: there is no most general unifier
- Complete feature graphs: all information from the signature is present
- Paths
- Subgraphs
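One possible reading of "subsumption based on isomorphism" is that a graph subsumes another when it is isomorphic to a part of the other graph that starts at its root. The sketch below follows that reading only; the slide does not spell out the definition, so the injectivity requirement, the `Graph` tuple and the function name `subsumes` are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical container: root node, node -> species, (node, feature) -> node.
Graph = namedtuple("Graph", "root species arcs")

def subsumes(general, specific):
    """True if `general` embeds into `specific`, root to root (assumed reading)."""
    mapping = {general.root: specific.root}
    stack = [general.root]
    while stack:
        node = stack.pop()
        image = mapping[node]
        # Corresponding nodes must carry the same species.
        if general.species[node] != specific.species[image]:
            return False
        for (src, feat), tgt in general.arcs.items():
            if src != node:
                continue
            # Every arc of the general graph must exist in the specific one.
            if (image, feat) not in specific.arcs:
                return False
            tgt_image = specific.arcs[(image, feat)]
            if tgt not in mapping:
                mapping[tgt] = tgt_image
                stack.append(tgt)
            elif mapping[tgt] != tgt_image:
                return False  # the general graph shares a node, the specific one does not
    # Isomorphism onto a part of the specific graph: the mapping must be injective.
    return len(set(mapping.values())) == len(mapping)

# Hypothetical toy graphs, not taken from the talk.
general = Graph("a", {"a": "sign", "b": "verb"}, {("a", "HEAD"): "b"})
specific = Graph(
    "x",
    {"x": "sign", "y": "verb", "z": "noun"},
    {("x", "HEAD"): "y", ("x", "SUBJ"): "z"},
)
print(subsumes(general, specific), subsumes(specific, general))
```

Because the arc function is deterministic, the mapping is fully determined by walking both graphs from their roots, so no search over candidate mappings is needed.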

9 Feature Graphs (3)
- Feature graphs can be interpreted via translation to SRL clauses
- Exclusive matrices can be represented as a set of feature graphs (exclusive set of graphs)
- A finite SRL theory with respect to a finite SRL signature can be represented as a set of feature graphs
- A sentence analysis can be represented as a complete feature graph

10 Feature Graphs (4)
- Complete feature graphs are a good representation for an HPSG corpus
- Feature graphs are a good representation for an HPSG grammar (exclusive set of graphs)
- Important property: for each node in a graph in the corpus there exists exactly one graph in the grammar which subsumes the subgraph rooted at that node
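The "exactly one graph" property can be checked mechanically on toy data. The sketch below assumes a deliberately simplified representation that is not from the talk: a graph is a dictionary from feature paths (tuples of feature names, with the empty tuple for the root) to species, which ignores node sharing. The function names and the example graphs are hypothetical.

```python
def subsumes(general, specific):
    """Simplified subsumption: every path of `general` occurs in `specific`
    with the same species (node sharing is ignored in this sketch)."""
    return all(specific.get(path) == sp for path, sp in general.items())

def subgraph_at(graph, prefix):
    """The subgraph rooted at the node reached by following `prefix`."""
    k = len(prefix)
    return {path[k:]: sp for path, sp in graph.items() if path[:k] == prefix}

def has_exclusive_cover(grammar, corpus):
    """For each node of each corpus graph, exactly one grammar graph
    must subsume the subgraph rooted at that node."""
    for graph in corpus:
        for node_path in graph:
            sub = subgraph_at(graph, node_path)
            matches = [g for g in grammar if subsumes(g, sub)]
            if len(matches) != 1:
                return False
    return True

# Hypothetical toy grammar and corpus, not taken from the talk.
grammar = [
    {(): "phrase", ("HEAD",): "verb"},
    {(): "verb"},
    {(): "noun"},
]
corpus = [{(): "phrase", ("HEAD",): "verb", ("SUBJ",): "noun"}]
print(has_exclusive_cover(grammar, corpus))
```

Adding a second grammar graph that also matches the root, for instance `{(): "phrase"}`, makes the check fail, which is the situation an exclusive set of graphs is meant to rule out.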

11 Corpus Grammar
- A grammar Γ such that the corpus C is a subset of its strong generative capacity is called a corpus grammar: C ⊆ SGC(Γ)
- In feature graph terms: for each complete graph in the corpus, the grammar contains a graph which subsumes it

12 Grammar Extraction (1)
- Grammar extraction from an HPSG corpus C is a graph fragmentation operation which produces a set of graphs from which a grammar can be constructed
- The result is a set of graphs, GF
- Each extracted fragment has to contain all features for the root node, and subsume at least one complete graph in the corpus
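A minimal sketch of one way to fragment a corpus is given below. It is not the extraction procedure of the talk: it assumes graphs are dictionaries from feature paths to species (no node sharing), and it produces only depth-bounded fragments, which is just one family of fragments that satisfies both stated conditions. All names and the example graph are hypothetical.

```python
def fragments(complete):
    """Yield fragments of a complete graph cut off at increasing depth.

    Depth starts at 1, so every fragment keeps all features of the root;
    each fragment is a part of `complete` and therefore subsumes it.
    """
    max_depth = max(len(path) for path in complete)
    for depth in range(1, max(max_depth, 1) + 1):
        yield {path: sp for path, sp in complete.items() if len(path) <= depth}

def extract_gf(corpus):
    """Collect the fragments of all corpus graphs, without duplicates."""
    gf = []
    for graph in corpus:
        for frag in fragments(graph):
            if frag not in gf:
                gf.append(frag)
    return gf

# Hypothetical complete graph, not taken from the talk.
sentence = {
    (): "phrase",
    ("HEAD",): "verb",
    ("HEAD", "VFORM"): "fin",
    ("SUBJ",): "noun",
}
gf = extract_gf([sentence])
for frag in gf:
    print(sorted(frag))
```

The deepest fragment is the complete graph itself, so the complete graphs always end up in the extracted set.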

13 Grammar Extraction (2)
- The set GF is ordered by the subsumption relation
- The complete graphs from the corpus are at the bottom
- Each set of graphs G that contains only graphs from GF, and in which each complete graph M in GF is subsumed by at least one feature graph, is a corpus grammar of C
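For a very small GF the corpus grammars can simply be enumerated. The sketch below is a brute-force illustration, not a practical algorithm: it assumes graphs are dictionaries from feature paths to species, uses a simplified subsumption test that ignores node sharing, and tries every non-empty subset of GF. The names and toy graphs are hypothetical.

```python
from itertools import combinations

def subsumes(general, specific):
    """Simplified subsumption on path dictionaries (node sharing ignored)."""
    return all(specific.get(path) == sp for path, sp in general.items())

def is_corpus_grammar(grammar, complete_graphs):
    """Every complete graph is subsumed by at least one grammar graph."""
    return all(any(subsumes(g, m) for g in grammar) for m in complete_graphs)

def corpus_grammars(gf, complete_graphs):
    """Yield every non-empty subset of GF that is a corpus grammar."""
    for size in range(1, len(gf) + 1):
        for subset in combinations(gf, size):
            if is_corpus_grammar(subset, complete_graphs):
                yield list(subset)

# Hypothetical toy data: one shared fragment and two complete graphs.
frag = {(): "phrase", ("HEAD",): "verb"}
m1 = {(): "phrase", ("HEAD",): "verb", ("SUBJ",): "noun"}
m2 = {(): "phrase", ("HEAD",): "verb", ("SUBJ",): "pron"}
found = list(corpus_grammars([frag, m1, m2], [m1, m2]))
print(len(found))
```

In this toy case the single general fragment already covers both complete graphs, while neither complete graph alone covers the other, which shows why the more general elements of GF are the ones that generalize beyond the corpus.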

14 Grammar Extraction (3)
- All corpus grammars extracted in this way can be ordered by set inclusion of their strong generative capacity
- A grammar from this hierarchy can be chosen by specifying additional constraints over it, such as:
  – it is the most general one that doesn't overgenerate or undergenerate over the corpus, or
  – it satisfies some external conditions, such as the shortest inference over the corpus, etc.

15 The Set GF as a Grammar
- This is the original idea behind the DOP model
- GF contains all generalizations over the corpus
- GF will overgenerate over the corpus
- GF will accept ungrammatical sentences
- Thus a special inference mechanism is necessary in order to use GF as a grammar

16 Grammar Refinement
- When an HPSG corpus is created, the annotators use an HPSG grammar
- This grammar can be used as a starting point for the extraction of a better grammar; this process is called grammar refinement
- As the new grammar we can choose the most general grammars that refine the original grammar

17 Conclusions
- We define an HPSG corpus as a set of complete graphs
- We define an HPSG grammar as a set of graphs
- We define a procedure for extraction of corpus grammars from the corpus
- We define a refinement of a grammar on the basis of a corpus

