Two-Stage Constraint Based Sanskrit Parser Akshar Bharati, IIIT, Hyderabad.

1 Two-Stage Constraint Based Sanskrit Parser Akshar Bharati, IIIT, Hyderabad

2 Brief outline
- Dependency
- Paninian framework
- vibhakti-karaka correspondence
- karaka frames (basic + transformation)
- Source groups, demand groups
- Constraints
- Three basic constraints
- Constraints as Integer programming equations

3 Notions from Paninian Framework – a) Karaka relations
The parser uses the notion of karaka relations between verbs and nouns in a sentence; this notion is central to the Paninian model. Karaka relations are syntactico-semantic (or semantico-syntactic) relations between the verbals and other related constituents in a sentence.

4 Notions from Paninian Framework – Demand Frames
For the task of karaka assignment, the core parser uses the fundamental principle of 'akanksha' (demand unit) and 'yogyata' (qualification of the source unit).
Ex:
CAwraH     vixyAlayam  gacCawi
(student)  (school)    (go)
Verb Frame for this form of "gacCawi"

5 Demand Frame
Gam1:
-------------------------------------------------------------
arc-label  necessity  vibhakti  lex-type  src-pos  arc-dir
-------------------------------------------------------------
K1         m          1         n         l        ds
K2         m          2         n         l        ds
K3         m          3         n         l        ds
K5         m          5         n         l        ds
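A demand frame like Gam1 can be held as a small list of records, which makes the vibhakti-karaka lookup explicit. The sketch below is only an illustration: the class name `FrameRow`, the function `candidate_labels`, and the way each flattened row is split into six fields (in particular reading `ds` as one arc-dir value) are assumptions, not the parser's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRow:
    """One row of a demand frame (field names mirror the slide's column headers)."""
    arc_label: str
    necessity: str
    vibhakti: str
    lex_type: str
    src_pos: str
    arc_dir: str


# Gam1 as read from the slide; the field split is an assumed reading of the flattened table.
GAM1 = [
    FrameRow("K1", "m", "1", "n", "l", "ds"),
    FrameRow("K2", "m", "2", "n", "l", "ds"),
    FrameRow("K3", "m", "3", "n", "l", "ds"),
    FrameRow("K5", "m", "5", "n", "l", "ds"),
]


def candidate_labels(frame, vibhakti, lex_type="n"):
    """Return the arc labels whose vibhakti and lex-type match a source word."""
    return [
        row.arc_label
        for row in frame
        if row.vibhakti == vibhakti and row.lex_type == lex_type
    ]


# Assumed morph output: CAwraH carries vibhakti 1, vixyAlayam carries vibhakti 2.
print(candidate_labels(GAM1, "1"))  # labels CAwraH could take under gacCawi
print(candidate_labels(GAM1, "2"))  # labels vixyAlayam could take under gacCawi
```

Under these assumptions each noun matches exactly one row, which is why the example sentence needs no disambiguation beyond the frame lookup.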

6 Constraint Based Parsing
Computational Paninian Model
Integer Programming with basic constraints
- For each mandatory karaka in a karaka chart, there should be exactly one outgoing edge labelled by that karaka from the demand group.
- For each desirable or optional karaka in a karaka chart, there should be at most one outgoing edge labelled by that karaka from the demand group.
- There should be exactly one incoming arc into each source group.
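The three constraints can be read as conditions on 0/1 variables, one per candidate arc. The sketch below checks them by brute-force enumeration rather than with an integer programming solver, so it only illustrates the constraints and is not the parser's solving method. The function name `solve`, the arc triples, and the extra ambiguous candidate are hypothetical, and treating only k1 and k2 as mandatory is an assumption made for the example.

```python
from itertools import product


def solve(candidates, mandatory, optional, sources):
    """Return every subset of candidate arcs that satisfies the three basic constraints.

    candidates: list of (demand, label, source) triples
    mandatory / optional: lists of (demand, label) pairs
    sources: list of source-group names
    """
    solutions = []
    # Each candidate arc gets a 0/1 value, as in the integer programming formulation.
    for bits in product((0, 1), repeat=len(candidates)):
        chosen = [arc for arc, bit in zip(candidates, bits) if bit]

        def out_count(demand, label):
            return sum(1 for d, l, _ in chosen if (d, l) == (demand, label))

        def in_count(source):
            return sum(1 for _, _, s in chosen if s == source)

        if (
            all(out_count(d, l) == 1 for d, l in mandatory)      # constraint 1
            and all(out_count(d, l) <= 1 for d, l in optional)   # constraint 2
            and all(in_count(s) == 1 for s in sources)           # constraint 3
        ):
            solutions.append(chosen)
    return solutions


candidates = [
    ("gacCawi", "k1", "CAwraH"),
    ("gacCawi", "k2", "vixyAlayam"),
    ("gacCawi", "k1", "vixyAlayam"),  # hypothetical extra candidate, added to show pruning
]
parses = solve(
    candidates,
    mandatory=[("gacCawi", "k1"), ("gacCawi", "k2")],
    optional=[],
    sources=["CAwraH", "vixyAlayam"],
)
print(parses)
```

With these inputs the hypothetical third arc is rejected, because selecting it would give vixyAlayam two incoming arcs or give k1 two outgoing edges.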

7 Parser
Two stage strategy
- Stage I (Intra-clausal relations): dependency relations marked; relations such as k1, k2, k3, etc. for each verb
- Stage II (Inter-clausal relations & conjunct relations): conjuncts and relative clauses

8 Steps in Parsing
[Flowchart] Input: SENTENCE. Steps: Morph, POS tagging, Chunking; Identify Demand Groups; Load Frames & Transform; Find Candidates; Apply Constraints & Solve. Decision: Is Complex? NO: Final Parse. YES: STAGE - II.

9 Morph, Chunked, Tagged data
((
1 (( NP 1.1 CAwraH NN ))
2 (( NP 2.1 vixyAlayam NN ))
3 (( VGF 3.1 gacCawi VM ))
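Chunked data in this shape can be read into a list of chunks before demand groups are identified. The sketch below is a minimal reader for the flattened text exactly as it appears on the slide; the function names, the dictionary layout, and the rule that a chunk tagged `VGF` is a demand group are assumptions made for illustration.

```python
import re

# A chunk looks like: <id> (( <chunk-tag> <tokens...> ))
CHUNK_RE = re.compile(r"(\d+)\s+\(\(\s+(\S+)\s+(.*?)\s+\)\)", re.S)
# A token looks like: <chunk-id>.<n> <word> <pos-tag>
TOKEN_RE = re.compile(r"(\d+\.\d+)\s+(\S+)\s+(\S+)")


def parse_chunks(text):
    """Parse flattened chunked data into a list of chunk dictionaries."""
    chunks = []
    for chunk_id, tag, body in CHUNK_RE.findall(text):
        tokens = [
            {"id": tok_id, "word": word, "pos": pos}
            for tok_id, word, pos in TOKEN_RE.findall(body)
        ]
        chunks.append({"id": chunk_id, "tag": tag, "tokens": tokens})
    return chunks


def demand_groups(chunks):
    """Assumption: verb chunks (tag VGF) act as demand groups; the rest are source groups."""
    return [chunk for chunk in chunks if chunk["tag"] == "VGF"]


DATA = "(( 1 (( NP 1.1 CAwraH NN )) 2 (( NP 2.1 vixyAlayam NN )) 3 (( VGF 3.1 gacCawi VM ))"
chunks = parse_chunks(DATA)
for chunk in chunks:
    print(chunk["id"], chunk["tag"], [t["word"] for t in chunk["tokens"]])
print([c["tokens"][0]["word"] for c in demand_groups(chunks)])
```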

10 CAwraH vixyAlayam gacCawi

11 Demand Frame
Gam1:
-------------------------------------------------------------
arc-label  necessity  vibhakti  lex-type  src-pos  arc-dir
-------------------------------------------------------------
K1         m          1         n         l        ds
K2         m          2         n         l        ds
K3         m          3         n         l        ds
K5         m          5         n         l        ds

12 k1 k2 CAwraH vixyAlayam gacCawi

13 Sanskrit Example CAwraH vixyAlayam gacCawi

14 Steps (Stage II)
[Flowchart] Input: Output of STAGE - I. Steps: Identify New Demand Groups; Load Frames & Transform; Find Candidates; Apply Constraints & Solve; Repair. Output: FINAL PARSE.

15 Example – Relative Clause
vaha puswaka jo rAma ne mohana ko xI hE prasixXa hE
that book which Ram ERG. Mohana DAT. gave is famous is
'The book which Ram gave to Mohana is famous'

16 Output after Stage - I
[Dependency tree figure. Nodes: _ROOT_, hE, prasixXa, vaha, puswaka, jo, xI, mohana, rAma. Arc labels: main, k1, k1s, k1, k2, k4.]

17 Identify the demand group xiyA ‘give’ Main verb of the relative clause

18 Identify the demand group, Load and Transform DF
jo 'which' transformation (special)
Transforms the demand frame of the main verb of the relative clause
------------------------------------------------------------------------
arc-label   necessity  vibhakti  lextype  src-pos  arc-dir  oprt
------------------------------------------------------------------------
nmod__relc  m          any       n        r|l      p        insert
------------------------------------------------------------------------

19 Karaka Frame
vaha puswaka jo rAma ne mohana ko xI prasixXa hE |
that book which Ram ERG. Mohana DAT. gave famous is
'The book which Ram gave to Mohana is famous'
Main verb of relative clause
------------------------------------------------------------------------
arc-label   necessity  vibhakti  lextype  src-pos  arc-dir  oprt
------------------------------------------------------------------------
nmod__relc  m          any       n        r|l      p        insert
------------------------------------------------------------------------
Transformed frame for xe after applying the jo transformation
New row inserted after transformation
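The effect of the jo transformation on a demand frame can be sketched as a row insertion. In the code below, the function name `apply_transformation` and the dictionary layout are hypothetical, and the base frame for xe lists only the labels seen in the Stage I output (k1, k2, k4) with an assumed necessity value; only the nmod__relc row is taken from the slide.

```python
def apply_transformation(frame, transform_rows):
    """Return a new frame with the transformation rows applied; the input frame is not modified."""
    new_frame = [dict(row) for row in frame]
    for row in transform_rows:
        if row["oprt"] == "insert":
            # The operation column drives the transformation and is not stored in the frame.
            new_frame.append({k: v for k, v in row.items() if k != "oprt"})
        else:
            raise ValueError(f"unsupported operation: {row['oprt']}")
    return new_frame


# Hypothetical base frame for xe 'give'; necessity values are assumed.
XE_FRAME = [
    {"arc_label": "k1", "necessity": "m"},
    {"arc_label": "k2", "necessity": "m"},
    {"arc_label": "k4", "necessity": "m"},
]

# The jo transformation row as given on the slide.
JO_TRANSFORM = [
    {
        "arc_label": "nmod__relc",
        "necessity": "m",
        "vibhakti": "any",
        "lextype": "n",
        "src_pos": "r|l",
        "arc_dir": "p",
        "oprt": "insert",
    }
]

transformed = apply_transformation(XE_FRAME, JO_TRANSFORM)
print([row["arc_label"] for row in transformed])
```

Returning a copy keeps the stored frame for xe reusable for sentences where no relative pronoun triggers the transformation.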

20 Possible candidates vaha puswaka jo rAma ne mohana ko xI hE prasixXa hE | nmod__relc

21 Output after Stage - II
[Dependency tree figure. Nodes: _ROOT_, hE, prasixXa, vaha, puswaka, jo, xiyA hE, mohana, rAma. Arc labels: main, k1, k1s, nmod__relc, k1, k2, k4.]

22 Example II – Coordination
rAma Ora siwA kala Aye |
Ram and Sita yesterday came
'Ram and Sita came yesterday'

23 Output of Stage - I
[Dependency graph figure. Nodes: _ROOT_, Aye, rAma, siwA, Ora, kala. Arc labels: main, k1, k7t, dummy.]

24 For Stage – II (Constraint Graph)
[Constraint graph figure. Nodes: _ROOT_, Aye, rAma, siwA, Ora, kala. Arc labels: main, k1, k7t, ccof.]

25 Candidate Arcs
[Graph figure. Nodes: _ROOT_, Aye, rAma, siwA, Ora, kala. Arc labels: main, k1, k1, ccof.]

26 Solution Graph
[Graph figure. Nodes: _ROOT_, Aye, rAma, siwA, Ora, kala. Arc labels: main, k1, k7t, ccof.]

27 Parse tree
Output after Stage II
[Parse tree figure. Nodes: _ROOT_, Aye, kala, Ora, rAma, siwA. Arc labels: main, k7t, k1, ccof.]

28 Results for Hindi

29 Results
- CBP: results when only the first parse is considered
- CBP'': results when the best of the first 25 parses is considered
- CBP was tested on 220 sentences
- These are the results published in IALP-2008

30 Work Progress in Sanskrit
- The existing constraint based parser for Sanskrit can parse simple sentences.
- Over 2000 demand charts
- Two stage parsing needs more development
- Experiments performed with 268 simple sentences
- Re-ranking of parses is not done; only the first parse is considered for results
- Results are not very accurate due to data problems

31 Results in Sanskrit
Labelled attachment score: 540 / 1213 * 100 = 44.52 %
Unlabelled attachment score: 876 / 1213 * 100 = 72.22 %
Label accuracy score: 566 / 1213 * 100 = 46.66 %
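The three scores are simple ratios over the evaluated tokens, and the reported percentages can be rechecked from the counts. The sketch below assumes the usual definitions (head correct for unlabelled attachment, label correct for label accuracy, both correct for labelled attachment); the function names and the three-token toy parse are hypothetical.

```python
def pct(correct, total):
    """Percentage rounded to two decimals, as reported on the slide."""
    return round(100 * correct / total, 2)


def attachment_scores(gold, pred):
    """Compute LAS, UAS and label accuracy from per-token (head, label) pairs."""
    total = len(gold)
    uas = sum(1 for g, p in zip(gold, pred) if g[0] == p[0])
    la = sum(1 for g, p in zip(gold, pred) if g[1] == p[1])
    las = sum(1 for g, p in zip(gold, pred) if g == p)
    return {"LAS": pct(las, total), "UAS": pct(uas, total), "LA": pct(la, total)}


# Recompute the reported Sanskrit scores from the counts on the slide.
print(pct(540, 1213), pct(876, 1213), pct(566, 1213))

# Toy check on a hypothetical three-token parse with one wrong label.
gold = [(3, "k1"), (3, "k2"), (0, "main")]
pred = [(3, "k1"), (3, "k1"), (0, "main")]
print(attachment_scores(gold, pred))
```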

32 Treebank requirement
- Proper gold tagged, chunked and dependency marked data for Sanskrit will improve the accuracy of the parser
- Annotation with proper tools
- It will also help us in using machine learning methods to train statistical parsers for Sanskrit

33 Further work on Constraint Based Parsing
- Extension of the parser using treebank data
- Hybrid approaches
- Soft Constraints
- Pruning of the graph in data driven parsers using Constraint Graph
- Allow learning of the parser from the treebank data
- Better performance

34 What we expect From Data
((
1 (( NP 1.1 CAwraH NN ))
2 (( NP 2.1 vixyAlayam NN ))
3 (( VGF 3.1 gacCawi VM ))

35 THANKS!!

