
1 Annotation for Hindi PropBank

2 Outline
Introduction to the project
Basic linguistic concepts
– Verb & Argument
– Making information explicit
– Null arguments
Tasks to be carried out
Tools for annotation
Timesheets, tips
Practice

3 Creation of Resources
Resources for machines rather than humans: imagine a dictionary or thesaurus for computers
A requirement for Natural Language Processing
– Large annotated resources
Annotation means adding linguistic information
Tailored to language-specific requirements
Needs to be as consistent as possible
– Annotated resources are used for applications like Semantic Role Labelling, Parsing and Word Sense Disambiguation

4 Hindi-Urdu Treebank Project
One of the first efforts to build a large-scale annotated resource for Hindi-Urdu
Similar resources exist for Chinese, Arabic and English
Three main components
– Hindi-Urdu dependency treebank
– Hindi-Urdu PropBank
– Hindi-Urdu phrase structure treebank [derived]

5 PropBank
PropBank resource creation at CU Boulder
We annotate semantic information on top of syntactic information
PropBank involves annotation of predicate-argument structure
– Mainly concerned with verbs and their arguments
– And with the semantic nature of those arguments

6 What are verbs?
Verbs are predicating elements, e.g. daud (run), pii (drink), baras (rain)
They encode (very broadly) actions and states
They also carry two kinds of grammatical information
– Tense, aspect (present, future; perfect, continuous)
– Gender, number, person (masc/fem; sg/pl; 1st, 2nd, 3rd)

7 What are arguments?
In a sentence such as Ram ate an apple / Raam ne seb khaaya:
– A verb, ‘eat’ / ‘khaa’: PREDICATE
– A person eating, ‘Raam’: ARGUMENT
– A thing eaten, ‘apple’ / ‘seb’: ARGUMENT
Without its arguments, the meaning of the verb ‘ate’ is not completely realized
Together, the verb and its arguments make up the predicate-argument structure of the sentence

8 Arguments show what’s important
Raam ne jaldi se seb khaaya (‘Ram quickly ate an apple’)
– Raam and seb are arguments
– But ‘jaldi se’ (quickly) is not
It’s all about the verb
– It projects its need for certain arguments
– Sift what’s mandatory from what’s optional

9 Like Unix commands
Some commands take only one argument
– cd /home/student/ashwini
Others require two
– cp hmwk1.txt hmwk2.txt
If a command is typed with too many or too few arguments…

10 Error!
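Carrying the Unix analogy over to verbs: below is a minimal Python sketch of checking an annotated verb against the arguments it requires. The frame table `FRAMES`, the function `check_arguments`, and the argument sets listed for khaa and pii are hypothetical illustrations, not the project's actual frame files or tooling. Modifiers (ARG-M) are treated as optional, like ‘jaldi se’.

```python
# Hypothetical verb frames: the numbered arguments each verb requires.
# These entries are illustrative assumptions, not real PropBank frame files.
FRAMES = {
    "khaa": {"ARG0", "ARG1"},  # eater, thing eaten
    "pii": {"ARG0", "ARG1"},   # drinker, thing drunk
}


def check_arguments(verb: str, labels: set[str]) -> list[str]:
    """Return error messages for missing or unexpected numbered arguments.

    Labels starting with "ARG-M" are optional modifiers and are ignored,
    just as 'jaldi se' does not count toward what the verb requires.
    """
    required = FRAMES[verb]
    numbered = {label for label in labels if not label.startswith("ARG-M")}
    errors = []
    for missing in sorted(required - numbered):
        errors.append(f"{verb}: missing {missing}")
    for extra in sorted(numbered - required):
        errors.append(f"{verb}: unexpected {extra}")
    return errors


# Raam ne jaldi se seb khaaya: ARG0 and ARG1 present, plus a modifier.
print(check_arguments("khaa", {"ARG0", "ARG-M", "ARG1"}))  # []
# ___ paani piyaa: the drinker is not explicit, so ARG0 is reported missing.
print(check_arguments("pii", {"ARG1"}))  # ['pii: missing ARG0']
```

As with `cp` typed with a single file name, the missing ARG0 in the second call is the kind of gap that null argument insertion is meant to fill.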

11 Making information explicit
As speakers of Hindi or English, we already know the predicate-argument structure of verbs
E.g. hari ___ pahuMcaa (‘Hari reached ___’)
– Capturing this knowledge for the machine is essential
– Ram ne seb khaaya aur paani piyaa (‘Ram ate an apple and drank water’)
– Who drank the water?

12 Identify arguments
In PropBank, we first identify the arguments of a verb
When explicitly present, they are labeled ARG
They are further numbered: ARG0, ARG1, ARG2, etc.
Often, a verb has numbered ARGs as well as ARG-Ms (optional modifiers such as ‘jaldi se’)
– [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
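A labeled sentence like this one can be held in a simple data structure. The minimal Python sketch below uses a hypothetical `Span` class and helpers `bracket` and `split_labels` (not part of any PropBank tool) to store each chunk with its label, render it in bracket notation, and separate numbered arguments from ARG-M modifiers. Leaving ‘ne’ and the verb as unlabeled chunks is a simplification made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A chunk of the sentence with an optional PropBank-style label."""
    text: str
    label: Optional[str] = None


def bracket(spans: list) -> str:
    """Render labeled spans in bracket notation, e.g. [Ram]ARG0."""
    return " ".join(f"[{s.text}]{s.label}" if s.label else s.text for s in spans)


def split_labels(spans: list):
    """Separate numbered arguments (ARG0, ARG1, ...) from modifiers (ARG-M)."""
    args = {s.label: s.text for s in spans
            if s.label and not s.label.startswith("ARG-M")}
    mods = [(s.label, s.text) for s in spans
            if s.label and s.label.startswith("ARG-M")]
    return args, mods


# Ram ne jaldi se seb khaaya; the verb khaaya is the predicate, left unlabeled.
sentence = [
    Span("Ram", "ARG0"),
    Span("ne"),
    Span("jaldi se", "ARG-M"),
    Span("seb", "ARG1"),
    Span("khaaya"),
]

print(bracket(sentence))       # [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
print(split_labels(sentence))  # ({'ARG0': 'Ram', 'ARG1': 'seb'}, [('ARG-M', 'jaldi se')])
```

Modifiers are kept in a list rather than a dict because a verb may take more than one ARG-M, while each numbered argument appears at most once.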

13 Null arguments
What if arguments are not explicit?
– E.g. Ram ne seb khaaya aur ___ paani piyaa
– Ram is also the person drinking the water
– The argument can be dropped because of the conjunction aur (‘and’)
– For the machine, it must be recovered from the sentence
We also mark these missing, or null, arguments
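The null argument case can be sketched in the same spirit. The toy rule below is an assumption for illustration, not the project's insertion guideline: when two clauses are joined by aur and the second has no explicit ARG0, insert a null element that points back to the first clause's ARG0. The `Clause` class, the `insert_shared_subject` function and the `*NULL*` marker are hypothetical; the project's actual empty-category labels are not shown here.

```python
from dataclasses import dataclass, field

NULL = "*NULL*"  # hypothetical marker for an inserted null argument


@dataclass
class Clause:
    """One verb and its explicit arguments, keyed by label (ARG0, ARG1, ...)."""
    verb: str
    args: dict = field(default_factory=dict)


def insert_shared_subject(first: Clause, second: Clause) -> Clause:
    """Fill a missing ARG0 in the second conjunct from the first conjunct.

    Toy rule (an assumption): in 'X aur Y', if Y has no explicit ARG0 and X has
    one, return a copy of Y whose ARG0 is a null element paired with X's ARG0.
    Otherwise Y is returned unchanged.
    """
    if "ARG0" in second.args or "ARG0" not in first.args:
        return second
    filled = dict(second.args)  # copy so the original clause is not modified
    filled["ARG0"] = (NULL, first.args["ARG0"])  # (marker, antecedent)
    return Clause(second.verb, filled)


# Ram ne seb khaaya aur ___ paani piyaa
eat = Clause("khaaya", {"ARG0": "Ram", "ARG1": "seb"})
drink = Clause("piyaa", {"ARG1": "paani"})
print(insert_shared_subject(eat, drink).args)
# {'ARG1': 'paani', 'ARG0': ('*NULL*', 'Ram')}
```

Because the inserted ARG0 carries its antecedent, a program can answer the question of who drank the water by reading `Ram` from it.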

14 Tasks to be carried out
Null argument insertion
Argument annotation

15 Tools to be used
Sanchay: a GUI for annotators, used especially for null argument insertion
Use your verbs account to access Sanchay
Wiki for annotator resources

16 Timesheets & tips
Being honest when filling out timesheets is very important
We can check the amount of time you spend on verbs
I will ask you to track the number of annotations per hour, as a cross-check
Turn in signed paper timesheets at my CINC mailbox

17 Practice
We need to learn about four kinds of empty categories
Plan:
– Recognizing syntactic constructions
– Getting familiar with the tool
– Practice with the corpus
– Q & A based on null argument insertion

