Netgraph – a Tool for Searching in the Prague Dependency Treebank 2.0 Defence of the Doctoral Thesis, Prague, September 3 rd, 2008 Author: Mgr. Jiří Mírovský.

Slides:



Advertisements
Similar presentations
22C:19 Discrete Structures Trees Spring 2014 Sukumar Ghosh.
Advertisements

Unsupervised Dependency Parsing David Mareček Institute of Formal and Applied Linguistics Charles University in Prague Doctoral thesis defense September.
June 6, 20073rd PIRE Meeting1 Tectogrammatical Representation of English in Prague Czech-English Dependency Treebank Lucie Mladová Silvie Cinková, Kristýna.
The Language Model in Bulgarian Treebank (BulTreeBank) Petya Osenova (Sofia) , Prague.
Prague Arabic Dependency Treebank Center for Computational Linguistics Institute of Formal and Applied Linguistics Charles University in Prague MorphoTrees.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Branch and Bound Similar to backtracking in generating a search tree and looking for one or more solutions Different in that the “objective” is constrained.
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks: Layering the Annotation Jan Hajič Institute of Formal and Applied Linguistics.
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks: Language-specific Issues Czech Jan Hajič Institute of Formal and Applied Linguistics.
Programming Languages An Introduction to Grammars Oct 18th 2002.
Creation of a Russian-English Translation Program Karen Shiells.
C o n f i d e n t i a l HOME NEXT Subject Name: Data Structure Using C Unit Title: Trees.
Binary Trees Chapter 6.
Michigan Common Core Standards
Building the Valency Lexicon of Arabic Verbs Viktor Bielický Otakar Smrž LREC 2008, Marrakech, Morocco.
PDT 2.0 Prague Dependency Treebank 2.0 Zdeněk Žabokrtský Dept. of Formal and Applied Linguistics Charles University, Prague.
Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.
COSC2007 Data Structures II
PDT Grammatemes and Coreference in the PDT 2.0 Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University in Prague.
LING 388: Language and Computers Sandiway Fong Lecture 17.
March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax.
Morphological Meanings in the Prague Dependency Treebank Magda Razímová Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University,
The Prague (Czech-)English Dependency Treebank Jan Hajič Charles University in Prague Computer Science School Institute of Formal and Applied Linguistics.
Querying Structured Text in an XML Database By Xuemei Luo.
10. Parsing with Context-free Grammars -Speech and Language Processing- 발표자 : 정영임 발표일 :
Chapter 6 Binary Trees. 6.1 Trees, Binary Trees, and Binary Search Trees Linked lists usually are more flexible than arrays, but it is difficult to use.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
Jan Hajič Otakar Smrž Petr Zemánek Jan Šnaidauf Emanuel Beška Faculty of Mathematics and Physics Faculty of Philosophy and Arts Charles University in Prague.
Database Systems Part VII: XML Querying Software School of Hunan University
Computer Science: A Structured Programming Approach Using C Trees Trees are used extensively in computer science to represent algebraic formulas;
Resemblances between Meaning-Text Theory and Functional Generative Description Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University,
[ Part III of The XML seminar ] Presenter: Xiaogeng Zhao A Introduction of XQL.
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
1 / 5 Zdeněk Žabokrtský: Automatic Functor Assignment in the PDT Automatic Functor Assignment (AFA) in the Prague Dependency Treebank PDT : –a long term.
Jan 2004CSA3050: NLG21 CSA3050: Natural Language Generation 2 Surface Realisation Systemic Grammar Functional Unification Grammar see J&M Chapter 20.3.
Proper Nouns in Czech Corpora Magda Ševčíková Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics.
Trees : Part 1 Section 4.1 (1) Theory and Terminology (2) Preorder, Postorder and Levelorder Traversals.
nd PIRE project workshop1 Tectogrammatical Representation of English Silvie Cinková Lucie Mladová, Anja Nedoluzhko, Jiří Semecký, Jana Šindlerová,
March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Intro 1 The Prague Dependency Treebank (PDT) Introduction Jan Hajič Institute.
Unit 8 Syntax. Syntax Syntax deals with rules for combining words into sentences, as well as with relationship between elements in one sentence Basic.
Annotation Procedure in Building the Prague Czech-English Dependency Treebank Marie Mikulová and Jan Štěpánek Institute of Formal and Applied Linguistics.
Syntactic Annotation of Slovene Corpora (SDT, JOS) Nina Ledinek ISJ ZRC SAZU
Exploiting Reducibility in Unsupervised Dependency Parsing David Mareček and Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University.
Building Sub-Corpora Suitable for Extraction of Lexico-Syntactic Information Ondřej Bojar, Institute of Formal and Applied Linguistics, ÚFAL.
Dynamics of Binary Search Trees under batch insertions and deletions with duplicates ╛ BACKGROUND The complexity of many operations on Binary Search Trees.
March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 1 PDT: Tectogrammatical Representation Jan Hajič Institute.
Suffix Tree 6 Mar MinKoo Seo. Contents  Basic Text Searching  Introduction to Suffix Tree  Suffix Trees and Exact Matching  Longest Common Substring.
Semantic annotation of a dialog corpus Silvie Cinková Institute of Formal and Applied Linguistics Charles University in Prague, Czech Republic COMPANIONS.
+7 (499) , Moscow pr. 60-letiya Oktyabrya, 9 SYSTEM FOR INTELLIGENT SEARCH AND ANALYSIS OF LARGE-SCALE TEXT COLLECTIONS Institute.
Linguistic Graph Similarity for News Sentence Searching
CSCE 210 Data Structures and Algorithms
CSC 594 Topics in AI – Natural Language Processing
Text Based Information Retrieval
Web News Sentence Searching Using Linguistic Graph Similarity
David Mareček and Zdeněk Žabokrtský
Prague Arabic Dependency Treebank
KD Tree A binary search tree where every node is a
A Statistical Model for Parsing Czech
Orthogonal Range Searching and Kd-Trees
Prague Dependency Treebank 2. 0 Zdeněk Žabokrtský Dept
Structural / Functional Site Diagramming
Algorithm Discovery and Design
Magnet & /facet Zheng Liang
Chapter 3: Selection Structures: Making Decisions
Minimax strategies, alpha beta pruning
Chapter 3: Selection Structures: Making Decisions
A Rule-Based Approach Towards Context-Aware User Notification Services
AntConc Search Wildcards (not Regex)
The development of PDT 3.0 Introduction to the discussion
Minimax strategies, alpha beta pruning
Presentation transcript:

Netgraph – a Tool for Searching in the Prague Dependency Treebank 2.0 Defence of the Doctoral Thesis, Prague, September 3 rd, 2008 Author: Mgr. Jiří Mírovský Charles University in Prague, Institute of Formal and Applied Linguistics Supervisor: Prof. RNDr. Jan Hajič, Dr. Charles University in Prague, Institute of Formal and Applied Linguistics Opponents: RNDr. Roman Ondruška, Ph.D. SUN Microsystems Ing. Alexandr Rosen, Ph.D. Charles University in Prague, Institute of Theoretical and Computational Linguistics

The Starting Point The Prague Dependency Treebank Users without programming skills Netgraph 1.0 Three sides existed whose connection needed to be solved:

The Work Analysis of the Requirements Usage of the Query Language Designing the Query Language The doctoral thesis consists of these main parts: Comparison to Other Search Systems Implementation in Netgraph

The Work Analysis of the Requirements Usage of the Query Language Designing the Query Language In this presentation, I will focus on these parts: Comparison to Other Search Systems Implementation in Netgraph

Prague Dependency Treebank 2.0 Layers in PDT 2.0

Linguistic Requirements Valency To study valency, the query language should be able to: control a presence of a particular type of son control number of sons control a non-presence of a son

Linguistic Requirements Coordination etc. to skip a node (etc. coordination, apposition) Tree dependency is not always linguistic dependency. We need:

Linguistic Requirements Complex Example of Coordination Czech: S čím mohou vlastníci i nájemci počítat, na co by se měli připravit? English (lit.): What can owners and tenants expect, what they should get ready for?

Linguistic Requirements Coordination etc. to skip a node (etc. coordination, apposition) Tree dependency is not always linguistic dependency. We need: even better: to set a linguistic dependency

Linguistic Requirements Predicative Complement Czech: Ze světové recese vyšly jako jednička Spojené státy. English: The United States emerged from the world recession as number one.

Linguistic Requirements Predicative Complement to match values unknown at the time of creating the query The dual dependency is represented by means of a reference to another node (attributes compl.rf and id ). We need:

Linguistic Requirements Coreferences (Grammatical and Textual) Represented by means of references (attributes coref_gram.rf and coref_text.rf (and id )). Again, we need: to match values unknown at the time of creating the query

Linguistic Requirements Topic-Focus Articulation We need to set references to other nodes with other relations than “equal to” ● Contextual boundness – attribute tfa ● Communicative dynamism – deep word order (attribute deepord )

Linguistic Requirements...other phenomena Focus proper – combination of references, non- existence of a node and transitive closure of dependency Rhematizers – closest left son, closest left brother (Non-)projectivity – multiple-tree query to combine several one-tree queries representing different orientations of non-projective edges

Linguistic Requirements...other phenomena Idioms etc. – searching in the linear form of the sentence (with regular expressions) Agreement – reference to only a part of a value of an attribute of another node (e.g. the fifth position of the morphological tag for case) Word order – measuring the horizontal distance between words

Linguistic Requirements Layers in PDT 2.0

Linguistic Requirements Accessing Lower Layers We need to have means of accessing the lower layers. Queries across the layers of annotation: A PATient less dynamic than an ACTor but on the left side from it in the sentence A PATient expressed with a preposition k and a noun in the dative

Linguistic Requirements Summary Evaluation of a node ● multiple attributes evaluation ● alternative values ● alternative nodes (alternative evaluation of the whole set of attributes) ● wild cards (regular expressions) ● negation, relations other than “equal to”

Linguistic Requirements Summary Dependencies between nodes (vertical relations) direct, transitive (existence, non-existence) vertical distance (from root, from one another) number of sons (zero for leaves)

Linguistic Requirements Summary Horizontal relations precedence, immediate precedence negation of it horizontal distance Secondary relations secondary dependencies, coreferences

Linguistic Requirements Summary Other features multiple-tree queries accessing several layers of annotation at the same time searching in the linear form of the sentence

Netgraph Query Language The Basics

Netgraph Query Language Meta-attributes Multi-tree queries References to attributes of other nodes Main additions to the query language of Netgraph 1.0: Hidden nodes for a multilayer access

Netgraph Query Language Meta-Attributes Attributes not present in the corpus, treated like normal attributes: _transitive (transitive edge) _optional (optional node(s)) _#sons (number of sons) _#hsons (number of hidden sons) _#descendants (number of nodes in the subtree)

Netgraph Query Language Meta-Attributes _#lbrothers (number of left brothers) _#rbrothers (number of right brothers) _depth (distance from the root) _#occurrences (exact number of a particular type of sons/descendants) _name (label of a node for references) _sentence (linear form of the sentence)

Using the Query Language Valency one ACTor, one PATient, no other sons at least one son (ACTor), no PATient

Using the Query Language Coordination etc. The CONJunction/DISJunction becomes optional.

a nominal predicative COMPLement with second dependency on a PATient Using the Query Language Predicative Complement

Using the Query Language Topic-Focus (Deep Word Order) a PATient in focus on the left side (less dynamic) from an ACTor in topic

Netgraph Query Language Hidden Nodes A tectogrammatical node Hidden nodes Analytical information Morphological information Tectogrammatical information

Using the Query Language Hidden Nodes – A Query a PATient expressed with a preposition k and a noun (”N...3.*”) in the dative (”N...3.*”)

Using the Query Language Hidden Nodes – A Result Tree

Netgraph As a Tool Graphical creation of the query Inverted matching Chained queries Main extensions to the tool since version 1.0: Displaying context trees Removing trees from the result

Using the Query Language Hidden Nodes – A Query a PATient expressed with a preposition k and a noun (”N...3.*”) in the dative (”N...3.*”)

Using the Query Language Hidden Nodes – A Result Tree

an ACTor less dynamic than a PATient, but on the right side from it on the surface Using the Query Language Hidden Nodes – A Query

an opposite dependency on the different layers Using the Query Language Hidden Nodes – A Query

Using the Query Language Hidden Nodes – A Result Tree

Improvements to the Query Language Structured Negation Today a PREDicate that does not govern an ACTor that governs a RSTR (a multi-tree query with relation AND)

Improvements to the Query Language Structured Negation – a Possible Future unclear meaning of the second query

Improvements to the Query Language Changing the Meaning of the Dependency a new meta-attribute _dependence might change the meaning of the dependency

Improvements to the Query Language Changing the Meaning of the Dependency combinations of meta-attributes have to be carefully thought over