An Introduction to Using Semgrex Chloé Kiddon. What is Semgrex? A java utility (in javanlp) for identifying patterns in Stanford JavaNLP SemanticGraph.


An Introduction to Using Semgrex Chloé Kiddon

What is Semgrex?
A Java utility (in JavaNLP) for identifying patterns in Stanford JavaNLP SemanticGraph structures.
Much like Tregex, which does this for tree structures (Levy and Andrew 2006) and is based on tgrep2-style syntax and functionality. (These slides are adapted from the structure of theirs.)
Applied the same way you use regular expressions to find patterns in strings.
Ex. for the graph of “Bob bought a red shirt” (bought -nsubj-> Bob, bought -dobj-> shirt, shirt -det-> a, shirt -amod-> red):
{tag:/VB.*/} >dobj ({} >amod {lemma:red})

Semgrex Overview
SemgrexPatterns are composed of nodes, representing IndexedWords, and relations between them, representing edges in a SemanticGraph.
SemgrexMatchers can be used on a single SemanticGraph OR on two SemanticGraphs and an Alignment between them.
Ex. an RTE problem has the hypothesis graph, the text graph, and the alignment from the hypothesis graph’s IndexedFeatureLabels to the text graph’s IndexedFeatureLabels.
SemgrexPatterns return matches for IndexedFeatureLabels in a SemanticGraph.

Syntax - Nodes
Nodes are represented as {attr1:value1;attr2:value2;…}
Attributes are regular strings; values can be strings or regular expressions delimited by “/”
{lemma:run;pos:/VB.*/} => any verb form of the word “run”
{} is any node in the graph
{$} is any root in the graph
{#} is the empty word (IndexedFeatureLabel.NO_WORD); it comes up when working with alignments
Descriptions can be negated with !
!{lemma:boy} => any word that isn’t “boy”
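To make the node syntax concrete, here is a minimal sketch in plain Java that checks a {attr:value;…} description against a node represented as an attribute map. It is a toy model, not the Semgrex implementation: the class name NodeDescriptionSketch and the map-based node are assumptions made for illustration, {$} and {#} are not modeled, and a regular expression containing “;” would break the simple splitting used here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Toy model of Semgrex node descriptions; not the real Semgrex implementation. */
final class NodeDescriptionSketch {

    /** Parses "{attr:value;attr:/regex/}" into attribute -> pattern constraints. */
    static Map<String, Pattern> parse(String description) {
        String body = description.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("expected {...} but got: " + description);
        }
        body = body.substring(1, body.length() - 1).trim();
        Map<String, Pattern> constraints = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return constraints; // {} constrains nothing, so it matches any node
        }
        for (String part : body.split(";")) {
            int colon = part.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected attr:value but got: " + part);
            }
            String attr = part.substring(0, colon).trim();
            String value = part.substring(colon + 1).trim();
            // Values wrapped in slashes are regular expressions; anything else is a literal.
            boolean isRegex = value.length() >= 2 && value.startsWith("/") && value.endsWith("/");
            constraints.put(attr, isRegex
                    ? Pattern.compile(value.substring(1, value.length() - 1))
                    : Pattern.compile(Pattern.quote(value)));
        }
        return constraints;
    }

    /** True if the node satisfies every constraint; a leading ! inverts the result. */
    static boolean matches(String description, Map<String, String> node) {
        String trimmed = description.trim();
        boolean negated = trimmed.startsWith("!");
        Map<String, Pattern> constraints = parse(negated ? trimmed.substring(1) : trimmed);
        boolean allMatch = true;
        for (Map.Entry<String, Pattern> c : constraints.entrySet()) {
            String actual = node.get(c.getKey());
            if (actual == null || !c.getValue().matcher(actual).matches()) {
                allMatch = false;
                break;
            }
        }
        return negated != allMatch;
    }

    public static void main(String[] args) {
        Map<String, String> ran = new HashMap<>();
        ran.put("lemma", "run");
        ran.put("pos", "VBD");
        System.out.println(matches("{lemma:run;pos:/VB.*/}", ran)); // a verb form of "run"
        System.out.println(matches("{}", ran));                     // any node
        System.out.println(matches("!{lemma:boy}", ran));           // any word that isn't "boy"
    }
}
```

All three calls in main print true for the node built there.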

Grouping Nodes
Perhaps you want a node that is either a word with an ner TIME tag or a word with the lemma “when”.
The node {ner:TIME;lemma:when} does not accomplish this OR operation, because it requires both attributes to match.
Use brackets and | (or &) to specify these groupings:
[ {lemma:locate} | {ner:LOCATION} ]
A node that is either a word with the lemma “locate” or a word with a LOCATION ner tag.
Groups can also be negated by putting a ! in front.
By default, & takes precedence over |, but there is little reason to use & between nodes, since a single node description already requires all of its attributes to match.
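The difference between a single node description and a bracketed group can be sketched with java.util.function.Predicate. This is only an illustration of the AND/OR/NOT semantics described above; the class NodeGroupingSketch, the helper names, and the map-based node are assumptions and have nothing to do with how Semgrex evaluates patterns internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of node grouping with |, & and !; not the real Semgrex implementation. */
final class NodeGroupingSketch {

    /** Predicate for a single attribute constraint such as {lemma:when}. */
    static Predicate<Map<String, String>> attr(String name, String value) {
        return node -> value.equals(node.get(name));
    }

    /** Builds a node carrying just the two attributes used in this example. */
    static Map<String, String> node(String lemma, String ner) {
        Map<String, String> n = new HashMap<>();
        n.put("lemma", lemma);
        n.put("ner", ner);
        return n;
    }

    // {ner:TIME;lemma:when}: both constraints must hold on the same node.
    static final Predicate<Map<String, String>> TIME_AND_WHEN =
            attr("ner", "TIME").and(attr("lemma", "when"));

    // [ {ner:TIME} | {lemma:when} ]: either constraint is enough.
    static final Predicate<Map<String, String>> TIME_OR_WHEN =
            attr("ner", "TIME").or(attr("lemma", "when"));

    // ![ {lemma:locate} | {ner:LOCATION} ]: the whole group is negated.
    static final Predicate<Map<String, String>> NOT_LOCATION =
            attr("lemma", "locate").or(attr("ner", "LOCATION")).negate();

    public static void main(String[] args) {
        Map<String, String> when = node("when", "O");
        System.out.println(TIME_AND_WHEN.test(when)); // false: "when" has no TIME tag
        System.out.println(TIME_OR_WHEN.test(when));  // true: the lemma alone is enough
        System.out.println(NOT_LOCATION.test(node("Paris", "LOCATION"))); // false
    }
}
```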

Syntax - Relations
Relationships between nodes can be specified.
Relations in Semgrex have two parts: the relation symbol and the relation type, e.g. <nsubj
A <reln B : A is the dependent of a reln relation with B
A >reln B : A is the governor of a reln relation with B
A <<reln B : There is some node in a dep->gov chain from A that is the dependent of a reln relation with B
A >>reln B : There is some node in a gov->dep chain from A that is the governor of a reln relation with B
A @ B : A is aligned to B through an Alignment object
Relation types can be regular strings or regular expressions enclosed in “/”
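The three graph relations can be illustrated on a tiny hand-built dependency graph. The sketch below is a toy model under stated assumptions: the class RelationSketch and its method names are invented for this example, nodes are identified by their word, every relation type is treated as a regular expression, and the chain for >> is assumed to start at A itself. The alignment relation @ is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy dependency graph illustrating <reln, >reln and >>reln; not the real Semgrex code. */
final class RelationSketch {

    private static final class Edge {
        final String gov, reln, dep;
        Edge(String gov, String reln, String dep) {
            this.gov = gov;
            this.reln = reln;
            this.dep = dep;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    RelationSketch edge(String gov, String reln, String dep) {
        edges.add(new Edge(gov, reln, dep));
        return this;
    }

    /** A >reln B: every B that A governs through a relation matching relnRegex. */
    Set<String> dependentsOf(String a, String relnRegex) {
        Set<String> out = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (e.gov.equals(a) && e.reln.matches(relnRegex)) out.add(e.dep);
        }
        return out;
    }

    /** A <reln B: every B that governs A through a relation matching relnRegex. */
    Set<String> governorsOf(String a, String relnRegex) {
        Set<String> out = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (e.dep.equals(a) && e.reln.matches(relnRegex)) out.add(e.gov);
        }
        return out;
    }

    /** A >>reln B: every B governed via relnRegex by A or by a node reachable from A. */
    Set<String> chainDependentsOf(String a, String relnRegex) {
        Set<String> out = new LinkedHashSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(a);
        seen.add(a);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            out.addAll(dependentsOf(current, relnRegex));
            for (Edge e : edges) { // follow gov->dep edges of any type
                if (e.gov.equals(current) && seen.add(e.dep)) queue.add(e.dep);
            }
        }
        return out;
    }

    /** The graph for "Bob bought a red shirt". */
    static RelationSketch example() {
        return new RelationSketch()
                .edge("bought", "nsubj", "Bob")
                .edge("bought", "dobj", "shirt")
                .edge("shirt", "det", "a")
                .edge("shirt", "amod", "red");
    }

    public static void main(String[] args) {
        RelationSketch g = example();
        System.out.println(g.dependentsOf("bought", "dobj"));      // [shirt]
        System.out.println(g.governorsOf("shirt", "dobj"));        // [bought]
        System.out.println(g.chainDependentsOf("bought", "amod")); // [red]
    }
}
```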

Building complex expressions
Relations can be strung together for “and”; all relations are relative to the first node in the string.
{} >nsubj {} >dobj {}
“A node that is the governor of both an nsubj relation and a dobj relation”
The & symbol is optional: {} >nsubj {} & >dobj {}
Nodes can be grouped with parentheses:
{tag:/NN.*/} @ ({} <nsubj {})
“A noun that is aligned to a node that is the dependent of an nsubj relation”
Not the same as {tag:/NN.*/} @ {} <nsubj {}

Other Operators on Relations
Operators can be combined via “or” with |
Ex: {} <agent {} | <nsubj {}
“A node that is either an agent or an nsubj in the graph”
As with nodes, & takes precedence over |
Ex: {} <agent {} | <nsubj {} & >amod {lemma:red}
“An agent node OR a subject modified by the word ‘red’”
Operators of equal precedence are left-associative.
Any relation can be negated with the “!” prefix
Ex: {tag:/VB.*/} !@ {tag:/VB.*/}
“A verb that is not aligned to another verb”

Other Operators on Relations
When a pattern is matched on a pair of graphs and their alignment, the default search point is the graph that the alignments are from.
To override this, place a @ at the beginning of the pattern.
Ex: for a hypGraph, txtGraph and alignment hyp->txt
{ner:LOCATION} @ {}
Represents all LOCATION nodes in the hypGraph aligned to nodes in the txtGraph
@ {ner:LOCATION} @ {}
Represents all LOCATION nodes in the txtGraph that are aligned to nodes in the hypGraph

Grouping relations
To specify operation order, use [ and ]
Ex: {tag:nn} [ <prep_in {} | <prep_on {} ] @ {#}
“A noun that is the dependent of either a prep_in or prep_on relation and is aligned to NO_WORD”
Grouped relations can be negated: just put ! before the [
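The precedence and grouping rules for relations (& binds tighter than |, brackets override that, ! negates) can be sketched as a small recursive-descent evaluator. This is a toy model, not the Semgrex parser: the class RelationLogicSketch is hypothetical, tokens must be separated by spaces, each relation is reduced to a precomputed true/false fact for one candidate node, and the optional-& shorthand is not supported.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy evaluator for relation logic with !, &, | and [ ]; not the real Semgrex parser. */
final class RelationLogicSketch {

    private final String[] tokens;
    private final Map<String, Boolean> facts;
    private int pos = 0;

    private RelationLogicSketch(String expr, Map<String, Boolean> facts) {
        this.tokens = expr.trim().split("\\s+");
        this.facts = facts;
    }

    /** Evaluates expr, where facts says whether each relation holds for the candidate node. */
    static boolean eval(String expr, Map<String, Boolean> facts) {
        RelationLogicSketch parser = new RelationLogicSketch(expr, facts);
        boolean result = parser.parseOr();
        if (parser.pos != parser.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + parser.tokens[parser.pos]);
        }
        return result;
    }

    // or := and ( '|' and )*   (lowest precedence)
    private boolean parseOr() {
        boolean value = parseAnd();
        while (pos < tokens.length && tokens[pos].equals("|")) {
            pos++;
            boolean right = parseAnd(); // always parse the right side so tokens are consumed
            value = value || right;
        }
        return value;
    }

    // and := unary ( '&' unary )*   (binds tighter than '|')
    private boolean parseAnd() {
        boolean value = parseUnary();
        while (pos < tokens.length && tokens[pos].equals("&")) {
            pos++;
            boolean right = parseUnary();
            value = value && right;
        }
        return value;
    }

    // unary := '!' unary | '[' or ']' | RELATION
    private boolean parseUnary() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of expression");
        }
        String token = tokens[pos++];
        if (token.equals("!")) {
            return !parseUnary();
        }
        if (token.equals("[")) {
            boolean value = parseOr();
            if (pos >= tokens.length || !tokens[pos].equals("]")) {
                throw new IllegalArgumentException("missing ]");
            }
            pos++;
            return value;
        }
        Boolean fact = facts.get(token);
        if (fact == null) {
            throw new IllegalArgumentException("unknown relation: " + token);
        }
        return fact;
    }

    public static void main(String[] args) {
        Map<String, Boolean> facts = new HashMap<>();
        facts.put("<agent", true);  // the node is an agent
        facts.put("<nsubj", false); // the node is not a subject
        facts.put(">amod", false);  // the node has no amod dependent
        System.out.println(eval("<agent | <nsubj & >amod", facts));     // true
        System.out.println(eval("[ <agent | <nsubj ] & >amod", facts)); // false
    }
}
```

With these facts, the unbracketed expression is read as <agent | (<nsubj & >amod) and succeeds, while the bracketed version forces the | to be evaluated first and fails.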

Named Relations
Suppose we want to find two nodes connected by any relation that have a pair of nodes aligned to them with the same relation.
Name relations with =
The first occurrence of a named relation in a pattern is the one that is stored as the relation.
({} >/.*subj|agent/=reln {}) @ ({} >=reln {})
We can retrieve the string form of the relation found in the graph later by using that name.

Named Nodes
We can name nodes as well as relations.
Name nodes with =, and if the node matches, we can retrieve the node by that name.
Ex: {} <nsubj {}=verb
The verb whose subject is found by this pattern is stored under the name “verb”.
The first occurrence of a named node in the pattern is the one stored under that name. All others must be equal to that first one.
Ex. ({} >nsubj {}=subject) @ ({} >nsubj ({} @ {}=subject))
Finds a node that is both the governor of an nsubj relation to a node called “subject” and aligned to a node that is the governor of an nsubj relation to a node aligned to the node labeled as “subject”.
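The rule that the first occurrence of a name stores the node and every later occurrence must equal it behaves like a backreference. A minimal sketch of that bookkeeping follows; the class NamedNodeSketch and the method bindOrCheck are hypothetical names chosen for this illustration, and nodes are represented as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of named-node bookkeeping; not the real Semgrex implementation. */
final class NamedNodeSketch {

    private final Map<String, String> bindings = new HashMap<>();

    /**
     * First use of a name stores the node and succeeds. Later uses succeed
     * only if they refer to the same node that was stored first.
     */
    boolean bindOrCheck(String name, String node) {
        String existing = bindings.putIfAbsent(name, node);
        return existing == null || existing.equals(node);
    }

    /** Returns the node stored under the name, or null if the name was never bound. */
    String getNode(String name) {
        return bindings.get(name);
    }

    public static void main(String[] args) {
        NamedNodeSketch match = new NamedNodeSketch();
        System.out.println(match.bindOrCheck("subject", "Bob"));   // true: first occurrence binds
        System.out.println(match.bindOrCheck("subject", "Bob"));   // true: same node again
        System.out.println(match.bindOrCheck("subject", "shirt")); // false: a different node
        System.out.println(match.getNode("subject"));              // Bob
    }
}
```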

Optional Relations to Nodes
Sometimes we want to try to match a sub-expression to retrieve named nodes if they exist, but still match if the sub-expression fails.
Use the optional relation prefix ‘?’
Ex: {} >/nsubj|agent/ {}=subject ?>/.*obj/ {}=object
Matches nodes that are governors of nsubj or agent relations.
If the node is also the governor of some sort of object relation, we can retrieve the object using the key “object”.
If there is no object, the expression will still match.
Cannot be combined with negation.
Can be used in front of bracketed relations: ?[….]
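The behaviour of the example pattern above can be sketched in plain Java: the nsubj|agent relation is required, while the object relation only adds a binding when it is present. The class OptionalRelationSketch and the {governor, relation, dependent} array encoding of edges are assumptions made for this illustration, not part of Semgrex.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model of {} >/nsubj|agent/ {}=subject ?>/.*obj/ {}=object; not the real Semgrex code. */
final class OptionalRelationSketch {

    /**
     * Each edge is {governor, relation, dependent}. Returns the named nodes if the
     * required subject relation is found, or an empty Optional if it is not.
     */
    static Optional<Map<String, String>> match(String governor, List<String[]> edges) {
        Map<String, String> named = new LinkedHashMap<>();
        for (String[] e : edges) {
            if (e[0].equals(governor) && e[1].matches("nsubj|agent")) {
                named.put("subject", e[2]);
                break;
            }
        }
        if (!named.containsKey("subject")) {
            return Optional.empty(); // the required relation failed, so there is no match
        }
        for (String[] e : edges) {
            if (e[0].equals(governor) && e[1].matches(".*obj")) {
                named.put("object", e[2]); // optional: bound only when an object exists
                break;
            }
        }
        return Optional.of(named);
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
                new String[] {"bought", "nsubj", "Bob"},
                new String[] {"bought", "dobj", "shirt"},
                new String[] {"slept", "nsubj", "Ann"});
        System.out.println(match("bought", edges)); // subject and object are both bound
        System.out.println(match("slept", edges));  // only the subject is bound
        System.out.println(match("shirt", edges));  // no subject, so no match
    }
}
```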

Use of Semgrex classes
Semgrex usage is like java.util.regex: compile a pattern, get a matcher, then iterate with find().
    String s = "({} >nsubj {}=subject) @ ({} >nsubj ({} @ {}=subject))";
    SemgrexPattern p = SemgrexPattern.compile(s);
Two ways of calling the matcher: for a single SemanticGraph
    SemgrexMatcher m = p.matcher(graph);
or for two SemanticGraphs and an Alignment between them
    SemgrexMatcher m = p.matcher(hypGraph, alignment, txtGraph);
Then loop over the matches:
    while (m.find()) {
        System.out.println(m.getMatch().word());
    }

Use of Semgrex classes
Named nodes are retrieved w/ getNode()
    IndexedFeatureLabel subj = m.getNode("subject");
Named relations are retrieved w/ getRelnString()
    String subjReln = m.getRelnString("subjReln");

A Real Code Example - Before
    private void checkCopula(Problem problem, SemanticGraph hypGraph, SemanticGraph txtGraph) {
        IndexedFeatureLabel root = hypGraph.getFirstRoot();
        IndexedFeatureLabel subj = hypGraph.getChildWithReln(root, "nsubj");
        if (subj != null) {
            IndexedFeatureLabel alignedRoot = problem.getTxtWord(root);
            if (alignedRoot != IndexedFeatureLabel.NO_WORD) {
                IndexedFeatureLabel appos = txtGraph.getChildWithReln(alignedRoot, "appos");
                List<IndexedFeatureLabel> appositionList;
                try {
                    appositionList = txtGraph.getChildrenWithReln(problem.getTxtWord(subj), "nn");
                } catch (IllegalArgumentException e) {
                    appositionList = new ArrayList<IndexedFeatureLabel>();
                }
                if (appos != null) {
                    if (problem.getTxtWord(subj).equals(appos)) {
                        problem.addFeature(this, Feature.APPOSITION_MATCH,
                            "apposition in text between " + root.word() + " and " + subj.word());
                    } else {
                        problem.addFeature(this, Feature.APPOSITION_MISMATCH,
                            "no apposition in text between " + root.word() + " and " + subj.word());
                    }
                } else if (!appositionList.isEmpty()) {
                    boolean appositionPositiveFiring = false;
                    for (IndexedFeatureLabel apposition : appositionList) {
                        if (alignedRoot.equals(appos)) {
                            problem.addFeature(this, Feature.APPOSITION_MATCH,
                                "apposition in text between " + root.word() + " and " + subj.word());
                            appositionPositiveFiring = true;
                            break;
                        }
                    }
                    if (!appositionPositiveFiring) {
                        problem.addFeature(this, Feature.APPOSITION_MISMATCH,
                            "no apposition in text between " + root.word() + " and " + subj.word());
                    }
                }
            }
        }
    }

A Real Code Example - After
    private void checkCopula(Problem problem, SemanticGraph hypGraph, SemanticGraph txtGraph) {
        IndexedFeatureLabel root = hypGraph.getFirstRoot();
        if (checkAttributiveStructure(hypGraph) && !checkAttributiveStructure(txtGraph)) {
            if (VERBOSE) System.err.println("in check copula");
            SemgrexPattern copulaPat = SemgrexPattern.compile("({}=subj nn {}=alignedRoot] | [<appos {}=alignedRoot]])");
            SemgrexMatcher copulaMatcher = copulaPat.matcher(hypGraph, problem.getAlignment(), txtGraph);
            if (copulaMatcher.find()) {
                problem.addFeature(this, Feature.APPOSITION_MATCH,
                    "apposition in text between " + copulaMatcher.getNode("root").word()
                    + " and " + copulaMatcher.getNode("subj").word());
            } else {
                problem.addFeature(this, Feature.APPOSITION_MISMATCH,
                    "no apposition in text between " + copulaMatcher.getNode("root").word()
                    + " and " + copulaMatcher.getNode("subj").word());
            }
        }
    }

For More Help…
More information and links to other sources of documentation are available at nlp.stanford.edu/software/tregex.shtml
If you find a bug (i.e. a pattern that should work but doesn’t) or need more help,

Thanks!