LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.



Administrivia
reminder
–optional homework exercises (from lecture 5)
–due tomorrow (usual rules apply)
–for those of you who missed one or more questions on homework 1

Administrivia
homework 2
–out next week
–requires access to Microsoft Word
–or an alternative such as OpenOffice (free download, see openoffice.org)

Today’s Topic: Regular Expressions (RE)

Regular Expressions
(formally) equivalent to
–finite state automata (FSA), and
–regular grammars
used in
–string pattern matching, typically for a single word form
–text search: Unix (e)grep, Perl, Microsoft Word
caution:
–differences in notation and implementation
(diagram: Regular Grammars, FSA and Regular Expressions are mutually equivalent)

Regular Expressions
shorthand for describing sets of strings
String
–sequence of zero or more characters
–(typically, unbroken by spaces)
Examples
–aaa
–john
–mary45
–NT$
–ε (empty string)

Regular Expressions
–shorthand
stringⁿ
–exactly n occurrences of string
–n = 0,1,2,3,...
examples
–a⁴b³ = aaaabbb
–(uv)² = uvuv
–((ab)²(ba)²)² = ababbabaababbaba
Note:
–parentheses are used to group sequences of characters (strings)
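
The exponent shorthand can be checked mechanically. The following is a minimal sketch in Python (not part of the original slides); the helper name `power` is hypothetical and simply wraps Python's built-in string repetition.

```python
def power(s: str, n: int) -> str:
    """Return s repeated n times; n = 0 gives the empty string."""
    return s * n


# The three examples from the slide, built up from their parts.
a4b3 = power("a", 4) + power("b", 3)
uv2 = power("uv", 2)
nested = power(power("ab", 2) + power("ba", 2), 2)

print(a4b3)    # aaaabbb
print(uv2)     # uvuv
print(nested)  # ababbabaababbaba
```

The parentheses in the slide notation correspond to building the inner string first and then repeating the whole group.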

Regular Expressions
shorthand for describing sets of strings
string⁺
–set of one or more occurrences of string
–i.e. the set {string¹, string², string³, ...}
–Note: set is infinite
examples
–a⁺ = {a, aa, aaa, aaaa, aaaaa, …}
–(abc)⁺ = {abc, abcabc, abcabcabc, …}
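
Because the set is infinite, a program can only list a finite prefix of it. A small sketch, assuming Python's standard `re` module as the matcher (the slides themselves use grep); `plus_prefix` is a hypothetical helper name.

```python
import re


def plus_prefix(s, k):
    """First k members of the infinite set: s repeated 1, 2, ..., k times."""
    return [s * n for n in range(1, k + 1)]


a_plus = plus_prefix("a", 5)
abc_plus = plus_prefix("abc", 3)

# Every listed member is accepted by the corresponding regular expression.
all_match = all(re.fullmatch(r"(abc)+", w) is not None for w in abc_plus)

print(a_plus)     # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
print(abc_plus)   # ['abc', 'abcabc', 'abcabcabc']
print(all_match)  # True
```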

Regular Expressions
shorthand for describing sets of strings
string*
–set of zero or more occurrences of string
–i.e. the set {string⁰, string¹, string², string³, ...}
–string⁰ = ε (the empty string)
examples
–a* = {ε, a, aa, aaa, aaaa, …}
–(abc)* = {ε, abc, abcabc, …}
Note:
–a a* = a⁺
–a {ε, a, aa, aaa, aaaa, …} = {a, aa, aaa, aaaa, aaaaa, …}
Language = a set of strings
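
The identity a a* = a⁺ can be spot-checked on a finite sample of strings. This is only a sketch over a handful of candidates chosen for illustration, using Python's `re.fullmatch`; it is a sanity check, not a proof.

```python
import re

# A small, hand-picked sample: runs of a (including the empty string) plus some non-members.
candidates = ["a" * n for n in range(0, 6)] + ["b", "ab", "ba"]

star = {w for w in candidates if re.fullmatch(r"a*", w)}
a_then_star = {w for w in candidates if re.fullmatch(r"aa*", w)}
plus = {w for w in candidates if re.fullmatch(r"a+", w)}

print(a_then_star == plus)  # True: aa* and a+ accept the same sample strings
print(star - plus)          # {''}: only the empty string separates a* from a+
```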

Regular Expressions
Wildcard Characters
matches a range of characters
. (period)
matches any single character
examples
–.+ed = set of all strings of length 3 or greater containing ed and having at least one character preceding it
  matches: worked, bed, pre-education
  does not match: ed, education
–.*fix = set of all strings of length 3 or greater containing fix
  matches: prefix, infix, infixed, suffix, fix
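
The two wildcard patterns can be tried on the slide's word lists. A minimal sketch, assuming Python's `re.search` (which, like grep, looks for the pattern anywhere in the string):

```python
import re

ed_words = ["worked", "bed", "pre-education", "ed", "education"]
fix_words = ["prefix", "infix", "infixed", "suffix", "fix"]

# .+ed needs at least one character before "ed"; .*fix allows none.
ed_hits = [w for w in ed_words if re.search(r".+ed", w)]
fix_hits = [w for w in fix_words if re.search(r".*fix", w)]

print(ed_hits)   # ['worked', 'bed', 'pre-education']
print(fix_hits)  # ['prefix', 'infix', 'infixed', 'suffix', 'fix']
```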

Regular Expressions
Wildcard Characters
matches a range of characters
[characters] (list of matching characters)
matches any single character in the list
examples
–[sz]ation
  organization
  organisation
  Note: no separator is needed between listed characters; a comma inside the brackets would itself be matched
–[a-z] any character in the range lowercase a to z
  Note: not uppercase
–[0-9] any digit
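
A short sketch of bracketed character lists, assuming Python's `re` module. It also shows that a comma placed inside the brackets is treated as one more listed character; the test words other than organization/organisation are made up for illustration.

```python
import re

pattern = re.compile(r"organi[sz]ation")
spellings = ["organization", "organisation", "organi,ation", "organication"]
matched = [w for w in spellings if pattern.fullmatch(w)]

# With a comma in the list, the comma itself becomes a matching character.
comma_matches = re.fullmatch(r"organi[s,z]ation", "organi,ation") is not None

lower_ok = re.fullmatch(r"[a-z]", "q") is not None   # lowercase letter: in range
upper_ok = re.fullmatch(r"[a-z]", "Q") is not None   # uppercase letter: not in range
digits = re.findall(r"[0-9]", "LING 388, 9/15")

print(matched)        # ['organization', 'organisation']
print(comma_matches)  # True
print(lower_ok, upper_ok)
print(digits)         # ['3', '8', '8', '9', '1', '5']
```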

Regular Expressions: grep
excerpts from the manpage
–The caret ^ and the dollar sign $ are metacharacters that respectively match the empty string at the beginning and end of a line.
–The symbol \b matches the empty string at the edge of a word.
–The symbols \< and \> respectively match the empty string at the beginning and end of a word.
terminology
–word: unbroken sequence of digits, underscores and letters
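
The anchors can be tried out in Python, with one caveat: Python's `re` module supports ^, $ and \b, but it has no \< and \> operators. The constants `WORD_START` and `WORD_END` below are hypothetical stand-ins that emulate them with lookarounds; the sample sentences are made up.

```python
import re

line = "the cat sat on the mat"
starts = re.search(r"^the", line) is not None   # "the" at the beginning of the line
ends = re.search(r"mat$", line) is not None     # "mat" at the end of the line
not_start = re.search(r"^cat", line) is not None

# \b matches the empty string at either edge of a word.
whole_words = re.findall(r"\bthe\b", "the other then the")

# Assumed emulation of grep's \< and \> (not built into Python's re).
WORD_START = r"\b(?=\w)"   # a boundary followed by a word character
WORD_END = r"\b(?<=\w)"    # a boundary preceded by a word character

start_only = re.findall(WORD_START + r"at\w*", "at bat attic")
end_only = re.findall(r"\w*at" + WORD_END, "at bat attic")

print(starts, ends, not_start)  # True True False
print(whole_words)              # ['the', 'the']
print(start_only)               # ['at', 'attic']
print(end_only)                 # ['at', 'bat']
```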

Regular Expressions: grep
Excerpts from the manpage
–A regular expression may be followed by one of several repetition operators:
  ? The preceding item is optional and matched at most once.
  * The preceding item will be matched zero or more times.
  + The preceding item will be matched one or more times.
  {n} The preceding item is matched exactly n times.
  {n,} The preceding item is matched n or more times.
  {n,m} The preceding item is matched at least n times, but not more than m times.
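
The repetition operators above have the same meaning in Python's `re` module, so their effect can be tabulated. A minimal sketch: for each pattern, list which run lengths of the letter a (from 0 to 5) are accepted as a whole string.

```python
import re

patterns = [r"a?", r"a*", r"a+", r"a{2}", r"a{2,}", r"a{2,4}"]

# For each operator, the values of n (0..5) for which n copies of "a" match.
accepted = {
    p: [n for n in range(6) if re.fullmatch(p, "a" * n)]
    for p in patterns
}

for p, lengths in accepted.items():
    print(p, lengths)
# a?      [0, 1]
# a*      [0, 1, 2, 3, 4, 5]
# a+      [1, 2, 3, 4, 5]
# a{2}    [2]
# a{2,}   [2, 3, 4, 5]
# a{2,4}  [2, 3, 4]
```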

Regular Expressions: GNU grep
Excerpts from the manpage
concatenation
–Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.
disjunction
–Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression.

Regular Expressions: Examples
Regular Expression
–gupp(y|ies)
examples
–guppy
–guppies
Regular Expression
–beds?
examples
–bed
–beds
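
Both example patterns combine concatenation with either disjunction or an optional item. A sketch in Python; the non-matching test words (guppys, gupp, bedss, be) are added here for contrast and are not from the slides.

```python
import re

guppy = re.compile(r"gupp(y|ies)")   # concatenation of "gupp" with (y OR ies)
beds = re.compile(r"beds?")          # "bed" followed by an optional "s"

guppy_hits = [w for w in ["guppy", "guppies", "guppys", "gupp"] if guppy.fullmatch(w)]
bed_hits = [w for w in ["bed", "beds", "bedss", "be"] if beds.fullmatch(w)]

print(guppy_hits)  # ['guppy', 'guppies']
print(bed_hits)    # ['bed', 'beds']
```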

Regular Expressions: Examples
Example
–\b99 matches 99 in “there are 99 bottles …”
–but not the 99 in “there are 299 bottles …”
–Note: in $99 the $ is not a word character, so there is a word boundary before 99 and \b99 will match 99 here
–word: unbroken sequence of digits, underscores and letters
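
The three cases can be confirmed with Python's `re.search`, whose \b uses the same notion of word (letters, digits, underscore). The third sentence is a made-up example containing $99.

```python
import re

pattern = re.compile(r"\b99")
sentences = [
    "there are 99 bottles",
    "there are 299 bottles",
    "it costs $99",
]

# True where 99 starts at a word boundary, False where it is preceded by a word character.
found = {s: pattern.search(s) is not None for s in sentences}

for s, hit in found.items():
    print(hit, s)
# True  there are 99 bottles
# False there are 299 bottles
# True  it costs $99
```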

Regular Expressions: Examples
Example (sheeptalk)
–ba!
–baa!
–baaa!
…
regular expression
–baa*!
–ba+!
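
Since regular expressions are formally equivalent to finite state automata, the sheeptalk language can also be recognized by a small hand-built machine. The sketch below is one possible four-state automaton (state numbering and the names `TRANSITIONS`, `ACCEPT`, `fsa_accepts` are assumptions made for illustration), compared against the two regular expressions on a few sample strings.

```python
import re

# State 0 is the start state; state 3 is the only accepting state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 2,   # loop: any number of further a's
    (2, "!"): 3,
}
ACCEPT = {3}


def fsa_accepts(text):
    """Run the automaton over text; reject as soon as no transition applies."""
    state = 0
    for ch in text:
        if (state, ch) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPT


samples = ["ba!", "baa!", "baaa!", "b!", "ba", "baa!!", "abaa!"]

# The automaton and both regular expressions should agree on every sample.
agree = all(
    fsa_accepts(s)
    == (re.fullmatch(r"baa*!", s) is not None)
    == (re.fullmatch(r"ba+!", s) is not None)
    for s in samples
)
print([s for s in samples if fsa_accepts(s)])  # ['ba!', 'baa!', 'baaa!']
print(agree)                                   # True
```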

Regular Expressions: Microsoft Word
terminology:
–wildcard search
