CS 330 Programming Languages 10 / 10 / 2006 Instructor: Michael Eckmann.

Michael Eckmann - Skidmore College - CS Fall 2006

Today's Topics
- Questions / comments?
- Anyone try the tutorial?
- Homework assignment to be assigned tonight.
- Perl: continue with pattern matching / regular expressions

Perl
- =~ (matches) and !~ (doesn't match)
- m/ / is the format of a match regular expression.
- Variables can be put inside the / / search pattern, and they are interpolated.
- =~ can be omitted when matching against the special default variable $_
- ^ forces the match to be at the very beginning of the string.
- $ forces the match to be at the very end of the string (or just before a newline at the very end).
- [ ] square brackets denote a character class, where exactly ONE character from the class will match.
  - ^ (not) matches a character NOT in the class; the ^ must come right after the [
  - - (hyphen) specifies a range of characters, e.g. [a-z]
- $`, $& and $' (left, matched, right) are special variables that are set after a match.
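The operators and variables above can be tried out with a few small subroutines. This is a minimal sketch: the subroutine names and sample strings are hypothetical, chosen only to exercise =~, !~, interpolation, ^, character classes, $_ and $` $& $'.

```perl
use strict;
use warnings;

# Sketch only: subroutine names and inputs are illustrative.

# =~ tests a string against a pattern; $word is interpolated into it
sub contains {
    my ($text, $word) = @_;
    return $text =~ m/$word/ ? 1 : 0;
}

# !~ is true when the pattern does NOT match; [a-z] is a range
sub has_no_lowercase {
    my ($text) = @_;
    return $text !~ m/[a-z]/ ? 1 : 0;
}

# ^ anchors at the start; [^a-zA-Z] is ONE character that is not a letter
sub starts_with_non_letter {
    my ($text) = @_;
    return $text =~ m/^[^a-zA-Z]/ ? 1 : 0;
}

# After a successful match, $`, $& and $' hold the left, matched and right parts
sub pieces {
    my ($text) = @_;
    return $text =~ m/[0-9]/ ? ($`, $&, $') : ();
}

# With no =~, m// is applied to the default variable $_
sub count_with_q {
    my $n = 0;
    for (@_) {              # foreach sets $_ to each argument in turn
        $n++ if m/q/;
    }
    return $n;
}

print join('|', pieces("abc7xyz")), "\n";   # prints: abc|7|xyz
```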

There are shortcuts for common character classes:
- \d any digit
- \s any whitespace character (space, \t, \r, \n, \f)
- \w any "word" character (a letter, digit, or underscore)
- \D any non-digit
- \S any non-whitespace character
- \W any non-word character
- . any character other than newline (\n)
The backslash shortcuts can be used inside square brackets or on their own; inside square brackets, a . is just a literal period.
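A short sketch of the shortcut classes, both on their own and inside square brackets. The subroutine names and test strings are hypothetical; the point is only to show which characters each class accepts.

```perl
use strict;
use warnings;

# Sketch only: subroutine names and inputs are illustrative.

# Classify one character with the shorthand classes
sub classify {
    my ($ch) = @_;
    return 'digit'      if $ch =~ m/\d/;
    return 'whitespace' if $ch =~ m/\s/;
    return 'word'       if $ch =~ m/\w/;   # letters and _ (digits were caught above)
    return 'other';
}

# Shorthands also work inside [ ]: one digit or one whitespace character
sub has_digit_or_space {
    my ($text) = @_;
    return $text =~ m/[\d\s]/ ? 1 : 0;
}

# \W is the complement of \w
sub has_non_word {
    my ($text) = @_;
    return $text =~ m/\W/ ? 1 : 0;
}

# Outside brackets . is one non-newline character ($ also allows a trailing \n)
sub is_one_char { return $_[0] =~ m/^.$/ ? 1 : 0 }

# Inside [ ] the . is a literal period
sub has_period  { return $_[0] =~ m/[.]/ ? 1 : 0 }
```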

Modifiers are characters placed after the second forward slash, e.g. m/perl/i.
- i is the modifier for ignoring case.
- With no modifier (the default):
  - . matches any non-newline character
  - ^ matches at the beginning of the string
  - $ matches at the end of the string (or before a newline at the end)
- s modifier: treats the string as a single long line, so . matches any character, including newline.
- m modifier: treats the string as multiple lines, so ^ and $ match at the beginning or end of any line. In that case, \A still matches only the beginning of the whole string, and \Z only the end of the whole string (or before a final newline).
Let's look at the examples under "Using Character Classes" on this webpage for more:
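To see the modifiers side by side, the sketch below runs the same few patterns with and without i, s and m against one three-line string. The sample string and the hash keys are hypothetical labels made up for this illustration.

```perl
use strict;
use warnings;

# Sketch only: the keys are illustrative labels; each value is 1 if the match succeeds
sub modifier_demo {
    my ($s) = @_;
    return (
        no_i      => ($s =~ m/second/)       ? 1 : 0,  # case matters by default
        with_i    => ($s =~ m/second/i)      ? 1 : 0,  # /i ignores case
        dot_plain => ($s =~ m/line.Second/)  ? 1 : 0,  # . does not match \n
        dot_s     => ($s =~ m/line.Second/s) ? 1 : 0,  # /s lets . match \n
        caret     => ($s =~ m/^Second/)      ? 1 : 0,  # ^ is the start of the string
        caret_m   => ($s =~ m/^Second/m)     ? 1 : 0,  # /m: ^ is the start of any line
        dollar    => ($s =~ m/line$/)        ? 1 : 0,  # $ is the end of the string
        dollar_m  => ($s =~ m/line$/m)       ? 1 : 0,  # /m: $ is the end of any line
        A_m       => ($s =~ m/\ASecond/m)    ? 1 : 0,  # \A is still the start of the string
        Z_m       => ($s =~ m/third\Z/m)     ? 1 : 0,  # \Z is still the end of the string
    );
}

my $text = "first line\nSecond line\nthird";   # hypothetical sample
my %r = modifier_demo($text);
printf "%-9s %d\n", $_, $r{$_} for sort keys %r;
```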

- | is the alternation character (it acts somewhat like a logical OR).
- Parentheses group characters together.
- The "submatches" captured by the parentheses are available afterward in the variables $1, $2, $3, etc.
- Using \1, \2, \3, etc. WITHIN the pattern requires the text matched by an earlier group to appear again at that point. These \1, \2, \3, etc. are called backreferences.
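A minimal sketch of alternation, capture groups and backreferences. The course-name format and the subroutine names are assumptions made up for illustration; \d\d\d is written without quantifiers, which come on the next slide.

```perl
use strict;
use warnings;

# Sketch only: subroutine names and input formats are illustrative.

# | inside a group offers alternatives; each ( ) captures into $1, $2, ...
sub parse_course {
    my ($s) = @_;
    return $s =~ m/(cs|math)(\d\d\d)/i ? ($1, $2) : ();
}

# \1 inside the pattern must repeat exactly what group 1 matched
sub doubled_letter {
    my ($s) = @_;
    return $s =~ m/(\w)\1/ ? $1 : undef;
}

# \2\1 repeats the two captured characters in reverse order, as in "abba"
sub has_abba {
    my ($s) = @_;
    return $s =~ m/(\w)(\w)\2\1/ ? 1 : 0;
}

print join(' ', parse_course("I am in CS330 this term")), "\n";   # prints: CS 330
```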

Let's continue looking at this site for examples of the alternation character, grouping with parentheses, the backtracking mechanism, and extracting matches with parentheses.

Repetition quantifiers are placed immediately after
- a character,
- a character class, or
- a grouping.
The repetition quantifiers and their meanings are:
- ? matches 0 or 1 time
- * matches 0 or more times
- + matches 1 or more times
- {min,max} matches at least min times and at most max times
- {min,} matches at least min times
- {n} matches exactly n times
These quantifiers are GREEDY; that is, they match as much of the string as possible while still allowing the whole regular expression to match.
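Each quantifier, plus greediness, can be checked with one-line subroutines. This is a sketch with hypothetical names; the patterns are anchored with ^ and $ so that the whole string must fit the quantifier.

```perl
use strict;
use warnings;

# Sketch only: subroutine names and inputs are illustrative.
sub is_color   { return $_[0] =~ m/^colou?r$/ ? 1 : 0 }  # ?  : u appears 0 or 1 time
sub is_ab_star { return $_[0] =~ m/^ab*$/     ? 1 : 0 }  # *  : 0 or more b's
sub is_ab_plus { return $_[0] =~ m/^ab+$/     ? 1 : 0 }  # +  : 1 or more b's
sub is_zip     { return $_[0] =~ m/^\d{5}$/   ? 1 : 0 }  # {n}: exactly 5 digits

# Greedy: a{2,3} takes 3 a's when it can, even though 2 would also do
sub take_2_to_3 { return $_[0] =~ m/(a{2,3})/ ? $1 : undef }

# Greedy: the first a* takes every a, leaving nothing for the second
sub greedy_split {
    my ($s) = @_;
    return $s =~ m/^(a*)(a*)$/ ? ($1, $2) : ();
}
```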

Curly braces { } give a minimum and maximum, a minimum only, or an exact count:
- {min,max} matches at least min times and at most max times, e.g. {5,10} matches between 5 and 10 times inclusive
- {min,} matches at least min times, e.g. {3,} matches 3 or more times
- {n} matches exactly n times, e.g. {6} matches exactly 6 times
Examples of a repetition quantifier after a grouping and after a single character:
- m/(the){3}/ matches thethethe (three consecutive copies).
- m/the{3}/ matches theee (only the e is repeated 3 times).
- m/the.*the.*the/ matches 3 the's with any characters (except \n) between them. Is there any other way to write it?
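The three "the" examples and the curly-brace forms can be verified directly. In this sketch the subroutine names are hypothetical, and the {min,max} checks use a string of x's purely for illustration.

```perl
use strict;
use warnings;

# Sketch only: subroutine names and inputs are illustrative.
sub three_thes_grouped { return $_[0] =~ m/(the){3}/      ? 1 : 0 }  # quantifier on a group
sub the_with_e3        { return $_[0] =~ m/the{3}/        ? 1 : 0 }  # quantifier on the e only
sub three_thes_apart   { return $_[0] =~ m/the.*the.*the/ ? 1 : 0 }  # .* never crosses \n

# Anchored with ^ and $ so the whole string must be x's
sub between_5_and_10 { return $_[0] =~ m/^x{5,10}$/ ? 1 : 0 }
sub at_least_3       { return $_[0] =~ m/^x{3,}$/   ? 1 : 0 }
sub exactly_6        { return $_[0] =~ m/^x{6}$/    ? 1 : 0 }
```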

In terms of regular expression repetition quantifiers, what does greedy mean again?

In terms of regular expression repetition quantifiers, what does greedy mean?
A quantifier is greedy if it matches as much of the string as possible while still allowing the whole regular expression to match. We'll see greediness in action now. Let's continue looking at this site for examples of matching repetitions and the 4 principles that are followed:
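Greediness in action: in the sketch below, (.*) first runs to the end of the string and then gives characters back only until the rest of the pattern can match. The subroutine names and sample strings are hypothetical.

```perl
use strict;
use warnings;

# Sketch only: subroutine names and inputs are illustrative.

# .* runs to the end of the string, then backtracks only until "at" can match
sub before_last_at {
    my ($s) = @_;
    return $s =~ m/^(.*)at/ ? $1 : undef;
}

# Same effect with angle brackets: (.*) stretches to the LAST >
sub inside_angles {
    my ($s) = @_;
    return $s =~ m/<(.*)>/ ? $1 : undef;
}

print before_last_at("the cat sat on the mat"), "\n";   # prints: the cat sat on the m
print inside_angles("<b>bold</b>"), "\n";               # prints: b>bold</b
```

Because the match backtracks from the right, the capture ends at the last "at" (in "mat") and the last ">", not the first ones.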

Recap of the special variables we learned:
- $_ (the default variable)
- $`, $& and $' (left, match, right)
- $0 (the program name)
- $1, $2, $3, ... (the submatches)

Let's write a few regular expressions. Match any signed or unsigned integer of arbitrary length. For example, it should match -22, 4567, 1, and +43, but not things like -, +, 4.56, abcd, etc.
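One possible answer to the integer exercise, sketched below (the slides do not give a solution, so this is only one way to write it): an optional sign, then one or more digits, anchored at both ends. The hyphen is placed last in [+-] so it is a literal character rather than a range. The subroutine name is hypothetical.

```perl
use strict;
use warnings;

# Sketch only: one possible answer; the subroutine name is illustrative.
# Optional sign, then one or more digits, and nothing else
sub is_integer {
    my ($s) = @_;
    return $s =~ m/^[+-]?\d+$/ ? 1 : 0;
}

for my $t ('-22', '4567', '1', '+43', '-', '+', '4.56', 'abcd') {
    printf "%-5s %s\n", $t, is_integer($t) ? 'match' : 'no match';
}
```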

Let's try these:
1) Ignore beginning whitespace, if there is any, match the word program, and store the rest of the string (after the word program) in some variable.
2) Now what if there were \n's in the string? What might we change?
3) Match cs330, cs106, CS106, or CS330, but not Cs330, cS106, etc.

1) Ignore beginning whitespace, if there is any, match the word program, and store the rest of the string (after the word program) in some variable.
m/^\s*program(.*)/
2) Now what if there were \n's in the string? What might we change?
m/^\s*program(.*)/s
3) Match cs330, cs106, CS106, or CS330, but not Cs330, cS106, etc.
m/cs330|cs106|CS330|CS106/ OR m/(cs|CS)(106|330)/
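The three solutions can be wrapped in small subroutines to check them. This sketch uses hypothetical names; the ^ anchor is what makes \s* skip only leading whitespace, and /s is what lets (.*) continue past an embedded newline.

```perl
use strict;
use warnings;

# Sketch only: subroutine names and inputs are illustrative.

# 1) skip leading whitespace, match "program", capture the rest of the line
sub after_program {
    my ($s) = @_;
    return $s =~ m/^\s*program(.*)/ ? $1 : undef;
}

# 2) same, but /s lets (.*) continue past embedded newlines
sub after_program_s {
    my ($s) = @_;
    return $s =~ m/^\s*program(.*)/s ? $1 : undef;
}

# 3) cs or CS (not mixed case), followed by 106 or 330
sub is_course {
    my ($s) = @_;
    return $s =~ m/(cs|CS)(106|330)/ ? 1 : 0;
}
```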

Let's look at a larger parsing example that uses many of the features we just learned. We'll read the problem and try to solve it ourselves before looking at the solution. The "doing string selections" section of:
The following page is a good reference. It is a nice summary of the different characters and their meanings, with succinct examples: