

Regular Expressions Software Tools

Slide 2 What is a Regular Expression?
A regular expression is a pattern to be matched against a string, for example the pattern Bill.
- Matching either succeeds or fails.
- Sometimes you may want to replace a matched pattern with another string.
- Regular expressions are used by many Unix commands and programs, such as grep, sed, awk, vi, emacs, and even some shells.

Slide 3 Simple Uses of Regular Expressions
If we are looking for all the lines in a file that contain the string Shakespeare, we can use the grep command:

    $ grep Shakespeare movie > result

Here, Shakespeare is the regular expression that grep looks for in the file movie. Lines that match are redirected to the file result.

Slide 4 Simple Uses of Regular Expressions
In Perl, we can make Shakespeare a regular expression by enclosing it in slashes:

    if(/Shakespeare/){
        print $_;
    }

What is tested in the if-statement? Answer: $_. When a regular expression is enclosed in slashes and no other variable is named, $_ is tested against it. The test returns true if there is a match, false otherwise.

Slide 5 Simple Uses of Regular Expressions

    if(/Shakespeare/){
        print $_;
    }

The previous example tests only one line, and prints the line if it contains Shakespeare. To work on all lines, add a loop:

    while(<>){
        if(/Shakespeare/){
            print;
        }
    }

Slide 6 Simple Uses of Regular Expressions
What if we are not sure how to spell Shakespeare? The first part, Shak, is easy, and there must be an r near the end. How can we express this idea?

grep:

    grep "Shak.*r" movie > result

Perl:

    while(<>){
        if(/Shak.*r/){
            print;
        }
    }

.* means "zero or more of any character".

Slide 7 Simple Uses of Regular Expressions

    grep "Shak.*r" movie > result

The double quotes in this grep example are needed to prevent the shell from interpreting * as a filename wildcard ("all files").

Since Shakespeare ends in "e", shouldn't the pattern be Shak.*r.* ? Answer: No need. The pattern may match anywhere in the line, so any characters can come before or after it. For deciding whether a line matches, Shak.*r is the same as .*Shak.*r.*
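The equivalence claimed on this slide can be checked with a short script. This is only a sketch: the helper names matches_short and matches_padded and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the line matches the unanchored pattern, else 0.
sub matches_short {
    my ($line) = @_;
    return $line =~ /Shak.*r/ ? 1 : 0;
}

# Hypothetical helper: the same test with explicit .* padding on both sides.
sub matches_padded {
    my ($line) = @_;
    return $line =~ /.*Shak.*r.*/ ? 1 : 0;
}

# Both columns should agree on every sample line.
for my $line ("Shakespeare in Love", "Titanic", "Shaker", "Shak") {
    printf "%-20s short=%d padded=%d\n",
        $line, matches_short($line), matches_padded($line);
}
```

The padding changes which part of the line is matched, but not whether the line matches at all.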

Slide 8 Substitution
- Another simple use of regular expressions is the substitute operator.
- It replaces the part of a string that matches the regular expression with another string.

    s/Shakespeare/Bill Gates/;

- $_ is matched against the regular expression (Shakespeare).
- If the match is successful, the part of the string that matched is discarded and replaced by the replacement string (Bill Gates).
- If the match is unsuccessful, nothing happens.

Slide 9 Substitution
- The program:

    $ cat movie
    Titanic
    Saving Private Ryan
    Shakespeare in Love
    Life is Beautiful
    $ cat sub1
    #!/usr/local/bin/perl5 -w
    while(<>){
        if(/Shakespeare/){
            s/Shakespeare/Bill Gates/;
            print;
        }
    }
    $ sub1 movie
    Bill Gates in Love
    $

Slide 10 Substitution
An even shorter way to write it, using the fact that s/// returns true when it replaces something:

    $ cat sub2
    #!/usr/local/bin/perl5 -w
    while(<>){
        if(s/Shakespeare/Bill Gates/){
            print;
        }
    }
    $ sub2 movie
    Bill Gates in Love
    $

Slide 11 Patterns
- A regular expression is a pattern.
- Some parts of the pattern match a single character (a).
- Other parts of the pattern match multiple characters (.*).

Slide 12 Single-Character Patterns
The dot "." matches any single character except the newline (\n). For example, the pattern /a./ matches any two-letter sequence that starts with a and is not "a\n". Use \. if you really want to match the period.

    $ cat test
    hi
    hi bob.
    $ cat sub3
    #!/usr/local/bin/perl5 -w
    while(<>){
        if(/\./){
            print;
        }
    }
    $ sub3 test
    hi bob.
    $

Slide 13 Single-Character Groups
- If you want to match one character out of a group of characters, use [ ]:

    /[abcde]/

This matches a string containing any one of the first 5 lowercase letters, while:

    /[aeiouAEIOU]/

matches any of the 5 vowels in either upper or lower case.

Slide 14 Single-Character Groups
If you want ] in the group, put a backslash before it, or put it as the first character in the list:

    /[abcde]]/     # matches [abcde] + ]
    /[abcde\]]/    # okay
    /[]abcde]/     # also okay

Use - for ranges of characters (like a through z):

    /[0123456789]/ # any single digit
    /[0-9]/        # same

If you want - in the list, put a backslash before it, or put it at the beginning or end:

    /[X-Z]/        # matches X, Y, Z
    /[X\-Z]/       # matches X, -, Z
    /[XZ-]/        # matches X, Z, -
    /[-XZ]/        # matches -, X, Z

Slide 15 Single-Character Groups
- More range examples:

    /[0-9\-]/        # match 0-9, or minus
    /[0-9a-z]/       # match any digit or lowercase letter
    /[a-zA-Z0-9_]/   # match any letter, digit, underscore

There is also a negated character group, which starts with a ^ immediately after the left bracket. This matches any single character not in the list.

    /[^0123456789]/  # match any single non-digit
    /[^0-9]/         # same
    /[^aeiouAEIOU]/  # match any single non-vowel
    /[^\^]/          # match any single character except ^
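The difference between a range, a literal -, and a negated group is easy to get wrong, so a small check may help. This is a sketch only; the helper names below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers, each returning 1 on a match and 0 otherwise.
sub in_range_x_to_z { return $_[0] =~ /[X-Z]/  ? 1 : 0; }  # X, Y or Z
sub is_x_z_or_dash  { return $_[0] =~ /[XZ-]/  ? 1 : 0; }  # X, Z or a literal -
sub has_non_digit   { return $_[0] =~ /[^0-9]/ ? 1 : 0; }  # any character that is not 0-9

for my $s ("Y", "-", "123", "12a") {
    printf "%-4s range=%d dash=%d non_digit=%d\n",
        $s, in_range_x_to_z($s), is_x_z_or_dash($s), has_non_digit($s);
}
```

Note that /[^0-9]/ succeeds as soon as the string contains one non-digit; it does not require every character to be a non-digit.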

Slide 16 Single-Character Groups
- For convenience, some common character groups are predefined:

    Predefined        Group            Negated           Negated Group
    \d (a digit)      [0-9]            \D (non-digit)    [^0-9]
    \w (word char)    [a-zA-Z0-9_]     \W (non-word)     [^a-zA-Z0-9_]
    \s (space char)   [ \t\n\r\f]      \S (non-space)    [^ \t\n\r\f]

\d matches any digit.
\w matches any letter, digit, or underscore.
\s matches any space, tab, or newline (and also carriage return and form feed).

- You can use these predefined groups inside other groups:

    /[\da-fA-F]/   # match any hexadecimal digit

Slide 17 Multipliers
- Multipliers allow you to say "one or more of these" or "up to four of these".
- * means zero or more of the immediately previous character (or character group).
- + means one or more of the immediately previous character (or character group).
- ? means zero or one of the immediately previous character (or character group).

Slide 18 Multipliers
- Example:

    /Ga+te?s/

matches a G followed by one or more a's, followed by t, followed by an optional e, followed by s.

- *, +, and ? are greedy, and will match as many characters as possible:

    $_ = "Bill xxxxxxxxx Gates";
    s/x+/Cheap/;
    # gives: Bill Cheap Gates
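To see what greediness buys here, the greedy substitution can be compared with one that matches a single x. A minimal sketch; replace_run and replace_one are hypothetical helper names, and the input string is built with the repetition operator so the number of x's is explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first run of x's (x+ grabs the whole run).
sub replace_run {
    my ($s) = @_;
    $s =~ s/x+/Cheap/;
    return $s;
}

# Hypothetical helper: replace only the first single x.
sub replace_one {
    my ($s) = @_;
    $s =~ s/x/Cheap/;
    return $s;
}

my $text = "Bill " . ("x" x 9) . " Gates";   # nine x's in the middle
print replace_run($text), "\n";              # the whole run is replaced
print replace_one($text), "\n";              # eight x's are left behind
```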

Slide 19 General Multiplier
How do you say "five to ten x's"?

    /xxxxxx?x?x?x?x?/   # works, but ugly
    /x{5,10}/           # nicer

How do you say "five or more x's"?

    /x{5,}/

How do you say "exactly five x's"?

    /x{5}/

How do you say "up to five x's"?

    /x{0,5}/

Slide 20 General Multiplier
How do you say "c followed by any 5 characters (which can be different) and ending with d"?

    /c.{5}d/

- * is the same as {0,}
- + is the same as {1,}
- ? is the same as {0,1}
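One point worth checking by experiment is that a count such as {5} limits only what the pattern itself consumes; without anchors, a longer run still contains a match. The sketch below uses hypothetical helper names and made-up test strings.

```perl
use strict;
use warnings;

# Hypothetical helper: c, then exactly five characters, then d.
sub c_five_d   { return $_[0] =~ /c.{5}d/ ? 1 : 0; }

# Hypothetical helper: five x's somewhere in the string.
sub has_five_x { return $_[0] =~ /x{5}/   ? 1 : 0; }

for my $s ("c12345d", "c1234d", "c123456d") {
    printf "%-10s c.{5}d=%d\n", $s, c_five_d($s);
}
for my $s ("xxxx", "xxxxx", "xxxxxxx") {
    printf "%-10s x{5}=%d\n", $s, has_five_x($s);
}
```

The seven-x string matches /x{5}/ because five of its x's satisfy the pattern and the rest are simply ignored.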

Slide 21 Pattern Memory
- How would we match a pattern that starts and ends with the same letter or word?
- For this, we need to remember what part of the pattern matched.
- Use ( ) around any part of the pattern to put the text it matches into memory (the parentheses do not change what the pattern matches).
- To recall memory, include a backslash followed by an integer.

    /Bill(.)Gates\1/

Slide 22 Pattern Memory
- Example:

    /Bill(.)Gates\1/

- This example matches a string starting with Bill, followed by any single non-newline character, followed by Gates, followed by that same single character.
- So, it matches:

    Bill!Gates!
    Bill-Gates-

but not:

    Bill?Gates!
    Bill-Gates_

(Note that /Bill.Gates./ would match all four.)

Slide 23 Pattern Memory
- More examples:

    /a(.)b(.)c\2d\1/

- This example matches a string starting with a, followed by a character (#1), followed by b, another single character (#2), c, the character #2 again, d, and the character #1 again.
- So it matches: a-b!c!d-

Slide 24 Pattern Memory
- The remembered part can be more than a single character.
- For example:

    /a(.*)b\1c/

- This example matches an a, followed by any number of characters (even zero), followed by b, followed by the same sequence of characters, followed by c.
- So it matches aBillbBillc and abc, but not aBillbBillGatesc.
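The matching and non-matching strings listed on the pattern memory slides can be verified mechanically. A minimal sketch, with hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helper: the single character after Bill must repeat after Gates.
sub same_separator { return $_[0] =~ /Bill(.)Gates\1/ ? 1 : 0; }

# Hypothetical helper: whatever sits between a and b must repeat between b and c.
sub repeated_chunk { return $_[0] =~ /a(.*)b\1c/ ? 1 : 0; }

for my $s ("Bill!Gates!", "Bill-Gates-", "Bill?Gates!", "Bill-Gates_") {
    printf "%-18s same_separator=%d\n", $s, same_separator($s);
}
for my $s ("aBillbBillc", "abc", "aBillbBillGatesc") {
    printf "%-18s repeated_chunk=%d\n", $s, repeated_chunk($s);
}
```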

Slide 25 Alternation
- How do we pick from a set of alternatives when the alternatives are longer than one character?
- The following example matches either Gates or Clinton or Shakespeare:

    /Gates|Clinton|Shakespeare/

- For single-character alternatives, /[abc]/ is the same as /a|b|c/.

Slide 26 Anchoring Patterns
- Anchors require that the pattern be at the beginning or end of the line.
- ^ matches the beginning of the line (only if ^ is the first character of the pattern):

    /^Bill/    # match lines that begin with Bill
    /^Gates/   # match lines that begin with Gates
    /Bill\^/   # match lines containing Bill^ somewhere
    /\^/       # match lines containing ^

- $ matches the end of the line (only if $ is the last character of the pattern):

    /Bill$/    # match lines that end with Bill
    /Gates$/   # match lines that end with Gates
    /$Bill/    # match with contents of scalar $Bill
    /\$/       # match lines containing $
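A short check of both anchors follows. It also shows one detail the slide does not spell out: in Perl, $ matches at the very end of the string or just before a final newline, which is why /Bill$/ still works on lines read with <> that keep their trailing \n. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers, each returning 1 on a match and 0 otherwise.
sub starts_with_bill { return $_[0] =~ /^Bill/ ? 1 : 0; }
sub ends_with_bill   { return $_[0] =~ /Bill$/ ? 1 : 0; }

for my $line ("Bill Gates", "Mr. Bill", "Mr. Bill\n") {
    (my $shown = $line) =~ s/\n/\\n/;   # make the newline visible when printing
    printf "%-12s starts=%d ends=%d\n",
        $shown, starts_with_bill($line), ends_with_bill($line);
}
```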

Slide 27 Precedence
So what happens with the pattern a|b* ? Is this (a|b)* or a|(b*) ?

Precedence of patterns from highest to lowest:

    Name                    Representation
    Parentheses             ( )
    Multipliers             ? + * {m,n}
    Sequence & anchoring    abc ^ $
    Alternation             |

By the table, * has higher precedence than |, so the pattern is interpreted as a|(b*).

Slide 28 Precedence
- What if we want the other interpretation in the previous example? Answer: Simple, just use parentheses:

    (a|b)*

- Use parentheses in ambiguous cases to improve clarity, even if they are not strictly needed.
- When you use parentheses for precedence, they also go into memory (\1, \2, \3).

Slide 29 Precedence
More precedence examples:

    abc*          # matches ab, abc, abcc, abccc,…
    (abc)*        # matches "", abc, abcabc, abcabcabc,…
    ^a|b          # matches a at beginning of line, or b anywhere
    ^(a|b)        # matches either a or b at the beginning of line
    a|bc|d        # a, or bc, or d
    (a|b)(c|d)    # ac, ad, bc, or bd
    (Bill Gates)|(Bill Clinton)          # Bill Gates, Bill Clinton
    Bill (Gates|Clinton)                 # Bill Gates, Bill Clinton
    (Mr\. Bill)|(Bill (Gates|Clinton))   # Mr. Bill, Bill Gates, Bill Clinton
    (Mr\. )?Bill( Gates| Clinton)?       # Bill, Mr. Bill, Bill Gates, Bill Clinton,
                                         # Mr. Bill Gates, Mr. Bill Clinton
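The ^a|b versus ^(a|b) pair is the easiest of these to misread, so here is a minimal sketch that contrasts them. The helper names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, each returning 1 on a match and 0 otherwise.
sub a_at_start_or_b_anywhere { return $_[0] =~ /^a|b/   ? 1 : 0; }  # (^a) | (b)
sub a_or_b_at_start          { return $_[0] =~ /^(a|b)/ ? 1 : 0; }  # ^ applies to both

for my $s ("apple", "bob", "xb", "xyz") {
    printf "%-6s ^a|b=%d ^(a|b)=%d\n",
        $s, a_at_start_or_b_anywhere($s), a_or_b_at_start($s);
}
```

Only "xb" separates the two patterns: it has a b, but not at the start.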

Slide 30 =~
What if you want to match a different variable than $_ ? Answer: Use =~.
- Examples:

    $name = "Bill Shakespeare";
    $name =~ /^Bill/;     # true
    $name =~ /(.)\1/;     # also true (matches ll)
    if($name =~ /(.)\1/){
        print "$name\n";
    }

Slide 31 =~
An example using =~ to match <STDIN>:

    $ cat match1
    #!/usr/local/bin/perl5 -w
    print "Quit (y/n)? ";
    if(<STDIN> =~ /^[yY]/){
        print "Quitting\n";
        exit;
    }
    print "Continuing\n";
    $ match1
    Quit (y/n)? y
    Quitting
    $

Slide 32 =~
Another example using =~ to match <STDIN>:

    $ cat match2
    #!/usr/local/bin/perl5 -w
    print "Wakeup (y/n)? ";
    while(<STDIN> =~ /^[nN]/){
        print "Sleeping\n";
        print "Wakeup (y/n)? ";
    }
    $ match2
    Wakeup (y/n)? n
    Sleeping
    Wakeup (y/n)? N
    Sleeping
    Wakeup (y/n)? y
    $

Slide 33 Ignoring Case
In the previous examples, we used [yY] and [nN] to match either upper or lower case. Perl has an "ignore case" option for pattern matching:

    /somepattern/i

    $ cat match1a
    #!/usr/local/bin/perl5 -w
    print "Quit (y/n)? ";
    if(<STDIN> =~ /^y/i){
        print "Quitting\n";
        exit;
    }
    print "Continuing\n";
    $ match1a
    Quit (y/n)? Y
    Quitting
    $

Slide 34 Slash and Backslash
If your pattern has a slash character (/), you must precede each one with a backslash (\):

    $ cat slash1
    #!/usr/local/bin/perl5 -w
    print "Enter path: ";
    $path = <STDIN>;
    if($path =~ /^\/usr\/local\/bin/){
        print "Path is /usr/local/bin\n";
    }
    $ slash1
    Enter path: /usr/local/bin
    Path is /usr/local/bin
    $

Slide 35 Different Pattern Delimiters
If your pattern has lots of slash characters (/), you can also use a different pattern delimiter with the form:

    m#somepattern#

The # can be any non-alphanumeric, non-whitespace character.

    $ cat slash1a
    #!/usr/local/bin/perl5 -w
    print "Enter path: ";
    $path = <STDIN>;
    if($path =~ m#^/usr/local/bin#){   # any other non-alphanumeric delimiter also works
        print "Path is /usr/local/bin\n";
    }
    $ slash1a
    Enter path: /usr/local/bin
    Path is /usr/local/bin
    $

Slide 36 Special Read-Only Variables
After a successful pattern match, the variables $1, $2, $3,… are set to the same values as \1, \2, \3,… You can use $1, $2, $3,… later in your program.

    $ cat read1
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in Love";
    /(\w+)\W+(\w+)/;   # match first two words
    # $1 is now "Bill" and $2 is now "Shakespeare"
    print "The first name of $2 is $1\n";
    $ read1
    The first name of Shakespeare is Bill

Slide 37 Special Read-Only Variables
You can also get the values of $1, $2, $3,… directly by placing the match in a list context:

    $ cat read2
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in Love";
    ($first, $last) = /(\w+)\W+(\w+)/;
    print "The first name of $last is $first\n";
    $ read2
    The first name of Shakespeare is Bill
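In list context a failed match returns an empty list, so the captured values should only be used after checking that the match succeeded. A minimal sketch, assuming a hypothetical helper named first_two_words:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first two words of a string as a list,
# or an empty list when the string has fewer than two words.
sub first_two_words {
    my ($text) = @_;
    my @words = $text =~ /(\w+)\W+(\w+)/;   # list context: returns the captures
    return @words;
}

for my $text ("Bill Shakespeare in Love", "Bill") {
    my @words = first_two_words($text);
    if (@words) {
        print "The first name of $words[1] is $words[0]\n";
    } else {
        print "No two words found in '$text'\n";
    }
}
```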

Slide 38 Special Read-Only Variables
- Other read-only variables:
- $& is the part of the string that matched the pattern.
- $` is the part of the string before the match.
- $' is the part of the string after the match.

    $ cat read3
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in Love";
    / in /;
    print "Before: $`\n";
    print "Match: $&\n";
    print "After: $'\n";
    $ read3
    Before: Bill Shakespeare
    Match:  in
    After: Love

Slide 39 More on Substitution
If you want to replace all matches instead of just the first match, use the g option for substitution:

    $ cat sub3
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in love with Bill Gates";
    s/Bill/William/;
    print "Sub1: $_\n";
    $_ = "Bill Shakespeare in love with Bill Gates";
    s/Bill/William/g;
    print "Sub2: $_\n";
    $ sub3
    Sub1: William Shakespeare in love with Bill Gates
    Sub2: William Shakespeare in love with William Gates
    $
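The substitution operator also returns the number of replacements it made, which is handy with the g option. The sketch below wraps this in a hypothetical helper, replace_all, that returns both the new string and the count.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every Bill with William and report how many
# replacements were made. s///g returns the count, or a false value if none.
sub replace_all {
    my ($text) = @_;
    my $count = ($text =~ s/Bill/William/g) || 0;
    return ($text, $count);
}

my ($result, $count) = replace_all("Bill Shakespeare in love with Bill Gates");
print "$result ($count replacements)\n";
```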

Slide 40 More on Substitution
- You can use variable interpolation in substitutions:

    $ cat sub4
    #!/usr/local/bin/perl5 -w
    $find = "Bill";
    $replace = "William";
    $_ = "Bill Shakespeare in love with Bill Gates";
    s/$find/$replace/g;
    print "$_\n";
    $ sub4
    William Shakespeare in love with William Gates
    $

Slide 41 More on Substitution
- Pattern characters in the regular expression allow patterns to be matched, not just fixed characters:

    $ cat sub5
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in love with Bill Gates";
    s/(\w+)/<$1>/g;
    print "$_\n";
    $ sub5
    <Bill> <Shakespeare> <in> <love> <with> <Bill> <Gates>
    $

Slide 42 More on Substitution
- Substitution also allows you to:
    - ignore case
    - use alternate delimiters
    - use =~

    $ cat sub6
    #!/usr/local/bin/perl5 -w
    $line = "Bill Shakespeare in love with bill Gates";
    $line =~ s#bill#William#gi;
    $line =~ s@Shakespeare@Gates@gi;
    print "$line\n";
    $ sub6
    William Gates in love with William Gates
    $

Slide 43 split
The split function allows you to break a string into fields. split takes a regular expression and a string, and breaks up the string wherever the pattern occurs.

    $ cat split1
    #!/usr/local/bin/perl5 -w
    $line = "Bill Shakespeare in love with Bill Gates";
    @fields = split(/ /,$line);   # split $line using space as delimiter
    print "$fields[0] $fields[3] $fields[6]\n";
    $ split1
    Bill love Gates
    $

Slide 44 split
You can use $_ with split. With no arguments, split works on $_ and uses whitespace as the delimiter.

    $ cat split2
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in love with Bill Gates";
    @fields = split;   # split $_ using space (default) as delimiter
    print "$fields[0] $fields[3] $fields[6]\n";
    $ split2
    Bill love Gates
    $
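The two forms of split behave differently when fields are separated by more than one space: split(/ /, ...) breaks at every single space and so produces empty fields, while the default form (equivalent to split(' ', ...)) treats any run of whitespace as one delimiter. A minimal sketch with hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helper: split at every single space character.
sub fields_single_space {
    my ($line) = @_;
    my @fields = split(/ /, $line);
    return @fields;
}

# Hypothetical helper: split on runs of whitespace, as the default split does.
sub fields_whitespace {
    my ($line) = @_;
    my @fields = split(' ', $line);
    return @fields;
}

my $line = "Bill  Gates";   # two spaces between the words
my @single = fields_single_space($line);
my @runs   = fields_whitespace($line);
print scalar(@single), " fields with / /\n";
print scalar(@runs),   " fields with ' '\n";
```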

Slide 45 join
The join function allows you to glue the strings in a list together.

    $ cat join1
    #!/usr/local/bin/perl5 -w
    @fields = qw(Bill Shakespeare dislikes Bill Gates);
    $line = join(" ", @fields);
    print "$line\n";
    $ join1
    Bill Shakespeare dislikes Bill Gates
    $

- Note that the glue string is not a regular expression, just a normal string.
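Because the glue is a plain string, a character such as "." that would be special in a pattern is inserted literally. The sketch below combines split and join in a hypothetical helper, reglue, to change the separator of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line on whitespace and glue the fields
# back together with the given string.
sub reglue {
    my ($line, $glue) = @_;
    return join($glue, split(' ', $line));
}

my $line = "Bill Shakespeare dislikes Bill Gates";
print reglue($line, ":"), "\n";
print reglue($line, "."), "\n";   # the dot is literal glue, not "any character"
```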