Perl and Regular Expressions Regular Expressions are available as part of the programming languages Java, JScript, Visual Basic and VBScript, JavaScript,

Slides:



Advertisements
Similar presentations
Perl & Regular Expressions (RegEx)
Advertisements

Regular Expressions (in Python). Python or Egrep We will use Python. In some scripting languages you can call the command “grep” or “egrep” egrep pattern.
Regular Expression Original Notes by Song Guo. What Regular Expressions Are Exactly - Terminology a regular expression is a pattern describing a certain.
ISBN Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl –(on reserve.
CS 898N – Advanced World Wide Web Technologies Lecture 8: PERL Chin-Chih Chang
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 2: 8/23.
LING 388: Language and Computers Sandiway Fong Lecture 2: 8/23.
Regular Expressions Regular Expression (or pattern) in Perl – is a template that either matches or doesn’t match a given string. if( $str =~ /hello/){
W3101: Programming Languages (Perl) 1 Perl Regular Expressions Syntax for purpose of slides –Regular expression = /pattern/ –Broader syntax: if (/pattern/)
CS 330 Programming Languages 10 / 10 / 2006 Instructor: Michael Eckmann.
Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl Linux editors and commands (e.g.
Regular Expressions. What are regular expressions? A means of searching, matching, and replacing substrings within strings. Very powerful (Potentially)
Regular expression. Validation need a hard and very complex programming. Sometimes it looks easy but actually it is not. So there is a lot of time and.
Scripting Languages Chapter 8 More About Regular Expressions.
Regular Expression A regular expression is a template that either matches or doesn’t match a given string.
REGULAR EXPRESSIONS CHAPTER 14. REGULAR EXPRESSIONS A coded pattern used to search for matching patterns in text strings Commonly used for data validation.
Regular Expressions A regular expression defines a pattern of characters to be found in a string Regular expressions are made up of – Literal characters.
Last Updated March 2006 Slide 1 Regular Expressions.
Lecture 7: Perl pattern handling features. Pattern Matching Recall =~ is the pattern matching operator A first simple match example print “An methionine.
System Programming Regular Expressions Regular Expressions
 Text Manipulation and Data Collection. General Programming Practice Find a string within a text Find a string ‘man’ from a ‘A successful man’
Computer Programming for Biologists Class 5 Nov 20 st, 2014 Karsten Hokamp
INFO 320 Server Technology I Week 7 Regular expressions 1INFO 320 week 7.
Regular Expressions Regular expressions are a language for string patterns. RegEx is integral to many programming languages:  Perl  Python  Javascript.
Agenda Regular Expressions (Appendix A in Text) –Definition / Purpose –Commands that Use Regular Expressions –Using Regular Expressions –Using the Replacement.
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 4. Document Search and Regular Expressions.
Regular Expression in Java 101 COMP204 Source: Sun tutorial, …
Kirkwood Center for Continuing Education Introduction to PHP and MySQL By Fred McClurg, Copyright © 2015, Fred McClurg, All Rights.
Quiz 30 minutes 10 questions No talking, texting, collaboration, etc…
I/O Redirection and Regular Expressions February 9 th, 2004 Class Meeting 4.
BY Sandeep Kumar Gampa.. What is Regular Expression? Regex in.NET Regex Language Elements Examples Regular Expression API How to Test regex in.NET Conclusion.
Review Please hand in your practicals and homework Regular Expressions with grep.
Overview A regular expression defines a search pattern for strings. Regular expressions can be used to search, edit and manipulate text. The pattern defined.
When you read a sentence, your mind breaks it into tokens—individual words and punctuation marks that convey meaning. Compilers also perform tokenization.
Kirkwood Center for Continuing Education Introduction to PHP and MySQL By Fred McClurg, Copyright © 2010 All Rights Reserved. 1.
Regular Expressions. Overview Regular expressions allow you to do complex searches within text documents. Examples: Search 8-K filings for restatements.
Regular Expressions for PHP Adding magic to your programming. Geoffrey Dunn
Regular Expressions in Perl CS/BIO 271 – Introduction to Bioinformatics.
Regular Expressions What is this line all about? while (!($search =~ /^\s*$/)) { It’s a string search just like before, but with a huge twist – regular.
Appendix A: Regular Expressions It’s All Greek to Me.
12. Regular Expressions. 2 Motto: I don't play accurately-any one can play accurately- but I play with wonderful expression. As far as the piano is concerned,
CSC 4630 Meeting 21 April 4, Return to Perl Where are we? What is confusing? What practice do you need?
GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software.
Introduction to sed. Sed : a “S tream ED itor ” What is Sed ?  A “non-interactive” text editor that is called from the unix command line.  Input text.
CS 330 Programming Languages 10 / 02 / 2007 Instructor: Michael Eckmann.
I/O Redirection & Regular Expressions CS 2204 Class meeting 4 *Notes by Doug Bowman and other members of the CS faculty at Virginia Tech. Copyright
Copyright © Curt Hill Regular Expressions Providing a Search Pattern.
Regular Expressions CS 2204 Class meeting 6 Created by Doug Bowman, 2001 Modified by Mir Farooq Ali, 2002.
CIT 383: Administrative ScriptingSlide #1 CIT 383: Administrative Scripting Regular Expressions.
1 Printing in Python Every program needs to do some output This is usually to the screen (shell window) Later we’ll see graphics windows and external files.
Validation using Regular Expressions. Regular Expression Instead of asking if user input has some particular value, sometimes you want to know if it follows.
Standard Types and Regular Expressions CS 480/680 – Comparative Languages.
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. ADVANCED.
What is grep ?  % man grep  DESCRIPTION  The grep utility searches text files for a pattern and prints all lines that contain that pattern. It uses.
An Introduction to Regular Expressions Specifying a Pattern that a String must meet.
Chapter 4 © 2009 by Addison Wesley Longman, Inc Pattern Matching - JavaScript provides two ways to do pattern matching: 1. Using RegExp objects.
-Joseph Beberman *Some slides are inspired by a PowerPoint presentation used by professor Seikyung Jung, which was derived from Charlie Wiseman.
Introduction to Programming the WWW I CMSC Winter 2003 Lecture 17.
Pattern Matching: Simple Patterns. Introduction Programmers often need to scan a file, directory, etc. for a specific substring. –Find all files that.
CS 330 Programming Languages 09 / 30 / 2008 Instructor: Michael Eckmann.
CMSC330 More Ruby. Last lecture Scripting languages Ruby language –Implicit variable declarations –Many control statements –Classes & objects –Strings.
OOP Tirgul 11. What We’ll Be Seeing Today  Regular Expressions Basics  Doing it in Java  Advanced Regular Expressions  Summary 2.
Lesson 4 String Manipulation. Lesson 4 In many applications you will need to do some kind of manipulation or parsing of strings, whether you are Attempting.
Regular Expressions Copyright Doug Maxwell (
CS 330 Class 7 Comments on Exam Programming plan for today:
Looking for Patterns - Finding them with Regular Expressions
Regular Expressions and perl
CSE 303 Concepts and Tools for Software Development
ADVANCE FIND & REPLACE WITH REGULAR EXPRESSIONS
Presentation transcript:

Perl and Regular Expressions Regular Expressions are available as part of the programming languages Java, JScript, Visual Basic and VBScript, JavaScript, C, C++, C#, elisp, Perl, Python, Ruby, PHP, sed, awk, and in many applications, such as editors, grep, egrep. Regular Expressions help you master your data.

Systems and Network ManagementPerl — Regular Expressions2 What is a Regular Expression? Powerful. Low level description: –Describes some text –Can use to: Verify a user's input Sift through large amounts of data High level description: –Allow you to master your data

Systems and Network ManagementPerl — Regular Expressions3 Regular Expressions as a language Can consider regular expressions as a language Made of two types of characters: –Literal characters Normal text characters Like words of the program –Metacharacters The special characters + ?. * ^ $ ( ) [ { | \ Act as the grammar that combines with the words according to a set of rules to create and expression that communicates an idea

How to use a Regular Expression How to make a regular expression as part of your program

Systems and Network ManagementPerl — Regular Expressions5 What do they look like? In Perl, a regular expression begins and ends with /, like this: /abc/ /abc/ matches the string "abc" –Are these literal characters or metacharacters? Returns true if matches, so often use as condition in an if statement

Systems and Network ManagementPerl — Regular Expressions6 Example: searching for "Course:" Problem: want to print all lines in all input files that contain the string "Course:" while ( <> ) { my $line = $_; if ( $line =~ /Course:/ ) { print $line; } Or more concisely: while ( <> ) { print if $_ =~ /Course:/; }

Systems and Network ManagementPerl — Regular Expressions7 The "match operator" =~ If just use /Course:/, this returns true if $_ contains the string Course: If want to test another string variable $var to see if it contains the regular expression, use $var =~ /regular expression/ Under what condition is this true?

Systems and Network ManagementPerl — Regular Expressions8 The "match operator" =~ 2 # sets the string to be searched: $_ = "perl for Win32"; # is 'perl' inside $_? if ( $_ =~ /perl/ ) { print "Found perl\n" }; # Same as the regex above. # Don't need the =~ as we are testing $_: if ( /perl/ ) { print "Found perl\n" };

Systems and Network ManagementPerl — Regular Expressions9 /i Matching without case sensitivity $_ = "perl for Win32"; # this will fail because the case doesn't match: if ( /PeRl/ ) { print "Found PeRl\n" }; # this will match, because there is an 'er' in 'perl': if ( /er/ ) { print "Found er\n" }; # this will match, because there is an 'n3' in 'Win32': if ( /n3/ ) { print "Found n3\n" }; # this will fail because the case doesn't match: if ( /win32/ ) { print "Found win32\n" }; # This matches because the /i at the end means # "match without case sensitivity": if ( /win32/i ) { print "Found win32 (i)\n" };

Systems and Network ManagementPerl — Regular Expressions10 Using !~ instead of =~ # Looking for a space: print "Found!\n" if / /; # both these are the same, but reversing the logic with # unless and !~ print "Found!!\n" unless $_ !~ / /; print "Found!!\n" unless ! / /;

Systems and Network ManagementPerl — Regular Expressions11 Embedding variables in regexps # Create two variables containing regular expressions # to search for: my $find = 32; my $find2 = " for "; if ( /$find/ ) { print "Found '$find'\n" }; if ( /$find2/ ) { print "Found '$find2'\n" }; # different way to do the above: print "Found $find2\n" if /$find2/;

The Metacharacters The funny characters What they do How to use them

Systems and Network ManagementPerl — Regular Expressions13 Character Classes [...] = ( "Nick", "Albert", "Alex", "Pick" ); foreach my $name ) { if ( $name =~ /[NP]ick/ ) { print "$name: Out for a Pick Nick\n"; else { print "$name is not Pick or Nick\n"; } Square brackets match a single character

Systems and Network ManagementPerl — Regular Expressions14 Examples of use of [...] Match a capital letter: [ABCDEFGHIJKLMNOPQRSTUVWXYZ] Same thing: [A-Z] Match a vowel: [aeiou] Match a letter or digit: [A-Za-z0-9]

Systems and Network ManagementPerl — Regular Expressions15 Negated character class: [^...] Match any single character that is not a letter: [^A-Za-z] Match any character that is not a space or a tab: [^ \t]

Systems and Network ManagementPerl — Regular Expressions16 Example using [^...] This simple program prints only lines that contain characters that are not a space: while ( <> ) { print $_ if /[^ ]/; } This prints lines that start with a character that is not a space: while ( <> ) { print $_ if /^[^ ]/; } Notice that ^ has two meanings: one inside [...], the other outside.

Systems and Network ManagementPerl — Regular Expressions17 Shorthand for Common Character Classes Since matching a digit is very common, Perl provides \d as a short way of writing [0-9] \D matches a non-digit: [^0-9] \s matches any whitespace character; shorthand for [ \t\n\r\f] \S non-whitespace, [^ \t\n\r\f] \w word character, [a-zA-Z0-9_] \W non-word character, [^ a-zA-Z0-9_]

Systems and Network ManagementPerl — Regular Expressions18 Matching any character The dot matches any character except a newline This matches any line with at least 5 characters: print if /...../;

Systems and Network ManagementPerl — Regular Expressions19 Matching the beginning or end to match a line that contains exactly five characters: print if /^.....$/; the ^ matches the beginning of the line. the $ matches at the end of the line

Systems and Network ManagementPerl — Regular Expressions20 Matching Repetitions: * + ? {n,m} To match zero or more: /a*/ will match zero or more letter a, so matches "", "a", "aaaa", "qwereqwqwer", or the nothing in front of anything! to match at least one: /a+/ matches at least one "a" /a?/ matches zero or one "a" /a{3,5}/ matches between 3 and 5 "a"s.

Systems and Network ManagementPerl — Regular Expressions21 Example using.* $_ = 'Nick Urbanik '; print "found something in <>\n" if / /; # Find everything between quotes: $_ = 'He said, "Hi there!", and then "What\'s up?"'; print "quoted!\n" if /"[^"]*"/; print "too much!\n" if /".*"/;

Systems and Network ManagementPerl — Regular Expressions22 Capturing the Match with (...) Often want to scan large amounts of data, extracting important items Use parentheses and regular expressions Silly example of capturing an address: $_ = 'Nick Urbanik '; print "found $1 in <>\n" if / /;

Systems and Network ManagementPerl — Regular Expressions23 Capturing the match: greediness Look at this example: $_ = 'He said, "Hi there!", and then "What\'s up?"'; print "$1\n" if /"([^"]*)"/; print "$1\n" if /"(.*)"/; What will each print? The first one works; the second one prints: Hi there!", and then "What's up? Why? Because *, ?, +, {m,n} are greedy! They match as much as they possibly can!

Systems and Network ManagementPerl — Regular Expressions24 Being Stingy (not Greedy): ? Usually greedy matching is what we want, but not always How can we match as little as possible? Put a ? after the quantifier: *? Match 0 or more times +? Match 1 or more times ?? Match 0 or 1 time {n,}? Match at least n times {n,m}? Match at least n, but no more than m times

Systems and Network ManagementPerl — Regular Expressions25 Being Less Greedy: Example We can solve the problem we saw earlier using non-greedy matching: $_ = 'He said, "Hi there!", and then "What\'s up?"'; print "$1\n" if /"([^"]*)"/; print "$1\n" if /"(.*?)"/; These both work, and match only Hi there!

Systems and Network ManagementPerl — Regular Expressions26 Sifting through large amounts of data Imagine you need to create computing accounts for thousands of students As input, you have data of the form: Some heading on the top of each page More headings with other content, including blank lines A tab character separates the columns H123456(1) I234567(2) J345678(3) A123456(1)

Systems and Network ManagementPerl — Regular Expressions27 Capturing the Match: (...) # useradd() is a function defined elsewhere # that creates a computer account with # username as first parameter, password as # the second parameter while ( <> ) { if ( /^(\d{9})\t([A-Z]\d{6}\([\dA]\))/ ) { my $student_id = $1; my $hk_id = $2; useradd( $student_id, $hk_id ); }

Systems and Network ManagementPerl — Regular Expressions28 The Substitution Operator s/// Sometimes want to replace one string with another (editing) Example: want to replace Nicholas with Nick on input files: while ( <> ) { $_ =~ s/Nicholas/Nick/; print $_; }

Systems and Network ManagementPerl — Regular Expressions29 Avoiding leaning toothpicks: /\/\/ Want to change a filename, edit the directory in the path from, say /usr/local/bin/filename to /usr/bin/filename Could do like this: s/\/usr\/local\/bin\//\/usr/\bin\//; but this makes me dizzy! We can do this instead: s!\/usr/local/bin/!/usr/bin/!; Can use any character instead of / in s/// For matches, can put m//, and use any char instead of / Can also use parentheses or braces: s{...}{...} or m{...}

Systems and Network ManagementPerl — Regular Expressions30 Substitution and the /g modifier If an input line contains: Nicholas Urbanik read "Nicholas Nickleby" then the output is: Nick Urbanik read "Nicholas Nickleby" How change all the Nicholas in one line? Use the /g (global) modifier: while ( <> ) { $_ =~ s/Nicholas/Nick/g; print $_; }

Systems and Network ManagementPerl — Regular Expressions31 Making regular expressions readable: /x modifier Sometimes regular expressions can get long, and need comments inside so others (or you later!) understand Use /x at the end of s///x or m//x Allows white space, newlines, comments