Regular Expressions in Perl Part I Alan Gold



Basic syntax
  =~ is the matching operator
  !~ is the negated matching operator
  // are the default delimiters
  Prefixing the expression with "m" allows for arbitrary delimiters, e.g. m%Don't use this%
  Modifiers follow the closing delimiter
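The syntax on this slide can be tried out with a short script. This is only a minimal sketch: the variable names and the sample path are made up for illustration and are not from the slides.

```perl
use strict;
use warnings;

my $text = "Hello World";

# =~ is true when the pattern matches somewhere in the string.
my $has_hello = $text =~ /Hello/ ? 1 : 0;

# !~ is true when the pattern does NOT match.
my $lacks_kal_el = $text !~ /Kal-El/ ? 1 : 0;

# With m, other delimiters can be used; braces avoid escaping the slashes.
my $path     = "/usr/local/bin";   # hypothetical sample value
my $is_local = $path =~ m{^/usr/local} ? 1 : 0;

# A modifier (here i, case-insensitive) follows the closing delimiter.
my $any_case = $text =~ /hello/i ? 1 : 0;

print "$has_hello $lacks_kal_el $is_local $any_case\n";   # prints "1 1 1 1"
```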

Simple matching
  "Hello World" =~ /Hello/ matches the literal string "Hello"
  "Superman" =~ /Kal-El/ unfortunately does not match

Metacharacters
  The metacharacters are {}[]()^$.|*+?\
  Each must be escaped with a "\" to match its literal character
  "Spoon+fork" =~ /Spoon+/ will match, but not how you want it to: "+" means "one or more of the preceding n", so only "Spoon" is matched
  "Spoonnnnnn" =~ /Spoon+/ will also match
  "Spoon+fork" =~ /Spoon\+/ matches properly
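Capturing what each pattern actually matched makes the difference between "+" and "\+" visible. The sketch below uses capture parentheses, quotemeta and \Q...\E, none of which appear on the slide; they are standard Perl features brought in here only for illustration.

```perl
use strict;
use warnings;

# In list context a match returns its captured groups.
my ($unescaped) = "Spoon+fork" =~ /(Spoon+)/;    # "+" quantifies the "n"
my ($many_n)    = "Spoonnnnnn" =~ /(Spoon+)/;    # one or more "n"
my ($escaped)   = "Spoon+fork" =~ /(Spoon\+)/;   # literal plus sign

# quotemeta escapes metacharacters in a string for use as a literal pattern;
# \Q...\E does the same inside a pattern.
my $safe  = quotemeta("Spoon+");
my $via_q = "Spoon+fork" =~ /\QSpoon+\E/ ? 1 : 0;

print "$unescaped\n";   # Spoon
print "$many_n\n";      # Spoonnnnnn
print "$escaped\n";     # Spoon+
print "$safe\n";        # Spoon\+
```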

Escape sequences
  Several characters can't be printed directly
  They are matched using an escape sequence
  \t is a tab character (ASCII code 9)
  \n is a newline character (ASCII code 10)
  \r is a carriage return (ASCII code 13)
  \0.. is an octal character code, e.g. \033
  \x.. is a hexadecimal character code, e.g. \x1B
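As a rough illustration of these escapes, the sketch below builds two sample strings (both invented for this example) and checks that the octal and hexadecimal forms name the same character, ASCII 27.

```perl
use strict;
use warnings;

# A tab-separated line with a Windows-style line ending (sample data).
my $line = "name\tvalue\r\n";

my $has_tab = $line =~ /\t/     ? 1 : 0;
my $is_crlf = $line =~ /\r\n$/  ? 1 : 0;

# chr(27) is the escape character; \033 (octal) and \x1B (hex) both match it.
my $ansi      = chr(27) . "[1mbold";
my $via_octal = $ansi =~ /\033/ ? 1 : 0;
my $via_hex   = $ansi =~ /\x1B/ ? 1 : 0;

print "$has_tab $is_crlf $via_octal $via_hex\n";   # prints "1 1 1 1"
```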

Variables
  Variables are interpolated into regular expressions much as they are in double-quoted strings
  $something = "cool";
  'cool cruel pool' =~ /$something/ will match just fine
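One consequence of interpolation is that any metacharacters inside the variable are treated as pattern syntax. The sketch below shows this with a hypothetical $fragment variable, and uses \Q...\E (standard Perl, not on the slide) to force a literal match.

```perl
use strict;
use warnings;

my $something = "cool";
my $found = 'cool cruel pool' =~ /$something/ ? 1 : 0;

# Metacharacters in an interpolated variable are still metacharacters.
my $fragment = "a.c";   # hypothetical value containing "."
my $as_pattern  = "abc" =~ /$fragment/     ? 1 : 0;   # "." matches "b"
my $as_literal  = "abc" =~ /\Q$fragment\E/ ? 1 : 0;   # needs a real "."
my $literal_hit = "a.c" =~ /\Q$fragment\E/ ? 1 : 0;

print "$found $as_pattern $as_literal $literal_hit\n";   # prints "1 1 0 1"
```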

Anchors
  ^ anchors the pattern to the beginning of the string
  $ anchors it to the end of the string (or just before a trailing newline)
  "Speaker" =~ /^peak/ will not match
  "Rabbit" =~ /bit$/ will match
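The anchor examples can be checked directly. The last two lines go slightly beyond the slide: they illustrate that $ also matches just before a final newline, and that \z (a standard Perl assertion) matches only at the very end of the string.

```perl
use strict;
use warnings;

my $anchored_start = "Speaker" =~ /^peak/ ? 1 : 0;   # 0: string starts with "S"
my $unanchored     = "Speaker" =~ /peak/  ? 1 : 0;   # 1: "peak" is inside
my $anchored_end   = "Rabbit"  =~ /bit$/  ? 1 : 0;   # 1: string ends in "bit"

# $ tolerates one trailing newline; \z does not.
my $dollar_nl = "Rabbit\n" =~ /bit$/  ? 1 : 0;       # 1
my $strict_nl = "Rabbit\n" =~ /bit\z/ ? 1 : 0;       # 0

print "$anchored_start $unanchored $anchored_end $dollar_nl $strict_nl\n";
```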

Character classes
  Character classes match any one character contained in [brackets]
  /tin[yas]/ will match tiny, tina, and tins
  "-" can be used to represent a range
  /[a-zA-Z0-9]/ will match a single alphanumeric character
  The literal "-" character can be matched if it is the first or last character, e.g. /[-0-9]/

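Negated character classes
  The "^" character, placed first inside the brackets, negates a character class
  /200[^7]/ will not match 2007 but will match 2008, 200q, etc.
  A negated class still has to match some character, so /200[^7]/ will not match a bare 200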
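Both plain and negated classes can be exercised by filtering small word lists with grep. The word lists below are made up for illustration, and the ^...$ anchors are added so that each pattern is tested against the whole word.

```perl
use strict;
use warnings;

# Plain class: the fourth letter must be y, a, or s.
my @tin_words = grep { /^tin[yas]$/ } qw(tiny tina tins tint);

# A "-" at the start of the class is a literal minus sign, not a range.
my $signed = "-7" =~ /^[-0-9]+$/ ? 1 : 0;

# Negated class: any single character except 7 must follow "200".
my @not_2007 = grep { /^200[^7]$/ } qw(2007 2008 200q 200);

print "@tin_words\n";   # tiny tina tins
print "$signed\n";      # 1
print "@not_2007\n";    # 2008 200q
```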

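Shortcut character classes
  \d is a digit, equivalent to [0-9]
  \s is any whitespace, equivalent to [\ \t\r\n\f]
  \w is a word character, equivalent to [0-9a-zA-Z_]
  \D is any non-digit, equivalent to [^0-9]
  \S is any non-whitespace, equivalent to [^\s]
  \W is any non-word character, equivalent to [^\w]
  The period "." matches any character but "\n"
  These equivalences hold for ASCII text; on Unicode strings \d, \s and \w match additional characters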
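A short sketch of the shortcut classes in use. The date string and identifier are sample values chosen for this example, and the capture parentheses are used only to pull the pieces out.

```perl
use strict;
use warnings;

# \d picks out the numeric fields of a date (sample value).
my $date = "2007-03-15";
my ($year, $month, $day) = $date =~ /^(\d\d\d\d)-(\d\d)-(\d\d)$/;

# \w covers letters, digits, and underscore.
my $is_identifier = "my_var1" =~ /^\w+$/ ? 1 : 0;

# \s finds whitespace; \D finds anything that is not a digit.
my $has_space     = "a b"  =~ /\s/ ? 1 : 0;
my $has_non_digit = "12a4" =~ /\D/ ? 1 : 0;

# Without the s modifier, "." does not match a newline.
my $dot_newline = "\n" =~ /./ ? 1 : 0;

print "$year $month $day\n";                                        # 2007 03 15
print "$is_identifier $has_space $has_non_digit $dot_newline\n";    # 1 1 1 0
```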

Word anchors
  The word anchor "\b" matches the boundary between a word character and a non-word character (or the start or end of the string)
  /\bpen/ matches "penitentiary", not "open"
  /\bpen\b/ only matches "pen" as a whole word, e.g. "this pen is blue"
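The behaviour of \b is easiest to see by filtering a few phrases. The phrase lists below are invented for this sketch.

```perl
use strict;
use warnings;

# \bpen: "pen" must begin at a word boundary.
my @starts_word = grep { /\bpen/ } ("penitentiary", "open", "a pen");

# \bpen\b: "pen" must be a complete word; the string edges count as boundaries.
my @whole_word = grep { /\bpen\b/ } ("this pen is blue", "pen", "pencil", "open");

print join(", ", @starts_word), "\n";   # penitentiary, a pen
print join(", ", @whole_word), "\n";    # this pen is blue, pen
```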

Modifiers
  Modifiers change the behavior of the engine
  // is the default: "." doesn't match newlines, and ^ and $ anchor to the whole string
  //s causes "." to match newlines
  //m treats each line as its own string: ^ and $ match at the start and end of every line
  //i matches case-insensitively
  Modifiers can be combined, e.g. //sim
  /^car.$/im matches "not a car\nCAR!"
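Dropping one modifier at a time from the slide's example shows what each contributes. This is a small sketch; the "a\nb" string is an extra sample added to show the s modifier.

```perl
use strict;
use warnings;

my $text = "not a car\nCAR!";

# Both modifiers: ^ can match after the newline, and "car" matches "CAR".
my $with_im = $text =~ /^car.$/im ? 1 : 0;   # 1

# Only i: ^ is tied to the start of the whole string, which begins with "not".
my $only_i  = $text =~ /^car.$/i  ? 1 : 0;   # 0

# Only m: the second line is upper case, so "car" does not match "CAR".
my $only_m  = $text =~ /^car.$/m  ? 1 : 0;   # 0

# The s modifier lets "." match a newline.
my $dot_default = "a\nb" =~ /a.b/  ? 1 : 0;  # 0
my $dot_with_s  = "a\nb" =~ /a.b/s ? 1 : 0;  # 1

print "$with_im $only_i $only_m $dot_default $dot_with_s\n";   # prints "1 0 0 0 1"
```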

Or
  The pipe character "|" can be used to match any one of the given choices
  /lumber|wood/ will match "My desk is made of spare lumber" and "My desk is made of 100,000 year old petrified wood"
  /0|1|2/ is equivalent to /[0-2]/
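A quick check of alternation, including the claim that /0|1|2/ and /[0-2]/ accept the same strings. The third desk sentence and the @samples list are made up for this sketch.

```perl
use strict;
use warnings;

my @desks = (
    "My desk is made of spare lumber",
    "My desk is made of 100,000 year old petrified wood",
    "My desk is made of steel",   # hypothetical non-matching sentence
);

# Either alternative is enough for a match.
my @wooden = grep { /lumber|wood/ } @desks;

# The alternation and the character class pick out the same sample strings.
my @samples   = qw(0 1 2 3 a12 xyz);
my @by_pipe   = grep { /0|1|2/ } @samples;
my @by_class  = grep { /[0-2]/ } @samples;

print scalar(@wooden), "\n";   # 2
print "@by_pipe\n";            # 0 1 2 a12
print "@by_class\n";           # 0 1 2 a12
```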
