Regular Expressions. u A regular expression is a pattern which matches some regular (predictable) text. u Regular expressions are used in many Unix utilities.

Slides:



Advertisements
Similar presentations
การใช้ระบบปฏิบัติการ UNIX พื้นฐาน บทที่ 4 File Manipulation วิบูลย์ วราสิทธิชัย นักวิชาการคอมพิวเตอร์ ศูนย์คอมพิวเตอร์ ม. สงขลานครินทร์ เวอร์ชั่น 1 วันที่
Advertisements

CSCI 330 T HE UNIX S YSTEM Regular Expressions. R EGULAR E XPRESSION A pattern of special characters used to match strings in a search Typically made.
7 Searching and Regular Expressions (Regex) Mauro Jaskelioff.
ISBN Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl –(on reserve.
CS 497C – Introduction to UNIX Lecture 29: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang
Chin-Chih Chang CS 497C – Introduction to UNIX Lecture 28: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 2: 8/23.
CS 497C – Introduction to UNIX Lecture 31: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang
LING 388: Language and Computers Sandiway Fong Lecture 2: 8/23.
Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl Linux editors and commands (e.g.
Filters using Regular Expressions grep: Searching a Pattern.
Regular Expressions A regular expression defines a pattern of characters to be found in a string Regular expressions are made up of – Literal characters.
Last Updated March 2006 Slide 1 Regular Expressions.
Overview of the grep Command Alex Dukhovny CS 265 Spring 2011.
System Programming Regular Expressions Regular Expressions
Pattern matching with regular expressions A common file processing requirement is to match strings within the file to a standard form, e.g. address.
REGULAR EXPRESSIONS. Lexical Analysis Lexical analysers can be constructed by programs such as LEX These programs employ as input a description of the.
Programming Languages Meeting 13 December 2/3, 2014.
Regular Expressions Regular expressions are a language for string patterns. RegEx is integral to many programming languages:  Perl  Python  Javascript.
Introduction to Unix – CS 21 Lecture 6. Lecture Overview Homework questions More on wildcards Regular expressions Using grep Quiz #1.
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.
Agenda Regular Expressions (Appendix A in Text) –Definition / Purpose –Commands that Use Regular Expressions –Using Regular Expressions –Using the Replacement.
CSC 352– Unix Programming, Spring 2015 April 28 A few final commands.
Quiz 30 minutes 10 questions No talking, texting, collaboration, etc…
I/O Redirection and Regular Expressions February 9 th, 2004 Class Meeting 4.
Regular Expression - Intro Patterns that define a set of strings (or, pieces of a string) Not wildcards (similar notion, but different thing) Used by utilities.
REGEX. Problems Have big text file, want to extract data – Phone numbers (503)
Introduction to Regular Expression for sed & awk by Susan Lukose.
Regular Expressions for PHP Adding magic to your programming. Geoffrey Dunn
Appendix A: Regular Expressions It’s All Greek to Me.
©Brooks/Cole, 2001 Chapter 9 Regular Expressions ( 정규수식 )
CSC 4630 Meeting 21 April 4, Return to Perl Where are we? What is confusing? What practice do you need?
Sys Prog & Scrip - Heriot Watt Univ 1 Systems Programming & Scripting Lecture 12: Introduction to Scripting & Regular Expressions.
May 2008CLINT-LIN Regular Expressions1 Introduction to Computational Linguistics Regular Expressions (Tutorial derived from NLTK)
I/O Redirection & Regular Expressions CS 2204 Class meeting 4 *Notes by Doug Bowman and other members of the CS faculty at Virginia Tech. Copyright
Copyright © Curt Hill Regular Expressions Providing a Search Pattern.
Unix Programming Environment Part 3-4 Regular Expression and Pattern Matching Prepared by Xu Zhenya( Draft – Xu Zhenya(
Regular Expressions CS 2204 Class meeting 6 Created by Doug Bowman, 2001 Modified by Mir Farooq Ali, 2002.
CIT 383: Administrative ScriptingSlide #1 CIT 383: Administrative Scripting Regular Expressions.
UNIX Commands RTFM: grep(1), egrep(1) & fgrep(1) Gilbert Detillieux April 13, 2010 MUUG Meeting.
CSCI 330 UNIX and Network Programming Unit IV Shell, Part 2.
CSCI 330 UNIX and Network Programming Unit IV Shell, Part 2.
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 5 – Regular Expressions, grep, Other Utilities.
CGS – 4854 Summer 2012 Web Site Construction and Management Instructor: Francisco R. Ortega Chapter 5 Regular Expressions.
Standard Types and Regular Expressions CS 480/680 – Comparative Languages.
What are Regular Expressions?What are Regular Expressions?  Pattern to match text  Consists of two parts, atoms and operators  Atoms specifies what.
Agenda The Bourne Shell – Part II Special Characters Ambiguous File Reference Variable Names and Values User Created Variables Read-only Variables (Positional.
What is grep ?  % man grep  DESCRIPTION  The grep utility searches text files for a pattern and prints all lines that contain that pattern. It uses.
Regular Expressions Todd Kelley CST8207 – Todd Kelley1.
-Joseph Beberman *Some slides are inspired by a PowerPoint presentation used by professor Seikyung Jung, which was derived from Charlie Wiseman.
Pattern Matching: Simple Patterns. Introduction Programmers often need to scan a file, directory, etc. for a specific substring. –Find all files that.
CSC 352– Unix Programming, Fall 2011 November 8, 2011, Week 11, a useful subset of regular expressions, grep and sed, parts of Chapter 11.
May 2006CLINT-LIN Regular Expressions1 Introduction to Computational Linguistics Regular Expressions (Tutorial derived from NLTK)
ICS611 Lex Set 3. Lex and Yacc Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the.
John Hurley Cal State LA CS 245 Lecture 7: A Little More on Assembly More on *nix Shell Scripting.
Regular Expressions Copyright Doug Maxwell (
Lesson 5-Exploring Utilities
Looking for Patterns - Finding them with Regular Expressions
Regular Expression - Intro
BASIC AND EXTENDED REGULAR EXPRESSIONS
Lecture 9 Shell Programming – Command substitution
Regular Expression Beihang Open Source Club.
The ‘grep’ Command Colin Masterson.
CSC 352– Unix Programming, Spring 2016
Chin-Chih Chang CS 497C – Introduction to UNIX Lecture 28: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang
CIT 383: Administrative Scripting
CSCI The UNIX System Regular Expressions
1.5 Regular Expressions (REs)
Regular Expressions grep Familiy of Commands
Presentation transcript:

Regular Expressions

u A regular expression is a pattern which matches some regular (predictable) text. u Regular expressions are used in many Unix utilities. –like grep, sed, vi, emacs, awk,... u The form of a regular expression: –It can be plain text... > grep unix file (matches all the appearances of unix) –It can also be special text... > grep ‘[uU]nix’ file (matches unix and Unix)

Regular Expressions and File Wildcarding u Regular expressions are different from file name wildcards. –Regular expressions are interpreted and matched by special utilities (such as grep). –File name wildcards are interpreted and matched by shells. –They have different wildcarding systems. –File wildcarding takes place first! obelix[1] > grep ‘[uU]nix’ file obelix[2] > grep [uU]nix file

Regular Expression Wildcards u A dot. matches any single character a.b matches axb, a$b, abb, a.b but does not match ab, axxb, a$bccb u * matches zero or more occurrences of the previous single character pattern a*b matches b, ab, aab, aaab, aaaab, … but doesn’t match axb u What does the following match?.*

Character Ranges u Matching a set or range of characters is done with [...] –[wxyz] - match any of wxyz [u-z] - match a character in range u - z u Combine this with * to match repeated sets –Example: [aeiou]* - match any number of vowels u Wildcards lose their specialness inside [...] –If the first character inside the [...] is ], it loses its specialness as well –Example: '[])}]' matches any of those closing brackets

Match Parts of a Line u Match beginning of line with ^ (caret) ^TITLE –matches any line containing TITLE at the beginning –^ is only special if it is at the beginning of a regular expression u Match the end of a line with a $ (dollar sign) FINI$ –matches any line ending in the phrase FINI –$ is only special at the end of a regular expression –Don’t use $ and double quotes (problems with shell) u What does the following match? ^WHOLE$

Matching Parts of Words u Regular expressions have a concept of a “word” which is a little different than an English word. –A word is a pattern containing only letters, digits, and underscores (_) u Match beginning of a word with \< –\<Fo matches Fo if it appears at the beginning of a word u Match the end of a word with \> –ox\> matches ox if it appears at the end of a word u Whole words can be matched too: \

More Regular Expressions u Matching the complement of a set by using the ^ –[^aeiou] - matches any non-vowel –^[^a-z]*$ - matches any line containing no lower case letters u Regular expression escapes –Use the \ (backslash) to “escape” the special meaning of wildcards v CA\*Net v This is a full sentence\. v array\[3] v C:\\DOS v \[.*\]

Regular Expressions Recall u A way to refer to the most recent match u To remember portions of regular expressions –Surround them with \(...\) –Recall the remembered portion with \n where n is 1-9 v Example: '^\([a-z]\)\1' –matches lines beginning with a pair of duplicate (identical) letters v Example: '^.*\([a-z]*\).*\1.*\1' –matches lines containing at least three copies of something which consists of lower case letters

Matching Specific Numbers of Repeats u X\{m,n\} matches m -- n repeats of the one character regular expression X –E.g. [a-z]\{2,10\} matches all sequences of 2 to 10 lower case letters u X\{m\} matches exactly m repeats of the one character regular expression X –E.g. #\{23\} matches 23 #s u X\{m,\} matches at least m repeats of the one character regular expression X –E.g. ^[aeiou]\{2,\} matches at least 2 vowels in a row at the beginning of a line u.\{1,\} matches more than 0 characters

Regular Expression Examples (1) u How many words in /usr/dict/words end in ing? –grep -c 'ing$' /usr/dict/words u How many words in /usr/dict/words start with un and end with g? –grep -c '^un.*g$' /usr/dict/words u How many words in /usr/dict/words begin with a vowel? –grep -ic '^[aeiou]' /usr/dict/words The -c option says to count the number of matches The -i option says to ignore case distinction

Regular Expression Examples (2) u How many words in /usr/dict/words have triple letters in them? –grep -ic '\(.\)\1\1' /usr/dict/words u How many words in /usr/dict/words start and end with the same 3 letters? –grep -c '^\(...\).*\1$' /usr/dict/words u How many words in /usr/dict/words contain runs of 4 consonants? –grep -ic '[^aeiou]\{4\}' /usr/dict/words

Regular Expression Examples (3) u What are the 5 letter palindromes present in /usr/dict/words? –grep -ic '^\(.\)\(.\).\2\1$' /usr/dict/words u How many words of the words in /usr/dict/words with y as their only vowel –grep '^[^aAeEiIoOuU]*$' /usr/dict/words | grep -ci 'y' u How many words in /usr/dict/words do not start and end with the same 3 letters? –grep -ivc '^\(...\).*\1$' /usr/dict/words

Extended Regular Expressions (1) u Used by some utilities like egrep support an extended set of matching mechanisms. –Called extended or full regular expressions. u + matches one or more occurrences of the previous single character pattern. –a+b matches ab, aab,... but not b (unlike *) u ? matches zero or one occurrence(s) of the previous single character pattern. –a?b matches b, ab and aab, … (why?)

Extended Regular Expressions (2) u r1|r2 matches regular expression r1 or r2 (| acts like a logical “or” operator). –red|blue will match either red or blue –Unix|UNIX will match either Unix or UNIX u (r1) allows the *, +, or ? matches to apply to the entire regular expression r1, and not just a single character. –(ab)+ requires at least one repetition of ab