CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 5 – Regular Expressions, grep, Other Utilities.

Slides:



Advertisements
Similar presentations
Lecture 5  Regular Expressions;  grep; CSE4251 The Unix Programming Environment.
Advertisements

CSCI 330 T HE UNIX S YSTEM Regular Expressions. R EGULAR E XPRESSION A pattern of special characters used to match strings in a search Typically made.
Regular Expressions grep
7 Searching and Regular Expressions (Regex) Mauro Jaskelioff.
1 CSE 390a Lecture 7 Regular expressions, egrep, and sed slides created by Marty Stepp, modified by Jessica Miller and Ruth Anderson
CS 497C – Introduction to UNIX Lecture 29: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang
1 CSE 303 Lecture 7 Regular expressions, egrep, and sed read Linux Pocket Guide pp , 73-74, 81 slides created by Marty Stepp
Chin-Chih Chang CS 497C – Introduction to UNIX Lecture 28: - Filters Using Regular Expressions – grep and sed Chin-Chih Chang
1 CSE 390a Lecture 7 Regular expressions, egrep, and sed slides created by Marty Stepp, modified by Jessica Miller
Shell Basics CS465 - Unix. Shell Basics Shells provide: –Command interpretation –Multiple commands on a single line –Expansion of wildcard filenames –Redirection.
Regular Expressions. u A regular expression is a pattern which matches some regular (predictable) text. u Regular expressions are used in many Unix utilities.
Using regular expressions Search for a single occurrence of a specific string. Search for all occurrences of a string. Approximate string matching.
Filters using Regular Expressions grep: Searching a Pattern.
Applications of Regular Expressions BY— NIKHIL KUMAR KATTE 1.
Regular Expressions A regular expression defines a pattern of characters to be found in a string Regular expressions are made up of – Literal characters.
INFO 320 Server Technology I Week 7 Regular expressions 1INFO 320 week 7.
CS 403: Programming Languages Fall 2004 Department of Computer Science University of Alabama Joel Jones.
1 Lecture 5 Additional useful commands COP 3353 Introduction to UNIX.
Perl and Regular Expressions Regular Expressions are available as part of the programming languages Java, JScript, Visual Basic and VBScript, JavaScript,
CSE 311 Foundations of Computing I Lecture 17 Structural Induction: Regular Expressions, Regular Languages Autumn 2011 CSE 3111.
Introduction to Unix – CS 21 Lecture 6. Lecture Overview Homework questions More on wildcards Regular expressions Using grep Quiz #1.
Agenda Regular Expressions (Appendix A in Text) –Definition / Purpose –Commands that Use Regular Expressions –Using Regular Expressions –Using the Replacement.
BTANT129 w61 Regular expressions step by step Tamás Váradi
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 4. Document Search and Regular Expressions.
BIF713 Additional Utilities. Linux Utilities  You have learned many Linux commands. Here are some more that you can use:  Data Manipulation (Reg Exps)
CSC 352– Unix Programming, Spring 2015 April 28 A few final commands.
Introduction to Unix – CS 21 Lecture 12. Lecture Overview A few more bash programming tricks The here document Trapping signals in bash cut and tr sed.
Regular Expression - Intro Patterns that define a set of strings (or, pieces of a string) Not wildcards (similar notion, but different thing) Used by utilities.
Pattern Matching CSCI N321 – System and Network Administration.
Appendix A: Regular Expressions It’s All Greek to Me.
20-753: Fundamentals of Web Programming 1 Lecture 10: Server-Side Scripting II Fundamentals of Web Programming Lecture 10: Server-Side Scripting II.
I/O Redirection & Regular Expressions CS 2204 Class meeting 4 *Notes by Doug Bowman and other members of the CS faculty at Virginia Tech. Copyright
2004/12/051/27 SPARCS 04 Seminar Regular Expression By 박강현 (lightspd)
Unix Programming Environment Part 3-4 Regular Expression and Pattern Matching Prepared by Xu Zhenya( Draft – Xu Zhenya(
Regular Expressions CS 2204 Class meeting 6 Created by Doug Bowman, 2001 Modified by Mir Farooq Ali, 2002.
1 Lecture 9 Shell Programming – Command substitution Regular expressions and grep Use of exit, for loop and expr commands COP 3353 Introduction to UNIX.
BASH – Text Processing Utilities Erick, Joan © Sekolah Tinggi Teknik Surabaya 1.
UNIX Commands RTFM: grep(1), egrep(1) & fgrep(1) Gilbert Detillieux April 13, 2010 MUUG Meeting.
CSCI 330 UNIX and Network Programming Unit IV Shell, Part 2.
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 6 – sed, command-line tools wrapup.
Models of Computing Regular Expressions 1. Formal models of computation What can be computed? What is a valid program? What is a valid name of a variable.
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 4 – Shell Variables, More Shell Scripts.
Pattern Matching: Simple Patterns. Introduction Programmers often need to scan a file, directory, etc. for a specific substring. –Find all files that.
CSC 352– Unix Programming, Fall 2011 November 8, 2011, Week 11, a useful subset of regular expressions, grep and sed, parts of Chapter 11.
CS 403: Programming Languages Lecture 20 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
CSE 311 Foundations of Computing I Lecture 18 Recursive Definitions: Context-Free Grammars and Languages Autumn 2011 CSE 3111.
Filters and Utilities. Notes: This is a simple overview of the filtering capability Some of these commands are very powerful ▫Only showing some of the.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 10/9/2006 Lecture 6 – String Processing.
CIRC Summer School 2016 Baowei Liu
PROGRAMMING THE BASH SHELL PART III by İlker Korkmaz and Kaya Oğuz
Regular Expressions Copyright Doug Maxwell (
CSE 374 Programming Concepts & Tools
CSE 374 Programming Concepts & Tools
Looking for Patterns - Finding them with Regular Expressions
CIRC Winter Boot Camp 2017 Baowei Liu
Regular Expression - Intro
BASIC AND EXTENDED REGULAR EXPRESSIONS
CSE 390a Lecture 7 Regular expressions, egrep, and sed
CSC 352– Unix Programming, Fall 2012
CSC 352– Unix Programming, Spring 2016
Unix Talk #2 grep/egrep/fgrep (maybe add more to this one….)
Lecture 5 Additional useful commands COP 3353 Introduction to UNIX 1.
CSE 390a Lecture 7 Regular expressions, egrep, and sed
Regular expressions, egrep, and sed
CSE 303 Concepts and Tools for Software Development
CSCI The UNIX System Regular Expressions
1.5 Regular Expressions (REs)
CSE 390a Lecture 7 Regular expressions, egrep, and sed
Lab 8: Regular Expressions
Lecture 5 Additional useful commands COP 3353 Introduction to UNIX 1.
Presentation transcript:

CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 5 – Regular Expressions, grep, Other Utilities

Where we are Done learning about the shell and it’s bizarre “programming language” (but pick up more on hw3) Today: Specifying string patterns for many utilities, particularly grep and sed (also needed for hw3) Next: sed And then: a real programming language – C 2

Globbing vs Regular Expressions “Globbing” refers to shell filename expansion “Regular expressions” are a different but overlapping set of rules for specifying patterns to programs like grep. (Sometimes called “pattern matching”) More distinctions: –Regular expressions as in CS/mathematics –“Regular expressions” in grep –“Extended regular expressions” in egrep Same as grep –E –Other variations in other programs… 3

Real Regular Expressions Some of the crispest, elegant, most useful CS theory out there. What computer scientists know and ill-educated hackers don’t (to their detriment). A regular expression p may “match” a string s. If p = –a, b, … matches the single character (basic reg. exp.) –p 1 p 2, …, if we can write s as s 1 s 2, where p 1 matches s 1, p 2 matches s 2. –p 1 | p 2, … if p 1 matches s or p 2 matches s (in egrep, for grep use \|) –p*, if there is an i  0 such that p…p (i times) matches s. (for i = 0, matches the zero-character string  ) 4

Conveniences Most regular expressions allow various abbreviations for convenience, but these do not make the language any more powerful –p+ is pp* –p? is (  | p) –[zd-h] is z | d | e | f | g | h –[^a-z] and. are more complex, but just technical conveniences (entire character set except for those listed, or a single character. ) –p{n} is p…p (p repeated n times) –p{n,} is p…pp* (p repeated n or more times) –p{n,m} is p repeated n through m times 5

grep – beginning and end of lines By default, grep matches each line against.*p.* You can anchor the pattern with ^ (beginning) and/or $ (end) or both (match whole line exactly) These are still “real” regular expressions 6

* is greedy For example, find sections in an xml file: egrep '.* ' stuff.xml –The.* matches as much as possible, even over an intermediate ‘ ’ –Use [^chars] or other regular expressions to anchor the search so it matches less But that does not mean that.*p.* will match any string – still need to match p. 7

Gotchas Modern (i.e., gnu) versions of grep and egrep use the same regular expression engine for matching, but the input syntax is different for historical reasons –For instance, \{ for grep vs { for egrep –See grep manual sec. 3.6 Must quote patterns so the shell does not muck with them – and use single quotes if they contain $ (why?) Must escape special characters with \ if you need them literally: \. and. are very different –But inside [ ] many more characters are treated literally, needing less quoting (\ becomes a literal!) 8

Previous matches – back references Up to 9 times in a pattern, you can group with (p) and refer to the matched text later! –(Need backslashes in sed.) You can refer to the text (most recently) matched by the n th group with \n. Simple example: double-words ^\([a-zA-Z]*\)\1$ You cannot do this with actual regular expressions; the program must keep the previous strings. –Especially useful with sed because of substitutions. 9

Other utilities Some very useful programs you can learn on your own: –find (search for files, e.g., find /usr -name words) –diff (compare two files’ contents; output is easy for humans and programs to read (see patch)) Also: –For many programs the -r flag makes them recursive (apply to all files, subdirectories, subsubdirectories, …). –So “delete everything on the computer” is cd /; rm -rf * (be careful!) 10

11