1.5 Regular Expressions (REs)


1 Compiler Structures, Semester 2
Objectives:
what is a regular expression?
give examples of REs used in the grep command

2 1. Regular Expressions
A regular expression (RE or regex) is a pattern that is matched against text, for example when searching inside a file. Regexes are used everywhere in Linux:
Editors: ed, ex, vi
Utilities: grep, egrep, sed, and awk

3 String Regex
The regex pattern c k s (the three characters "cks") is matched against each line of text:
text: UNIX Tools rocks. (match)
text: UNIX Tools sucks. (match)
text: UNIX Tools is okay. (no match)
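The same test can be sketched with Python's re module. This is an assumption on our part: the slides use grep, but for a plain string like "cks" the pattern syntax is identical. re.search reports whether the pattern occurs anywhere in a line:

```python
import re

# "cks" contains no special characters, so it matches only the literal
# three-character sequence c, k, s.
pattern = re.compile(r"cks")

lines = ["UNIX Tools rocks.", "UNIX Tools sucks.", "UNIX Tools is okay."]
results = []
for line in lines:
    # re.search scans the whole line; it returns None when nothing matches.
    results.append("match" if pattern.search(line) else "no match")
    print(f"{line:<22}{results[-1]}")
```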

4 Multiple Matches
A regex pattern can match text in more than one place.
regex pattern: a p p l e
text: Scrapple from the apple. (match 1 inside "Scrapple", match 2 in "apple")
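A minimal Python sketch (assuming Python's re as a stand-in for grep) that lists every place the pattern matches, with the start and end position of each match:

```python
import re

text = "Scrapple from the apple."
# finditer yields one match object per non-overlapping match, left to right;
# span() gives the (start, end) character positions of each match.
spans = [m.span() for m in re.finditer(r"apple", text)]
print(spans)  # [(3, 8), (18, 23)]
```

grep itself only reports the matching line once; counting the individual matches like this is where a library call such as finditer is handy.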

5 The . (dot) Regex
The . regex pattern matches any single character in the text.
regex pattern: o .
text: For me to poop on. (matches "or", "o ", "oo" and "on")
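A short Python check of the dot pattern (Python's re is our stand-in for grep here); findall returns every non-overlapping match in order:

```python
import re

text = "For me to poop on."
# "o." means: the letter o followed by any one character (including a space).
matches = re.findall(r"o.", text)
print(matches)  # ['or', 'o ', 'oo', 'on']
```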

6 The Character Class Regex
A character class [...] matches any one character from the set listed inside the brackets.
regex pattern: b [eor] a t
text: beat a brat on a boat (match 1 "beat", match 2 "brat", match 3 "boat")
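A Python sketch of the same character class (again using re as a stand-in for grep). It also shows that a class stands for exactly one character, never zero or two:

```python
import re

text = "beat a brat on a boat"
# [eor] stands for exactly one character: e, o or r.
print(re.findall(r"b[eor]at", text))  # ['beat', 'brat', 'boat']

# The class must match one character, so "bat" (nothing between b and a)
# and "bleat" (two characters between b and a) do not match.
print(re.fullmatch(r"b[eor]at", "bat"))    # None
print(re.fullmatch(r"b[eor]at", "bleat"))  # None
```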

7 Character Class Examples

8 Repetition Regex: * (star)
The * matches zero or more copies of the character before it.
regex pattern: y a * y
text: I got mail, yaaaaaaaaaay! (match)
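A minimal Python sketch of the star (assuming Python's re behaves like grep for this pattern, which it does for * on a single character). The second check shows that "zero copies" really is allowed:

```python
import re

text = "I got mail, yaaaaaaaaaay!"
m = re.search(r"ya*y", text)
# The match runs from the first y, through every a, to the closing y.
print(m.group())

# "Zero or more" includes zero: "yy" also matches ya*y.
print(bool(re.fullmatch(r"ya*y", "yy")))  # True
```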

9 regex pattern: o a * o
text: I like the zoo. (match "oo", using zero a's)
regex pattern: h . * o
text: Say hello Andrew. (match "hello")

10 regex pattern: h . * o
text: Say hello to Andrew. (match "hello to")
Regexes are greedy: they match as much of the text as they can.
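The examples on slides 9 and 10 can be reproduced in Python's re (our stand-in for grep; its * is greedy in the same way). Note how adding "to" to the sentence stretches the match to the last possible o:

```python
import re

# oa*o: an o, zero or more a's, then another o ("oo" uses zero a's).
print(re.search(r"oa*o", "I like the zoo.").group())       # oo

# .* is greedy: it first swallows the rest of the line, then backs off
# just far enough to find the LAST o it can still end on.
print(re.search(r"h.*o", "Say hello Andrew.").group())     # hello
print(re.search(r"h.*o", "Say hello to Andrew.").group())  # hello to
```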

11 Anchors: ^ $
^ matches the beginning of the text line.
regex pattern: ^ b [eor] a t
text: beat a brat on a boat (match "beat")
$ matches the end of the text line.
regex pattern: b [eor] a t $
text: beat a brat on a boat (match "boat")
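A Python sketch of the two anchors (assuming Python's re as a stand-in for grep). One difference worth knowing: grep works line by line, while Python applies ^ and $ to the whole string unless the re.MULTILINE flag is given:

```python
import re

text = "beat a brat on a boat"
print(re.findall(r"^b[eor]at", text))  # ['beat']  (only at the start)
print(re.findall(r"b[eor]at$", text))  # ['boat']  (only at the end)

# For multi-line strings, re.MULTILINE makes ^ and $ work per line,
# which is how grep applies them.
lines = "brat race\nbeat it\n"
print(re.findall(r"^b[eor]at", lines, re.MULTILINE))  # ['brat', 'beat']
```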

12 More Anchors

13 The | (or) Regex

14 More Repetition Regexes: * + ?
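The three repetition operators differ only in how many copies they allow: * zero or more, + one or more, ? zero or one (optional). A small Python sketch, using re.fullmatch so that the whole word must match (Python's re is our assumption; in egrep the operators mean the same):

```python
import re

# * = zero or more, + = one or more, ? = zero or one (optional).
results = {}
for word in ["ac", "abc", "abbc"]:
    results[word] = (
        bool(re.fullmatch(r"ab*c", word)),
        bool(re.fullmatch(r"ab+c", word)),
        bool(re.fullmatch(r"ab?c", word)),
    )
    print(word, results[word])
# ac (True, False, True)
# abc (True, True, True)
# abbc (True, True, False)
```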

15 More Regex Operations
See the regular expressions "cheat sheet" on the course website; it lists over 80 operators.

16 2. grep
grep uses a regex pattern to search a text file: all the lines containing a match (or matches) are printed.
Examples (the regex pattern goes in "...", followed by the text filename):
% grep "root" test1
% grep "r..t" test1
% grep "ro*t" test1
% grep "r[a-z]*t" test1
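To see what each of these four commands would print, here is a sketch that emulates grep's line selection in Python. The contents of test1 are hypothetical (the slides do not show the file), and grep here is a hypothetical helper, not the real tool:

```python
import re

# Hypothetical contents of the file test1, one string per line.
lines = ["root", "rat", "rt", "r2dt"]
patterns = ["root", "r..t", "ro*t", "r[a-z]*t"]

def grep(pattern, lines):
    """Return the lines containing at least one match, as grep does."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

for p in patterns:
    print(f'grep "{p}":', grep(p, lines))
```

With these lines, "r..t" picks up "r2dt" (the dot matches digits too), "ro*t" accepts "rt" (zero o's), and "r[a-z]*t" rejects "r2dt" because 2 is not in [a-z].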

17 The Grep Family
grep: the usual version
egrep: extended REs (| + ? don't need a backslash); the same as grep -E
fgrep: fixed strings only (no regex operators), so it is faster; the same as grep -F

18 Common “grep” Options
-c Print a count of matched lines.
-i Ignore uppercase/lowercase differences.
-l List the filenames that contain matches.
-n Print matched lines with their line numbers.
-s Suppress error messages about nonexistent or unreadable files.
-v Print the lines that do not match the pattern.
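A toy Python sketch of how four of these options change grep's output. The function and its keyword names (ignore_case for -i, invert for -v, line_numbers for -n, count for -c) are hypothetical; this is an illustration, not a reimplementation of the real tool:

```python
import re

def grep(pattern, lines, ignore_case=False, invert=False,
         line_numbers=False, count=False):
    """Toy grep over a list of lines.

    ignore_case ~ -i, invert ~ -v, line_numbers ~ -n, count ~ -c
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    selected = []
    for n, line in enumerate(lines, start=1):
        # -v flips the test: keep the lines that do NOT match.
        if bool(regex.search(line)) != invert:
            selected.append(f"{n}:{line}" if line_numbers else line)
    if count:
        return len(selected)
    return selected

lines = ["Hello andy", "my name is andy", "my bye byhe"]
print(grep("hello", lines, ignore_case=True))  # ['Hello andy']
print(grep("andy", lines, invert=True))        # ['my bye byhe']
print(grep("my", lines, line_numbers=True))    # ['2:my name is andy', '3:my bye byhe']
print(grep("andy", lines, count=True))         # 2
```

As with the real -c -v combination, count and invert can be used together to count the non-matching lines.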

19 Some Simple Examples
grep searches its input one line at a time. If a line contains a string that matches grep's RE (pattern), the line is output.
The input lines (e.g. from a file) pass through grep "RE", which outputs the matching lines (e.g. to a file).
Input used in the examples:
hello andy
my name is andy
my bye byhe
continued

20 Examples ("|" means "or")
grep "and" outputs:
hello andy
my name is andy
grep -E "an|my" outputs all three lines:
hello andy
my name is andy
my bye byhe
continued

21 "*" means "0 or more"
grep "hel*" outputs:
hello andy
my bye byhe
("byhe" matches because its "he" is followed by zero l's.)
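The three commands on slides 20 and 21 can be checked with a small Python helper (hypothetical, using re as a stand-in for grep). Python's | behaves like egrep's, so no backslash is needed:

```python
import re

lines = ["hello andy", "my name is andy", "my bye byhe"]

def grep(pattern, lines):
    """Hypothetical helper: keep the lines containing a match, as grep does."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

print(grep("and", lines))    # ['hello andy', 'my name is andy']
print(grep("an|my", lines))  # all three lines
print(grep("hel*", lines))   # ['hello andy', 'my bye byhe']
```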

22 grep with \< \>: beginning and end of a word
Look for the word "north".
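Python's re has no \< or \> operators; the closest equivalent is \b, which matches a word boundary at either end of a word. A sketch with hypothetical sample lines:

```python
import re

# \b stands in for grep's \< (start of word) and \> (end of word).
word = re.compile(r"\bnorth\b")
lines = ["go north", "northern lights", "north-east", "unorthodox"]
print([line for line in lines if word.search(line)])  # ['go north', 'north-east']
```

"northern" and "unorthodox" contain the letters n-o-r-t-h, but not as a whole word, so they are rejected; the hyphen in "north-east" counts as a word boundary.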

23 grep with a\|b: a or b (egrep doesn't need the backslash)

24 grep with \+: one or more (egrep doesn't need the backslash)

25 grep with .: any single character (no backslash is needed in either grep or egrep)

26 grep with ^ and $: beginning and end of a line

27 grep with [ ]: a set of characters

28 Fun with a Linux Dictionary
Find the location of the words file.
List all the words containing "hh".

29 Look for "niether" or "neither".
Look for words with three "u"s.
Count the words with three "a"s.
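The dictionary searches on slides 28 and 29 can be sketched in Python over a hypothetical word list (on many Linux systems the real list is /usr/share/dict/words, but the location varies, which is why slide 28 starts by finding it). Note that u.*u.*u means "at least three u's", not exactly three:

```python
import re

# Hypothetical stand-in for the system word list.
words = ["withhold", "fishhook", "neither", "niether", "unusual",
         "banana", "Canada", "cat"]

def grep(pattern, lines):
    """Hypothetical helper: keep the entries containing a match."""
    regex = re.compile(pattern)
    return [w for w in lines if regex.search(w)]

hh = grep(r"hh", words)                        # words containing "hh"
spellings = grep(r"n(ie|ei)ther", words)       # either spelling
three_u = grep(r"u.*u.*u", words)              # at least three u's
three_a_count = len(grep(r"a.*a.*a", words))   # like grep -c
print(hh, spellings, three_u, three_a_count)
# ['withhold', 'fishhook'] ['neither', 'niether'] ['unusual'] 2
```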

30 Complex Regex Examples
Variable names in C: [a-zA-Z_][a-zA-Z_0-9]*
Dollar amount with optional cents: \$[0-9]+(\.[0-9][0-9])?
Time of day: (1[012]|[1-9]):[0-5][0-9] (am|pm)
HTML headers <h1> <H1> <h2> …: <[hH][1-4]>
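A Python sketch that tests the four patterns against a few hypothetical candidate strings. These patterns use egrep-style syntax (unescaped ( ), | and ?), which Python's re shares. re.fullmatch requires the whole string to match; grep only needs a match somewhere in the line, so grep would still print a line such as "$1.5" because "$1" matches:

```python
import re

examples = {
    r"[a-zA-Z_][a-zA-Z_0-9]*": ["count", "_tmp1", "2fast"],
    r"\$[0-9]+(\.[0-9][0-9])?": ["$5", "$19.99", "$1.5"],
    r"(1[012]|[1-9]):[0-5][0-9] (am|pm)": ["9:30 am", "12:05 pm", "13:00 pm"],
    r"<[hH][1-4]>": ["<h1>", "<H3>", "<h5>"],
}

results = {}
for pattern, candidates in examples.items():
    # fullmatch: the WHOLE string must match, not just part of it.
    results[pattern] = [s for s in candidates if re.fullmatch(pattern, s)]
    print(pattern, "->", results[pattern])
```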

31 3. The RE Language
An RE can be defined as a pattern language, built from operands and operators, that matches text strings.

32 Some Possible RE Operands
text characters (e.g. ‘a’, ‘1’, ‘(’)
the symbol ε, which means the empty string ‘’ (in code, just use "")
variables, which can be assigned an RE: variable = RE

33 The Basic RE Operators
There are three basic operators:
union: |
concatenation (written by placing two REs side by side)
closure: *

34 Union S | T
Use S or T to match strings.
Example REs: a | b, a | b | c

35 Concatenation S T
Use S followed by T to match against strings.
Example REs: a b matches the string "ab"; w | (a b) matches the strings "w" or "ab"

36 Closure S*
Use S 0 or more times to match against strings.
Example RE: a* matches the strings ε, a, aa, aaa, aaaa, aaaaa, ... (ε is the empty string)
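To make the three basic operators concrete, here is a sketch that computes every string an RE matches, up to a length bound (the bound is needed because a* matches infinitely many strings). The RE is built from characters, ε, union, concatenation and closure; the class names Char, Empty, Union, Concat, Closure and the function language are hypothetical, not part of any library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Char:       # a single text character, e.g. 'a'
    c: str

@dataclass(frozen=True)
class Empty:      # the empty string ε
    pass

@dataclass(frozen=True)
class Union:      # S | T
    s: object
    t: object

@dataclass(frozen=True)
class Concat:     # S T
    s: object
    t: object

@dataclass(frozen=True)
class Closure:    # S*
    s: object

def language(re_, max_len):
    """Return every string of length <= max_len that re_ matches."""
    if isinstance(re_, Char):
        return {re_.c} if max_len >= 1 else set()
    if isinstance(re_, Empty):
        return {""}
    if isinstance(re_, Union):
        return language(re_.s, max_len) | language(re_.t, max_len)
    if isinstance(re_, Concat):
        left = language(re_.s, max_len)
        right = language(re_.t, max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if isinstance(re_, Closure):
        # S* = ε | S | S S | S S S | ...; keep appending copies of S
        # until no new short-enough strings appear.
        base = language(re_.s, max_len)
        result = {""}
        while True:
            new = {x + y for x in result for y in base
                   if len(x + y) <= max_len} - result
            if not new:
                return result
            result |= new
    raise TypeError(f"unknown RE node: {re_!r}")

print(sorted(language(Closure(Char("a")), 4), key=len))
# ['', 'a', 'aa', 'aaa', 'aaaa']
```

For example, w | (a b) from slide 35 is Union(Char("w"), Concat(Char("a"), Char("b"))), and its language is {"w", "ab"}.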

37 3.1. REs for C Identifiers
We define two RE variables, letter and digit:
letter = A | B | C | ... | Z | a | b | c | ... | z
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
id is defined using letter and digit:
id = letter ( letter | digit )*
continued

38 Strings matched by id include: ab345, w, h5g
Strings not matched: 2, $abc, ****
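The RE variables letter and digit can be spliced into a larger pattern in Python, which mirrors how id is built from them. A sketch under the assumption that Python's re stands in for the RE notation; the variable names follow the slide:

```python
import re
import string

# letter and digit as on the slide: A-Z, a-z and 0-9.
letter = "[" + string.ascii_letters + "]"   # A | B | ... | z
digit = "[" + string.digits + "]"           # 0 | 1 | ... | 9

# id = letter ( letter | digit )*
ident = re.compile(f"{letter}({letter}|{digit})*")

for s in ["ab345", "w", "h5g", "2", "$abc", "****"]:
    print(s, bool(ident.fullmatch(s)))
```

Real C identifiers may also contain the underscore, as in the pattern on slide 30; the id on this slide leaves it out, so "_tmp" is rejected here.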

39 3.2. REs for Integers and Floats
We can write digit either as an explicit union or as a character class:
digit = 0|1|2|3|4|5|6|7|8|9 or digit = [0-9]
int and float:
int = {digit}+
float = {digit}+ "." {digit}+
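The {digit} notation refers back to the named definition, much as in lex/flex. In Python the same effect can be sketched by splicing the string into the pattern with an f-string (an assumption about how to express it, not the slide's tool):

```python
import re

digit = "[0-9]"
int_re = re.compile(f"{digit}+")               # int = {digit}+
float_re = re.compile(rf"{digit}+\.{digit}+")  # float = {digit}+ "." {digit}+

for s in ["42", "3.14", "3.", ".5"]:
    print(s, bool(int_re.fullmatch(s)), bool(float_re.fullmatch(s)))
```

The quoted "." on the slide becomes \. in Python, because an unescaped dot would match any character.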

40 Integers and floats with exponents:
number = {digit}+ ('.' {digit}+ )? ( 'E'('+'|'-')? {digit}+ )?
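A Python sketch of the number definition, again splicing digit into the pattern (our assumption, as above). Both the fraction and the exponent are optional groups, and the exponent uses an upper-case E only, exactly as the slide's RE does:

```python
import re

digit = "[0-9]"
# number = {digit}+ ('.' {digit}+ )? ( 'E'('+'|'-')? {digit}+ )?
number = re.compile(rf"{digit}+(\.{digit}+)?(E[+-]?{digit}+)?")

for s in ["42", "3.14", "6E23", "1.5E-3", "2E+10", "1.", "E5", "1.5e3"]:
    print(s, bool(number.fullmatch(s)))
```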

41 4. More on REs
See the RE summary on the course website: regular_expressions_cheat_sheet.pdf
I have the standard RE book: Mastering Regular Expressions, Jeffrey E. F. Friedl, O'Reilly & Associates.
continued

42 There are many websites that explain REs:
helpsheets/unix/regex.html

