Regular Expressions, grep
Lecture 6, Lecture 7

Why Regular Expressions?
Regular expressions are used to describe text patterns/filters.
Unix commands/utilities that support regular expressions:
  grep (fgrep, egrep) - search a file for a string or regular expression
  sed - stream editor
  awk (nawk) - pattern scanning and processing language
There are some minor differences between the regular expressions supported by these programs.
We will cover the general matching operators first.

Character Class
[] matches any one of the enclosed characters.
  [abc] matches a single a, b, or c
  [a-z] matches any single character in the range a through z
  [^A-Za-z] matches a single character as long as it is not a letter
Example: [Dd][Aa][Vv][Ee]
  Matches "Dave" or "dave" or "dAVE"
  Does not match "ave" or "da"
Each bracket expression matches exactly one character from the listed possibilities.

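The Dave example can be checked from the shell. This is a minimal sketch, assuming a POSIX grep; the -x option (not covered on the slides) makes grep accept a line only when the whole line matches, and the variable names are made up for illustration.

```shell
# Candidate strings, one per line (illustrative input).
words=$(printf '%s\n' Dave dave dAVE ave da)

# -x: the pattern must match the entire line, not just part of it.
matched=$(printf '%s\n' "$words" | grep -x '[Dd][Aa][Vv][Ee]')
echo "$matched"

# [^A-Za-z] matches exactly one character that is not a letter.
non_letters=$(printf '%s\n' a Z 7 - | grep -x '[^A-Za-z]')
echo "$non_letters"
```

Only the three four-letter spellings survive the first filter, and only 7 and - survive the second.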
Regular Expression Operators
Any character (except a metacharacter!) matches itself.
  .   Matches any single character except newline
  *   Matches 0 or more instances of the immediately preceding R.E.
  ?   Matches 0 or 1 instances of the immediately preceding R.E.
  +   Matches 1 or more instances of the immediately preceding R.E.
  ^   Matches the following R.E. at the beginning of the line
  $   Matches the preceding R.E. at the end of the line
  |   Matches the R.E. specified before or after this symbol
  \   Turns off the special meaning of the following character

Examples of R.E.
  x[abc]?x matches "xax" or "xx"
  [abc]* matches "aaaaa" or "acbca"
  0*10 matches "010" or "0000010" or "10"
  ^(dog)$ matches lines consisting of exactly "dog"
  [\t ]*
  (A|a)+b*c?

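A few of these operators can be tried out with a small helper. This is only a sketch: matches is a hypothetical function (not a standard command), and it assumes grep -E for the extended operators ?, + and |, with -x forcing a whole-line match and -q suppressing output.

```shell
# Print "yes" if the whole string $2 matches the extended R.E. $1, else "no".
matches() {
  if printf '%s\n' "$2" | grep -E -x -q -- "$1"; then
    echo yes
  else
    echo no
  fi
}

matches 'x[abc]?x' 'xax'      # yes: ? allows one of a, b, c
matches 'x[abc]?x' 'xx'       # yes: ? also allows zero of them
matches 'x[abc]?x' 'xabx'     # no: at most one character between the x's
matches '0*10'     '0000010'  # yes: any number of leading zeros
matches 'a.c'      'abc'      # yes: . matches any single character
matches 'a\.c'     'abc'      # no: \. matches only a literal dot
matches 'cat|dog'  'dog'      # yes: either alternative
```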
Grouping with parens
If you put a subpattern inside parens, you can apply + * and ? to the entire subpattern.
  a(bc)*d matches "ad" and "abcbcd"
  does not match "abcxd" or "bcbcd"

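To see what the parens change, compare a(bc)*d with abc*d on the same input. A rough sketch, assuming grep -E; the input strings other than the slide's own are illustrative.

```shell
input=$(printf '%s\n' ad abcbcd abcxd bcbcd abccd)

# With parens, * repeats the whole subpattern bc.
grouped=$(printf '%s\n' "$input" | grep -E 'a(bc)*d')
echo "$grouped"

# Without parens, * repeats only the c that precedes it.
ungrouped=$(printf '%s\n' "$input" | grep -E 'abc*d')
echo "$ungrouped"
```

The grouped pattern selects ad and abcbcd; the ungrouped one selects only abccd.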
Example
Christian Scott lives here and will put on a Christmas party There are around 30 to 35 people invited. They are: Tom Dan Rhonda Savage Nicky and Kimberly. Steve, Suzanne, Ginger and Larry
  ^[A-Z]..$
  ^[A-Z][a-z]*3[0-5]
  [a-z]*\.
  ^ *[A-Z][a-z][a-z]$
  ^[A-Z][a-z]*[^,][A-Za-z]*$

Review: Metacharacters for filename abbreviation
  *         Matches anything: ls Test*.doc
  ?         Matches any single character: ls Test?.doc
  [abc…]    Matches any of the enclosed characters: ls T[eE][sS][tT].doc
  [a-z]     Matches any character in a range: ls [a-zA-Z]*
  [!abc…]   Matches any character except those listed: ls [!0-9]*

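A quick way to watch the shell do this expansion is to count what each pattern expands to. This is a sketch in a throwaway directory; the file names are invented for the demonstration, and set -- is used only to count the expanded names.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch Test1.doc Test2.doc Test10.doc tEST3.doc 9lives.txt

set -- Test*.doc;          star=$#       # Test1.doc Test2.doc Test10.doc
set -- Test?.doc;          question=$#   # Test1.doc Test2.doc
set -- T[eE][sS][tT]?.doc; class=$#      # same two; tEST3.doc starts with t
set -- [!0-9]*;            negated=$#    # everything except 9lives.txt

echo "$star $question $class $negated"   # prints "3 2 2 4"
```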
Difference!!
Although there are similarities to the metacharacters used in filename expansion, we are talking about something different!
Filename expansion is done by the shell.
Regular expressions are used by commands (programs).
However, because of this overlap, be careful when specifying an RE on the command line.
It is a good idea to always quote an RE that contains special characters ('' or "") on the command line.
Example: % grep '[a-z]*' chap[12]*
Note: the filename mask chap[12]* is not quoted, so the shell expands it.

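The overlap can bite when an unquoted RE happens to match a file name. A small sketch, assuming a scratch directory with made-up files chap1, chap2 and c:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'abc\nxyz\n' > chap1
printf 'hello\n'    > chap2
touch c                      # a file name that the pattern [a-c] matches

# Unquoted: the shell replaces [a-c] with the file name before any command runs.
unquoted=$(echo [a-c])
# Quoted: the pattern reaches the command unchanged.
quoted=$(echo '[a-c]')
echo "$unquoted"             # prints: c
echo "$quoted"               # prints: [a-c]

# Quoted RE, unquoted filename mask: grep sees the RE plus chap1 and chap2.
counts=$(grep -c '[a-c]' chap[12]*)
echo "$counts"
```

Here grep reports one matching line in chap1 and none in chap2.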
grep - search for a string
grep [-bchilnsvw] PATTERN [filename...]
Reads files or standard/redirected input
Searches for the specified pattern in each line
Sends results to the standard output
Examples:
  % grep '^X11' *
    search all files for lines starting with the string "X11"
  % grep -v text file
    print lines that do not match "text"

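Both examples also work on standard input, which makes them easy to try without any files. A minimal sketch with invented input lines:

```shell
input=$(printf '%s\n' 'X11 server' 'uses X11' 'plain text' 'other')

# Lines that start with the string X11.
starts=$(printf '%s\n' "$input" | grep '^X11')
echo "$starts"      # prints: X11 server

# Lines that do NOT contain "text".
without=$(printf '%s\n' "$input" | grep -v text)
echo "$without"
```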
Regular expressions for grep
  c        any non-special character
  \c       turn off any special meaning of character c
  ^        beginning of line
  $        end of line
  .        any single character
  [...]    any single character in the range ...
  [^...]   any single character not in the range ...
  r*       zero or more occurrences of r

Regular Expressions for grep
  \<      beginning-of-word anchor
          \<abc matches "abcd" but not "dabc"
  \>      end-of-word anchor
          abc\> matches "dabc" but not "abcd"
  \(…\)   stores the pattern …
          \(abc\)def matches "abcdef" and stores abc in \1,
          so \(abc\)def\1 matches "abcdefabc"
          Can store up to 9 matches

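The anchors and the stored pattern can be checked directly. This sketch assumes a grep that supports \< and \> (GNU grep does; they are not part of POSIX basic regular expressions), and the second back-reference input line is invented.

```shell
# \< anchors the match at the start of a word.
begin=$(printf '%s\n' abcd dabc | grep '\<abc')
echo "$begin"       # prints: abcd

# \> anchors the match at the end of a word.
end=$(printf '%s\n' abcd dabc | grep 'abc\>')
echo "$end"         # prints: dabc

# \1 must repeat exactly what \(abc\) matched.
backref=$(printf '%s\n' abcdefabc abcdefxyz | grep '\(abc\)def\1')
echo "$backref"     # prints: abcdefabc
```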
grep - options
Some useful options:
  -c   print a count of matching lines
  -h   do not display filenames
  -l   list only the names of files with matching lines
  -v   display lines that do not match
  -n   print line numbers

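Each option is easiest to understand side by side on the same input. A sketch using two made-up files, a.txt and b.txt, in a temporary directory:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'apple\nbanana\napple pie\n' > a.txt
printf 'cherry\n' > b.txt

count=$(grep -c apple a.txt)          # number of matching lines
numbered=$(grep -n apple a.txt)       # matching lines with line numbers
files=$(grep -l apple a.txt b.txt)    # only names of files that match
nonames=$(grep -h apple a.txt b.txt)  # matches without the file name prefix
inverted=$(grep -v apple a.txt)       # lines that do not match

echo "$count"      # prints: 2
echo "$numbered"   # prints: 1:apple and 3:apple pie
echo "$files"      # prints: a.txt
echo "$nonames"    # prints: apple and apple pie
echo "$inverted"   # prints: banana
```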
File db
  northwest   NW   Charles Main    3.0   .98   3   34
  western     WE   Sharon Gray     5.3   .97   5   23
  southwest   SW   Lewis Dalsass   2.7   .8    2   18
  southern    SO   Suan Chin       5.1   .95   4   15
  southeast   SE   Patricia Heme   4.0   .7    4   17
  eastern     EA   TB Savage       4.4   .84   5   20
  northeast   NE   AM Main Jr.     5.1   .94   3   13
  north       NO   Margot Webber   4.5   .89   5   9
  central     CT   Ann Stephens    5.7   .94   5   13

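The db file makes a handy test bed for the anchors and options covered so far. A sketch that writes the same nine records to a temporary file (single spaces between fields are an assumption about the original layout) and assumes a grep with \< and \> support:

```shell
db=$(mktemp)
cat > "$db" <<'EOF'
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
EOF

# Lines beginning with "north": northwest, northeast and north.
north_prefix=$(grep -c '^north' "$db")
echo "$north_prefix"   # prints: 3

# "north" as a whole word: only the north record.
north_word=$(grep '\<north\>' "$db")
echo "$north_word"     # prints: north NO Margot Webber 4.5 .89 5 9

# Count of lines that do not mention Main.
no_main=$(grep -v -c 'Main' "$db")
echo "$no_main"        # prints: 7
```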
grep with pipes
Remember, we can use pipes when a file is expected:
  ls -l | grep '\<Feb.*3\>'

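The output of ls -l depends on the files and dates on your system, so here is a sketch of the same idea with predictable input: plain ls on a scratch directory containing made-up file names, piped into grep.

```shell
dir=$(mktemp -d)
touch "$dir/Test1.doc" "$dir/Test2.doc" "$dir/notes.txt"

# grep reads the listing from the pipe instead of from a file.
docs=$(ls "$dir" | grep -c '\.doc$')
echo "$docs"     # prints: 2

others=$(ls "$dir" | grep -v '\.doc$')
echo "$others"   # prints: notes.txt
```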
egrep
Extended grep allows for more kinds of regular expressions.
Unfortunately, egrep regular expressions are not a superset of grep regular expressions:
some of grep's regular expressions are not available in egrep.

grep vs. egrep
New to egrep:
  f+     matches one or more occurrences of f
  f?     matches zero or one occurrences of f
  f|g    matches f or g
  (ab)   groups characters a and b together
Only in grep:
  \( … \), \<, \>
Final note: Different versions of grep/egrep may support different expressions. Make sure to check the man pages.

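The difference shows up when the same pattern is given to both programs. This sketch uses grep -E as a stand-in for egrep (POSIX defines -E as selecting extended regular expressions) and -x for whole-line matching; behavior may differ on other versions, as the note above says.

```shell
input=$(printf '%s\n' f fff 'f+')

# Extended syntax: + means "one or more of the preceding f".
ere=$(printf '%s\n' "$input" | grep -E -x 'f+')
echo "$ere"

# Basic syntax: + is an ordinary character, so only the literal text f+ matches.
bre=$(printf '%s\n' "$input" | grep -x 'f+')
echo "$bre"      # prints: f+

# Alternation is available in the extended syntax.
alt=$(printf '%s\n' f g h | grep -E -x 'f|g')
echo "$alt"
```

With -E the pattern selects f and fff; without it, only the line containing a literal plus sign.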
Recommended Reading
  Chapter 3
  Chapter 4, sections 4.1 – 4.5