UNIX System Programming


Presentation on theme: "UNIX System Programming"— Presentation transcript:

1 UNIX System Programming
J. Tan Computer Networks
Regular Expressions
You can use and even administer Unix systems without understanding regular expressions, but you will be doing things the hard way.
Regular expressions are endemic to Unix:
  vi, ed, sed, and emacs
  awk, tcl, Perl, and Python
  grep, egrep, fgrep
Regular expressions descend from a fundamental concept in computer science called regular grammars, from finite automata theory.
UNIX System Programming CS UNIX System Programming

2 What Is a Regular Expression?
A regular expression is a description of a pattern that describes a set of possible characters in an input string.
Simple examples of regular expressions (known as regex from here on):
In vi, when searching:
  :/c[aou]t searches for cat, cot, or cut
In the shell:
  ls *.txt
  cat chapter?
  cp Week[1234].pdf /home/fac/tan

3 Shortcomings of Regular Expressions
Considerable variation from utility to utility:
  The shell is limited to fairly simple metacharacter substitution (*, ?, […]) and does not really support regex.
  Regex in ed and vi are also fairly limited.
  Regex in sed are not exactly the same as regex in Perl, or Awk, or grep, or egrep.

4 Shortcomings of Regular Expressions
The burden is on the user to examine the man page or other documentation for these utilities to determine which flavor of regex is supported.

5 So How Do We Build a Regex?
The simplest regex is a normal character: c, for example, will match a c anywhere, and a will do the same for an a.
The next thing is a . (period), which matches any single occurrence of any character except a newline.
For example, . will match a z or an e or a ? or even another .

6 So How Do We Build a Regex?
w.n will match win, wan, won, wen, wmn, as well as w*n and w9n.
Complex regexes are constructed simply by stringing together smaller regexes.

7 Protecting Regex Metacharacters
Since many of the special characters used in regexes also have special meaning to the shell, it's a good idea to get in the habit of single-quoting your regexes.
Single quoting protects any special characters from being operated on by the shell.
If you do it habitually, you won't have to worry about when it is necessary.

8 Multiple Occurrences in a Pattern
The * (asterisk or star) defines zero or more occurrences of the single character preceding it.
abc*d will match abd, abcd, abccd, abcccd, or even abcccccccccccccccccccccccccccccccccccd
But not abcccxxxxd!

9 Multiple Occurrences in a Pattern
Note the difference between the * in a regex and the shell's usage:
In a regex, a * only stands for zero or more occurrences of a single preceding character.
In the shell, the * stands for any number of characters that may or may not be different.
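
The difference between the two meanings of * can be sketched as follows. This is a minimal illustration, assuming a POSIX shell and grep; the sample strings are made up for the example.

```shell
# Candidate strings (hypothetical sample data).
samples='abd abcd abcccd abcxd'

# Regex: c* means "zero or more c", so abd matches but abcxd does not.
regex_hits=$(printf '%s\n' $samples | grep '^abc*d$' | tr '\n' ' ')

# Shell pattern: * means "any run of characters", so abcxd matches but abd does not.
glob_hits=''
for s in $samples; do
  case $s in
    abc*d) glob_hits="$glob_hits$s " ;;
  esac
done

echo "regex: $regex_hits"
echo "glob:  $glob_hits"
```

The same four characters, abc*d, select different strings depending on whether grep or the shell interprets them.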

10 Specifying Begin or End of Line
The ^ specifies the beginning of a line.
^The will match any The that are the first characters on a line.
The $ matches the end of a line.
well$ will match well only if those are the last characters on a line, just before the NEWLINE character.

11 Specifying Begin or End of Line
Note that 'well ' (notice the space at the end) would NOT match well$
^Ken$ would only match a line that started with Ken and then had no other characters on the line.
What would the regex ^$ do?
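
A quick way to experiment with the anchors is to count matching lines with grep -c. This is a sketch assuming a POSIX grep; the helper name sample and its input lines are invented for the illustration.

```shell
# Hypothetical input: note the trailing space on the third line
# and the empty fifth line.
sample() {
  printf 'well\nall is well\nwell \nwellness\n\nKen\nKenneth\n'
}

ends_with_well=$(sample | grep -c 'well$')   # "well" and "all is well"
only_ken=$(sample | grep -c '^Ken$')         # "Ken" but not "Kenneth"
empty_lines=$(sample | grep -c '^$')         # lines with no characters at all

echo "well\$: $ends_with_well  ^Ken\$: $only_ken  ^\$: $empty_lines"
```

The last count shows that ^$ matches only lines that contain nothing between their beginning and end.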

12 Character Classes [ ]
The square brackets [ ] are used to define character classes.
[aeiou] will match any of the characters a, e, i, o, or u.
[aA]wk will match awk or Awk.

13 Character Classes [ ]
Ranges can also be specified in character classes:
  [1-9] is the same as [123456789]
  [abcde] is equivalent to [a-e]
You can also combine multiple ranges:
  [abcde123456789] is equivalent to [a-e1-9]
Note that the - character has a special meaning in a character class BUT ONLY if it is used within a range; [-123] would match the characters -, 1, 2, or 3.

14 Negating a Character Class
The ^, when used as the first character in a character class definition, serves to negate the definition.
For example, [^aeiou] matches any character except a, e, i, o, or u.

15 Negating a Character Class
Used anywhere else within a character class, the ^ simply stands for a ^
[ab^&] would match an a, b, ^, or &
Note also that within a character class, the ^ does not stand for beginning of line.
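
Character classes, ranges, and negation can be tried out with a few grep calls. This is a sketch with made-up input; LC_ALL=C is set here as an assumption, because the meaning of a range such as a-e can depend on the locale.

```shell
# Range classes: one letter from a-e followed by one digit from 1-9.
in_range=$(printf '%s\n' a1 e9 f0 z5 | LC_ALL=C grep '^[a-e][1-9]$' | tr '\n' ' ')

# Negated class: c, then any character that is NOT a vowel, then t.
not_vowel=$(printf '%s\n' cat cut crt 'c^t' | LC_ALL=C grep '^c[^aeiou]t$' | tr '\n' ' ')

# A ^ that is not first in the class is an ordinary character.
literal_caret=$(printf '%s\n' 'x^y' a zzz | grep -c '[ab^&]')

echo "range: $in_range"
echo "negated: $not_vowel"
echo "lines containing a, b, ^ or &: $literal_caret"
```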

16 Escaping Special Characters
Even though we are single-quoting our regexes so the shell won't interpret the special characters, sometimes we still want to use a special character as itself.
To do this, we escape the character with a \ (backslash).

17 Escaping Special Characters
Suppose we want to search for the character sequence 8*9*
Unless we do something special, this will match zero or more 8's followed by zero or more 9's, which is not what we want.
8\*9\* will fix this: now the asterisks are treated as regular characters.
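
The effect of escaping can be checked by counting matching lines. This is a minimal sketch assuming a POSIX grep; the three input lines are invented. Because 8*9* can match zero characters, the unescaped form matches every line.

```shell
# Hypothetical input lines.
lines() { printf '%s\n' '8*9*' '8899' 'abc'; }

unescaped=$(lines | grep -c '8*9*')     # matches every line, even "abc"
escaped=$(lines | grep -c '8\*9\*')     # only the line with literal asterisks
fixed=$(lines | grep -F -c '8*9*')      # -F treats the whole pattern literally

echo "unescaped: $unescaped  escaped: $escaped  fixed-string: $fixed"
```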

18 Reading a Regex
If you get in the habit of literally reading a regex, it will be much easier for you to determine what one does.
^Ken could be read as matching the word Ken at the beginning of a line.
A better way to read it is: the beginning of a line, followed by a capital K, followed by an e, followed by an n.

19 Reading a Regex
^corn$ would be read as the beginning of a line followed immediately by a c, followed by an o, followed by an r, followed by an n, followed immediately by a NEWLINE.
This matches up to the state machine used to interpret the regex.

20 Alternation
Regex also provides an alternation character ( | ) for matching one or another subexpression.
(K|T)en will match Ken or Ten (note the use of parentheses and not brackets).
^(From|Subject): matches a beginning of line followed by either the characters From or Subject, followed by a :

21 Alternation
The parentheses ( ) are used to limit the scope of the alternation.
At(ten|nine)tion matches Attention or Atninetion, not Atten or ninetion as would happen without the parentheses (Atten|ninetion).
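
The scope of an alternation can be explored with grep -E, the POSIX spelling of egrep. This is a sketch with invented input; -x is used so that only whole-line matches count.

```shell
# Anchored alternation: header lines starting with From: or Subject:.
headers=$(printf '%s\n' 'From: alice' 'Subject: hi' 'To: bob' ' From: x' |
  grep -E '^(From|Subject):' | tr '\n' ' ')

words() { printf '%s\n' Attention Atninetion Atten ninetion; }

# Parentheses limit the alternation to the middle of the word.
grouped=$(words | grep -E -x 'At(ten|nine)tion' | tr '\n' ' ')

# Without them, the alternatives are the two whole halves.
ungrouped=$(words | grep -E -x 'Atten|ninetion' | tr '\n' ' ')

echo "headers:   $headers"
echo "grouped:   $grouped"
echo "ungrouped: $ungrouped"
```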

22 Word Boundaries
The regex cat will match cat, concatenate, catastrophe, and catatonic.
What if I only want to match the word cat?

23 Word Boundaries
Some regex flavors implement the concept of words.
\< signifies the beginning of a word and \> signifies the end of a word.
These are not metacharacters, but when used together they have special meaning to the regex engine.

24 Word Boundaries
Note that the regex engine does not understand English.
A beginning-of-word is just the position where a sequence of alphanumeric characters begins.
End-of-word is where the sequence stops.
Where is that &^%$# stinkin' roadrunner-lovin' coyote?
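
Word matching can be sketched with grep as below. Note the assumptions: \< and \> and the -w option are extensions found in GNU and BSD grep rather than guaranteed by POSIX, and the input lines are made up.

```shell
# Hypothetical input lines.
lines() { printf '%s\n' 'the cat sat' 'concatenate' 'catastrophe' 'cat'; }

anywhere=$(lines | grep -c 'cat')          # cat as a substring
as_word=$(lines | grep -c -w 'cat')        # cat as a whole word
with_marks=$(lines | grep -c '\<cat\>')    # same idea, spelled with \< and \>

echo "substring: $anywhere  -w: $as_word  \\<cat\\>: $with_marks"
```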

25 Quantifiers
The ? (question mark) specifies an optional character: the single character that immediately precedes it.
E.g., if I am looking for the month of July, it may be written as July or Jul.
I could use (July|Jul) to search, or I could use July?

26 Repetition
The * (asterisk or star) specifies zero or more occurrences of the immediately preceding character.
+ (plus) means one or more.
abc+d will match abcd, abccd, or abccccccd but will not match abd.
abc?d will match abd and abcd but not abccd.

27 Repetition Ranges
Ranges can also be specified.
{n,m} notation can specify a range of repetitions for the immediately preceding regex:
  {n} means exactly n occurrences
  {n,} means at least n occurrences
  {n,m} means at least n occurrences but no more than m occurrences
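
The quantifiers can be compared side by side with grep -E (extended regex syntax). This is a minimal sketch; the sample strings are invented and -x restricts matches to whole lines.

```shell
# Hypothetical sample strings with zero to three c characters.
samples() { printf '%s\n' abd abcd abccd abcccd; }

one_or_more=$(samples | grep -E -x 'abc+d' | tr '\n' ' ')      # + : at least one c
zero_or_one=$(samples | grep -E -x 'abc?d' | tr '\n' ' ')      # ? : at most one c
two_to_three=$(samples | grep -E -x 'abc{2,3}d' | tr '\n' ' ') # {2,3} : two or three c

echo "abc+d:     $one_or_more"
echo "abc?d:     $zero_or_one"
echo "abc{2,3}d: $two_to_three"
```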

28 Backreferences
Sometimes it is handy to be able to refer to a match that was made earlier in a regex.
This is done using backreferences.
\n is the backreference specifier, where n is a number. E.g., \1 references the first set of matched text.
Matched text is delineated by parentheses ( .. )

29 Backreferences
For example, to find our double-word example:
  \<([A-Za-z]+) +\1\>
(note: ( ) is also used for backreferencing)
This first finds a generic word ([A-Za-z]+) followed by one or more spaces ( +).
The \1\> then matches the text of the first subexpression ([A-Za-z]+) again, as the end of a word.
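
A doubled-word search along these lines can be sketched with plain grep. The assumptions here: the pattern is written in basic regex syntax, where groups are \( \) and "one or more" is spelled as a repeated item, and \< \> are GNU/BSD extensions. The input sentences are made up.

```shell
# Hypothetical input lines; two of them contain a doubled word.
lines() {
  printf '%s\n' 'this is is a test' 'this is a test' \
    'the theory' 'Paris in the the spring'
}

# \<word, one or more spaces, the same word again, then end of word.
doubled=$(lines | grep -c '\<\([A-Za-z][A-Za-z]*\)  *\1\>')

echo "lines with a doubled word: $doubled"
```

The line "the theory" is not counted, because the trailing \> requires the repeated text to end a word.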

30 Regex Summary
[Table not captured in the transcript.]

31 Regex Examples
Variable names in C:
  [a-zA-Z_][a-zA-Z_0-9]*
Dollar amount with optional cents:
  \$[0-9]+(\.[0-9][0-9])?
Time of day:
  (1[012]|[1-9]):[0-5][0-9] (am|pm)
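
These three patterns can be exercised with grep -E. This is a sketch: the test strings are invented, -x forces whole-line matches, and LC_ALL=C is an assumption made so that the letter ranges behave predictably.

```shell
# C identifiers: count and _tmp qualify; 9lives and my-var do not.
idents=$(printf '%s\n' count _tmp 9lives my-var |
  LC_ALL=C grep -E -x -c '[a-zA-Z_][a-zA-Z_0-9]*')

# Dollar amounts: cents, when present, must be exactly two digits.
dollars=$(printf '%s\n' '$5' '$12.50' '$3.5' '12.50' |
  grep -E -x -c '\$[0-9]+(\.[0-9][0-9])?')

# 12-hour times: hours 1-12, minutes 00-59, then am or pm.
times=$(printf '%s\n' '9:05 am' '12:59 pm' '13:00 pm' '7:60 am' |
  grep -E -x -c '(1[012]|[1-9]):[0-5][0-9] (am|pm)')

echo "identifiers: $idents  dollar amounts: $dollars  times: $times"
```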

32 grep
■ grep comes from the ed search command g/re/p (global regular expression print)
■ This was such a useful command that it was written as a standalone utility

33 grep
■ There are two other variants, egrep and fgrep, that together with grep make up the grep family
■ grep is the answer to those moments when you know you want the file that contains a specific phrase but cannot remember its name

34 grep Family Syntax
grep [-hilnw] [-e expression] [filename]
egrep [-hiln] [-e expression] [-f filename] [expression] [filename]
fgrep [-hilnx] [-e string] [-f filename] [string] [filename]

35 grep Family
■ -h - Do not display filenames
■ -i - Ignore case
■ -l - List only filenames containing matching lines
■ -n - Precede each matching line with its line number
■ -w - Search for the expression as a word (grep only)
■ -x - Match whole line only (fgrep only)
■ -v - Print only the lines that do not match the given pattern
■ -e expression - Same as a plain expression, but useful when expression starts with a -

36 grep
■ -e string - fgrep only uses search strings, no regular expressions
■ -f filename - Take the regular expression (egrep) or a list of strings separated by NEWLINES (fgrep) from filename

37 Family Differences
■ grep - uses regular expressions for pattern matching
■ fgrep - fixed-string grep; does not use regular expressions, only matches fixed strings, but can get search strings from a file

38 Family Differences
■ egrep - extended grep; uses a more powerful set of regular expressions but does not support backreferencing; generally the fastest member of the grep family

39 Regex in the grep Family
The following one-character regexes match a single character:
■ c - an ordinary character
■ \c - an escaped special character: . * [ \ ^ $
■ \ followed by < > ( ) { or }
■ . (period)
■ [string] - any single character contained within the brackets

40 Rules For Constructing grep Regex
■ A single-character regex followed by a * matches zero or more occurrences of the single-character regex
■ A regex enclosed in \( and \) matches whatever the regex matches and tags it (grep only)
■ \n matches the same string the corresponding \(regex\) matched

41 grep Regex Construction Rules
■ The concatenation of regexes is a regex that matches the concatenation of the strings matched by each component of the regex
■ A regex followed by \{m\}, \{m,\}, or \{m,n\} matches a range of occurrences of the regex

42 egrep Regex
■ egrep uses the same rules except for \(, \), \n, \<, \>, \{, and \}
■ egrep adds the following regex components:
■ * - a regular expression followed by a * matches zero or more occurrences of the expression, not just a single character

43 egrep Regex
■ + - a regex followed by a + matches one or more occurrences of the regex
■ ? - a regex followed by a ? matches zero or one occurrence of the regex
■ | - provides alternation: two regexes separated by | match either the first or the second regex
■ ( ) - a regex enclosed in () provides a match for the regex

44 UNIX System Programming
[Slide content not captured in the transcript.]

45 grep Family Options
[Table not captured in the transcript.]

46 grep Family Expressions
[Table not captured in the transcript.]

47 More grep Family Expressions
[Table not captured in the transcript.]

48 grep Examples
grep men foobar
grep -w men foobar
grep "\<men\>" foobar        (same as the -w flag)
grep "fo*" foobar

49 grep Examples
grep -nw "[Tt]he" foobar
fgrep The foobar
fgrep -f expfile foobar
fgrep -x 'Kiss me' foobar    # match whole line exactly
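
The commands above can be tried against a small scratch file. This is a sketch under stated assumptions: the contents of foobar and expfile are invented, mktemp -d is assumed to be available, and grep -F is used as the POSIX spelling of fgrep.

```shell
# Scratch directory with a hypothetical foobar and expfile.
dir=$(mktemp -d)
printf '%s\n' 'The men left' 'women and children' 'Kiss me' \
  'Kiss me, Kate' 'the end' > "$dir/foobar"
printf '%s\n' 'Kiss' 'end' > "$dir/expfile"

plain=$(grep -c men "$dir/foobar")                  # men as a substring
word=$(grep -c -w men "$dir/foobar")                # men as a whole word
numbered=$(grep -n -w '[Tt]he' "$dir/foobar")       # matches with line numbers
from_file=$(grep -F -c -f "$dir/expfile" "$dir/foobar")  # fixed strings from a file
exact=$(grep -F -x 'Kiss me' "$dir/foobar")         # whole-line fixed-string match

printf '%s\n' "$plain" "$word" "$numbered" "$from_file" "$exact"
rm -r "$dir"
```

Quoting 'Kiss me' matters: without the quotes, me would be taken as a filename rather than as part of the search string.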

50 sed, The Stream Editor
sed is descended from our friend, ed.
Both operate on files one line at a time.
Both use a similar command format:
  [address] operation [argument]
ed can use command scripts:
  ed filename < script_file

51 sed, The Stream Editor
sed is a special-purpose editor that only takes commands from a script or the command line; it cannot be used interactively.
All input to sed comes from stdin and all output goes to stdout, although there is an option to supply an edit filename on the command line.

52 sed
Changes are not made to the edit file itself; instead the input file, along with any changes, is written to standard output.
Important difference between ed and sed:
■ ed changes the edit file, sed does not

53 sed
To make changes from sed permanent, redirect standard output to an output file:
  sed -f scriptfile editfile > outfile

54 Stream Addressing
Another difference is the effect of sed's stream orientation on line addressing.
ed operates only on lines that are specifically addressed, or on the current line if no address is specified.
sed goes through the file a line at a time, so if no specific address is provided for a command, it operates on all lines.

55 Stream Addressing
In ed, the command s/dog/cat/ would change the first instance of dog on the current line to cat.
The same command in sed would change the first occurrence of dog on every line to cat.
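
The sed side of this comparison can be sketched as below, assuming a POSIX sed; the three input lines are made up. Without an address the substitution runs on every line, while a line-number address restricts it.

```shell
# Hypothetical three-line input.
lines() { printf '%s\n' 'dog one' 'dog two' 'dog three'; }

every_line=$(lines | sed 's/dog/cat/')    # no address: applied to all lines
line_two=$(lines | sed '2s/dog/cat/')     # address 2: applied to line 2 only

printf '%s\n' "$every_line" '--' "$line_two"
```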

56 sed Syntax
Syntax:
  sed [-n] [-e command] [file…]
  sed [-n] [-f scriptfile] [file…]
-n - only print lines specified with the p command or the p flag of the substitute (s) command
-e command - the next argument is an editing command rather than a filename; useful if multiple commands are specified

57 sed Syntax
-f scriptfile - the next argument is a filename containing editing commands
If the first line of a scriptfile is #n, sed acts as though -n had been specified.
Note that all forms of calling sed are really the same:
  sed [options] script file_argument(s)
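
The effect of -n and of repeated -e options can be seen in a short experiment. This is a minimal sketch assuming a POSIX sed, with invented input.

```shell
# Hypothetical three-line input.
lines() { printf '%s\n' alpha beta gamma; }

only_second=$(lines | sed -n '2p')   # -n: nothing is printed except by p
with_echo=$(lines | sed '2p')        # without -n, line 2 appears twice
two_edits=$(lines | sed -e 's/alpha/one/' -e 's/gamma/three/')  # two commands

printf '%s\n' "$only_second" '--' "$with_echo" '--' "$two_edits"
```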

58 How Does sed Treat Files?
[Diagram labels: Input, scriptfile, Input line (Pattern Space), Hold Space, Output]

59 Scripts
A script is nothing more than a file of commands.
Each command consists of an address and an action, where the address can be a pattern (regular expression).

60 Scripts
As each line of the input file is read, sed reads the first command of the script and checks the address or pattern against the current input line.
If there is a match, the command is executed.
If there is no match, the command is ignored.
sed then repeats this action for every command in the script file.
All commands in the script are read, not just the first one that matches.

61 Sed characteristics
When it has reached the end of the script, sed outputs the current line unless the -n option has been set.
sed then reads the next line in the input file and restarts from the beginning of the script file.

62 Sed characteristics
All commands in the script file are compared to, and potentially act on, all lines in the input file.
Note again the difference from ed:
  If no address is given, ed operates only on the current line.
  If no address is given, sed operates on all lines.

63 Four Basic Script Types
Multiple edits to the same file
  Changing from one document formatter's codes to those of another
Making changes across a set of files
  Global changes due to product name changes or some similar global change

64 Four Basic Script Types
Extracting the contents of a file
  Flat-file database operations
Making edits in a pipeline
  Used when making changes, prior to some other command, that you don't want made permanently to the source file

65 Three Basic Principles of sed
All editing commands in a script are applied in order to each line of input, unless the command is d or c, in which case a new line is read after the d or c command executes.
All editing lines of a script are applied to all lines of the edit file unless line addressing restricts the lines affected by the command.

66 Three Basic Principles of sed
The original file is unchanged: the editing commands modify a copy of the original line, and the copy is sent to standard output.

67 sed Commands
sed commands have the general form
  [address[, address]][!]command [arguments]
sed copies each input line into a pattern space.
If the address of the command matches the line in the pattern space, the command is applied to that line.

68 sed Commands
If the command has no address, it is applied to each line as it enters pattern space.
If a command changes the line in pattern space, subsequent commands operate on the modified line.
When all commands have been read, the line in pattern space is written to standard output and a new line is read into pattern space.

69 Addressing
An address can be either a line number or a pattern enclosed in slashes ( /pattern/ ).
A pattern is described using regular expressions.

70 Addressing
Additionally, a NEWLINE can be specified using the \n character pair.
This is only really useful when two lines have been joined in pattern space with the N command, so that patterns crossing line boundaries can be searched.
If no pattern is specified, the command will be applied to all lines of the input file.

71 Addressing
Most commands will accept two addresses.
If only one address is given, the command operates only on that line.
If two comma-separated addresses are given, the command operates on the range of lines between the first and second address, inclusive.

72 Addressing
The ! operator can be used to negate an address, i.e., address!command causes command to be applied to all lines that do not match address.
Braces { } can be used to apply multiple commands to an address.

73 Address-oriented processing
[/pattern/[,/pattern/]]{
command1
command2
command3
}
The opening brace must be the last character on a line and the closing brace must be on a line by itself.
Make sure there are no spaces following the braces.

74 Address Examples
d          deletes the current line
6d         deletes line 6
/^$/d      deletes all blank lines
1,10d      deletes lines 1 through 10
1,/^$/d    deletes from line 1 through the first blank line
Usage: sed '1,/^$/d' fname

75 Address Examples
/^$/,$d    deletes from the first blank line through the last line of the file
/^$/,10d   deletes from the first blank line through line 10
/^Co*t/,/[0-9]$/d    deletes from the first line that begins with Ct, Cot, Coot, Cooot, etc., through the first line that ends with a digit
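
A few of these addresses can be tried on a small mail-like message. This is a sketch assuming a POSIX sed; the helper name message and its contents are invented for the illustration.

```shell
# Hypothetical input: two header lines, a blank line, then a body
# that itself contains a blank line.
message() {
  printf '%s\n' 'Subject: test' 'From: me' '' 'body line 1' '' 'body line 2'
}

body=$(message | sed '1,/^$/d')                     # drop line 1 through first blank line
header=$(message | sed '/^$/,$d')                   # drop first blank line through the end
non_blank=$(message | sed '/^$/d' | grep -c '')     # count lines left after removing blanks

printf '%s\n' "$body" '--' "$header" '--' "$non_blank"
```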

76 Sed commands
Although sed contains many editing commands, we are only going to concern ourselves with a small subset:
► s - substitute
► a - append
► i - insert
► c - change
► d - delete
► h,H - put pattern space into hold space
► g,G - get hold space
► p - print
► ! - negate
► r - read
► w - write
► y - transform
► q - quit

77 sed Command List
► a\ text - Append. Place text on output before reading the next input line.
► b label - Branch to the : command bearing the label. If label is empty, branch to the end of the script.
► c\ text - Change. Delete the pattern space. Place text on the output. Start the next cycle.
► d - Delete the pattern space. Start the next cycle.

78 sed Command List
► D - Delete the initial segment of the pattern space through the first NEWLINE. Start the next cycle.
► g - Replace the contents of the pattern space with the contents of the hold space.
► G - Append the contents of the hold space to the contents of the pattern space.

79 sed Command List
► h - Replace the contents of the hold space with the contents of the pattern space.
► H - Append the contents of the pattern space to the contents of the hold space.
► i\ text - Insert. Place text on standard output.
► l - List the pattern space on standard output in an unambiguous form. Non-printable characters are displayed in octal notation and long lines are folded.
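
The hold-space commands are easiest to follow in a small experiment. This is a sketch assuming a POSIX sed, with invented input; the line-reversal one-liner is a widely used idiom and is offered here as an illustration, not as part of the original slides.

```shell
# Reverse the order of lines using G and h:
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the (growing) pattern space back into the hold space
#   $p   on the last line, print the accumulated result
reversed=$(printf '%s\n' a b c | sed -n '1!G;h;$p')

# G with an empty hold space appends a NEWLINE, which double-spaces the input.
double_spaced=$(printf '%s\n' a b | sed G)

printf '%s\n' "$reversed" '--' "$double_spaced"
```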

80 sed Command List
► n - Copy the pattern space to standard output. Replace the pattern space with the next line of input.
► N - Append the next line of input to the pattern space with an embedded NEWLINE. (The current line number changes.)
► p - Print. Copy the pattern space to standard output.
► P - Copy the initial segment of the pattern space up through the first NEWLINE to standard output.
► q - Quit. Branch to the end of the script. Do not start a new cycle.

81 sed Command List
► r rfile - Read the contents of rfile. Place them on standard output before reading the next input line.

82 sed Command List
► s/regular expression/replacement/flags - Substitute the replacement string for instances of the regular expression in the pattern space. flags is zero or more of:
    n - Substitute the nth occurrence of the regex.
    g - Global. Substitute all non-overlapping instances of the regex rather than just the first one.
    p - Print the pattern space if a replacement was made.
    w wfile - Write. Append the pattern space to wfile if a replacement was made.

83 sed Command List
► t label - Test. Branch to the : command bearing the label if any substitutions have been made since the most recent reading of the input line or execution of a t. If label is empty, branch to the end of the script.
► w wfile - Write. Append the pattern space to wfile. The first occurrence of a w will cause wfile to be cleared. Subsequent invocations of w will append. Each time the sed command is used, wfile is overwritten.
► x - Exchange the contents of the pattern space and the hold space.

84 sed Command List
► y/string1/string2/ - Transform. Replace all occurrences of the characters in string1 with the corresponding characters in string2. string1 and string2 must have the same number of characters.
e.g., file contents:
  The Lord of the Rings
> sed 'y/Ts/Xz/' file
yields:
  Xhe Lord of the Ringz
(the lowercase t in "the" is unchanged because only T and s are listed in string1)

85 sed Command List
► !function - Don't. Apply the function (or group, if function is {) only to those lines not selected by the address(es).
► : label - This command does nothing. It is the label for the b and t commands to branch to.
► = - Place the current line number on standard output as a line.
► { - Execute the following commands through a matching } only when the pattern space is selected.

86 sed Command List
► # - If a # appears as the first character on a line of a script file, the entire line is treated as a comment. If it is the first line of the file and the character after the # is an n, the default output is suppressed (just like sed -n), and the rest of the line after the n is ignored. A script file must contain at least one non-comment line.

87 Substitute
Syntax: [address(es)]s/pattern/replacement/[flags]
  pattern - search pattern
  replacement - replacement string for pattern

88 Substitute
flags - optionally any of the following:
► n - a number from 1 to 512 indicating which occurrence of pattern should be replaced
► g - global, replace all occurrences of pattern in pattern space
► p - print contents of pattern space
► w file - write the contents of pattern space to file

89 Replacement Patterns
Substitute can use several special characters in the replacement string:
& - replaced by the entire string matched by the regular expression for pattern
e.g., given the input
  the UNIX operating system …
the command s/.NI./wonderful &/ produces
  the wonderful UNIX operating system ...

90 Replacement Patterns
Substitute can use several special characters in the replacement string:
\n - replaced by the nth substring (or subexpression) previously specified using \( and \)
\ - used to escape the ampersand (&) and the backslash (\)

91 Replacement Pattern Examples
$ cat test1
first:second
one:two
$ sed 's/\(.*\):\(.*\)/\2:\1/' test1
second:first
two:one
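
Both replacement features, & and \n, can be checked in a short sketch. It assumes a POSIX sed and feeds the sample text through a pipe instead of a file.

```shell
# \1 and \2 refer back to the two tagged groups, so the fields swap places.
swapped=$(printf '%s\n' 'first:second' 'one:two' |
  sed 's/\(.*\):\(.*\)/\2:\1/')

# & stands for whatever the pattern matched (here, UNIX).
with_amp=$(printf '%s\n' 'the UNIX operating system' |
  sed 's/.NI./wonderful &/')

printf '%s\n' "$swapped" '--' "$with_amp"
```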

92 Other Substitute Examples
J. Tan Computer Networks Other Substitute Examples $ s/cat/dog/ Substitute dog for the first occurrence of cat in pattern space $ s/Tom/Dick/2 Substitutes Dick for the second occurrence of Tom in the pattern space UNIX System Programming CS UNIX System Programming

93 Other Substitute Examples
Other Substitute Examples s/wood/plastic/p Substitutes plastic for the first occurrence of wood and, if a substitution was made, outputs (prints) pattern space s/Mr/Dr/g Substitutes Dr for every occurrence of Mr in pattern space
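The substitute flags can be compared side by side on a single input line. This is a small sketch under assumed data: the sample line and the variable names are invented for the example.

```sh
# Hypothetical sample line; each command exercises one flag.
line='Mr Tom and Mr Tom met Mr Tom'

first=$(printf '%s\n' "$line" | sed 's/Tom/Dick/')     # first occurrence only
second=$(printf '%s\n' "$line" | sed 's/Tom/Dick/2')   # second occurrence only
global=$(printf '%s\n' "$line" | sed 's/Mr/Dr/g')      # every occurrence

# With -n, the p flag prints only lines where a substitution happened.
printed=$(printf '%s\n' "$line" 'no match here' | sed -n 's/Tom/Dick/p')

printf '%s\n' "$first" "$second" "$global" "$printed"
```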

94 Append, Insert, and Change
Append, Insert, and Change Syntax for these commands is a little strange because they must be specified on multiple lines, e.g.,
Append: [address]a\
text
Insert: [address]i\
text
Change: [address(es)]c\
text

95 UNIX System Programming
Append and Insert Append places text after the current line in pattern space. Insert places text before the current line in pattern space.

96 UNIX System Programming
Append and Insert Each of these commands requires a \ following it to escape the NEWLINE that is entered when you press RETURN (or ENTER). The text must begin on the next line. To use multiple lines, end every line of text except the last with a \. If text begins with whitespace, sed will discard it unless you start the line with a \. In POSIX sed, Append and Insert can only be applied to a single line address, not a range of lines.

97 UNIX System Programming
Insert example Put the following into a file named insert.scr:
/third/i\
Once upon a time\
Zatoichi likes sushi
> sed -f insert.scr test1
test1 contents:
first:second
third:fourth
ying:yang
output:
first:second
Once upon a time
Zatoichi likes sushi
third:fourth
ying:yang

98 UNIX System Programming
Append and Change examples Put the following into a file named append.scr:
/third/a\
Once upon a time\
Zatoichi likes sushi
Put the following into a file named change.scr:
/third/c\
Run them with:
sed -f append.scr test1
sed -f change.scr test1
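The insert, append, and change scripts can be run against the same three-line test1 file to see where the text lands in each case. This is a sketch only: the scratch directory and variable names are assumptions, and because the slide does not show the text for change.scr, the same two lines are reused there for illustration.

```sh
dir=$(mktemp -d)    # scratch directory so no existing files are touched
printf '%s\n' 'first:second' 'third:fourth' 'ying:yang' > "$dir/test1"

# The quoted 'EOF' keeps the backslashes intact inside the script files.
cat > "$dir/insert.scr" <<'EOF'
/third/i\
Once upon a time\
Zatoichi likes sushi
EOF
cat > "$dir/append.scr" <<'EOF'
/third/a\
Once upon a time\
Zatoichi likes sushi
EOF
# Assumed text: the slide leaves the body of change.scr unspecified.
cat > "$dir/change.scr" <<'EOF'
/third/c\
Once upon a time\
Zatoichi likes sushi
EOF

inserted=$(sed -f "$dir/insert.scr" "$dir/test1")   # text before third:fourth
appended=$(sed -f "$dir/append.scr" "$dir/test1")   # text after third:fourth
changed=$(sed -f "$dir/change.scr" "$dir/test1")    # text replaces third:fourth

printf '%s\n' '--- insert ---' "$inserted" '--- append ---' "$appended" '--- change ---' "$changed"
```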

99 UNIX System Programming
Change Unlike i (insert) and a (append), c (change) can be applied to either a single line address or a range of addresses. When applied to a range, the entire range is replaced by a single copy of the text specified with change; the text is not repeated for each line.

100 UNIX System Programming
Change However, if the c (change) command is executed as one of a group of commands enclosed in { } that act on a range of lines, each line will be replaced with text

101 UNIX System Programming
Change Usage Locate the line that begins with the word first, and replace all lines from it through the first blank line with <Punk'ed>. Create a file called change2.scr and type in:
/^first/,/^$/c\
<Punk'ed>
> sed -f change2.scr test1
Assume test1 has contents:
first:second
third:fourth
(blank line)
ying:yang
Output:
<Punk'ed>
ying:yang
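The difference between applying c directly to a range and applying it inside { } can be seen by running both forms on the same input. The sketch below uses a placeholder replacement text, <replaced>, and hypothetical variable names.

```sh
input=$(printf '%s\n' 'first:second' 'third:fourth' '' 'ying:yang')

# c applied directly to a range: the whole range becomes one copy of the text.
ranged=$(printf '%s\n' "$input" | sed '/^first/,/^$/c\
<replaced>')

# c inside { } acting on the same range: every line in the range is replaced.
grouped=$(printf '%s\n' "$input" | sed '/^first/,/^$/{
c\
<replaced>
}')

printf '%s\n' '--- range ---' "$ranged" '--- grouped ---' "$grouped"
```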

102 UNIX System Programming
Side Effects Change clears the pattern space, so no command following the change command in the script is applied. Insert and Append do not clear the pattern space, but none of the commands in the script will be applied to the text that is inserted or appended.

103 UNIX System Programming
Side Effects No matter what changes are made to pattern space, the text from change, insert, or append will be output as supplied. This is true even if default output is suppressed using the -n option: the text for these commands will still be output.

104 UNIX System Programming
Delete Delete takes zero, one, or two addresses. With no address it deletes every pattern space; with one address it deletes the pattern space whenever the line matches that address; with two addresses it deletes every line in the range between them.

105 UNIX System Programming
Delete After a delete, no other commands are applied to pattern space. Instead, the next line is read into pattern space and the script starts over at the top

106 UNIX System Programming
Delete Delete deletes the entire line, not just the part that matches the address. To delete a portion of a line, use substitute with an empty replacement string
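The contrast between deleting a line and deleting part of a line can be shown with two one-liners. This is an illustrative sketch; the input lines and variable names are assumptions.

```sh
input=$(printf '%s\n' 'keep this' 'drop this line' 'keep that')

# d removes the whole line that matches the address.
deleted=$(printf '%s\n' "$input" | sed '/drop/d')

# s with an empty replacement removes only the matched text.
trimmed=$(printf '%s\n' "$input" | sed 's/drop //')

printf '%s\n' '--- d ---' "$deleted" '--- s ---' "$trimmed"
```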

107 UNIX System Programming
Caveats A bang ( ! ) after an address or address range means the command is applied to all lines that do not match the address or fall in the range. Example: 1,5!d would delete all lines except 1 through 5

108 UNIX System Programming
Caveats /black/!s/cow/horse/ would substitute horse for cow on all lines except those that contain black:
The brown cow becomes The brown horse
The black cow stays The black cow
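Both negation examples can be checked from the shell. A minimal sketch, with hypothetical variable names and a seven-line numeric input made up for the 1,5!d case:

```sh
# Substitute on every line that does NOT contain "black".
cows=$(printf '%s\n' 'The brown cow' 'The black cow' | sed '/black/!s/cow/horse/')

# Delete every line that is NOT in the range 1 through 5.
kept=$(printf '%s\n' 1 2 3 4 5 6 7 | sed '1,5!d')

printf '%s\n' "$cows" "$kept"
```

The single quotes matter in an interactive shell, where an unquoted ! may trigger history expansion.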

109 UNIX System Programming
Read and Write Read ([address]r filename) and Write ([address1[, address2]]w filename) allow you to work directly with files. Both take a single argument, a file name.

110 UNIX System Programming
Read and Write Read takes an optional single address and copies the contents of the specified file to the output after the addressed line; the file contents are not placed in pattern space, so later commands in the script do not see them. It cannot operate on a range of lines. Write takes an optional line address or range of addresses and writes the contents of pattern space to the specified file.

111 UNIX System Programming
Read and Write There must be a single space between the r or w command and the filename, and no spaces after the filename, or sed will include them as part of the file name. Read will not complain if the file does not exist.

112 UNIX System Programming
Read and Write Write will create the file if it does not exist. If the file already exists, write will overwrite it, unless the file was created during the current invocation of sed, in which case write appends to it.

113 UNIX System Programming
Read and Write If there are multiple commands writing to the same file, each will append to it. Maximum files allowed per script: 10 (the number a script is guaranteed to be able to use; some implementations allow more).

114 UNIX System Programming
Uses for Read and Write Read can be used for substitution in form letters
$ cat sedscr
/organizations:$/r company.list
$ cat company.list
Netflix
Amazon
Ebay

115 UNIX System Programming
Uses for Read and Write
$ cat formletter
To purchase your own copy of Zatoichi, contact any of the following organizations:
Thank you

116 UNIX System Programming
Uses for Read and Write
$ sed -f sedscr formletter
To purchase your own copy of Zatoichi, contact any of the following organizations:
Netflix
Amazon
Ebay
Thank you
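The form-letter example can be reproduced end to end in a scratch directory. This is a sketch under assumptions: the original line breaks of formletter are not recoverable from the slide, so the letter is written here as two lines, and the directory and variable names are hypothetical.

```sh
dir=$(mktemp -d)    # scratch directory so no existing files are touched
printf '%s\n' \
  'To purchase your own copy of Zatoichi, contact any of the following organizations:' \
  'Thank you' > "$dir/formletter"
printf '%s\n' Netflix Amazon Ebay > "$dir/company.list"
printf '%s\n' '/organizations:$/r company.list' > "$dir/sedscr"

# r resolves company.list relative to the current directory, so run sed there.
letter=$(cd "$dir" && sed -f sedscr formletter)

printf '%s\n' "$letter"
```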

117 UNIX System Programming
Uses for Read and Write Write can be used to pull selected lines and segregate them into individual files. Suppose the customer file (customers) contains:
John Pikachu WA
Benny Snorlax CA
Jill Pichu VA
Jane Suicune CA
Bob Entei VA
Ann Celebi CA

118 UNIX System Programming
Uses for Read and Write Objective: segregate all of the customers from each state into a file of their own
$ cat sedscr2
/CA$/w customers.CA
/VA$/w customers.VA
/WA$/w customers.WA
$ sed -f sedscr2 customers
will create files for each state that contain only the customers from that state
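The per-state split can be run as follows. This is a sketch only: the scratch directory, the variable names, and the use of -n (to keep the normal output quiet while the w commands do their work) are additions for the example.

```sh
dir=$(mktemp -d)    # scratch directory so no existing files are touched
printf '%s\n' 'John Pikachu WA' 'Benny Snorlax CA' 'Jill Pichu VA' \
  'Jane Suicune CA' 'Bob Entei VA' 'Ann Celebi CA' > "$dir/customers"

# No trailing spaces after the file names, or they would become part of the name.
cat > "$dir/sedscr2" <<'EOF'
/CA$/w customers.CA
/VA$/w customers.VA
/WA$/w customers.WA
EOF

# -n suppresses the default output; the w commands still write their files.
(cd "$dir" && sed -n -f sedscr2 customers)

ca=$(cat "$dir/customers.CA")
va=$(cat "$dir/customers.VA")
wa=$(cat "$dir/customers.WA")
printf '%s\n' '--- CA ---' "$ca" '--- VA ---' "$va" '--- WA ---' "$wa"
```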

119 UNIX System Programming
Transform The Transform command (y) operates like tr: it does a 1-to-1, character-to-character replacement. Transform accepts zero, one, or two addresses: [address[, address]]y/abc/xyz/ Every a within the specified address(es) is transformed to an x. The same is true for b to y and c to z

120 UNIX System Programming
Transform y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/ changes all lower case characters on the addressed line to upper case. If you only want to do specific characters, or a word, in the line, it is much more difficult and requires use of a hold space (temp buffer)
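A short sketch of y on a whole line and on a single addressed line; the sample inputs and variable names are made up for illustration.

```sh
# Every lowercase letter on every line is mapped to its uppercase partner.
upper=$(printf '%s\n' 'find the print statement' |
  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# With an address, only line 1 is transformed: a->x, b->y, c->z.
first_only=$(printf '%s\n' 'abc' 'abc' | sed '1y/abc/xyz/')

printf '%s\n' "$upper" "$first_only"
```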

121 Copy Pattern Space to Hold Space
Copy Pattern Space to Hold Space The h and H commands copy the contents of pattern space to hold space. h copies pattern space to hold space, replacing anything that was previously there. H appends an embedded NEWLINE (\n) to whatever is currently in hold space, followed by the contents of pattern space. Even if the hold space is empty, the embedded NEWLINE is appended to hold space first.

122 Get Contents of Hold Space
Get Contents of Hold Space g and G get the contents of hold space and place it in pattern space g copies the contents of hold space into pattern space, replacing whatever was there

123 Get Contents of Hold Space
Get Contents of Hold Space G appends an embedded NEWLINE character (\n) followed by the contents of hold space to pattern space Even if pattern space is empty, the NEWLINE is still appended to pattern space before the contents of the hold space

124 UNIX System Programming
G, y, h examples Now, suppose that I want to capitalize a specific word in a file. Specifically, every time I see "the abc statement", I want to change it to "the ABC statement"

125 UNIX System Programming
G, y, h examples
/the .* statement/{    # e.g., the bub statement
h    # pattern -> hold space
s/.*the \(.*\) statement.*/\1/    # in pattern space
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G    # hold -> pattern (embedded \n first)
s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
}
After the G command, for the input line "the bub statement":
Hold space: the bub statement
Pattern space: BUB\nthe bub statement

126 UNIX System Programming
So How Does It Work? The address limits the procedure to lines that match /the .* statement/. h copies the current line into hold space, replacing whatever was there. After the h, pattern space and hold space are identical:
pattern space: find the print statement
hold space: find the print statement

127 UNIX System Programming
So How Does It Work? s/.*the \(.*\) statement.*/\1/ extracts the name of the statement (\1) and replaces the entire line with it.
pattern space: whatever was captured in \1
hold space: unchanged

128 UNIX System Programming
So How Does It Work? y/abc .../ABC .../ changes each lowercase letter to uppercase.
pattern space: the capitalized \1
hold space: unchanged

129 UNIX System Programming
So How Does It Work? The G command appends a NEWLINE (\n) to pattern space, followed by the line saved in hold space. s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/ matches three different parts of the pattern space and rearranges them: the text up to and including "the " (\2), the capitalized name (\1), and the rest of the line starting at " statement" (\3).
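The whole walkthrough can be assembled into one script and run on the sample line. This is a sketch, not a definitive version: the file name caps.scr, the scratch directory, and the second input line are assumptions added for the example, and the sed comments sit on their own lines for portability.

```sh
dir=$(mktemp -d)    # scratch directory so no existing files are touched
cat > "$dir/caps.scr" <<'EOF'
/the .* statement/{
# save the original line in hold space
h
# reduce pattern space to the statement name
s/.*the \(.*\) statement.*/\1/
# capitalize the name
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
# pattern space becomes: NAME, newline, original line
G
# rebuild the line around the capitalized name
s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
}
EOF

result=$(printf '%s\n' 'find the print statement' 'nothing to change' |
  sed -f "$dir/caps.scr")

printf '%s\n' "$result"
```

Lines that do not match the address pass through untouched, which is why the second input line is included.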

130 UNIX System Programming
Print The print command (p) forces the pattern space to stdout, even if the -n or #n option has been specified Syntax: [address1[, address2]]p Note: if the -n or #n option has not been specified, p will cause the line to be output twice!

131 UNIX System Programming
Print Examples: 1,5p will display lines 1 through 5. /^$/,$p will display the lines from the first blank line through the last line of the file (the end address is $, the last line; a regex end address of /$/ would match every line and end the range one line after the blank line).
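The behaviour of p with and without -n can be checked on a small input. A minimal sketch with made-up input lines and variable names; the range in the second command ends at the address $ (the last line).

```sh
input=$(printf '%s\n' one two '' three four)

# With -n, only the lines selected by p are printed.
head2=$(printf '%s\n' "$input" | sed -n '1,2p')

# From the first blank line through the last line ($) of the input.
tail_part=$(printf '%s\n' "$input" | sed -n '/^$/,$p')

# Without -n, the addressed line is output twice.
doubled=$(printf '%s\n' one two | sed '1p')

printf '%s\n' '--- 1,2p ---' "$head2" '--- /^$/,$p ---' "$tail_part" '--- 1p ---' "$doubled"
```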

132 UNIX System Programming
Quit Quit causes sed to stop reading new input lines and stop sending them to stdout. It takes at most a single line address. Once a line matching the address is reached, the script is terminated. This can be used to save time when you only want to process some portion of the beginning of a file.

133 UNIX System Programming
Quit Example: To print the first 100 lines of a file (like head) use: sed 100q filename sed will, by default, send the first 100 lines of filename to standard output and then quit processing
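The same idea on a smaller scale, as a sketch with invented input: q with a line number behaves like head, and q with a regex address stops at the first matching line.

```sh
# Print the first 3 lines, then quit without reading the rest.
first3=$(printf '%s\n' 1 2 3 4 5 | sed 3q)

# Quit at the first line matching "stop"; that line is still printed.
upto=$(printf '%s\n' alpha stop omega | sed '/stop/q')

printf '%s\n' "$first3" "$upto"
```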

134 Regex Metacharacters for sed
Regex Metacharacters for sed

