
1 The awk Utility CS465 - Unix

2 Background awk was developed by –Aho, Weinberger, and Kernighan (of K & R) –Was further extended at Bell Labs Handles simple data-reformatting jobs easily with just a few lines of code. Versions –awk - original version –nawk - new awk - improved awk –gawk - gnu awk - improved nawk

3 How awk works awk commands include patterns and actions –Scans the input line by line, searching for lines that match a certain pattern (or regular expression) –Performs a selected action on the matching lines awk can be used: –at the command line for simple operations –in programs or scripts for larger applications

4 Running awk From the Command Line: $ awk '/pattern/ {action}' file OR From an awk script file: $ cat awkscript # This is a comment /pattern/ {action} $ awk -f awkscript file

5 awk's Format using Input from a File $ awk '/pattern/' filename –awk will act like grep and print every line that matches the pattern $ awk '{action}' filename –awk will apply the action to every line in the file $ awk '/pattern/ {action}' filename –awk will apply the action to every line in the file that matches the pattern
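
The three invocation forms above can be tried side by side. This is a minimal sketch, not part of the original slides: the temporary file and its contents are made up for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical sample data (not from the slides).
tmp=$(mktemp)
printf '%s\n' 'alpha 1' 'beta 2' 'alpha 3' > "$tmp"

# Pattern only: prints the matching lines, like grep.
only_pattern=$(awk '/alpha/' "$tmp")
# Action only: the action runs on every line.
only_action=$(awk '{print $2}' "$tmp")
# Pattern and action: the action runs only on matching lines.
both=$(awk '/alpha/ {print $2}' "$tmp")

printf '%s\n' "$only_pattern" "$only_action" "$both"
rm -f "$tmp"
```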

6 Example 1 Input $ cat pingfile PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes 64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms 64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms 64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms 64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms ----dt033n32.san.rr.com PING Statistics---- 128 packets transmitted, 127 packets received, 0% packet loss round-trip (ms) min/avg/max = 37/73/495 ms $ awk '/icmp/' pingfile Output 64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms 64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms 64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms 64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms

7 Example 1 Input $ cat pingfile PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes 64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms 64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms 64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms 64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms ----dt033n32.san.rr.com PING Statistics---- 128 packets transmitted, 127 packets received, 0% packet loss round-trip (ms) min/avg/max = 37/73/495 ms $ awk '{print $1}' pingfile Output PING 64 64 64 64 ----dt033n32.san.rr.com 128 round-trip

8 Example 1 Input $ cat pingfile PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes 64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms 64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms 64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms 64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms … ----dt033n32.san.rr.com PING Statistics---- 128 packets transmitted, 127 packets received, 0% packet loss round-trip (ms) min/avg/max = 37/73/495 ms $ awk '/icmp/ {print $5}' pingfile Output icmp_seq=0 icmp_seq=1 icmp_seq=2 icmp_seq=3

9 Records and Fields awk divides the input into records and fields –Each line is a record (by default) field-1 field-2 field-3 | | | v v v record 1 -> George Jones Admin record 2 -> Anthony Smith Accounting –Each record is split into fields, delimited by a special character (whitespace by default) –The delimiter can be changed with the -F option or the FS variable

10 awk field variables awk creates variables $1, $2, $3… that correspond to the resulting fields (just like a shell script). –$1 is the first field, $2 is the second… –$0 is a special field which is the entire line –NF is always set to the number of fields in the current line (no dollar sign to access)
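
The field variables can be checked on a single record. A small sketch using the "George Jones Admin" record from the previous slide; $NF (the last field) is not mentioned on the slides and is added here as a standard awk idiom.

```shell
line='George Jones Admin'

first=$(printf '%s\n' "$line" | awk '{print $1}')   # first field
whole=$(printf '%s\n' "$line" | awk '{print $0}')   # the entire record
count=$(printf '%s\n' "$line" | awk '{print NF}')   # number of fields (no dollar sign)
last=$(printf '%s\n' "$line" | awk '{print $NF}')   # $NF: the field whose number is NF

printf '%s\n' "$first" "$whole" "$count" "$last"
```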

11 Example #1 $ cat students Bill White 7777771 1980/01/01 Science Jill Blue 1111117 1978/03/20 Arts Ben Teal 7171717 1985/02/26 CompSci Sue Beige 1717171 1963/09/12 Science $ $ awk '/Science/{print $1, $2}' students Bill White Sue Beige $ The comma indicates that we want the output fields separated by a space. Without it, the fields are concatenated: $ awk '/Science/{print $1 $2}' students BillWhite SueBeige

12 Example #2 -No pattern given, so matches ALL lines -Text strings to print are placed in double quotes $ cat phonelist Joe Smith 774-0888 Mary Jones 772-2345 Hank Knight 494-8888 $ $ awk '{print "Name: ", $1, $2, \ " Telephone:", $3}' phonelist Name: Joe Smith Telephone: 774-0888 Name: Mary Jones Telephone: 772-2345 Name: Hank Knight Telephone: 494-8888 $

13 Example #3 Given a username, display the person's real name: $ grep small /etc/passwd small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh $ $ awk -F: '/small000/{print $5}' /etc/passwd Faculty - Pam Smallwood $

14 awk using Input from Commands You can run awk in a pipeline, using input from another command: $ command | awk '/pattern/ {action}' –Takes the output from the command and pipes it into awk which will then perform the action on all lines that match the pattern

15 Piped awk Input Example $ w | awk '/ksh/{print $1}' pugli766 gibbo201 nelso828 $ $ w 1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01 User tty login@ idle JCPU PCPU what pugli766 pts/8 Tue10pm 3days -ksh lin318 pts/17 10:58am 1:45 vi choosesort small000 pts/18 12:43pm w mcdev712 pts/10 11:52am 14 1 vi adddata gibbo201 pts/12 12:15pm 18 -ksh nelso828 pts/16 7:17pm 17:43 -ksh $

16 Relational Operators awk can use relational operators (<, <=, ==, !=, >=, >) to compare a field to a value, and ! to negate a pattern –If the outcome of the comparison is true, then the action is performed Examples: –To print every record in the log.txt file in which the second field is larger than 10 $ awk '$2 > 10' log.txt –To print every record in the log.txt file which does NOT contain 'Win32' $ awk '!/Win32/' log.txt
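
The comparisons above can be tried without a real log.txt. A minimal sketch in which the three input records are invented stand-ins for the log file:

```shell
# Hypothetical records standing in for log.txt.
sample() { printf '%s\n' 'a 5' 'b 10' 'c 15 Win32'; }

greater=$(sample | awk '$2 > 10')            # second field larger than 10
no_win=$(sample | awk '!/Win32/')            # records that do NOT contain Win32
equal=$(sample | awk '$2 == 10 {print $1}')  # exact numeric match, print first field

printf '%s\n' "$greater" "$no_win" "$equal"
```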

17 Relational Operator Example $ who pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net) lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com) small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net) mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net) gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com) nelso828 pts/16 Jun 5 19:17 (65.100.138.177) $ $ who | awk '$4 < 6 {print $1, $3, $4, $5}' pugli766 Jun 3 22:24 nelso828 Jun 5 19:17 $

18 Piping awk output $ who pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net) lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com) small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net) mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net) gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com) nelso828 pts/16 Jun 5 19:17 (65.100.138.177) $ $ who | awk '$4 == 6 {print $1}' | sort gibbo201 lin318 mcdev712 small000 $

19 awk Programming awk programming is done by building a list –The list is a list of rules –Each rule is applied sequentially to each line (record) Example: /pattern1/ { action1 } /pattern2/ { action2 } /pattern3/ { action3 }
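
To see that every rule is tried against each record in order, here is a small sketch with made-up input. The first record matches both rules, so both actions run before awk moves to the next record.

```shell
rules_out=$(printf '%s\n' 'ab' 'a' 'c' |
  awk '/a/ { print NR ": rule 1" }
       /b/ { print NR ": rule 2" }')

printf '%s\n' "$rules_out"
# 1: rule 1
# 1: rule 2
# 2: rule 1
```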

20 awk - pattern matching Before processing, lines can be matched with a pattern. /pattern/ { action } executes the action if the line matches the pattern. The pattern is a regular expression. Examples: /^$/ { print "This line is blank" } /num/ { print "Line includes num" } /[0-9]+$/ { print "Integer at end:", $0 } /[A-Za-z]+/ { print "String:", $0 } /^[A-Z]/ { print "Starts w/uppercase letter" }

21 awk program from a file The awk commands (program) can be placed into a file The -f (lowercase f) indicates that the commands come from a file whose name follows the -f $ awk -f awkfile datafile The contents of the file called awkfile will be used as the commands for awk

22 Example 1 $ cat students Bill White 333333 1980/01/01 Science Jill Blue 333444 1978/03/20 Arts Bill Teal 555555 1985/02/26 CompSci Sue Beige 555777 1963/09/12 Science $ cat awkprog /5?5/ {print $1, $2} /3*4/ {print $5} $ $ awk -f awkprog students Arts Bill Teal Sue Beige $ **NOTE: All patterns are applied to each line before moving to the next line

23 Example 2 $ cat students Bill White 333333 1980/01/01 Science Jill Blue 333444 1978/03/20 Arts Bill Teal 555555 1985/02/26 CompSci Sue Beige 555777 1963/09/12 Science $ cat awkprog /Science/ {print "Science stu:", $1, $2} /CompSci/ {print "Computing stu:", $1, $2} $ $ awk -f awkprog students Science stu: Bill White Computing stu: Bill Teal Science stu: Sue Beige $

24 More about Patterns Patterns can be: –Empty: will match everything –Regular expressions: /reg-expression/ –Boolean Expressions: $2=="foo" && $7=="bar" –Ranges: /jones/,/smith/
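
The four kinds of pattern can be compared on one small input. A sketch only: the names and numbers below are invented for illustration.

```shell
# Hypothetical records: a name and a number.
people() { printf '%s\n' 'jones 1' 'adams 2' 'smith 3' 'young 4'; }

empty=$(people | awk '{print $1}')                           # empty pattern: every record
regex=$(people | awk '/^a/ {print $1}')                      # regular expression
bool=$(people | awk '$2 >= 2 && $1 != "smith" {print $1}')   # Boolean expression
range=$(people | awk '/jones/,/smith/ {print $1}')           # range: first match through second match

printf '%s\n' "$empty" "$regex" "$bool" "$range"
```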

25 Example - Boolean Expressions $ cat students Bill White 333333 1980/01/01 Science Jill Blue 333444 1978/03/20 Arts Bill Teal 555555 1985/02/26 CompSci Sue Beige 555777 1963/09/12 Science $ cat awkprog $3 <= 444444 {print "Not counted"} $3 > 444444 {print $2 ",", $1} $ $ awk -f awkprog students Not counted Not counted Teal, Bill Beige, Sue $

26 Example - Ranges $ cat students Bill White 333333 1980/01/01 Science Jill Blue 333444 1978/03/20 Arts Bill Teal 555555 1985/02/26 CompSci Sue Beige 555777 1963/09/12 Science $ $ awk '/333333/,/555555/' students Bill White 333333 1980/01/01 Science Jill Blue 333444 1978/03/20 Arts Bill Teal 555555 1985/02/26 CompSci $

27 More Built-In awk Variables Two types: Informative and Configuration Informative: NR = Current Record Number (start at 1) –Counts ALL records, not just those that match NF = Number of Fields in the Current Record FILENAME = Current Input Data File –Undefined in the BEGIN block

28 Example using NF $ cat names Pam Sue Laurie Bob Joe Bill Dave Joan Jill $ $ awk '{print NF}' names 3 4 2 0 $

29 Example using a Boolean expression, NF, and NR $ cat names Pam Sue Laurie Bob Joe Bill Dave Joan Jill $ $ awk 'NF > 2 {print NR ":", NF, "fields"}' names 1: 3 fields 2: 4 fields $

30 Built-in awk functions log(expr) natural logarithm index(s1,s2) position of string s2 in string s1 length(s) string length substr(s,m,n) n-char substring of s starting at m tolower(s) converts string to lowercase printf() print formatted - like C printf
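
The functions listed above can be exercised from a BEGIN block, which needs no input file. A short sketch; the string "Hello World" is an arbitrary test value.

```shell
func_out=$(awk 'BEGIN {
    s = "Hello World"
    print length(s)          # string length
    print index(s, "World")  # position of "World" in s (1-based)
    print substr(s, 1, 5)    # 5 characters starting at position 1
    print tolower(s)         # lowercase copy of s
    print log(1)             # natural logarithm
}')

printf '%s\n' "$func_out"
# 11
# 7
# Hello
# hello world
# 0
```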

31 Example 2 Input PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes 64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms 64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms 64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms 64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms … Program /PING/ { print tolower($1) } /icmp/ { time = substr($7,6,2); print time } Output ping 49 94 50 41 …

32 print & printf Use print in an awk statement to output specific field(s) printf is more versatile –works like printf in the C language –May contain a format specifier and a modifier

33 Format Specification A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character To display the third field as a floating point number with two decimal places: awk '{printf("%.2f\n", $3)}' file You can include additional text in the printf statement '{printf ("3rd value: %.2f\n", $3)}'

34 Specifiers, Width, Precision, & Modifiers Type Specifiers: %c Single character %d integer (decimal) %f Floating point %s String Between the % and the specifier you can place the width and precision %6.2f means a floating point number in a field of width 6 in which there are two decimal places Modifiers control details of appearance: - minus sign is the left justification modifier (the default is right justification) + plus sign forces the appearance of a sign (+,-) for numeric output 0 zero pads a right justified number with zeros
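
The width, precision, and modifier rules can be made visible by wrapping each field in brackets. A minimal sketch: the values are arbitrary, and LC_ALL=C is set only as a precaution so that the decimal point does not depend on the locale.

```shell
fmt_out=$(LC_ALL=C awk 'BEGIN {
    printf("[%6.2f]\n", 3.14159)   # width 6, two decimal places, right justified
    printf("[%-6d]\n", 42)         # - modifier: left justified in width 6
    printf("[%+d]\n", 42)          # + modifier: always show the sign
    printf("[%06.2f]\n", 3.14159)  # 0 modifier: pad with zeros
    printf("[%s]\n", "awk")        # plain string
}')

printf '%s\n' "$fmt_out"
# [  3.14]
# [42    ]
# [+42]
# [003.14]
# [awk]
```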

35 awk Variables Variables –No need for declaration Implicitly set to 0 AND the Empty String –Variable type is a combination of a floating-point and string –Variable is converted as needed, based on its use title = "Number of students" no = 100 weight = 13.4
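
The "0 AND the empty string" behaviour and the on-demand conversion can be observed directly. A small sketch: the variable name unset_var is made up, while weight reuses the value from the slide.

```shell
var_out=$(awk 'BEGIN {
    print "[" unset_var "]"   # never assigned: empty string in string context
    print unset_var + 0       # never assigned: 0 in numeric context
    x = "3"
    print x + 4               # string converted to a number as needed
    weight = 13.4
    print weight * 2
    no = 100
    print no " students"      # number converted to a string as needed
}')

printf '%s\n' "$var_out"
# []
# 0
# 7
# 26.8
# 100 students
```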

36 Example 2 Input PING dt033n32.san.rr.com (24.30.138.50): 56 data bytes 64 bytes from 24.30.138.50: icmp_seq=0 ttl=48 time=49 ms 64 bytes from 24.30.138.50: icmp_seq=1 ttl=48 time=94 ms 64 bytes from 24.30.138.50: icmp_seq=2 ttl=48 time=50 ms 64 bytes from 24.30.138.50: icmp_seq=3 ttl=48 time=41 ms … Program /icmp/ { time = substr($7,6,2); printf( "%1.1f ms\n", time ); } Output 49.0 ms 94.0 ms 50.0 ms 41.0 ms …

37 awk program execution
BEGIN { …. } –Executes only once, before reading input data
{ …. } –Executes for each input line
specification { ….. } –Executes for each input line that matches the specified /pattern/ or Boolean expression
END { ….. } –Executes at the end, after all lines have been processed
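
The order of execution can be traced with one rule of each kind. A minimal sketch with three invented input lines:

```shell
exec_out=$(printf '%s\n' 'one' 'two match' 'three' |
  awk 'BEGIN   { print "start" }      # runs once, before any input is read
               { n++ }                # runs for every input line
       /match/ { print "hit:", $0 }   # runs only for lines that match
       END     { print n, "lines" }   # runs once, after the last line')

printf '%s\n' "$exec_out"
# start
# hit: two match
# 3 lines
```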

38 Example #1: Count # lines in file - Set total to 0 before processing any lines - For every row in the file, execute {total = total + 1} - Print total after all lines processed. $ cat awkprog BEGIN {total = 0} {total = total + 1} END {print total " lines"} $ cat testfile Hello There Goodbye! $ $ awk -f awkprog testfile 2 lines $

39 Ex #2: Count lines containing a pattern $ cat Simpsons Marge 34 Homer 32 Lisa 10 Bart 11 Maggie 01 $ cat countthem BEGIN {totalMa = 0; totalar = 0} /Ma/ { totalMa++ } /ar/ { totalar++ } END { print totalMa " Ma's"; print totalar " ar's" } $ $ awk -f countthem Simpsons 2 Ma's 2 ar's $ Each {total…++} action only executes if the pattern appears in the current line of the file.

40 Example #3: Add line numbers $ cat numawk BEGIN { print "Line numbers by awk" } { print NR ":", $0 } END { print "Done processing " FILENAME } $ cat testfile Hello There Goodbye! $ $ awk -f numawk testfile Line numbers by awk 1: Hello There 2: Goodbye! Done processing testfile $

41 More Built-In awk Variables Two types: Informative and Configuration Configuration FS = Input field separator OFS = Output field separator (default for both is space " " ) RS = Input record separator ORS = Output record separator (default for both is newline "\n" )
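
Each configuration variable can be changed in a BEGIN block. A short sketch with invented input; the separators chosen here (colon, dash, comma, semicolon) are arbitrary.

```shell
# FS and OFS: split on ":" and join printed fields with "-".
fs_out=$(printf 'a:b:c\nd:e:f\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }')

# RS: treat "," as the record separator instead of newline.
rs_out=$(printf 'x,y,z' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

# ORS: end each printed record with ";" instead of newline.
ors_out=$(printf 'a\nb\n' | awk 'BEGIN { ORS = ";" } { print }')

printf '%s\n' "$fs_out" "$rs_out" "$ors_out"
```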

42 Example #1: Reverse 2 columns NOTE: Columns separated by tabs $ cat switch BEGIN{FS="\t"} {print $2 "\t" $1} $ awk -f switch Simpsons 34 Marge 32 Homer 10 Lisa 11 Bart 01 Maggie $ Alternatively you could do the following (quote the \t so the shell does not strip the backslash): $ awk -F'\t' '{print $2 "\t" $1}' Simpsons

43 Example #2: Sum a column $ cat awksum2 BEGIN { FS="\t"; sum = 0 } {sum = sum + $2} END { print "Done"; print "Total sum is " sum } $ $ awk -f awksum2 Simpsons Done Total sum is 88 $

44 Example #3: Comma delimited file $ cat names Bill Jones,3333,M Pam Smith,5555,F Sue Smith,4444,F $ $ awk -F, '{print $2}' names 3333 5555 4444 $

45 Longer awk program $ cat awkprog BEGIN { print "Processing..." } # print number of fields in first line NR == 1 { print $0, NF, "fields"} /^Unix/ { print "Line starts with Unix: ", $0 } /Unix$/ { print "Line ends with Unix: " $0 } # finishing it up END {print NR " lines checked"} $

46 awk program execution $ cat datfile First Line Unix is great! What else is better? This is Unix Yes it is Unix Goodbye! $ $ awk -f awkprog datfile Processing... First Line 2 fields Line starts with Unix: Unix is great! Line ends with Unix: This is Unix Line ends with Unix: Yes it is Unix 6 lines checked $

47 awk programming language syntax
if ( found == 1 )           # if (expr)
    print "Found";          #     {action1}
else                        # else
    print "Not found";      #     {action2}
while ( i <= 100 )          # while (cond)
{                           # {
    i = i + 1;              #     actions...
    print i
}                           # }

48 awk programming language syntax
for ( i = 1; i < 10; i++ )  # for (set; test; incr)
{                           # {
    sqr = i * i;            #     actions
    print i " squared is " sqr
}                           # }
do                          # do
{                           # {
    i = i + 1;              #     actions...
    print i
}                           # }
while ( i < 100 );          # while (cond);
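
The control structures from the last two slides can be run together in a BEGIN block. A minimal sketch with small loop bounds chosen only to keep the output short:

```shell
flow_out=$(awk 'BEGIN {
    # for loop: set; test; increment
    for (i = 1; i <= 3; i++)
        print i " squared is " i * i

    # while loop: count n up to 2
    n = 0
    while (n < 2)
        n++
    print "n=" n

    # do-while loop: the body runs at least once; count n back down to 0
    do {
        n--
    } while (n > 0)

    # if / else
    if (n == 0)
        print "zero"
    else
        print "nonzero"
}')

printf '%s\n' "$flow_out"
# 1 squared is 1
# 2 squared is 4
# 3 squared is 9
# n=2
# zero
```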

49 awk – longer example Write an awk program that prints out content of a directory in the following format: BYTES FILE 24576 copyfile 736 copyfile.c 740 copyfile.c~ 24576 dirlist 989 dirlist.c 977 dirlist.c% 24576 envadv 185 envadv.c tmp 740 x.c Total: 73684 bytes in 9 regular files

50 awk example - code
$ cat awkprog
BEGIN { print " BYTES \t FILE"; sum = 0; filenum = 0 }
# test for lines starting with - (regular files)
/^-/ {
    sum += $5
    ++filenum
    printf("%10d \t%s\n", $5, $9)
}
# test for directories - line starts with d
/^d/ { print " \t", $9 }
# conclusion
END {
    print "\n Total: " sum " bytes in"
    print " " filenum " regular files"
}
$

51 awk example - output $ ls -l total 84 drwx------ 2 small000 faculty 512 Jun 2 13:44 sub2 -rwx------ 1 small000 faculty 224 Jun 3 10:35 sumnums -rw------- 1 small000 faculty 2 Jun 3 21:08 tab -rw------- 1 small000 faculty 187 Jun 8 11:15 tbook $ $ ls -l | awk -f awkprog BYTES FILE sub2 224 sumnums 2 tab 187 tbook Total: 413 bytes in 3 regular files $

52 awk Handout Review awk examples on handout

