Presentation is loading. Please wait.

Presentation is loading. Please wait.

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 1 AWK q A programming language for handling common data manipulation tasks with only a few lines of.

Similar presentations


Presentation on theme: "CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 1 AWK q A programming language for handling common data manipulation tasks with only a few lines of."— Presentation transcript:

1 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 1 AWK q A programming language for handling common data manipulation tasks with only a few lines of program q Awk is a pattern action language q The language looks a little like C but automatically handles input, field splitting, initialization, and memory management u Built-in string and number data types u No variable type declarations q Awk is a great prototyping language u Start with a few lines and keep adding until it does what you want

2 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 2 History q Originally designed/implemented in 1977 by Al Aho, Peter Weinberger, and Brian Kernigan u In part as an experiment to see how grep and sed could be generalized to deal with numbers as well as text u Originally intended for very short programs u But people started using it and the programs kept getting bigger and bigger! q In 1985, new awk, or nawk, was written to add enhancements to facilitate larger program development u Major new feature is user defined functions

3 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 3 q Other enhancements in nawk include: u Dynamic regular expressions l Text substitution and pattern matching functions u Additional built-in functions and variables u New operators and statements u Input from more than one file u Access to command line arguments q nawk also improved error messages which makes debugging considerably easier under nawk than awk q On most systems, nawk has replaced awk u On ours, both exist

4 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 4 Tutorial q Program structure q Running an Awk program q Error messages q Output from Awk q Record selection q BEGIN and END q Number crunching q Handling text q Built-in functions q Control flow q Arrays

5 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 5 BEGIN pattern {action}. pattern { action} END Structure of an AWK Program q An Awk program consists of: u An optional BEGIN segment l For processing to execute prior to reading input u pattern - action pairs l Processing for input data l For each pattern matched, the corresponding action is taken u An optional END segment l Processing after end of input data

6 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 6 Pattern-Action Structure q Every program statement has to have a pattern, an action, or both q Default pattern is to match all lines q Default action is to print current record q Patterns are simply listed; actions are enclosed in { }s q Awk scans a sequence of input lines, or records, one by one, searching for lines that match the pattern u Meaning of match depends on the pattern u /Beth/ matches if the string “Beth” is in the record u $3 > 0 matches if the condition is true

7 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 7 Running an AWK Program q There are several ways to run an Awk program u awk ‘program’ input_file(s) l program and input files are provided as command- line arguments u awk ‘program’ l program is a command-line argument; input is taken from standard input (yes, awk is a filter!) u awk -f program_file_name input_files l program is read from a file

8 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 8 Errors q If you make an error, Awk will provide a diagnostic error message awk '$3 == 0 [ print $1 }' emp.data awk: syntax error near line 1 awk: bailing out near line 1 q Or if you are using nawk nawk '$3 == 0 [ print $1 }' emp.data nawk: syntax error at source line 1 context is $3 == 0 >>> [ <<< 1 extra } 1 extra [ nawk: bailing out at source line 1 1 extra } 1 extra [

9 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 9 Some of the Built-In Variables q NF - Number of fields in current record q NR - Number of records read so far q $0- Entire line q $n- Field n q $NF - Last field of current record

10 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 10 Simple Output From AWK q Printing Every Line u If an action has no pattern, the action is performed fo all input lines l { print } will print all input lines on stdout l { print $0 } will do the same thing q Printing Certain Fields u Multiple items can be printed on the same output line with a single print statement u { print $1, $3 } u Expressions separated by a comma are, by default, separated by a single space when output

11 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 11 q NF, the Number of Fields u Any valid expression can be used after a $ to indicate a particular field u One built-in expression is NF, or Number of Fields u { print NF, $1, $NF } will print the number of fields, the first field, and the last field in the current record q Computing and Printing u You can also do computations on the field values and include the results in your output u { print $1, $2 * $3 }

12 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 12 q Printing Line Numbers u The built-in variable NR can be used to print line numbers u { print NR, $0 } will print each line prefixed with its line number q Putting Text in the Output u You can also add other text to the output besides what is in the current record u { print “total pay for”, $1, “is”, $2 * $3 } u Note that the inserted text needs to be surrounded by double quotes

13 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 13 Fancier Output q Lining Up Fields u Like C, Awk has a printf function for producing formatted output u printf has the form l printf( format, val1, val2, val3, … ) { printf(“total pay for %s is $%.2f\n”, $1, $2 * $3) } u When using printf, formatting is under your control so no automatic spaces or NEWLINEs are provided by Awk. You have to insert them yourself. { printf(“%-8s %6.2f\n”, $1, $2 * $3 ) }

14 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 14 Awk as a Filter q Since Awk is a filter, you can also use pipes with other filters to massage its output even further q Suppose you want to print the data for each employee along with their pay and have it sorted in order of increasing pay awk ‘{ printf(“%6.2f %s\n”, $2 * $3, $0) }’ emp.data | sort

15 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 15 Selection q Awk patterns are good for selecting specific lines from the input for further processing q Selection by Comparison u $2 >=5 { print } q Selection by Computation u $2 * $3 > 50 { printf(“%6.2f for %s\n”, $2 * $3, $1) } q Selection by Text Content u $1 == “Susie” u /Susie/ q Combinations of Patterns u $2 >= 4 || $3 >= 20

16 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 16 Data Validation q Validating data is a common operation q Awk is excellent at data validation u NF != 3 { print $0, “number of fields not equal to 3” } u $2 < 3.35 { print $0, “rate is below minimum wage” } u $2 > 10 { print $0, “rate exceeds $10 per hour” } u $3 < 0 { print $0, “negative hours worked” } u $3 > 60 { print $0, “too many hours worked” }

17 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 17 BEGIN and END q Special pattern BEGIN matches before the first input line is read; END matches after the last input line has been read q This allows for initial and wrap-up processing BEGIN { print “NAME RATE HOURS”; print “” } { print } END { print “total number of employees is”, NR }

18 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 18 Computing with AWK q Counting is easy to do with Awk $3 > 15 { emp = emp + 1} END { print emp, “employees worked more than 15 hrs”} q Computing Sums and Averages is also simple { pay = pay + $2 * $3 } END { print NR, “employees” print “total pay is”, pay print “average pay is”, pay/NR }

19 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 19 Handling Text q One major advantage of Awk is its ability to handle strings as easily as many languages handle numbers q Awk variables can hold strings of characters as well as numbers, and Awk conveniently translates back and forth as needed q This program finds the employee who is paid the most per hour $2 > maxrate { maxrate = $2; maxemp = $1 } END { print “highest hourly rate:”, maxrate, “for”, maxemp }

20 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 20 q String Concatenation u New strings can be created by combining old ones { names = names $1 “ “ } END { print names } q Printing the Last Input Line u Although NR retains its value after the last input line has been read, $0 does not { last = $0 } END { print last }

21 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 21 Built-in Functions q Awk contains a number of built-in functions. length is one of them. q Counting Lines, Words, and Characters using length ( a poor man’s wc ) { nc = nc + length($0) + 1 nw = nw + NF } END { print NR, “lines,”, nw, “words,”, nc, “characters” }

22 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 22 Control Flow Statements q Awk provides several control flow statements for making decisions and writing loops q If-Else $2 > 6 { n = n + 1; pay = pay + $2 * $3 } END { if (n > 0) print n, “employees, total pay is”, pay, “average pay is”, pay/n else print “no employees are paid more than $6/hour” }

23 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 23 Loop Control q While # interest1 - compute compound interest # input: amount rate years # output: compound value at end of each year { i = 1 while (i <= $3) { printf(“\t%.2f\n”, $1 * (1 + $2) ^ i) i = i + 1 }

24 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 24 q For # interest2 - compute compound interest # input: amount rate years # output: compound value at end of each year { for (i = 1; i <= $3; i = i + 1) printf(“\t%.2f\n”, $1 * (1 + $2) ^ i) }

25 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 25 Arrays q Awk provides arrays for storing groups of related data values # reverse - print input in reverse order by line { line[NR] = $0 } # remember each line END { i = NR# print lines in reverse order while (i > 0) { print line[i] i = i - 1 }

26 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 26 Useful “One(or so)-liners” q END { print NR } q NR == 10 q { print $NF } q {field = $NF } END { print field } q NF > 4 q $NF > 4 q { nf = nf + NF } END { print nf }

27 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 27 q /Beth/ { nlines = nlines + 1 } END { print nlines } q $1 > max { max = $1; maxline = $0 } END { print max, maxline } q NF > 0 q length($0) > 80 q { print NF, $0} q { print $2, $1 } q { temp = $1; $1 = $2; $2 = temp; print } q { $2 = “”; print }

28 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 28 q { for (i = NF; i > 0; i = i - 1) printf(“%s “, $i) printf(“/n”) } q { sum = 0 for (i = 1; i <= NF; i = i + 1) sum = sum + $i print sum { q { for (i = 1; i <= NF; i = i + 1) sum = sum $i } END { print sum }

29 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 29 Review of Awk Principles q Awk’s purpose: to give Unix a general purpose programming language that handles text (strings) as easily as numbers u This makes Awk one of the most powerful of the Unix utilities q Awk process fields while ed/sed process lines q nawk (new awk) is the new standard for Awk u Designed to facilitate large awk programs q Awk gets it’s input from u files u redirection and pipes u directly from standard input

30 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 30 BEGIN pattern {action}. pattern { action} END Structure of an AWK Program q An Awk program consists of: u An optional BEGIN segment l For processing to execute prior to reading input u pattern - action pairs l Processing for input data l For each pattern matched, the corresponding action is taken u An optional END segment l Processing after end of input data

31 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 31 Pattern-Action Pairs q Both are optional, but one or the other is required u Default pattern is match every record u Default action is print record q Patterns u BEGIN and END u expressions l $3 < 100 l $4 == “Asia” u string-matching l /regex/ - /^.*$/ l string - abc –matches the first occurrence of regex or string in the record

32 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 32 u compound l #3 < 100 && $4 == “Asia” –&& is a logical AND –|| is a logical OR u range l NR == 10, NR == 20 –matches records 10 through 20 inclusive q Patterns can take any of these forms and for /regex/ and string patterns will match the first instance in the record

33 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 33 Regular Expressions in Awk q Awk uses the same regular expressions we’ve been using u ^ $ - beginning of/end of line u. - any character u [abcd] - character class u [^abcd] - negated character class u [a-z] - range of characters u (regex1|regex2) - alternation u * - zero or more occurrences of preceding expression u + - one or more occurrences of preceding expression u ? - zero or one occurrence of preceding expression u NOTE: the min max {m, n} or variations {m}, {m,} syntax is NOT supported

34 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 34 Awk Variables q $0, $1, $2, $NF q NR - Number of records processed q FNR - Number of records processed in current file q NF - Number of fields in current record q FILENAME - name of current input file q FS - Field separator, space or TAB by default q OFS - Output field separator, space or TAB default q ARGC/ARGV - Argument Count, Argument Value array u Used to get arguments from the command line

35 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 35 Command Line Arguments q Accessed via built-ins ARGC and ARGV q ARGC is set to the number of command line arguments q ARGV[ ] contains each of the arguments u For the command line u awk ‘script’ filename l ARGC == 2 l ARGV[0] == “awk” l ARGV[1] == “filename l the script is not considered an argument

36 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 36 q ARGC and ARGV can be used like any other variable q The can be assigned, compared, used in expressions, printed q They are commonly used for verifying that the correct number of arguments were provided

37 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 37 Operators q = assignment operator; sets a variable equal to a value or string q == equality operator; returns TRUE is both sides are equal q != inverse equality operator q && logical AND q || logical OR q ! logical NOT q, = relational operators q +, -, /, *, %, ^ q String concatenation

38 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 38 Control Flow Statements q Awk provides several control flow statements for making decisions and writing loops q If-Else if (expression is true or non-zero){ statement1 } else { statement2 } where statement1 and/or statement2 can be multiple statements enclosed in curly braces { }s u the else and associated statement2 are optional

39 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 39 Loop Control q While while (expression is true or non-zero) { statement1 }

40 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 40 q For for(expression1; expression2; expression3) { statement1 } u This has the same effect as: expression1 while (expression2) { statement1 expression3 } u for(;;) is an infinite loop

41 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 41 q Do While do { statement1 } while (expression)

42 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 42 Built-In Functions q Arithmetic u sin, cos, atan, exp, int, log, rand, sqrt q String u length, substitution, find substrings, split strings q Output u print, printf, print and printf to file q Special u system - executes a Unix command l system(“clear”) to clear the screen l Note double quotes around the Unix command u exit - stop reading input and go immediately to the END pattern-action pair if it exists, otherwise exit the script

43 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 43 Formatted Output q printf provides formatted output q Syntax is printf(“format string”, var1, var2, ….) q Format specifiers u %d - decimal number u %f - floating point number u %s - string u \n - NEWLINE u \t - TAB q Format modifiers u - left justify in column u n column width u.n number of decimal places to print

44 CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 44 printf Examples q printf(“I have %d %s\n”, how_many, animal_type) q printf(“%-10s has $%6.2f in their account\n”, name, amount) q printf(“%10s %-4.2f %-6d\n”, name, interest_rate, account_number) q printf(“\t%d\t%d\t%6.2f\t%s\n”, id_no, age, balance, name)


Download ppt "CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 1 AWK q A programming language for handling common data manipulation tasks with only a few lines of."

Similar presentations


Ads by Google