Introduction to Awk

Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.

2 Awk
- Works well on record-type data
- Reads input file(s) one line at a time
- Parses each line into fields
- Performs user-defined tests against each line and performs actions on the lines that match

3 Other Common Uses
- Input validation
  - Does every record have the same number of fields?
  - Do the values make sense (negative time, hourly wage > $1000, etc.)?
- Filtering out certain fields
- Searches
  - Who got a zero on lab 3?
  - Who got the highest grade?
- Many others
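Each validation and search question above fits in a line or two of awk. This is a minimal sketch that assumes a hypothetical file `grades.txt`. Each line holds a student name followed by one score per lab, so lab 3 is field 4. The file name, its layout, and its contents are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical data: name, lab 1, lab 2, lab 3
# (line 3 has a nonsensical score, line 4 has an extra field)
cat > grades.txt <<'EOF'
ann 90 85 0
bob 70 95 88
cat 100 -5 60
dan 80 75 60 99
EOF

# Validation: does every record have the same number of fields as the first one?
awk 'NR == 1 { n = NF }
     NF != n { print "line " NR ": expected " n " fields, got " NF }' grades.txt
# line 4: expected 4 fields, got 5

# Validation: do the values make sense? Flag any negative score.
awk '{ for (i = 2; i <= NF; i++) if ($i < 0) print "line " NR ": negative score " $i }' grades.txt
# line 3: negative score -5

# Search: who got a zero on lab 3?
awk '$4 == 0 { print $1 }' grades.txt
# ann

# Search: who got the highest grade on lab 3?
awk 'NR == 1 || $4 > max { max = $4; who = $1 }
     END { print who, max }' grades.txt
# bob 88
```

The last program seeds `max` from the first record (`NR == 1`), so it makes no assumption that scores are non-negative.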

4 Invocation
You can write little one-liners on the command line (very handy). To print the 3rd field of every line:
    $ awk '{ print $3 }' input.txt
To execute an awk script file:
    $ awk -f script.awk input.txt
Or put this shebang on the first line of the script and give the script execute permission:
    #!/bin/awk -f
(Adjust the path to wherever awk is installed on your system; /usr/bin/awk is also common.)

5 Form of an AWK program
AWK programs are a sequence of entries of the form:
    pattern { action }
- pattern: a test, either a regular expression to match or a C-like condition
  - if omitted, the action is applied to every line
- action: a statement or set of statements
  - if omitted, the default action is to print the entire line, much like grep
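An entry can take three shapes: pattern only, action only, or both. They are easiest to compare side by side. This small sketch uses a made-up two-column file, `fruit.txt`.

```shell
#!/bin/sh
# Made-up data: fruit name and quantity
printf 'apple 3\nbanana 0\ncherry 7\n' > fruit.txt

# Pattern only: the default action prints each matching line (like grep)
awk '/an/' fruit.txt                   # banana 0

# Action only: the action runs on every line
awk '{ print $1 }' fruit.txt           # apple, banana, cherry (one per line)

# Pattern and action: the action runs only where the test is true
awk '$2 > 0 { print $1 }' fruit.txt    # apple, cherry (one per line)
```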

6 Form of an AWK program
- Input files are parsed one record (line) at a time
- Each line is checked against each pattern, in order
- There are 2 special patterns:
  - BEGIN: true before any records are read
  - END: true at the end of input (after all records have been read)
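Here is a short sketch of BEGIN, END, and the "each pattern, in order" rule, using a made-up file `nums.txt`. A record that satisfies two patterns triggers both actions, in the order the entries appear in the program.

```shell
#!/bin/sh
printf '5\n-2\n10\n' > nums.txt

awk '
BEGIN   { print "start" }                  # runs once, before the first record
$1 > 0  { print "positive:", $1 }          # every record is checked against every pattern,
$1 > 4  { print "big:", $1 }               # in the order the entries appear
END     { print "read", NR, "records" }    # runs once, after the last record
' nums.txt
# start
# positive: 5
# big: 5
# positive: 10
# big: 10
# read 3 records
```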

7 Awk Features
- Patterns can be regular expressions or C-like conditions.
- Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed.
- Input lines are parsed and split into fields, which are accessed as $1, …, $NF, where NF is a variable set to the number of fields.
- The variable $0 contains the entire line. By default, lines are split on white space (blanks and tabs).
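This sketch shows field splitting, `NF`/`$NF`, and a regular-expression pattern on a made-up file `staff.txt`. The input deliberately mixes double blanks and a tab, so you can see that a run of white space acts as a single separator by default.

```shell
#!/bin/sh
printf 'alice  42\tengineer\nbob 37 manager\n' > staff.txt

# NF is the number of fields and $NF is the last field
awk '{ print NF, $1, $NF }' staff.txt
# 3 alice engineer
# 3 bob manager

# A bare regular-expression pattern is tested against the whole line ($0)
awk '/^b/ { print $0 }' staff.txt
# bob 37 manager
```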

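8 Variables
- Variables are neither declared nor typed
- There is no character type
- Values are only strings and floating-point numbers (integers are simply numbers with no fractional part)
- $n refers to the nth field, where n is any expression with an integer value
    # print each field on the line, one per output line
    awk '{ for (i = 1; i <= NF; ++i) print $i }' input.txt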
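Because variables are untyped, a value is treated as a string or a number depending on how it is used. An unassigned variable behaves as 0 (or as the empty string). This sketch, with made-up values, also shows a field index that is an expression rather than a literal.

```shell
#!/bin/sh
awk 'BEGIN {
    x = "3"              # a string...
    y = x + 4            # ...used in arithmetic is read as the number 3, so y is 7
    z = x "4"            # juxtaposition concatenates: z is the string "34"
    print y, z, y / 2    # division is floating point: 3.5
    print count + 1      # count was never assigned, so it acts as 0 here
}'
# 7 34 3.5
# 1

# The n in $n can be any expression
echo 'a b c d' | awk '{ n = 2; print $n, $(NF - 1) }'
# b c
```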

9 Some Built-in Variables
- FS: the input field separator
- OFS: the output field separator
- NF: the number of fields in the current record; changes with each record
- NR: the number of records read so far, i.e., the current record number
- FNR: the number of records read so far from the current file; reset for each named file
- $0: the entire input line
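The difference between NR and FNR shows up once awk reads more than one file. This sketch uses two made-up files. It also prints FILENAME, another built-in variable that is not listed above and holds the name of the current input file.

```shell
#!/bin/sh
printf 'a\nb\n' > one.txt
printf 'c\nd\ne\n' > two.txt

# NR keeps counting across files; FNR restarts at 1 for each file
awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt
# one.txt 1 1 a
# one.txt 2 2 b
# two.txt 3 1 c
# two.txt 4 2 d
# two.txt 5 3 e
```

A common idiom built on this is the test `NR == FNR`, which is true only while awk is reading the first file.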

10 Example
Each line of emp.data holds a name, an hourly pay rate, and the number of hours worked:
    $ cat emp.data
    Beth 4.00 0
    Dan 3.75 0
    Kathy 4.00 10
    Mark 5.00 20
    Mary 5.50 22
    Susie 4.25 18
Print the pay for those employees who actually worked:
    $ awk '$3 > 0 { print $1, $2 * $3 }' emp.data
    Kathy 40
    Mark 100
    Mary 121
    Susie 76.5
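The same pattern can accumulate totals that an END action reports once, after all the input has been read. This sketch extends the slide's emp.data example with a total and an average. The summary wording is made up for illustration.

```shell
#!/bin/sh
cat > emp.data <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# Sum the pay of employees who worked and count them; report at the end
awk '$3 > 0 { total += $2 * $3; n++ }
     END    { if (n > 0) print n, "employees worked; total pay", total, "average", total / n }' emp.data
# 4 employees worked; total pay 337.5 average 84.375
```

The `n > 0` guard avoids a division by zero if nobody worked.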

11 Example – CSV file
    $ cat students.csv
    smith,john,js12
    jones,fred,fj84
    bee,sue,sb23
    fife,ralph,rf86
    james,jim,jj22
    cook,nancy,nc54
    banana,anna,ab67
    russ,sam,sr77
    loeb,lisa,guitarHottie
    $ cat getEmails.awk
    #!/bin/awk -f
    BEGIN { FS = "," }
    { printf( "%s's email is: %s@school.edu\n", $2, $3 ); }
    $ ./getEmails.awk students.csv
    john's email is: js12@school.edu
    fred's email is: fj84@school.edu
    sue's email is: sb23@school.edu
    ralph's email is: rf86@school.edu
    jim's email is: jj22@school.edu
    nancy's email is: nc54@school.edu
    anna's email is: ab67@school.edu
    sam's email is: sr77@school.edu
    lisa's email is: guitarHottie@school.edu

12 Example – output separator
    $ cat out.awk
    #!/bin/awk -f
    BEGIN { FS = ","; OFS = "-*-"; }
    { print $1, $2, $3; }
    $ ./out.awk students.csv
    smith-*-john-*-js12
    jones-*-fred-*-fj84
    bee-*-sue-*-sb23
    fife-*-ralph-*-rf86
    james-*-jim-*-jj22
    cook-*-nancy-*-nc54
    banana-*-anna-*-ab67
    russ-*-sam-*-sr77
    loeb-*-lisa-*-guitarHottie
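OFS is inserted only between the comma-separated items of a `print`, or when awk rebuilds `$0` after a field is assigned. A plain `print` of an unmodified line leaves the original separators alone. This sketch shows both cases on a one-line made-up file. The `$1 = $1` assignment is a common idiom that forces the rebuild.

```shell
#!/bin/sh
printf 'smith,john,js12\n' > pair.csv

# $0 is untouched, so OFS is never used
awk 'BEGIN { FS = ","; OFS = "-*-" } { print }' pair.csv
# smith,john,js12

# Assigning any field rebuilds $0 with OFS between the fields
awk 'BEGIN { FS = ","; OFS = "-*-" } { $1 = $1; print }' pair.csv
# smith-*-john-*-js12
```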

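13 Flow Control
- Awk syntax is much like C: the same for, while, and do-while loops, if/else statements, etc.
- AWK is named after its authors: Aho, Weinberger, and Kernighan
- Kernighan also co-wrote The C Programming Language with Dennis Ritchie, who created C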
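Here is a small sketch of the C-like control flow, run entirely inside BEGIN so no input file is needed. The values are arbitrary.

```shell
#!/bin/sh
awk 'BEGIN {
    # for loop with an if/else, exactly as in C
    for (i = 1; i <= 5; i++) {
        if (i % 2 == 0)
            print i, "is even"
        else
            print i, "is odd"
    }
    # while loop counting down
    n = 3
    while (n > 0) { printf "%d ", n; n-- }
    print "liftoff"
}'
# 1 is odd
# 2 is even
# 3 is odd
# 4 is even
# 5 is odd
# 3 2 1 liftoff
```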

14 Associative Arrays
- Awk also supports arrays that can be indexed by arbitrary strings. They are typically implemented using hash tables.
      Total["Sue"] = 100;
- It is possible to loop over all indices that have currently been assigned values:
      for (name in Total) print name, Total[name];
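Two other array operations come up often. The `(key in array)` test checks membership without creating the element. `delete` removes an element. This sketch reuses the slide's `Total` array with made-up values.

```shell
#!/bin/sh
awk 'BEGIN {
    Total["Sue"] = 100
    Total["Fred"] = 90

    # Membership test: "in" does not create the element
    if ("Sue" in Total) print "Sue has", Total["Sue"]
    if (!("Sam" in Total)) print "no entry for Sam"

    # Remove an element, then count what is left
    delete Total["Fred"]
    n = 0
    for (name in Total) n++
    print n, "entry left"
}'
# Sue has 100
# no entry for Sam
# 1 entry left
```

By contrast, merely referencing `Total["Sam"]` in an expression would create an empty element for "Sam".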

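15 Example using Associative Arrays
    $ cat scores
    Fred 90
    Sue 100
    Fred 85
    Sam 70
    Sue 98
    Sam 50
    Fred 70
    $ cat total.awk
    { Total[$1] += $2 }
    END { for (i in Total) print i, Total[i]; }
    $ awk -f total.awk scores
    Sue 198
    Sam 120
    Fred 245
The order in which `for (i in Total)` visits the indices is unspecified, so these lines may appear in a different order on another system.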
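Because the for-in order is unspecified, one simple way to get stable output is to pipe `print` into the external `sort` command from inside awk. This sketch also keeps a second array, `Count`, so it can report each person's average. Adding `Count` is an extension for illustration, not part of the slide's script.

```shell
#!/bin/sh
cat > scores <<'EOF'
Fred 90
Sue 100
Fred 85
Sam 70
Sue 98
Sam 50
Fred 70
EOF

# Total[] and Count[] are both keyed by name; piping print into "sort" fixes the order
awk '{ Total[$1] += $2; Count[$1]++ }
     END { for (name in Total) print name, Total[name], Total[name] / Count[name] | "sort" }' scores
# Fred 245 81.6667
# Sam 120 60
# Sue 198 99
```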

16 Useful one-liners
- Line count (like wc -l): awk 'END { print NR }'
- grep: awk '/pat/'
- head: awk 'NR <= 10'
- Add line numbers to a file:
      awk '{ print NR, $0 }'
      awk '{ printf("%5d %s\n", NR, $0) }'
Many more. See the resources tab on the course webpage for links to more examples.
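Here are the one-liners above run on a made-up three-line file, using a 2-line head so the effect is visible. The `exit` variant of head is an addition not on the slide. It prints the same lines but stops reading as soon as it has enough, which can matter on very large input.

```shell
#!/bin/sh
printf 'alpha\nbeta\ngamma\n' > words.txt

awk 'END { print NR }' words.txt                # line count: 3
awk '/et/' words.txt                             # like grep: beta
awk 'NR <= 2' words.txt                          # like head -n 2: alpha, beta
awk 'NR > 2 { exit } { print }' words.txt        # same lines, but stops reading after line 2
awk '{ printf("%5d %s\n", NR, $0) }' words.txt   # numbered lines, right-aligned in 5 columns
```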
