
AWK: a text processing language


1 AWK

2 text processing language

3 awk
Created for Unix by Aho, Weinberger and Kernighan
Basically an:
  ▫ interpreted
  ▫ text processing
  ▫ programming language
Updated versions
  ▫ NAWK: New awk
  ▫ GAWK: Free Software Foundation's version

4 awk Basics
Basic form:
  ▫ awk options 'selection criteria {action}' file(s)
Selection criteria can use regular expressions
Files are read one line at a time, and each line is split into fields
Fields are numbered ($1, $2, etc.)
  ▫ The entire line is $0
Can be run directly on the command line
Can be run as a program stored in a file
Uses blanks (spaces and tabs) as the default field separator
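
The basic form above can be tried without any data file by piping a few lines into awk. This is a minimal sketch: the three sample records (name, role, years) are made up for illustration and do not come from the slides.

```shell
# Hypothetical sample data: name, role, years (not from the slides)
data='alice dev 3
bob ops 7
carol dev 9'

# selection criteria: field 2 equals "dev"
# action: print fields 1 and 3
result=$(printf '%s\n' "$data" | awk '$2 == "dev" { print $1, $3 }')
printf '%s\n' "$result"
```

Only the lines that satisfy the selection criteria reach the action; leaving the criteria out would run the action on every line.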

5 -f Option (stored awk programs)
awk programs can be stored in a file
    awk -f awkfile datafile
  ▫ -f awkfile names the file that holds the awk program
  ▫ datafile contains the data

6 Example
Find the TAs in the personnel file
  ▫ The file is blank separated
    ▫ -F defines the delimiter
    ▫ Use "\ " to escape the blank (a blank after the \, followed by a second blank before the program)
  ▫ Note: the blank is the default separator anyway
  ▫ Title is in the 3rd field

    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    #
    # awk -F\  '$3 == "TA" { print }' personnel.data
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    #

7 example
To run an awk program
  ▫ personnel.data has the data
  ▫ findta.awk is the code
    ▫ Looks for TA (3rd field)
    ▫ Prints first name and telephone number (1st and 5th fields)
  ▫ Note: what small formatting problem is here?

    # awk -F\  -f findta.awk personnel.data
    TAs
    Jinyue704-687-2222
    Hadi704-687-3333
    Done
    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    # cat findta.awk
    BEGIN { print "TAs"; }
    $3 == "TA" {print $1 $5}
    END { print "Done" }
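
One likely answer to the formatting question on this slide: print $1 $5 concatenates the two fields with nothing between them. A sketch of one possible fix, using the slide's own data piped in directly, is to separate the fields with a comma so that awk inserts the output field separator (a space by default).

```shell
# Same records as personnel.data, piped in so the example is self-contained
result=$(printf '%s\n' \
    'Tony Kombol Lecturer 800111222 704-687-1111' \
    'Jinyue Xia TA 800111333 704-687-2222' \
    'Hadi Hashemi TA 800111444 704-687-3333' |
    awk '$3 == "TA" { print $1, $5 }')   # comma => fields joined with OFS
printf '%s\n' "$result"
```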

8 print and printf
Output goes to standard output
  ▫ can be redirected with > or |
    ▫ the redirection target must be in quotes:
        print $2, $1 | "sort"
    ▫ the output of the print goes to the sort command
print is unformatted
printf allows formatting
  ▫ %s: string
    ▫ %-20s: 20-character field, left-justified (-)
  ▫ %d: integer
    ▫ %8d: set aside 8 spaces for the number
  ▫ %f: floating point
    ▫ %4.8f: minimum total width of 4 characters, with 8 digits to the right of the decimal point
  ▫ printf needs \n to start a new line
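
The format specifiers above can be seen side by side in a short sketch. The two input records (item, count, weight) are invented for illustration; the vertical bars are only there to make the field widths visible.

```shell
# Hypothetical records: item name, count, weight
result=$(printf '%s\n' 'widget 7 2.5' 'gear 120 0.125' |
    awk '{ printf "%-10s|%5d|%8.3f\n", $1, $2, $3 }')
# %-10s  left-justified string in a 10-character field
# %5d    integer right-justified in 5 characters
# %8.3f  total width 8, with 3 digits after the decimal point
printf '%s\n' "$result"
```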

9 Number processing
AWK supports basic computation
  ▫ +   addition
  ▫ -   subtraction
  ▫ *   multiplication
  ▫ /   division
  ▫ %   modulus
  ▫ ^   exponentiation
Also supports:
  ▫ ++  add one to itself (post and pre fix)
  ▫ +=  add and assign to self
  ▫ --  subtract one from self (post and pre fix)
  ▫ -=  subtract from self
  ▫ *=  multiply self
  ▫ /=  divide self

10 Variables and Expressions
awk is loosely typed
do not need to declare variables
  ▫ x = 5
do not need $ to access a variable, unlike in the shell
  ▫ print x
strings are double quoted
  ▫ x = "This is a string"
no string concatenation operator; concatenation is done by context
  ▫ x = "string1"; y = "string2"
    print x y
    ▫ A space between the operands is required
some conversions are done automatically
  ▫ x = "56"; y = 43; z = "abc"
    print x y      # gives 5643: y is converted to a string
    print x + y    # gives 99: + converts x to a number
    print y + z    # gives 43: + converts z to the number 0
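
The conversion rules on this slide can be checked directly in a BEGIN block, which runs without any input file. A minimal sketch using the slide's own values:

```shell
result=$(awk 'BEGIN {
    x = "56"; y = 43; z = "abc"
    print x y      # concatenation: y is converted to a string
    print x + y    # arithmetic: x is converted to a number
    print y + z    # a non-numeric string counts as 0
}')
printf '%s\n' "$result"
```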

11 Comparison and Logical Operators
awk supports string and numeric comparisons
  ▫ == is the equality operator
    ▫ = is for assignment
  ▫ comparisons can be used on strings
    ▫ Beware of conversions when dealing with strings that consist of numbers
  ▫ ~ is used for regular expressions
    ▫ $2 ~ /[dh]og/: field 2 contains "hog" or "dog"

12 Comparison and Logical Operators awk supports boolean operations ▫ && - and ▫ || - or ▫ ! - not
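
Comparison and boolean operators can be combined in a single selection criterion. The sketch below reuses the personnelyears.data records from the slides (piped in so it is self-contained) and lists TAs or RAs who do not have fewer than 3 years; the particular condition is only an illustration.

```shell
result=$(printf '%s\n' \
    'Tony Kombol Lecturer 800111222 704-687-1111 6' \
    'Jinyue Xia TA 800111333 704-687-2222 3' \
    'Hadi Hashemi TA 800111444 704-687-3333 1' \
    'Fred Flintstone RA 800123321 704-687-1212 10' \
    'Barney Rubble URA 800112233 704-687-3344 4' |
    awk '($3 == "TA" || $3 == "RA") && !($6 < 3) { print $1, $2 }')
printf '%s\n' "$result"
```

Because == compares the whole field, "URA" is not selected here, unlike the regular expression match /[RT]A/.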

13 simple comparison
Field 6 is the number of years with the organization
  ▫ Find those with more than 5 years

    # awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
    Kombol, Tony:6
    Flintstone, Fred:10
    #
    # cat personnelyears.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    #

14 Regular Expression comparison example
Find the TAs and RAs, including the URAs

    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    Fred Flintstone RA 800123321 704-687-1212
    Barney Rubble URA 800112233 704-687-3344
    # awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
    Jinyue Xia 704-687-2222
    Hadi Hashemi 704-687-3333
    Fred Flintstone 704-687-1212
    Barney Rubble 704-687-3344
    #

15 BEGIN and END Sections
BEGIN and END allow for some pre- and post-processing
  ▫ Both are optional
General format:
    BEGIN { action }
    { action }
    END { action }
  ▫ BEGIN's actions are done before the processing of the datafile begins
    ▫ Good for headers, setup, etc.
  ▫ END's actions are done after the processing of the datafile ends
    ▫ Good for post processing, notes, etc.
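
A common use of the three sections is a header in BEGIN, per-line work in the middle, and a total in END. The following is a small sketch with invented name/value records.

```shell
# Hypothetical records: name and value
result=$(printf '%s\n' 'a 10' 'b 20' 'c 12' |
    awk 'BEGIN { print "name value" }          # runs once, before any input
         { total += $2; print $1, $2 }         # runs for every input line
         END { print "total", total }')        # runs once, after the last line
printf '%s\n' "$result"
```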

16 another regular expression
This is a more complex check using a file for the awk program
  ▫ Check to see that the ID is 800……
    ▫ That is, 800 followed by 6 characters

    # awk -f findbadid.awk personnelbad.data
    List of bad IDs follows
    Bad Id has a bad id:809123456
    End of list
    # cat personnelbad.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    Bad Id LX 809123456 704-687-8890 0
    # cat findbadid.awk
    BEGIN { print "List of bad IDs follows"; }
    $4 !~ /^800....../ { print $1 " " $2 " has a bad id:" $4};
    END { print "End of list"; }
    #

17 awk file example

    # cat grades.data
    Fred Ziffle:99:A
    Arnold Ziffle: 55: F
    Tara Boomdea: 85:B
    Neo:100:A
    Buffy Summers: 72:C
    Sheldon Cooper:67:D
    Zorbon Prentwist: 88 : B
    Zorbax Bottlewit:88:B
    Bad Grade: 33: A
    # cat ckgrades.awk
    BEGIN { print "Listing Bs\n" }
    $3 == "B" { print $0 }
    END { print "\nDone" }
    #
    # awk -F: -f ckgrades.awk grades.data
    Listing Bs

    Tara Boomdea: 85:B
    Zorbax Bottlewit:88:B

    Done
    #

Note: " : B " does not get matched, because with ":" as the separator the third field still contains the surrounding blanks

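18 Positional Parameters
Inside an awk program, $1, $2, etc. normally refer to the fields of each line
A shell positional parameter can also be passed into the awk program
  ▫ Used when awk is run from a shell program
  ▫ The shell parameter must be in single quotes inside the awk program, so that the shell substitutes its value
    ▫ e.g. instead of
        $4 > 12       (4th field in the line is > 12)
      use
        $4 > '$2'     (4th field in the line is > the 2nd parameter passed to the program)
  ▫ Called as:
        prog.awk 50 82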
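
The quoting trick on this slide works by closing the single-quoted awk text around the shell parameter. The sketch below shows that approach next to an alternative that the slides do not mention: awk's standard -v option, which assigns a shell value to an awk variable. The function names over_splice and over_v, the variable name limit, and the sample records are all hypothetical.

```shell
# Quote splicing, as on the slide: the shell inserts $1 into the program text
over_splice() {
    awk '$2 > '"$1"' { print $1 }'
}

# Alternative (not from the slides): pass the value with -v
over_v() {
    awk -v limit="$1" '$2 > limit { print $1 }'
}

data='a 10
b 60
c 90'
result_splice=$(printf '%s\n' "$data" | over_splice 50)
result_v=$(printf '%s\n' "$data" | over_v 50)
printf '%s\n' "$result_splice" "$result_v"
```

The -v form keeps the awk program in one quoted piece, which tends to be easier to read when several parameters are involved.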

19 Arrays
awk supports arrays
  ▫ arrays do not need to be "declared"
    ▫ they are "declared" the minute they are used
Arrays are associative
  ▫ index can be
    ▫ numeric
    ▫ alphabetic
  ▫ thisday["Tue"] = "Tuesday";
    thisday[2] = "Tuesday";
    ▫ above are two array elements for the array thisday
    ▫ each references a separate string
  ▫ printf("thisday[\"Tue\"] is %s", thisday["Tue"]) ;
    printf("thisday[2] is %s", thisday[2]) ;
  ▫ Both will print "Tuesday" for the array element referenced
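
Associative arrays are often used for counting by key. A minimal sketch, with invented name/title records, that counts how many lines carry each title:

```shell
# Hypothetical records: last name and title
result=$(printf '%s\n' 'Xia TA' 'Hashemi TA' 'Flintstone RA' |
    awk '{ count[$2]++ }                        # element is created on first use
         END { print "TA", count["TA"]
               print "RA", count["RA"] }')
printf '%s\n' "$result"
```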

20 Arrays
ENVIRON[ ]
  ▫ an associative array containing all the environment variables

    # awk 'BEGIN{for (env in ENVIRON)print env "=" ENVIRON[env]}'
    SSH_CLIENT=10.23.161.139 59365 22
    HOME=/home/tkombol
    TERM=xterm
    LESSOPEN=| /usr/bin/lesspipe %s
    SHELL=/bin/bash
    USER=tkombol
    _=/usr/bin/awk
    SHLVL=1
    PWD=/home/tkombol
    SSH_CONNECTION=10.23.161.139 59365 152.15.95.103 22
    LANG=en_US.UTF-8
    MAIL=/var/mail/tkombol
    LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:
    HISTCONTROL=ignoredups
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
    LESSCLOSE=/usr/bin/lesspipe %s %s
    LOGNAME=tkombol
    SSH_TTY=/dev/pts/2
    #

21 Built-in Variables
awk has a set of built-in variables
  ▫ Some can be overridden

Built-In Variables
Variable   Function                                    Default
NR         Cumulative # of lines read                  -
FS         Input field separator                       space
OFS        Output field separator                      space
OFMT       Default floating-point output format        %.6g
RS         Record separator                            newline
NF         Number of fields in current line            -
FILENAME   Current input file                          -
ARGC       Number of arguments in command line         -
ARGV       Array containing list of arguments          -
ENVIRON    Assoc. array of all environment variables   -
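
Several of these variables can be seen working together in a short sketch: FS and OFS are overridden in BEGIN, while NR and NF are read for each line. The colon-separated input is invented for illustration.

```shell
result=$(printf '%s\n' 'a:b:c' 'd:e' |
    awk 'BEGIN { FS = ":"; OFS = "-" }     # override input and output separators
         { print NR, NF, $1, $NF }')       # line number, field count, first and last field
printf '%s\n' "$result"
```

$NF is the value of the last field, since NF holds the number of fields in the current line.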

22 Functions
awk has several built-in functions
  ▫ () are optional if there are no parameters (e.g. length)
    ▫ using them is encouraged
  ▫ Arithmetic functions
  ▫ String functions

23 Arithmetic Functions
int(x)
  ▫ x truncated to an integer
sqrt(x)
  ▫ square root of x

24 String Functions
length()
  ▫ length of the complete line
length(x)
  ▫ length of x
tolower(s)
  ▫ returns s as lower case
toupper(s)
  ▫ returns s as upper case
substr(str,m)
  ▫ returns the string starting at m to the end of the string
substr(str,m,n)
  ▫ returns the string starting at m for n characters
index(s1,s2)
  ▫ finds the position of s2 inside s1
split(str,arr,ch)
  ▫ splits str into an array; the delimiter is ch
system("cmd")
  ▫ executes a system (Linux) command and returns its exit status
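
The string functions listed above can be exercised in a BEGIN block. A minimal sketch; the test string and the array name parts are arbitrary choices for illustration.

```shell
result=$(awk 'BEGIN {
    s = "Hello World"
    print length(s)                  # number of characters in s
    print toupper(s)
    print substr(s, 7)               # from position 7 to the end
    print substr(s, 1, 5)            # 5 characters starting at position 1
    print index(s, "World")          # position of "World" inside s
    n = split("a,b,c", parts, ",")   # n is the number of pieces
    print n, parts[1], parts[3]
}')
printf '%s\n' "$result"
```

Positions in substr and index start at 1, not 0.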

25 If
Syntax:
    if (cond true) { statements } else { statements }
  ▫ Notes:
    ▫ else is optional
    ▫ {} not needed for single statements

26 For
Syntax form 1:
    for ( startval ; condition ; control) statement
  ▫ C like in form
  ▫ Example:
      for ( k=1 ; k<9 ; k++ ) print k
Syntax form 2:
    for ( var in array) statement
  ▫ var takes on every index of the array in turn
  ▫ Great for associative arrays
    ▫ Non numeric indices
    ▫ Gaps in array
  ▫ See the ENVIRON example (slide 20)
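
Both loop forms can be sketched in one BEGIN block. The array name day and its keys are made up for illustration. The example only counts the indices in the second loop, because awk does not guarantee the order in which for (var in array) visits them.

```shell
result=$(awk 'BEGIN {
    # form 1: C-like counting loop
    for (k = 1; k <= 3; k++) sum += k
    print sum

    # form 2: visit every index of an associative array
    day["Mon"] = 1; day["Tue"] = 1
    for (d in day) n++
    print n
}')
printf '%s\n' "$result"
```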

27 While Syntax: ▫ while (cond is true) { statement(s) }

28 continue and break
continue and break can be used to control loops
  ▫ for
  ▫ while
break
  ▫ stops the innermost enclosing loop
continue
  ▫ stops processing the remaining statements in this iteration
  ▫ continues to the next iteration
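
A short sketch that combines while, if, continue and break: it collects the odd numbers up to 7 from an otherwise endless loop. The variable names k and out are arbitrary.

```shell
result=$(awk 'BEGIN {
    k = 0
    while (1) {                      # loop until break
        k++
        if (k % 2 == 0) continue     # even: skip the rest of this iteration
        if (k > 7) break             # leave the loop entirely
        out = out (out == "" ? "" : " ") k
    }
    print out
}')
printf '%s\n' "$result"
```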

29 Resources
Awk - A Tutorial and Introduction - by Bruce Barnett
  ▫ http://www.grymoire.com/Unix/Awk.html
Awk Tutorial - Main Page
  ▫ http://robert.wsi.edu.pl/awk/

30 Summary
awk is a "primitive" scripting language
good for processing text files
  ▫ filtering
perl is a more modern replacement
  ▫ "religious war" over which is better
if you understand awk, it will be a good basis for understanding perl

