AWK.



1 AWK

2 awk text processing language

3 awk
Created for Unix by Aho, Weinberger and Kernighan
Basically an interpreted text processing programming language
Updated versions
  NAWK: New awk
  GAWK: Free Software Foundation's version

4 awk Basics
Basic form:
    awk options 'selection criteria {action}' file(s)
Selection criteria can use regular expressions
Files are read one line at a time, with the contents split into fields
  Fields are numbered ($1, $2, etc.)
  Entire line is $0
Can run standalone (program given on the command line)
Can run as a program (stored in a file)
Uses whitespace (runs of blanks or tabs) as the default separator
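A minimal sketch of the field rules on this slide, assuming a POSIX awk. The input line and the shell variable names are made up for illustration and are not from the slides.

```shell
# Hypothetical sample record with uneven spacing between words.
line='alpha beta   gamma'

# $1 and $3 are fields and NF is the number of fields.
# With the default separator, a run of blanks counts as one separator.
fields=$(printf '%s\n' "$line" | awk '{ print $1 "|" $3 "|" NF }')

# $0 is the entire line, spacing untouched.
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')

echo "$fields"
echo "$whole"
```

The first command prints alpha|gamma|3, and the second echoes the line unchanged.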

5 -f Option (stored awk programs)
awk programs can be stored in a file
    awk -f awkfile datafile
-f awkfile names the file that holds the awk program
datafile contains the data
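A small sketch of the -f form, assuming a POSIX awk and the mktemp utility. The file names second.awk and data.txt, and their contents, are hypothetical.

```shell
# Scratch directory so the example does not touch existing files.
tmp=$(mktemp -d)

# Hypothetical stored program: print the second field of every line.
printf '%s\n' '{ print $2 }' > "$tmp/second.awk"

# Hypothetical data file.
printf '%s\n' 'one two three' 'four five six' > "$tmp/data.txt"

# -f names the program file; the remaining argument is the data file.
out=$(awk -f "$tmp/second.awk" "$tmp/data.txt")
echo "$out"

rm -rf "$tmp"
```

Keeping the program in its own file avoids shell quoting problems once the program grows past one line.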

6 Example
Find the TAs in the personnel file
The file is blank separated
  -F defines the delimiter
  Use "\ " to escape the blank (a blank after the \)
  A second blank must follow, to separate -F from the program
  Note: the blank is the default separator anyway
Title is in the 3rd field
# cat personnel.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
#
# awk -F\  '$3 == "TA" { print }' personnel.data

7 example
To run an awk program
  personnel.data has the data
  findta.awk is the code
    Looks for TA (3rd field)
    Prints first name and telephone number (1st and 5th fields)
Note: what small formatting problem is here?
# awk -F\  -f findta.awk personnel.data
TAs
Jinyue
Hadi
Done
# cat personnel.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
# cat findta.awk
BEGIN { print "TAs"; }
$3 == "TA" {print $1 $5}
END { print "Done" }

8 print and printf
Output goes to std out
  can be redirected with > or |
  redirected name must be in quotes:
    print $2, $1 | "sort"
  the output of the print goes to the sort routine
print is unformatted
printf allows formatting
  %s - string
    %-20s - 20 character spaces, left justified (-)
  %d - integer
    %8d - set aside 8 spaces for the number
  %f - floating point
    %4.8f - minimum total width of 4 characters, with 8 digits to the right of the decimal point
printf needs \n to start new line
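The format specifiers above can be tried without any input file by putting the statements in a BEGIN block. This is only a sketch, assuming a POSIX awk; the values and the square brackets are there just to make the padding visible.

```shell
out=$(awk 'BEGIN {
    printf "[%-8s]\n", "awk"       # left justified in 8 columns
    printf "[%5d]\n", 42           # right justified in 5 columns
    printf "[%8.3f]\n", 3.14159    # 8 columns overall, 3 digits after the point
}')
echo "$out"
```

Each printf ends its format with \n, since printf does not add a newline on its own.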

9 Number processing
AWK supports basic computation
  +    addition
  -    subtraction
  *    multiplication
  /    division
  %    modulus
  ^    exponentiation
Also supports:
  ++   add one to itself (post and pre fix)
  +=   add and assign to self
  --   subtract one from self (post and pre fix)
  -=   subtract from self
  *=   multiply self
  /=   divide self
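A quick sketch exercising the operators listed above, assuming a POSIX awk. The variable n and the numbers are arbitrary.

```shell
out=$(awk 'BEGIN {
    # One result per operator, separated by blanks.
    print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 7 ^ 2
    n = 5
    n++        # n is now 6
    n += 4     # n is now 10
    n *= 3     # n is now 30
    n /= 4     # n is now 7.5
    print n
}')
echo "$out"
```

Division is not truncated: 7 / 2 yields 3.5, so int() is needed when a whole number is wanted.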

10 Variables and Expressions
awk is loosely typed
do not need to declare variables
    x = 5
do not need $ to use variables, unlike bash
    print x
strings are double quoted
    x = "This is a string"
no string concatenation operator, done by context
    x = "string1"; y = "string2"
    print x y
  Space between the variables is required
some conversions done automatically
    x = "56"; y = 43; z = "abc"
    print x y     # gives 5643: y converted to string
    print x + y   # gives 99: x converted to number
    print y + z   # gives 43: z converted to number 0
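The conversions on this slide can be checked directly. A minimal sketch, assuming a POSIX awk, using the same x, y and z values as the slide plus one extra line showing how to get a blank into concatenated output.

```shell
out=$(awk 'BEGIN {
    x = "56"; y = 43; z = "abc"
    print x y        # concatenation: y is used as the string "43"
    print x + y      # addition: x is used as the number 56
    print y + z      # z has no leading digits, so it counts as 0
    print x " " y    # a blank in the output must be concatenated explicitly
}')
echo "$out"
```

Writing print x, y with a comma is the other common way to get a separator, since the comma inserts the output field separator.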

11 Comparison and Logical Operators
awk supports string and numeric comparisons
  == is the equality operator
    = is for assignment
  < and > can be used on strings
    Beware of conversions when dealing with strings that consist of numbers
~ is used for regular expressions
    $2 ~ /[dh]og/
  field 2 matches hog or dog

12 Comparison and Logical Operators
awk supports boolean operations
  &&   and
  ||   or
  !    not
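A sketch combining the regular expression match, a numeric comparison and the boolean operators from the last two slides, assuming a POSIX awk. The records (name, animal, age) are invented for the example.

```shell
# Hypothetical records: name, animal, age.
data=$(printf '%s\n' 'Rex dog 3' 'Babe hog 7' 'Tom cat 9' 'Fido dog 12')

# Field 2 matches dog or hog AND field 3 is greater than 5.
hits=$(printf '%s\n' "$data" | awk '$2 ~ /[dh]og/ && $3 > 5 { print $1 }')

# ! negates the match: every record that is neither dog nor hog.
misses=$(printf '%s\n' "$data" | awk '!($2 ~ /[dh]og/) { print $1 }')

# String constants compare character by character, numbers by value.
cmp=$(awk 'BEGIN { a = ("10" < "9"); b = (10 < 9); print a, b }')

echo "$hits"
echo "$misses"
echo "$cmp"
```

The last line prints 1 0: as strings, "10" sorts before "9", which is the conversion pitfall the slide warns about.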

13 simple comparison
Field 6 is number of years with organization
Find those with more than 5 years
# awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
Kombol, Tony:6
Flintstone, Fred:10
#
# cat personnelyears.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
Fred Flintstone RA
Barney Rubble URA
#

14 Regular Expression comparison example
Find the TAs and RAs including the URAs
# awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
Jinyue Xia
Hadi Hashemi
Fred Flintstone
Barney Rubble
#
# cat personnel.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
Fred Flintstone RA
Barney Rubble URA

15 BEGIN and END Sections
BEGIN and END
  Allows for some pre and post processing
  Both are optional
General format:
    BEGIN { action }
    { action }
    END { action }
BEGIN's actions are done before the processing of the datafile begins
  Good for headers, setup, etc.
END's actions are done after the processing of the datafile ends
  Good for post processing, notes, etc.
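A minimal sketch of the three-part layout, assuming a POSIX awk. The numbers fed in and the variable name total are made up.

```shell
out=$(printf '%s\n' 10 20 30 |
    awk 'BEGIN { print "Values"; total = 0 }          # runs once, before any input
               { total += $1; print "  " $1 }         # runs for every input line
         END   { print "Total: " total }')            # runs once, after the last line
echo "$out"
```

The header is printed before the first line is read and the total only after the last, which is the usual shape of a small report.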

16 another regular expression
This is a more complex check using a file for the awk program
Check to see the ID is 800......
  That is 800 followed by 6 characters
# cat findbadid.awk
BEGIN { print "List of bad IDs follows"; }
$4 !~ /^800......$/ { print $1 " " $2 " has a bad id:" $4};
END { print "End of list"; }
#
# cat personnelbad.data
Tony Kombol Lecturer
Jinyue Xia TA
Hadi Hashemi TA
Fred Flintstone RA
Barney Rubble URA
Bad Id LX
# awk -f findbadid.awk personnelbad.data
List of bad IDs follows
Bad Id has a bad id:
End of list

17 awk file example
# cat ckgrades.awk
BEGIN { print "Listing Bs\n" }
$3 == "B" { print }
END { print "\nDone" }
# awk -F: -f ckgrades.awk grades.data
Listing Bs

Tara Boomdea: 85:B
Zorbax Bottlewit:88:B

Done
#
# cat grades.data
Fred Ziffle:99:A
Arnold Ziffle: 55: F
Tara Boomdea: 85:B
Neo:100:A
Buffy Summers: 72:C
Sheldon Cooper:67:D
Zorbon Prentwist: 88 : B
Zorbax Bottlewit:88:B
Bad Grade: 33: A
Note: ": B" does not get matched, because the blank becomes part of the field

18 Positional Parameters
Parameters are usually used as the fields of each line
A parameter can be passed to the awk program
  Used with a shell program
  Must be in single quotes in the program; they close the quoted awk text so the shell substitutes the value
e.g. Instead of
    $4 > 12      4th field in line is > 12
use
    $4 > '$2'    4th field in line is > 2nd parm passed to the program:
    prog.awk 50 82
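A sketch of two ways to get a shell value into the awk program, assuming a POSIX awk. The first splices the value in through the quotes, as the slide describes; the second uses the standard -v option, which the slides do not cover. The shell variable limit, the awk variable min and the data are hypothetical stand-ins for a script's second parameter.

```shell
# Stand-in for the script's $2 (e.g. the 82 in "prog.awk 50 82").
limit=82

# Hypothetical records: label, score.
data=$(printf '%s\n' 'a 5' 'b 50' 'c 82' 'd 90')

# Quote splicing: the single quotes close, the shell inserts the value,
# then the quotes reopen, so awk sees the program text: $2 > 82 { print $1 }
spliced=$(printf '%s\n' "$data" | awk '$2 > '"$limit"' { print $1 }')

# -v assigns the value to the awk variable min before any input is read.
passed=$(printf '%s\n' "$data" | awk -v min="$limit" '$2 > min { print $1 }')

echo "$spliced"
echo "$passed"
```

Both forms print d here. The -v form keeps the awk program in one quoted piece, which tends to be easier to read and harder to break.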

19 Arrays
awk supports arrays
  arrays do not need to be "declared"
  "declared" the minute they are used
Arrays are associative
  index can be numeric or alphabetic
    thisday["Tue"] = "Tuesday";
    thisday[2] = "Tuesday";
  above are two array elements for the array thisday
  each references a separate string
    printf("thisday[\"Tue\"] is %s", thisday["Tue"]) ;
    printf("thisday[2] is %s", thisday[2]) ;
  Both will print "Tuesday" for the array referenced
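A common use of associative arrays is counting how often each value appears. The following is only a sketch, assuming a POSIX awk; the array name count and the input (one job title per line) are invented.

```shell
out=$(printf '%s\n' TA RA TA URA TA |
    awk '{ count[$1]++ }     # the element is created the first time it is used
         END {
             print "TA", count["TA"]
             print "RA", count["RA"]
             print "Dean", count["Dean"] + 0   # never seen, so empty; + 0 shows it as 0
         }')
echo "$out"
```

No declaration or size is given anywhere: the title itself is the index, and an index that never occurred simply reads as empty.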

20 Arrays
ENVIRON[ ] an associative array containing all the environment variables
# awk 'BEGIN{for (env in ENVIRON)print env "=" ENVIRON[env]}'
SSH_CLIENT=
HOME=/home/tkombol
TERM=xterm
LESSOPEN=| /usr/bin/lesspipe %s
SHELL=/bin/bash
USER=tkombol
_=/usr/bin/awk
SHLVL=1
PWD=/home/tkombol
SSH_CONNECTION=
LANG=en_US.UTF-8
MAIL=/var/mail/tkombol
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:
HISTCONTROL=ignoredups
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
LESSCLOSE=/usr/bin/lesspipe %s %s
LOGNAME=tkombol
SSH_TTY=/dev/pts/2
#

21 Built-in Variables
awk has a set of built-in variables
Some can be overridden
Built-In Variables
  Variable   Function                                    Default
  NR         Cumulative # of lines read                  -
  FS         Input Field Separator                       space
  OFS        Output Field Separator                      space
  OFMT       Default FP format                           %.6g
  RS         Record separator                            newline
  NF         Number of fields in current line            -
  FILENAME   Current input file                          -
  ARGC       Number of arguments in command line         -
  ARGV       Array containing list of arguments          -
  ENVIRON    Assoc. array of all environment variables   -
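A short sketch that overrides FS and OFS and reads NR and NF, assuming a POSIX awk. The colon-separated input lines are made up.

```shell
out=$(printf '%s\n' 'a:b:c' 'd:e' |
    awk 'BEGIN { FS = ":"; OFS = "-" }    # override the input and output separators
         { print NR, NF, $1 }')           # line number, field count, first field
echo "$out"
```

The commas in print are what insert OFS; writing the items side by side without commas would concatenate them with nothing in between.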

22 Functions
awk has several built-in functions
  () are optional if there are no parms
    but using them is encouraged
Arithmetic functions
String functions

23 Arithmetic Functions
  int(x)    returns the integer part of x
  sqrt(x)   returns the square root of x

24 String Functions
  length()            length of complete line
  length(x)           length of x
  tolower(s)          returns s as lower case
  toupper(s)          returns s as upper case
  substr(str,m)       returns string starting at m to end of string
  substr(str,m,n)     returns string starting at m for n characters
  index(s1,s2)        finds the position of s2 inside s1
  split(str,arr,ch)   splits str into an array, the delimiter is ch
  system("cmd")       executes a system (Linux) command and returns exit status
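The string functions above can be tried on a fixed string. A minimal sketch, assuming a POSIX awk; the string and the names s, n and parts are arbitrary. Positions count from 1.

```shell
out=$(awk 'BEGIN {
    s = "Hello, awk"
    print length(s)                  # number of characters in s
    print toupper(s)
    print substr(s, 8)               # from position 8 to the end
    print substr(s, 1, 5)            # 5 characters starting at position 1
    print index(s, "awk")            # position where "awk" starts inside s
    n = split("a:b:c", parts, ":")   # returns the number of pieces
    print n, parts[1], parts[3]
}')
echo "$out"
```

split numbers the pieces from 1, so parts[1] is the first piece and parts[n] the last.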

25 if
Syntax:
    if (cond true) {
        statements
    } else {
        statements
    }
Notes:
  else is optional
  {} not needed for single statements

26 for
Syntax form 1:
    for ( startval ; condition ; control ) statement
  C like in form
  Example:
    for ( k=1 ; k<9 ; k++ ) print k
Syntax form 2:
    for ( var in array ) statement
  var is set to each index of the array in turn
  Great for associative arrays
    Non numeric indices
    Gaps in array
  See the ENVIRON example on slide 20

27 While
Syntax:
    while (cond is true) {
        statement(s)
    }

28 continue and break
continue and break can be used to interrupt both kinds of loop
  for
  while
break stops the loop
continue stops processing statements in this iteration
  continues to next iteration
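The control statements from the last four slides can be seen together in one small program. This is a sketch, assuming a POSIX awk; the loop bounds and the variable names k and n are arbitrary.

```shell
out=$(awk 'BEGIN {
    for (k = 1; k < 9; k++) {
        if (k % 2 == 0) continue   # even value: skip to the next iteration
        if (k > 5) break           # past 5: leave the for loop entirely
        print "for", k
    }
    n = 3
    while (n > 0) {
        print "while", n
        n--
    }
}')
echo "$out"
```

The for loop prints only 1, 3 and 5: even values are skipped by continue, and break ends the loop when k reaches 7.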

29 Resources
Awk - A Tutorial and Introduction - by Bruce Barnett
Awk Tutorial - Main Page

30 Which is not a "scripting language"?
Auk
Awk
Perl
Pearl
Bash
Bam

31 Summary
awk is a "primitive" scripting language
  good for processing text files
  filtering
perl is a more modern replacement
  "religious war" over which is better
  if you understand awk it will be a good basis to understand perl

