The awk Utility CS465 - Unix

Background
awk was developed by
- Aho, Weinberger, and Kernighan (of K & R)
- Was further extended at Bell Labs
Handles simple data-reformatting jobs easily with just a few lines of code.
Versions
- awk: original version
- nawk: new awk, an improved awk
- gawk: GNU awk, an improved nawk

How awk works
awk commands consist of patterns and actions
- awk scans the input line by line, searching for lines that match a certain pattern (or regular expression)
- It performs a selected action on the matching lines
awk can be used:
- at the command line for simple operations
- in programs or scripts for larger applications

Running awk
From the command line:
    $ awk '/pattern/ {action}' file
OR from an awk script file:
    $ cat awkscript
    # This is a comment
    /pattern/ {action}
    $ awk -f awkscript file

awk's Format using Input from a File
    $ awk '/pattern/' filename
- awk will act like grep, printing every line that matches the pattern
    $ awk '{action}' filename
- awk will apply the action to every line in the file
    $ awk '/pattern/ {action}' filename
- awk will apply the action to every line in the file that matches the pattern
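
The three invocation forms above can be tried without a data file. This is a minimal sketch: the sample data and the shell variable names are made up for illustration and do not come from the slides.

```shell
# Hypothetical sample data (not from the slides).
data='alpha 1
beta 2
gamma 3'

# Pattern only: matching lines are printed unchanged, like grep.
only_pattern=$(printf '%s\n' "$data" | awk '/et/')

# Action only: the action runs on every line.
only_action=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action: the action runs only on matching lines.
both=$(printf '%s\n' "$data" | awk '/gam/ { print $2 }')

printf '%s\n' "$only_pattern" "$only_action" "$both"
```

Piping the data in with printf keeps the example self-contained; replacing the pipe with a filename argument behaves the same way.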

Example 1
Input:
    $ cat pingfile
    PING dt033n32.san.rr.com ( ): 56 data bytes
    64 bytes from : icmp_seq=0 ttl=48 time=49 ms
    64 bytes from : icmp_seq=1 ttl=48 time=94 ms
    64 bytes from : icmp_seq=2 ttl=48 time=50 ms
    64 bytes from : icmp_seq=3 ttl=48 time=41 ms
    ----dt033n32.san.rr.com PING Statistics
    packets transmitted, 127 packets received, 0% packet loss
    round-trip (ms) min/avg/max = 37/73/495 ms
    $ awk '/icmp/' pingfile
Output:
    64 bytes from : icmp_seq=0 ttl=48 time=49 ms
    64 bytes from : icmp_seq=1 ttl=48 time=94 ms
    64 bytes from : icmp_seq=2 ttl=48 time=50 ms
    64 bytes from : icmp_seq=3 ttl=48 time=41 ms

Example 1
Input:
    $ cat pingfile
    PING dt033n32.san.rr.com ( ): 56 data bytes
    64 bytes from : icmp_seq=0 ttl=48 time=49 ms
    64 bytes from : icmp_seq=1 ttl=48 time=94 ms
    64 bytes from : icmp_seq=2 ttl=48 time=50 ms
    64 bytes from : icmp_seq=3 ttl=48 time=41 ms
    ----dt033n32.san.rr.com PING Statistics
    packets transmitted, 127 packets received, 0% packet loss
    round-trip (ms) min/avg/max = 37/73/495 ms
    $ awk '{print $1}' pingfile
Output:
    PING dt033n32.san.rr.com PING Statistics round-trip

Example 1
Input:
    $ cat pingfile
    PING dt033n32.san.rr.com ( ): 56 data bytes
    64 bytes from : icmp_seq=0 ttl=48 time=49 ms
    64 bytes from : icmp_seq=1 ttl=48 time=94 ms
    64 bytes from : icmp_seq=2 ttl=48 time=50 ms
    64 bytes from : icmp_seq=3 ttl=48 time=41 ms
    …
    ----dt033n32.san.rr.com PING Statistics
    packets transmitted, 127 packets received, 0% packet loss
    round-trip (ms) min/avg/max = 37/73/495 ms
    $ awk '/icmp/ {print $5}' pingfile
Output:
    icmp_seq=0
    icmp_seq=1
    icmp_seq=2
    icmp_seq=3

Records and Fields
awk divides the input into records and fields
- Each line is a record (by default)

                 field-1  field-2  field-3
                    |        |        |
                    v        v        v
    record 1 ->  George   Jones    Admin
    record 2 ->  Anthony  Smith    Accounting

- Each record is split into fields, delimited by a special character (whitespace by default)
- The delimiter can be changed with -F or FS

awk field variables
awk creates variables $1, $2, $3… that correspond to the resulting fields (much like the positional parameters of a shell script).
- $1 is the first field, $2 is the second…
- $0 is a special field that holds the entire line
- NF is always set to the number of fields in the current line (it is accessed without a dollar sign)
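
The field variables can be inspected one at a time. The sketch below reuses the "George Jones Admin" record from the previous slide; the shell variable names are assumptions made for illustration, and `$NF` (the last field) is standard awk that the slides do not mention.

```shell
line='George Jones Admin'

whole=$(echo "$line" | awk '{ print $0 }')   # the entire record
first=$(echo "$line" | awk '{ print $1 }')   # the first field
count=$(echo "$line" | awk '{ print NF }')   # number of fields
last=$(echo "$line" | awk '{ print $NF }')   # field number NF, i.e. the last one

# -F changes the field separator, here to a colon.
second=$(echo 'a:b:c' | awk -F: '{ print $2 }')

printf '%s\n' "$whole" "$first" "$count" "$last" "$second"
```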

Example #1
    $ cat students
    Bill White /01/01 Science
    Jill Blue /03/20 Arts
    Ben Teal /02/26 CompSci
    Sue Beige /09/12 Science
    $ awk '/Science/{print $1, $2}' students
    Bill White
    Sue Beige
    $
Commas indicate that the output fields should be separated by spaces (otherwise they are concatenated):
    $ awk '/Science/{print $1 $2}' students
    BillWhite
    SueBeige

Example #2
- No pattern given, so matches ALL lines
- Text strings to print are placed in double quotes
    $ cat phonelist
    Joe Smith
    Mary Jones
    Hank Knight
    $ awk '{print "Name: ", $1, $2, \
    " Telephone:", $3}' phonelist
    Name: Joe Smith Telephone:
    Name: Mary Jones Telephone:
    Name: Hank Knight Telephone:
    $

Example #3
Given a username, display the person's real name:
    $ grep small /etc/passwd
    small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh
    $ awk -F: '/small000/{print $5}' /etc/passwd
    Faculty - Pam Smallwood
    $

awk using Input from Commands
You can run awk in a pipeline, using input from another command:
    $ command | awk '/pattern/ {action}'
- Takes the output from the command and pipes it into awk, which then performs the action on all lines that match the pattern

Piped awk Input Example
    $ w
    1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01
    User tty idle JCPU PCPU what
    pugli766 pts/8 Tue10pm 3days -ksh
    lin318 pts/17 10:58am 1:45 vi choosesort
    small000 pts/18 12:43pm w
    mcdev712 pts/10 11:52am 14 1 vi adddata
    gibbo201 pts/12 12:15pm 18 -ksh
    nelso828 pts/16 7:17pm 17:43 -ksh
    $ w | awk '/ksh/{print $1}'
    pugli766
    gibbo201
    nelso828
    $

Relational Operators
awk can use relational operators (<, <=, ==, !=, >=, >) to compare a field to a value, and ! to negate a pattern
- If the outcome of the comparison is true, then the action is performed
Examples:
- To print every record in the log.txt file in which the second field is larger than 10
    $ awk '$2 > 10' log.txt
- To print every record in the log.txt file which does NOT contain 'Win32'
    $ awk '!/Win32/' log.txt
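
As a rough illustration of the two examples above, the sketch below runs a numeric comparison and a negated pattern against a small made-up log (the data and variable names are hypothetical, not taken from log.txt):

```shell
# Hypothetical two-column data: a name and a count.
log='alice 4
bob 15
carol 10
dave 22'

# Comparison as the pattern, no action: matching records are printed whole.
over_ten=$(printf '%s\n' "$log" | awk '$2 > 10')

# Negated regular expression: records that do NOT contain the letter a.
no_a=$(printf '%s\n' "$log" | awk '!/a/ { print $1 }')

printf '%s\n' "$over_ten" "$no_a"
```

Because the second field looks like a number, awk compares it numerically, so 4 is not treated as greater than 10 the way a string comparison would.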

Relational Operator Example
    $ who
    pugli766 pts/8 Jun 3 22:24 (da den.pcisys.net)
    lin318 pts/17 Jun 6 10:58 ( client.attbi.com)
    small000 pts/18 Jun 6 13:16 (mackey.rbe den.pcisys.net)
    mcdev712 pts/10 Jun 6 11:52 (ip lv.lv.cox.net)
    gibbo201 pts/12 Jun 6 12:15 ( client.mchsi.com)
    nelso828 pts/16 Jun 5 19:17 ( )
    $ who | awk '$4 < 6 {print $1, $3, $4, $5}'
    pugli766 Jun 3 22:24
    nelso828 Jun 5 19:17
    $

Piping awk output
    $ who
    pugli766 pts/8 Jun 3 22:24 (da den.pcisys.net)
    lin318 pts/17 Jun 6 10:58 ( client.attbi.com)
    small000 pts/18 Jun 6 13:16 (mackey.rbe den.pcisys.net)
    mcdev712 pts/10 Jun 6 11:52 (ip lv.lv.cox.net)
    gibbo201 pts/12 Jun 6 12:15 ( client.mchsi.com)
    nelso828 pts/16 Jun 5 19:17 ( )
    $ who | awk '$4 == 6 {print $1}' | sort
    gibbo201
    lin318
    mcdev712
    small000
    $

awk Programming
An awk program is a list of rules
- Each rule pairs a pattern with an action
- The rules are applied, in order, to each line (record)
Example:
    /pattern1/ { action1 }
    /pattern2/ { action2 }
    /pattern3/ { action3 }

awk - pattern matching
Before processing, lines can be matched against a pattern.
    /pattern/ { action }       execute the action if the line matches the pattern
The pattern is a regular expression.
Examples:
    /^$/        { print "This line is blank" }
    /num/       { print "Line includes num" }
    /[0-9]+$/   { print "Integer at end:", $0 }
    /[A-Za-z]+/ { print "String:", $0 }
    /^[A-Z]/    { print "Starts w/uppercase letter" }
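
A few of these patterns can be checked against three made-up input lines. This is only a sketch: the input and the `report` variable are invented for illustration, and `LC_ALL=C` is set as a precaution because bracket ranges such as `[A-Z]` can behave differently in some locales and awk versions.

```shell
# Input: a lowercase word, an empty line, and a line ending in digits.
report=$(printf 'abc\n\nTotal 42\n' | LC_ALL=C awk '
  /^$/      { print NR ": blank" }
  /[0-9]+$/ { print NR ": ends in digits" }
  /^[A-Z]/  { print NR ": starts uppercase" }')

printf '%s\n' "$report"
```

The third line matches two rules, so both actions run for it, in the order the rules are written.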

awk program from a file
The awk commands (program) can be placed into a file
The -f (lowercase f) option indicates that the commands come from the file whose name follows the -f
    $ awk -f awkfile datafile
The contents of the file called awkfile will be used as the commands for awk

Example 1
    $ cat students
    Bill White /01/01 Science
    Jill Blue /03/20 Arts
    Bill Teal /02/26 CompSci
    Sue Beige /09/12 Science
    $ cat awkprog
    /5?5/ {print $1, $2}
    /3*4/ {print $5}
    $ awk -f awkprog students
    Arts
    Bill Teal
    Sue Beige
    $
**NOTE: All patterns are applied to each line before moving on to the next line

Example 2
    $ cat students
    Bill White /01/01 Science
    Jill Blue /03/20 Arts
    Bill Teal /02/26 CompSci
    Sue Beige /09/12 Science
    $ cat awkprog
    /Science/ {print "Science stu:", $1, $2}
    /CompSci/ {print "Computing stu:", $1, $2}
    $ awk -f awkprog students
    Science stu: Bill White
    Computing stu: Bill Teal
    Science stu: Sue Beige
    $

More about Patterns
Patterns can be:
- Empty: will match everything
- Regular expressions: /reg-expression/
- Boolean expressions: $2=="foo" && $7=="bar"
- Ranges: /jones/,/smith/
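
The Boolean and range forms are the least obvious of the four, so here is a small sketch of both. The five-line input and the variable names are assumptions made up for this example.

```shell
# Hypothetical data: a record number and a surname.
data='1 smith
2 jones
3 brown
4 smith
5 green'

# Boolean expression: both conditions must hold for the action to run.
bool=$(printf '%s\n' "$data" | awk '$1 > 1 && $2 == "smith" { print $1 }')

# Range: starts at the first line matching /jones/ and runs through
# the next line matching /smith/, inclusive.
range=$(printf '%s\n' "$data" | awk '/jones/,/smith/ { print $1 }')

printf '%s\n' "$bool" "$range"
```

Line 1 also contains smith, but it comes before the range has started, so it is not selected.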

Example - Boolean Expressions
    $ cat students
    Bill White /01/01 Science
    Jill Blue /03/20 Arts
    Bill Teal /02/26 CompSci
    Sue Beige /09/12 Science
    $ cat awkprog
    $3 <= {print "Not counted"}
    $3 > {print $2 ",", $1}
    $ awk -f awkprog students
    Not counted
    Teal, Bill
    Beige, Sue
    $

Example - Ranges
    $ cat students
    Bill White /01/01 Science
    Jill Blue /03/20 Arts
    Bill Teal /02/26 CompSci
    Sue Beige /09/12 Science
    $ awk '/333333/,/555555/' students
    Bill White /01/01 Science
    Jill Blue /03/20 Arts
    Bill Teal /02/26 CompSci
    $

More Built-In awk Variables
Two types: Informative and Configuration
Informative:
- NR = Current record number (starts at 1)
  Counts ALL records, not just those that match
- NF = Number of fields in the current record
- FILENAME = Current input data file
  Undefined in the BEGIN block
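
A quick way to see the informative variables is to print them for a throwaway file. The sketch below is illustrative only: it assumes `mktemp` is available, and the file contents and variable names are invented.

```shell
# Create a temporary two-line input file (hypothetical content).
tmp=$(mktemp)
printf 'a b c\nd e\n' > "$tmp"

# NR is the record number, NF the field count of that record.
counts=$(awk '{ print NR, NF }' "$tmp")

# In the END block, FILENAME still names the last file that was read.
name=$(awk 'END { print FILENAME }' "$tmp")

rm -f "$tmp"
printf '%s\n' "$counts" "$name"
```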

Example using NF $ cat names Pam Sue Laurie Bob Joe Bill Dave Joan Jill $ $ awk '{print NF}' names $

Example using a Boolean expression, NF, and NR $ cat names Pam Sue Laurie Bob Joe Bill Dave Joan Jill $ $ awk 'NF > 2 {print NR ":", NF, "fields"}' names 1: 3 fields 2: 4 fields $

Built-in awk functions
    log(expr)       natural logarithm
    index(s1,s2)    position of string s2 in string s1
    length(s)       string length
    substr(s,m,n)   n-char substring of s starting at m
    tolower(s)      converts string to lowercase
    printf()        print formatted - like C printf
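
The functions listed above can be exercised from a BEGIN block, which needs no input. This is a minimal sketch; the test string is made up for illustration.

```shell
out=$(awk 'BEGIN {
  s = "Hello, World"
  print length(s)           # number of characters in s
  print index(s, "World")   # 1-based position where "World" starts
  print substr(s, 1, 5)     # 5 characters starting at position 1
  print tolower(s)          # lowercase copy of s
  print log(1)              # natural logarithm of 1
}')

printf '%s\n' "$out"
```

String positions in awk start at 1, not 0, which is why `substr(s, 1, 5)` returns the first five characters.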

Example 2
Input:
    PING dt033n32.san.rr.com ( ): 56 data bytes
    64 bytes from : icmp_seq=0 ttl=48 time=49 ms
    64 bytes from : icmp_seq=1 ttl=48 time=94 ms
    64 bytes from : icmp_seq=2 ttl=48 time=50 ms
    64 bytes from : icmp_seq=3 ttl=48 time=41 ms
    …
Program:
    /PING/ { print tolower($1) }
    /icmp/ {
        time = substr($7,6,2)
        print time
    }
Output:
    ping
    49
    94
    50
    41
    …

print & printf
Use print in an awk statement to output specific field(s)
printf is more versatile
- works like printf in the C language
- may contain a format specifier and a modifier

Format Specification
A format specification consists of a percent symbol, a modifier, width and precision values, and a conversion character
To display the third field as a floating-point number with two decimal places:
    awk '{printf("%.2f\n", $3)}' file
You can include additional text in the printf statement:
    '{printf("3rd value: %.2f\n", $3)}'

Specifiers, Width, Precision, & Modifiers
Type specifiers:
    %c   Single character
    %d   Integer (decimal)
    %f   Floating point
    %s   String
Between the % and the specifier you can place the width and precision
- %6.2f means a floating-point number in a field of width 6 with two decimal places
Modifiers control details of appearance:
    -    minus sign is the left-justification modifier (the default is right justification)
    +    plus sign forces the appearance of a sign (+,-) for numeric output
    0    zero pads a right-justified number with zeros
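
To make the width, precision, and modifier rules concrete, the sketch below prints a few values inside square brackets so that the padding is visible. The values are arbitrary and chosen only for illustration.

```shell
out=$(awk 'BEGIN {
  printf("[%6.2f]\n", 3.14159)   # width 6, two decimals, right justified
  printf("[%-6d]\n", 42)         # width 6, left justified
  printf("[%+d]\n", 42)          # sign is always shown
  printf("[%05d]\n", 42)         # width 5, padded with zeros
  printf("[%5s]\n", "ab")        # string in a field of width 5
  printf("[%c]\n", "xyz")        # first character of the string
}')

printf '%s\n' "$out"
```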

awk Variables
- No need for declaration
- Implicitly set to 0 AND the empty string
- A variable's value is both a floating-point number and a string
- The value is converted as needed, based on its use
    title = "Number of students"
    no = 100
    weight = 13.4
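
The implicit initialization and the on-demand conversion can be observed directly. In this sketch the variable names `n`, `s`, `x`, and `y` are arbitrary and chosen only for the demonstration.

```shell
out=$(awk 'BEGIN {
  print n + 1          # unset n acts as 0 in arithmetic
  print "[" s "]"      # unset s acts as the empty string
  x = "3" + 4          # the string "3" is converted to a number
  y = 3 " apples"      # the number 3 is converted to a string
  print x
  print y
}')

printf '%s\n' "$out"
```

Arithmetic operators force a numeric interpretation, while placing values side by side (concatenation) forces a string interpretation.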

Example 2
Input:
    PING dt033n32.san.rr.com ( ): 56 data bytes
    64 bytes from : icmp_seq=0 ttl=48 time=49 ms
    64 bytes from : icmp_seq=1 ttl=48 time=94 ms
    64 bytes from : icmp_seq=2 ttl=48 time=50 ms
    64 bytes from : icmp_seq=3 ttl=48 time=41 ms
    …
Program:
    /icmp/ {
        time = substr($7,6,2)
        printf( "%1.1f ms\n", time );
    }
Output:
    49.0 ms
    94.0 ms
    50.0 ms
    41.0 ms
    …

awk program execution
    BEGIN { … }             Executes only once, before reading input data
    { … }                   Executes for each input line
    specification { … }     Executes for each input line that matches the specified /pattern/ or Boolean expression
    END { … }               Executes once at the end, after all lines have been processed

Example #1: Count # lines in file
- Set total to 0 before processing any lines
- For every line in the file, execute {total = total + 1}
- Print total after all lines are processed
    $ cat awkprog
    BEGIN {total = 0}
    {total = total + 1}
    END {print total " lines"}
    $ cat testfile
    Hello There
    Goodbye!
    $ awk -f awkprog testfile
    2 lines
    $

Ex #2: Count lines containing a pattern
    $ cat Simpsons
    Marge	34
    Homer	32
    Lisa	10
    Bart	11
    Maggie	01
    $ cat countthem
    BEGIN {totalMa = 0; totalar = 0}
    /Ma/ { totalMa++ }
    /ar/ { totalar++ }
    END {
        print totalMa " Ma's"
        print totalar " ar's"
    }
    $ awk -f countthem Simpsons
    2 Ma's
    2 ar's
    $
{totalpattern++} only executes if the pattern appears in the current line of the file.

Example #3: Add line numbers
    $ cat numawk
    BEGIN { print "Line numbers by awk" }
    { print NR ":", $0 }
    END { print "Done processing " FILENAME }
    $ cat testfile
    Hello There
    Goodbye!
    $ awk -f numawk testfile
    Line numbers by awk
    1: Hello There
    2: Goodbye!
    Done processing testfile
    $

More Built-In awk Variables
Two types: Informative and Configuration
Configuration:
- FS = Input field separator
- OFS = Output field separator
  (default for both is space " ")
- RS = Input record separator
- ORS = Output record separator
  (default for both is newline "\n")
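
Each of the four configuration variables can be set in a BEGIN block before any input is read. The sketch below changes them one at a time on tiny made-up inputs; the shell variable names are assumptions for illustration.

```shell
# FS and OFS: split input on colons, join printed fields with a dash.
ofs=$(printf 'a:b:c\nd:e:f\n' |
      awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }')

# RS: treat commas, not newlines, as the end of each input record.
rs=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

# ORS: end each printed record with a semicolon instead of a newline.
ors=$(printf 'a\nb\n' | awk 'BEGIN { ORS = ";" } { print $1 }')

printf '%s\n' "$ofs" "$rs" "$ors"
```

OFS is only inserted where print arguments are separated by commas; fields that are concatenated without a comma are unaffected.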

Example #1: Reverse 2 columns
NOTE: Columns are separated by tabs
    $ cat switch
    BEGIN {FS="\t"}
    {print $2 "\t" $1}
    $ awk -f switch Simpsons
    34	Marge
    32	Homer
    10	Lisa
    11	Bart
    01	Maggie
    $
Alternatively, you could do the following (the \t is quoted so that the shell passes it to awk unchanged):
    $ awk -F'\t' '{print $2 "\t" $1}' Simpsons

Example #2: Sum a column
    $ cat awksum2
    BEGIN {
        FS="\t"
        sum = 0
    }
    {sum = sum + $2}
    END {
        print "Done"
        print "Total sum is " sum
    }
    $ awk -f awksum2 Simpsons
    Done
    Total sum is 88
    $

Example #3: Comma delimited file
    $ cat names
    Bill Jones,3333,M
    Pam Smith,5555,F
    Sue Smith,4444,F
    $ awk -F, '{print $2}' names
    3333
    5555
    4444
    $

Longer awk program
    $ cat awkprog
    BEGIN { print "Processing..." }
    # print number of fields in first line
    NR == 1 { print $0, NF, "fields"}
    /^Unix/ { print "Line starts with Unix: ", $0 }
    /Unix$/ { print "Line ends with Unix: " $0 }
    # finishing it up
    END {print NR " lines checked"}
    $

awk program execution
    $ cat datfile
    First Line
    Unix is great!
    What else is better?
    This is Unix
    Yes it is Unix
    Goodbye!
    $ awk -f awkprog datfile
    Processing...
    First Line 2 fields
    Line starts with Unix: Unix is great!
    Line ends with Unix: This is Unix
    Line ends with Unix: Yes it is Unix
    6 lines checked
    $

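awk programming language syntax
    if ( found == 1 )            # if (expr)
        print "Found";           #   {action1}
    else                         # else
        print "Not found";       #   {action2}

    while ( i <= 100 )           # while (cond)
    {                            # {
        i = i + 1;               #   actions...
        print i
    }                            # }
awk has no built-in true or false constants: a value counts as true when it is nonzero (or a non-empty string).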

awk programming language syntax
    for (i=1; i < 10; i++ )      # for (set; test; incr)
    {                            # {
        sqr = i * i;             #   actions
        print i " squared is " sqr
    }                            # }

    do                           # do
    {                            # {
        i = i + 1;               #   actions...
        print i
    }                            # }
    while ( i < 100 );           # while (cond);
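
The control structures can be combined in a single BEGIN block and run without any input. This is a small sketch with arbitrary loop bounds, not one of the original slide examples.

```shell
out=$(awk 'BEGIN {
  # for loop: print the first three squares
  for (i = 1; i <= 3; i++)
    print i " squared is " (i * i)

  # while loop: count up to 5
  n = 0
  while (n < 5)
    n = n + 1

  # if/else: test the result of the loop
  if (n == 5)
    print "Found"
  else
    print "Not found"
}')

printf '%s\n' "$out"
```

Braces are only required when a loop or branch body contains more than one statement.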

awk - longer example Write an awk program that prints out the contents of a directory in the following format: BYTES FILE copyfile 736 copyfile.c 740 copyfile.c~ dirlist 989 dirlist.c 977 dirlist.c% envadv 185 envadv.c tmp 740 x.c Total: bytes in 9 regular files

awk example - code
    $ cat awkprog
    BEGIN { print " BYTES \t FILE"; sum=0; filenum=0 }
    # test for lines starting with -
    /^-/ {
        sum += $5
        ++filenum
        printf ("%10d \t%s\n", $5, $9)
    }
    # test for directories - line starts with d
    /^d/ { print " \t", $9 }
    # conclusion
    END {
        print "\n Total: " sum " bytes in"
        print " " filenum " regular files"
    }
    $

awk example - output
    $ ls -l
    total 84
    drwx small000 faculty 512 Jun 2 13:44 sub2
    -rwx small000 faculty 224 Jun 3 10:35 sumnums
    -rw small000 faculty 2 Jun 3 21:08 tab
    -rw small000 faculty 187 Jun 8 11:15 tbook
    $ ls -l | awk -f awkprog
         BYTES    FILE
                  sub2
           224    sumnums
             2    tab
           187    tbook

     Total: 413 bytes in
     3 regular files
    $

awk Handout Review awk examples on handout