The awk command

Introduction

Awk is a programming language used for manipulating data and generating reports. The data may come from standard input, from one or more files, or from the output of another process. Awk can be used at the command line for simple operations, or it can be written into programs for larger applications. Awk scans a file (or other input) line by line, from the first line to the last, searching for lines that match a specified pattern and performing the selected actions (enclosed in curly braces) on those lines.

The name awk comes from the initials of the last names of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. There are a number of versions of awk: old awk, new awk (nawk), GNU awk (gawk), POSIX awk, and so on. Awk combines features of several filters, but it has two distinguishing features:
1. It can identify and manipulate individual fields in a line.
2. Among the classic UNIX filters, it is the one designed to perform computation.
Further, awk accepts extended regular expressions (EREs) for pattern matching, and it has C-like programming constructs and several built-in variables and functions.

awk Preliminaries

The awk command follows the general syntax:
awk 'selection_criteria { action }'
Note the use of single quotes and curly braces. The selection_criteria (a form of addressing) filters the input and selects lines for the action component to act on. The action is enclosed within curly braces. Together, the selection_criteria and the action constitute an awk program, which is surrounded by a pair of single quotes. These programs are often one-liners, though they can span several lines as well. Ex: to select the directors from the file, the awk command is:
$ awk '/dir./ {print}' emp.lst
7898 | akash |dir. |mark. | 11/06/70 |9000

Unlike other filters, awk uses a contiguous sequence of spaces and tabs as the default field delimiter. In the example below, this default is changed to "|" with the -F option, and a comma separates the fields named in the print statement.
$ awk -F"|" '/dir./ {print $2,$3,$4,$6}' emp.lst
akash dir. mark
Fields in awk are numbered $1, $2, etc. Awk also addresses the entire line as $0. Ex: to display the number of records in the file e.lst:
$ awk '{print $0}' e.lst | wc -l
6
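A minimal, self-contained sketch of field selection with -F. The data file is a hypothetical sample (invented values, unpadded fields), not the emp.lst used in the slides, and the variable names emp, directors and total are illustrative only.

```shell
# Hypothetical sample data written to a temporary file
emp=$(mktemp)
cat > "$emp" <<'EOF'
2233|shukla|g.m|sales|12/12/52|20000
7898|akash|dir.|mark.|11/06/70|9000
3456|tiwary|g.m|product|05/02/89|23000
EOF

# Use "|" as the field separator and print fields 2, 3, 4 and 6 of the directors.
# The dot is escaped so that it matches a literal "." rather than any character.
directors=$(awk -F"|" '/dir\./ { print $2, $3, $4, $6 }' "$emp")
printf '%s\n' "$directors"

# Count the records inside awk itself: NR holds the record count in the END section.
total=$(awk 'END { print NR }' "$emp")
printf '%s\n' "$total"

rm -f "$emp"
```

Counting with NR in an END section gives the same number as piping every line to wc -l, without the extra process.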

The action section is represented by the statement { print }, which has the effect of printing all the selected lines. If the selection_criteria is missing, the action applies to all lines of the file. If the action is missing, the entire selected line is printed. Either the selection_criteria or the action may be omitted, but not both, and whatever is present must be enclosed within a pair of single quotes. All context patterns have to be enclosed within a pair of /'s. The print statement, if used without any field specifiers, prints the entire line, though you can also use the variable $0 to indicate that explicitly. Since print is the default action of awk, there is no need to specify it if you want to print the entire line. All three forms are equivalent:
$ awk '/dir/' emp.lst
$ awk '/dir/ {print}' emp.lst
$ awk '/dir/ {print $0}' emp.lst
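The equivalence of the three forms can be checked directly. This is a small sketch on hypothetical sample records (invented values); the variable names data, a, b and c are illustrative.

```shell
# Hypothetical sample records
data='2233|shukla|g.m|sales|12/12/52|20000
7898|akash|dir.|mark.|11/06/70|9000
3456|tiwary|g.m|product|05/02/89|23000'

# Pattern only, pattern with { print }, and pattern with { print $0 }
a=$(printf '%s\n' "$data" | awk '/dir/')
b=$(printf '%s\n' "$data" | awk '/dir/ { print }')
c=$(printf '%s\n' "$data" | awk '/dir/ { print $0 }')

if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
    echo "all three forms print the same lines"
fi
```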

For pattern matching, awk uses regular expressions of the egrep variety, with the same requirement that all these expressions be bounded on either side by a /. This lets you locate both 'sharma' and 'Sarma':
$ awk -F"|" '/[Ss]h*arma/' e.lst
9876 | sharma | mgr |product| 12/03/60 | | Sarma | dir.| sales | 05/09/60 |25000
Awk also accepts a line address (a single line number or a range) to select lines. Ex: to select lines 3 to 6 from a file, use the built-in variable NR to specify the line numbers:
$ awk -F"|" 'NR==3,NR==6 {print NR, $2, $3,$6}' e.lst
3 akash dir tiwary g.m kumar mgr Sarma dir
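A short sketch of both selection styles, a regular expression and an NR range, run on hypothetical sample records (the values, including the id of the last record, are invented for illustration).

```shell
# Hypothetical sample records
data='2233|shukla|g.m|sales|12/12/52|20000
9876|sharma|d.g.m|product|12/03/60|15000
7898|akash|dir.|mark.|11/06/70|9000
3456|tiwary|g.m|product|05/02/89|23000
5678|kumar|mgr|sales|01/01/65|15000
1006|Sarma|dir.|sales|05/09/60|25000'

# [Ss] accepts either case of the first letter; h* accepts zero or more h's
names=$(printf '%s\n' "$data" | awk -F"|" '/[Ss]h*arma/ { print $2 }')
printf '%s\n' "$names"

# pattern1,pattern2 selects a range: here records 2 through 4
rows=$(printf '%s\n' "$data" | awk -F"|" 'NR==2, NR==4 { print NR, $2, $6 }')
printf '%s\n' "$rows"
```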

Formatting output with printf

Awk uses the print and printf statements to write to standard output. print produces unformatted output. Ex: to print all fields except the 4th, assign an empty string to the field you don't want:
$ awk -F"|" '{ $4=""; print}' e.lst | head
shukla g.m 12/12/ sharma mgr 12/03/
When placing multiple statements on a single line, use ; as their delimiter. print here is the same as print $0. With the C-like printf statement, you can use awk as a stream formatter. printf takes a quoted format specifier and a field list:
%s   String
%d   Integer
%f   Floating point number

To produce formatted output from unformatted input, using a regular expression:
$ awk -F"|" '/[sS]h*arma/{ printf("%-20s %-12s %6d\n",$2,$3,$6) }' e.lst
sharma mgr Sarma dir
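To make the effect of the width specifiers visible, the sketch below formats one hypothetical record (invented values) and uses "|" in the format string purely as a visible column marker. The variable name line is illustrative.

```shell
# %-10s left-justifies a string in 10 columns, %-6s in 6 columns,
# and %6d right-justifies an integer in 6 columns.
line=$(printf '9876|sharma|d.g.m|product|12/03/60|15000\n' |
    awk -F"|" '{ printf "%-10s|%-6s|%6d\n", $2, $3, $6 }')
printf '%s\n' "$line"
```

Unlike print, printf adds no newline of its own, which is why the format string ends in \n.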

The Logical And Relational Operators

To print three fields for the directors and the managers, you can write each pattern-action pair on a separate line:
$ awk -F"|" '/director/ { printf "%-20s %-12s %d\n", $2,$3,$6 }
> /manager/ { printf "%-20s %-12s %d\n", $2,$3,$6 }' emp.lst
But repeating the print action for each pattern can be tedious. Awk also provides the || and && logical operators:
$ awk -F"|" '$3==" mgr " || $3=="dir. "{ printf("%-20s %-12s %6d\n",$2,$3,$6) }' e1.lst
akash dir kumar mgr 15000
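A compact sketch of || and && with fixed-string comparisons. The records are hypothetical and deliberately unpadded, so the compared strings need no surrounding spaces; the variable names senior and others are illustrative.

```shell
# Hypothetical sample records
data='2233|shukla|g.m|sales|12/12/52|20000
9876|sharma|d.g.m|product|12/03/60|15000
7898|akash|dir.|mark.|11/06/70|9000
3456|tiwary|g.m|product|05/02/89|23000
5678|kumar|mgr|sales|01/01/65|15000'

# Either designation selects the record
senior=$(printf '%s\n' "$data" | awk -F"|" '$3 == "mgr" || $3 == "dir." { print $2 }')
printf '%s\n' "$senior"

# Both inequalities must hold, so directors and managers are excluded
others=$(printf '%s\n' "$data" | awk -F"|" '$3 != "mgr" && $3 != "dir." { print $2 }')
printf '%s\n' "$others"
```

With == and != the whole field must equal the string exactly, which is why padded data such as the slides' e1.lst needs the padding repeated inside the quotes.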

If you want to print only the lines for persons who are neither director nor manager, use the != and && operators:
$ awk -F"|" '$3!=" dir." && $3!=" mgr" { printf "%-20s %-12s %d\n", $2,$3,$6}' e1.lst
When using the operators == and != for string matching, remember that they handle only fixed strings, not regular expressions. To match regular expressions, awk offers the ~ and !~ operators, which match and negate a match, respectively:
$ awk -F"|" '$3 ~/g.m/ {print}' e1.lst
2233 | shukla | g.m | sales | 12/12/52 | | sharma |d.g.m|product| 12/03 60 | | tiwary |g.m |product| 05/02/89 |23000

The previous example prints the d.g.m's as well as the g.m's, since the pattern g.m is embedded in the larger string. To avoid this, use the regular expression characters ^ and $, which anchor the pattern to the beginning and the end of the field, respectively:
$ awk -F"|" '$3 ~/^g.m/ {print}' e1.lst
3456 | tiwary |g.m |product| 05/02/89 |23000
Only tiwary's record is selected here because its third field begins with g.m; in shukla's record the field begins with a space.
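The difference between an unanchored and an anchored match can be sketched as follows, on hypothetical unpadded records (invented values). The variable names loose and exact are illustrative.

```shell
# Hypothetical sample records
data='2233|shukla|g.m|sales|12/12/52|20000
9876|sharma|d.g.m|product|12/03/60|15000
3456|tiwary|g.m|product|05/02/89|23000
5678|kumar|mgr|sales|01/01/65|15000'

# Unanchored: "g.m" is also found inside "d.g.m"
loose=$(printf '%s\n' "$data" | awk -F"|" '$3 ~ /g\.m/ { print $2 }')
printf '%s\n' "$loose"

# Anchored at both ends: the third field must be exactly "g.m"
exact=$(printf '%s\n' "$data" | awk -F"|" '$3 ~ /^g\.m$/ { print $2 }')
printf '%s\n' "$exact"
```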

The relational and regular expression matching operators used by awk

Operator   Significance
<          Less than
<=         Less than or equal to
==         Equal to
!=         Not equal to
>=         Greater than or equal to
>          Greater than
~          Matches a regular expression
!~         Doesn't match a regular expression

Number Processing

Awk uses the arithmetic operators +, -, *, /, and % (modulus). It also overcomes one of the major limitations of the shell: its inability to handle decimal numbers. You can use awk to print a pay-slip for the directors:
$ awk -F"|" '$3~/^dir./ {
> printf "%-20s %-12s,%d %d %d\n", $2,$3,$6,$6*0.4,$6*0.15}' e1.lst
akash dir.,
While awk has certain built-in variables, like NR and $0, it also permits users to define variables of their own. A user-defined variable has a special feature: no type declaration is needed, and it is initialized by default to zero or to a null string, depending on how it is used. Awk identifies the type of a variable from the context in which it appears.
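A small sketch of decimal arithmetic in awk, using one hypothetical record (invented values). The 0.4 and 0.15 multipliers come from the slide; the %.2f format is an assumption made here to show the fractional part, where the slide's %d would truncate it.

```shell
# Multiply the salary field by decimal factors and print with two decimals
slip=$(printf '7898|akash|dir.|mark.|11/06/70|9000\n' |
    awk -F"|" '{ printf "%s %d %.2f %.2f\n", $2, $6, $6 * 0.4, $6 * 0.15 }')
printf '%s\n' "$slip"
```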

$ awk -F"|" '$6>=15000 {
> cnt = cnt+1
> print cnt,$2,$3,$6}' e1.lst
1 shukla g.m sharma d.g.m tiwary g.m kumar mgr 15000

THE -f OPTION

Awk offers the -f option to take the program from the file named after the option:
$ cat q1.awk
$6>=15000 { print ++count,$2,$3,$6}
$ awk -F"|" -f q1.awk e1.lst
1 shukla g.m sharma d.g.m tiwary g.m kumar mgr 15000
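The same idea as a self-contained sketch: the awk program is written to a temporary file and loaded with -f. The records are hypothetical (invented values), and the variable names prog, data and out are illustrative.

```shell
# Hypothetical sample records
data='2233|shukla|g.m|sales|12/12/52|20000
9876|sharma|d.g.m|product|12/03/60|15000
7898|akash|dir.|mark.|11/06/70|9000
3456|tiwary|g.m|product|05/02/89|23000
5678|kumar|mgr|sales|01/01/65|15000'

# Store the awk program in a file; no shell quoting is needed inside it.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# count starts at zero without any declaration
$6 >= 15000 { print ++count, $2, $6 }
EOF

out=$(printf '%s\n' "$data" | awk -F"|" -f "$prog")
printf '%s\n' "$out"

rm -f "$prog"
```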

THE BEGIN AND END SECTIONS

If you need to print something before the first line is processed, for example a heading, use the BEGIN section. Similarly, if you want to print totals after processing is over, do it in the END section. Both sections are optional and take the form:
BEGIN {action}
END {action}
When present, the two sections are separated by the body of the awk program. Each uses a pair of curly braces to enclose its action. You can use these two sections to print a suitable heading at the beginning and the average salary at the end.

$ cat q2.awk
BEGIN {
    printf "\n\t\t EMPLOYEE ABSTRACT \n\n"
}
$6>15000 {        # "#" is used for comments
    count++; tot+=$6
    printf "%3d%-20s%-12s%d\n", count,$2,$3,$6
}
END {
    printf "\n\t The average basic pay is %6d\n", tot/count
}

$ awk -F"|" -f q2.awk e1.lst
EMPLOYEE ABSTRACT
1 shukla g.m tiwary g.m
The average basic pay is 21500
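A condensed sketch of the BEGIN, body and END structure on hypothetical records (invented values). The guard against a zero count in END is an addition made here, not part of q2.awk; the variable name report is illustrative.

```shell
# Hypothetical sample records
data='2233|shukla|g.m|sales|12/12/52|20000
9876|sharma|d.g.m|product|12/03/60|15000
7898|akash|dir.|mark.|11/06/70|9000
3456|tiwary|g.m|product|05/02/89|23000
5678|kumar|mgr|sales|01/01/65|15000'

report=$(printf '%s\n' "$data" | awk -F"|" '
BEGIN { print "EMPLOYEE ABSTRACT" }            # runs once, before any input
$6 > 15000 {                                   # runs for each matching record
    count++; tot += $6
    printf "%d %s %d\n", count, $2, $6
}
END {                                          # runs once, after all input
    if (count > 0) printf "average %d\n", tot / count
}')
printf '%s\n' "$report"
```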

Positional Parameters

The program q1.awk could take a more general form if the number is replaced with a variable. To do that, the entire awk command (not just the program) should be stored in a shell script, and the value supplied as an argument to the script. This argument is then compared with the field. Such arguments are known as positional parameters, and the shell identifies them as $1, $2, $3, etc., in the order in which they appear on the command line. Because awk uses the same notation for fields, a positional parameter used in an awk program must be set off with its own single quotes: the quotes close the awk program just before the parameter and reopen it just after, so that the shell expands the parameter while awk still interprets $6 as a field identifier.

$ cat q1.awk
awk -F"|" '$6>='$1' { print $2,$3,$6}' e1.lst
$ q1.awk 15000
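An alternative to splicing the shell's $1 into the program text is awk's -v option, which assigns a value to an awk variable before the program runs. This is a different technique from the one in the slide, sketched here with a hypothetical shell function above, a hypothetical awk variable min, and invented sample records.

```shell
# Hypothetical sample records
data='2233|shukla|g.m|sales|12/12/52|20000
9876|sharma|d.g.m|product|12/03/60|15000
7898|akash|dir.|mark.|11/06/70|9000
3456|tiwary|g.m|product|05/02/89|23000'

above() {
    # "$1" is the shell function's positional parameter; -v copies it into
    # the awk variable min, so inside the quotes $6 can only mean a field.
    # Adding 0 forces a numeric comparison.
    awk -F"|" -v min="$1" '$6 >= min + 0 { print $2, $6 }'
}

high=$(printf '%s\n' "$data" | above 20000)
printf '%s\n' "$high"
```

Keeping the whole program inside one pair of single quotes avoids the quoting errors that are easy to make when the quotes are closed and reopened around a shell parameter.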

BUILT-IN VARIABLES

Variable   Function
NR         Cumulative number of records read
FS         The input field separator
OFS        The output field separator
NF         Number of fields in the current record
FILENAME   The current input file
ARGC       Number of arguments in the command line
ARGV       The list of arguments

NR stores the record number of the current line. FS defines the input field separator and is an alternative to the -F option of the command. When FS is set inside the program, the assignment should be placed in the BEGIN section so that the body of the program knows its value before it starts processing. The default output field separator, a space, can be reassigned using the variable OFS in the BEGIN section. Ex:
$ awk 'BEGIN {FS="|";OFS="~"} $6>15000 {print $1,$2,$3,$6}' e1.lst
2233 ~ shukla ~ g.m ~ ~ tiwary ~g.m ~23000
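A one-record sketch of FS and OFS set in the BEGIN section. The record is hypothetical (unpadded fields), and the variable name out is illustrative.

```shell
# FS splits the input on "|"; OFS is what print puts between comma-separated items
out=$(printf '3456|tiwary|g.m|product|05/02/89|23000\n' |
    awk 'BEGIN { FS = "|"; OFS = "~" } { print $1, $2, $3, $6 }')
printf '%s\n' "$out"
```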

NF is useful for cleaning up a database that contains records with the wrong number of fields. Ex: to locate the records that do not have 6 fields and have crept in due to faulty data entry:
$ awk 'BEGIN {FS="|"}
> NF!=6 {
> print "record no ",NR," has ",NF, " fields"}' emp.lst
FILENAME stores the name of the file currently being processed. By default, awk doesn't print the filename, but you can instruct it to do so:
$ awk -F "|" '$6<15000 {print FILENAME,$0}' e1.lst
e1.lst 7898 | akash |dir. |mark. | 11/06/70 |9000
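The NF check can be tried on three made-up records, the second of which is deliberately short. The record contents and the variable name bad are hypothetical.

```shell
# Report every record whose field count differs from 6
bad=$(printf '%s\n' 'a|b|c|d|e|f' 'a|b|c' 'a|b|c|d|e|f' |
    awk 'BEGIN { FS = "|" } NF != 6 { print "record no", NR, "has", NF, "fields" }')
printf '%s\n' "$bad"
```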

When an awk program is used within a shell script, you can arrange to pass parameters to the script. The array ARGV[ ] stores the entire list of arguments, and the number of such arguments is stored in the variable ARGC. Ex:
$ empfind.awk 3500 7000 director
Then ARGC takes the value 4, while the array ARGV[ ] is filled up with the words in the command line:
ARGV[0] = empfind.awk
ARGV[1] = 3500
ARGV[2] = 7000
ARGV[3] = director
(Many awk implementations set ARGV[0] to the name of the awk command itself rather than the script name, so check this value on your system.)
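A minimal sketch that prints ARGC and the arguments from index 1 upward. It runs only a BEGIN section, so awk does not try to open the arguments as input files. The argument values are the ones listed in the slide; the variable names argc and args are illustrative, and ARGV[0] is left out because its value can differ between implementations.

```shell
# ARGC counts the command name plus the three arguments
argc=$(awk 'BEGIN { print ARGC }' 3500 7000 director)
printf '%s\n' "$argc"

# Walk the argument list, skipping ARGV[0]
args=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' 3500 7000 director)
printf '%s\n' "$args"
```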

FUNCTIONS

Awk has several built-in functions that perform both arithmetic and string operations. Parameters are passed to a function in C style, delimited by commas and enclosed by a matched pair of parentheses.

Built-in functions in awk

Function           Description
int(x)             Returns the integer value of x
sqrt(x)            Returns the square root of x
index(s1,s2)       Returns the position of the string s2 in the string s1
length( )          Returns the length of the argument (of the complete record if no argument is given)
substr(s1,s2,s3)   Returns the portion of the string s1 of length s3, starting from position s2
split(s,a)         Splits the string s into the array a; returns the number of fields
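The functions in the table can be exercised in a single BEGIN section. The argument values are arbitrary illustrations, and split is called here with an explicit third argument (the separator), which is a standard optional form not shown in the table. The variable name out is illustrative.

```shell
out=$(awk 'BEGIN {
    n = split("12/03/60", parts, "/")    # n is the number of pieces
    print int(7.9), sqrt(16), index("product", "duct"), length("sharma"), substr("director", 1, 3), n, parts[3]
}')
printf '%s\n' "$out"
```

String positions in index and substr are counted from 1, not from 0.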

Control flow – THE if statement

The condition of the if statement must be enclosed in parentheses.
$ awk -F"|" '{ if ($6 >15000) print($2,$6)}' e1.lst
shukla tiwary
$ awk -F"|" '{ if ($6 >15000)
> commission = 0.15*$6
> else
> commission = 0.10*$6 }
> {print ($2,$6,commission)}' e1.lst
shukla sharma akash tiwary kumar
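A runnable sketch of the if/else form on two hypothetical records (invented values). The 0.15 and 0.10 rates come from the slide; the variable name out is illustrative.

```shell
out=$(printf '%s\n' '2233|shukla|g.m|sales|12/12/52|20000' '7898|akash|dir.|mark.|11/06/70|9000' |
    awk -F"|" '{
        # The condition is parenthesized; each branch is a single statement
        if ($6 > 15000)
            commission = 0.15 * $6
        else
            commission = 0.10 * $6
        print $2, $6, commission
    }')
printf '%s\n' "$out"
```

When the if and else branches are written on one line, a semicolon or newline is needed before else, which is why the branches are placed on separate lines here.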