Review of Awk Principles

Slides:



Advertisements
Similar presentations
CST8177 awk. The awk program is not named after the sea-bird (that's auk), nor is it a cry from a parrot (awwwk!). It's the initials of the authors, Aho,
Advertisements

Introduction to C Programming
1 Unix Talk #2 AWK overview Patterns and actions Records and fields Print vs. printf.
ISBN Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl –(on reserve.
2000 Copyrights, Danielle S. Lahmani UNIX Tools G , Fall 2000 Danielle S. Lahmani Lecture 6.
1 Chapter 9 - Formatted Input/Output Outline 9.1Introduction 9.2Streams 9.3Formatting Output with printf 9.4Printing Integers 9.5Printing Floating-Point.
Lecture 5 sed and awk. Last week Regular Expressions –grep (BRE) –egrep (ERE) Sed - Part I.
 2008 Pearson Education, Inc. All rights reserved JavaScript: Introduction to Scripting.
Linux+ Guide to Linux Certification, Second Edition
Lecture 5 sed and awk. Last week Regular Expressions –grep –egrep.
Chapter 9 Formatted Input/Output Acknowledgment The notes are adapted from those provided by Deitel & Associates, Inc. and Pearson Education Inc.
AWK: The Duct Tape of Computer Science Research Tim Sherwood UC San Diego.
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved. 1 Chapter 9 - Formatted Input/Output Outline 9.1Introduction.
Guide To UNIX Using Linux Third Edition
Introduction to C Programming
Sed and awk.
Shell Scripting Awk (part1) Awk Programming Language standard unix language that is geared for text processing and creating formatted reports but it.
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved Streams Streams –Sequences of characters organized.
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved. 1 Chapter 9 - Formatted Input/Output Outline 9.1Introduction.
Chapter 9 Formatted Input/Output. Objectives In this chapter, you will learn: –To understand input and output streams. –To be able to use all print formatting.
Lists in Python.
Week 7 Working with the BASH Shell. Objectives  Redirect the input and output of a command  Identify and manipulate common shell environment variables.
Agenda Sed Utility - Advanced –Using Script-files / Example Awk Utility - Advanced –Using Script-files –Math calculations / Operators / Functions –Floating.
Chapter 5: Advanced Editors awk, sed, tr, cut. Objectives: After studying this lesson, you should be able to: –awk: a pattern scanning and processing.
CIS 218 Advanced UNIX1 CIS 218 – Advanced UNIX (g)awk.
Input, Output, and Processing
CS 403: Programming Languages Fall 2004 Department of Computer Science University of Alabama Joel Jones.
Shell Script Programming. 2 Using UNIX Shell Scripts Unlike high-level language programs, shell scripts do not have to be converted into machine language.
Linux+ Guide to Linux Certification, Third Edition
Introduction to Awk Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.
Programmable Text Processing with awk Lecturer: Prof. Andrzej (AJ) Bieszczad Phone: “UNIX for Programmers and Users”
Agenda Regular Expressions (Appendix A in Text) –Definition / Purpose –Commands that Use Regular Expressions –Using Regular Expressions –Using the Replacement.
Constants Numeric Constants Integer Constants Floating Point Constants Character Constants Expressions Arithmetic Operators Assignment Operators Relational.
Awk Dr. Tran, Van Hoai Faculty of Computer Science and Engineering HCMC Uni. of Technology
1Computer Sciences Department Princess Nourah bint Abdulrahman University.
Introduction to Unix – CS 21 Lecture 12. Lecture Overview A few more bash programming tricks The here document Trapping signals in bash cut and tr sed.
Chapter 12: gawk Yes it sounds funny. In this chapter … Intro Patterns Actions Control Structures Putting it all together.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 2 Input, Processing, and Output.
13 More Advanced Awk Mauro Jaskelioff (originally by Gail Hopkins)
Revision Lecture Mauro Jaskelioff. AWK Program Structure AWK programs consists of patterns and procedures Pattern_1 { Procedure_1} Pattern_2 { Procedure_2}
BY A Mikati & M Shaito Awk Utility n Introduction n Some basics n Some samples n Patterns & Actions Regular Expressions n Boolean n start /end n.
1 P51UST: Unix and Software Tools Unix and Software Tools (P51UST) Awk Programming (2) Ruibin Bai (Room AB326) Division of Computer Science The University.
Time to talk about your class projects!. Shell Scripting Awk (lecture 2)
1 LAB 4 Working with Trace Files using AWK. 2 Structure of Trace File.
1 Lecture 9 Shell Programming – Command substitution Regular expressions and grep Use of exit, for loop and expr commands COP 3353 Introduction to UNIX.
Chapter Six Introduction to Shell Script Programming.
CSCI 330 UNIX and Network Programming
Awk- An Advanced Filter by Prof. Shylaja S S Head of the Dept. Dept. of Information Science & Engineering, P.E.S Institute of Technology, Bangalore
 2008 Pearson Education, Inc. All rights reserved JavaScript: Introduction to Scripting.
CSC 1010 Programming for All Lecture 3 Useful Python Elements for Designing Programs Some material based on material from Marty Stepp, Instructor, University.
© Janice Regan, CMPT 102, Sept CMPT 102 Introduction to Scientific Computer Programming Input and Output.
1 P51UST: Unix and Software Tools Unix and Software Tools (P51UST) Awk Programming Ruibin Bai (Room AB326) Division of Computer Science The University.
CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 1 AWK q A programming language for handling common data manipulation tasks with only a few lines of.
The awk command. Introduction Awk is a programming language used for manipulating data and generating reports. The data may come from standard input,
Operating System Discussion Section. The Basics of C Reference: Lecture note 2 and 3 notes.html.
Sed. Class Issues vSphere Issues – root only until lab 3.
1 Lecture 10 Introduction to AWK COP 3344 Introduction to UNIX.
Linux+ Guide to Linux Certification, Second Edition
CS 403: Programming Languages Lecture 20 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
Awk- An Advanced Filter by Prof. Shylaja S S Head of the Dept. Dept. of Information Science & Engineering, P.E.S Institute of Technology, Bangalore
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 10/9/2006 Lecture 6 – String Processing.
Awk Programmable Filters 1.
Arun Vishwanathan Nevis Networks Pvt. Ltd.
Lesson 5-Exploring Utilities
CSC 4630 Meeting 7 February 7, 2007.
PROGRAMMING THE BASH SHELL PART IV by İlker Korkmaz and Kaya Oğuz
John Carelli, Instructor Kutztown University
PHP.
Sed and awk.
Presentation transcript:

Review of Awk Principles Awk’s purpose: to give Unix a general purpose programming language that handles text (strings) as easily as numbers This makes Awk one of the most powerful of the Unix utilities Awk process fields while ed/sed process lines nawk (new awk) is the new standard for Awk Designed to facilitate large awk programs Awk gets it’s input from files redirection and pipes directly from standard input Remember, Awk is a filter so it can get input from: command line file arguments redirection pipes standard input (terminal-keyboard) AND can output the same way into a pipe redirected into a file to standard output (screen)

History Originally designed/implemented in 1977 by Al Aho, Peter Weinberger, and Brian Kernigan In part as an experiment to see how grep and sed could be generalized to deal with numbers as well as text Originally intended for very short programs But people started using it and the programs kept getting bigger and bigger! In 1985, new awk, or nawk, was written to add enhancements to facilitate larger program development Major new feature is user defined functions

Other enhancements in nawk include: Dynamic regular expressions Text substitution and pattern matching functions Additional built-in functions and variables New operators and statements Input from more than one file Access to command line arguments nawk also improved error messages which makes debugging considerably easier under nawk than awk On most systems, nawk has replaced awk On ours, both exist Lets edit our .cshrc file to add an alias alias awk nawk

Running an AWK Program There are several ways to run an Awk program awk ‘program’ input_file(s) program and input files are provided as command-line arguments awk ‘program’ program is a command-line argument; input is taken from standard input (yes, awk is a filter!) awk -f program_file_name input_files program is read from a file

Awk as a Filter Since Awk is a filter, you can also use pipes with other filters to massage its output even further Suppose you want to print the data for each employee along with their pay and have it sorted in order of increasing pay awk ‘{ printf(“%6.2f %s\n”, $2 * $3, $0) }’ emp.data | sort

Errors If you make an error, Awk will provide a diagnostic error message awk '$3 == 0 [ print $1 }' emp.data awk: syntax error near line 1 awk: bailing out near line 1 Or if you are using nawk nawk '$3 == 0 [ print $1 }' emp.data nawk: syntax error at source line 1 context is $3 == 0 >>> [ <<< 1 extra } 1 extra [ nawk: bailing out at source line 1

Structure of an AWK Program An Awk program consists of: An optional BEGIN segment For processing to execute prior to reading input pattern - action pairs Processing for input data For each pattern matched, the corresponding action is taken An optional END segment Processing after end of input data BEGIN{action} pattern {action} . pattern { action} END {action} BEGIN and END - two special patterns provided by Awk Both are optional and having one does not imply that you have to have the other BEGIN executes its actions prior to reading any of the input file(s) END executes its actions after the last record of the input file(s) has been read After the optional BEGIN is handled, Awk reads the first line of input, checks each of its pattern-action pairs against it sequentially, executing any that match Awk then reads the next line of input and begins over again with the first pattern-action pair When the script is provided on the command line, the pattern-actions are enclosed in ‘ ‘ to protect them from the shell

BEGIN and END Special pattern BEGIN matches before the first input line is read; END matches after the last input line has been read This allows for initial and wrap-up processing BEGIN { print “NAME RATE HOURS”; print “” } { print } END { print “total number of employees is”, NR } { print “” } prints a blank line vs { print } which prints current record

Pattern-Action Pairs Both are optional, but one or the other is required Default pattern is match every record Default action is print record Patterns BEGIN and END expressions $3 < 100 $4 == “Asia” string-matching /regex/ - /^.*$/ string - abc matches the first occurrence of regex or string in the record

compound $3 < 100 && $4 == “Asia” && is a logical AND || is a logical OR range NR == 10, NR == 20 matches records 10 through 20 inclusive Patterns can take any of these forms and for /regex/ and string patterns will match the first instance in the record

Selection Awk patterns are good for selecting specific lines from the input for further processing Selection by Comparison $2 >=5 { print } Selection by Computation $2 * $3 > 50 { printf(“%6.2f for %s\n”, $2 * $3, $1) } Selection by Text Content $1 == “Susie” /Susie/ Combinations of Patterns $2 >= 4 || $3 >= 20 Prints all lines where hourly pay is >= to 5 Prints all lines where total pay is > 50 Prints all lines where first field is “Susie” Prints all lines having “Susie” anywhere in line Prints all lines where hourly pay >= to 4 or total pay >= 20 Can use && for AND

Data Validation Validating data is a common operation Awk is excellent at data validation NF != 3 { print $0, “number of fields not equal to 3” } $2 < 3.35 { print $0, “rate is below minimum wage” } $2 > 10 { print $0, “rate exceeds $10 per hour” } $3 < 0 { print $0, “negative hours worked” } $3 > 60 { print $0, “too many hours worked” }

Regular Expressions in Awk Awk uses the same regular expressions we’ve been using ^ $ - beginning of/end of field . - any character [abcd] - character class [^abcd] - negated character class [a-z] - range of characters (regex1|regex2) - alternation * - zero or more occurrences of preceding expression + - one or more occurrences of preceding expression ? - zero or one occurrence of preceding expression NOTE: the min max {m, n} or variations {m}, {m,} syntax is NOT supported

Awk Variables $0, $1, $2, … ,$NF NR - Number of records read FNR - Number of records read from current file NF - Number of fields in current record FILENAME - name of current input file FS - Field separator, space or TAB by default OFS - Output field separator, space by default ARGC/ARGV - Argument Count, Argument Value array Used to get arguments from the command line FS and OFS are good things to set in the BEGIN pattern-action pair if you don’t want the default

Arrays Awk provides arrays for storing groups of related data values # reverse - print input in reverse order by line { line[NR] = $0 } # remember each line END { i = NR # print lines in reverse order while (i > 0) { print line[i] i = i - 1 }

Operators = assignment operator; sets a variable equal to a value or string == equality operator; returns TRUE is both sides are equal != inverse equality operator && logical AND || logical OR ! logical NOT <, >, <=, >= relational operators +, -, /, *, %, ^ String concatenation

Control Flow Statements Awk provides several control flow statements for making decisions and writing loops If-Else if (expression is true or non-zero){ statement1 } else { statement2 where statement1 and/or statement2 can be multiple statements enclosed in curly braces { }s the else and associated statement2 are optional Computes the total and average pay of employees making more than $6/hour and uses the if to defend against a divide by zero error

Loop Control While while (expression is true or non-zero) { statement1 } uses formula value = amount(1 + rate) ^ years ^ is the exponentiation operator \t is a TAB # is a comment symbol, every thing following is ignored by Awk Get into the habit of commenting your scripts, I will require it and it will help you remember what you did a week or a month later

For for(expression1; expression2; expression3) { statement1 } This has the same effect as: expression1 while (expression2) { expression3 for(;;) is an infinite loop

Do While do { statement1 } while (expression)

Computing with AWK Counting is easy to do with Awk $3 > 15 { emp = emp + 1} END { print emp, “employees worked more than 15 hrs”} Computing Sums and Averages is also simple { pay = pay + $2 * $3 } END { print NR, “employees” print “total pay is”, pay print “average pay is”, pay/NR } {pay = pay + $2 * $3 } accumulates total employee pay END begins processing after all records have been processed print NR, “employees” prints the number of records (employees) print “total pay is”, pay prints total pay print “average pay is”, pay/NR prints the average Total pay / number of employees

Handling Text One major advantage of Awk is its ability to handle strings as easily as many languages handle numbers Awk variables can hold strings of characters as well as numbers, and Awk conveniently translates back and forth as needed This program finds the employee who is paid the most per hour $2 > maxrate { maxrate = $2; maxemp = $1 } END { print “highest hourly rate:”, maxrate, “for”, maxemp }

Printing the Last Input Line String Concatenation New strings can be created by combining old ones { names = names $1 “ “ } END { print names } Printing the Last Input Line Although NR retains its value after the last input line has been read, $0 does not { last = $0 } END { print last } This program combines all the employee names into one string by appending each name and a blank to the previous value in the variable name The concatenation operation in Awk is represented by writing string values one after another

Command Line Arguments Accessed via built-ins ARGC and ARGV ARGC is set to the number of command line arguments ARGV[ ] contains each of the arguments For the command line awk ‘script’ filename ARGC == 2 ARGV[0] == “awk” ARGV[1] == “filename the script is not considered an argument

ARGC and ARGV can be used like any other variable They can be assigned, compared, used in expressions, printed They are commonly used for verifying that the correct number of arguments were provided So, to check for the proper number of arguments in an awk script BEGIN {if (ARGC != 2) print “I need a filename for this script”; exit}

ARGC/ARGV in Action #argv.awk – get a cmd line argument and display BEGIN {if(ARGC != 2) {print "Not enough arguments!"} else {print "Good evening,", ARGV[1]} }

BEGIN {if(ARGC != 3) {print "Not enough arguments!" print "Usage is awk -f script in_file field_separator" exit} else {FS=ARGV[2] delete ARGV[2]} } $1 ~ /..3/ {print $1 "'s name in real life is", $5; ++nr} END {print; print "There are", nr, "students registered in your class."}

getline How do you get input into your awk script other than on the command line? The getline function provides input capabilities getline is used to read input from either the current input or from a file or pipe getline returns 1 if a record was present, 0 if an end-of-file was encountered, and –1 if some error occurred

getline Function Expression Sets getline $0, NF, NR, FNR getline var var, NR, FNR getline <"file" $0, NF getline var <"file" var "cmd" | getline "cmd" | getline var

getline from stdin #getline.awk - demonstrate the getline function BEGIN {print "What is your first name and major? " while (getline > 0) print "Hi", $1 ", your major is", $2 "." } while (getline > 0) – since getline returns –1 on failure, this ensures you don't go into an endless loop if the getline fails for some reason, such as using a non-existant file

getline From a File #getline1.awk - demo getline with a file BEGIN {while (getline <"emp.data" >0) print $0}

getline From a Pipe #getline2.awk - show using getline with a pipe BEGIN {{while ("who" | getline) nr++} print "There are", nr, "people logged on clyde right now."}

Simple Output From AWK Printing Every Line Printing Certain Fields If an action has no pattern, the action is performed for all input lines { print } will print all input lines on stdout { print $0 } will do the same thing Printing Certain Fields Multiple items can be printed on the same output line with a single print statement { print $1, $3 } Expressions separated by a comma are, by default, separated by a single space when output

Computing and Printing NF, the Number of Fields Any valid expression can be used after a $ to indicate a particular field One built-in expression is NF, or Number of Fields { print NF, $1, $NF } will print the number of fields, the first field, and the last field in the current record Computing and Printing You can also do computations on the field values and include the results in your output { print $1, $2 * $3 }

Putting Text in the Output Printing Line Numbers The built-in variable NR can be used to print line numbers { print NR, $0 } will print each line prefixed with its line number Putting Text in the Output You can also add other text to the output besides what is in the current record { print “total pay for”, $1, “is”, $2 * $3 } Note that the inserted text needs to be surrounded by double quotes

Formatted Output printf provides formatted output Syntax is printf(“format string”, var1, var2, ….) Format specifiers %c – single character %d - number %f - floating point number %s - string \n - NEWLINE \t - TAB Format modifiers - left justify in column n column width .n number of decimal places to print

printf Examples printf(“I have %d %s\n”, how_many, animal_type) format a number (%d) followed by a string (%s) printf(“%-10s has $%6.2f in their account\n”, name, amount) prints a left justified string in a 10 character wide field and a float with 2 decimal places in a six character wide field printf(“%10s %-4.2f %-6d\n”, name, interest_rate, account_number > "account_rates") prints a right justified string in a 10 character wide field, a left justified float with 2 decimal places in a 4 digit wide field and a left justified decimal number in a 6 digit wide field to a file printf(“\t%d\t%d\t%6.2f\t%s\n”, id_no, age, balance, name >> "account") appends a TAB separated number, number, 6.2 float and a string to a file prints number and a string prints a left justified string in a 10 character wide field and a float with 2 decimal places in a six character wide field prints a right justified string in a 10 character wide field, a left justified float with 2 decimal places in a 4 digit wide field and a left justified decimal number in a 6 digit wide field prints a TAB separated number, number, 6.2 float and a string

Built-In Functions Arithmetic String Output Special sin, cos, atan, exp, int, log, rand, sqrt String length, substitution, find substrings, split strings Output print, printf, print and printf to file Special system - executes a Unix command system(“clear”) to clear the screen Note double quotes around the Unix command exit - stop reading input and go immediately to the END pattern-action pair if it exists, otherwise exit the script

Built-In Arithmetic Functions Return Value atan2(y,x) arctangent of y/x (-p to p) cos(x) cosine of x, with x in radians sin(x) sine of x, with x in radians exp(x) exponential of x, ex int(x) integer part of x log(x) natural (base e) logarithm of x rand() random number between 0 and 1 srand(x) new seed for rand() sqrt(x) square root of x

Built-In String Functions Description gsub(r, s) substitute s for r globally in $0, return number of substitutions made gsub(r, s, t) substitute s for r globally in string t, return number of substitutions made index(s, t) return first position of string t in s, or 0 if t is not present length(s) return number of characters in s match(s, r) test whether s contains a substring matched by r, return index or 0 sprint(fmt, expr-list) return expr-list formatted according to format string fmt

Built-In String Functions Description split(s, a) split s into array a on FS, return number of fields split(s, a, fs) split s into array a on field separator fs, return number of fields sub(r, s) substitute s for the leftmost longest substring of $0 matched by r sub(r, s, t) substitute s for the leftmost longest substring of t matched by r substr(s, p) return suffix of s starting at position p substr(s, p, n) return substring of s of length n starting at position p