CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 1 AWK q A programming language for handling common data manipulation tasks with only a few lines of.

Slides:



Advertisements
Similar presentations
CST8177 awk. The awk program is not named after the sea-bird (that's auk), nor is it a cry from a parrot (awwwk!). It's the initials of the authors, Aho,
Advertisements

Lecture 2 Introduction to C Programming
Introduction to C Programming
1 Unix Talk #2 AWK overview Patterns and actions Records and fields Print vs. printf.
ISBN Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl –(on reserve.
2000 Copyrights, Danielle S. Lahmani UNIX Tools G , Fall 2000 Danielle S. Lahmani Lecture 6.
Lecture 5 sed and awk. Last week Regular Expressions –grep (BRE) –egrep (ERE) Sed - Part I.
 2008 Pearson Education, Inc. All rights reserved JavaScript: Introduction to Scripting.
Linux+ Guide to Linux Certification, Second Edition
Lecture 5 Awk and Shell. Sed Drawbacks Hard to remember text from one line to another Not possible to go backward in the file No way to do forward references.
 2007 Pearson Education, Inc. All rights reserved Introduction to C Programming.
Lecture 5 sed and awk. Last week Regular Expressions –grep –egrep.
AWK: The Duct Tape of Computer Science Research Tim Sherwood UC San Diego.
Guide To UNIX Using Linux Third Edition
Introduction to C Programming
Introduction to Unix (CA263) Introduction to Shell Script Programming By Tariq Ibn Aziz.
Sed and awk.
Shell Scripting Awk (part1) Awk Programming Language standard unix language that is geared for text processing and creating formatted reports but it.
© Copyright 1992–2004 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved Streams Streams –Sequences of characters organized.
Chapter 9 Formatted Input/Output. Objectives In this chapter, you will learn: –To understand input and output streams. –To be able to use all print formatting.
Introduction to Shell Script Programming
Lists in Python.
Week 7 Working with the BASH Shell. Objectives  Redirect the input and output of a command  Identify and manipulate common shell environment variables.
Agenda Sed Utility - Advanced –Using Script-files / Example Awk Utility - Advanced –Using Script-files –Math calculations / Operators / Functions –Floating.
CIS 218 Advanced UNIX1 CIS 218 – Advanced UNIX (g)awk.
Input, Output, and Processing
CS 403: Programming Languages Fall 2004 Department of Computer Science University of Alabama Joel Jones.
Shell Script Programming. 2 Using UNIX Shell Scripts Unlike high-level language programs, shell scripts do not have to be converted into machine language.
Linux+ Guide to Linux Certification, Third Edition
Chapter 8: Arrays.
Introduction to Awk Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.
Programmable Text Processing with awk Lecturer: Prof. Andrzej (AJ) Bieszczad Phone: “UNIX for Programmers and Users”
Agenda Regular Expressions (Appendix A in Text) –Definition / Purpose –Commands that Use Regular Expressions –Using Regular Expressions –Using the Replacement.
Awk Dr. Tran, Van Hoai Faculty of Computer Science and Engineering HCMC Uni. of Technology
Introduction to Unix – CS 21 Lecture 12. Lecture Overview A few more bash programming tricks The here document Trapping signals in bash cut and tr sed.
Chapter 12: gawk Yes it sounds funny. In this chapter … Intro Patterns Actions Control Structures Putting it all together.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 2 Input, Processing, and Output.
BY A Mikati & M Shaito Awk Utility n Introduction n Some basics n Some samples n Patterns & Actions Regular Expressions n Boolean n start /end n.
Searching and Sorting. Why Use Data Files? There are many cases where the input to the program may come from a data file.Using data files in your programs.
1 LAB 4 Working with Trace Files using AWK. 2 Structure of Trace File.
Representing Strings and String I/O. Introduction A string is a sequence of characters and is treated as a single data item. A string constant, also termed.
1 Lecture 9 Shell Programming – Command substitution Regular expressions and grep Use of exit, for loop and expr commands COP 3353 Introduction to UNIX.
Chapter Six Introduction to Shell Script Programming.
CSCI 330 UNIX and Network Programming
Awk- An Advanced Filter by Prof. Shylaja S S Head of the Dept. Dept. of Information Science & Engineering, P.E.S Institute of Technology, Bangalore
 2008 Pearson Education, Inc. All rights reserved JavaScript: Introduction to Scripting.
CSC 1010 Programming for All Lecture 3 Useful Python Elements for Designing Programs Some material based on material from Marty Stepp, Instructor, University.
1 P51UST: Unix and Software Tools Unix and Software Tools (P51UST) Awk Programming Ruibin Bai (Room AB326) Division of Computer Science The University.
Review of Awk Principles
The awk command. Introduction Awk is a programming language used for manipulating data and generating reports. The data may come from standard input,
Operating System Discussion Section. The Basics of C Reference: Lecture note 2 and 3 notes.html.
Sed. Class Issues vSphere Issues – root only until lab 3.
1 Lecture 10 Introduction to AWK COP 3344 Introduction to UNIX.
Linux+ Guide to Linux Certification, Second Edition
CSCI 330 UNIX and Network Programming Unit IX: awk II.
CS 403: Programming Languages Lecture 20 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
Awk- An Advanced Filter by Prof. Shylaja S S Head of the Dept. Dept. of Information Science & Engineering, P.E.S Institute of Technology, Bangalore
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 10/9/2006 Lecture 6 – String Processing.
Awk Programmable Filters 1.
Arun Vishwanathan Nevis Networks Pvt. Ltd.
Lesson 5-Exploring Utilities
CSC 4630 Meeting 7 February 7, 2007.
Lecture 9 Shell Programming – Command substitution
PROGRAMMING THE BASH SHELL PART IV by İlker Korkmaz and Kaya Oğuz
CS 403: Programming Languages
John Carelli, Instructor Kutztown University
Sed and awk.
Awk.
Introduction to C Programming
Presentation transcript:

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 1 AWK q A programming language for handling common data manipulation tasks with only a few lines of program q Awk is a pattern action language q The language looks a little like C but automatically handles input, field splitting, initialization, and memory management u Built-in string and number data types u No variable type declarations q Awk is a great prototyping language u Start with a few lines and keep adding until it does what you want

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 2 History q Originally designed/implemented in 1977 by Al Aho, Peter Weinberger, and Brian Kernigan u In part as an experiment to see how grep and sed could be generalized to deal with numbers as well as text u Originally intended for very short programs u But people started using it and the programs kept getting bigger and bigger! q In 1985, new awk, or nawk, was written to add enhancements to facilitate larger program development u Major new feature is user defined functions

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 3 q Other enhancements in nawk include: u Dynamic regular expressions l Text substitution and pattern matching functions u Additional built-in functions and variables u New operators and statements u Input from more than one file u Access to command line arguments q nawk also improved error messages which makes debugging considerably easier under nawk than awk q On most systems, nawk has replaced awk u On ours, both exist

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 4 Tutorial q Program structure q Running an Awk program q Error messages q Output from Awk q Record selection q BEGIN and END q Number crunching q Handling text q Built-in functions q Control flow q Arrays

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 5 BEGIN pattern {action}. pattern { action} END Structure of an AWK Program q An Awk program consists of: u An optional BEGIN segment l For processing to execute prior to reading input u pattern - action pairs l Processing for input data l For each pattern matched, the corresponding action is taken u An optional END segment l Processing after end of input data

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 6 Pattern-Action Structure q Every program statement has to have a pattern, an action, or both q Default pattern is to match all lines q Default action is to print current record q Patterns are simply listed; actions are enclosed in { }s q Awk scans a sequence of input lines, or records, one by one, searching for lines that match the pattern u Meaning of match depends on the pattern u /Beth/ matches if the string “Beth” is in the record u $3 > 0 matches if the condition is true

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 7 Running an AWK Program q There are several ways to run an Awk program u awk ‘program’ input_file(s) l program and input files are provided as command- line arguments u awk ‘program’ l program is a command-line argument; input is taken from standard input (yes, awk is a filter!) u awk -f program_file_name input_files l program is read from a file

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 8 Errors q If you make an error, Awk will provide a diagnostic error message awk '$3 == 0 [ print $1 }' emp.data awk: syntax error near line 1 awk: bailing out near line 1 q Or if you are using nawk nawk '$3 == 0 [ print $1 }' emp.data nawk: syntax error at source line 1 context is $3 == 0 >>> [ <<< 1 extra } 1 extra [ nawk: bailing out at source line 1 1 extra } 1 extra [

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 9 Some of the Built-In Variables q NF - Number of fields in current record q NR - Number of records read so far q $0- Entire line q $n- Field n q $NF - Last field of current record

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 10 Simple Output From AWK q Printing Every Line u If an action has no pattern, the action is performed fo all input lines l { print } will print all input lines on stdout l { print $0 } will do the same thing q Printing Certain Fields u Multiple items can be printed on the same output line with a single print statement u { print $1, $3 } u Expressions separated by a comma are, by default, separated by a single space when output

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 11 q NF, the Number of Fields u Any valid expression can be used after a $ to indicate a particular field u One built-in expression is NF, or Number of Fields u { print NF, $1, $NF } will print the number of fields, the first field, and the last field in the current record q Computing and Printing u You can also do computations on the field values and include the results in your output u { print $1, $2 * $3 }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 12 q Printing Line Numbers u The built-in variable NR can be used to print line numbers u { print NR, $0 } will print each line prefixed with its line number q Putting Text in the Output u You can also add other text to the output besides what is in the current record u { print “total pay for”, $1, “is”, $2 * $3 } u Note that the inserted text needs to be surrounded by double quotes

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 13 Fancier Output q Lining Up Fields u Like C, Awk has a printf function for producing formatted output u printf has the form l printf( format, val1, val2, val3, … ) { printf(“total pay for %s is $%.2f\n”, $1, $2 * $3) } u When using printf, formatting is under your control so no automatic spaces or NEWLINEs are provided by Awk. You have to insert them yourself. { printf(“%-8s %6.2f\n”, $1, $2 * $3 ) }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 14 Awk as a Filter q Since Awk is a filter, you can also use pipes with other filters to massage its output even further q Suppose you want to print the data for each employee along with their pay and have it sorted in order of increasing pay awk ‘{ printf(“%6.2f %s\n”, $2 * $3, $0) }’ emp.data | sort

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 15 Selection q Awk patterns are good for selecting specific lines from the input for further processing q Selection by Comparison u $2 >=5 { print } q Selection by Computation u $2 * $3 > 50 { printf(“%6.2f for %s\n”, $2 * $3, $1) } q Selection by Text Content u $1 == “Susie” u /Susie/ q Combinations of Patterns u $2 >= 4 || $3 >= 20

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 16 Data Validation q Validating data is a common operation q Awk is excellent at data validation u NF != 3 { print $0, “number of fields not equal to 3” } u $2 < 3.35 { print $0, “rate is below minimum wage” } u $2 > 10 { print $0, “rate exceeds $10 per hour” } u $3 < 0 { print $0, “negative hours worked” } u $3 > 60 { print $0, “too many hours worked” }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 17 BEGIN and END q Special pattern BEGIN matches before the first input line is read; END matches after the last input line has been read q This allows for initial and wrap-up processing BEGIN { print “NAME RATE HOURS”; print “” } { print } END { print “total number of employees is”, NR }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 18 Computing with AWK q Counting is easy to do with Awk $3 > 15 { emp = emp + 1} END { print emp, “employees worked more than 15 hrs”} q Computing Sums and Averages is also simple { pay = pay + $2 * $3 } END { print NR, “employees” print “total pay is”, pay print “average pay is”, pay/NR }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 19 Handling Text q One major advantage of Awk is its ability to handle strings as easily as many languages handle numbers q Awk variables can hold strings of characters as well as numbers, and Awk conveniently translates back and forth as needed q This program finds the employee who is paid the most per hour $2 > maxrate { maxrate = $2; maxemp = $1 } END { print “highest hourly rate:”, maxrate, “for”, maxemp }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 20 q String Concatenation u New strings can be created by combining old ones { names = names $1 “ “ } END { print names } q Printing the Last Input Line u Although NR retains its value after the last input line has been read, $0 does not { last = $0 } END { print last }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 21 Built-in Functions q Awk contains a number of built-in functions. length is one of them. q Counting Lines, Words, and Characters using length ( a poor man’s wc ) { nc = nc + length($0) + 1 nw = nw + NF } END { print NR, “lines,”, nw, “words,”, nc, “characters” }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 22 Control Flow Statements q Awk provides several control flow statements for making decisions and writing loops q If-Else $2 > 6 { n = n + 1; pay = pay + $2 * $3 } END { if (n > 0) print n, “employees, total pay is”, pay, “average pay is”, pay/n else print “no employees are paid more than $6/hour” }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 23 Loop Control q While # interest1 - compute compound interest # input: amount rate years # output: compound value at end of each year { i = 1 while (i <= $3) { printf(“\t%.2f\n”, $1 * (1 + $2) ^ i) i = i + 1 }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 24 q For # interest2 - compute compound interest # input: amount rate years # output: compound value at end of each year { for (i = 1; i <= $3; i = i + 1) printf(“\t%.2f\n”, $1 * (1 + $2) ^ i) }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 25 Arrays q Awk provides arrays for storing groups of related data values # reverse - print input in reverse order by line { line[NR] = $0 } # remember each line END { i = NR# print lines in reverse order while (i > 0) { print line[i] i = i - 1 }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 26 Useful “One(or so)-liners” q END { print NR } q NR == 10 q { print $NF } q {field = $NF } END { print field } q NF > 4 q $NF > 4 q { nf = nf + NF } END { print nf }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 27 q /Beth/ { nlines = nlines + 1 } END { print nlines } q $1 > max { max = $1; maxline = $0 } END { print max, maxline } q NF > 0 q length($0) > 80 q { print NF, $0} q { print $2, $1 } q { temp = $1; $1 = $2; $2 = temp; print } q { $2 = “”; print }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 28 q { for (i = NF; i > 0; i = i - 1) printf(“%s “, $i) printf(“/n”) } q { sum = 0 for (i = 1; i <= NF; i = i + 1) sum = sum + $i print sum { q { for (i = 1; i <= NF; i = i + 1) sum = sum $i } END { print sum }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 29 Review of Awk Principles q Awk’s purpose: to give Unix a general purpose programming language that handles text (strings) as easily as numbers u This makes Awk one of the most powerful of the Unix utilities q Awk process fields while ed/sed process lines q nawk (new awk) is the new standard for Awk u Designed to facilitate large awk programs q Awk gets it’s input from u files u redirection and pipes u directly from standard input

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 30 BEGIN pattern {action}. pattern { action} END Structure of an AWK Program q An Awk program consists of: u An optional BEGIN segment l For processing to execute prior to reading input u pattern - action pairs l Processing for input data l For each pattern matched, the corresponding action is taken u An optional END segment l Processing after end of input data

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 31 Pattern-Action Pairs q Both are optional, but one or the other is required u Default pattern is match every record u Default action is print record q Patterns u BEGIN and END u expressions l $3 < 100 l $4 == “Asia” u string-matching l /regex/ - /^.*$/ l string - abc –matches the first occurrence of regex or string in the record

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 32 u compound l #3 < 100 && $4 == “Asia” –&& is a logical AND –|| is a logical OR u range l NR == 10, NR == 20 –matches records 10 through 20 inclusive q Patterns can take any of these forms and for /regex/ and string patterns will match the first instance in the record

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 33 Regular Expressions in Awk q Awk uses the same regular expressions we’ve been using u ^ $ - beginning of/end of line u. - any character u [abcd] - character class u [^abcd] - negated character class u [a-z] - range of characters u (regex1|regex2) - alternation u * - zero or more occurrences of preceding expression u + - one or more occurrences of preceding expression u ? - zero or one occurrence of preceding expression u NOTE: the min max {m, n} or variations {m}, {m,} syntax is NOT supported

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 34 Awk Variables q $0, $1, $2, $NF q NR - Number of records processed q FNR - Number of records processed in current file q NF - Number of fields in current record q FILENAME - name of current input file q FS - Field separator, space or TAB by default q OFS - Output field separator, space or TAB default q ARGC/ARGV - Argument Count, Argument Value array u Used to get arguments from the command line

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 35 Command Line Arguments q Accessed via built-ins ARGC and ARGV q ARGC is set to the number of command line arguments q ARGV[ ] contains each of the arguments u For the command line u awk ‘script’ filename l ARGC == 2 l ARGV[0] == “awk” l ARGV[1] == “filename l the script is not considered an argument

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 36 q ARGC and ARGV can be used like any other variable q The can be assigned, compared, used in expressions, printed q They are commonly used for verifying that the correct number of arguments were provided

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 37 Operators q = assignment operator; sets a variable equal to a value or string q == equality operator; returns TRUE is both sides are equal q != inverse equality operator q && logical AND q || logical OR q ! logical NOT q, = relational operators q +, -, /, *, %, ^ q String concatenation

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 38 Control Flow Statements q Awk provides several control flow statements for making decisions and writing loops q If-Else if (expression is true or non-zero){ statement1 } else { statement2 } where statement1 and/or statement2 can be multiple statements enclosed in curly braces { }s u the else and associated statement2 are optional

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 39 Loop Control q While while (expression is true or non-zero) { statement1 }

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 40 q For for(expression1; expression2; expression3) { statement1 } u This has the same effect as: expression1 while (expression2) { statement1 expression3 } u for(;;) is an infinite loop

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 41 q Do While do { statement1 } while (expression)

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 42 Built-In Functions q Arithmetic u sin, cos, atan, exp, int, log, rand, sqrt q String u length, substitution, find substrings, split strings q Output u print, printf, print and printf to file q Special u system - executes a Unix command l system(“clear”) to clear the screen l Note double quotes around the Unix command u exit - stop reading input and go immediately to the END pattern-action pair if it exists, otherwise exit the script

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 43 Formatted Output q printf provides formatted output q Syntax is printf(“format string”, var1, var2, ….) q Format specifiers u %d - decimal number u %f - floating point number u %s - string u \n - NEWLINE u \t - TAB q Format modifiers u - left justify in column u n column width u.n number of decimal places to print

CISC 1480/KRF Copyright © 1999 by Kenneth R. Frazer 44 printf Examples q printf(“I have %d %s\n”, how_many, animal_type) q printf(“%-10s has $%6.2f in their account\n”, name, amount) q printf(“%10s %-4.2f %-6d\n”, name, interest_rate, account_number) q printf(“\t%d\t%d\t%6.2f\t%s\n”, id_no, age, balance, name)