AWK: The Duct Tape of Computer Science Research
Tim Sherwood, UC San Diego



AWK - Sherwood 2
Duct Tape
- Research environment
  - Lots of simulators, data, and analysis tools
  - Since it is research, nothing works together
- Unix pipes are the ducts
- Awk is the duct tape
  - It’s not the “best” way to connect everything
    - Maintaining anything complicated is problematic
  - It is a good way of getting it to work quickly
    - In research, most stuff doesn’t work anyway
  - Really good at some common problems

Goals
- My goals for this talk
  - Introduce the Awk language
  - Demonstrate how it has been useful
  - Discuss the limits / pitfalls
  - Eat some pizza
- What this talk is not
  - A promotion of all-awk all-the-time (tools)
  - A perl vs. awk

Outline
- Background
- Applications
- Programming in awk
- Examples
- Other tools that play nice
- Summary and Pointers

Background
- Developed by Aho, Weinberger, and Kernighan
- Further extended at Bell Labs
- Further extended in Gawk
- Developed to handle simple data-reformatting jobs easily with just a few lines of code
- C-like syntax
  - The K in Awk is the K in K&R
  - Easy learning curve

Applications
- Smart grep
  - All the functionality of grep with added logical and numerical abilities
- File conversion
  - Quickly write format converters for text files
- Spreadsheet
  - Easy use of columns and rows
- Graphing/tables/tex
- Gluing pipes
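The "smart grep" and "spreadsheet" uses can be sketched in one line. This is a minimal illustration, not taken from the talk: the names and scores are made-up sample data, and plain `awk` is assumed in place of `gawk`.

```shell
# Select rows whose second column is at least 70 (a numeric test grep
# cannot express), then report how many matched and their average.
out=$(printf 'alice 90\nbob 55\ncarol 72\n' |
    awk '$2 >= 70 { n++; sum += $2 } END { print n, sum / n }')
echo "$out"
```

The pattern does the selecting and the `END` rule does the column arithmetic, so no separate tool is needed for either step.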

Running Awk
- Two ways to run it
- From the command line
      cat file | gawk '(pattern){action}'
  - Or you can call gawk with the file name
- From a script (recommended)
      #!/usr/bin/gawk -f
      # This is a comment
      (pattern) {action}
      …
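The invocation styles can be compared side by side. A rough sketch, assuming plain `awk` rather than `gawk`; the temporary files and their contents are invented for illustration.

```shell
# Hypothetical sample data and program file, created only for this demo.
data=$(mktemp)
prog=$(mktemp)
printf 'alpha 1\nbeta 2\n' > "$data"
printf '# This is a comment\n/beta/ { print $2 }\n' > "$prog"

piped=$(cat "$data" | awk '/beta/ { print $2 }')   # input arrives on a pipe
direct=$(awk '/beta/ { print $2 }' "$data")        # file name as an argument
scripted=$(awk -f "$prog" "$data")                 # program read from a file

echo "$piped $direct $scripted"
rm -f "$data" "$prog"
```

All three forms run the same rule over the same input; the script form mainly pays off once the program grows past a line or two.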

Programming
- Programming is done by building a list
- This is a list of rules
- Each rule is applied sequentially to each line
  - Each line is a record

      (pattern1) { action }
      (pattern2) { action }
      …
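To see rules being applied in order to every record, here is a small sketch with two rules. The fruit data is made up, and plain `awk` is assumed.

```shell
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk '
/an/    { print "rule 1:", $1 }   # fires on lines containing "an"
$2 > 5  { print "rule 2:", $1 }   # fires on lines whose 2nd field exceeds 5
')
echo "$out"
```

A record can trigger several rules: `banana 12` matches both, so it is printed twice, once per rule, in the order the rules are written.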

Example 1
- Input
      PING dt033n32.san.rr.com ( ): 56 data bytes
      64 bytes from : icmp_seq=0 ttl=48 time=49 ms
      64 bytes from : icmp_seq=1 ttl=48 time=94 ms
      64 bytes from : icmp_seq=2 ttl=48 time=50 ms
      64 bytes from : icmp_seq=3 ttl=48 time=41 ms
      …
      ----dt033n32.san.rr.com PING Statistics
      packets transmitted, 1270 packets received, 0% packet loss
      round-trip (ms) min/avg/max = 37/73/495 ms
- Program
      (/icmp_seq/) {print $0}
- Output
      64 bytes from : icmp_seq=0 ttl=48 time=49 ms
      64 bytes from : icmp_seq=1 ttl=48 time=94 ms
      64 bytes from : icmp_seq=2 ttl=48 time=50 ms
      64 bytes from : icmp_seq=3 ttl=48 time=41 ms

Fields
- Awk divides the file into records and fields
  - Each line is a record (by default)
  - Fields are delimited by a special character
    - Whitespace by default
    - Can change with -F or FS
- Fields are accessed with the ‘$’
  - $1 is the first field, $2 is the second
  - $0 is a special field which is the entire line
  - NF is always set to the number of fields
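A short sketch of changing the field separator both ways, using a passwd-style line as made-up input and assuming plain `awk`:

```shell
line='root:x:0:0:root:/root:/bin/sh'

# -F sets the separator on the command line; $NF is the last field.
with_flag=$(printf '%s\n' "$line" | awk -F: '{ print NF, $1, $NF }')

# Setting FS in a BEGIN rule does the same from inside the program.
with_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $6 }')

echo "$with_flag"
echo "$with_fs"
```

Because `NF` holds the field count, `$NF` is a convenient way to reach the last field without knowing how many there are.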

Example 2
- Input
      PING dt033n32.san.rr.com ( ): 56 data bytes
      64 bytes from : icmp_seq=0 ttl=48 time=49 ms
      64 bytes from : icmp_seq=1 ttl=48 time=94 ms
      64 bytes from : icmp_seq=2 ttl=48 time=50 ms
      64 bytes from : icmp_seq=3 ttl=48 time=41 ms
      …
      ----dt033n32.san.rr.com PING Statistics
      packets transmitted, 1270 packets received, 0% packet loss
      round-trip (ms) min/avg/max = 37/73/495 ms
- Program
      (/icmp_seq/) {print $7}
- Output
      time=49
      time=94
      time=50
      time=41

Variables
- Variable uses are naked
  - No need for declaration
  - Implicitly set to 0 AND the empty string
- There is only one type in awk
  - Combination of a floating-point number and a string
  - The variable is converted as needed
    - Based on its use
  - No matter what is in x you can always
      x = x + 1
      length(x)
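The dual number/string nature of variables can be checked directly. A minimal sketch, assuming plain `awk`; the variable names are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = "41"                  # assigned as a string
    x = x + 1                 # used as a number, so it becomes 42
    print x, length(x)        # length() sees the string "42"
    print y + 0, "[" y "]"    # y was never set: it is both 0 and ""
}')
echo "$out"
```

The conversion is driven entirely by the operator: arithmetic forces a number, while `length()` and concatenation force a string.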

Example 3
- Input
      PING dt033n32.san.rr.com ( ): 56 data bytes
      64 bytes from : icmp_seq=0 ttl=48 time=49 ms
      64 bytes from : icmp_seq=1 ttl=48 time=94 ms
      64 bytes from : icmp_seq=2 ttl=48 time=50 ms
      64 bytes from : icmp_seq=3 ttl=48 time=41 ms
      …
- Program
      (/icmp_seq/) {
          n = substr($7,6);           # text after "time=", e.g. "49"
          printf( "%s\n", n/10 );     # conversion: string used as a number
      }
- Output
      …

Variables
- Some built-in variables
  - Informative
    - NF = Number of Fields
    - NR = Current Record Number
  - Configuration
    - FS = Field separator
- Can set them externally
  - From the command line use
      gawk -v var=value
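A brief sketch of passing a value in with `-v` and reading the informative built-ins. The variable name `prefix` and the input are made up, and plain `awk` is assumed.

```shell
out=$(printf 'a b\nc d e\n' |
    awk -v prefix="rec" '{ print prefix NR ": " NF " fields" }')
echo "$out"
```

Variables set with `-v` are already available in a `BEGIN` rule, which makes the flag a convenient way to parameterize a script from the shell.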

Patterns
- Patterns can be
  - Empty: match everything
  - Regular expression: (/regular expression/)
  - Boolean expression: ($2=="foo" && $7=="bar")
  - Range: $2=="on", $3=="off" (a range cannot be wrapped in parentheses)
  - Special: BEGIN and END
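The pattern kinds can be combined in one small program. This is an illustrative sketch with invented input, assuming plain `awk`.

```shell
out=$(printf 'x\nstart\na\nb\nstop\ny\n' | awk '
BEGIN            { print "begin" }                  # runs before any input
/start/, /stop/  { print "in range:", $0 }          # range, both ends included
$1 == "y"        { print "boolean match:", $1 }     # boolean expression
END              { print "end after", NR, "records" }
')
echo "$out"
```

The range rule switches on at the first line matching `/start/` and stays on through the next line matching `/stop/`.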

“Arrays”
- All arrays in awk are associative
      A[1] = "foo";
      B["awk talk"] = "pizza";
- To check if there is an element in the array, use “in”
      if ( "awk talk" in B ) …
- Arrays can be sparse; they automatically resize, auto-initialize, and are fast (unless they get huge)
- Multi-dimensional (sort of)
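A minimal sketch of membership tests and the "sort of" multi-dimensional arrays, assuming plain `awk`. The `grid` array is a made-up example.

```shell
out=$(awk 'BEGIN {
    B["awk talk"] = "pizza"
    if ("awk talk" in B)      print "found:", B["awk talk"]
    if (!("perl talk" in B))  print "no perl talk"
    grid[2, 3] = "x"          # stored under the single key 2 SUBSEP 3
    if ((2, 3) in grid)       print "grid has (2,3)"
}')
echo "$out"
```

Testing with `in` does not create the element, whereas merely referencing `B["perl talk"]` would. A subscript like `[2, 3]` is joined into one string key, which is why multi-dimensional support is only approximate.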

Example 4
- Input
      PING dt033n32.san.rr.com ( ): 56 data bytes
      64 bytes from : icmp_seq=0 ttl=48 time=49 ms
      …
- Program
      (/icmp_seq/) {
          n = int(substr($7,6)/10);   # 10 ms bucket for this reply
          hist[n]++;                  # array
      }
      END {
          for (x in hist)
              printf("%s: %s\n", x*10, hist[x]);
      }
- Output
      40: : 216
      …
      490: 1

Built-in Functions
- Numeric:
  - cos, exp, int, log, rand, sqrt …
- String functions
  - gsub( regex, replacement, target )
  - index( searchstring, target )
  - length( string )
  - split( string, array, regex )
  - substr( string, start, length=inf )
  - tolower( string )
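The string functions listed above can be exercised in a `BEGIN` rule. A small sketch with an arbitrary sample string, assuming plain `awk`:

```shell
out=$(awk 'BEGIN {
    s = "Hello, World"
    n = gsub(/o/, "0", s)            # replaces in place, returns the count
    print n, s
    print index(s, "W")              # 1-based position of "W" in s
    print length(s)
    k = split("a:b:c", parts, ":")   # fills parts[1..k], returns k
    print k, parts[3]
    print substr(s, 1, 5), tolower("AWK")
}')
echo "$out"
```

Note that `gsub` modifies its third argument and returns the number of substitutions, and that string positions and `split` indices both start at 1.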

Writing Functions
- Functions were not part of the original spec
  - Added in later, and it shows
  - Rule variables are global
  - Function parameters are local (extra parameters serve as local variables)

      function MyFunc(a,b, c,d) {
          return a+b+c+d
      }
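The usual workaround for local variables is to declare them as extra parameters that callers never pass. A hedged sketch, assuming plain `awk`; the function `sum_to` is hypothetical and not from the talk.

```shell
out=$(awk '
function sum_to(n,    i, total) {   # i and total are extra parameters used as locals
    for (i = 1; i <= n; i++) total += i
    return total
}
BEGIN {
    i = 99                 # a global that happens to share the name
    print sum_to(4), i     # the global i is left untouched
}')
echo "$out"
```

The wide gap in the parameter list is only a convention for readers; awk itself treats every name in the list as a parameter.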

Other Tools
- Awk is best used with pipes
- Other tools that work well with pipes
  - fgrep
      fgrep mydata *.data
  - uniq
  - sort
  - sed/tr
  - cut/paste
  - Jgraph/Ploticus
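As an illustration of awk at the end of a pipeline, the sketch below counts repeated lines with `sort` and `uniq -c`, then lets awk tidy the result. The input letters are made up, and plain `awk` is assumed.

```shell
out=$(printf 'b\na\nb\nc\nb\na\n' |
    sort | uniq -c | sort -rn |
    awk '{ print $2 ": " $1 }')   # swap the count and value columns
echo "$out"
```

Default whitespace splitting means awk does not care about the leading padding that `uniq -c` puts in front of each count.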

Jgraph Example

My Scripts
- Functions to handle hex data
- Set of scripts for handling 2-D arrays

      A:1:1.0
      A:2:1.2
      B:1:4.0
      B:2:5.0

      Name:1:2
      A:1.0:1.2
      B:4.0:5.0

      Name | 1   | 2
      A    | 1.0 | 1.2
      B    | 4.0 | 5.0
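The talk does not show the scripts themselves, so the following is only a guess at how the first conversion (row:column:value triples into a colon-separated table) might be written. All variable names are invented, and plain `awk` is assumed.

```shell
out=$(printf 'A:1:1.0\nA:2:1.2\nB:1:4.0\nB:2:5.0\n' | awk -F: '
{
    # Remember row and column labels in first-seen order.
    if (!($1 in seen_row)) { seen_row[$1] = 1; rows[++nrow] = $1 }
    if (!($2 in seen_col)) { seen_col[$2] = 1; cols[++ncol] = $2 }
    cell[$1, $2] = $3
}
END {
    line = "Name"
    for (c = 1; c <= ncol; c++) line = line ":" cols[c]
    print line
    for (r = 1; r <= nrow; r++) {
        line = rows[r]
        for (c = 1; c <= ncol; c++) line = line ":" cell[rows[r], cols[c]]
        print line
    }
}')
echo "$out"
```

Keeping separate ordered lists of labels avoids relying on `for (x in array)`, whose iteration order is unspecified.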

Pitfalls
- White space
  - No whitespace between function and ‘(’
    - Myfunc( $1 ) works
    - Myfunc ( $1 ) does not
  - No line break between pattern and action
- Don’t forget the -f on executable scripts
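The line-break pitfall is easy to reproduce. In this sketch (made-up two-line input, plain `awk` assumed) the only difference between the two programs is a newline before the opening brace.

```shell
# Pattern and action on one line: the action runs only for matching lines.
good=$(printf 'a\nb\n' | awk '/a/ { print "hit" }')

# Newline in between: awk reads two rules, a bare pattern (which prints
# the matching line) and a bare action (which runs on every line).
bad=$(printf 'a\nb\n' | awk '/a/
{ print "hit" }')

echo "$good"
echo "$bad"
```

The broken version is not a syntax error, which is what makes it a pitfall: it runs and simply produces extra output.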

Summary
- Awk is a very powerful tool
  - If properly applied
  - It is not for everything (I know)
- Very handy for pre-processing
- Data conversion