1 © 2001 John Urrutia. All rights reserved. CIS52 – File Manipulation File Manipulation Utilities Regular Expressions sed, awk.

Presentation transcript:

1 © 2001 John Urrutia. All rights reserved. CIS52 – File Manipulation File Manipulation Utilities Regular Expressions sed, awk

2 Overview
  - comm – comparison of sorted files
  - cut – output sections of lines in a file
  - find – find files that match a pattern
  - paste – merges records in files
  - pr – paginate files into pages
  - tr – translate or delete characters

3 Overview
  - regular expressions
  - sed – Stream Editor (batch file editor)
  - awk – Aho, Weinberger, Kernighan (pattern matching)

4 The comm before the storm
  Compares 2 sorted files
  - Results are reported in 3 columns
    - 1st – records found only in file 1
    - 2nd – records found only in file 2
    - 3rd – records that match in both files
  - Options -1, -2, and -3 remove the corresponding columns

5 comm – cont.
  Either file name can be replaced with - to read standard input.
  Example:
    File1  File2
    aa     bb
    dd     cc
    ee     dd
    gg     ee
    hh     ff

6 comm results
    File1  File2  Both
    aa
           bb
           cc
                  dd
                  ee
           ff
    gg
    hh
  - option -1: bb cc dd ee ff
  - option -2: aa dd ee gg hh
  - option -12: dd ee
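The column options above can be tried with a short sketch. It recreates the two sample files in a scratch directory; the directory and the variable names are illustrative only and are not part of the original slides.

```shell
# Recreate the two sorted sample files from the slides in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$workdir/file1"
printf '%s\n' bb cc dd ee ff > "$workdir/file2"

# -12 suppresses columns 1 and 2, leaving only the records found in both files.
common=$(comm -12 "$workdir/file1" "$workdir/file2")

# -23 suppresses columns 2 and 3, leaving only the records unique to file1.
only_first=$(comm -23 "$workdir/file1" "$workdir/file2")

printf 'in both files:\n%s\n' "$common"
printf 'only in file1:\n%s\n' "$only_first"

rm -r "$workdir"
```

Both inputs must already be sorted; comm does not sort them itself.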

7 cut to the chase
  Allows you to extract portions of each record in a file.
  Delimits data in the file into fields or columns.
  - Default delimiter is the tab character
  - Can be changed by the -d option

8 cut cont.
  cut -b list | -c list | -f list [-d char] [-s] [--output-delimiter=string]
  - b – bytes
  - c – characters (same as bytes for single-byte text)
  - f – fields
  - d – delimiter character
  - s – display only records that contain the delimiter

9 cut ! print
  - char – single byte used to delimit fields in a record
  - list – list of ranges of characters to display
    - Ranges are comma separated
    - 1-7 – first 7 characters in the record
    - 1,7 – first and seventh characters

10 cut ! print again
  - string – characters written in place of the delimiter between output fields

11 cut - Example
  uid]$ cat file1
  The quick brown fox eyed the jactitating dog
  uid]$ cut -f1,3,5,8 -d' ' file1
  The brown eyed dog
  uid]$ cut -f1,4-6,8 -d' ' file1
  The fox eyed the dog
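The same example can be run without creating file1 by piping the sentence into cut. This is only a sketch; the variable names and the extra colon-delimited sample lines are made up for illustration.

```shell
line='The quick brown fox eyed the jactitating dog'

# -d' ' makes a single space the field delimiter; -f selects fields 1, 4 to 6, and 8.
picked=$(printf '%s\n' "$line" | cut -d' ' -f1,4-6,8)

# -c selects by character position; 1-9 keeps the first nine characters.
first_chars=$(printf '%s\n' "$line" | cut -c1-9)

# -s drops records that do not contain the delimiter at all.
with_delim=$(printf '%s\n' 'no delimiter here' 'key:value' | cut -d: -f2 -s)

printf '%s\n' "$picked" "$first_chars" "$with_delim"
```

Without -s, the record that lacks the delimiter would be passed through unchanged.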

12 find that pot of gold
  find – selects all files that meet the selection criteria in the expression
  - No action is taken unless it is specified
  - Sub-directories are scanned automatically
  - The expression can be simple or complex

13 find me something
  The criteria expression:
  - ANDs operands that are separated by a space
  - ORs operands that are separated by -o
  - Processes left to right sequentially

14 find criteria continued
  Actions
  - -print – prints the path of all files that meet the selection criteria
  - -exec cmds \; – executes the commands before the \;
  - -ok – same as -exec, but each command must be confirmed with a y from stdin

15 find criteria continued again
  Evaluations
  - -type – specify a type of file (e.g. d for directory)
  - -atime ±n – accessed more than (+n) or less than (-n) n days ago
  - -mtime ±n – modified more than (+n) or less than (-n) n days ago
  - -user uid – uid is the owner of the file
  - -nouser – owner of the file is not known to the system (takes no argument)
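A small sketch ties the criteria and actions together. It builds a throwaway directory tree first; the file names are invented for illustration, and the escaped parentheses used to group the -o test are standard find syntax that the slides do not cover.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.txt" "$workdir/sub/b.txt" "$workdir/sub/c.log"

# Operands separated by a space are ANDed: regular files AND names ending in .txt.
# Sub-directories are scanned automatically, so sub/b.txt is found too.
txt_files=$(cd "$workdir" && find . -type f -name '*.txt' -print | LC_ALL=C sort)

# -o ORs two criteria; the escaped parentheses group them.
txt_or_log=$(cd "$workdir" && find . -type f \( -name '*.txt' -o -name '*.log' \) -print | LC_ALL=C sort)

# -mtime -1 selects files modified less than one day ago.
# -exec runs a command per file: {} stands for the path and \; ends the command.
recent=$(cd "$workdir" && find . -type f -mtime -1 -exec basename {} \; | LC_ALL=C sort)

printf '%s\n' "$txt_files" "$txt_or_log" "$recent"
rm -r "$workdir"
```

The output is piped through sort only because find reports files in directory order, which is not guaranteed to be alphabetical.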

16 paste tastes good
  paste [options] [filelist]
  Corresponding records from each file are merged into 1 output record
  - -s – process filelist sequentially. All records of a file are processed before going to the next file
  - -d [delimiter list] – each character in turn delimits the file records

17 paste continued
  uid]$ cat file1
  A
  B
  C
  uid]$ cat file2
  1
  2
  3
  uid]$ cat file3
  x
  y
  z

18 paste continued
  uid]$ paste file1 file2 file3
  Output (fields are separated by tabs):
  A 1 x
  B 2 y
  C 3 z
  uid]$ paste -s file1 file2 file3
  Output (fields are separated by tabs):
  A B C
  1 2 3
  x y z
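The paste example can be reproduced with the sketch below. It writes the three sample files into a scratch directory (an illustrative choice) and adds a -d case that the slides describe but do not demonstrate.

```shell
workdir=$(mktemp -d)
printf '%s\n' A B C > "$workdir/file1"
printf '%s\n' 1 2 3 > "$workdir/file2"
printf '%s\n' x y z > "$workdir/file3"

# Default: line N of every file is joined into one record, separated by tabs.
parallel=$(cd "$workdir" && paste file1 file2 file3)

# -s: each file is processed completely, producing one output record per file.
serial=$(cd "$workdir" && paste -s file1 file2 file3)

# -d: the delimiter characters are used in turn, "," first and then ":".
custom=$(cd "$workdir" && paste -d ',:' file1 file2 file3)

printf '%s\n' "$parallel" "$serial" "$custom"
rm -r "$workdir"
```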

19 pr – public relations--NOT
  pr – paginate file(s) for printing
  - Can specify page attributes
    - Lines per page are changed through the -l option
  - For multiple files, each starts a new page

20 pr – continued
  pr – paginate a file for printing
  - Creates a header and trailer
    - Header text is changed through the -h option
    - Both are suppressed through the -t option
  - Can create columns of data
    - -nbr – number of columns per line
    - -S x – character used to separate columns

21 pr – continued
  - Can create numbers for each line
    - -n ck
    - c – character that separates the number from the data (default is the tab character)
    - k – number of digits

22 Regular Expressions
  A set of characters that define the criteria used to identify a string within a record.
  Used by vi, grep, sed, awk, and others.

23 tr – Translate this
  tr [-c] [-d] [-s] [-t] set1 [set2]
  Translate from set1 to set2
  - c – complement of set1
  - d – delete characters found in set1
  - s – squeeze out duplicates
  - t – truncate set1 to length of set2
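Each tr option can be seen on a one-line input. The sample strings and variable names below are assumptions made for illustration.

```shell
# Translate: each character in set1 is replaced by its counterpart in set2.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# -d deletes every character found in set1.
no_vowels=$(printf 'hello world\n' | tr -d 'aeiou')

# -s squeezes runs of a repeated character down to a single one.
squeezed=$(printf 'a    b   c\n' | tr -s ' ')

# -c takes the complement of set1; with -d, everything except digits and newline is deleted.
digits_only=$(printf 'tel: 555-1234\n' | tr -cd '0-9\n')

printf '%s\n' "$upper" "$no_vowels" "$squeezed" "$digits_only"
```

tr reads only standard input, so it is normally used at the end of a pipe or with input redirection.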

24 Regular Expressions
  Simple strings
  - Bound by / … /
  - Interpreted literally
  - e.g. /e D/ matches exactly e D
    - Taste Dee – OK
    - Taste don't – not OK

25 Regular Expressions
  The . (period) special character
  - Matches any single character
  - e.g. /.eny/ matches Aeny, Beny, Ceny
  The [char-range] defines a character class
  The [^char-range] defines the not-in character class

26 Regular Expressions
  The * (asterisk)
  - Matches 0 or more of the preceding character.
  What's this?
  - /.*/
  - /[a-zA-Z]*/
  - /([^)]*)/

27 Regular Expressions
  The /^ (caret) character
  - In the beginning … matches at the start of a line
  The $/ (dollar) character
  - At the end … matches at the end of a line

28 Regular Expressions
  Quote the raven – backslash
  - \. – this yields .
  - \\ – this yields \
  - \* – this yields *
  - \[ – this yields [
  - \] – this yields ]
  - \/ – this yields /
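The regular expression rules above can be checked with grep and sed, two of the tools the slides name as users of regular expressions. The sample lines are invented for illustration, and the patterns are written without the surrounding slashes because grep does not use them.

```shell
sample=$(printf '%s\n' 'Taste Dee' "Taste don't" 'Beny (aside) costs 3.14' '3x14')

# A simple string is matched literally, including the space and the capital D.
literal=$(printf '%s\n' "$sample" | grep 'e D')

# . matches any single character, so 3.14 also matches 3x14 ...
loose=$(printf '%s\n' "$sample" | grep -c '3.14')

# ... while the quoted \. matches only a real period.
strict=$(printf '%s\n' "$sample" | grep -c '3\.14')

# ^ anchors the match at the beginning of the line.
anchored=$(printf '%s\n' "$sample" | grep -c '^Taste')

# ([^)]*) matches "(", then any run of characters that are not ")", then ")".
stripped=$(printf '%s\n' "$sample" | sed -n 's/ ([^)]*)//p')

printf '%s\n' "$literal" "$loose" "$strict" "$anchored" "$stripped"
```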

29 sed – the old Stream EDitor
  sed [-n] [-f script] [file-list]
  Copies and edits to standard output
  Edits file(s) in a non-interactive mode
  Gets its instructions from a script file
  - -f filename – the file contains the sed instructions
  - Without -f, the 1st command argument is used as the script
  - -n – suppress stdout unless output is explicitly requested

30 sed – the old mill stream
  Record processing
  1. Read a record from the file list
  2. Read an instruction from the script (or cmd line)
  3. Apply the selection criteria
  4. If selected, perform the instruction, and repeat 2 to 4 until no more script
  5. Repeat from 1 until no more file list

31 He sed what!!??
  Instruction format
    [addr1[,addr2]] inst [arg-list]
  Address
  - A line number
  - Regular expression
  - Addr1 – start
  - Addr2 – stop

32 Address line numbers
  $ – designates the last line of the last file
  1st address line number
  - Starts selecting records based on their position in the input file list relative to 1.
  2nd address line number
  - Stops selecting records when the position in the input file list is greater than the line number.

33 He sed some more
  Instructions
  - ! – Not; negates the address selection (written after the address)
    - sed -n '/line/!p' file.list
  - {…} – groups the instructions for the address selection

34 sed Instructions
  - p – Print now and continue
  - d – Delete and get the next record
  - q – Quit processing; Stop; Go Away
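Addresses and the p, d, and q instructions can be combined as in the sketch below. The five-line input is a made-up sample.

```shell
input=$(printf '%s\n' one two three four five)

# q with a line-number address: print up to line 2, then quit.
first_two=$(printf '%s\n' "$input" | sed '2q')

# -n suppresses the default output, so only the addressed range 2,4 is printed by p.
middle=$(printf '%s\n' "$input" | sed -n '2,4p')

# A regular expression address with d: delete every record that starts with t.
no_t=$(printf '%s\n' "$input" | sed '/^t/d')

# ! after the address negates it: print records that do NOT start with f.
not_f=$(printf '%s\n' "$input" | sed -n '/^f/!p')

# $ addresses the last line of the input.
last=$(printf '%s\n' "$input" | sed -n '$p')

printf '%s\n' "$first_two" "$middle" "$no_t" "$not_f" "$last"
```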

35 sed Instructions
  c – Change
  - [addr1[,addr2]] c\
    yada yada yada
  - All selected records are replaced as a group by the change value
  a – Append
  - [addr1] a\
    …
  - Adds the text after each selected record

36 sed Instructions
  i – Insert
  - [addr1] i\
    …
  - Adds the text before each selected record
  n – Next
  - [addr1] n
  - Writes the current record, gets the next, and continues the script
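A short sketch of the insert, append, and change instructions, using the traditional form in which the text follows a backslash and a newline. The input lines and the inserted text are illustrative.

```shell
# i\ adds text before the selected record, a\ adds it after,
# and c\ replaces the selected record with the given text.
result=$(printf '%s\n' alpha beta gamma | sed '
/beta/i\
-- before beta
/beta/a\
-- after beta
/gamma/c\
GAMMA replaced
')

printf '%s\n' "$result"
```

The script is spread over several lines because each of these three instructions takes the rest of its text from the following line.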

37 sed Instructions
  w – Write
  - [addr1[,addr2]] w filename
  - Writes the selected records to a file
  r – Read
  - [addr1] r filename
  - Reads records from the filename and appends them after the selected record

38 sed Instructions
  s – Substitute
  - [addr1[,addr2]] s/ptrn/repl/[g][p][w f]
  - For each selected record, match the pattern and replace
    - g – replace all non-overlapping occurrences
    - p – print the record
    - w – write the record to the filename f
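The effect of the g and p flags shows up clearly on a tiny input. The sample text below is an assumption chosen for illustration.

```shell
# Without g, only the first occurrence in each record is replaced.
first=$(printf '%s\n' 'aaa bbb aaa' | sed 's/aaa/X/')

# With g, all non-overlapping occurrences are replaced.
all=$(printf '%s\n' 'aaa bbb aaa' | sed 's/aaa/X/g')

# -n plus the p flag prints only the records where a substitution happened.
changed=$(printf '%s\n' 'keep me' 'fix this' | sed -n 's/fix/fixed/p')

printf '%s\n' "$first" "$all" "$changed"
```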

39 Hawk – Squawk – awk
  The programmable utility that does everything.
  Aho – Weinberger – Kernighan
  Provides:
  - Conditional execution
  - Looping
  Handles:
  - Numeric & string variables
  - Regular expressions
  - C-style printf facilities

40 awk
  awk [-F c] [-f program-file | program] [file-list]
  - -F – field delimiter character
  - -f – name of the awk program file
  - program – awk instructions given in-line on the command line when -f is not used
  - file-list – list of files to process

41 awk – program lines
  pattern { action }
  - Like sed, the pattern selects records
  - Record processing is the same as sed
  - Either the pattern or the action may be omitted

42 awk – pattern
  Patterns follow regular expression format.
  - ~ – tests for a match to a regular expression
  - !~ – tests for NO match to a regular expression
  - , – establishes a pattern range; all records are processed inclusively within the range
  - BEGIN – executes before the first record is processed
  - END – executes after the last record is processed

43 awk – relational operators
  - < – less than
  - <= – less than or equal to
  - == – equal to
  - != – not equal to
  - >= – greater than or equal to
  - > – greater than

44 awk – operators
  Arithmetic
  - + – addition
  - - – subtraction
  - * – multiplication
  - / – division
  Assignment
  - = – assigns the value to the variable on the left
  - += – adds the value to the variable on the left

45 awk – boolean operators
  - && – and
  - || – or
  - ! – not
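Patterns, relational operators, and boolean operators can be combined in one program line. The sketch below runs a few one-line awk programs over a made-up table of name, age, and city.

```shell
data=$(printf '%s\n' 'alice 30 nyc' 'bob 25 sf' 'carol 41 nyc' 'dave 19 la')

# ~ tests a single field against a regular expression.
nyc=$(printf '%s\n' "$data" | awk '$3 ~ /^nyc$/ { print $1 }')

# pattern1,pattern2 selects an inclusive range of records.
range=$(printf '%s\n' "$data" | awk '/^bob/,/^carol/ { print $1 }')

# Relational and boolean operators: age of at least 21 AND city not sf.
adults=$(printf '%s\n' "$data" | awk '$2 >= 21 && $3 != "sf" { print $1 }')

# BEGIN runs before the first record, END after the last; += accumulates a total.
total=$(printf '%s\n' "$data" | awk 'BEGIN { sum = 0 } { sum += $2 } END { print sum }')

printf '%s\n' "$nyc" "$range" "$adults" "$total"
```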

46 awk – actions
  - # – comment to the right on any line
  - Default action is print to stdout
  - Multiple actions can be taken
    - Use {…} to enclose multiple actions
    - Separate actions with ;

47 awk – actions
  print variable …
  - Var, Var2, Var3 – prints the variables separated by the output field separator
  - Var Var2 Var3 – prints the variables with NO separators
  - "literal value" – prints exactly everything between the " "

48 awk – actions
  printf "cntl string", variable …
  - Control string
    - \n – new line
    - \t – tab
    - %[-][n][.d]conv-char
      - - – left justification
      - n – minimum number of characters
      - .d – decimal positions

49 awk – actions
  - %[-][n][.d]conv-char
    - - – left justification
    - n – minimum number of characters
    - .d – decimal positions
    - conv-char – conversion character
      - d – decimal, e – exponent, f – floating-point
      - o – octal, x – hexadecimal
      - s – string
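The difference between print with commas, print without commas, and printf can be seen on a single record. The sample record and the chosen format widths are illustrative.

```shell
line='widget 3 4.5'

# Commas in print insert the output field separator (set to "-" here).
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

# Values written side by side are concatenated with no separator.
joined=$(printf '%s\n' "$line" | awk '{ print $1 $2 }')

# %-8s left-justifies in 8 columns, %3d right-justifies in 3,
# and %6.2f uses 6 columns with 2 decimal positions.
formatted=$(printf '%s\n' "$line" | awk '{ printf "%-8s|%3d|%6.2f\n", $1, $2, $3 }')

printf '%s\n' "$with_ofs" "$joined" "$formatted"
```

Unlike print, printf adds no newline of its own, so the control string ends with \n.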

50 awk – variables
  awk provided variables
  - NF – total number of fields in the current record
  - $1…$n – each field in the current record
  - FS – input field separator (default: space or tab)
  - OFS – output field separator (default: space)

51 awk – variables
  awk provided variables
  - NR – current record number
  - $0 – entire current record
  - RS – record separator (default: newline)
  - ORS – output record separator (default: newline)
  - FILENAME – name of current input file
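A one-line sketch shows several of the provided variables at once. The colon-delimited input is a made-up sample, and -F: sets the input field separator for it.

```shell
# NR is the record number, NF the number of fields, and $NF the last field.
summary=$(printf 'a:b:c\nd:e\n' | awk -F: '{ print NR, NF, $NF }')

# $0 is the entire current record, unchanged.
whole=$(printf 'a:b:c\nd:e\n' | awk -F: 'NR == 2 { print $0 }')

printf '%s\n' "$summary" "$whole"
```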

52 awk - variables
  Associative Arrays
  - array_name[string]
  - The array name should be meaningful
  - The index of the array is a string
  - Elements are automatically created
  - for (element in array) actions
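A common use of associative arrays is counting how often each value occurs. The sketch below assumes a made-up list of colour names, one per record; the array name seen is arbitrary.

```shell
# seen[$1]++ creates the element on first use and increments it afterwards.
# for (c in seen) visits every index; the order is unspecified, so the output is sorted.
counts=$(printf '%s\n' red blue red green blue red |
  awk '{ seen[$1]++ } END { for (c in seen) print c, seen[c] }' |
  LC_ALL=C sort)

printf '%s\n' "$counts"
```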

53 awk - functions
  - length(string) – returns the number of characters in string
  - int(num) – returns the integer portion of num
  - index(str1,str2) – returns the index of str2 found in str1, or 0 if not present
  - split(str,arr,del) – populates arr[ ] from fields in str delimited by del; returns the count of elements

54 awk - functions
  - sprintf(fmt, args) – formats args using the fmt and returns the formatted string
  - substr(str, pos, len) – returns a substring of str starting with position pos for a length of len
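The string and numeric functions can all be exercised from a BEGIN block, which needs no input file. The sample string and the names s, n, and parts are illustrative.

```shell
out=$(awk 'BEGIN {
  s = "file manipulation"
  print length(s)              # number of characters in s
  print int(7.9)               # integer portion only
  print index(s, "man")        # position of "man" in s, counted from 1
  n = split("a,b,c", parts, ",")
  print n, parts[2]            # element count and the second element
  print substr(s, 6, 3)        # 3 characters starting at position 6
  print sprintf("%05d", 42)    # formatted string, zero padded to 5 digits
}')

printf '%s\n' "$out"
```

Positions in index and substr start at 1, not 0.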