UNIX Filters.

Some Simple UNIX Filters
(Commands That Use Both Standard Input and Standard Output)

  File level:      pr, cmp, comm, diff, sort, uniq, head, tail
  Content level:   cut, paste, tr, grep

There are a lot of others!

Formatting Output: the pr Command

pr prepares files for printing by adding formatting, headers, footers, etc.

Some options:
  -k      print in k columns
  -n      number the lines of output
  -d      double-space the output
  -l n    set page length to n lines
  -w m    set page width to m characters

Example:
  a.out | pr -n -d -l 64
Prints the output with line numbers, double spacing, and 64 lines per page.
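A minimal sketch of pr in action. The header pr normally prints contains a date and filename that vary by system, so -t is added here to suppress it; the input file name is illustrative.

```shell
# Create a tiny input file and print it with numbered lines.
# -t suppresses the page header/footer so the output is predictable.
seq 3 > small.txt
pr -t -n small.txt
```

Without -t, pr pads each page to the full page length (66 lines by default) and adds a header, which is what makes it useful for actual printing.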

Comparing Files: the cmp Command

cmp compares two files and stops when it finds a difference. The comparison is character by character (byte by byte).

Option:
  -l    list all byte differences between the files

Examples:
  cmp file1 file2
  file1 file2 differ: char 12, line 3

  cmp -l file1 file2 | wc -l
Displays the number of byte differences between the two files.
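A small runnable sketch with hypothetical file names, showing both forms above:

```shell
# Two files that differ at exactly one byte (position 4: d vs x).
printf 'abcdef\n' > f1.txt
printf 'abcxef\n' > f2.txt

# cmp exits with nonzero status when the files differ, so || true
# keeps a strict (set -e) shell from stopping here.
cmp f1.txt f2.txt || true        # reports the first differing byte

# -l lists every differing byte; counting those lines counts the differences.
cmp -l f1.txt f2.txt | wc -l
```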

Comparing Files: the comm Command

comm compares two files and lists three columns of information:
  1. lines unique to the first file
  2. lines unique to the second file
  3. lines common to both files
The files must be sorted.

Options:
  -1, -2, -3    drop the corresponding column from the output

Examples:
  comm -3 file1 file2
Lists the lines unique to each file (drops the common column).
  comm -12 file1 file2
Lists only the lines common to both files (drops both unique columns).
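A quick sketch with two presorted sample files (names are illustrative):

```shell
# comm requires both inputs to be sorted.
printf 'apple\nbanana\ncherry\n' > a.txt
printf 'banana\ncherry\ndate\n'  > b.txt

# -12 suppresses columns 1 and 2, leaving only the common lines.
comm -12 a.txt b.txt
# banana
# cherry
```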

Comparing Files: the diff Command

diff compares two files and lists the instructions needed to make the files the same. Here's an example:

  $ diff file1 file2
  3c3                            change line 3
  < This is line 3 of file 1.    from this
  ---
  > This is line 3 of file 2.    to this
  7a8                            add this line after line 7 of file1
  > This is line 8 from file 2.
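The same idea as a runnable sketch, with hypothetical file names:

```shell
printf 'line1\nline2\nline3\n'          > old.txt
printf 'line1\nCHANGED\nline3\nline4\n' > new.txt

# diff exits with status 1 when the files differ, so || true keeps
# a strict (set -e) shell from stopping here.
diff old.txt new.txt || true
# 2c2   -> change line 2 of old.txt
# 3a4   -> after line 3 of old.txt, add line 4 of new.txt
```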

Extracting Vertical Data: the cut Command

cut extracts vertical slices of data from a file. Either columns (-c option) or fields (-f option) of data may be extracted. A delimiter (-d option) may be defined to separate fields; the default delimiter is tab.

Examples:
  cut -c1-4,8,15- file1
Extracts characters 1 through 4, the 8th character, and characters 15 through end of each line from file1. No whitespace in the column list!
  cut -d: -f1-3 file2
Extracts fields 1 through 3 from file2. The fields are separated by the : character.
  who | cut -d" " -f1
Lists the names of all users logged in.
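A sketch of both extraction modes on a small colon-delimited file (the file and its fields are hypothetical):

```shell
# Fields: name:id:department
printf 'jane:101:sales\nraj:102:hr\n' > emp.txt

# Field slices: fields 1 and 3, : as the delimiter.
cut -d: -f1,3 emp.txt
# jane:sales
# raj:hr

# Column slices: the first three characters of each line.
cut -c1-3 emp.txt
```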

Joining Vertical Data: the paste Command

paste vertically joins two files together. A delimiter (-d option) may be defined to separate fields; the default delimiter is tab. The -s option joins lines of a single file together.

Examples:
  paste file1 file2
Displays file1 and file2 side by side.
  paste -s -d"::\n" addressbook
Joins each group of three lines into one record:
  rick
  rick@att.com    >>>>    rick:rick@att.com:1234567890
  1234567890
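Both uses as a runnable sketch. The addressbook contents mirror the slide's example; the other file names are illustrative.

```shell
# Side-by-side join of two files, with : instead of the default tab.
printf 'one\ntwo\n' > left.txt
printf '1\n2\n'     > right.txt
paste -d: left.txt right.txt
# one:1
# two:2

# -s joins the lines of a single file. The delimiter list "::\n"
# repeats, so every third join uses a newline: one record per line.
printf 'rick\nrick@att.com\n1234567890\n' > addressbook
paste -s -d'::\n' addressbook
# rick:rick@att.com:1234567890
```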

Displaying Files: the head and tail Commands

head displays the top of a file (first 10 lines, by default). tail displays the end of a file (last 10 lines, by default).

Options:
  -n x (or -x)    display the first (last) x lines of the file
  -f              continuously display the end of a file as it grows; this option is for tail only, and you must use the interrupt key to stop monitoring the file

Examples:
  ls -t | head -n 1
Displays the most recently modified file.
  tail -f install.log
Continuously displays the log file as it grows. Use the interrupt key to stop.
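A minimal sketch on generated input (the file name is illustrative):

```shell
# Twenty numbered lines to slice from either end.
seq 1 20 > nums.txt

head -n 3 nums.txt   # first three lines: 1 2 3
tail -n 2 nums.txt   # last two lines: 19 20
```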

Ordering Files: the sort Command

sort reorders the lines of a file in ascending (or descending) order. The default order is ASCII: whitespace, numbers, uppercase, and finally, lowercase letters.

Options:
  -k n       sort on the nth field of the line
  -tchar     use char as the field delimiter
  -n         sort numerically
  -r         reverse the sort order
  -u         remove repeated lines
  -m list    merge the sorted files in list

sort Examples

  sort -t: -k 2 list
Sorts on the 2nd field of file list. Fields are separated by :.
  sort -t: -k 5.7 -r list
Sorts file list in reverse order on the 7th character of the 5th field. Fields separated by :.
  sort -n list
Numerically sorts file list, assumed to contain numbers.
  sort -m file1 file2
Merges the already-sorted files file1 and file2.
  cut -d: -f3 list | sort -u
Extracts the 3rd field from list and sorts that field, removing the repeated lines.
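A runnable sketch of field-based and numeric sorting on a hypothetical id:name file:

```shell
printf '3:carol\n1:alice\n2:bob\n' > list.txt

# Sort on the 2nd field (the name), : as the delimiter.
sort -t: -k2 list.txt
# 1:alice
# 2:bob
# 3:carol

# Numeric sort on the 1st field, reversed.
sort -t: -k1 -n -r list.txt
```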

Removing Duplicates: the uniq Command

uniq displays a presorted file, removing all the duplicate lines from it. If two files are specified, uniq reads from the first and writes to the second.

Options:
  -u    list only the lines that are unique
  -d    list only the lines that are duplicated
  -c    count the frequency of occurrences

Examples:
  sort list | uniq - xlist
Sorts file list; uniq reads from standard input (-) and writes the output to xlist.
  uniq -c list
Displays a count of each unique line in the (presorted) file list.
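A sketch of the three options on a small unsorted file (the name is illustrative); note the sort first, since uniq only collapses adjacent duplicates:

```shell
printf 'banana\napple\nbanana\napple\napple\n' > fruit.txt

sort fruit.txt | uniq -c   # frequency of each line
sort fruit.txt | uniq -d   # only the duplicated lines: apple, banana
sort fruit.txt | uniq -u   # only the unique lines (none here)
```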

Character Manipulation: the tr Command

tr translates characters from one set to another. Input always comes from standard input; the arguments don't include filenames. The general form is:
  tr options expression1 expression2 < standard-input
expression1 is the set of characters to change; expression2 is what they change to. (The expressions should be of equal length.)

Examples:
  tr '+-' '*/' < math
In the file math, replaces all +'s with *'s and all -'s with /'s.
  head -n 3 list | tr '[a-z]' '[A-Z]'
Translates the first 3 lines of the file list to uppercase.

tr Command Options

Options:
  -d    delete characters from the input stream
  -s    compress multiple consecutive characters into one (squeeze)
  -c    complement the value of the expression

Examples:
  tr -d '/' < dates
Removes all /'s from the file dates.
  tr -s ' ' < names
Replaces each string of blanks with a single blank in the file names.
  tr -cd ':' < file1
In the file file1, deletes everything that isn't a colon (:). All that's left is a file full of :'s.
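The translate, squeeze, and delete behaviors above can be sketched on the command line directly, since tr always reads standard input:

```shell
echo 'a+b-c'            | tr '+-' '*/'   # translate: a*b/c
echo 'shout'            | tr 'a-z' 'A-Z' # translate a range: SHOUT
echo 'too    many gaps' | tr -s ' '      # squeeze runs of blanks
echo 'a:b:c'            | tr -cd ':'     # delete everything but colons
```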

Finding Patterns in Files with grep

grep searches a file and displays the lines containing a pattern. The form is:
  grep options pattern files
If more than one file is listed, the filename is also displayed in the output.

Some options:
  -i    ignore case when matching
  -n    display line numbers as well as lines
  -c    display a count of the matching lines

Examples:
  grep "professor" college.lst
Displays all lines in the file college.lst that contain the string professor.
  grep -i "Rick" college.lst
Displays all lines in the file college.lst that contain the string Rick. (Also finds rick, RICK, rIcK, ...)
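A runnable sketch; the college.lst contents here are made up to illustrate the options:

```shell
printf 'professor smith\nstudent jones\nProfessor Lee\n' > college.lst

grep -n 'professor' college.lst   # matching lines with line numbers
grep -ci 'professor' college.lst  # case-insensitive count -> 2
```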

Regular Expressions in grep

Regular expressions are metacharacter patterns used in ways different from how the shell uses them.

  Regular Expression    Meaning
  *                     0 or more of the previous character
  .                     any single character
  [pqr]                 a single p, q, or r
  [c1-c2]               a single character in the ASCII range c1 through c2
  [^pqr]                a single character that is not p, q, or r
  ^abc                  abc at the beginning of the line
  abc$                  abc at the end of the line

Example Regular Expressions with grep

  grep "g*" file1
Displays every line in file1 (g* matches zero or more g's, and zero g's matches anywhere).
  grep ".*" file1
Displays every line in file1 (matches any number of any characters, including none).
  grep "[1-3]" file1
Displays all lines in file1 that contain a digit between 1 and 3.
  grep "[^a-zA-Z]" file1
Displays all lines in file1 that contain a non-alphabetic character.
  grep "^Rick$" file1
Displays all lines in file1 that contain only Rick.
  grep "^$" file1
Displays all empty lines in file1.
  grep "R[aeiou]ck" file1
Displays all lines in file1 that contain Rack, Reck, Rick, Rock, or Ruck.
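A few of these patterns against a small sample file (contents are illustrative):

```shell
printf 'Rick\nrick\nRack\nbrick\nRock and roll\n' > names.txt

grep '^Rick$' names.txt         # only the line that is exactly Rick
grep -c 'R[aeiou]ck' names.txt  # Rick, Rack, Rock and roll -> 3
grep -c '[^a-zA-Z]' names.txt   # lines with a non-alphabetic char -> 1
```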

Putting It All Together

An author wants to count the frequency of words used in a book chapter.

1. Put each word on a separate line:
   tr " \011" "\012\012" < chapter
2. Strip out everything that isn't an alphabetic character or a newline:
   tr -cd "[a-zA-Z\012]"
3. Sort the list:
   sort
4. Count the word frequencies:
   uniq -c
5. Put it all together:
   tr " \011" "\012\012" < chapter | tr -cd "[a-zA-Z\012]" | sort | uniq -c
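The same pipeline can be sketched with modern escape spelling (\n and \t instead of the octal \012 and \011, and -s to squeeze runs of blanks) on a tiny stand-in for the chapter file:

```shell
printf 'the cat and the dog\nthe end\n' > chapter

# Split on blanks/tabs, keep only letters and newlines, sort,
# count each word, and list the most frequent words first.
tr -s ' \t' '\n' < chapter | tr -cd 'a-zA-Z\n' | sort | uniq -c | sort -rn
```

With this input, "the" tops the list with a count of 3.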