Chapter 5: Advanced Editors: awk, sed, tr, cut

Objectives
After studying this lesson, you should be able to use:
- awk: a pattern scanning and processing language
- sed: a stream editor
- tr: translates one character to another
- cut: cuts specific columns vertically

awk
awk is a pattern scanning and processing language, developed in 1977 and named after its developers: Aho, Weinberger, and Kernighan. It searches files for lines that match specified patterns and then performs the associated actions.

awk
Syntax:
    awk -Fseparator 'pattern { action }' filenames
awk checks whether each input record in the specified files satisfies the pattern. If it does, awk executes the associated action. If no pattern is specified, the action applies to every input record. A common use of awk is to reformat input files and output the results in the chosen form.

awk
A sample data file named countries:
    Canada:3852:25:North America
    USA:3615:237:North America
    Brazil:3286:134:South America
    England:94:56:Europe
    France:211:55:Europe
    Japan:144:120:Asia
    Mexico:762:78:North America
    China:3705:1032:Asia
    India:1267:746:Asia
Fields: country name, area (thousands of square miles), population (millions), continent

awk
    awk -F: '{ printf "%-10s \t%d \t%d \t%15s \n",$1,$2,$3,$4 }' countries
Outputs (with tabs expanded to 8-column stops):
    Canada          3852    25        North America
    USA             3615    237       North America
    Brazil          3286    134       South America
    England         94      56               Europe
    France          211     55               Europe
    Japan           144     120                Asia
    Mexico          762     78        North America
    China           3705    1032               Asia
    India           1267    746                Asia

Some built-in variables
- NF: number of fields in the current record
- $NF: last field of the current record
- NR: number of records processed so far
- FILENAME: name of the current input file
- FS: field separator (space or TAB by default)
- $0: the entire line
- $1, $2, ..., $n: field 1, 2, ..., n

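As a small illustration of these variables, the sketch below pipes two records from the countries file into awk and prints NR, NF, and $NF for each. The shell variable names (data, summary) exist only for this example.

```shell
# Two records taken from the countries sample file.
data='Canada:3852:25:North America
Japan:144:120:Asia'

# For each record print: record number, field count, last field.
summary=$(printf '%s\n' "$data" | awk -F: '{ print NR, NF, $NF }')
printf '%s\n' "$summary"
```

Each record has four colon-separated fields, so NF is 4 and $NF is the continent.
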
Formatted output
printf syntax:
    printf "control-string", arg1, arg2, ..., argn
The control-string determines how printf formats arg1 through argn. It contains one conversion specification for each argument. A conversion specification has the following format:
    %[-][x[.y]]conv

Formatted output
    %[-][x[.y]]conv
- "-" causes printf to left-justify the argument.
- x is the minimum field width.
- y is the number of places to the right of the decimal point in a number.
- conv is a letter from the following list:
    d  decimal
    e  exponential notation
    f  floating point number
    g  use f or e, whichever is shorter
    o  unsigned octal
    s  string of characters
    x  unsigned hexadecimal

printf examples
    printf "I have %d %s\n", how_many, animal_type
    printf "%-10s has $%6.2f in their account\n", name, amount
    printf "%10s %-4.2f %-6d\n", name, interest_rate, account_number
    printf "\t%d\t%d\t%6.2f\t%s\n", id_no, age, balance, name

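To see how width, precision, and justification interact, here is a minimal sketch. It runs printf inside a BEGIN block (an awk feature not covered above, which runs before any input is read), so no input file is needed. The sample values and the | markers that delimit each column are arbitrary choices for this illustration.

```shell
# %-10s : left-justified string, at least 10 columns wide
# %6.2f : right-justified number, 6 columns wide, 2 decimal places
# %5d   : right-justified integer, 5 columns wide
line=$(awk 'BEGIN { printf "%-10s|%6.2f|%5d|\n", "Canada", 3.14159, 42 }')
printf '%s\n' "$line"
```
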
awk
awk opens a file and reads it serially, one line at a time. By specifying a pattern, we can select only those lines that contain a certain string of characters. For example, to display all countries in our data file that are in Europe:
    awk '/Europe/' countries
When no action is given, awk prints each matching line.

Comparison operators
Using the sample data file countries, the following prints every record whose third field equals 55:
    awk -F: '$3 == 55' countries
The comparison operators are:
    ==  equal to
    !=  not equal to
    >   greater than
    <   less than
    >=  greater than or equal to
    <=  less than or equal to

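The comparison can be tried directly against the sample data. In this sketch the countries records are held in a shell variable and piped to awk instead of being read from a file. The second command, which combines a comparison with an action, is an added illustration.

```shell
countries='Canada:3852:25:North America
USA:3615:237:North America
Brazil:3286:134:South America
England:94:56:Europe
France:211:55:Europe
Japan:144:120:Asia
Mexico:762:78:North America
China:3705:1032:Asia
India:1267:746:Asia'

# Records whose third field equals 55 (no action, so the whole line prints).
exact=$(printf '%s\n' "$countries" | awk -F: '$3 == 55')

# Names of countries whose second field is greater than 3000.
large=$(printf '%s\n' "$countries" | awk -F: '$2 > 3000 { print $1 }')

printf '%s\n' "$exact" "$large"
```
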
Field breaking
By default, awk breaks each line into fields on spaces and tabs. Multiple contiguous whitespace characters count as a single separator, and leading separators are discarded.

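The default splitting rule can be contrasted with an explicit separator in a short sketch. The input strings are made up for the illustration.

```shell
# Default splitting: leading blanks are dropped and runs of spaces or tabs
# act as one separator, so this line has exactly three fields.
default_split=$(printf '   ford    mondeo\t1990\n' | awk '{ print NF ":" $1 ":" $3 }')

# With -F: every colon separates, so "a::b" has an empty middle field.
colon_split=$(printf 'a::b\n' | awk -F: '{ print NF }')

printf '%s\n' "$default_split" "$colon_split"
```
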
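Logic operations
Sample file named cars (only the first two fields of each record are shown here; the first command below also tests numeric third and fourth fields):
    ford mondeo
    ford fiesta
    honda accord
    toyota tercel
    buick centry
    buick centry
Conditions can be combined with && (and) and || (or):
    awk '$3 >= 1991 && $4 < 6250' cars
    awk '$1 == "ford" || $1 == "buick"' cars
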
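A runnable sketch of both commands follows. The third and fourth fields are hypothetical values invented for this example. Treating them as a model year and a price is also an assumption, since the sample file above does not show them.

```shell
# Hypothetical cars data: make, model, year (assumed), price (assumed).
cars='ford mondeo 1990 5800
ford fiesta 1991 4575
honda accord 1991 6000
toyota tercel 1992 6500
buick centry 1990 5950
buick centry 1991 6450'

# Both conditions must hold: field 3 at least 1991 AND field 4 below 6250.
cheap_recent=$(printf '%s\n' "$cars" | awk '$3 >= 1991 && $4 < 6250')

# Either condition may hold: field 1 is ford OR buick.
ford_or_buick=$(printf '%s\n' "$cars" | awk '$1 == "ford" || $1 == "buick"')

printf '%s\n' "$cheap_recent" "$ford_or_buick"
```
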
Data processing
Sample file named wages:
    Brooks
    Everest 8 40
    Hatcher
    Phillips 8 30
    Wilcox
Fields: name, $/hour, hours/week
Calculate the weekly pay and the weekly tax (at a 25% tax rate):
    awk '{ print $1,$2,$3,$2*$3,$2*$3*0.25 }' wages

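Applied to the two complete records from the wages sample, the calculation works out as sketched below. The data is piped in from a shell variable, not read from the file.

```shell
# The two complete records from the wages sample: name, $/hour, hours/week.
wages='Everest 8 40
Phillips 8 30'

# Append weekly pay (rate * hours) and weekly tax (25% of weekly pay).
report=$(printf '%s\n' "$wages" | awk '{ print $1,$2,$3,$2*$3,$2*$3*0.25 }')
printf '%s\n' "$report"
```

For Everest, 8 * 40 gives a weekly pay of 320, and 25% of that is 80.
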
Other examples
    who | awk '{ print $5, $1 }' | sort
prints each login time and user name, sorted by time (this assumes the login time is field 5 of the who output, which varies between systems)
    awk -F: '{ print $1 }' /etc/passwd | sort
prints the existing user names in sorted order
    awk -F: '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd
prints each user name and user ID

sed
sed stands for stream editor. It is a non-interactive editor used to make global changes to entire files at once. An interactive editor like vi would be too cumbersome for replacing large amounts of information at once. The sed command is primarily used to substitute one pattern for another.

sed
Typical usage of sed:
- edit files too large for interactive editing
- edit files of any size when the editing sequence is too complicated to type in interactive mode
- perform multiple global editing functions efficiently in one pass through the input
- edit multiple files automatically
- write conversion programs

sed
Syntax:
    sed -e 'command' file(s)
    sed -e 'command' -e 'command' ... file(s)
    sed -f scriptfile file(s)

sed
Whole-line-oriented functions:
    d  DELETE
    a  APPEND
    c  CHANGE
    s  SUBSTITUTE
    i  INSERT

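A brief sketch of three of these functions on made-up input follows. The a and c commands are written in the traditional two-line form (a backslash, then the text on the next line). GNU sed also accepts a one-line form, but that is not assumed here.

```shell
# d: delete line 2.
deleted=$(printf 'one\ntwo\nthree\n' | sed '2d')

# a: append a new line after line 1.
appended=$(printf 'one\ntwo\n' | sed '1a\
added')

# c: replace line 2 with new text.
changed=$(printf 'one\ntwo\n' | sed '2c\
TWO')

printf '%s\n' "$deleted" "$appended" "$changed"
```
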
sed examples
    sed 's/Tx/Texas/' foo
replaces the first Tx on each line of foo with Texas (the result goes to standard output; foo itself is unchanged)
    sed -e '1,10d' foo
deletes lines 1-10 of the file foo
    sed '/^Co*t/,/[0-9]$/d' foo
deletes from the first line that begins with Ct, Cot, Coot, Cooot, etc. through the next line that ends with a digit

sed examples
    cat file
    I have three dogs and two cats
    sed -e 's/dog/cat/g' -e 's/cat/elephant/g' file
    I have three elephants and two elephants
    sed -e '/^$/d' foo
deletes all empty lines
    sed -e 6d foo
deletes line 6

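The dogs-and-cats result depends on the order of the -e commands, because each line passes through them in sequence. The sketch below runs the same two substitutions in both orders. The reversed order is an added illustration.

```shell
text='I have three dogs and two cats'

# dog -> cat first, then every cat (including the new ones) -> elephant.
forward=$(printf '%s\n' "$text" | sed -e 's/dog/cat/g' -e 's/cat/elephant/g')

# cat -> elephant first, then dog -> cat; the new cats are left alone.
reversed=$(printf '%s\n' "$text" | sed -e 's/cat/elephant/g' -e 's/dog/cat/g')

printf '%s\n' "$forward" "$reversed"
```
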
sed examples
A dollar sign ($) can be used as an address to indicate the last line in a file. For example, to delete lines 11 through the end of foo:
    sed '11,$d' foo

sed examples
sed can also delete lines based on a matching string, using /string/d. For example,
    sed '/warning/d' log
deletes every line in the file log that contains the string warning.
To delete a string, not the entire line containing it, substitute the text with nothing. For example,
    sed 's/draft//g' foo
removes the string draft everywhere it occurs in the file foo.

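The difference between deleting a line and deleting a string can be seen on a small made-up input:

```shell
# Hypothetical three-line input.
log='ok: start
warning: disk
ok: draft done'

# /warning/d removes the whole line that contains "warning".
no_warning=$(printf '%s\n' "$log" | sed '/warning/d')

# s/draft//g removes only the word itself; the rest of the line stays,
# including the spaces that surrounded the word.
no_draft=$(printf '%s\n' "$log" | sed 's/draft//g')

printf '%s\n' "$no_warning" "$no_draft"
```
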
tr
tr translates characters from stdin to stdout.
Syntax:
    tr [options] string1 [string2]
Options:
    -c  complement string1 with respect to the entire ASCII character set
    -s  squeeze repeated characters into single characters
    -d  delete all input characters contained in string1

tr examples
Typical usages:
    tr chars1 chars2 < inputfile > outputfile
    tr chars1 chars2 < inputfile | less

tr
    tr s z
replaces all instances of s with z
    tr so zx
replaces all instances of s with z and o with x
    tr '[a-z]' '[A-Z]'
replaces all lower case characters with upper case
    tr '[a-m]' '[A-M]'
translates only lower case a through m to upper case A through M

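These translations can be checked on short made-up strings. tr reads standard input, so the text is piped in with printf.

```shell
# s -> z and o -> x, position by position.
swapped=$(printf 'sos\n' | tr so zx)

# Every lower case letter to upper case.
upper=$(printf 'hello world\n' | tr '[a-z]' '[A-Z]')

# Only a through m are translated; o, w, and r are left unchanged.
partial=$(printf 'hello world\n' | tr '[a-m]' '[A-M]')

printf '%s\n' "$swapped" "$upper" "$partial"
```
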
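tr examples
    tr '.,:;?!' '.'
converts all punctuation to a period (GNU tr pads the shorter string2 with its last character; other implementations may behave differently)
    tr -c '[0-9a-zA-Z]' '_'
converts all non-alphanumeric characters to _
    tr -s 'a-zA-Z'
squeezes each run of a repeated letter into a single letter
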
tr
The output of tr can be redirected to a file or piped to another filter, and its input can be redirected from a file or piped from another command.
Because tr is run from the shell, characters that are special to the shell must be protected by quotes or \ when they appear in string1 or string2, such as:
    spaces : ; & ( ) | ^ [ ] \ ! NEWLINE TAB
Example:
    tr o ' '
replaces all o's with a blank (space)

tr
tr -d deletes every character matched in string1.
Examples:
    tr -d '[a-z]'
deletes all lower case characters
    tr -d aeiou
deletes all vowels
    tr -dc aeiou
deletes all characters except vowels (note: this includes spaces, TABs, and NEWLINEs as well)

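A quick sketch of -d and -dc on one made-up input line:

```shell
# -d: remove every lower case vowel.
no_vowels=$(printf 'hello world\n' | tr -d aeiou)

# -dc: remove everything that is NOT a lower case vowel,
# including the space and the trailing newline.
only_vowels=$(printf 'hello world\n' | tr -dc aeiou)

printf '%s\n' "$no_vowels" "$only_vowels"
```
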
tr
    tr -cs '[A-Z][a-z]' '[\n*]' < in_file > out_file
It replaces every character that is not a letter (-c) with a newline (\n) and then squeezes multiple newlines into a single newline (-s). The * after \n means as many repetitions as needed.

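The effect of this command is to put each word on its own line. A minimal sketch on a made-up sentence, piped in instead of read from a file:

```shell
# Non-letters (comma, spaces, "!", newline) become newlines, and each run
# of newlines is squeezed to one, leaving one word per line.
words=$(printf 'Hello, big world!\n' | tr -cs '[A-Z][a-z]' '[\n*]')
printf '%s\n' "$words"
```
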
cut
cut is used to cut specific columns vertically.
    cut -c2-5 filename
cuts character columns 2 through 5 (inclusive) from each line of the file filename.
    cut -f3-4 filename
if the file has field delimiters, individual fields can be cut out using the -f option (here, fields 3 through 4).

cut
A sample file named bar:
    madan;SS;MRC-LMB;Ohio
    christine;SS;MRC-LMB;Nebraska
This example has 4 fields, delimited by a ; so to get field number four, run:
    cut -f4 -d';' bar

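The sketch below applies cut to the two sample records, piped in from a shell variable instead of the file bar. The -c2-5 and -f1,3 commands are added illustrations of character ranges and field lists.

```shell
bar='madan;SS;MRC-LMB;Ohio
christine;SS;MRC-LMB;Nebraska'

# Field 4 of each line, using ; as the delimiter.
states=$(printf '%s\n' "$bar" | cut -f4 -d';')

# Fields 1 and 3; cut keeps the delimiter between selected fields.
name_lab=$(printf '%s\n' "$bar" | cut -f1,3 -d';')

# Character columns 2 through 5 of each line.
chars=$(printf '%s\n' "$bar" | cut -c2-5)

printf '%s\n' "$states" "$name_lab" "$chars"
```
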
Summary
- awk: a pattern scanning and processing language
- sed: a stream editor
- tr: translates one character to another
- cut: cuts specific columns vertically