LIN 6932 Unix Lecture 6 Hana Filip. LIN 6932 HW6 - Part II solutions posted on my website see syllabus.


1 LIN 6932 Unix Lecture 6 Hana Filip

2 LIN 6932 HW6, Part II: solutions are posted on my website (see syllabus)

3 LIN 6932 Text Processing Command Line Utility Programs: sed, wc, awk, comm, cut, ex, iconv, join, paste, sort, tr, uniq, xargs

4 LIN 6932 TextPro Lexicon File
Lexicon file: “core.text”
Background: TextPro is an information extraction system used at SRI International, Menlo Park, CA, developed by Doug Appelt.

5 LIN 6932 Copy “machen.txt” into your account (here c6932aad stands for your own account name)
> cd ..
> cd c6932aab
> ls
… machen.txt …
> cp machen.txt ~c6932aad
> cd
> ls
… machen.txt …

6 LIN 6932 Text Processing Command Line Utility Programs
tr: translate or delete characters
Example 1: delete (-d) all the newline characters from “machen.txt”, and redirect the output to a file named “machen-cont.txt”.
% cat machen.txt | tr -d "\n" > machen-cont.txt
Example 2: delete (-d) all characters from “machen.txt” except for (-c) alphabetic characters, newlines, and spaces, and redirect the output to a file named “machen-alpha.txt”.
% cat machen.txt | tr -c -d "[:alpha:]\n " > machen-alpha.txt
Try also (there is no space in the set, so spaces are deleted as well):
% cat machen.txt | tr -c -d "[:alpha:]\n" > machen-alpha.txt
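
To see what the two tr examples do without touching “machen.txt”, here is a small sketch on a made-up two-line sample. The sample text and the variable names (sample, joined, alpha) are assumptions for illustration only; the tr options are the ones from the slide.

```shell
# Hypothetical two-line sample standing in for machen.txt
sample='Ich mache das, 2 mal!
Du machst es.'

# Example 1: delete every newline, which joins the two lines
joined=$(printf '%s\n' "$sample" | tr -d '\n')
printf '%s\n' "$joined"

# Example 2: -c complements the set, so everything that is NOT
# a letter, a newline, or a space gets deleted
alpha=$(printf '%s\n' "$sample" | tr -c -d '[:alpha:]\n ')
printf '%s\n' "$alpha"
```

Note that deleting the comma and the digit leaves the two spaces around them next to each other, so the second output has a double space between "das" and "mal".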

7 LIN 6932 Text Processing Command Line Utility Programs
tr can be used to make a wordlist from a text. This can be done by replacing every space with a newline (\012 is the octal code for the newline character, so the two commands below are equivalent):
% cat machen.txt | tr " " "\n" | less
% cat machen.txt | tr " " "\012" | less
We can combine the command above with the delete functionality of tr to make a wordlist without unwanted characters:
% cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" > lex
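
The wordlist pipeline can be tried on a short made-up text. This is only a sketch: the sample sentence and the variable names are assumptions, and the pipeline is the one from the slide with the output captured in a variable instead of a file.

```shell
# Hypothetical sample text standing in for machen.txt
sample='the cat saw the dog.
a dog saw it!'

# Step 1: every space becomes a newline, giving one word per line.
# Step 2: delete everything except letters and newlines
#         (this strips the punctuation attached to the words).
words=$(printf '%s\n' "$sample" | tr ' ' '\n' | tr -c -d '[:alpha:]\n')
printf '%s\n' "$words"
```

The result still contains repeated words ("the", "dog", "saw"), in the order in which they occur in the text.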

8 LIN 6932 Text Processing Command Line Utility Programs
sort prints the lines of its input, or of the concatenation of all files listed in its argument list, in sorted order. (The -r flag reverses the sort order.)
% sort -r movie_characters
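
A minimal sketch of sort with and without -r. The four names are invented stand-ins for the contents of movie_characters, and LC_ALL=C is an added assumption that pins the ordering to plain byte order so the result does not depend on the local language settings.

```shell
# Hypothetical contents of a file like movie_characters
names='Leia
Chewbacca
Han
Yoda'

# Default order: ascending
ascending=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$ascending"

# -r reverses the order
descending=$(printf '%s\n' "$names" | LC_ALL=C sort -r)
printf '%s\n' "$descending"
```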

9 LIN 6932 Text Processing Command Line Utility Programs
uniq takes a text file and outputs the file with adjacent identical lines collapsed to one.
It is a kind of filter program.
Because it only compares adjacent lines, it is typically used after sort.
% cat machen.txt | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > lex
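
Why uniq usually follows sort can be shown on a tiny made-up wordlist. This is a sketch only; the word list and variable names are assumptions.

```shell
# Hypothetical one-word-per-line list with repeated words
words='the
cat
the
the
dog
cat'

# uniq alone only collapses ADJACENT duplicates, so the separated
# occurrences of "the" and "cat" survive
uniq_only=$(printf '%s\n' "$words" | uniq)
printf '%s\n' "$uniq_only"

# sorting first brings identical lines together, so uniq removes them all
sorted_uniq=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq)
printf '%s\n' "$sorted_uniq"
```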

10 LIN 6932 Text Processing Command Line Utility Programs
sed = stream editor
a special editor for automatically modifying files
a find-and-replace program: it reads text from standard input and writes the result to standard output (normally the screen)
the search pattern is a regular expression, essentially the same as a grep regular expression (see references)
often used in a program to make changes in a file

11 LIN 6932 Text Processing Command Line Utility Programs
sed: simple example 1
% sed 's/United States/USA/' new-usa-gaz.text
s                  substitute command
/../../            delimiter
United States      regular expression pattern string
USA                replacement string
new-usa-gaz.text   new_file
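
The substitute command can be tried on a single line fed through a pipe instead of a gazetteer file. The sample line and variable names below are assumptions. The sketch also shows one detail worth knowing: without a trailing g flag, s replaces only the first match on each line.

```shell
# Hypothetical input line with two occurrences of the pattern
line='United States and United States'

# Without g: only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/United States/USA/')
printf '%s\n' "$first_only"

# With g: every match on the line is replaced
all=$(printf '%s\n' "$line" | sed 's/United States/USA/g')
printf '%s\n' "$all"
```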

12 LIN 6932 Text Processing Command Line Utility Programs
sed: simple example 2
% sed 's/\(United\) \(States\)/\2 \1/' usa-switch-gaz.text
switches the two words around
\(         start of a group
\)         end of a group
/../../    delimiter
\1         register 1 (the text matched by the first group)
\2         register 2 (the text matched by the second group)
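
A small sketch of the word swap, assuming the two words are separated by a single space in the input. The pattern therefore has a space between the two groups, and the replacement puts a space back between \2 and \1. The sample input is made up.

```shell
# \(United\) is stored in register 1, \(States\) in register 2;
# the replacement writes them back in the opposite order
swapped=$(printf 'United States\n' | sed 's/\(United\) \(States\)/\2 \1/')
printf '%s\n' "$swapped"
```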

13 LIN 6932 Text Processing Command Line Utility Programs
Multiple sed commands may also be stored in a script file. The "-f" option is used on the command line to access the commands in the script:
% sed -f sedscript.sed [file]
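
As a sketch of the -f option, the lines below write a two-command sed script to a temporary file and run it on one line of input. The script contents and variable names are assumptions, and mktemp is assumed to be available (it is on most Linux and macOS systems, but it is not part of every Unix).

```shell
# Create a temporary file to hold the sed script
script=$(mktemp)

# One sed command per line; they are applied in order to every input line
cat > "$script" <<'EOF'
s/United States/USA/
s/^/LexEntry: /
EOF

result=$(printf 'United States\n' | sed -f "$script")
printf '%s\n' "$result"

# Clean up the temporary script file
rm -f "$script"
```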

14 LIN 6932 Text Processing Command Line Utility Programs
% sed 's/^/LexEntry: /g;s/$/ ;./' lex > newlex
^    matches the beginning of the line
$    matches the end of the line
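
The two anchors can be seen at work on a made-up two-word list. The words and the variable name are assumptions; the sed commands are the ones from the slide, minus the g flag, which makes no difference here because each line has only one beginning.

```shell
# ^ matches the empty position at the start of each line,
# $ the empty position at its end, so text is added on both sides
entries=$(printf 'machen\nlachen\n' | sed 's/^/LexEntry: /;s/$/ ;./')
printf '%s\n' "$entries"
```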

15 LIN 6932 Text Processing Command Line Utility Programs & shell script
#!/usr/local/bin/tcsh
# usage: make_lex filename1; make_lex filename1 filename2 …
# first, make sure the user typed in at least one argument
if ( $#argv < 1 ) then
    echo "This script needs at least 1 argument."
    echo "Exiting...(annoyed)"
    # exit statuses only go up to 255, so use 1 to signal failure
    exit 1
endif
# cat concatenates all the files named on the command line,
# so a single lexicon is built from all of them
cat $* | tr " " "\n" | tr -c -d "[:alpha:]\n" | sort | uniq > mylex
# wrap every word as a lexicon entry
sed 's/^/LexEntry: /g;s/$/ ;./' mylex > newlex
echo "Your new lexical file is called 'newlex'."
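
For comparison, here is a sketch of the same idea as a function for a POSIX shell (sh or bash) rather than tcsh. This is an assumption-laden variant, not the course script: the function writes the lexicon to standard output instead of to “newlex”, and the temporary directory, file names, and sample words in the demo are invented.

```shell
# make_lex FILE...: print a sorted, duplicate-free lexicon built
# from the words of all the files given as arguments
make_lex() {
    if [ "$#" -lt 1 ]; then
        echo "make_lex needs at least 1 argument." >&2
        return 1
    fi
    # one word per line, letters only, sorted, duplicates removed,
    # then each word wrapped as a lexicon entry
    cat "$@" | tr ' ' '\n' | tr -c -d '[:alpha:]\n' | LC_ALL=C sort | uniq |
        sed 's/^/LexEntry: /;s/$/ ;./'
}

# Demo on two hypothetical input files in a temporary directory
tmpdir=$(mktemp -d)
printf 'the cat\n' > "$tmpdir/a.txt"
printf 'the dog\n' > "$tmpdir/b.txt"
lexicon=$(make_lex "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$lexicon"
rm -rf "$tmpdir"
```

Writing to standard output leaves it to the caller to choose the output file, for example with a redirect such as > newlex.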

