7 Searching and Regular Expressions (Regex) Mauro Jaskelioff.

Introduction
Shell metacharacters
–What are they?
–Why they are not the same as regular expressions!
More about regular expressions
–Searching file contents using: grep, egrep, fgrep

Shell Metacharacters

Special characters are characters that have some meaning to the shell. They are also known as metacharacters.
The shell interprets them for expansion unless they are quoted or escaped (more on this later).
E.g.: $ file ../* (gives the file type of every file in the directory one level up)

Filename Expansion
The * metacharacter stands for any string of zero or more characters, so a single pattern can match many files. E.g.:
–*.txt matches any filename ending in .txt
–myfile.* matches all files named myfile followed by a dot and any suffix
–*.* matches any filename that contains a dot
–* matches all files in the current directory (except hidden ones)
–UST/* matches all files in the UST directory
–.* matches all hidden files (names that start with a dot)
–*ology matches all filenames with ology at the end (or a filename of just ology ☺)
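The patterns above can be tried out safely in a scratch directory. The following is a minimal sketch; the directory and every filename in it are made up for the demonstration and are not part of the original slides.

```shell
# Hypothetical sandbox: a temporary directory with a few invented filenames.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch notes.txt report.txt myfile.c myfile.h biology .hidden

txt_files=$(echo *.txt)      # names ending in .txt
myfile_any=$(echo myfile.*)  # myfile followed by any suffix
ology=$(echo *ology)         # names ending in ology
visible=$(echo *)            # every name that does not start with a dot

echo "$txt_files"    # notes.txt report.txt
echo "$myfile_any"   # myfile.c myfile.h
echo "$ology"        # biology
echo "$visible"      # biology myfile.c myfile.h notes.txt report.txt

# Clean up the sandbox.
cd "$start_dir" || exit 1
rm -rf "$demo"
```

Note that .hidden does not appear in the output of the bare * pattern, because * does not match a leading dot by default.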

Filename Expansion (2)
The previous example: $ file ../*
1. The shell expands the metacharacters in the command line: $ file ../file1 ../file2 ../file3
2. The command is executed.
Commands don't interpret shell metacharacters. The interpretation is done by the shell.
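One way to see that the shell, not the command, does the expansion is to count the arguments a command actually receives. This is only an illustrative sketch: count_args is a hypothetical helper, and the three files are created just for the demo.

```shell
# Hypothetical sandbox with three files, mirroring file1, file2, file3 above.
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2" "$demo/file3"

# count_args is an invented helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

expanded=$(count_args "$demo"/*)   # glob left unquoted: the shell expands it
literal=$(count_args "$demo/*")    # glob quoted: passed through as one string

echo "$expanded"   # 3
echo "$literal"    # 1

rm -rf "$demo"
```

The helper never sees the * in the first call; by the time it runs, the shell has already replaced the pattern with three filenames.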

Other Filename Metacharacters
–? matches any single character
–[abc…] matches any one of the enclosed characters. A hyphen can be used to specify a range, e.g. [a-z]
–[!abc…] matches any one character that is not enclosed
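A short sketch of these three metacharacters in action, again using invented filenames in a temporary directory:

```shell
# Hypothetical sandbox; the filenames are made up for this demo.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch file1 file2 file3 fileA file10

single=$(echo file?)          # exactly one character after "file"
digits=$(echo file[0-9])      # one digit after "file"
not_digit=$(echo file[!0-9])  # one character that is not a digit

echo "$single"      # file1 file2 file3 fileA
echo "$digits"      # file1 file2 file3
echo "$not_digit"   # fileA

cd "$start_dir" || exit 1
rm -rf "$demo"
```

file10 is never listed, since ? and [ ] each stand for exactly one character.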

Command substitution
The shell also supports substituting the output of a command into the command line:
$ ls -l `cat filenames`
The command should be enclosed in backquotes (`).
~]$ cat filenames
temp temp2
~]$ ls -l `cat filenames`
-rw-r--r-- 1 zlizmj Domain U 6 Mar 21 03:00 temp
-rw-r--r-- 1 zlizmj Domain U 567 Mar 30 11:14 temp2
~]$ ls -l temp temp2
-rw-r--r-- 1 zlizmj Domain U 6 Mar 21 03:00 temp
-rw-r--r-- 1 zlizmj Domain U 567 Mar 30 11:14 temp2
~]$
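The same idea can be sketched in a self-contained script. Besides backquotes, POSIX shells also accept the $(command) form, which is easier to nest; the slides only show backquotes, so treat the second form as an addition. The files below are created only for the demo.

```shell
# Hypothetical sandbox reproducing the filenames/temp/temp2 setup.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch temp temp2
printf 'temp\ntemp2\n' > filenames

old_style=`cat filenames`     # backquote form, as on the slide
new_style=$(cat filenames)    # equivalent $(...) form

# The inner substitution is left unquoted on purpose, so that the shell
# splits its output into two separate arguments for ls.
listing=$(ls $(cat filenames))
echo "$listing"

cd "$start_dir" || exit 1
rm -rf "$demo"
```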

Avoiding Shell Expansion
What happens if we actually want to pass a metacharacter to the command (i.e. we don't want the shell to interpret it as a metacharacter)? For example, we may have a file named temp*
The character needs to be quoted or escaped:
–We can quote an argument with single quotes (') or with double quotes (")
–We escape characters with the backslash character (\)

Single or Double Quotes?
" – everything between " and " is taken literally, except for:
  $ – variable substitution will occur
  ` – command substitution will occur
  \ – escapes a following $, `, " or \
  " – marks the end of the double quote
  (' has no special meaning inside double quotes)
' – everything between ' and ' is taken literally; the next ' ends the string.
–You cannot embed another ' within such a quoted string, not even with a backslash. Close the quote, write \', and open a new quote instead.
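The difference between the two kinds of quotes is easiest to see side by side. A minimal sketch, where the variable name and its value are assumptions chosen for the demo:

```shell
# Hypothetical variable used only for this demonstration.
name=Mauro

# Double quotes: $name and the backquoted command are both substituted.
in_double="Hello $name, today is `echo Monday`"

# Single quotes: every character is kept exactly as typed.
in_single='Hello $name, today is `echo Monday`'

# A single quote needs no special treatment inside double quotes.
mixed="It's $name"

printf '%s\n' "$in_double"   # Hello Mauro, today is Monday
printf '%s\n' "$in_single"   # Hello $name, today is `echo Monday`
printf '%s\n' "$mixed"       # It's Mauro
```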

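Escaping a Character
The character following a backslash \ is taken literally.
$ echo I\'m Mauro
I'm Mauro
$
Use \ within " " to escape ", $, ` and \ when necessary. Within ' ' the backslash is itself taken literally, so it cannot escape anything there.
How to escape \? With another backslash: \\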
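The escaping rules can be checked with a few assignments. This sketch prints with printf rather than echo, because some echo implementations interpret backslashes themselves; the strings are invented examples.

```shell
# Outside quotes: the backslash protects the quote and the space.
a=I\'m\ Mauro

# Inside double quotes: the backslash protects $ and ".
b="Price: \$5 for a \"book\""

# Inside single quotes nothing can be escaped, so the usual workaround is:
# close the quote, add an escaped quote, then reopen the quote.
c='I'\''m Mauro'

# A literal backslash is written as two backslashes.
d=\\

printf '%s\n' "$a"   # I'm Mauro
printf '%s\n' "$b"   # Price: $5 for a "book"
printf '%s\n' "$c"   # I'm Mauro
printf '%s\n' "$d"   # \
```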

Regular Expressions

Also called regex.
For describing a set of strings using a pattern:
–Follows a set of rules
–Used for finding occurrences of strings in files
A regex contains normal characters mixed with special characters (called metacharacters).
These metacharacters are NOT the same as shell metacharacters, which are used for filename expansion!

Regular Expressions
Regular expressions should be put inside quotes (single quotes are the safest choice); otherwise the shell may interpret their metacharacters for filename expansion. E.g.:
–grep '[Ff]red' myfile.txt
–Searches the file myfile.txt for lines containing either Fred or fred

Fixed Patterns vs. Regular Expressions
To search a file for the word computer:
–grep computer myfile.txt
–Will only match lines containing the exact string computer
–A fixed pattern, not a regular expression
Suppose we want to find occurrences (including potential misspellings) of:
–computer, computor, Computer, Computor, Computers, and so on…
–grep '[cC]omput[eo]rs*' myfile.txt
–Uses a regular expression
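To compare the two searches, the sketch below builds a small sample file and counts matching lines with grep -c. The file contents are invented for illustration; only the two patterns come from the slide.

```shell
# Hypothetical sample file standing in for myfile.txt.
demo=$(mktemp -d)
cat > "$demo/myfile.txt" <<'EOF'
My computer is old.
The Computors were misspelled.
She bought two computers.
Nothing relevant here.
EOF

# Fixed pattern: only lines containing the exact string "computer".
fixed_count=$(grep -c computer "$demo/myfile.txt")

# Regular expression: also catches the capitalised and misspelled variants.
regex_count=$(grep -c '[cC]omput[eo]rs*' "$demo/myfile.txt")

echo "$fixed_count"   # 2
echo "$regex_count"   # 3

rm -rf "$demo"
```

The fixed pattern still matches the line with computers, because grep looks for the string anywhere in the line, not for a whole word.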

Three versions of grep
grep: supports the most common metacharacters.
egrep: (extended grep) supports an extended set of metacharacters. It's more expressive but may be slower. Equivalent to grep -E.
fgrep: (fast grep) doesn't support metacharacters; it matches fixed strings. It's less expressive but faster. Equivalent to grep -F.

Regex Metacharacters
.     Matches any single character except newline. E.g. c.t matches cat, cbt, cct …
[ ]   Matches one character from those between [ and ]. E.g. [abc] matches a, b or c
-     Indicates a range inside [ ]. E.g. [a-z] matches any one character from a to z
*     Matches zero or more occurrences of the preceding character. E.g. 12* matches 1, 12, 122, 1222 …
+     Matches one or more occurrences of the preceding character. NOTE: for use with egrep. E.g. 12+ matches 12, 122, 1222 …
?     Matches zero or one occurrence of the preceding character. NOTE: for use with egrep. E.g. 12? matches 1 and 12
\     Treats the next character literally. E.g. \* will match the character * and NOT the metacharacter *
^     Matches the start of the line. E.g. ^Fred will match only lines that have the word Fred at the start of the line
$     Matches the end of the line. E.g. Fred$ will match only lines that have the word Fred at the end of the line
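Several rows of the table can be exercised with plain grep on a handful of sample lines. The sample data is made up for this sketch; the patterns follow the table.

```shell
# Hypothetical sample lines, one per row of interest in the table.
sample=$(printf '%s\n' cat cbt ct 1 12 122 'Fred was here' 'Here is Fred' '2*3')

dot_count=$(printf '%s\n' "$sample" | grep -c 'c.t')     # cat and cbt, not ct
star_count=$(printf '%s\n' "$sample" | grep -c '^12*$')  # 1, 12 and 122
at_start=$(printf '%s\n' "$sample" | grep '^Fred')       # Fred at line start
at_end=$(printf '%s\n' "$sample" | grep 'Fred$')         # Fred at line end
literal_star=$(printf '%s\n' "$sample" | grep '2\*3')    # \* is a plain *

echo "$dot_count"      # 2
echo "$star_count"     # 3
echo "$at_start"       # Fred was here
echo "$at_end"         # Here is Fred
echo "$literal_star"   # 2*3
```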

grep Revisited
Used to search a file for a pattern (remember STDIN, STDOUT, etc. are also treated as files in UNIX):
cat myfile.txt | grep "chocolate"
who | grep zlizmj
grep 'pingu' penguinNames.txt
grep '[Ww]ib*le' wobble.txt

egrep
Extended grep. Slower but greater functionality.
Includes additional metacharacters, e.g.:
–+ matches one or more occurrences of its preceding character. E.g. abc+ matches abc, abcc, abccc, …
–? matches zero or one occurrence of its preceding character. E.g. abc? matches ab or abc
–| separates alternatives. E.g. A|B matches A or B

egrep Example
egrep '(bio|geo)logy' subjects.txt
–will search the file subjects.txt for all lines that contain the words biology or geology
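The extended metacharacters can be sketched as follows. The example uses grep -E, the POSIX spelling of egrep, since some recent grep releases print a warning when egrep is invoked directly; the sample lines are invented and stand in for subjects.txt.

```shell
# Hypothetical stand-in for the contents of subjects.txt.
subjects=$(printf '%s\n' biology geology zoology 'applied geology' theology)

# Alternation: lines containing biology or geology.
alt_count=$(printf '%s\n' "$subjects" | grep -E -c '(bio|geo)logy')

# + and ? applied to the preceding character, anchored to whole lines.
letters=$(printf '%s\n' ab abc abcc)
plus_count=$(printf '%s\n' "$letters" | grep -E -c '^abc+$')       # abc, abcc
question_count=$(printf '%s\n' "$letters" | grep -E -c '^abc?$')   # ab, abc

echo "$alt_count"        # 3
echo "$plus_count"       # 2
echo "$question_count"   # 2
```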

fgrep
Fast grep
Does not use regular expressions:
–Used for matching an exact string, not a pattern
–$, *, [, ^, |, (, ), and \ are interpreted literally
–(but still have special meaning to the shell)
–Enclose entire string in quotes
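The effect of switching off regular expressions shows up when the search string itself contains metacharacters. A minimal sketch with invented sample lines, using grep -F (the POSIX spelling of fgrep):

```shell
# Hypothetical sample lines containing regex metacharacters.
lines=$(printf '%s\n' 'price is $5.00' 'price is 45x00' 'a*b' 'aab')

# Fixed string: $ and . are ordinary characters here.
fixed_match=$(printf '%s\n' "$lines" | grep -F '$5.00')

# As a regex, the . in 5.00 matches any character, so 45x00 matches too.
regex_count=$(printf '%s\n' "$lines" | grep -c '5.00')

# a*b as a fixed string matches only the line with a literal a*b;
# as a regex it means "zero or more a, then b" and matches both lines.
fixed_star=$(printf '%s\n' "$lines" | grep -F -c 'a*b')
regex_star=$(printf '%s\n' "$lines" | grep -c 'a*b')

echo "$fixed_match"   # price is $5.00
echo "$regex_count"   # 2
echo "$fixed_star"    # 1
echo "$regex_star"    # 2
```

The search strings are kept in single quotes throughout, so that the shell does not try to expand $5 or the * before grep sees them.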

Summary
The shell performs filename expansion and command substitution.
Shell metacharacters are not the same as regular expression metacharacters!
Regular expressions allow us to search for a pattern in a file.
Commands used for searching:
–grep
–egrep
–fgrep (does not use regular expressions)