LIS651 lecture 4 regular expressions Thomas Krichel 2006-12-03.

remember DOS?
DOS had the * character as a wildcard. If you said DIR *.EXE, it would list all the files ending with .EXE. Thus the * wildcard would mean all characters except the dot. Similarly, you could say DEL *.* to delete all your files.

regular expression
A regular expression is nothing but a fancy wildcard. There are various flavours of regular expressions.
– We will be using POSIX regular expressions here. They themselves come in two flavours, old-style and extended. We study the extended flavour here.
– Perl regular expressions are more powerful and more widely used.
POSIX regular expressions are accepted by both PHP and MySQL. Details are to follow.

pattern
The regular expression describes a pattern of characters. Patterns are common in other circumstances.
– Query: Krichel Thomas in Google
– Query: "Thomas Krichel" in Google
– Dates are of the form yyyy-mm-dd.

pattern matching
We say that a regular expression matches the string if an instance of the pattern described by the regular expression can be found in the string. Saying that it "matches in" the string makes this a little clearer. Sometimes people also say that the string matches the regular expression, which reverses the roles and is confusing.

metacharacters
Instead of just giving the star * special meaning, in a regular expression all the following have special meaning:
\ ^ $ . | ( ) * + { } ? [ ]
Collectively, these characters are known as metacharacters. They don't stand for themselves but mean something else. For example, DEL *.EXE does not mean: delete the file "*.EXE". It means: delete anything ending with .EXE.

metacharacters
We are already somewhat familiar with metacharacters.
– In XML, < means the start of an element. To use < literally, you have to write &lt;
– In PHP, "\n" does not mean a backslash and then n. It means the newline character.

simple regular expressions
Characters that are not metacharacters simply mean themselves.
– 'good' does not match in 'Good Beer'
– 'd B' matches in 'Good Beer'
– 'dB' does not match in 'Good Beer'
– 'Beer ' (with a trailing space) does not match in 'Good Beer'
If there are several matches, the pattern will match at the first occurrence.
– 'o' matches in 'Good Beer'
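The literal-matching rule can be tried out with a minimal sketch. It uses Python's re module as a stand-in for the POSIX functions of this lecture (an assumption of the example; for plain characters both behave the same), and the variable names are only illustrative.

```python
import re

text = "Good Beer"

# Ordinary characters stand for themselves; matching is case-sensitive
# and a space in the pattern must meet a space in the string.
literal_results = {
    pattern: re.search(pattern, text) is not None
    for pattern in ["good", "d B", "dB", "o"]
}

# With several candidate positions, the leftmost match is the one reported.
first_o_position = re.search("o", text).start()

print(literal_results)
print(first_o_position)
```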

the backslash \ quote
If you want to match a metacharacter in the string, you have to quote it with the backslash.
– 'a 6+ pack' does not match in 'a 6+ pack'
– 'a 6\+ pack' does match in 'a 6+ pack'
– '\' does not match in 'a \ against boozing'
– '\\' does match in 'a \ against boozing'
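A small sketch of quoting, again assuming Python's re module in place of the POSIX functions. Raw strings (r"...") are used so that Python itself leaves the backslashes alone; re.escape() is a Python convenience, not something this lecture covers.

```python
import re

text = "a 6+ pack"

# Unquoted, + means "one or more of the preceding 6", so the literal +
# in the text is never matched.
unquoted = re.search(r"a 6+ pack", text)

# Quoted with a backslash, + stands for itself.
quoted = re.search(r"a 6\+ pack", text)

# A lone backslash is not a valid pattern; a doubled one matches a backslash.
try:
    re.compile("\\")
    lone_backslash_compiles = True
except re.error:
    lone_backslash_compiles = False
doubled = re.search(r"\\", "a \\ against boozing")

# re.escape() quotes every metacharacter of a literal string for you.
escaped_match = re.search(re.escape(text), text)

print(unquoted, quoted, lone_backslash_compiles, doubled, escaped_match)
```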

other characters to be quoted
Certain non-metacharacters also need to be quoted. These include some of the usual suspects:
– \n the newline
– \r the carriage return
– \t the tabulation character
But this quoting occurs by virtue of PHP; it is not part of the regular expression. Remember Sandford's law.

anchor metacharacters ^ and $
^ matches at the beginning of the string. $ matches at the end of the string.
– 'keeper' matches in 'beerkeeper'
– 'keeper$' matches in 'beerkeeper'
– '^keeper' does not match in 'beerkeeper'
– '^$' matches in the empty string
Note that in a double-quoted PHP string an expression starting with $ will be replaced by the variable's string value (or nothing if the variable has not been set).
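The anchor examples can be checked with a short sketch, assuming Python's re module as a stand-in for the POSIX functions (the anchors behave the same way for these single-line strings).

```python
import re

word = "beerkeeper"

# ^ pins the match to the start of the string, $ to its end.
anchor_results = {
    pattern: re.search(pattern, word) is not None
    for pattern in [r"keeper", r"keeper$", r"^keeper"]
}

# ^$ leaves room for nothing in between, so it matches the empty string.
empty_matches = re.search(r"^$", "") is not None
nonempty_matches = re.search(r"^$", "beer") is not None

print(anchor_results, empty_matches, nonempty_matches)
```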

character classes
We can define a character class by grouping a list of characters between [ and ].
– 'b[ie]er' matches in 'beer'
– 'b[ie]er' matches in 'bier'
– '[Bb][ie]er' matches in 'Bier'
Within a class, metacharacters need not be escaped. In the class only -, ] and ^ are metacharacters.

- in the character class
Within a character class, the dash - becomes a metacharacter. You can use it to give a range, according to the sequence of characters in the character set you are using. It is usually alphabetic.
– 'be[a-e]r' matches in 'beer'
– 'be[a-e]r' matches in 'becr'
– 'be[a-e]r' does not match in 'befr'
If the dash - is the last character in the class, it is treated like an ordinary character.

] in the character class
] gives you the end of the class. But if you put it first, it is treated like an ordinary character, because having it there otherwise would create an empty class, and that would make no sense.
– 'be[],]r' matches in 'be]r'

^ in the character class
If the caret ^ appears as the first element in the class, it negates the characters mentioned. Otherwise, it is an ordinary character.
– 'be[^i]r' matches in 'beer'
– 'b[^ie]er' does not match in 'bier'
– 'be[^a-e]r' does match in 'befr'
– 'be[e^]r' matches in 'beer'
– 'beer[^6-9]' matches in 'beer0' to 'beer5', and in 'beer' followed by any other character that is not 6 to 9
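The character-class rules (lists, ranges, a leading ], a leading ^) can be exercised together in one sketch. Python's re module is assumed as a stand-in for the POSIX functions; matches_in is a hypothetical helper named after the wording of the slides.

```python
import re

def matches_in(pattern, text):
    """Return True if the pattern can be found somewhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, string, expected result according to the rules for classes)
checks = [
    (r"b[ie]er", "bier", True),        # simple list of characters
    (r"[Bb][ie]er", "Bier", True),
    (r"be[a-e]r", "becr", True),       # - gives a range
    (r"be[a-e]r", "befr", False),
    (r"be[],]r", "be]r", True),        # ] placed first is ordinary
    (r"be[^i]r", "beer", True),        # ^ placed first negates
    (r"b[^ie]er", "bier", False),
    (r"be[^a-e]r", "befr", True),
    (r"be[e^]r", "beer", True),        # ^ elsewhere is ordinary
    (r"beer[^6-9]", "beer5", True),
    (r"beer[^6-9]", "beer7", False),
]

results = [matches_in(pattern, text) for pattern, text, _ in checks]
print(results)
```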

standard character classes
The following predefined classes exist:
– [:alnum:] any alphanumeric character
– [:digit:] any digit
– [:punct:] any punctuation character
– [:alpha:] any alphabetic character (letter)
– [:graph:] any graphic character
– [:space:] any space character (blank and \n, \r)
– [:blank:] any blank character (space and tab)
– [:lower:] any lowercase character
– [:upper:] any uppercase character
– [:cntrl:] any control character
– [:print:] any printable character
– [:xdigit:] any character for a hex number
They are used inside a class, for example [[:digit:]]. They are locale and operating system dependent. With this discussion we leave character classes.

The period . metacharacter
The period matches any character except the newline \n. The reason why the \n is not counted is historic. In olden days matching was done line by line, because the computer could not hold as much in memory.
– '.' does not match in the empty string
– '^.$' does not match in "\n"
– '^.$' matches in 'a'
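A quick sketch of the period, assuming Python's re module as a stand-in for the POSIX functions. By default Python also excludes the newline from what the period matches; the re.DOTALL flag shown at the end is a Python feature and not part of this lecture.

```python
import re

# The period needs exactly one character, and that character may not be \n.
dot_in_empty = re.search(r".", "") is not None
dot_on_newline = re.search(r"^.$", "\n") is not None
dot_on_letter = re.search(r"^.$", "a") is not None

# Python-specific: re.DOTALL lets the period match the newline as well.
dot_on_newline_dotall = re.search(r"^.$", "\n", re.DOTALL) is not None

print(dot_in_empty, dot_on_newline, dot_on_letter, dot_on_newline_dotall)
```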

alternative operator |
This acts like an or.
– 'beer|wine' matches in 'beer'
– 'beer|wine' matches in 'wine'
Alternatives are performed last, i.e. each alternative extends as far as it can on either side of the |.

grouping with ( )
You can use ( ) to group.
– '(beer|wine) (glass|)' matches in 'beer glass'
– '(beer|wine) (glass|)' matches in 'wine glass'
– '(beer|wine) (glass|)' matches in 'beer ' and in 'wine ' (the space in the pattern is still required)
– '(beer|wine) (glass(es|)|)' matches in 'beer glasses'
Yes, groups can be nested.
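Alternation and grouping interact, which a short sketch makes visible. It assumes Python's re module as a stand-in for the POSIX functions; the sample strings are made up for illustration.

```python
import re

# Without a group, | splits the whole pattern: "beer" OR "wine glass".
ungrouped_on_beer = re.search(r"beer|wine glass", "beer") is not None
# With a group, the alternative is limited to the parentheses.
grouped_on_beer = re.search(r"(beer|wine) glass", "beer") is not None

# Nested groups: each opening parenthesis starts a numbered group.
nested = re.compile(r"(beer|wine) (glass(es|)|)")
groups_glasses = nested.search("two beer glasses").groups()
groups_glass = nested.search("a wine glass").groups()
# An empty alternative lets the group match nothing at all;
# a group that took no part in the match is reported as None.
groups_bare = nested.search("beer ").groups()

print(ungrouped_on_beer, grouped_on_beer)
print(groups_glasses, groups_glass, groups_bare)
```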

repetition operators
* means zero or more times what precedes it.
+ means one or more times what precedes it.
? means zero or one time what precedes it.
The shortest preceding expression is used, i.e. either a single character or a group.
– '(beer )*' matches in the empty string
– '(beer )?' matches in the empty string
– '(beer )+' matches in 'beer beer beer'
– 'be+r' matches in 'beer'
– 'be+r' does not match in 'bebe'

enumeration
We can use {min,max} to give a minimum min and a maximum max. min and max are non-negative integers.
– 'be{1,3}r' matches in 'ber'
– 'be{1,3}r' matches in 'beer'
– 'be{1,3}r' matches in 'beeer'
– 'be{1,3}r' does not match in 'beeeer'
? is just a shorthand for {0,1}
+ is just a shorthand for {1,}
* is just a shorthand for {0,}
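The repetition operators and their {min,max} equivalents can be compared in a sketch like the following, which assumes Python's re module as a stand-in for the POSIX functions; matches_in is a hypothetical helper.

```python
import re

def matches_in(pattern, text):
    """Return True if the pattern can be found somewhere in the text."""
    return re.search(pattern, text) is not None

samples = ["br", "ber", "beer", "beeer", "beeeer"]

# {1,3}: between one and three e characters between b and r.
bounded = {s: matches_in(r"be{1,3}r", s) for s in samples}

# Each shorthand should agree with its {min,max} spelling on every sample.
shorthand_agrees = all(
    matches_in(short, s) == matches_in(spelled_out, s)
    for short, spelled_out in [
        (r"be?r", r"be{0,1}r"),
        (r"be+r", r"be{1,}r"),
        (r"be*r", r"be{0,}r"),
    ]
    for s in samples
)

# A group can be repeated as a whole; zero repetitions match the empty string.
star_on_empty = matches_in(r"(beer )*", "")
plus_on_three = matches_in(r"(beer )+", "beer beer beer")
plus_on_bebe = matches_in(r"be+r", "bebe")

print(bounded, shorthand_agrees, star_on_empty, plus_on_three, plus_on_bebe)
```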

examples
– US zip code: ^[0-9]{5}(-[0-9]{4})?$
– Something like a current date in ISO form: ^(20[0-9]{2})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$
– Something like a Palmer School course code: ((DIS[89])|(LIS[5-9]))[0-9]{2}
– Something like an XML tag
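The example patterns can be tried on a few sample values. This is a sketch assuming Python's re module as a stand-in for the POSIX functions; the date pattern is written so that days 01 to 09 are accepted as well, the course-code pattern has balanced parentheses, and the sample values are made up.

```python
import re

ZIP = re.compile(r"^[0-9]{5}(-[0-9]{4})?$")
DATE = re.compile(r"^(20[0-9]{2})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$")
COURSE = re.compile(r"((DIS[89])|(LIS[5-9]))[0-9]{2}")

zip_results = {z: ZIP.search(z) is not None
               for z in ["11548", "11548-1300", "1154", "11548-13"]}

# The groups give year, month and day as strings.
date_parts = DATE.search("2006-12-03").groups()
bad_month = DATE.search("2006-13-03") is not None
# The pattern only checks the shape: it still accepts an impossible date.
impossible_date = DATE.search("2006-02-31") is not None

course_results = {c: COURSE.search(c) is not None
                  for c in ["LIS651", "DIS801", "LIS451"]}

print(zip_results, date_parts, bad_month, impossible_date, course_results)
```

The last date check is why the slide says "something like": a regular expression describes the form of the text, not whether the date exists in the calendar.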

not using POSIX regular expressions
Do not use regular expressions when you want to accomplish a simple task for which a special PHP function is already available. A special PHP function will usually do the specialized task more easily and faster, because parsing and understanding the regular expression takes the machine time.

ereg()
ereg(regex, string) searches for the pattern described in regex within the string string. It returns false if no match was found.
If you call the function as ereg(regex, string, matches), the matches will be stored in the array matches. Thus matches will be a numeric array of the parts of string matched by the groups (the sub-expressions in parentheses) of regex. $matches[0] holds the complete matched text. The first group match will be $matches[1].
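The same search-and-capture idea can be sketched in Python, chosen here as an assumption because the ereg() family was removed in PHP 7. re.search() plays the role of ereg(); instead of filling an array it returns a match object, or None when nothing is found. The sample sentence is made up.

```python
import re

date_pattern = r"([0-9]{4})-([0-9]{2})-([0-9]{2})"

match = re.search(date_pattern, "Lecture held on 2006-12-03.")

# group(0) is the whole matched text, like index 0 of the matches array.
whole = match.group(0)
# group(1), group(2), ... are the parenthesized parts, in order.
year, month, day = match.group(1), match.group(2), match.group(3)

# No match: the result is None, the counterpart of ereg() returning false.
no_match = re.search(date_pattern, "no date here")

print(whole, year, month, day, no_match)
```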

ereg_replace
ereg_replace(regex, replacement, string) searches for the pattern described in regex within the string string and replaces occurrences with replacement. It returns the replaced string. If replacement contains expressions of the form \\number, where number is an integer between 1 and 9, the text matched by the corresponding parenthesized sub-expression is inserted in its place.
$better_order = ereg_replace('glass of (Karlsberg|Bruch)', 'pitcher of \\1', $order);
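The replacement with a back-reference can be sketched as follows, assuming Python's re.sub() as a stand-in for ereg_replace(); the order text is a made-up sample. In a Python raw string the back-reference is written \1.

```python
import re

order = "one glass of Karlsberg and one glass of Bruch"

# \1 is replaced by whatever the first parenthesized group matched,
# so each beer keeps its name while the glass becomes a pitcher.
better_order = re.sub(r"glass of (Karlsberg|Bruch)", r"pitcher of \1", order)

print(better_order)
```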

split()
split(regex, string, [max]) splits the string string at the occurrences of the pattern described by the regular expression regex. It returns an array. The matched pattern is not included.
If the optional argument max is given, it means the maximum number of elements in the returned array. The last element then contains the unsplit rest of the string string.
Use explode() if you are not splitting at a regular expression pattern. It is faster.
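Splitting at a pattern can be sketched with Python's re.split(), assumed here as a stand-in for split(); the sample line is made up. One difference to keep in mind: Python's maxsplit counts the number of splits, not the number of elements, so maxsplit=1 yields two elements.

```python
import re

line = "beer,  wine ;cider,mead"

# Split at a comma or semicolon, swallowing any spaces around it.
parts = re.split(r"[ ]*[,;][ ]*", line)

# Limit the work: one split, and the rest of the string stays unsplit.
first_and_rest = re.split(r"[ ]*[,;][ ]*", line, maxsplit=1)

# For a fixed delimiter, the plain string method is enough
# (the counterpart of preferring explode() in PHP).
fixed = "a,b,c".split(",")

print(parts, first_and_rest, fixed)
```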

case-insensitive functions
– eregi() does the same as ereg() but works case-insensitively.
– eregi_replace() does the same as ereg_replace() but works case-insensitively.
– spliti() does the same as split() but works case-insensitively.
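In Python, assumed here as a stand-in for the PHP functions, case-insensitivity is not a separate set of functions but a flag, re.IGNORECASE, passed to the same calls. A minimal sketch with made-up strings:

```python
import re

# Searching: the flag makes 'good beer' match in 'Good Beer'.
sensitive = re.search(r"good beer", "Good Beer") is not None
insensitive = re.search(r"good beer", "Good Beer", re.IGNORECASE) is not None

# Replacing and splitting accept the same flag.
replaced = re.sub(r"beer", "wine", "Beer and BEER", flags=re.IGNORECASE)
pieces = re.split(r"x", "1X2x3", flags=re.IGNORECASE)

print(sensitive, insensitive, replaced, pieces)
```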

regular expressions in MySQL
You can use POSIX regular expressions in MySQL in the SELECT command:
SELECT … WHERE column REGEXP regex
where column is the column to test and regex is a regular expression.

Thank you for your attention! Please switch off machines b4 leaving!