Data Manipulation & Regex

What..?
Often in PHP we have to get data from files, or through forms from a user. Before acting on the data, we need to put it into the format we require and check that it is actually valid.

What..?
To achieve this, we need to learn about the tools PHP offers for checking values and manipulating data:
- built-in PHP functions for handling input;
- Regular Expressions (Regex).

PHP Functions
There are a lot of useful PHP functions for manipulating data. We're not going to look at them all; we're not even going to look at most of them. Full lists are in the PHP manual:
http://php.net/manual/en/ref.strings.php
http://php.net/manual/en/ref.ctype.php
http://php.net/manual/en/ref.datetime.php

Useful functions: splitting
Often we need to split data into multiple pieces based on a particular character. Use explode():

    // split a user-supplied date into its parts
    $input = '1/12/2007';
    $bits = explode('/', $input);
    // array(0 => '1', 1 => '12', 2 => '2007'): the pieces are strings
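Because explode() returns strings and does not check what it splits, a date usually needs a little more work before it can be trusted. A minimal sketch, assuming the date is written day/month/year (the slide's '1/12/2007' could also be read month-first); parseDate() is a hypothetical helper, while ctype_digit() and checkdate() are built-in PHP functions.

```php
<?php
// Hypothetical helper: split a 'day/month/year' string and check it is a real date.
// Returns [day, month, year] as integers, or null if the input is rejected.
function parseDate(string $input): ?array
{
    $bits = explode('/', $input);
    if (count($bits) !== 3) {
        return null; // wrong number of pieces
    }
    foreach ($bits as $bit) {
        if (!ctype_digit($bit)) {
            return null; // every piece must consist of digits only
        }
    }
    // explode() gives strings, so convert each piece to an integer
    [$day, $month, $year] = array_map('intval', $bits);
    // checkdate() expects month, day, year in that order
    return checkdate($month, $day, $year) ? [$day, $month, $year] : null;
}

$ok  = parseDate('1/12/2007');  // [1, 12, 2007]
$bad = parseDate('31/2/2007');  // null: there is no 31 February
```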

Useful functions: trimming
To remove whitespace from the start and end of a string, use trim():

    // a user-supplied name
    $input = ' Rob ';
    $name = trim($input); // 'Rob'
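trim() only removes characters at the two ends of a string; whitespace between words is left alone. A short sketch of trim() and its one-sided variants ltrim() and rtrim(); the sample strings are made up for illustration.

```php
<?php
// By default trim() strips spaces, tabs (\t), newlines (\n), carriage returns (\r),
// NUL bytes (\0) and vertical tabs (\x0B) from both ends of the string.
$input = "\t  Rob Smith \n";
$name  = trim($input);             // 'Rob Smith': the space between the words is kept

// ltrim() and rtrim() work on one end only.
$left  = ltrim('  Rob  ');         // 'Rob  '
$right = rtrim('  Rob  ');         // '  Rob'

// An optional second argument lists the characters to strip instead.
$path  = trim('/users/rob/', '/'); // 'users/rob'
```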

Useful functions: string replace
To replace all occurrences of one string within another, use str_replace():

    // allow the user to use a number of different date separators
    $input = '01.12-2007';
    $clean = str_replace(array('.', '-'), '/', $input); // '01/12/2007'
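str_replace() can also take an array of replacements, paired with the search array by position, and an optional fourth argument that receives the number of replacements made. It is case-sensitive; str_ireplace() is the case-insensitive variant. A small sketch with illustrative inputs.

```php
<?php
// With two arrays, each search string is replaced by the entry at the same position.
// Replacements happen in order, so later searches also see earlier results.
$input = 'Fish & Chips';
$clean = str_replace(['&', ' '], ['and', '_'], $input); // 'Fish_and_Chips'

// The optional fourth argument receives the number of replacements made.
$date = str_replace(['.', '-'], '/', '01.12-2007', $count); // '01/12/2007'
// $count is now 2
```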

Useful functions: cAsE
To make a string all uppercase, use strtoupper().
To make a string all lowercase, use strtolower().
To make just the first letter uppercase, use ucfirst().
To make the first letter of each word in a string uppercase, use ucwords().
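ucfirst() and ucwords() only ever raise letters to uppercase; they never lower the rest. Tidying a name typed in mixed case therefore usually means lowercasing first. A short sketch with a made-up input; these functions work byte by byte, so for accented or other non-ASCII text the mbstring function mb_convert_case() is the usual choice.

```php
<?php
$input = 'rOB sMITH';

$upper = strtoupper($input);          // 'ROB SMITH'
$lower = strtolower($input);          // 'rob smith'
$first = ucfirst($input);             // 'ROB sMITH': ucfirst() raises, never lowers
$name  = ucwords(strtolower($input)); // 'Rob Smith': lowercase first, then capitalise words
```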

Useful functions: html sanitise
To make a string "safe" to output as HTML, use htmlentities(), which converts special characters such as <, > and & into HTML entities:

    // user-entered comment
    $input = 'The <a> tag & ..';
    $clean = htmlentities($input); // 'The &lt;a&gt; tag &amp; ..'
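htmlentities() converts every character that has a named HTML entity, while the closely related htmlspecialchars() converts only &, <, > and quotes, which is often all that is needed when echoing user input. A sketch comparing the two, assuming UTF-8 text; passing ENT_QUOTES and the charset explicitly avoids depending on the defaults, which have changed between PHP versions.

```php
<?php
$input = 'Café <b> & "quoted"';

// Every character with a named entity is converted, including the accented é.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');
// 'Caf&eacute; &lt;b&gt; &amp; &quot;quoted&quot;'

// Only the characters that are special in HTML are converted.
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// 'Café &lt;b&gt; &amp; &quot;quoted&quot;'
```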

More complicated checks..
It is usually possible to use a combination of built-in PHP functions to achieve what you want. However, sometimes things get more complicated. When this happens, we turn to Regular Expressions.

Regular Expressions
Regular expressions are a concise (but cryptic!) way of pattern matching within a string. PHP has offered two flavours: POSIX (the ereg functions, removed in PHP 7) and Perl-compatible (PCRE, the preg functions). We will only look at the faster and more powerful Perl-compatible version.

Some definitions
- 'rob@example.com': the actual data we are going to work on (e.g. an email address string).
- '/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i': the definition of the string pattern (the 'Regular Expression').
- preg_match(), preg_replace(): PHP functions that do something with the data and the regular expression.

Regular Expressions
'/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i'
Regular expressions are complicated! They are a definition of a pattern, usually used to validate or extract data from a string.

Regex: Delimiters
The regex definition is always enclosed in delimiters, usually a '/':

    $regex = '/php/';

Matches: 'php', 'I love php'
Doesn't match: 'PHP', 'I love ph'
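'/' is only the usual choice: PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as the delimiter. A sketch showing why another delimiter can help when the pattern itself contains slashes, such as a URL path; the paths are made-up examples, and the ^, $ and \d used here are covered in later slides.

```php
<?php
// With '/' as the delimiter, every '/' inside the pattern must be escaped.
$slashes = preg_match('/^\/users\/\d+$/', '/users/42'); // 1

// With '#' as the delimiter, the slashes can be written as they are.
$hashes = preg_match('#^/users/\d+$#', '/users/42');    // 1
$other  = preg_match('#^/users/\d+$#', '/posts/42');    // 0
```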

Regex: First impressions
Note how the regular expression matches anywhere in the string: the whole regular expression has to be matched, but the whole data string doesn't have to be used. The comparison is case-sensitive.

Regex: Case insensitive
Extra switches (PHP calls them pattern modifiers) can be added after the last delimiter. The only switch we will use is 'i', which makes the comparison case-insensitive:

    $regex = '/php/i';

Matches: 'php', 'I love pHp', 'PHP'
Doesn't match: 'I love ph'

Regex: Character groups
A regex is matched character by character. You can specify multiple options for a single character using square brackets:

    $regex = '/p[hu]p/';

Matches: 'php', 'pup'
Doesn't match: 'phup', 'pop', 'PHP'

Regex: Character groups
You can also specify a digit or alphabetical range in square brackets:

    $regex = '/p[a-z1-3]p/';

Matches: 'php', 'pup', 'pap', 'pop', 'p3p'
Doesn't match: 'PHP', 'p5p'
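To check a pattern against several inputs at once, PHP's preg_grep() returns the array entries that match, and with the PREG_GREP_INVERT flag it returns the ones that don't. A small sketch using the inputs from the slide, plus 'phup' to show that a bracketed group still stands for exactly one character.

```php
<?php
$inputs = ['php', 'pup', 'pap', 'pop', 'p3p', 'PHP', 'p5p', 'phup'];
$regex  = '/p[a-z1-3]p/';

// preg_grep() keeps the original keys, so array_values() renumbers the results.
$matched = array_values(preg_grep($regex, $inputs));
// ['php', 'pup', 'pap', 'pop', 'p3p']
$missed  = array_values(preg_grep($regex, $inputs, PREG_GREP_INVERT));
// ['PHP', 'p5p', 'phup']
```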

Regex: Predefined classes
There are a number of predefined classes available:
\d  Matches a single character that is a digit (0-9)
\s  Matches any whitespace character (including tabs and line breaks)
\w  Matches any "word" character: alphanumeric characters plus underscore

Regex: Predefined classes

    $regex = '/p\dp/';

Matches: 'p3p', 'p7p'
Doesn't match: 'p10p', 'P7p'

    $regex = '/p\wp/';

Matches: 'p3p', 'pHp', 'pop'
Doesn't match: 'phhp'
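A few extra cases make the predefined classes more concrete: \d behaves like the range [0-9], \w counts the underscore but not punctuation, and \s covers tabs as well as spaces. A sketch with made-up inputs.

```php
<?php
// \d works like the range [0-9].
$digit = preg_match('/p\dp/', 'p7p');    // 1
$range = preg_match('/p[0-9]p/', 'p7p'); // 1

// \w includes the underscore, but not punctuation such as '-'.
$under = preg_match('/p\wp/', 'p_p');    // 1
$dash  = preg_match('/p\wp/', 'p-p');    // 0

// \s matches a tab as well as a space (double quotes, so PHP turns \t into a real tab).
$tab   = preg_match('/p\sp/', "p\tp");   // 1
```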

Regex: the Dot
The special dot character matches any single character apart from a newline:

    $regex = '/p.p/';

Matches: 'php', 'p&p', 'p(p', 'p3p', 'p$p'
Doesn't match: 'PHP', 'phhp'
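The newline exception can be lifted with the s switch, an extra pattern modifier not otherwise used in these slides. A sketch showing the default behaviour, the s switch, and the fact that an unescaped dot also matches a literal '.'.

```php
<?php
$letter  = preg_match('/p.p/', 'php');   // 1

// By default the dot does not match a newline character...
$newline = preg_match('/p.p/', "p\np");  // 0
// ...but with the s switch it matches any character at all.
$dotAll  = preg_match('/p.p/s', "p\np"); // 1

// An unescaped dot matches a literal '.', since '.' is just another character.
$period  = preg_match('/p.p/', 'p.p');   // 1
```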

Regex: Repetition
There are a number of special characters that indicate the preceding character or group may be repeated:
?      Zero or one time
*      Zero or more times
+      One or more times
{a,b}  Between a and b times

Regex: Repetition

    $regex = '/ph?p/';

Matches: 'pp', 'php'
Doesn't match: 'phhp', 'pap'

    $regex = '/ph*p/';

Matches: 'pp', 'php', 'phhhhp'
Doesn't match: 'pop', 'phhohp'

Regex: Repetition

    $regex = '/ph+p/';

Matches: 'php', 'phhhhp'
Doesn't match: 'pp', 'phyhp'

    $regex = '/ph{1,3}p/';

Matches: 'php', 'phhhp'
Doesn't match: 'pp', 'phhhhp'
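The braces also come in two shorter forms: {n} for exactly n times and {n,} for n or more times. The single-character operators are shorthands for brace ranges: ? is {0,1}, * is {0,} and + is {1,}. A sketch with made-up inputs.

```php
<?php
$exactly = preg_match('/ph{2}p/', 'phhp');    // 1: exactly two h's
$tooMany = preg_match('/ph{2}p/', 'phhhp');   // 0
$atLeast = preg_match('/ph{2,}p/', 'phhhhp'); // 1: two or more
$tooFew  = preg_match('/ph{2,}p/', 'php');    // 0

// ph?p behaves the same as ph{0,1}p
$same = preg_match('/ph?p/', 'pp') === preg_match('/ph{0,1}p/', 'pp'); // true
```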

Regex: Bracketed repetition
The repetition operators can be applied to bracketed expressions to repeat several characters together:

    $regex = '/(php)+/';

Matches: 'php', 'phpphp', 'phpphpphp'
Doesn't match: 'ph', 'popph'
Will it match 'phpph'?
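To answer the slide's question: yes, 'phpph' matches, because without anchors the pattern only needs to find 'php' somewhere in the string. The sketch below checks that, and also shows why the brackets matter: without them, + applies only to the final p. It uses the third argument of preg_match() (covered later in the slides) to capture the text that matched.

```php
<?php
$answer = preg_match('/(php)+/', 'phpph'); // 1: the leading 'php' is enough

// With brackets, + repeats the whole group 'php'...
preg_match('/(php)+/', 'phpphpx', $grouped);
// $grouped[0] is 'phpphp'; $grouped[1] is 'php' (a repeated group keeps only its last repetition)

// ...without brackets, + repeats only the last 'p'.
preg_match('/php+/', 'phppp', $single);
// $single[0] is 'phppp'
```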

Regex: Anchors
So far, our patterns have matched anywhere within a string (either the entire data string or part of it). We can change this behaviour by using anchors:
^  Start of the string
$  End of the string (or just before a newline at the very end)

Regex: Anchors
With NO anchors:

    $regex = '/php/';

Matches: 'php', 'php is great', 'in php we..'
Doesn't match: 'pop'

Regex: Anchors
With start and end anchors:

    $regex = '/^php$/';

Matches: 'php'
Doesn't match: 'php is great', 'in php we..', 'pop'
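Anchors can also be used one at a time, and there is one subtlety worth knowing for validation: $ also matches just before a newline at the very end of the string. A sketch; \z (end of string, with no exceptions) is an extra PCRE escape not otherwise used in these slides.

```php
<?php
$start = preg_match('/^php/', 'php is great');  // 1: starts with php
$end   = preg_match('/php$/', 'I love php');    // 1: ends with php
$both  = preg_match('/^php$/', 'php is great'); // 0

// A trailing newline slips past $ ...
$loose  = preg_match('/^php$/', "php\n");       // 1
// ... but not past \z, which only matches at the very end.
$strict = preg_match('/^php\z/', "php\n");      // 0
```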

Regex: Escape special characters
We have seen that characters such as ?, ., $, * and + have a special meaning. To match one of them literally, escape it with a backslash:

    $regex = '/p\.p/';

Matches: 'p.p'
Doesn't match: 'php', 'p1p'
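When the text to match literally comes from elsewhere (for example, user input), escaping it by hand is error-prone; PHP's preg_quote() adds the backslashes, and its optional second argument names the delimiter so that it is escaped too. A sketch using a made-up price string. The patterns are in single quotes because inside double quotes PHP would process sequences such as \$ itself before the regex engine sees them.

```php
<?php
// Escaping by hand: \$ is a literal dollar sign and \. a literal dot.
$regex = '/\$9\.99/';
$yes = preg_match($regex, 'Only $9.99!'); // 1
$no  = preg_match($regex, 'Only $9x99!'); // 0

// preg_quote() produces the same escaped text automatically.
$auto = '/' . preg_quote('$9.99', '/') . '/';
// $auto is '/\$9\.99/'
```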

So.. An example
Let's define a regex that matches an email address:

    $emailRegex = '/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i';

Matches: 'rob@example.com', 'rob@subdomain.example.com', 'a_n_other@example.co.uk'
Doesn't match: 'rob@exam@ple.com', 'not.an.email.com'

So.. An example
    /^               Starting delimiter and start-of-string anchor
    [a-z\d\._-]+     User name: one or more letters, digits, dots, underscores or dashes
    @                The @ separator
    ([a-z\d-]+\.)+   Domain (letters, digits or dash only) followed by a dot; the repetition allows subdomains
    [a-z]{2,6}       Top-level domain: com, uk, info, etc.
    $/i              End anchor, end delimiter, case-insensitive switch
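The slide's pattern is a teaching example rather than a complete email check: for instance, the [a-z]{2,6} part rejects top-level domains longer than six letters. PHP also ships a built-in validator, filter_var() with the FILTER_VALIDATE_EMAIL filter, which returns the address on success and false otherwise. A sketch comparing the two on a few made-up addresses.

```php
<?php
$emailRegex = '/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i';

$inputs = [
    'rob@example.com',
    'a_n_other@example.co.uk',
    'rob@exam@ple.com',
    'rob@example.photography', // 11-letter top-level domain
];

// Record, for each address, whether the regex and the built-in filter accept it.
$results = [];
foreach ($inputs as $input) {
    $results[$input] = [
        'regex'  => preg_match($emailRegex, $input) === 1,
        'filter' => filter_var($input, FILTER_VALIDATE_EMAIL) !== false,
    ];
}
print_r($results);
```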

Phew..
So we now know how to define regular expressions. Further explanation can be found at: http://www.regular-expressions.info/
We still need to know how to use them!

Boolean Matching
We can use the function preg_match() to test whether a string matches or not.

    // match an email
    $input = 'rob@example.com';
    if (preg_match($emailRegex, $input)) {
        echo 'Is a valid email';
    } else {
        echo 'NOT a valid email';
    }
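preg_match() actually returns an integer: 1 if the pattern matched, 0 if it did not, and false if the pattern itself could not be used (for example, a missing delimiter). Comparing with === 1 keeps those cases apart. A minimal sketch wrapping the slide's check in a reusable function; the name isValidEmail() is hypothetical, and trimming the input first is an assumption about how the data arrives.

```php
<?php
// Hypothetical helper built on the slide's email pattern.
function isValidEmail(string $input): bool
{
    $emailRegex = '/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i';
    // 1 = match, 0 = no match, false = broken pattern; only 1 counts as valid
    return preg_match($emailRegex, trim($input)) === 1;
}

foreach (['rob@example.com', ' Rob@Example.COM ', 'not.an.email.com'] as $input) {
    echo $input, ': ', isValidEmail($input) ? 'Is a valid email' : 'NOT a valid email', "\n";
}
```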

Pattern replacement
We can use the function preg_replace() to replace any matching strings.

    // replace each run of two or more whitespace characters with a single space
    $input = 'Some   comment    string';
    $regex = '/\s\s+/';
    $clean = preg_replace($regex, ' ', $input); // 'Some comment string'
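preg_replace() also accepts a limit on the number of replacements and an optional variable that receives how many were made. A sketch that counts the collapsed runs, then combines the replacement with trim(); the input strings are made up.

```php
<?php
$input = "Some   comment \t string";

// The optional fifth argument receives the number of replacements (-1 = no limit).
$clean = preg_replace('/\s\s+/', ' ', $input, -1, $count);
// $clean is 'Some comment string', $count is 2

// Combining with trim() also removes whitespace at the ends;
// \s+ is used so that single tabs or newlines also become plain spaces.
$tidy = trim(preg_replace('/\s+/', ' ', "  Some\tcomment  string \n"));
// $tidy is 'Some comment string'
```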

Sub-references
We're not quite finished: we need to master the concept of sub-references. Any bracketed expression in a regular expression is treated as a sub-reference (also called a capturing group or subpattern). You use them to extract the bits of data you want from the matched string. This is easiest to see with an example..

Sub-reference example
I start with a date string in a particular format:

    $str = '10, April 2007';

The regex that matches this is:

    $regex = '/\d+,\s\w+\s\d+/';

If I want to extract the bits of data, I bracket the relevant parts:

    $regex = '/(\d+),\s(\w+)\s(\d+)/';

Extracting data..
I then pass an extra argument to preg_match(), an array that it fills with the matches:

    $str = 'The date is 10, April 2007';
    $regex = '/(\d+),\s(\w+)\s(\d+)/';
    preg_match($regex, $str, $matches);
    // $matches[0] = '10, April 2007' (the whole match)
    // $matches[1] = '10'
    // $matches[2] = 'April'
    // $matches[3] = '2007'
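Two habits make extraction safer: check that preg_match() returned 1 before reading $matches, and remember that every captured piece is a string. The sketch below also names the groups with PCRE's (?<name>...) syntax, an extra feature not covered on the slides, so the parts can be read by name instead of by number.

```php
<?php
$str = 'The date is 10, April 2007';

// Numbered groups, unpacked while skipping $matches[0] (the whole match).
if (preg_match('/(\d+),\s(\w+)\s(\d+)/', $str, $matches) === 1) {
    [, $day, $month, $year] = $matches;
    $day  = (int) $day;  // captures are strings; convert where a number is needed
    $year = (int) $year;
}

// Named groups: each part is also available under its name.
$regex = '/(?<day>\d+),\s(?<month>\w+)\s(?<year>\d+)/';
if (preg_match($regex, $str, $parts) === 1) {
    echo $parts['day'], ' / ', $parts['month'], ' / ', $parts['year'], "\n"; // 10 / April / 2007
}
```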

Back-references
This technique can also be used to refer to the original text during replacements, using $1, $2, etc. in the replacement string:

    $str = 'The date is 10, April 2007';
    $regex = '/(\d+),\s(\w+)\s(\d+)/';
    $str = preg_replace($regex, '$1-$2-$3', $str);
    // $str = 'The date is 10-April-2007'
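Back-references can reorder the captured parts as well as reuse them. When a captured value needs to be transformed rather than just copied, preg_replace_callback() passes each match to a function instead. A sketch turning the slide's date into ISO format; using DateTime::createFromFormat() with the 'j F Y' format (day, full English month name, year) to convert the month name is one possible choice, made here for illustration.

```php
<?php
$str   = 'The date is 10, April 2007';
$regex = '/(\d+),\s(\w+)\s(\d+)/';

// Reorder the sub-references: year first.
$reordered = preg_replace($regex, '$3-$2-$1', $str);
// 'The date is 2007-April-10'

// Transform the match with a callback; $m holds the same array preg_match() would fill.
$iso = preg_replace_callback($regex, function (array $m): string {
    $date = DateTime::createFromFormat('j F Y', "{$m[1]} {$m[2]} {$m[3]}");
    // Leave the text unchanged if it cannot be parsed (e.g. an unknown month name).
    return $date === false ? $m[0] : $date->format('Y-m-d');
}, $str);
// 'The date is 2007-04-10'
```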

Phew Again!
We now know how to define regular expressions. We also know how to use them: matching, replacement and data extraction.