REGULAR EXPRESSIONS CHAPTER 14
REGULAR EXPRESSIONS
A regular expression is a coded pattern used to search for matching text in strings.
Commonly used for data validation.
In PHP, a pattern is written as a string (usually in single quotation marks) and enclosed in delimiters, usually /
REGULAR EXPRESSIONS
Example pattern:
    $pattern = '/NC/';
The pattern is then passed, along with the string to search, to a built-in PHP function.
REGULAR EXPRESSIONS
preg_match($pattern, $subject);
    returns 1 if the pattern is found, 0 if it is not, and FALSE if an error occurs (for example, an invalid pattern)
    stops searching after the first match
preg_match_all($pattern, $subject);
    finds all matches and returns how many were found
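The difference between the two functions is easiest to see side by side. The following is a minimal sketch; the subject string is made up for illustration and is not from the chapter.

```php
<?php
// Hypothetical sample data, chosen only for illustration.
$pattern = '/NC/';
$subject = 'NC State and UNC are both in NC.';

// preg_match() stops at the first match, so it can only return 1, 0, or false.
$first = preg_match($pattern, $subject);

// preg_match_all() keeps searching and returns the number of matches.
$all = preg_match_all($pattern, $subject);

echo "preg_match: $first, preg_match_all: $all\n"; // preg_match: 1, preg_match_all: 3
```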
REGULAR EXPRESSIONS
Create a regular expression:
    $pattern = '/Larry/';
Two strings to test:
    $author = 'Larry Ullman';
    $editor = 'Rebecca Gulick';
Search for the pattern:
    $author_match = preg_match($pattern, $author); // $author_match is 1
    $editor_match = preg_match($pattern, $editor); // $editor_match is 0
REGULAR EXPRESSIONS
    if ($author_match === false) {
        echo 'Error testing author name.';
    } else if ($author_match === 0) {
        echo 'Author name does not contain Larry.';
    } else {
        echo 'Author name contains Larry.';
    }
DEFINING PATTERNS
In addition to the literal characters being matched, meta-characters have other meanings:
    Character   Meaning
    \           Escape
    ^           Beginning of a string
    $           End of a string
    .           Any single character except for newline
    |           Or
    [ ]         Beginning/ending of a class
    ( )         Beginning/ending of a sub-pattern
    { }         Beginning/ending of a quantifier
EXAMPLES
Single character: .
    $pattern = '/1.99/';
    matches: 1.99, 1B99 or 1299
To match a period in a string, it must be escaped:
    $pattern = '/1\.99/';
    matches: 1.99 only
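The effect of escaping the period can be checked directly. This sketch runs both patterns over the three strings named above; the variable names are illustrative only.

```php
<?php
$candidates = ['1.99', '1B99', '1299'];
$loose = [];   // strings matched by the unescaped period
$strict = [];  // strings matched by the escaped period

foreach ($candidates as $candidate) {
    if (preg_match('/1.99/', $candidate) === 1) {
        $loose[] = $candidate;
    }
    if (preg_match('/1\.99/', $candidate) === 1) {
        $strict[] = $candidate;
    }
}

echo implode(', ', $loose), "\n";  // 1.99, 1B99, 1299
echo implode(', ', $strict), "\n"; // 1.99
```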
EXAMPLES
Specifying where characters are found:
    ^ at the beginning
    $ at the end
    $pattern = '/^a/';
    matches any string beginning with the letter a
    $pattern = '/a$/';
    matches any string ending with the letter a
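A short sketch of both anchors, using made-up test words. It also shows one detail worth knowing for validation: by default, $ matches just before a trailing newline as well as at the very end. The D modifier used in the last line is not covered in this chapter; it is shown here only as one way to require the true end of the string.

```php
<?php
$startsWithA = preg_match('/^a/', 'apple');   // 1: begins with a
$noLeadingA  = preg_match('/^a/', 'banana');  // 0: begins with b
$endsWithA   = preg_match('/a$/', 'banana');  // 1: ends with a

// By default, $ also matches just before a final newline.
$beforeNewline = preg_match('/a$/', "banana\n");   // 1
// With the D modifier, $ matches only at the very end of the string.
$strictEnd     = preg_match('/a$/D', "banana\n");  // 0
```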
EXAMPLES
Pipe operator: | means "or"
    yes|no matches yes or no
Grouping sub-patterns: ( )
    (even|heavy) handed matches even handed or heavy handed
    col(o|ou)r matches color or colour
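Grouping matters because the pipe otherwise splits the whole pattern in two. The sketch below, with illustrative test strings, compares the grouped and ungrouped forms.

```php
<?php
// Grouped: "even" or "heavy", followed by " handed".
$grouped   = preg_match('/(even|heavy) handed/', 'seven');         // 0
$groupedOk = preg_match('/(even|heavy) handed/', 'a heavy handed approach'); // 1

// Ungrouped: either "even" alone, or "heavy handed".
$ungrouped = preg_match('/even|heavy handed/', 'seven');           // 1 ("even" is inside "seven")

// Both spellings are accepted; a missing vowel is not.
$color  = preg_match('/col(o|ou)r/', 'color');   // 1
$colour = preg_match('/col(o|ou)r/', 'colour');  // 1
$colr   = preg_match('/col(o|ou)r/', 'colr');    // 0
```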
CASE SENSITIVITY
By default, a search is case sensitive.
To perform a case-insensitive search, include a lowercase i after the closing delimiter:
    $pattern = '/cat/i';
QUANTIFIERS
    Character   Meaning
    ?           0 or 1
    *           0 or more
    +           1 or more
    {x}         Exactly x occurrences
    {x,y}       Between x and y occurrences (inclusive)
    {x,}        At least x occurrences
Quantifiers apply to the preceding character or group:
    a{3} matches aaa
    a? matches 0 or 1 a's
    (ab)? matches 0 or 1 ab's
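The quantifiers can be tried out one at a time. In this sketch the patterns are anchored with ^ and $ so that each result depends only on the quantifier; the test strings are illustrative.

```php
<?php
$exactlyThree = preg_match('/^a{3}$/', 'aaa');      // 1: exactly three a's
$tooFew       = preg_match('/^a{3}$/', 'aa');       // 0: only two a's
$inRange      = preg_match('/^a{2,4}$/', 'aaaa');   // 1: four is within 2 to 4
$outOfRange   = preg_match('/^a{2,4}$/', 'aaaaa');  // 0: five is too many
$zeroOrMore   = preg_match('/^ab*$/', 'a');         // 1: * allows zero b's
$oneOrMore    = preg_match('/^ab+$/', 'a');         // 0: + needs at least one b
$groupAbsent  = preg_match('/^(ab)?c$/', 'c');      // 1: the group may be missing
$groupPresent = preg_match('/^(ab)?c$/', 'abc');    // 1: or appear once
```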
EXAMPLE
Pattern for a five-digit number:
    $pattern = '/(0|1|2|3|4|5|6|7|8|9){5}/';
This will match a five-digit number, but it will also match any five consecutive digits contained within a longer string.
Use ^ and $ to match exactly:
    $pattern = '/^(0|1|2|3|4|5|6|7|8|9){5}$/';
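The following sketch compares the unanchored and anchored patterns on the same input. The sample strings are invented for illustration.

```php
<?php
$loose = '/(0|1|2|3|4|5|6|7|8|9){5}/';
$exact = '/^(0|1|2|3|4|5|6|7|8|9){5}$/';

// Hypothetical input containing a longer run of digits.
$input = 'Order 1234567 shipped';

$looseHit  = preg_match($loose, $input);    // 1: five digits found inside the string
$exactMiss = preg_match($exact, $input);    // 0: the whole string is not five digits
$exactHit  = preg_match($exact, '27514');   // 1: exactly five digits
$tooShort  = preg_match($exact, '1234');    // 0: only four digits
```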
CHARACTER CLASSES
Square brackets define a set of characters, any one of which may match. The "or" between the characters is implied, and a dash (-) specifies a range:
    [aeiou] matches a single vowel (the | is implied)
    [a-z] matches any single lowercase letter
    [0-9] matches any single digit
    [A-Za-z] matches any single letter, uppercase or lowercase
CHARACTER CLASSES EXAMPLE
A 3-letter, lowercase word:
    [a-z]{3}
A 5-digit zip code:
    ^[0-9]{5}$
CHARACTER CLASSES EXAMPLE
    $string = 'The product code is MBT-3461.';
    preg_match('/MB[TF]/', $string);   // Matches MBT and returns 1
    preg_match('/[.]/', $string);      // Matches . and returns 1
                                       // Equivalent to preg_match('/\./', $string);
    preg_match('/[13579]/', $string);  // Matches 3 and returns 1
METACHARACTERS
Characters that have special meanings in patterns, such as / \ . [ ] $ ^ ( ) -
Most metacharacters lose their special meanings inside a character class.
The exceptions:
    ^ (caret), when it is the first character in the class: negation, match any character except those in the class
    - (dash): represents a range of characters between the ones on either side
    \ (backslash): still escapes the character that follows it
To match a literal dash inside a character class, place the dash at the end (or the beginning) of the class, or escape it.
METACHARACTERS
Usage:
    $string = 'The product code is MBT-3461.';
    preg_match('/MB[^FT]/', $string);   // Matches nothing and returns 0
    preg_match('/MBT[^^]/', $string);   // Matches MBT- and returns 1
    preg_match('/MBT-[1-5]/', $string); // Matches MBT-3 and returns 1
    preg_match('/MBT[_*-]/', $string);  // Matches MBT- and returns 1
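A dash inside a character class is treated literally when it is the first or last character of the class, or when it is escaped; between two characters it defines a range. The sketch below illustrates each case. The single-character test strings are illustrative.

```php
<?php
$string = 'The product code is MBT-3461.';

// Three ways to include a literal dash in a class.
$dashLast    = preg_match('/MBT[_*-]/', $string);   // 1
$dashFirst   = preg_match('/MBT[-_*]/', $string);   // 1
$dashEscaped = preg_match('/MBT[_\-*]/', $string);  // 1

// Between two characters, the dash defines a range instead.
$rangeHasB    = preg_match('/^[a-c]$/', 'b');  // 1: b is between a and c
$rangeHasDash = preg_match('/^[a-c]$/', '-');  // 0: the dash is not in the range
$setHasB      = preg_match('/^[ac-]$/', 'b');  // 0: class holds only a, c, and -
$setHasDash   = preg_match('/^[ac-]$/', '-');  // 1
```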
CHARACTER CLASS SHORTCUTS
    Class              Shortcut   Meaning
    [0-9]              \d         Any digit
    [ \f\r\t\n\v]      \s         Any white space (including the space character)
    [A-Za-z0-9_]       \w         Any letter, digit, or underscore
    [^0-9]             \D         Not a digit
    [^ \f\r\t\n\v]     \S         Not white space
    [^A-Za-z0-9_]      \W         Not a letter, digit, or underscore
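The shortcuts can stand in for the longer classes anywhere in a pattern. A minimal sketch with illustrative test strings:

```php
<?php
$fiveDigits    = preg_match('/^\d{5}$/', '27514');      // 1: same as ^[0-9]{5}$
$wordChars     = preg_match('/^\w+$/', 'user_name42');  // 1: \w includes the underscore
$hasHyphen     = preg_match('/^\w+$/', 'user-name');    // 0: a hyphen is not a word character
$hasSpace      = preg_match('/\s/', 'two words');       // 1: the space counts as white space
$noSpace       = preg_match('/\s/', 'oneword');         // 0
$hasNonDigit   = preg_match('/\D/', '12a45');           // 1: the letter a is not a digit
```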
MORE DETAILED RESULTS
Recall that preg_match() stops and returns 1 as soon as the first occurrence of the pattern is found.
To find out exactly what text matched the pattern, include the optional third parameter:
    preg_match($pattern, $subject, $matches);
After a successful match, $matches[0] holds the text that matched the whole pattern.
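As a sketch of what the third parameter receives, the example below reuses the product-code string from earlier slides. The pattern with two parenthesized sub-patterns is an illustrative addition, not taken from the chapter.

```php
<?php
$string = 'The product code is MBT-3461.';

// Two sub-patterns: the letter after MB, and the digits after the dash.
$found = preg_match('/MB([TF])-(\d+)/', $string, $matches);

// $matches[0] is the full match; $matches[1] and $matches[2] are the sub-patterns.
echo $matches[0], "\n"; // MBT-3461
echo $matches[1], "\n"; // T
echo $matches[2], "\n"; // 3461
```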
FINDING ALL MATCHES
To find out how many matches were found, use preg_match_all(). To also see what was matched, include the optional third parameter:
    preg_match_all($pattern, $subject, $matches);
The function returns the number of full-pattern matches as an integer (possibly 0), or FALSE if an error occurs.
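FINDING ALL MATCHES
Additionally,
    preg_match_all($pattern, $subject, $matches);
puts all of the results into a multidimensional array assigned to $matches.
By default, $matches[0] holds every full match, and $matches[1], $matches[2], and so on hold the text captured by each parenthesized sub-pattern.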
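The shape of the multidimensional array is easier to see with a concrete case. This sketch assumes the default ordering of results; the list of product codes is invented for illustration.

```php
<?php
// Hypothetical subject containing several product codes.
$subject = 'Codes: MBT-3461, MBF-1200, MBT-77';

$count = preg_match_all('/MB([TF])-(\d+)/', $subject, $matches);

echo $count, "\n";                      // 3
echo implode(', ', $matches[0]), "\n";  // MBT-3461, MBF-1200, MBT-77 (full matches)
echo implode(', ', $matches[1]), "\n";  // T, F, T (first sub-pattern)
echo implode(', ', $matches[2]), "\n";  // 3461, 1200, 77 (second sub-pattern)
```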
USING MODIFIERS
Pattern modifiers change how a regular expression behaves. They are placed after the closing delimiter, and they are case sensitive.
    Character   Result
    A           Anchors the pattern to the beginning of the string
    i           Enables case-insensitive mode
    m           Enables multi-line matching
    s           Has the period match every character, including newline
    x           Ignores most white space
    U           Performs a non-greedy match
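Each modifier can be compared against the same pattern without it. The following is a sketch with made-up subjects; it covers the m, s, x, U, and A modifiers.

```php
<?php
// m: ^ also matches at the start of each line.
$text = "one\ntwo\nthree";
$withoutM = preg_match_all('/^\w+/', $text);   // 1: only "one"
$withM    = preg_match_all('/^\w+/m', $text);  // 3: "one", "two", "three"

// s: the period also matches a newline.
$dotDefault = preg_match('/a.b/', "a\nb");     // 0
$dotAll     = preg_match('/a.b/s', "a\nb");    // 1

// x: white space in the pattern is ignored, so the pattern can be spaced out.
$extended = preg_match('/ \d{3} - \d{4} /x', '555-1234'); // 1

// U: quantifiers match as little as possible.
preg_match('/<.+>/', '<b>bold</b>', $greedy);   // $greedy[0] is '<b>bold</b>'
preg_match('/<.+>/U', '<b>bold</b>', $lazy);    // $lazy[0] is '<b>'

// A: the match must begin at the start of the string.
$anchoredMiss = preg_match('/cat/A', 'a cat');    // 0
$anchoredHit  = preg_match('/cat/A', 'cat nap');  // 1
```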
MATCHING AND REPLACING PATTERNS
    preg_replace($pattern, $replacement, $subject, $limit)
This function returns the altered $subject, or the unaltered $subject if the pattern is not found. If an error occurs, it returns NULL.
The optional fourth parameter limits the number of replacements made.
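A minimal sketch of replacing with and without a limit. The subject string and replacement word are illustrative.

```php
<?php
$subject = 'cat, Cat, and CAT';

// No limit: every match is replaced.
$all = preg_replace('/cat/i', 'dog', $subject);

// Limit of 1: only the first match is replaced.
$limited = preg_replace('/cat/i', 'dog', $subject, 1);

// No match: the subject comes back unchanged.
$unchanged = preg_replace('/bird/', 'dog', $subject);

echo $all, "\n";       // dog, dog, and dog
echo $limited, "\n";   // dog, Cat, and CAT
echo $unchanged, "\n"; // cat, Cat, and CAT
```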