
7.1 Some Eclipse Tips

Try Ctrl+Shift+L: quick help (keyboard shortcuts)
Try Ctrl+SPACE: auto-complete
Source→Format (Ctrl+Shift+F): correct indentation
You can maximize a single view of Eclipse.
A note about running scripts over and over again…
Debug, debug and debug! Break points...

The default locations of your files are:
    Home: D:\eclipse\perl_ex
    Computer class: C:\eclipse\perl_ex

7.2 Last time on: Pattern Matching

7.3 Pattern matching

Finding a substring (match) somewhere:

    if ($line =~ m/he/) ...

Remember to use a slash ( / ) and not a backslash ( \ ).
This is true for “hello” and for “the cat”, but not for “good bye” or “Hercules”.

You can ignore the case of letters by adding an “i” after the pattern:

    m/he/i    (matches “hello”, “Hello” and “hEHD”)

There is a negative form of the match operator:

    if ($line !~ m/he/) ...
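The match operator above can be tried in a short script. This is a minimal sketch: the helper names `contains_he` and `contains_he_nocase` are made up for illustration, and the test strings are the ones from the slide.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): 1 if the string contains "he".
sub contains_he {
    my ($line) = @_;
    return $line =~ m/he/ ? 1 : 0;
}

# Same test, ignoring case (the "i" modifier).
sub contains_he_nocase {
    my ($line) = @_;
    return $line =~ m/he/i ? 1 : 0;
}

foreach my $line ("hello", "the cat", "good bye", "Hercules") {
    my $answer = contains_he($line) ? "yes" : "no";
    print "$line: $answer\n";
}

# The negative form !~ is true when the pattern is NOT found.
print "no 'he' in 'good bye'\n" if "good bye" !~ m/he/;
```

“Hercules” fails the case-sensitive test only because its “H” is a capital; `contains_he_nocase` accepts it.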

7.4 Pattern matching

Replacing a substring (substitute):

    $line = "the cat on the tree";
    $line =~ s/he/hat/;

$line becomes “that cat on the tree”.

To replace all occurrences of a substring, add a “g” (for “globally”):

    $line = "the cat on the tree";
    $line =~ s/he/hat/g;

$line becomes “that cat on that tree”.
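The two substitutions can be compared side by side. A minimal sketch, assuming the helper names `replace_first` and `replace_all` (they are not part of the course material); each works on a copy, so the caller's string is left unchanged.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the input string is the slide's.
sub replace_first {
    my ($line) = @_;
    $line =~ s/he/hat/;     # replaces only the first "he"
    return $line;
}

sub replace_all {
    my ($line) = @_;
    $line =~ s/he/hat/g;    # "g" replaces every "he"
    return $line;
}

print replace_first("the cat on the tree"), "\n";   # that cat on the tree
print replace_all("the cat on the tree"), "\n";     # that cat on that tree
```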

7.5 Single-character patterns

    m/./               Matches any character except “\n”

You can also ask for one of a group of characters:

    m/[abc]/           Matches “a” or “b” or “c”
    m/[a-z]/           Matches any lower case letter
    m/[a-zA-Z]/        Matches any letter
    m/[a-zA-Z0-9]/     Matches any letter or digit
    m/[a-zA-Z0-9_]/    Matches any letter or digit or an underscore
    m/[^abc]/          Matches any character except “a” or “b” or “c”
    m/[^0-9]/          Matches any character except a digit

7.6 Single-character patterns

Perl provides predefined character classes:

    \d    a digit (same as: [0-9])
    \w    a “word” character (same as: [a-zA-Z0-9_])
    \s    a space character (same as: [ \t\n\r\f])

And their negatives:

    \D    anything but a digit
    \W    anything but a word char
    \S    anything but a space char
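The classes above can be checked one character at a time. A minimal sketch, assuming a made-up helper `char_kind` and made-up sample text; it tests \d before \w because every digit is also a word character.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a single character using the predefined classes.
sub char_kind {
    my ($ch) = @_;
    return "digit" if $ch =~ m/\d/;
    return "word"  if $ch =~ m/\w/;   # letter or underscore (digits caught above)
    return "space" if $ch =~ m/\s/;
    return "other";
}

foreach my $ch ("7", "a", "_", " ", "-") {
    printf "[%s] is %s\n", $ch, char_kind($ch);
}

# A negated class: replace everything that is not a digit with "#".
my $text = "a1b22c";                      # made-up value for illustration
(my $masked = $text) =~ s/[^0-9]/#/g;     # copy $text, then edit the copy
print "$masked\n";                        # #1#22#
```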

7.7 Reminder: last class exercise

1. Write the following regular expressions. Test them with a script that reads a line from STDIN and prints "yes" if it matches and "no" if not.
   a) Match a name containing a capital letter followed by three lower case letters
   b) Replace every digit in the line with a #, and print the result
   c) Match "is" in either small or capital letters
   d*) Remove all such appearances of "is" from the line, and print it

7.8 This week: More Pattern Matching

7.9 Repetitive patterns

In general, use {} for a certain number of repetitions, or a range:

    m/ab{3}c/      Matches “abbbc”
    m/ab{3,6}c/    Matches “a”, 3-6 times “b” and then “c”

? means zero or one repetitions:

    m/ab?c/        Matches “ac” or “abc”

+ means one or more repetitions:

    m/ab+c/        Matches “abc”; “abbbbc” but not “ac”

A pattern followed by * means zero or more repetitions of that pattern:

    m/ab*c/        Matches “abc”; “ac”; “abbbbc”

Use parentheses to mark more than one character for repetition:

    m/h(el)*lo/    Matches “hello”; “hlo”; “helelello”
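The quantifiers can be compared on the same strings. A minimal sketch: the helper `accepted_by` is made up. The patterns are wrapped in ^ and $ (an addition for this sketch) so that the whole string has to fit; without them m/ab*c/ would also match inside a longer string such as "xxacxx".

```perl
use strict;
use warnings;

# Hypothetical helper: list which of the slide's quantifiers accept the string.
sub accepted_by {
    my ($s) = @_;
    my @labels;
    push @labels, "b?"     if $s =~ m/^ab?c$/;
    push @labels, "b+"     if $s =~ m/^ab+c$/;
    push @labels, "b*"     if $s =~ m/^ab*c$/;
    push @labels, "b{3}"   if $s =~ m/^ab{3}c$/;
    push @labels, "b{3,6}" if $s =~ m/^ab{3,6}c$/;
    return @labels;
}

foreach my $s ("ac", "abc", "abbbc", "abbbbbbbc") {
    print $s, ": ", join(" ", accepted_by($s)), "\n";
}

# Parentheses make the quantifier apply to the whole group "el".
foreach my $s ("hlo", "hello", "helelello", "helo") {
    print $s, ": ", ($s =~ m/^h(el)*lo$/ ? "yes" : "no"), "\n";
}
```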

7.10 Enforce line start/end

To force the pattern to be at the beginning of the string, add a “^”:

    m/^>/        Matches only strings that begin with a “>”

“$” forces the end of the string:

    m/\.pl$/     Matches only strings that end with “.pl”

And together:

    m/^\s*$/     Matches all lines that do not contain any non-space characters
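The three anchored patterns can be wrapped in small tests. A minimal sketch; the helper names and the sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers built from the slide's three anchored patterns.
sub starts_with_gt { my ($line) = @_; return $line =~ m/^>/    ? 1 : 0; }
sub ends_with_pl   { my ($name) = @_; return $name =~ m/\.pl$/ ? 1 : 0; }
sub is_blank       { my ($line) = @_; return $line =~ m/^\s*$/ ? 1 : 0; }

foreach my $text (">seq1 human", "script.pl", "script.pl.bak", "   ") {
    printf "[%s] gt=%d pl=%d blank=%d\n",
        $text, starts_with_gt($text), ends_with_pl($text), is_blank($text);
}

# "$" also matches just before a newline at the very end of the string,
# so a line that has not been chomped still passes the test.
print "still ends with .pl\n" if ends_with_pl("script.pl\n");
```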

7.11 Some examples

    m/\d+(\.\d+)?/
        Matches numbers that may contain a decimal point: “10”; “3.0”; “4.75” …

    m/^NM_\d+/
        Matches Genbank RefSeq accessions like “ NM_ ”

    m/^\s*CDS\s+\d+\.\.\d+/
        Matches annotation of a coding sequence in a Genbank DNA/RNA record: “ CDS ”

    m/^\s*CDS\s+(complement\()?\d+\.\.\d+\)?/
        Allows also a CDS on the minus strand of the DNA: “ CDS complement( ) ”

Note: We could just use m/^\s*CDS/. It is a question of the strictness of the format. Sometimes we want to make sure.
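The note about strictness can be seen by running the strict and the loose CDS pattern on the same lines. A minimal sketch: the helper names are made up, and the three input lines are invented in the style of a Genbank record; they are not taken from the course file.

```perl
use strict;
use warnings;

# Hypothetical helper: strict test, using the last pattern from the slide.
sub is_cds_line {
    my ($line) = @_;
    return $line =~ m/^\s*CDS\s+(complement\()?\d+\.\.\d+\)?/ ? 1 : 0;
}

# The looser alternative mentioned in the note.
sub is_cds_line_loose {
    my ($line) = @_;
    return $line =~ m/^\s*CDS/ ? 1 : 0;
}

# Made-up lines (assumed for illustration only).
my @lines = (
    "     CDS             4815..5888",
    "     CDS             complement(100..250)",
    "     CDS             unknown",
);

foreach my $line (@lines) {
    printf "%-45s strict=%d loose=%d\n",
        $line, is_cds_line($line), is_cds_line_loose($line);
}
```

The third line passes the loose test but fails the strict one, which is the trade-off the note describes.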

7.12 RegEx Coach

An easy-to-use tool for testing regular expressions.

7.13 Class exercise 7a

Write the following regular expressions. Test them with a script that reads a line and prints "yes" if it matches and "no" if not.
   a) Match a name beginning with a capital letter followed by any number of lower case letters.
   b) Match a phone number in Tel Aviv: 03- followed by 6 or 7 digits.
   c) Match a cell phone number: 05 followed by 0 or 2 or 4 and then 7 digits.
   d*) Match an hour no later than 19:59 in 24h format, such as 09:15 and 19:42.

7.14 Extracting part of a pattern

We can extract parts of the pattern by parentheses:

    $line = "1.35";
    if ($line =~ m/(\d+)\.(\d+)/ ) {
        print "$1\n";    # prints: 1
        print "$2\n";    # prints: 35
    }
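The same extraction can be packaged so that the captured parts are copied into named variables right after the match. A minimal sketch; `split_decimal` is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a decimal number into the parts before and
# after the point; returns an empty list when the pattern does not match.
sub split_decimal {
    my ($text) = @_;
    if ($text =~ m/(\d+)\.(\d+)/) {
        return ($1, $2);
    }
    return;
}

my ($whole, $fraction) = split_decimal("1.35");
print "$whole\n";       # 1
print "$fraction\n";    # 35

# $1 and $2 are only set by a successful match, so test the match first.
my @parts = split_decimal("no number here");
print "no match\n" unless @parts;
```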

7.15 Extracting part of a pattern

We can extract parts of the string that matched parts of the pattern that are marked by parentheses:

    $line = " CDS ";
    if ($line =~ m/CDS\s+(\d+)\.\.(\d+)/ ) {
        print "regexp:$1,$2\n";    # prints: regexp:4815,5888
        $start = $1;
        $end = $2;
    }

7.16 Finding a pattern in an input file

Usually, we want to scan all lines of a file, and find lines with a specific pattern. E.g.:

    # IN is a filehandle that was opened for reading (the name is assumed here)
    foreach $line (<IN>) {
        if ($line =~ m/CDS\s+(\d+)\.\.(\d+)/ ) {
            $start = $1;
            $end = $2;
        }
    }
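A complete version of such a scan might look as follows. This is a sketch under stated assumptions: the record fragment is made up, and it is read from a string through an in-memory filehandle so that the script runs without an input file. A while loop is used here because it reads one line at a time.

```perl
use strict;
use warnings;

# Made-up record fragment (assumed, for illustration only).
my $record = <<'END';
     gene            4815..5888
     CDS             4815..5888
     CDS             6000..7250
END

# Read from the string as if it were a file. With a real file, the second
# argument pair would be '<' and the file name instead of \$record.
open(my $in, '<', \$record) or die "cannot open in-memory file: $!";

my @coordinates;    # one [start, end] pair per CDS line
while (my $line = <$in>) {
    chomp $line;
    if ($line =~ m/CDS\s+(\d+)\.\.(\d+)/) {
        my ($start, $end) = ($1, $2);
        push @coordinates, [$start, $end];
        print "CDS from $start to $end\n";
    }
}
close($in);
```

Copying $1 and $2 into `$start` and `$end` immediately matters inside a loop, because the next successful match overwrites them.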

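7.17 Extracting part of a pattern

We can extract parts of the string that matched parts of the pattern that are marked by parentheses:

    $line = " CDS ";
    if ($line =~ m/CDS\s+(complement\()?((\d+)\.\.(\d+))\)?/ ) {
        print "regexp:$1,$2,$3,$4.\n";
        $start = $3;
        $end = $4;
    }

Output:

    Use of uninitialized value in concatenation...
    regexp:, ,4815,5888.

The warning appears because the optional group (complement\()? did not take part in the match, so $1 is undefined. Groups are numbered by the position of their opening parenthesis, which is why the coordinates are in $3 and $4.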
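One way to avoid the "uninitialized value" warning shown above is to test $1 with `defined` before using it. A minimal sketch: the helper `cds_coordinates`, the "+"/"-" strand labels and the two input lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end, strand) for a CDS line, or an
# empty list if the line does not match.
sub cds_coordinates {
    my ($line) = @_;
    if ($line =~ m/CDS\s+(complement\()?((\d+)\.\.(\d+))\)?/) {
        # $1 is undefined when the optional group did not take part in the match
        my $strand = defined $1 ? "-" : "+";
        return ($3, $4, $strand);
    }
    return;
}

# Made-up lines in the style of a Genbank record (assumed).
foreach my $line ("     CDS             4815..5888",
                  "     CDS             complement(100..250)") {
    my ($start, $end, $strand) = cds_coordinates($line);
    print "start=$start end=$end strand=$strand\n";
}
```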

7.18 Class exercise 7b

1. Write the following regular expressions. Test them with a script that reads a line and prints "yes" if it matches and "no" if not.
   a) Match a first name followed by a last name, and print the last name
   b) Match a FASTA header line and print the whole line except for the “>”
   c) As in the previous question, but print the header only until the first white space

7.19 Class exercise 7c

Write a script that extracts and prints the following features from a Genbank record of a genome (use the example of an adenovirus genome, which is available from the course site):

1. Find the JOURNAL lines and print only the page numbers
2. Find lines of protein_id in that file and extract the ids (add to your script from the previous question)
3. Find lines of coding sequence annotation (CDS) and extract the separate coordinates (get each number into a separate variable; add to previous script). Try to match all CDS lines! (This question is in home ex. 4)