An Introduction to Perl with Applications in Web Page Scraping.


1 An Introduction to Perl with Applications in Web Page Scraping

2 What is Perl?
Practical Extraction and Report Language
High-level, general-purpose, interpreted, dynamic programming language
Borrows from Unix shell scripting languages
Ideal for “small” tasks that involve text processing

3 What is going to be taught during this workshop?
Most of this presentation is taken from the www.perl.com introduction.
Perl language constructs:
- Variables
- Flow control
- String processing
- File I/O
- Subroutines
- Object-oriented Perl
Application: web page scraping

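4 Hello World
    > perl -e 'print "hello world\n"'
    hello world
    > perl -e 'print "hello ", "world\n"'
    hello world
    > perl -e "print 'hello ', 'world\n'"
    hello world\n>
Single-quoted Perl strings are not interpolated, so the last command prints a literal \n and the prompt follows on the same line.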

5 Scalars
Single things:
- Number
- String
    $fruitCount = 5;
    $fruitType = 'apples';
    $countReport = "> There are $fruitCount $fruitType";
    print $countReport;
    > There are 5 apples

6 Scalars continued
    $a = "8";
    $b = $a + "1";      # + treats both operands as numbers
    print "> $b\n";
    > 9
    $c = $a . "1";      # . concatenates them as strings
    print "> $c\n";
    > 81
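
The point of this slide is that the operator, not the variable, decides whether a scalar is treated as a number or as a string. A minimal sketch of that idea follows; the helper names add_as_numbers and join_as_strings are made up for illustration and are not part of the original slides.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same two scalars, combined with two different operators.
sub add_as_numbers {
    my ($left, $right) = @_;
    return $left + $right;      # numeric addition
}

sub join_as_strings {
    my ($left, $right) = @_;
    return $left . $right;      # string concatenation
}

print "> ", add_as_numbers("8", "1"), "\n";    # > 9
print "> ", join_as_strings("8", "1"), "\n";   # > 81
```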

7 Even more scalar examples*
*Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.
    $a = 5;
    $a++;        # $a is now 6; we added 1 to it.
    $a += 10;    # Now it's 16; we added 10.
    $a /= 2;     # And divided it by 2, so it's 8.

8 Arrays
Lists of scalars
    @months = ("July", "August", "September");
    print $months[0];        # This prints "July".
    $months[2] = "Smarch";
If an array doesn't exist, you create it when you assign a value to one of its elements.
    $winterMonths[0] = "December";   # This implicitly creates @winterMonths.

9 Arrays continued
To find the last index of an array, use $#array:
    print "> $#months\n";
    > 2
If the array is empty or doesn't exist, $#array is -1.
You can also resize an array by assigning to $#array:
    $#months = 0;    # Now @months contains only "July"
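
The last index and the element count are easy to confuse, so here is a small sketch that shows both side by side, together with growing and truncating an array. The helper array_summary is a hypothetical name introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (last index, number of elements) for a list.
sub array_summary {
    my @items = @_;
    return ($#items, scalar @items);
}

my @months = ("July", "August", "September");
my ($last_index, $count) = array_summary(@months);
print "> last index $last_index, count $count\n";      # > last index 2, count 3

push @months, "October";                               # grow the array by one element
print "> ", scalar(@months), " months after push\n";   # > 4 months after push

$#months = 0;                                          # truncate to a single element
print "> @months\n";                                   # > July
```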

10 Hashes
Map a key to a value
    %daysInMonth = ( "July" => 31, "August" => 31, "September" => 30 );
    print "> $daysInMonth{'September'}\n";
    > 30
To add a new key and value:
    $daysInMonth{"February"} = 28;

11 Hashes continued
Counting the keys: in scalar context, keys returns the number of keys rather than the keys themselves (3 for the original three-entry hash).
    print "> " . keys(%daysInMonth) . "\n";
    > 3
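
In list context, keys returns the keys themselves, which is the usual way to walk a hash. The sketch below sorts the keys so the output order is stable; describe_months is a hypothetical helper name, and the hash is the three-entry one from the earlier slide.

```perl
use strict;
use warnings;

my %days_in_month = (July => 31, August => 31, September => 30);

# Hypothetical helper: one description per key, sorted alphabetically by key.
sub describe_months {
    my (%days) = @_;
    return map { "$_ has $days{$_} days" } sort keys %days;
}

print "> $_\n" for describe_months(%days_in_month);
# > August has 31 days
# > July has 31 days
# > September has 30 days
```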

12 For loops
    print "> ";
    for ($i = 0; $i <= 5; $i++) {
        print "$i ";
    }
    print "\n";
    > 0 1 2 3 4 5

13 For loops
Iterating over a list:
    print "> ";
    for $i (5, 4, 3, 2, 1) {
        print "$i ";
    }
    print "\n";
    > 5 4 3 2 1

14 For loops continued
    @one_to_ten = (1 .. 10);
    $top_limit = 25;
    for $i (@one_to_ten, 15, 20 .. $top_limit) {
        print "$i\n";    # prints 1 through 10, then 15, then 20 through 25, one per line
    }

15 One more for loop
    for $marx ('Groucho', 'Harpo', 'Zeppo', 'Karl') {
        print "> $marx is my favorite Marx brother.\n";
    }
    > Groucho is my favorite Marx brother.
    > Harpo is my favorite Marx brother.
    > Zeppo is my favorite Marx brother.
    > Karl is my favorite Marx brother.

16 While loop
    my $count = 0;
    print "> ";
    while ($count != 3) {
        $count++;
        print "$count ";
    }
    print "\n";
    > 1 2 3

17 Until loop
    $count = 3;
    print "> ";
    until ($count == 0) {
        $count--;
        print "$count ";
    }
    print "\n";
    > 2 1 0

18 if/elsif/else
    if ($a == 5) {
        print "It's five!\n";
    } elsif ($a == 6) {
        print "It's six!\n";
    } else {
        print "It's something else.\n";
    }

19 Unless
    unless ($pie eq 'apple') {
        print "Ew, I don't like $pie flavored pie.\n";
    } else {
        print "Apple! My favorite!\n";
    }

20 Comparing unless and if
These two statements are equivalent:
    print "I'm burning the 7 pm oil\n" unless $day eq 'Friday';
    print "I'm burning the 7 pm oil\n" if not ($day eq 'Friday');

21 String operations
    $yes_no = 'no';
    print "> affirmative\n" if $yes_no == 'yes';
    > affirmative
The == operator converts both strings to numbers, and both 'no' and 'yes' convert to 0, so the test succeeds.
Use eq instead of == to compare strings.

22 More string comparisons
    my $five = 5;
    print "> Numeric equality!\n" if $five == " 5 ";
    print "> String equality!\n" if $five eq "5";
    > Numeric equality!
    > String equality!
    print "> No string equality!\n" if not($five eq " 5");
    > No string equality!
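
The difference between == and eq can be checked directly. The sketch below wraps each operator in a small helper (same_number and same_string are hypothetical names) and silences the "isn't numeric" warning locally, since comparing 'no' and 'yes' numerically is exactly the mistake being demonstrated.

```perl
use strict;
use warnings;

# Hypothetical helper: numeric comparison, as == performs it.
sub same_number {
    my ($left, $right) = @_;
    no warnings 'numeric';          # non-numeric strings silently become 0 here
    return $left == $right ? 1 : 0;
}

# Hypothetical helper: character-by-character comparison, as eq performs it.
sub same_string {
    my ($left, $right) = @_;
    return $left eq $right ? 1 : 0;
}

print "> ", same_number('no', 'yes'), "\n";   # > 1  (both convert to 0)
print "> ", same_string('no', 'yes'), "\n";   # > 0
print "> ", same_number(5, " 5 "), "\n";      # > 1  (surrounding whitespace is ignored)
print "> ", same_string(5, " 5"), "\n";       # > 0  (the leading space counts)
```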

23 substr
    $greeting = "Welcome to Perl!\n";
    print "> " . substr($greeting, 0, 7) . "\n";
    > Welcome
    print "> ", substr($greeting, 7), "\n";    # the result keeps the leading space and the trailing newline
    >  to Perl!
    print "> ", substr($greeting, -6, 6), ">";
    > Perl!
    >

24 substr continued
    my $greeting = "Welcome to Java!\n";
    substr($greeting, 11, 4) = 'Perl';      # $greeting is now "Welcome to Perl!\n"
    substr($greeting, 7, 3) = '';           # ... "Welcome Perl!\n"
    substr($greeting, 0, 0) = 'Hello. ';    # ... "Hello. Welcome Perl!\n"
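
Besides assigning to substr, Perl also accepts a fourth argument to substr that replaces the selected span and returns the text that was removed. The sketch below uses that form; replace_span is a hypothetical wrapper written for this example, not something from the slides.

```perl
use strict;
use warnings;

# Hypothetical wrapper around four-argument substr.
# Returns the modified copy of the text and the span that was replaced.
sub replace_span {
    my ($text, $offset, $length, $replacement) = @_;
    my $removed = substr($text, $offset, $length, $replacement);
    return ($text, $removed);
}

my ($new_greeting, $old_word) = replace_span("Welcome to Java!\n", 11, 4, 'Perl');
print "> replaced '$old_word': $new_greeting";   # > replaced 'Java': Welcome to Perl!
```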

25 split
    my $greeting = "Hello. Welcome Perl!\n";
    my @words = split(/ /, $greeting);
    # Three items: "Hello.", "Welcome", "Perl!\n"
With a limit of 2, splitting stops after the first separator:
    @words = split(/ /, $greeting, 2);
    # Two items: "Hello.", "Welcome Perl!\n"

26 join
    my @words = ("Hello.", "Welcome", "Perl!\n");
    my $greeting = join(' ', @words);             # "Hello. Welcome Perl!\n"
    my $andy_greeting = join(' and ', @words);    # "Hello. and Welcome and Perl!\n"
    my $jam_greeting = join('', @words);          # "Hello.WelcomePerl!\n"
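
split and join are often used together. As one illustration, splitting on the special pattern ' ' (a single-space string, which makes split skip leading whitespace and break on any run of whitespace) and joining with a single space normalizes the spacing of a line. normalize_spaces is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace into one space
# and drop leading and trailing whitespace, including the newline.
sub normalize_spaces {
    my ($text) = @_;
    my @words = split(' ', $text);
    return join(' ', @words);
}

print "> ", normalize_spaces("  Hello.   Welcome\tPerl!\n"), "\n";   # > Hello. Welcome Perl!
```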

27 Reading from a file
The file test.txt contains four lines:
    This
    is
    a
    test

28 Reading from a file continued
    open my $testfile, '<', 'test.txt' or die "I couldn't get at test.txt: $!";
    while ($line = <$testfile>) {
        print "> ", $line;
    }
    > This
    > is
    > a
    > test

29 chomp
chomp removes the trailing newline from a string.
    open my $testfile, '<', 'test.txt' or die "I couldn't get at test.txt: $!";
    print "> ";
    while ($line = <$testfile>) {
        chomp($line);
        print "$line ";
    }
    print "\n";
    > This is a test
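
A filehandle can also be read in list context, which returns every line at once, and chomp accepts a whole array. The sketch below does both. To stay self-contained it opens an in-memory filehandle on a string instead of test.txt; lines_of is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from an in-memory "file" and strip the newlines.
sub lines_of {
    my ($contents) = @_;
    open my $fh, '<', \$contents or die "Couldn't open in-memory file: $!";
    my @lines = <$fh>;      # list context: one element per line
    close $fh;
    chomp @lines;           # chomp works on every element of an array
    return @lines;
}

my @lines = lines_of("This\nis\na\ntest\n");
print "> @lines\n";         # > This is a test
```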

30 Writing to a file
    open my $overwrite, '>', 'overwrite.txt' or die "error trying to overwrite: $!";
    # Wave goodbye to the original contents.
    open my $append, '>>', 'append.txt' or die "error trying to append: $!";
    # Original contents are still there; new output is added to the end of the file.
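
The slide opens the filehandles but does not write to them. A minimal sketch of the full round trip follows: write with '>', add a line with '>>', and read the result back. It uses the core File::Temp module to get a scratch file, and write_lines and read_lines are hypothetical helper names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path in the given mode ('>' or '>>') and write one line per item.
sub write_lines {
    my ($mode, $path, @lines) = @_;
    open my $out, $mode, $path or die "Couldn't open $path for writing: $!";
    print {$out} "$_\n" for @lines;
    close $out or die "Couldn't close $path: $!";
    return;
}

# Hypothetical helper: return the lines of $path without their newlines.
sub read_lines {
    my ($path) = @_;
    open my $in, '<', $path or die "Couldn't open $path for reading: $!";
    my @lines = <$in>;
    close $in;
    chomp @lines;
    return @lines;
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close $tmp_fh;

write_lines('>',  $path, 'first');             # overwrite: the file now has one line
write_lines('>>', $path, 'second');            # append: the first line is kept
print "> ", join(', ', read_lines($path)), "\n";   # > first, second
```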

31 Subroutines
Arguments arrive in the special array @_.
    sub multiply {
        my (@ops) = @_;
        my $ret = 1;
        for my $val (@ops) {
            $ret *= $val;
        }
        return $ret;
    }
    print "> ", multiply(2 .. 5), "\n";
    > 120
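
The same pattern (copy @_ into a lexical array, loop, return a value) carries over to other calculations. As a second, hypothetical example not taken from the slides, here is an average subroutine that also handles being called with no arguments.

```perl
use strict;
use warnings;

# Hypothetical example: arithmetic mean of the arguments.
# Returns undef (or an empty list in list context) when called with no arguments.
sub average {
    my @numbers = @_;
    return unless @numbers;
    my $sum = 0;
    $sum += $_ for @numbers;
    return $sum / @numbers;     # @numbers in numeric context is the element count
}

print "> ", average(2 .. 5), "\n";   # > 3.5
```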

32 Programming with objects
An object is a programmer-defined data structure which encapsulates:
- Data
- Behavior (methods)
A web browser object may have:
- Data
  - The current page
  - A history of recently visited URLs
- Behavior
  - Can navigate to a page
  - Can display a page
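
The browser object described above can be sketched in plain Perl as a blessed hash reference. Everything here is an assumption made for illustration: the class name Browser, its fields, and its methods are hypothetical, and navigate only records the URL rather than fetching anything.

```perl
use strict;
use warnings;

package Browser;    # hypothetical class, for illustration only

# Constructor: the data lives in a hash, and bless ties the hash to the class.
sub new {
    my ($class) = @_;
    my $self = { current_page => undef, history => [] };
    return bless $self, $class;
}

# Behavior: remember the URL as the current page and add it to the history.
sub navigate {
    my ($self, $url) = @_;
    push @{ $self->{history} }, $url;
    $self->{current_page} = $url;
    return $self;
}

# Accessors for the data.
sub current_page { my ($self) = @_; return $self->{current_page}; }
sub history      { my ($self) = @_; return @{ $self->{history} }; }

package main;

my $browser = Browser->new;
$browser->navigate('http://www.perl.com/');
$browser->navigate('http://search.cpan.org/');
print "> ", $browser->current_page, "\n";                      # > http://search.cpan.org/
print "> ", scalar($browser->history), " pages visited\n";     # > 2 pages visited
```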

33 An Application: Scraping Web Pages
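
The transcript does not include the scraping code itself, so the following is only a rough sketch of the core idea: pull the link targets out of a page's HTML. It works on an HTML string that is already in memory and uses a simple regular expression, which is an assumption chosen to keep the example self-contained; real pages are better handled with a proper HTML parser or the WWW::Mechanize library listed in the references. extract_links is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the href value of every <a> tag that uses double quotes.
# This is a simplification; it does not handle single-quoted or unquoted attributes.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ /<a\s[^>]*?href\s*=\s*"([^"]*)"/gi) {
        push @links, $1;
    }
    return @links;
}

my $page = '<p><a href="http://www.perl.com/">Perl</a> and '
         . '<a class="x" href="http://search.cpan.org/">CPAN</a></p>';

print "> $_\n" for extract_links($page);
# > http://www.perl.com/
# > http://search.cpan.org/
```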

34 References
Beginner's Introduction to Perl, http://www.perl.com/pub/a/2000/10/begperl1.html
Perl Mechanize library documentation, http://search.cpan.org/dist/WWW-Mechanize/
Schwartz, R. L. and Phoenix, T., Learning Perl, 3rd Edition, November 1993.

