Collecting/Reorganizing Web Info Using Perl: A Tutorial

1 Collecting/Reorganizing Web Info Using Perl: A Tutorial
2018/11/14
1998 CS430 (Web Programming)
J.-S. Roger Jang (張智星), Department of Computer Science, National Tsing Hua University (CS, NTHU)

2 Outline
Scripting Languages
Perl Basics
Perl for CGI Programming
Perl for Web Client Programming
Perl and ODBC
Examples and Demos
  Link extractor
  Web robots
  Link verifier
  News collecting

3 Scripting Languages
Also known as: glue languages or system integration languages (in contrast to system programming languages such as C and C++)
Examples: Perl, MATLAB, Python, Rexx, Tcl, VB, Unix shell
Characteristics: typeless, interpreted, rapid turnaround during development

4 Scripting Languages
“Scripting: Higher-Level Programming for the 21st Century” by John K. Ousterhout, Computer Magazine, March 1998

5 Scripting is on the Rise
Several factors have combined to increase the importance of scripting languages:
  Internet: Perl, JavaScript
  Better scripting support: hardware/software
  Component frameworks: ActiveX, Java Beans
  Casual programmers: database queries or macros for a spreadsheet
  Graphical user interfaces: VB, HyperCard, Tcl/Tk, MATLAB

6 Criteria to Use Scripting
Is the application’s main task to connect preexisting components?
Will the application manipulate a variety of different things?
Does the application include a GUI?
Does the application do a lot of string manipulation?
Will the application’s functions evolve rapidly over time?
Does the application need to be extensible?
Affirmative answers suggest that a scripting language will work well for the application.

7 What Is Perl?
Perl stands for:
  Practical Extraction and Report Language
  Pathologically Eclectic Rubbish Lister
Example (garbage signals from a noisy modem connection?):
    $html =~ m#<\s*a\s+href\s*=\s*"?(.*?)"?\s*>(.*?)<\s*/\s*a\s*>#gi;
Why is Perl popular?
  Free (and powerful)
  A large user community (from UNIX)
  Ports for Win32 (95 & NT)
    Batch files suck!
    Command-line based
  Growth of the Internet

8 Spirits of Perl
Sharing: just like Linux, FreeBSD, and the Internet
The Moral:
  Help people help Perl help people.
  There is more than one way to do it.
Design Spirit:
  No unnecessary limits
  Take defaults

9 Perl: Now and Past
Initial Goal: a glue language to tie up loose ends
Now Serves for:
  CGI programming
  Web robots
  Web link verifiers
  HTML syntax validation
  ODBC database connectivity
  ActiveX scripting
  OLE automation
  Windows NT administration: registry, event log, user accounts
  Socket programming
  Tcl/Tk integration for GUI

10 Portability of Perl
OS support:
  UNIX
  Win32 (95 & NT)
  Mac
All available (binary and source) from CPAN (Comprehensive Perl Archive Network)

11 Perl for Win32
Perl
PerlIS: ISAPI extension for MS IIS
PerlScript: Active scripting language that plays the same role as JavaScript, VBScript, ASP (Active Server Pages)
http://www.perl.com/CPAN

12 Internet Resources
Official Home
Newsgroups: comp.lang.perl, comp.lang.perl.misc, comp.lang.perl.announce, comp.lang.perl.modules
Mailing Lists
Perl-for-Win32 FAQ

13 Basic Perl Programs
UNIX
  Program:
    #!/usr/local/bin/perl
    # My first Perl program
    print "Hello World!\n";
  To execute:
    unix> perl hello.pl
  or
    unix> hello.pl
Win32
  Program:
    # My first Perl program
    print "Hello World!\n";
  To execute:
    dos> perl hello.pl
  or
    dos> hello.pl
  or
    dos> hello

14 Executable Perl File under UNIX
To make a Perl file executable under UNIX:
Add one line to indicate where the Perl engine is:
    #!/usr/local/bin/perl
Make the file executable:
    unix> chmod +x hello.pl
  or
    unix> chmod 755 hello.pl
Now you can execute the file directly:
    unix> hello.pl
    Hello World!

15 File Associations under Win-NT
To be able to invoke “hello.pl” directly:
    dos> assoc .pl=Perl
    dos> ftype Perl=c:\perl\bin\perl.exe %1 %*
    dos> hello.pl
    Hello World!
To be able to invoke “hello” directly:
    dos> set PATHEXT=%PATHEXT%;.pl
    dos> hello

16 On-line Local Documentation
DOS prompt:
    dos> perldoc perldoc
    dos> perldoc perlfunc
    dos> perldoc -f print
Web browser:
    c:\perl\html\lib\*.html
    c:\perl\html\lib\site\*.html
    c:\perl\html\lib\site\lwp\*.html

17 Perl: Interpreter or Compiler?
The Perl engine completely parses and compiles the program before executing any of it. Therefore:
  It’s a compiler: the program is completely read and parsed before the first statement is executed.
  It’s an interpreter: no object code sits around.

18 Basic Data Types
Four basic data types:
Scalars (numbers and strings)
    $exchange_rate = 32.5;
    $univ = "Tsing Hua University";
Arrays (lists)
    @depts = ("EE", "CS", "PME", "Econ");
Hashes (associative arrays)
    %chairs = ("EE" => "Ray-Sing Huang", "CS" => "Nen-Fu Huang", "PME" => "Jing-Tang Yang");
Functions (subroutines)
    $output = &subcall($input);

19 Advanced Data Types
References (pointers)
    $ra = \$univ;
    $rb = $rc = \%chairs;
Multidimensional arrays or hashes
  Arrays of arrays
  Arrays of hashes
  Hashes of arrays
  Hashes of hashes
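
The nested structures listed on this slide are all built from references. The following is a minimal sketch; the department, course, and person names are made up for illustration and do not come from the slides.

```perl
use strict;
use warnings;

# Hash of arrays: each department maps to a list of course names.
# (All names below are hypothetical sample data.)
my %courses = (
    CS => [ 'CS430', 'CS101' ],
    EE => [ 'EE200' ],
);

# Array of hashes: each element is a reference to a small record.
my @people = (
    { name => 'Alice', dept => 'CS' },
    { name => 'Bob',   dept => 'EE' },
);

# A reference is dereferenced with ->[index] or ->{key}.
my $rc = \%courses;
print "First CS course: $rc->{CS}->[0]\n";
print "Second person:   $people[1]{name}\n";

# Walk the hash of arrays and count the entries of each inner array.
foreach my $dept (sort keys %courses) {
    my $count = scalar @{ $courses{$dept} };
    print "$dept offers $count course(s)\n";
}
```

Between two subscripts the arrow is optional, so `$rc->{CS}->[0]` and `$rc->{CS}[0]` refer to the same element.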

20 Numbers in Perl
All numbers are double-precision internally.
Examples:
    $x = 2.5;
    $x = 4.5e-12;   # scientific notation
    $x = 0377;      # 377 octal, same as 255 decimal
    $y = 0xfe;      # FE hex, same as 254 decimal
    $z = $x - $y;   # $z = 1
    $z = $x*$y;     # $z = 64770
    $z /= 255;      # $z = $z/255 = 254
    $z--;           # $z = 253; same as --$z
    $u = $z++;      # $u = 253, $z = 254

21 Functions/Operators for Numbers
Operators: + - * / ** ++ --
Functions: abs atan2 cos exp hex int log oct rand sin sqrt srand

22 Strings in Perl
Examples:
    $name = "Timmy";
    $str1 = "name is $name";      # name is Timmy
    $str2 = 'name is $name';      # name is $name
    $str3 = "name is \$name";     # name is $name
    $str4 = "$name is " . (2*4) . " years old";
    $repeat = $name x 3;          # TimmyTimmyTimmy
    $leng = length($repeat);      # 15
    $str = (1+2) x 4;             # 3333
    $str = 4 x (1+2);             # 444

23 Get Standard Inputs as Strings
Use <STDIN> to get strings from standard input:
    $a = <STDIN>;          # get a line of text from STDIN
    chomp($a);             # get rid of the newline
or
    chomp($a = <STDIN>);   # same as the above

24 Get File Contents as Strings
Use a file handle to get strings from a file, one line at a time:
    open(FILE, "test.txt") || die("Cannot open file");
    while (defined($line = <FILE>)) {   # read the next line on every pass
        print "$line";
    }

25 Comparison Operators
Numeric and string comparison operators:
  Numeric: == != < > <= >= <=>
  String:  eq ne lt gt le ge cmp
Reason for two sets: automatic conversion between numbers and strings
    if (20 > 8) { … }    # true: numeric comparison
    if (20 lt 8) { … }   # also true: string comparison, "20" sorts before "8"

26 Functions for Strings
chomp chop chr crypt hex index lc lcfirst length oct ord pack q/STRING/ qq/STRING/ reverse rindex sprintf substr tr/// uc ucfirst y///

27 Arrays in Perl (I)
Examples:
    @depts = ("EE", "CS", "PME", "Econ");
    @numbers = (2..5);            # same as (2,3,4,5)
    @new = @depts;                # copy the whole array
    $a = @new;                    # 4 (number of elements)
    ($a) = @new;                  # EE
    ($a, $b) = @new;              # $a="EE", $b="CS"
    $a = $depts[0];               # EE
    $a = $depts[$#new];           # Econ
    @depts[0,1] = @depts[1,0];    # swap "EE" and "CS"

28 Arrays in Perl (II)
No memory allocation:
    @a = ('A', 'B');
    $a[3] = 'C';         # @a = ("A", "B", undef, "C")
Push and pop:
    push(@a, "D");       # same as @a = (@a, "D")
    $b = pop(@a);        # remove the last element
Shift and unshift:
    unshift(@a, "E");    # same as @a = ("E", @a)
    $b = shift(@a);      # remove the first element
Reverse:
    @b = reverse(@a);    # @b = ("C", undef, "B", "A")

29 Arrays in Perl (III)
Join:
    @a = ("A", "B", "C");
    $a = join(" ", @a);          # $a = "A B C"
Split:
    $path = 'c:\win\system;c:\dos;c:\perl\bin';
    @path = split(/;/, $path);   # @path = ('c:\win\system', 'c:\dos', 'c:\perl\bin')
Sort:
    @a = ("shine", "rain", "cloud");
    @b = sort(@a);               # @b = ("cloud", "rain", "shine")

30 Arrays in Perl (IV)
Map:
    @a = ("shine", "rain", "cloudy");
    @b = map {length} @a;        # @b = (5, 4, 6)
Grep:
    @b = grep { ... } @a;        # @b = ("cloudy")
Examples:
    @odd  = ... {$_*2} ...       # odd-indexed
    @even = ... {$_*2+1} ...     # even-indexed
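
A runnable sketch of map and grep on the same sample array may help here. The grep condition (length greater than 5) and the index-selection idiom are assumptions chosen for illustration; the conditions on the original slide did not survive in the transcript.

```perl
use strict;
use warnings;

my @a = ('shine', 'rain', 'cloudy');

# map applies the block to every element and collects the results.
my @lengths = map { length($_) } @a;

# grep keeps only the elements for which the block is true.
# The test used here is an assumption, not the slide's original one.
my @long = grep { length($_) > 5 } @a;

# Select elements by position with an array slice:
# indices 0, 2, ... and indices 1, 3, ...
my @at_even_index = @a[ grep { $_ % 2 == 0 } 0 .. $#a ];
my @at_odd_index  = @a[ grep { $_ % 2 == 1 } 0 .. $#a ];

print "lengths:       @lengths\n";
print "long words:    @long\n";
print "even indices:  @at_even_index\n";
print "odd indices:   @at_odd_index\n";
```

Inside both blocks, `$_` is an alias for the current element, so no loop variable is needed.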

31 Traverse an Array
To print each element in an array:
    foreach $element (@array) {
        print "$element\n";
    }
or
    foreach $i (0..$#array) {
        print "$array[$i]\n";
    }

32 Get File Contents as an Array
Use a file handle to get file contents as an array:
    open(FILE, "test.txt") || die("Cannot open file");
    @line = <FILE>;
    foreach $line (@line) {
        print "$line";
    }

33 Functions for Arrays
pop push shift splice unshift grep join map qw/STRING/ reverse sort split unpack

34 Hashes in Perl
A hash is a set of key-value pairs, for example:
    %chairs = ("EE" => "Huang", "CS" => "Huang", "PME" => "Yang");   # three pairs
    $chairs{"Chem"} = "Chou";      # add a pair
    delete $chairs{"PME"};         # delete a pair
    @allkey = keys %chairs;        # three keys (in no particular order)
    @allvalue = values %chairs;    # three values (in no particular order)
    $count = @allkey;              # 3
    %new = reverse %chairs;        # two pairs left!
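
The "two pairs left" remark follows from duplicate values collapsing into a single key when a hash is reversed. The sketch below shows that effect and one possible way around it; the `%depts_of` structure is an illustrative assumption, not something from the slides.

```perl
use strict;
use warnings;

my %chairs = (EE => 'Huang', CS => 'Huang', PME => 'Yang');

# Plain reverse: the two "Huang" values become one key, so a pair is lost.
my %by_name = reverse %chairs;
my $n_names = keys %by_name;
print "$n_names keys after reverse\n";

# To keep every department, store a list of departments per name.
my %depts_of;
foreach my $dept (sort keys %chairs) {
    push @{ $depts_of{ $chairs{$dept} } }, $dept;
}
print "Huang chairs: @{ $depts_of{Huang} }\n";
```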

35 Traverse a Hash
To traverse a hash:
    foreach $key (keys %hash) {
        print "\$hash{$key} = $hash{$key}\n";
    }
or (a faster version)
    while (($key, $value) = each %hash) {
        print "\$hash{$key} = $value\n";
    }

36 If/Unless Statement (I)
“If” format:
    if (condition1) {
        ...
    } elsif (condition2) {
        ...
    } else {
        ...
    }
“Unless” format:
    unless (condition) {
        ...
    }

37 If/Unless Statement (II)
The following are equivalent:
    if ($x > 0) {print "positive\n"};
    print "positive\n" if ($x > 0);
    print "positive\n" unless ($x <= 0);
    $x > 0 && print "positive\n";
    $x <= 0 || print "positive\n";

38 While/Until Statements
    while (condition) {
        ...
    }
    until (condition) {
        ...
    }
    do {
        ...
    } while (condition);
    do {
        ...
    } until (condition);

39 For Statement
The common for-loop (just like in C):
    for ($i = 0; $i < 10; $i++) {
        ...
    }
For-loop for arrays:
    foreach $element (@array) {
        ...
    }

40 Last/Next in a Loop
To get out of a loop: last (break in C)
    for ($i = 0; $i < 10; $i++) {
        last if $array[$i] < 0;
    }
To go to the end of a loop: next (continue in C)
    foreach $element (@array) {
        next if $element > 0;
        ...
    }

41 Regular Expressions (I)
Pattern matching:
    /pattern/
    m|pattern|
Pattern substitution:
    s/pattern1/pattern2/modifier
    s#pattern1#pattern2#modifier
Example: print lines containing “href=”
    open(FILE, "test.htm") || die("Cannot open file");
    chomp(@lines = <FILE>);
    foreach $line (@lines) {
        print "$line\n" if $line =~ /href=/;
    }

42 Regular Expressions (II)
\d: a decimal digit
\D: anything except a digit
\s: a whitespace character (space, tab, newline, etc.)
\S: anything except whitespace
\n: a newline
.: anything except a newline
[list]: any single character in the list
[^list]: any single character not in the list
a*: zero or more consecutive a’s, as many as possible (greedy)
a*?: zero or more consecutive a’s, as few as possible (minimal)
a+: one or more a’s
a?: zero or one a

43 Modifiers in Match/Substitution
Modifiers for match: gimosx
Modifiers for substitution: gimosex (Mnemonic!)
Meanings:
  g: Match/substitute globally
  i: Do case-insensitive match/substitution
  m: Treat string as multiple lines
  o: Compile pattern only once
  s: Treat string as a single line
  e: Evaluate the replacement as an expression
  x: Use extended regular expressions

44 Regular Expressions (III)
The default variable $_ is used if no operand is given.
    s/^([^ ]*) *([^ ]*)/$2 $1/;    # swap first two words
Greedy match:
    $_ = "The food is under the bar in the barn.";
    if ( /foo(.*)bar/ ) {
        print "$1\n";    # $1 = "d is under the bar in the "
    }
Minimal match:
    if ( /foo(.*?)bar/ ) {
        print "$1\n";    # $1 = "d is under the "
    }
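
The greedy and minimal matches above can be run side by side. This sketch uses the slide's own sentence; the variable names are ours.

```perl
use strict;
use warnings;

my $text = "The food is under the bar in the barn.";

# A match in list context returns the captured groups.
my ($greedy)  = $text =~ /foo(.*)bar/;    # .* grabs as much as it can
my ($minimal) = $text =~ /foo(.*?)bar/;   # .*? stops at the first "bar"

print "greedy:  [$greedy]\n";
print "minimal: [$minimal]\n";
```

The greedy version runs up to the last "bar" in the string (the one inside "barn"), while the minimal version stops at the first.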

45 Examples of Using Regular Exp.
HTML link extraction:
    chomp(@lines = <>);
    $html = join(' ', @lines);
    $html =~ s|<\s*xmp\s*>.*?<\s*/\s*xmp\s*>||gi;
    $html =~ s|<\s*script.*?>.*?<\s*/\s*script\s*>||gi;
    @text_link = $html =~ m|<\s*a\s+href\s*=\s*"?(.*?)"?\s*>(.*?)<\s*/\s*a\s*>|gi;
    # Print out the results
    print "$b ==> $a\n" while (($a, $b) = splice(@text_link, 0, 2));
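
The same link-extraction idea, wrapped in a subroutine so it can be tried on a string instead of standard input. This is a sketch only: the subroutine name `extract_links` and the sample HTML are assumptions, and a regex of this kind handles only simple anchor tags.

```perl
use strict;
use warnings;

# Return a list of [url, text] pairs found in an HTML string.
sub extract_links {
    my ($html) = @_;
    $html =~ s/\s*\n\s*/ /g;                               # join lines
    $html =~ s|<\s*xmp\s*>.*?<\s*/\s*xmp\s*>||gi;          # drop <xmp> blocks
    $html =~ s|<\s*script.*?>.*?<\s*/\s*script\s*>||gi;    # drop scripts

    # In list context, m//g returns every captured (URL, text) pair.
    my @flat =
        $html =~ m|<\s*a\s+href\s*=\s*"?(.*?)"?\s*>(.*?)<\s*/\s*a\s*>|gi;

    my @links;
    while (my ($url, $text) = splice(@flat, 0, 2)) {
        push @links, [ $url, $text ];
    }
    return @links;
}

# Made-up sample page.
my $sample = <<'END_HTML';
<p>See <a href="http://www.perl.com/CPAN">CPAN</a> and
<A HREF="docs/intro.html">the intro</A>.</p>
<script>document.write('<a href="ignored.html">skip me</a>');</script>
END_HTML

foreach my $link (extract_links($sample)) {
    print "$link->[1] ==> $link->[0]\n";
}
```

Lines are joined first because `.` does not match a newline unless the `s` modifier is used.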

46 Features in Perl 5
Complex data structures
Object-oriented programming
Reusable modules

47 What is CGI?
CGI: Common Gateway Interface
A CGI program runs on the web server; it takes a user’s input (from an HTML fill-out form) and processes it on the server side.
Examples: search engines, guestbooks, web-based BBS, on-line registration forms
CGI programming languages: Perl, C, C++, VB, Pascal, batch files, shell scripts, ...

48 Simplest Perl CGI Program
HTML page:
    This is my first <a href="/jang/cgi-bin/first.pl">CGI program</a>.
first.pl (note the blank line after the Content-type header):
    print <<END_OF_HTML;
    Content-type: text/html

    <html>
    <body>
    Greeting from my first CGI program!
    </body>
    </html>
    END_OF_HTML

49 CGI Program with Options
CGI program with options:
    <a href="/jang/cgi-bin/first.pl?opt=1&case=4">CGI program</a>.
Equivalent at command line:
    perl first.pl "opt=1&case=4"

50 CGI Program with More Options
Options are provided by form widgets: radio buttons, text fields, checkboxes, pop-up menus, scrolling lists, etc.
A form with widgets:
    <form method="post" action="/jang/cgi-bin/form.pl">
    Your name: <input type="text" name="username"><p>
    Your gender:
    <input type="radio" name="gender" value="M"> Male
    <input type="radio" name="gender" value="F"> Female
    <p><input type="submit">
    </form>

51 CGI Program: Parameter Passing
Options are provided by form widgets: radio buttons, text fields, checkboxes, pop-up menus, scrolling lists, etc.
Parameter string (for the name “R= Jang” and gender “M”):
    username=R%3d+Jang&gender=M
URL encoding:
  Convert all spaces into “+”.
  Convert each offending character into “%xx”, where “xx” is its ASCII value in hex. For example, convert “=” into “%3d”.
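
The two encoding rules can be sketched as a small subroutine. The name `url_encode` and the exact set of characters treated as safe are assumptions made for this illustration; browsers apply their own, more detailed rules.

```perl
use strict;
use warnings;

# Encode one name or value for a parameter string.
sub url_encode {
    my ($s) = @_;
    # Hex-escape everything except letters, digits, space, and a few
    # marks assumed to be safe here (underscore, dot, hyphen).
    $s =~ s/([^A-Za-z0-9 _.\-])/sprintf("%%%02x", ord($1))/eg;
    $s =~ tr/ /+/;    # spaces become "+"
    return $s;
}

my $param_str = "username=" . url_encode("R= Jang")
              . "&gender="  . url_encode("M");
print "$param_str\n";
```

Escaping happens before the space-to-plus step so that a literal "+" in the input is turned into "%2b" rather than being confused with an encoded space.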

52 Two Ways to Send Param. String
method = POST
  The string is sent via standard input; the exact number of bytes to read is in the environment variable CONTENT_LENGTH.
    read(STDIN, $param_str, $ENV{'CONTENT_LENGTH'});
method = GET
  The string is in the environment variable QUERY_STRING.
    $param_str = $ENV{'QUERY_STRING'};
  Disadvantage: limited length
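
A script that accepts both methods can pick the source at run time. This is a sketch under the assumption that the server sets REQUEST_METHOD as usual; the subroutine name `read_param_string` is ours.

```perl
use strict;
use warnings;

# Return the raw parameter string for either GET or POST.
sub read_param_string {
    my $method    = $ENV{'REQUEST_METHOD'} || 'GET';
    my $param_str = '';
    if ($method eq 'POST') {
        # POST: read exactly CONTENT_LENGTH bytes from standard input.
        my $len = $ENV{'CONTENT_LENGTH'} || 0;
        read(STDIN, $param_str, $len) if $len > 0;
    } else {
        # GET: the string arrives in QUERY_STRING.
        $param_str = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
    }
    return $param_str;
}

# Try it from the command line by faking a GET request.
$ENV{'REQUEST_METHOD'} = 'GET';
$ENV{'QUERY_STRING'}   = 'opt=1&case=4';
print read_param_string(), "\n";
```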

53 Extract Name-Value Pairs
How to extract name-value pairs from a URL-encoded parameter string?
    $param_str = "user=R%3d+Jang&gender=M";
Perl code:
    @pairs = split(/&/, $param_str);
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;                                        # "+" back to space
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/chr(hex($1))/eg;    # "%xx" back to the character
        $FORM{$name} = $value;
    }
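
The decoding loop can be packaged as a subroutine and run on the slide's sample string. The name `parse_params` is an assumption; unlike the slide's version, this sketch also decodes the names and tolerates a pair with no "=".

```perl
use strict;
use warnings;

# Turn a URL-encoded parameter string into a hash of name => value.
sub parse_params {
    my ($param_str) = @_;
    my %form;
    foreach my $pair (split /&/, $param_str) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        # $_ aliases $name and then $value, so both are decoded in place.
        foreach ($name, $value) {
            tr/+/ /;
            s/%([a-fA-F0-9]{2})/chr(hex($1))/eg;
        }
        $form{$name} = $value;
    }
    return %form;
}

my %FORM = parse_params("username=R%3d+Jang&gender=M");
print "$_ = $FORM{$_}\n" foreach sort keys %FORM;
```

Splitting on "&" and "=" before decoding matters: an encoded "%3d" inside a value must not be mistaken for the separator.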

54 Send Back the Response (I)
Put the MIME type on the “Content-type” line.
Send back HTML:
    Content-type: text/html

    <html> . . . </html>
Send back non-HTML:
    Content-type: image/gif

    GIF89a&%--- binary contents of GIF file here

55 Send Back the Response (II)
Redirect to an existing file:
    Location:
Non-Parsed Header (NPH) scripts:
    Name your CGI program something like “nph-*.pl”.

56 CGI Examples
Random backgrounds
Environment variables dump
Guestbooks
URL verifier
IP to domain name conversion
Domain name to IP conversion
Link extraction
Simple search

57 Useful CGI Environment Variables
REQUEST_METHOD: GET or POST
HTTP_REFERER: URL of the form submitted
PATH_INFO: Extra “path” info
SERVER_NAME: Web server’s hostname or IP
SERVER_PORT: Web server’s port
SCRIPT_NAME: Local URL of the script being executed
REMOTE_HOST: Hostname of the remote machine (its IP is in REMOTE_ADDR)
HTTP_USER_AGENT: Type of the remote browser
HTTP_ACCEPT: Browser’s capability

58 Collecting Web Info with Perl
Two approaches:
Socket library
  A low-level programmer’s interface that allows clients to set up a TCP/IP connection and communicate directly with servers.
LWP library
  A set of modules for Perl 5 that encapsulate common functions for a web client; much cleaner and faster to work with than the socket library.

59 LWP Library
Also known as: libwww-perl-5
Credits:
  Main driving force: Gisle Aas
  Based on the libwww library developed for Perl 4 by Roy Fielding
Availability: all CPAN archives
Latest version: 5.31, released on April 10, 1998

60 LWP Library: Resources
Homepage:
Mailing list:
Hypermail archive:
Books:
  Web Client Programming with Perl by Clinton Wong
  Web Programming With Perl5 by Bill Middleton, Brian Deng, Chris Kemp

61 Examples of Using LWP Library (I)
A simple cookbook is at C:\perl\html\lib\site\lwpcook.html
Retrieving a file:
    use LWP::Simple;
    $doc = get '...';
Retrieving a file (from shell):
    perl -MLWP::Simple -e 'getprint "..."'

62 Examples of Using LWP Library (II)
Removing HTML tags:
    use LWP::Simple;
    foreach (get $ARGV[0]) {
        s/<[^>]*>//g;
        print;
    }
Formatting HTML pages:
    use HTML::Parse;
    print parse_html(get ($ARGV[0]))->format;

63 Examples of Using LWP Library (III)
Extracting links (tlink1.pl):
    use LWP::Simple;
    use HTML::Parse;
    use HTML::Element;
    $html = get $ARGV[0];
    $parsed_html = HTML::Parse::parse_html($html);
    for (@{ $parsed_html->extract_links() }) {
        $link = $_->[0];
        print "$link\n";
    }

64 Examples of Using LWP Library (IV)
    dos> perl tlink1.pl
    sandbox/html/autoload.htm
    /jang/cgi-bin/rand_image.pl
    graphics/myname.jpg
    graphics/animgif/flare.gif
    graphics/course.gif
    graphics/research.gif
    ...

65 Examples of Using LWP Library (V)
Expanding relative URLs (tlink2.pl):
    use LWP::Simple;
    use HTML::Parse;
    use HTML::Element;
    use URI::URL;
    $html = get $ARGV[0];
    $parsed_html = HTML::Parse::parse_html($html);
    for (@{ $parsed_html->extract_links() }) {
        $link = $_->[0];
        $url = new URI::URL $link;
        $full_url = $url->abs($ARGV[0]);    # resolve against the page's own URL
        print "$full_url\n";
    }

66 Examples of Using LWP Library (VI)
    dos> perl tlink2.pl
    ...

67 Examples of Using LWP Library (VII)
Extract specific URLs (tlink3.pl):
    use LWP::Simple;
    use HTML::Parse;
    use HTML::Element;
    use URI::URL;
    $html = get $ARGV[0];
    $parsed_html = HTML::Parse::parse_html($html);
    for (@{ $parsed_html->extract_links(("a")) }) {    # only links in <a> tags
        $link = $_->[0];
        $url = new URI::URL $link;
        $full_url = $url->abs($ARGV[0]);
        print "$full_url\n";
    }

68 Examples of Using LWP Library (VIII)
    dos> perl tlink3.pl
    ...

69 LWP Modules
Listing of LWP modules:
File:: Font:: HTML:: HTTP:: LWP:: MIME:: URI:: WWW::

70 Classes in LWP Modules
10 classes in LWP module:
Debug IO MediaTypes MemberMixin Protocol RobotUA Simple Socket TKIO UserAgent

71 Classes in HTTP Modules
8 classes in HTTP module:
Daemon Date Headers Message Negotiate Request Response Status

72 Classes in HTML Modules
11 classes in HTML module:
AsSubs Element Entities FormatPS FormatText Formatter HeadParser LinkExtor Parse Parser TreeBuilder

73 More Sophisticated Web Clients (I)
Limitations of using LWP::Simple:
No status code returned, for example:
  200: OK
  404: Not found
  408: Request time-out
  500: Internal server error
No way for the client to identify itself in the header
No support for proxy servers

74 More Sophisticated Web Clients (II)
Returning the retrieval status (tstatus.pl):
    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Response;
    $ua = new LWP::UserAgent;
    $request = new HTTP::Request('GET', $ARGV[0]);
    $response = $ua->request($request);
    if ($response->is_success) {
        print $response->content;
    } else {
        print $response->error_as_HTML;
    }

75 More Sophisticated Web Clients (III)
Adding proxy server support (tproxy.pl):
    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Response;
    $ua = new LWP::UserAgent;
    $ua->proxy('http', '...');
    $ua->no_proxy('nthu.edu.tw');
    $request = new HTTP::Request('GET', $ARGV[0]);
    $response = $ua->request($request);
    if ($response->is_success) {
        print $response->content;
    } else {
        print $response->error_as_HTML;
    }

76 More Sophisticated Web Clients (IV)
Other advanced features:
  Get headers only
  Let the client identify itself
  Mirror

77 Web Client Applications
Applications:
  Mirroring
  Link verifier
  Search engines
  News collecting
  FedEx or UPS package tracking
  Web statistics
  suck-rule-o-meter
  Guestbook automatic signer

78 WWW/Databases Integration
Perl & odbc.pm & ODBC-compliant databases
Perl & dbm
ASP & MS databases
Java & JDBC & mSQL or MySQL
Perl & DBI & DBD (mSQL or MySQL or PostgreSQL)
mSQL & W3-mSQL
Perl & gdbm.pm
Perl & Sprite.pm
Perl & Msqlperl & mSQL
Perl & Postperl & PostgreSQL

79 Perl and ODBC
ODBC: Open DataBase Connectivity
Designed by: X/Open, SQL Access Group, ANSI, ISO, Microsoft, Digital, Sybase, IBM, Novell, Oracle, Lotus, and others.

80 Facts about ODBC
The ODBC standard was designed to work on any platform and has been ported to Win32, Unix, Macintosh, OS/2 and others.
ODBC has become so accepted that some vendors like IBM, Informix and Watcom have designed their DBMS native programming interface based on ODBC.

81 Why Use Win32::ODBC?
Easy to use
Interface similar to the ODBC API
Most ODBC functions are supported
Full error reporting
Object-oriented model
Portability: platforms, data

82 How Does Win32::ODBC Work?
The Perl program invokes methods in the Win32::ODBC module to perform SQL queries. The module then talks to the driver to insert/delete/update database entries.
Perl program -> Win32::ODBC -> ODBC driver -> Database file

83 DSN: Data Source Name
A data source name contains the following info:
  Database information
  User ID
  Password
  Connection information

84 Data Source Name: User vs. System
User DSN: only accessible by the user who created it (for personal use)
System DSN: accessible by any user, including the system itself (for Internet use)

85 How to Install Win32::ODBC?
Assuming Perl is installed in c:\perl:
1. Create the directory c:\perl\lib\auto\win32\odbc
2. Copy ODBC.PLL into the new directory
3. Copy ODBC.PM into c:\perl\lib\win32

86 How to Use Win32::ODBC Module?
Load the module:
    use Win32::ODBC;
Work flow:
1. Connect to the database
2. Submit a query
3. Process the result
4. Close the database

87 Connecting to a Database (I)
Make a new connection to a DSN:
    $DSN = "Address";
    $db = new Win32::ODBC($DSN);
You can specify a user ID and password:
    $DSN = "DSN=Address;UID=Jang;PWD=my_passwd";

88 Connecting to a Database (II)
If the connection succeeds, the result is an object; otherwise it is undef:
    if (!($db = new Win32::ODBC($DSN))) {
        … process error …
    }

89 Submitting a Query
Use the Sql() method to submit a query:
(Note that Sql() returns undef if the query is successful!)
    $sql = "SELECT * FROM table1";
    if ($db->Sql($sql)) {
        … process error …
    }

90 Processing Results (I)
Use the FetchRow() method to retrieve a row from a data set:
(FetchRow() returns 1 if a row is successfully retrieved.)
    while ($db->FetchRow()) {
        … process result …
    }

91 Processing Results (II)
Once a row has been fetched, you need to extract the data with DataHash():
    %Data = $db->DataHash();
or
    %Data = $db->DataHash("Name", "Position");

92 Processing Errors (I)
A call to Win32::ODBC::Error() returns the last error that occurred, regardless of which connection generated it:
    $Error = Win32::ODBC::Error();
You can also use the Error() method to get the error message of the current connection:
    print "Error: " . $db->Error();

93 Processing Errors (II)
The Error() method returns either an array or a string, depending on the context of the call.
In an array context:
    @Error = $db->Error();
  @Error now contains (1) ODBC error number, (2) tagged text, (3) connection number, (4) SQL state.
In a string context:
    $Error = $db->Error();
  $Error now contains “[ErrorNo] [Connection] [SQLState] [Text]”.
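
Context-dependent return values are a general Perl feature, available to any subroutine through wantarray. The sketch below only imitates the behaviour described above; it is not how Win32::ODBC is implemented, and the subroutine name `last_error` and the error values are made up.

```perl
use strict;
use warnings;

# Return a list in array context and a formatted string in scalar context.
sub last_error {
    # Made-up sample values: error number, text, connection, SQL state.
    my @fields = (911, 'Sample error text', 1, 'S1000');
    return wantarray
        ? @fields
        : "[$fields[0]] [$fields[2]] [$fields[3]] [$fields[1]]";
}

my @err = last_error();    # array context: four separate fields
my $err = last_error();    # scalar context: one formatted string

print "fields: ", scalar(@err), "\n";
print "string: $err\n";
```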

94 Common Gotchas (I)
Escape the apostrophe ('):
Wrong:
    $sql = "select * from t1 where NAME = 'Roger's fish'";
Right (double the apostrophe):
    $sql = "select * from t1 where NAME = 'Roger''s fish'";
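
When the value comes from user input, the doubling can be done by a small helper rather than by hand. The helper name `sql_quote` is an assumption for this sketch; it covers only the apostrophe case shown on this slide.

```perl
use strict;
use warnings;

# Wrap a value in single quotes, doubling any apostrophes inside it.
sub sql_quote {
    my ($s) = @_;
    $s =~ s/'/''/g;
    return "'$s'";
}

my $name = "Roger's fish";
my $sql  = "select * from t1 where NAME = " . sql_quote($name);
print "$sql\n";
```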

95 Common Gotchas (II)
Escape the vertical bar (|):
(This is much trickier …)
Wrong:
    $sql = "select * from t1 where NAME = 'P|Q'";
Right:
    $sql = "select * from t1 where NAME = 'P' & chr(124) & 'Q'";

96 Use with a CGI Script
Use a system DSN.
Give proper access to the database.
Give proper permissions on files.
Choose a suitable database:
  MS Access for low-hit pages
  MS SQL Server for industrial-strength setups

97 Applications of Perl+Database
News collecting
On-line shopping
WWW-based BBS
Erasable guestbooks
On-line exams

98 More Info on Win32::ODBC
Win32::ODBC homepage:
Win32::ODBC FAQ:
Roth Consulting:

99 What’s Missing? (I)
Data analysis and visualization (GD library):
  Bar chart
  Pie chart

100 What’s Missing? (II)
Dynamical visualization:
  Flowchart
  Dynamical rep.
Flowchart example: Start; Get new x; Evaluate y = f(x); y > 0? (yes/no); Stop

101 For More Information ...
The Perl Journal
The Perl Institute
The Apache/Perl Integration Project
The latest version of my Perl slides:
My CGI/Perl page:

