Programming for WWW (ICE 1338), Lecture #4
July 2, 2004
In-Young Ko (iko.AT.icu.ac.kr), Information and Communications University (ICU)


2. Announcements
- Our TA
  - Name: Mr. Trinh Minh Cuong
  - Email: minhcuong.AT.icu.ac.kr
  - Office: F641
  - Office hours: Tuesday 11 AM-12 PM, Thursday 2-4 PM
- Please send the instructor your team information
- Please send the instructor your information for creating a Unix account
- Submit your homework #1 (a URL or HTML source) by tomorrow

3. Review of the Previous Lecture
- Cascading Style Sheets
- Web-based Information Integration
  - Examples
  - Information Mediators
  - Information Wrappers (Web Wrappers)

4. Contents of Today's Lecture
- Basic UNIX Commands
- More on Web-based Information Integration
- JavaScript

5. UNIX Operating System
- A multi-user, multitasking operating system
- Developed by Ken Thompson and Dennis Ritchie at Bell Labs in the early 1970s
- Success factors of UNIX
  - Written in a high-level language (C), which improves readability and portability
  - Provides primitives (system calls) that permit complex programs to be built efficiently
  - A hierarchical file system, which makes maintenance easy
  - Hides the machine architecture from the user, allowing programs to run on different machines
- http://www.unix-systems.org/

6. Architecture of UNIX Systems
[Figure: concentric layers with the Hardware at the core, the Kernel around it, system programs (sh, who, a.out, date, wc, grep, ed, vi, ld, as, comp, cpp, nroff) around the kernel, and an outer layer of cc and other application programs]

7. Basic UNIX Shell Commands
cd - Changes to the named directory
pwd - Displays the current working directory
ls - Lists the contents of the current directory
ls -l - Same as above, but lists more information about each entry
mkdir - Makes a directory
rmdir - Removes a directory
cat - Concatenates files or shows a file's contents
cp - Copies a file
mv - Renames or moves a file to a different name or directory
rm - Removes a file
logout - Terminates a UNIX shell session
man - Accesses manual pages
http://infohost.nmt.edu/tcc/help/unix/unix_cmd.html

8. Publishing Web Pages on the Server
- Copy your files to the 'public_html' directory under your home directory on the server
- Use FTP to copy your files from a local directory to the server directory:

    ftp vega.icu.ac.kr
    (log in with your user ID)
    cd public_html
    lcd d:\myweb
    put index.html        (or: mput *.html)
    quit

- Your homepage is then accessible at http://vega.icu.ac.kr/~yourid

9. Connections Between Web Clients and Servers
[Figure: a Web browser and a Web server. The server listens on port 80 and accepts connections; the browser connects and writes a request; the server reads and processes it and returns a response, which the browser reads]
A Web server is a daemon process that executes in the background, waiting for some event to occur.

10. Sockets
[Figure: the browser/server connection diagram (Listen 80, Accept, Connect, Write, Read, Process, Return), with a socket at each end of the connection]
- A socket is an end point for communication between two machines
- A socket is an association of a protocol, an address, and a process with an end point of communication
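The Listen/Accept/Connect/Write/Read/Process/Return steps can be sketched with `java.net.ServerSocket` and `java.net.Socket`. This is a minimal sketch under stated assumptions: both ends run in one process, the server binds the loopback address on a port chosen by the OS (port 0) rather than port 80, and the one-line request/reply "protocol" and the helper names (`serveOnce`, `request`, `roundTrip`) are hypothetical, not HTTP.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SocketRoundTrip {

    // Server end: Accept one connection, Read the request line, Process it, Return a reply.
    static void serveOnce(ServerSocket listener) throws Exception {
        try (Socket conn = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(conn.getInputStream(), StandardCharsets.US_ASCII));
             PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
            String request = in.readLine();
            out.println("You asked for: " + request);  // autoflush sends the reply immediately
        }
    }

    // Client end: Connect, Write a request line, Read the one-line reply.
    static String request(InetAddress host, int port, String requestLine) throws Exception {
        try (Socket sk = new Socket(host, port);
             PrintWriter out = new PrintWriter(sk.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(sk.getInputStream(), StandardCharsets.US_ASCII))) {
            out.println(requestLine);
            return in.readLine();
        }
    }

    // Runs the server in a background thread and the client in the calling thread.
    static String roundTrip(String requestLine) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; binding port 80 usually needs admin rights.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {  // Listen
            Thread server = new Thread(() -> {
                try {
                    serveOnce(listener);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            server.start();
            String reply = request(loopback, listener.getLocalPort(), requestLine);
            server.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("GET /index.html"));  // You asked for: GET /index.html
    }
}
```

A real Web server would loop on `accept()` and hand each connection to a worker, which is what keeps the daemon process "waiting for some event".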

11. Accessing Web Contents from Java Programs via Sockets

    import java.net.*;
    import java.io.*;
    …
    // Socket creation: connect to port 80 of the Web server
    Socket sk = new Socket("www.icu.ac.kr", 80);
    // Write request: request line and Host header, ended by a blank line (HTTP uses CRLF)
    OutputStream os = sk.getOutputStream();
    PrintWriter pw = new PrintWriter(os);
    pw.print("GET /index.html HTTP/1.0\r\n");
    pw.print("Host: www.icu.ac.kr\r\n");
    pw.print("\r\n");
    pw.flush();
    // Read results: status line, headers, and page come back on the same socket
    InputStream is = sk.getInputStream();
    InputStreamReader ips = new InputStreamReader(is);
    BufferedReader in = new BufferedReader(ips);
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
    sk.close();

12. Accessing Web Contents from Java Programs via URL Connections

    import java.net.*;
    import java.io.*;
    …
    // URL object creation
    URL url = new URL("http://www.icu.ac.kr");
    // URL connection creation (the connection is opened when the input stream is requested)
    URLConnection urlc = url.openConnection();
    // Read results: only the page body is returned; the protocol handler consumes the headers
    InputStream is = urlc.getInputStream();
    InputStreamReader ips = new InputStreamReader(is);
    BufferedReader in = new BufferedReader(ips);
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
    in.close();
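Compared with the socket version, `URLConnection` chooses a protocol handler from the URL's scheme and formats the request itself, so the reading loop does not depend on the protocol. The sketch below wraps that loop in a hypothetical `readLines` helper. So that it runs offline, it reads a temporary local file through a `file:` URL instead of a Web server; that substitution is an assumption of the sketch, not part of the slide.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class UrlReader {

    // Reads every line behind a URL; the URL scheme (http:, file:, ...) selects the protocol handler.
    static List<String> readLines(URL url) throws IOException {
        URLConnection urlc = url.openConnection();
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(urlc.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A local file stands in for a Web server so the sketch runs without a network.
        Path page = Files.createTempFile("page", ".html");
        Files.write(page, List.of("<html>", "<body>Hello</body>", "</html>"), StandardCharsets.UTF_8);
        readLines(page.toUri().toURL()).forEach(System.out::println);
        Files.delete(page);
    }
}
```

Passing `new URL("http://www.icu.ac.kr")` to the same helper would fetch the page over HTTP with no change to the loop.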

13. Java String Manipulation Methods for Result Parsing

    int indexOf(String str, int fromIndex)
    int lastIndexOf(String str, int fromIndex)
    boolean startsWith(String prefix)
    boolean endsWith(String suffix)
    boolean matches(String regex)
    String[] split(String regex)
    String substring(int beginIndex, int endIndex)
    String toLowerCase()
    String toUpperCase()

http://java.sun.com/j2se/1.4.2/docs/api/index.html
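The methods above can be exercised on a single result line. In this sketch the line is made up and the helpers `extractUrl`, `extractTitle`, and `host` are hypothetical; they assume the line holds exactly one link of the form `<a href="...">...</a>`.

```java
class StringMethodsDemo {

    // URL = text between the first `href="` and the next double quote (indexOf with fromIndex).
    static String extractUrl(String line) {
        String prefix = "href=\"";
        int start = line.indexOf(prefix, 0);
        if (start < 0) {
            return null;
        }
        start += prefix.length();
        int end = line.indexOf("\"", start);
        return end < 0 ? null : line.substring(start, end);
    }

    // Title = text between the first '>' and the last '<' (lastIndexOf searching back from the end).
    static String extractTitle(String line) {
        int start = line.indexOf(">", 0) + 1;
        int end = line.lastIndexOf("<", line.length() - 1);
        return line.substring(start, end);
    }

    // Host name = third piece of "http://host/..." when split on '/' ("http:", "", host, ...).
    static String host(String url) {
        return url.split("/")[2];
    }

    public static void main(String[] args) {
        // A made-up result line, not taken from a real result page.
        String line = "<a href=\"http://www.icu.ac.kr/\">ICU Home</a>";
        String url = extractUrl(line);
        System.out.println(url);                                  // http://www.icu.ac.kr/
        System.out.println(extractTitle(line));                   // ICU Home
        System.out.println(host(url).toUpperCase());              // WWW.ICU.AC.KR
        System.out.println(line.toLowerCase().startsWith("<a"));  // true
        System.out.println(line.endsWith("</a>"));                // true
        System.out.println(url.matches("http://.*"));             // true
    }
}
```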

14. Web Wrapper for Naver.com
[Figure: a Naver.com search result page with the Title, URL, and Summary of a result item labeled]

15. Result Parsing Strategies
- Structure-based parsing
  - Analyzes Web pages based on their tag hierarchies
  - Cannot be used for ill-formed HTML documents
- Pattern-based parsing
  - Searches for a unique string pattern to locate each result item
  - Requires identifying such unique string patterns first

16. Structure-based Result Parsing
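Structure-based parsing can be sketched with the standard XML DOM parser (`javax.xml.parsers`), which builds the tag tree and lets the code walk it. The assumptions here: the result page is well-formed XHTML, each result item is an `<li>` holding an `<a>` (URL and title) and a `<p>` (summary), and the sample markup is hypothetical. The `isWellFormed` check shows the limitation named on the previous slide: an unclosed tag that browsers tolerate makes the parser reject the whole page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class StructureParser {

    // Builds a DOM tree; throws SAXException if the markup is not well-formed.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // Walks the hierarchy: each <li> is one result; its <a> gives URL and title, its <p> the summary.
    static List<String> results(String xhtml) throws Exception {
        NodeList items = parse(xhtml).getElementsByTagName("li");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            Element link = (Element) item.getElementsByTagName("a").item(0);
            Element summary = (Element) item.getElementsByTagName("p").item(0);
            out.add(link.getAttribute("href") + " | " + link.getTextContent()
                    + " | " + summary.getTextContent());
        }
        return out;
    }

    // True if the XML parser accepts the markup (the parser may also log the error to stderr).
    static boolean isWellFormed(String markup) {
        try {
            parse(markup);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, well-formed result page; the layout is an assumption.
        String page = "<html><body><ul>"
                + "<li><a href=\"http://a.example/\">Title A</a><p>Summary A</p></li>"
                + "<li><a href=\"http://b.example/\">Title B</a><p>Summary B</p></li>"
                + "</ul></body></html>";
        results(page).forEach(System.out::println);
        // An unclosed <p> is common in real HTML but is fatal for the XML parser.
        System.out.println(isWellFormed("<html><body><p>Unclosed</body></html>"));  // false
    }
}
```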

17. Pattern-based Result Parsing
1. Find a unique pattern that locates a result item
   e.g., "<font" in the Naver result pages
2. Find the prefix and suffix patterns that extract an information piece (e.g., URL, title, summary) from the result item
   e.g., "a href=" to extract a URL from a result line
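The two steps above can be sketched as a line filter followed by prefix/suffix extraction. The marker `<font` and the `a href=` prefix follow the slide; the other patterns (`"` as URL suffix, `">` and `</a>` around the title), the helper names, and the sample page lines are assumptions made for illustration. Real result pages differ and change over time, which is why the patterns have to be identified first.

```java
import java.util.ArrayList;
import java.util.List;

class PatternParser {

    // Step 1 pattern: a marker that appears on every result-item line (from the slide's example).
    static final String ITEM_MARKER = "<font";
    // Step 2 patterns: prefix/suffix pairs around each piece (assumed for this sketch).
    static final String URL_PREFIX = "a href=\"";
    static final String URL_SUFFIX = "\"";
    static final String TITLE_PREFIX = "\">";
    static final String TITLE_SUFFIX = "</a>";

    // Returns the text between prefix and the next suffix, or null if either is missing.
    static String between(String text, String prefix, String suffix) {
        int start = text.indexOf(prefix);
        if (start < 0) {
            return null;
        }
        start += prefix.length();
        int end = text.indexOf(suffix, start);
        return end < 0 ? null : text.substring(start, end);
    }

    // Keeps only lines containing the item marker, then cuts out URL and title.
    static List<String> parse(List<String> pageLines) {
        List<String> results = new ArrayList<>();
        for (String line : pageLines) {
            if (!line.contains(ITEM_MARKER)) {
                continue;
            }
            String url = between(line, URL_PREFIX, URL_SUFFIX);
            String title = between(line, TITLE_PREFIX, TITLE_SUFFIX);
            if (url != null && title != null) {
                results.add(url + " | " + title);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up page lines; the sponsored link has no marker and is skipped.
        List<String> page = List.of(
                "<html><body>",
                "<font size=2><a href=\"http://a.example/\">Title A</a></font>",
                "<p>Sponsored: <a href=\"http://ads.example/\">Ad</a></p>",
                "<font size=2><a href=\"http://b.example/\">Title B</a></font>",
                "</body></html>");
        parse(page).forEach(System.out::println);
    }
}
```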

18. Java Implementation of Web Wrapper

    public void WebWrapper(String host, String path, String query,
                           int startIndex, int pageSize) {
        try {
            // Query translation: build the Naver search URL from the parameters
            String address = "http://" + host + path + "?where=webkr"
                + "&query=" + URLEncoder.encode(query, "UTF-8")  // query charset is an assumption
                + "&start=" + startIndex + "1"  // appends the digit 1, e.g. startIndex 2 gives start=21
                + "&display=" + pageSize;
            URL url = new URL(address);
            URLConnection urlc = url.openConnection();
            urlc.setRequestProperty("Accept", "*/*");
            urlc.setRequestProperty("User-Agent", "Mozilla/4.0");
            InputStream is = urlc.getInputStream();
            InputStreamReader ips = new InputStreamReader(is);
            BufferedReader in = new BufferedReader(ips);
            String line;
            while ((line = in.readLine()) != null) {
                // Parsing results: extract URL, title, and summary from each line here
                //System.out.println(line);
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

19. Web Robots
- A Web robot is a program (agent) that collects information by following the links on Web pages
- Web robots = crawlers = spiders
- Web search engines use Web robots to collect and index Web documents
- A tag that tells Web robots not to index a page: <meta name="robots" content="noindex">
- Crawling methods:
  - Breadth-first crawling
  - Depth-first crawling

20. Breadth-First Crawlers
http://ibook.ics.uci.edu/Slides/39

21. Depth-First Crawlers
http://ibook.ics.uci.edu/Slides/39
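The two crawling orders differ only in how the frontier of unvisited links is managed: breadth-first uses a FIFO queue, depth-first follows each link as deep as possible before backtracking (here via recursion). In this sketch the link graph is a hypothetical in-memory map from page to outgoing links; a real crawler would fetch each page and extract its links instead, and would also respect robot exclusion rules.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrawlOrder {

    // Breadth-first: visit every page one link away before any page two links away.
    static List<String> breadthFirst(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            order.add(page);
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) {  // add() returns false for pages already queued or visited
                    queue.add(next);
                }
            }
        }
        return order;
    }

    // Depth-first: follow the first unvisited link as deep as possible, then backtrack.
    static List<String> depthFirst(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        visit(links, start, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> links, String page,
                              Set<String> seen, List<String> order) {
        if (!seen.add(page)) {
            return;
        }
        order.add(page);
        for (String next : links.getOrDefault(page, List.of())) {
            visit(links, next, seen, order);
        }
    }

    public static void main(String[] args) {
        // Hypothetical site: A links to B and C, B links to D, C links to E.
        Map<String, List<String>> links = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("D"),
                "C", List.of("E"));
        System.out.println(breadthFirst(links, "A"));  // [A, B, C, D, E]
        System.out.println(depthFirst(links, "A"));    // [A, B, D, C, E]
    }
}
```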

22. Web-based Information Management Applications (Example Scenario)
Goal: identify recurring disaster areas in China, e.g., locations of floods
Input: a Web document collection about 'China disasters'
- For each map layer displayed, get the set of place names and classify the documents based on the place names
- Classify the documents based on the disaster types mentioned
- Take the cross product of the place-name categories and the disaster-type categories
- Plot the document clusters on the map to find the major flooding areas
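The cross-product step in the scenario above amounts to counting, for every (place, disaster type) pair, the documents that fall into both categories. In this sketch, case-insensitive substring matching stands in for the place-name extractor and the disaster-type classifier (an assumption; the slides do not say how documents are classified), and the sample snippets are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DisasterCrossProduct {

    // For every (place, type) pair, counts documents that mention both; keys look like "place/type".
    static Map<String, Integer> crossProduct(List<String> docs, List<String> places, List<String> types) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String place : places) {
            for (String type : types) {
                int n = 0;
                for (String doc : docs) {
                    String text = doc.toLowerCase();
                    if (text.contains(place.toLowerCase()) && text.contains(type.toLowerCase())) {
                        n++;
                    }
                }
                counts.put(place + "/" + type, n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up snippets standing in for a collected Web document collection.
        List<String> docs = List.of(
                "Flood waters rose in Hunan province",
                "Earthquake reported near Sichuan",
                "Hunan flood relief continues");
        Map<String, Integer> counts = crossProduct(docs,
                List.of("Hunan", "Sichuan"), List.of("flood", "earthquake"));
        counts.forEach((cell, n) -> System.out.println(cell + ": " + n));
    }
}
```

Cells with high counts (here `Hunan/flood`) are the clusters that the final step would plot on the map.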

23. Web-based Information Management Applications (Example App. Design)
[Figure: components Keyword Editor, Keyword Extractor, Search Engines, Place Name Generator (generates multiple sets of place names), Place Name Extractor, Product Categories, and Mapping Clusters; legend: pipelined components, sequential connection, pipelined connection]

24. Problems in Composing Large-scale Information Management Applications
- Time-consuming to explore and test a large number of options
  - Hard to choose appropriate services for collections
  - Hard to quickly substitute and test a service within a sequence of steps
- Difficult to capture and reuse shared patterns of information management steps
  - Difficult to record and repeatedly perform information management steps
  - Need to extract abstract patterns of information management steps and reuse them
- Hard to cope with the dynamic aspects of Web resources

25. Characteristics of Large-scale Information Management Tasks
- Incremental development of information management steps toward an abstract task goal
- Repeated execution of the steps
- Evolving user requirements
- Shared patterns of management steps
- Collection-based information processing
- Dynamic information sources and services
- A large and growing number of component services

26. Improvement Goals: Rapid Composition and Reconfiguration of Large-scale Custom Applications
- Significantly reduce construction time while keeping costs low
- Enable very rapid construction and adaptation of new applications
- Provide static and run-time diagnostic tools that facilitate debugging and performance tuning

27. JavaScript
- The goal of JavaScript is to provide programming capability at both the client and server ends of a Web connection
- Originally developed by Netscape as LiveScript
- Became a joint venture of Netscape and Sun in 1995 and was renamed JavaScript
- Now standardized by the European Computer Manufacturers Association as ECMA-262 (also ISO/IEC 16262)
- User interactions with HTML documents in JavaScript follow the event-driven model of computation

28. Example: an HTML document (title "ICE1338") with an embedded style sheet and script; its body text reads "Programming for WWW" and "A Popup Window"

  Style sheet:
    <!--
    p { font-size: 12pt; color: blue; background-color: yellow }
    h2, h3 { font-size: 16pt; color: red; font-style: oblique }
    -->

  Script:
    function displayDate() {
      alert("Today's date is: " + new Date() + "!!");
    }

29. JavaScript vs. Java
- Both share similar syntax
- JavaScript is a scripting language rather than a general-purpose programming language like Java
- JavaScript is interpreted
- JavaScript is dynamically typed
- JavaScript does not support class-based inheritance (it uses prototype-based inheritance)
- JavaScript programs (scripts) are usually embedded in HTML documents

30. General Syntax of JavaScript
- Direct embedding of JavaScript code:
    <script type="text/javascript">
      -- JavaScript script --
    </script>
- Indirect JavaScript specification (the script is kept in a separate file):
    <script type="text/javascript" src="(URL of the script file)">
    </script>
- Identifier form: begins with a letter, underscore, or dollar sign, followed by any number of letters, underscores, dollar signs, and digits
- Case sensitive
- 25 reserved words, plus future reserved words
- Comments: both // and /* … */

31. Document Object Model
"A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents"

    <html><head>
      <title>My Document</title>
    </head><body>
      <h1>Header</h1>
      <p>Paragraph</p>
    </body></html>

    // Replace the text of the first H1 element
    var header = document.getElementsByTagName("H1").item(0);
    header.firstChild.data = "A dynamic document";

http://www.mozilla.org/docs/dom/technote/intro/
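Because the DOM is language-neutral, the same two operations (find the first `h1`, replace the data of its first child text node) are available in Java through the `org.w3c.dom` interfaces. This sketch assumes the page is well-formed XHTML so that the standard XML parser can build the tree (a browser builds it from HTML directly); the `retitle` helper is hypothetical. Note that XML element names are case-sensitive, so the lookup uses `"h1"` rather than `"H1"`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

class DomUpdate {

    static final String PAGE =
            "<html><head><title>My Document</title></head>"
            + "<body><h1>Header</h1><p>Paragraph</p></body></html>";

    // Parses the page, then sets the data of the first h1's first child (a Text node).
    static Document retitle(String xhtml, String newText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        Node header = doc.getElementsByTagName("h1").item(0);
        ((Text) header.getFirstChild()).setData(newText);  // Java counterpart of firstChild.data = ...
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = retitle(PAGE, "A dynamic document");
        System.out.println(doc.getElementsByTagName("h1").item(0).getTextContent());  // A dynamic document
    }
}
```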

32. DOM Specification
http://www.w3.org/TR/DOM-Level-2-HTML/html.html

33. Screen Outputs
- The model for the browser display window is the Window object
- Properties:
  - window.document
  - window.screenLeft
  - window.screenTop
  - …
- Methods:
  - alert
  - confirm
  - prompt
- http://devedge.netscape.com/central/javascript/

