Presentation is loading. Please wait.

Presentation is loading. Please wait.

Bent Thomsen Department of Computer Science Aalborg University

Similar presentations


Presentation on theme: "Bent Thomsen Department of Computer Science Aalborg University"— Presentation transcript:

1 Bent Thomsen Department of Computer Science Aalborg University
Languages and Compilers (SProg og Oversættere) Concurrency and distribution Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to John Mitchell whose slides this lecture is based on.

2 Concurrency, distributed computing, the Internet
Traditional view: Let the OS deal with this => It is not a programming language issue! End of Lecture Wait-a-minute … Maybe “the traditional view” is getting out of date?

3 Languages with concurrency constructs
Maybe the “traditional view” was always out of date? Simula Modula3 Occam Concurrent Pascal ADA Linda CML Facile Jo-Caml Java C#

4 Categories of Concurrency:
Physical concurrency - Multiple independent processors ( multiple threads of control) Uni-processor with I/O channels (multi-programming) Multiple CPU (parallel programming) Network of uni- or multi- CPU machines (distributed programming) Logical concurrency - The appearance of physical concurrency is presented by time-sharing one processor (software can be designed as if there were multiple threads of control) Concurrency as a programming abstraction Def: A thread of control in a program is the sequence of program points reached as control flows through the program

5 Introduction Reasons to Study Concurrency
It involves a different way of designing software that can be very useful—many real-world situations involve concurrency Control programs Simulations Client/Servers Mobile computing Games 2. Computers capable of physical concurrency are now widely used High-end servers Game consoles Grid computing

6 The promise of concurrency
Speed If a task takes time t on one processor, shouldn’t it take time t/n on n processors? Availability If one process is busy, another may be ready to help Distribution Processors in different locations can collaborate to solve a problem or work together Humans do it so why can’t computers? Vision, cognition appear to be highly parallel activities

7 Challenges Concurrent programs are harder to get right
Folklore: Need an order of magnitude speedup (or more) to be worth the effort Some problems are inherently sequential Theory – circuit evaluation is P-complete Practice – many problems need coordination and communication among sub-problems Specific issues Communication – send or receive information Synchronization – wait for another process to act Atomicity – do not stop in the middle and leave a mess

8 Why is concurrent programming hard?
Nondeterminism Deterministic: two executions on the same input it always produce the same output Nondeterministic: two executions on the same input may produce different output Why does this cause difficulty? May be many possible executions of one system Hard to think of all the possibilities Hard to test program since some may occur infrequently

9 Traditional C Library for concurrency
System Calls - fork( ) - wait( ) - pipe( ) - write( ) - read( ) Examples

10 Process Creation Fork( ) NAME fork() – create a new process SYNOPSIS
# include <sys/types.h> # include <unistd.h> pid_t fork(void) RETURN VALUE success parent- child pid child- 0 failure -1

11 Fork()- program structure
#include <sys/types.h> #include <unistd.h> #include <stdio.h> Main() { pid_t pid; if((pid = fork())>0){ /* parent */ } else if ((pid==0){ /*child*/ else { /* cannot fork* exit(0);

12 Wait() system call Wait()- wait for the process whose pid reference is passed to finish executing SYNOPSIS #include<sys/types.h> #include<sys/wait.h> pid_t wait(int *stat)loc) The unsigned decimal integer process ID for which to wait RETURN VALUE success- child pid failure- -1 and errno is set

13 Wait()- program structure
#include <sys/types.h> #include <unistd.h> #include <stdlib.h> #include <stdio.h> Main(int argc, char* argv[]) { pid_t childPID; if((childPID = fork())==0){ /*child*/ } else { /* parent* wait(0); exit(0);

14 Pipe() system call Pipe()- to create a read-write pipe that may later be used to communicate with a process we’ll fork off. SYNOPSIS Int pipe(pfd) int pfd[2]; PARAMETER Pfd is an array of 2 integers, which that will be used to save the two file descriptors used to access the pipe RETURN VALUE: 0 – success; -1 – error.

15 Pipe() - structure /* first, define an array to store the two file descriptors*/ Int pipes[2]; /* now, create the pipe*/ int rc = pipe (pipes); if(rc = = -1) { /* pipe() failed*/ Perror(“pipe”); exit(1); } If the call to pipe() succeeded, a pipe will be created, pipes[0] will contain the number of its read file descriptor, and pipes[1] will contain the number of its write file descriptor.

16 Write() system call Write() – used to write data to a file or other object identified by a file descriptor. SYNOPSIS #include <sys/types.h> Size_t write(int fildes, const void * buf, size_t nbyte); PARAMETER fildes is the file descriptor, buf is the base address of area of memory that data is copied from, nbyte is the amount of data to copy RETURN VALUE The return value is the actual amount of data written, if this differs from nbyte then something has gone wrong

17 Read() system call Read() – read data from a file or other object identified by a file descriptor SYNOPSIS #include <sys/types.h> Size_t read(int fildes, void *buf, size_t nbyte); ARGUMENT fildes is the file descriptor, buf is the base address of the memory area into which the data is read, nbyte is the maximum amount of data to read. RETURN VALUE The actual amount of data read from the file. The pointer is incremented by the amount of data read.

18 Solaris 2 Synchronization
Implements a variety of locks to support multitasking, multithreading (including real-time threads), and multiprocessing. Uses adaptive mutexes for efficiency when protecting data from short code segments. Uses condition variables and readers-writers locks when longer sections of code need access to data. Uses turnstiles to order the list of threads waiting to acquire either an adaptive mutex or reader-writer lock.

19 Windows 2000 Synchronization
Uses interrupt masks to protect access to global resources on uniprocessor systems. Uses spinlocks on multiprocessor systems. Also provides dispatcher objects which may act as wither mutexes and semaphores. Dispatcher objects may also provide events. An event acts much like a condition variable.

20 Basic question Maybe the library approach is not such a good idea?
How can programming languages make concurrent and distributed programming easier?

21 Language support for concurrency
Help promote good software engineering Allowing the programmer to express solutions more closely to the problem domain No need to juggle several programming models (Hardware, OS, library, …) Make invariants and intentions more apparent (part of the interface and/or type system) Allows the compiler much more freedom to choose different implementations Base the programming language constructs on a well-understood formal model => formal reasoning may be less hard and the use tools may be possible

22 What could languages provide?
Abstract model of system abstract machine => abstract system Example high-level constructs Communication abstractions Synchronous communication Buffered asynchronous channels that preserve msg order Mutual exclusion, atomicity primitives Most concurrent languages provide some form of locking Atomicity is more complicated, less commonly provided Process as the value of an expression Pass processes to functions Create processes at the result of function call

23 Basic issue: conflict between processes
Critical section Two processes may access shared resource Inconsistent behavior if two actions are interleaved Allow only one process in critical section Deadlock Process may hold some locks while awaiting others Deadlock occurs when no process can proceed

24 Concurrency Def: A task is disjoint if it does not communicate with or affect the execution of any other task in the program in any way Task communication is necessary for synchronization Task communication can be through: 1. Shared nonlocal variables 2. Parameters 3. Message passing

25 Synchronization Kinds of synchronization: 1. Cooperation
Task A must wait for task B to complete some specific activity before task A can continue its execution e.g., the producer-consumer problem 2. Competition When two or more tasks must use some resource that cannot be simultaneously used e.g., a shared counter Competition is usually provided by mutually exclusive access (approaches are discussed later)

26 Design Issues for Concurrency:
How is cooperation synchronization provided? How is competition synchronization provided? How and when do tasks begin and end execution? Are tasks statically or dynamically created? Are there any syntactic constructs in the language? Are concurrency construct reflected in the type system?

27 Concurrent Pascal: cobegin/coend
Limited concurrency primitive Example x := 0; cobegin begin x := 1; x := x+1 end; begin x := 2; x := x+1 end; coend; print(x); execute sequential blocks in parallel x := 1 x := x+1 x := 0 print(x) x := 2 x := x+1 Atomicity at level of assignment statement

28 Mutual exclusion Sample action Problem with parallel execution
procedure sign_up(person) begin number := number + 1; list[number] := person; end; Problem with parallel execution cobegin sign_up(fred); sign_up(bill); bill fred bob fred

29 Need atomic operations to implement wait
Locks and Waiting <initialze concurrency control> cobegin begin <wait> sign_up(fred); // critical section <signal> end; sign_up(bill); // critical section Need atomic operations to implement wait

30 Mutual exclusion primitives
Atomic test-and-set Instruction atomically reads and writes some location Common hardware instruction Combine with busy-waiting loop to implement mutex Semaphore Avoid busy-waiting loop Keep queue of waiting processes Scheduler has access to semaphore; process sleeps Disable interrupts during semaphore operations OK since operations are short

31 Monitor Brinch-Hansen, Dahl, Dijkstra, Hoare
Synchronized access to private data. Combines: private data set of procedures (methods) synchronization policy At most one process may execute a monitor procedure at a time; this process is said to be in the monitor. If one process is in the monitor, any other process that calls a monitor procedure will be delayed. Modern terminology: synchronized object

32 Java Concurrency Threads Communication
Create process by creating thread object Communication shared variables method calls Mutual exclusion and synchronization Every object has a lock (inherited from class Object) synchronized methods and blocks Synchronization operations (inherited from class Object) wait : pause current thread until another thread calls notify notify : wake up waiting threads

33 Java Threads Thread Java thread objects
Set of instructions to be executed one at a time, in a specified order Java thread objects Object of class Thread Methods inherited from Thread: start : method called to spawn a new thread of control; causes VM to call run method suspend : freeze execution interrupt : freeze execution and throw exception to thread stop : forcibly cause thread to halt

34 Example subclass of Thread
class PrintMany extends Thread { private String msg; public PrintMany (String m) {msg = m;} public void run() { try { for (;;){ System.out.print(msg + “ “); sleep(10); } } catch (InterruptedException e) { return; } (inherits start from Thread)

35 Interaction between threads
Shared variables Two threads may assign/read the same variable Programmer responsibility Avoid race conditions by explicit synchronization!! Method calls Two threads may call methods on the same object Synchronization primitives Each object has internal lock, inherited from Object Synchronization primitives based on object locking

36 Synchronization example
Objects may have synchronized methods Can be used for mutual exclusion Two threads may share an object. If one calls a synchronized method, this locks object. If the other calls a synchronized method on same object, this thread blocks until object is unlocked.

37 Synchronized methods Marked by keyword Provides mutual exclusion
public synchronized void commitTransaction(…) {…} Provides mutual exclusion At most one synchronized method can be active Unsynchronized methods can still be called Programmer must be careful Not part of method signature sync method equivalent to unsync method with body consisting of a synchronized block subclass may replace a synchronized method with unsynchronized method

38 Join, another form of synchronization
Wait for thread to terminate class Future extends Thread { private int result; public void run() { result = f(…); } public int getResult() { return result;} } Future t = new future; t.start() // start new thread t.join(); x = t.getResult(); // wait and get result

39 Aspects of Java Threads
Portable since part of language Easier to use in basic libraries than C system calls Example: garbage collector is separate thread General difficulty combining serial/concur code Serial to concurrent Code for serial execution may not work in concurrent sys Concurrent to serial Code with synchronization may be inefficient in serial programs (10-20% unnecessary overhead) Abstract memory model Shared variables can be problematic on some implementations

40 C# Threads Basic thread operations
Any method can run in its own thread A thread is created by creating a Thread object Creating a thread does not start its concurrent execution; it must be requested through the Start method A thread can be made to wait for another thread to finish with Join A thread can be suspended with Sleep A thread can be terminated with Abort

41 C# Threads Synchronizing threads Evaluation The Interlock class
The lock statement The Monitor class Evaluation An advance over Java threads, e.g., any method can run its own thread Thread termination cleaner than in Java Synchronization is more sophisticated

42 Polyphonic C# An extension of the C# language with new concurrency constructs Based on the join calculus A foundational process calculus like the p-calculus but better suited to asynchronous, distributed systems A single model which works both for local concurrency (multiple threads on a single machine) distributed concurrency (asynchronous messaging over LAN or WAN) It is different But it’s also simple – if Mort can do any kind of concurrency, he can do this

43 In one slide: Objects have both synchronous and asynchronous methods.
Values are passed by ordinary method calls: If the method is synchronous, the caller blocks until the method returns some result (as usual). If the method is async, the call completes at once and returns void. A class defines a collection of chords (synchronization patterns), which define what happens once a particular set of methods have been invoked. One method may appear in several chords. When pending method calls match a pattern, its body runs. If there is no match, the invocations are queued up. If there are several matches, an unspecified pattern is selected. If a pattern containing only async methods fires, the body runs in a new thread.

44 Extending C# with chords
Classes can declare methods using generalized chord-declarations instead of method-declarations. chord-declaration ::= method-header [ & method-header ]* body method-header ::= attributes modifiers [return-type | async] name (parms) Interesting well-formedness conditions: At most one header can have a return type (i.e. be synchronous). Inheritance restriction. “ref” and “out” parameters cannot appear in async headers.

45 A Simple Buffer class Buffer { String get() & async put(String s) {
return s; } Calls to put() return immediately (but are internally queued if there’s no waiting get()). Calls to get() block until/unless there’s a matching put() When there’s a match the body runs, returning the argument of the put() to the caller of get(). Exactly which pairs of calls are matched up is unspecified.

46 OCCAM Program consists of processes and channels
Process is code containing channel operations Channel is a data object All synchronization is via channels Formal foundation based on CSP

47 Channel Operations in OCCAM
Read data item D from channel C D ? C Write data item Q to channel C Q ! C If reader accesses channel first, wait for writer, and then both proceed after transfer. If writer accesses channel first, wait for reader, and both proceed after transfer.

48 Concurrent ML Threads Communication Synchronization Atomicity
New type of entity Communication Synchronous channels Synchronization Channels Events Atomicity No specific language support

49 Threads Thread creation Example code Result
spawn : (unit  unit)  thread_id Example code CIO.print "begin parent\n"; spawn (fn () => (CIO.print "child 1\n";)); spawn (fn () => (CIO.print "child 2\n";)); CIO.print "end parent\n“ Result child 1 begin parent child 2 end parent

50 Channels Channel creation Communication Example Result
channel : unit  ‘a chan Communication recv : ‘a chan  ‘a send : ( ‘a chan * ‘a )  unit Example ch = channel(); spawn (fn()=> … <A> … send(ch,0); … <B> …); spawn (fn()=> … <C> … recv ch; … <D> …); Result <A> <B> send/recv <C> <D>

51 CML programming Functions Events Sample Application
Can write functions : channels  threads Build concurrent system by declaring channels and “wiring together” sets of threads Events Delayed action that can be used for synchronization Powerful concept for concurrent programming Sample Application eXene – concurrent uniprocessor window system

52 A CML implementation (simplified)
Use queues with side-effecting functions datatype 'a queue = Q of {front: 'a list ref, rear: 'a list ref} fun queueIns (Q(…)) = (* insert into queue *) fun queueRem (Q(…)) = (* remove from queue *) And continuations val enqueue = queueIns rdyQ fun dispatch () = throw (queueRem rdyQ) () fun spawn f = callcc (fn parent_k => ( enqueue parent_k; f (); dispatch())) Source: Appel, Reppy

53 Language issues in client/server programming
Communication mechanisms RPC, Remote Objects, SOAP Data representation languages XDR, ASN.1, XML Parsing and deparsing between internal and external representation Stub generation

54 Client/server example
A major task of most clients is to interact with a human user and a remote server. The basic organization of the X Window System

55 Client-Side Software for Distribution Transparency
A possible approach to transparent replication of a remote object using a client-side solution.

56 The Stub Generation Process
Compiler / Linker Server Program Server Stub Server Source Interface Specification Common Header RPC LIBRARY Stub Generator RPC LIBRARY Client Stub Client Source Client Program Compiler / Linker

57 RPC and the OSI Reference Model
Application Layer Presentation Layer (XDR) Session Layer (RPC) Transport Layer (UDP)

58 Representation Data must be represented in a meaningful format.
Methods: Sender or Receiver makes right (NDR). Network Data Representation (NDR). Transmit architecture tag with data. Represent data in a canonical (or standard) form XDR ASN.1 Note – these are languages, but traditional DS programmers don’t like programming languages, except C

59 XDR - eXternal Data Representation
XDR is a universally used standard from Sun Microsystems used to represent data in a network canonical (standard) form. A set of conversion functions are used to encode and decode data; for example, xdr_int( ) is used to encode and decode integers. Conversion functions exist for all standard data types Integers, chars, arrays, … For complex structures, RPCGEN can be used to generate conversion routines.

60 RPC Example client client.c date_proc.c date_svc date.x date_clnt.c
gcc RPC library -lnsl client client.c date_proc.c date_svc date.x RPCGEN date_clnt.c date_svc.c date_xdr.c date.h

61 XDR Example #include <rpc/xdr.h> ..
XDR sptr; // XDR stream pointer XDR *xdrs; // Pointer to XDR stream pointer char buf[BUFSIZE]; // Buffer to hold XDR data xdrs = (&sptr); xdrmem_create(xdrs, buf, BUFSIZE, XDR_ENCODE); int i = 256; xdr_int(xdrs, &i); printf(“position = %d. \n”, xdr_getpos(xdrs)); xdrs sptr buf

62 Abstract Syntax Notation 1 (ASN.1)
ASN.1 is a formal language that has two features: a notation used in documents that humans read a compact encoded representation of the same information used in communication protocols. ASN.1 uses a tagged message format: < tag (data type), data length, data value > Simple Network Management Protocol (SNMP) messages are encoded using ASN.1.

63 Distributed Objects CORBA Java RMI SOAP and XML

64 Distributed Objects Proxy and Skeleton in Remote Method Invocation
object A object B skeleton Request proxy for B Reply Communication Remote Remote reference module reference module for B’s class & dispatcher remote client server

65 CORBA Common Object Request Broker Architecture
An industry standard developed by OMG to help in distributed programming A specification for creating and using distributed objects A tool for enabling multi-language, multi-platform communication A CORBA based-system is a collection of objects that isolates the requestors of services (clients) from the providers of services (servers) by an encapsulating interface

66 CORBA objects They are different from typical programming objects in three ways: CORBA objects can run on any platform CORBA objects can be located anywhere on the network CORBA objects can be written in any language that has IDL mapping.

67 A request from a client to an Object implementation within a network
IDL IDL IDL IDL ORB ORB NETWORK A request from a client to an Object implementation within a network

68 IDL (Interface Definition Language)
CORBA objects have to be specified with interfaces (as with RMI) defined in a special definition language IDL. The IDL defines the types of objects by defining their interfaces and describes interfaces only, not implementations. From IDL definitions an object implementation tells its clients what operations are available and how they should be invoked. Some programming languages have IDL mapping (C, C++, SmallTalk, Java,Lisp)

69 IDL File IDL Compiler ORB Client Stub File Server Skeleton File Object
Implementation Client Implementation ORB

70 The IDL compiler It will accept as input an IDL file written using any text editor (fileName.idl) It generates the stub and the skeleton code in the target programming language (ex: Java stub and C++ skeleton) The stub is given to the client as a tool to describe the server functionality, the skeleton file is implemented at the server.

71 IDL Example module katytrail { module weather { struct WeatherData { float temp; string wind_direction_and_speed; float rain_expected; float humidity; }; typedef sequence<WeatherData> WeatherDataSeq interface WeatherInfo { WeatherData get_weather( in string site ); WeatherDataSeq find_by_temp( in float temperature

72 Both interfaces will have Object Implementations.
IDL Example Cont. interface WeatherCenter { register_weather_for_site ( in string site, in WeatherData site_data ); }; Both interfaces will have Object Implementations. A different type of Client will talk to each of the interfaces. The Object Implementations can be done in one of two ways. Through Inheritance or through a Tie.

73 Stubs and Skeletons In terms of CORBA development, the stubs and skeleton files are standard in terms of their target language. Each file exposes the same operations specified in the IDL file. Invoking an operation on the stub file will cause the method to be executed in the skeleton file The stub file allows the client to manipulate the remote object with the same ease with each a local file is manipulated

74 Java RMI Overview History Supports remote invocation of Java objects
Key: Java Object Serialization Stream objects over the wire Language specific History Goal: RPC for Java First release in JDK 1.0.2, used in Netscape 3.01 Full support in JDK 1.1, intended for applets JDK 1.2 added persistent reference, custom protocols, more support for user control.

75 Java RMI Advantages Disadvantages
True object-orientation: Objects as arguments and values Mobile behavior: Returned objects can execute on caller Integrated security Built-in concurrency (through Java threads) Disadvantages Java only Advertises support for non-Java But this is external to RMI – requires Java on both sides

76 Java RMI Components Base RMI classes Java compiler – javac
Extend these to get RMI functionality Java compiler – javac Recognizes RMI as integral part of language Interface compiler – rmic Generates stubs from class files RMI Registry – rmiregistry Directory service RMI Run-time activation system – rmid Supports activatable objects that run only on demand

77 RMI Implementation Client Host Server Host Java Virtual Machine Client
Object Java Virtual Machine Remote Object Stub Skeleton

78 Java RMI Object Serialization
Java can send object to be invoked at remote site Allows objects as arguments/results Mechanism: Object Serialization Object passed must inherit from serializable Provides methods to translate object to/from byte stream Security issues: Ensure object not tampered with during transmission Solution: Class-specific serialization Throw it on the programmer

79 Building a Java RMI Application
Define remote interface Extend java.rmi.Remote Create server code Implements interface Creates security manager, registers with registry Create client code Define object as instance of interface Lookup object in registry Call object Compile and run Run rmic on compiled classes to create stubs Start registry Run server then client

80 Parameter Passing Primitive types Remote objects Non-remote objects
call-by-value Remote objects call-by-reference Non-remote objects use Java Object Serialization

81 Java Serialization Writes object as a sequence of bytes
Writes it to a Stream Recreates it on the other end Creates a brand new object with the old data Objects can be transmitted using any byte stream (including sockets and TCP).

82 Codebase Property Stub classpaths can be confusing
3 VMs, each with its own classpath Server vs. Registry vs. Client The RMI class loader always loads stubs from the CLASSPATH first Next, it tries downloading classes from a web server (but only if a security manager is in force) java.rmi.server.codebase specifies which web server

83 CORBA vs. RMI CORBA was designed for language independence whereas RMI was designed for a single language where objects run in a homogeneous environment CORBA interfaces are defined in IDL, while RMI interfaces are defined in Java CORBA objects are not garbage collected because they are language independent and they have to be consistent with languages that do not support garbage collection, on the other hand RMI objects are garbage collected automatically

84 SOAP Introduction SOAP is simple, light weight and text based protocol
SOAP is XML based protocol (XML encoding) SOAP is remote procedure call protocol, not object oriented completely SOAP can be wired with any protocol SOAP is a simple lightweight protocol with minimum set of rules for invoking remote services using XML data representation and HTTP wire. Main goal of SOAP protocol – Interoperability SOAP does not specify any advanced distributed services.

85 Why SOAP – What’s wrong with existing distributed technologies
Platform and vendor dependent solutions (DCOM – Windows) (CORBA – ORB vendors) (RMI – Java) Different data representation schemes (CDR – NDR) Complex client side deployment Difficulties with firewall Firewalls allows only specific ports ( port 80 ), but DCOM and CORBA assigns port numbers dynamically. In short, these distributed technologies do not communicate easily with each other because of lack of standards between them.

86 Base Technologies – HTTP and XML
SOAP uses the existing technologies, invents no new technology. XML and HTTP are accepted and deployed in all platforms. Hypertext Transfer Protocol (HTTP) HTTP is very simple and text-based protocol. HTTP layers request/response communication over TCP/IP. HTTP supports fixed set of methods like GET, POST. Client / Server interaction Client requests to open connection to server on default port number Server accepts connection Client sends a request message to the Server Server process the request Server sends a reply message to the client Connection is closed HTTP servers are scalable, reliable and easy to administer. SOAP can be bind any protocol – HTTP , SMTP, FTP

87 Extensible Markup Language (XML)
XML is platform neutral data representation protocol. HTML combines data and representation, but XML contains just structured data. XML contains no fixed set of tags and users can build their own customized tags. <student> <full_name>Bhavin Parikh</full_name> </student> XML is platform and language independent. XML is text-based and easy to handle and it can be easily extended.

88 Architecture diagram

89 Parsing XML Documents Remember: XML is just text
Simple API for XML (SAX) Parsing SAX is typically most efficient No Memory Implementation! Left to the Developer Document Object Model (DOM) Parsing “Parsing” is not fundamental emphasis. A “DOM Object” is a representation of the XML document in a binary tree format. One thing important to remember; XML is simple text, delimited by a tag language that follows standard rules. In the early days of developing the SAX API (a community effort managed from the XML-DEV mailing list), the common stated goal was to create an API that could be implemented by a “Desperate Perl Hacker” in a weekend. While the goal likely fell far short of that (ask the developers of Xerces Perl!), simple text processing forms the basis of all XML parsing and handling techniques. (As an aside, one of the advantages of using XML over other data formats, especially formats that are designed to be system-portable, is the elimination of the “newline” problem. Anyone who has dealt with files on a UNIX system created by a Windows text editor knows that Windows text processing follows the “newline, carriage return” format “/r/n”, while UNIX (and now, Macintosh) systems use the simple newline character: “\n”. This has lead to hours upon hours of wasted developer time.) SAX is typically the most efficient method of processing XML, in terms of machine and memory utilization. The advantage gained here is largely due to the fact that in-memory data representation is left to developers, and can be tuned to whatever the task is at hand. SAX parsing is likely the best choice for large datasets, and applications where memory is a limited asset. (Such as mobile devices!) Example: SaxParseExample DOM parsing methods are simpler to implement, as the parsing steps are left to the implementation. In DOM processing, parsing is not the fundamental emphasis; rather, the parser creates a memory representation of a document that generically represents an XML document. DOM methods are best suited to typical XML development; relatively small datasets where parsing performance is not the largest ‘hot spot’ in the program. (A good example: using XML documents as data formats, or for ‘servlet configuration’ documents.)

90 Parsing: Examples SaxParseExample DomParseExample
“Callback” functions to process Nodes DomParseExample Use of JAXP (Java API for XML Parsing) Implementations can be ‘swapped’, such as replacing Apache Xerces with Sun Crimson. JAXP does not include some ‘advanced’ features that may be useful. SAX used behind the scenes to create object model

91 Web-based applications today
Presentation: HTML, CSS, Javascript, Flash, Java applets, ActiveX controls Application server Web server Content management system Business logic: C#, Java, VB, PHP, Perl, Python,Ruby … Beans, servlets, CGI, ASP.NET,… Operating System Database: SQL File system Sockets, HTTP, , SMS, XML, SOAP, REST, Rails, reliable messaging, AJAX, … Replication, distribution, load-balancing, security, concurrency

92 Languages for distributed computing
Motivation Why all the fuss about language and platform independence? It is extremely inefficient to parse/deparse to/from external/internal representation 95% of all computers run Windows anyway There is a JVM for almost any processor you can think of Few programmers master more than one programming language anyway Develop a coherent programming models for all aspects of an application

93 Facile Programming Language
Integration of Multiple Paradigms Functions Types/complex data types Concurrency Distribution/soft real-time Dynamic connectivity Implemented as extension to SML Syntax for concurrency similar to CML

94

95 Facile implementation
Pre-emptive scheduler implemented at the lowest level Exploiting CPS translation => state characterised by the set of registers Garbage collector used for linearizing data structures Lambda level code used as intermediate language when shipping data (including code) in heterogeneous networks Native representation is shipped when possible i.e. same architecture and within same trust domain Possibility to mix between interpretation or JIT depending on usage

96 Conclusion Concurrency may be an order of magnitude more difficult to handle Programming language support for concurrency may help make the task easier Which concurrency constructs to add to the language is still a very active research area If you add concurrency construct, be sure you base them on a formal model!

97 The guiding principle Put important features in the language itself, rather than in libraries Provide better level of abstraction Make invariants and intentions more apparent (part of the interface) Give stronger compile-time guarantees (types) Enable different implementations and optimizations Expose structure for other tools to exploit (e.g. static analysis)


Download ppt "Bent Thomsen Department of Computer Science Aalborg University"

Similar presentations


Ads by Google