1 The Whirlwind Tour Chapter 1a

2 2 Transactions: Where It All Started [Cuneiform] documents now number about half a million, three-quarters of them more or less directly related to the history of law - dealing, as they do, with contracts, acknowledgment of debts, receipts, inventories, and accounts, as well as containing records and minutes of judgments rendered in courts, business letters, administrative and diplomatic correspondence, laws, international treaties, and other official transactions. The total evidence enables the historian to reach back as far as the beginnings of writing, to the dawn of history. [...] Moreover, because of the inconvenience of writing in stone or clay, Mesopotamians wrote only when economic or political necessity demanded it. (Encyclopaedia Britannica, 1974 edition)

3 3 From Transactions to Transaction Processing Systems - I
The Sumerian way of doing business involved two components:
• Database. An abstract system state, represented as marks on clay tablets, was maintained. Today, we would call this the database.
• Transactions. Scribes recorded state changes with new records (clay tablets) in the database. Today, we would call these state changes transactions.

4 4 From Transactions to Transaction Processing Systems - II The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.

5 5 Transactions Are In... Communications: Each time you make a phone call, there is a call setup transaction that allocates some resources to your conversation; the call teardown is a second transaction, freeing those resources. The call setup increasingly involves complex algorithms to find the callee (800 numbers could be anywhere in the world) and to decide who is to be billed (800 and 900 numbers have complex billing). The system must deal with features like call forwarding, call waiting, and voice mail. After the call teardown, billing may involve many phone companies.

6 6 Transactions Are In... Finance: Each time you purchase gas using a credit card, the point-of-sale terminal connects to the credit card company's computer. In case that fails, it may alternatively try to debit the amount to your account by connecting to your bank. This generalizes to all kinds of point-of-sale terminals such as cash registers, ATMs, etc. When banks balance their accounts with each other (electronic fund transfer), they use transactions for reliability and recoverability.

7 7 Transactions Are In... Travel: Making reservations for a trip requires many related bookings and ticket purchases from airlines, hotels, rental car companies, and so on. From the perspective of the customer, the whole trip package is one purchase. From the perspective of the multiple systems involved, many transactions are executed: one per airline reservation (at least), one for each hotel reservation, one for each car rental, one for each ticket to be printed, one for setting up the bill, etc. Along the way, each inquiry that may not have resulted in a reservation is a transaction, too.

8 8 Transactions Are In... Manufacturing: Order entry, job and inventory planning and scheduling, accounting, and so on are classical application areas of transaction processing. Computer integrated manufacturing (CIM) is a key technique for improving industrial productivity and efficiency. Just-in-time inventory control, automated warehouses, and robotic assembly lines each require a reliable data storage system to represent the factory state.

9 9 Transactions Are In... Real-Time Systems: This application area includes all kinds of physical machinery that needs to interact with the real world, either as a sensor, or as an actor. Traditionally, such systems were custom made for each individual plant, starting from the hardware. The usual reason for that was that 20 years ago off-the-shelf systems could not guarantee real-time behavior that is critical in these applications. This has changed, and so has the feasibility of building entire systems from scratch. Standard software is now used to ensure that the application will be portable.

10 10 A Transaction Processing System A transaction processing system (TP-system) provides tools to ease or automate application programming, execution, and administration of complex, distributed applications. Transaction processing applications typically support a network of devices that submit queries and updates to the application. Based on these inputs, the application maintains a database representing some real-world state. Application responses and outputs typically drive real-world actuators and transducers that alter or control the state. The applications, database, and network tend to evolve over several decades. Increasingly, the systems are geographically distributed, heterogeneous (they involve equipment and software from many different vendors), continuously available (there is no scheduled downtime), and have stringent response time requirements.

11 11 ACID Properties: First Definition
• Atomicity: A transaction's changes to the state are atomic: either all happen or none happen. These changes include database changes, messages, and actions on transducers.
• Consistency: A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program.
• Isolation: Even though transactions execute concurrently, it appears to each transaction T that others executed either before T or after T, but not both.
• Durability: Once a transaction completes successfully (commits), its changes to the state survive failures.

12 12 Structure of a Transaction Program
• The application program declares the start of a new transaction by invoking BEGIN_WORK().
• All subsequent operations will be covered by the transaction. Eventually, the application program will call COMMIT_WORK(), if a new consistent state has been reached. This makes sure the new state becomes durable.
• If the application program cannot complete properly (violation of consistency constraints), it will invoke ROLLBACK_WORK(), which appeals to the atomicity of the transaction, thus removing all effects the program might have had so far.
• If for some reason the application fails to call either commit or rollback (there could be an endless loop, a crash, a forced process termination), the transaction system will automatically invoke ROLLBACK_WORK() for that transaction.
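The begin/commit/rollback bracket can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the SQL statements BEGIN, COMMIT, and ROLLBACK stand in for BEGIN_WORK(), COMMIT_WORK(), and ROLLBACK_WORK(), and the account table, the transfer function, and the "balance must not go negative" constraint are hypothetical and not part of the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")  # stands in for BEGIN_WORK()
    try:
        cur.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                    (amount, src))
        cur.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                    (amount, dst))
        (balance,) = cur.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            # Hypothetical consistency constraint: no negative balances.
            raise ValueError("insufficient funds")
        cur.execute("COMMIT")  # stands in for COMMIT_WORK()
        return True
    except Exception:
        cur.execute("ROLLBACK")  # stands in for ROLLBACK_WORK()
        return False


def demo():
    # isolation_level=None disables implicit transactions, so the explicit
    # BEGIN/COMMIT/ROLLBACK statements above control the bracket.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
    committed = transfer(conn, 1, 2, 60)    # succeeds
    overdrawn = transfer(conn, 1, 2, 60)    # violates the constraint
    balances = conn.execute(
        "SELECT id, balance FROM account ORDER BY id").fetchall()
    return committed, overdrawn, balances
```

The second transfer debits and credits both rows before the check fails; the rollback removes both updates, so the balances are the same as after the first transfer.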

13 13 The End User's View of a Transaction Processing System

14 14 The Administrator's/Operator's View of a TP System

15 15 Performance Measures of Interactive Transactions
Performance/Transaction   Small/Simple   Medium        Complex
Instr./transaction        100k           1M            100M
Disk I/O / TA
Local msgs. (B)           10 (5KB)       100 (50KB)    1000 (1MB)
Remote msgs. (B)          2 (300B)       2 (4KB)       100 (1MB)
Cost/TA/second            10k$/tps       100k$/tps     1M$/tps
Peak tps/site

16 16 Client-Server Computing: The Classical Idea

17 17 Client-Server Computing: The CORBA Idea [Figure labels: Client on WS (Presentation Services, etc.); IDL Stub; Object Request Broker; IDL Skeleton; Object Implementation: Jim's Mailbox; Request: Delete]

18 18 Client-Server Computing: The WWW Idea [Figure labels: WWW browser; Java applet + Java Database Connection (JDBC) driver code; HTTP server; Database server; proprietary protocol; JDBC-ODBC bridge; ODBC driver; JDBC network driver; public protocol (e.g. TCP/IP); JDBC driver]

19 19 Using Transactional Remote Procedure Calls (TRPCs)

20 20 Terms We Have Introduced So Far
• Resource manager: The system comes with an array of transactional resource managers that provide ACID operations on the objects they implement. Database systems, persistent programming languages, and queue managers are typical examples.
• Durable state: Application state represented as durable data stored by the resource managers.
• TRPC: Transactional remote procedure calls allow the application to invoke local and remote resource managers as though they were local. They also allow the application designer to decompose the application into client and server processes on different computers.
• Transaction program: Inquiries and state transformations are written as programs in conventional or specialized programming languages. The programmer brackets the successful execution of the program with a Begin-Commit pair and brackets a failed execution with a Begin-Rollback pair.

21 21 Terms We Have Introduced So Far
• Atomicity: At any point before the commit, the application or the system may abort the transaction, invoking rollback. If the transaction is aborted, all of its changes to durable objects will be undone (reversed), and it will be as though the transaction never ran.
• Consistency: The work within a Begin-Commit pair must be a correct transformation.
• Isolation: While the transaction is executing, the resource managers ensure that all objects the transaction reads are isolated from the updates of concurrent transactions.
• Durability: Once the commit has been successfully executed, all the state transformations of that transaction are made durable and public.

22 22 The World According to the Resource Manager

23 23 Where To Split Client/Server? [Figure labels: Presentation; Flow Control; Application Logic (= business objects); Data Access; Server; Thin; Fat]

24 24 Client/Server Infrastructure [Figure labels: Client; Server; Middleware; GUI; OOUI; System Mgmt.; OS; Objects; Groupware; TP-Mon.; DBMS; OS; SQL; ORB; TRPC; Security; Transport; Mail; WWW; Files; etc.]

25 25 Transactional Core Services

26 26 The X/Open TP-Model

27 27 The X/Open Distributed Transaction Processing Model

28 28 The OTS Model [Figure labels: transaction originator; recoverable server; Transaction service; TA-context; transmitted with request; creation; termination; invocation; commit coordination]

29 29 Transaction Processing System Feature List
• Application development features: Application generators; graphical programming interfaces; screen painters; compilers; CASE tools; test data generators; starter system with a complete set of administrative and operations functions, security, and accounting.
• Repository features: Description of all components of the system, both hardware and software. Description of the dependencies among components (bill-of-material). Description of all changes to all components to keep track of different versions. The repository is a database. Its role in the system must be complete, extensible, active and allow for local autonomy.
• TP-Monitor features: Process management; server classes; transactional remote procedure calls; request-based authentication and authorization; support for applications and resource managers in implementing ACID operations on durable objects.

30 30 Transaction Processing System Feature List
• Data communications features: Uniform I/O interfaces; device independence; virtual terminal; screen painter support; support for RPC and TRPC; support for context-oriented communication (peer-to-peer).
• Database features: Data independence; data definition; data manipulation; data control; data display; database operations.
• Operations features: Archiving; reorganization; diagnosis; recovery; disaster recovery; change control; security; system extension.
• Education and testing features: Imbedded education; online documentation; training systems; national language features; test database generators; test drivers.

31 31 Data Communications Protocols

32 32 Presentation Management

33 33 SQL Data Definition

34 34 SQL Data Manipulation

35 35 Summary of Chapter 1
• A transaction processing system is a large web of application generators, system design and operation tools, and the more mundane language, database, network, and operations software.
• The repository and the applications that maintain it are the mechanisms needed to manage the TP system. The repository is a transaction processing application.
• It represents the system configuration as a database and supplies change control by transactions that manipulate the configuration and the repository.
• The transaction concept, like contract law, is intended to resolve the situation when exceptions arise. The first order of business in designing a system is, therefore, to have a clear model of system failure modes. What breaks? How often do things break?

36 Chapter 1b Basic Terminology

37 37 A Word About Words (Chapter 2)
Humpty Dumpty: When I use a word, it means exactly what I chose it to mean; nothing more nor less.
Alice: The question is, whether you can make words mean so many different things.
Humpty Dumpty: The question is, which is to be master, that's all.
(Lewis Carroll)

38 38 Basic Computer Terms To get any confusion that might be caused by the many synonyms in our field out of the way, let us adopt the following conventions for the rest of this class:
domain = data type = ...
field = column = attribute = ...
record = tuple = object = entity = ...
block = page = frame = slot = ...
file = data set = table = ...
process = task = thread = actor = ...
function = request = method = ...
All the other terms and definitions we need will be briefly introduced and explained during the session.

39 39 Basic Hardware Architecture I In Bell and Newell's classic taxonomy, hardware consists of three types of modules: processors, memory, and communications (switches or wires). Processors execute instructions from a program, read and write memory, and send data via communication lines. Computers are generally classified as supercomputers, mainframes, minicomputers, workstations, and personal computers. However, these distinctions are becoming fuzzy with current shifts in technology.

40 40 Basic Hardware Architecture II Today's workstation has the power of yesterday's mainframe. Similarly, today's WAN (wide area network) has the communications bandwidth of yesterday's LAN (local area network). In addition, electronic memories are growing in size to include much of the data formerly stored on magnetic disk. These technology trends have deep implications for transaction processing.

41 41 Basic Hardware Architecture III
• Distributed processing: Processing is moving closer to the producers and consumers of the data (workstations, intelligent sensors, robots, and so on).
• Client-server: These computers interact with each other via request-reply protocols. One machine, called the client, makes requests to another, called the server. Of course, the server may in turn be a client to other machines.
• Clusters: Powerful servers consist of clusters of many processors and memories, cooperating in parallel to perform common tasks.

42 42 Basic Hardware Architecture IV

43 43 Memories - The Economic Perspective I
• The processor executes instructions from virtual memory, and it reads and alters bytes from the virtual memory. The mapping between virtual memory and real memory includes electronic memory, which is close to the processor, volatile, fast, and expensive, and magnetic memory, which is "far away" from the processor, non-volatile, slow, and cheap. The mapping process is handled by the operating system with some hardware assistance.
• Memory performance is measured by its access time: Given an address, the memory presents the data at some later time. The delay is called the memory access time. Access time is a combination of latency (the time to deliver the first byte), and transfer time (the time to move the data). Transfer time, in turn, is determined by the transfer size and the transfer rate. This produces the following overall equation:
memory access time = latency + (transfer size / transfer rate)

44 44 Memories - The Economic Perspective II
• Memory price-performance is measured in one of two ways:
  • Cost/byte. The cost of storing a byte of data in that medium.
  • Cost/access. The cost of reading a block of data from that medium. This is computed by dividing the device cost by the number of accesses per second that the device can perform. The actual units are cost/access/second, but the time unit is implicit in the metric's name.
• These two cost measures reflect the two different views of a memory's purpose: it stores data, and it receives and retrieves data.

45 45 Memories - The Economic Perspective III [Figure: typical large system capacity]

46 46 Memories - The Economic Perspective IV [Figure: $ / MB]

47 47 Magnetic Memory
• There are two types of magnetic storage media: disk and tape. Disks rotate, passing the data in the cylinder by the electronic read-write heads every few milliseconds. This gives low access latency. The disk arm can move among cylinders in tens of milliseconds. Tapes have approximately the same storage density and transfer rate, but they must move long distances if random access is desired. Consequently, tapes have large random access latencies, on the order of seconds.
Disk Access Time = Seek_Time + Rotational_Latency + (Transfer_Size / Transfer_Rate)

48 48 Magnetic Memory Compare the times required for two access patterns to 1 MB stored in 1000 blocks on disk:
• Sequential access: Read or write sectors [x, x + 1, ..., x + 999] in ascending order. This requires one seek (10 ms) and half a rotation (5 ms) before the data in the cylinder begins transferring the megabyte at 10 MBps (the transfer takes 100 ms, ignoring one-cylinder seeks). The total access time is 115 ms.
• Random access: Read the 1000 sectors [x, ..., x + 999] in random order. In this case, each read requires a seek (10 ms), half a rotation (5 ms), and then the 1 KB transfer (0.1 ms). Since there are 1000 of these events, the total access time is 15.1 seconds.
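The two access patterns can be checked with a few lines of Python. This sketch plugs the slide's figures (10 ms seek, 5 ms rotational latency, 10 MBps transfer rate) into the disk access time formula; the function name and the decimal reading of MB and KB (10^6 and 10^3 bytes) are assumptions made for illustration.

```python
def disk_access_time_ms(transfer_bytes,
                        seek_ms=10.0,
                        rotational_latency_ms=5.0,
                        rate_bytes_per_s=10_000_000):
    """Seek time + rotational latency + transfer time, in milliseconds."""
    transfer_ms = transfer_bytes / rate_bytes_per_s * 1000.0
    return seek_ms + rotational_latency_ms + transfer_ms


# Sequential: one seek and one half rotation, then 1 MB in a single transfer.
sequential_ms = disk_access_time_ms(1_000_000)

# Random: 1000 independent reads of 1 KB, each paying seek and rotation.
random_ms = 1000 * disk_access_time_ms(1_000)
```

Under these assumptions the sequential pattern comes to 115 ms and the random pattern to roughly 15.1 seconds, a ratio of more than 100 to 1 for the same megabyte.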

49 49 Memory Hierarchies

50 50 Memory Hierarchies
• The hierarchy uses small, fast, expensive cache memories to cache some data present in larger, slower, cheaper memories.
• If hit ratios are good, the overall memory speed approximates the speed of the cache.
• At any level of the memory hierarchy, the hit ratio is defined as:
\[ \text{hit ratio} = \frac{\text{references satisfied by cache}}{\text{all references to cache}} \]
• Suppose a cache memory with access time C has hit rate H, and suppose that on a miss the secondary memory access time is S. Further, suppose that C = 0.01 S. The effective access time of the cache will be as follows:
\[ \text{Effective memory access time} = H \cdot C + (1 - H) \cdot S = H \cdot (0.01\,S) + (1 - H) \cdot S = (1 - 0.99\,H) \cdot S \approx (1 - H) \cdot S \]
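As a quick numeric check of the effective access time, here is a minimal Python sketch. The function name and the sample values (S = 100 time units, H = 0.9) are assumptions chosen for illustration; only the weighted-average formula H C + (1 - H) S and the ratio C = 0.01 S come from the slide.

```python
def effective_access_time(hit_ratio, cache_time, secondary_time):
    """Weighted average of cache and secondary access times: H*C + (1-H)*S."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return hit_ratio * cache_time + (1.0 - hit_ratio) * secondary_time


S = 100.0        # secondary memory access time (arbitrary units, assumed)
C = 0.01 * S     # cache access time, using the slide's C = 0.01 S
H = 0.9          # assumed hit ratio

t = effective_access_time(H, C, S)
```

With these values the result is about 10.9 units, which matches (1 - 0.99 H) S and is close to the rougher (1 - H) S = 10: even at a 90% hit ratio, the misses dominate the average.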

51 51 The Five Minute Rule
• Assume there are no special response time (real-time) requirements; the decision to keep something in cache is, therefore, purely economic.
• To make things simple, suppose that data blocks are 10 KB.
• At 1995 prices, 10 KB of main memory cost about $1. Thus, we could keep the data in main memory forever if we were willing to spend a dollar.
• With 10 KB of disk costing only $.10, we could save $.90 if we kept the 10 KB on disk.
• In reality, the savings are not so great; if the disk data is accessed, it must be moved to main memory, and that costs something. How much, then, does a disk access cost?
• A disk, along with all its supporting hardware, costs about $3,000 (in 1995) and delivers about 30 acc./sec.; the cost, therefore, is about $100 per access per second. At this rate, if the data is accessed once a second, it costs $100.10 to store it on disk (disk storage and disk access costs). That is considerably more than the $1 to store it in main memory.
• The break-even point is about one access per 100 seconds. At that rate, the main memory cost is about the same as the disk storage cost plus the disk access costs. At a more frequent access rate, disk storage is more expensive. At a less frequent rate, disk storage is cheaper. Anticipating the cheaper main memory that will result from technology changes, this observation is called the five-minute rule rather than the two-minute rule.

52 52 The Five Minute Rule Keep a data item in electronic memory if it is accessed at least once every five minutes; otherwise keep it in magnetic memory. Similar arguments apply to objects stored on tape and cached on disk. Given the object size, the cost of cache, the cost of secondary memory, and the cost of accessing the object in secondary memory once per second, the frequency at the break-even point in units of accesses per second (a/s) is given by the following formula:
\[ \text{Frequency} \approx \frac{(\text{Cache\_Cost/Byte} - \text{Secondary\_Cost/Byte}) \cdot \text{Object\_Bytes}}{\text{Object\_Access\_Per\_Second\_Cost}} \ \text{a/s} \]
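The break-even formula can be evaluated with the 1995 prices quoted for the five-minute rule. The sketch below is illustrative only: the function and variable names are hypothetical, and 10 KB is read as 10,000 bytes.

```python
def break_even_frequency(cache_cost_per_byte,
                         secondary_cost_per_byte,
                         object_bytes,
                         access_per_second_cost):
    """Accesses per second at which cache and secondary storage cost the same."""
    saving = (cache_cost_per_byte - secondary_cost_per_byte) * object_bytes
    return saving / access_per_second_cost


OBJECT_BYTES = 10_000                  # one 10 KB block (decimal KB assumed)
memory_cost = 1.00 / OBJECT_BYTES      # $1 per 10 KB of main memory
disk_cost = 0.10 / OBJECT_BYTES        # $0.10 per 10 KB of disk
access_cost = 3000 / 30                # $3,000 disk at 30 accesses/second

frequency = break_even_frequency(memory_cost, disk_cost,
                                 OBJECT_BYTES, access_cost)
interval_seconds = 1 / frequency
```

With these inputs the break-even frequency is about 0.009 accesses per second, that is, one access every 111 seconds or so, consistent with the "about one access per 100 seconds" stated for the rule.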

53 53 The Rules of Exponential Growth
Electronic memory (Moore's Law): \( \text{MemoryChipCapacity}(year) = 4^{(year-1970)/3} \) Kb/chip for year in [ ]
Magnetic memory (Hoagland's Law): \( \text{MagneticAreaDensity}(year) = 10^{(year-1970)/10} \) Mb/inch² for year in [ ]
Processors (Joy's Law): \( \text{SunMips}(year) = 2^{(year-1984)} \) MIPS for year in [ ]

54 54 Communication Hardware The early 90s: The definition of the four kinds of networks by their diameters. These diameters imply certain latencies (based on the speed of light). In 1990, Ethernet (at 10 Mbps) was the dominant LAN. Metropolitan networks typically are based on 1 Mbps public lines. Such lines are too expensive for transcontinental links at present; most long-distance lines are therefore 50 Kbps or less. As you will gather from the news, these things are changing fast.

55 55 Communication Hardware Scenario 2000: Point-to-point bandwidth likely to be common among computers by the year 2000.
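The three growth rules can be written as small Python functions. This is a sketch under one explicit assumption: the expressions (year-1970)/3, (year-1970)/10, and (year-1984) that appear on the slide are read as the exponents of the bases 4, 10, and 2 respectively. The function names are illustrative, and the year ranges, which are missing from the transcript, are not enforced.

```python
def memory_chip_capacity_kb(year):
    """Moore's Law as read from the slide: capacity quadruples every 3 years."""
    return 4 ** ((year - 1970) / 3)


def magnetic_area_density_mb_per_sq_inch(year):
    """Hoagland's Law as read from the slide: density grows tenfold per decade."""
    return 10 ** ((year - 1970) / 10)


def sun_mips(year):
    """Joy's Law as read from the slide: processor speed doubles every year."""
    return 2 ** (year - 1984)
```

For example, under this reading `memory_chip_capacity_kb(1973)` is four times `memory_chip_capacity_kb(1970)`, and `sun_mips(1990)` is 64 times `sun_mips(1984)`.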

56 56 Processor Architectures

57 57 Processor Architectures
• Shared nothing: In a shared-nothing design, each memory is dedicated to a single processor. All accesses to that data must pass through that processor. Processors communicate by sending messages to each other via the communications network.
• Shared global: In a shared-global design, each processor has some private memory not accessible to other processors. There is, however, a pool of global memory shared by the collection of processors. This global memory is usually addressed in blocks (units of a few kilobytes or more) and is RAM disk or disk.
• Shared memory: In a shared-memory design, each processor has transparent access to all memory. If multiple processors access the data concurrently, the underlying hardware regulates the access to the shared data and provides each processor a current view of the data.

58 58 Address Spaces

59 59 Address Spaces
• Memory segmentation and sharing: A process executes in an address space, a paged, segmented array of bytes. Some segments may be shared with other address spaces. The sharing may be execute-only, read-only, or read-write. Most of the segment slots are empty (lightly shaded boxes), and most of the occupied segments are only partially full of programs or data.
• To simplify memory addressing, the virtual address space is divided into fixed-size segment slots, and each segment partially fills a slot.
• Typical slot sizes range from 2**24 to 2**32 bytes. This gives a two-dimensional address space, where addresses are {segment_number, byte}. Again, segments are often partitioned into virtual memory pages, which are the unit of transfer between main and secondary memory. If an object is bigger than a segment, it can be mapped into consecutive segments of the address.

60 60 Processes
• A process is a virtual processor. It has an address space that contains the program the process is executing and the memory the process reads and writes. One can imagine a process executing Java programs statement by statement, with each statement reading and writing bytes in the address space or sending messages to other processes.
• Processes provide an ability to execute programs in parallel; they provide a protection entity; and they provide a way of structuring computations into independent execution streams. So they provide a form of fault containment in case a program fails.
• Processes are building blocks for transactions, but the two concepts are orthogonal. A process can execute many different transactions over time, and parts of a single transaction may be executed by many processes.
• Each process executes on behalf of some user, or authority, and with some priority. The authority determines what the process can do: which other processes, devices, and files the process can address and communicate with. The process priority determines how quickly the process's demand for resources will be serviced if other processes make competing demands. Short tasks typically run with high priority, while large tasks are given lower priority.

61 61 Protection Domains There are two ways to provide protection:
• Process = protection domain: Each subsystem executes as a separate process with its own private address space. Applications execute subsystem requests by switching processes, that is, by sending a message to a process.
• Address space = protection domain: A process has many address spaces: one for each protected subsystem and one for the application. Applications execute subsystem requests by switching address spaces. The address space protection domain of a subsystem is just an address space that contains some of the caller's segments; in addition, it contains program and data segments belonging to the called subsystem. A process connects to the domain by asking the subsystem or OS kernel to add the segment to the address space. Once connected, the domain is callable from other domains in the process by using a special instruction or kernel call.

62 62 Protection Domains A process may have many protection domains.

63 63 Threads There is a need for multiple processes per address space:
• For example, to scan through a data stream, one process is appointed the producer, which reads the data from an external source, while the second process processes the data. Further examples of cooperating processes are file read-ahead, asynchronous buffer flushing, and other housekeeping chores in the system.
• Processes can share the same address space simply by having all their address spaces point to the same segments. Most operating systems do not make a clean distinction between address spaces and processes. Thus a new concept, called a thread or a task, is introduced.
• But note: Several operating systems do not use the term process at all. For example, in the Mach operating system, thread means process, and task means address space; in MVS, task means process, and so on.

64 64 Threads
• The term thread often implies a second property: inexpensive to create and dispatch. Threads are commonly provided by some software that found the operating system processes to be too expensive to create or dispatch. The thread software multiplexes one big operating system process among many threads, which can be created and dispatched hundreds of times faster than a process.
• The term thread is used in the following to connote these lightweight processes. Unless this lightweight property is intended, process is used. Several threads usually share a common address space. Typically, all the threads have the same authorization identifier, since they are part of the same address space domain, but they may have different scheduling priorities.

65 65 Messages and Sessions There are two styles of communication among processes:
• Datagrams: The sender of a message determines the recipient's address (e.g. the process name) and constructs an envelope consisting of the sender's name and address, the recipient's name and address, and the message text. This envelope is delivered to the capable hands of the communication system. It is analogous to sending letters by mail.
• Sessions: Before any messages are sent, a fixed connection is established between sender and receiver, a so-called session. Once it has been established, both parties can send and receive messages via this session. This symmetry is often referred to as "peer-to-peer". Establishing a session requires a datagram. A session must at some point be closed down explicitly. It is analogous to a phone conversation.

66 66 Advantages of Sessions
• Shared state: A session represents shared state between the client and the server. A datagram might go to any process with the designated name, but a session goes to a particular instance of that name.
• Authorization: Processes do not always trust each other. The server often checks the client's credentials to see that the client is authorized to perform the requested function. The authentication protocols require multi-message exchanges. Once the session key is established, it is shared state.
• Error correction: Messages flowing in each session direction are numbered sequentially. These sequence numbers can detect lost messages and duplicate messages.
• Performance: The operations described are fairly costly. Each of the steps often involves several messages. By establishing a session, this information is cached.
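The error-correction point can be illustrated with a small Python sketch of the receiving end of one session direction. The class and its return values are hypothetical; the sketch only detects lost and duplicate messages from their sequence numbers and does not model retransmission or messages that arrive late and out of order.

```python
class SessionReceiver:
    """Tracks the next expected sequence number for one session direction."""

    def __init__(self):
        self.expected = 0  # sequence number of the next in-order message

    def receive(self, seq):
        """Classify an incoming message by its sequence number.

        Returns ("ok", []), ("duplicate", []), or ("gap", missing_numbers).
        """
        if seq < self.expected:
            # Already seen (simplification: a late message looks the same).
            return ("duplicate", [])
        if seq > self.expected:
            # One or more messages between expected and seq never arrived.
            missing = list(range(self.expected, seq))
            self.expected = seq + 1
            return ("gap", missing)
        self.expected += 1
        return ("ok", [])


receiver = SessionReceiver()
results = [receiver.receive(seq) for seq in (0, 1, 1, 3)]
```

Here the third message repeats sequence number 1 and is flagged as a duplicate, while the jump from 1 to 3 reveals that message 2 was lost. A datagram carries no such shared counter, which is why this check needs session state.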

67 67 Clients and Servers
• The question of how computations consisting of many interacting processes should be structured has no simple answer. Currently, two styles are particularly popular: peer-to-peer and client-server.
• The debate about which style is "better" often creates the impression that they are radically different. But in reality, peer-to-peer is more general and more complex, and it subsumes client-server. Here is a brief characterization:
  • Peer-to-peer: The two processes are independent peers, each executing its computation and occasionally exchanging data with the other.
  • Client-server: The two processes interact via request-reply exchanges in which one process, the client, makes a request to a second process, the server, which performs this request and replies to the client.

68 68 Clients and Servers
• The limitation of the client-server model lies in the fact that it implies a synchronous pattern of one request/one response.
• There are, however, cases in which one request generates thousands of replies, or where thousands of requests generate one reply. Operations that have this property include transferring a file between the client and server or bulk reading and writing of databases. In other situations, a client request generates a request to a second server, which, in turn, replies to the client. Parallelism is a third area where simple RPC is inappropriate. Because the client-server model postulates synchronous remote procedure calls, the computation uses one processor at a time. However, there is growing interest in schemes that allow many processes to work on problems in parallel. The RPC model in its simplest form does not allow any parallelism.

69 69 Remote Procedure Calls (RPCs)

70 70 Naming n Naming has to do with the problem of how a client denotes a server it wants to invoke. Typical naming schemes distinguish between an object's name, its address, and its location. The name is an abstract identifier for the object, the address is the path to the object, and the location is where the object is. n An object can have several names. Some of these names may be synonyms, called aliases. Let us say that Bruce and Lindsay are two aliases for Bruce Lindsay. For this to be explicit, all names, addresses, and locations must be interpreted in some context, called a directory. For example, in our RPC context, Bruce means Bruce Nelson, and in our publishing context, Bruce means Bruce Spatz. Within the 408 telephone area, Bruce Lindsays address is , and outside the United States it is

71 71 Name Servers n Names are grouped into a hierarchy called the name space. An international commission has defined a universal name space standard, X.500, for computer systems. The commission administers the root of that name space. Each interior node of the hierarchy is a directory. A sequence of names delimited by a period (.) gives a path name from the directory to the object. n No one stores the entire name spaceit is too big, and it is changing too rapidly. Certain processes, called name servers, store parts of the name space local to their neighborhood; in addition, they store a directory of more global name servers.

72 72 Authentication Techniques n Passwords are the simplest technique. The client has a secret password, a string of bytes known only to it and the server. The client sends its password to the server to prove the client's identity. A second password is then needed to authenticate the server to the client. Thus, two passwords are required, and they must be sent across the wire. n Challenge-response uses only one password or key. In this scheme, the client and the server share a secret encryption key. The server picks a random number, N, and encrypts it with the key as EN. The server sends EN to the client and challenges the client to decrypt it using the secret key. If the client responds with N, the server believes the client knows the secret encryption key. The client can also authenticate the server by challenging it to decrypt a second random number. The shared secret is stored at both ends, but only random numbers are sent across the wire.
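
A minimal sketch of a challenge-response exchange follows. Note that it swaps in a different mechanism from the one on the slide: instead of encrypting N and asking the client to decrypt it, the server sends the random challenge in the clear and the client returns a keyed hash (HMAC-SHA256) of it. The property the slide emphasizes is preserved, since the shared secret stays at both ends and only random-looking values cross the wire. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: pick a fresh random challenge (plays the role of N)."""
    return secrets.token_bytes(16)


def respond(shared_key: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the shared key without sending it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

To authenticate the server to the client, the same three functions can be used with the roles reversed and a second, independent challenge.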

73 73 Authentication Techniques n Public key system: Each authid has a pair of keys: a public encryption key, EK, and a private decryption key, DK. The keys are chosen so that DK(EK(X)) = X, but knowing only EK and EK(X) it is hard to compute X. Thus, a process's ability to compute X from EK(X) is proof that the process knows the secret DK. Each authid publishes its public key to the world. Anyone wanting to authenticate the process as that authid goes through the challenge protocol: The challenger picks a random number X, encrypts it with the authid's public key EK, and challenges the process to compute X from EK(X). Secrets are stored in one place only, and they do not go across the wire.
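
The challenge protocol above can be illustrated with textbook RSA on deliberately tiny numbers. This is a toy under stated assumptions, not a usable cryptosystem: the primes 61 and 53, the exponent 17, and the helper names ek, dk, and authenticate are chosen here for illustration, and real systems use large keys and padding. It requires Python 3.8 or later for the modular inverse via pow.

```python
import secrets

# Toy key pair (insecure, illustration only).
P, Q = 61, 53
MODULUS = P * Q                             # part of the public key
E_PUB = 17                                  # public exponent
D_PRIV = pow(E_PUB, -1, (P - 1) * (Q - 1))  # private exponent, kept secret


def ek(x: int) -> int:
    """Public encryption key EK: anyone can compute this."""
    return pow(x, E_PUB, MODULUS)


def dk(y: int) -> int:
    """Private decryption key DK: only the owner of D_PRIV can compute this."""
    return pow(y, D_PRIV, MODULUS)


def authenticate(prover) -> bool:
    """Challenger: pick random X, send EK(X), and accept if the prover returns X."""
    x = secrets.randbelow(MODULUS - 2) + 2  # random X in 2 .. MODULUS-1
    return prover(ek(x)) == x
```

A process that holds the private key passes with authenticate(dk). A process without it has no better strategy than guessing, which is only hard when the key is large.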

74 74 Scheduling The purpose of scheduling is to make sure all requests get processed, i.e., are assigned to a specific server process. There are basically two additional constraints: n Short response times: The requests should not wait longer than necessary before they get serviced. n Economic usage of resources: The required throughput should be achieved with the minimum number of resources (processors, nodes, links, etc.). n Throughput and response time at resource utilization r are related by the following formula: Average_Response_Time(r) = (1 / (1 - r)) × Service_Time
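
To make the trade-off between the two constraints concrete, the formula can be evaluated for a few utilizations. The function below is a small sketch; its name and the rejection of r outside [0, 1) are choices made here, the latter because the formula diverges as r approaches 1.

```python
def average_response_time(utilization: float, service_time: float) -> float:
    """Average_Response_Time(r) = (1 / (1 - r)) * Service_Time."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # Response time as a multiple of the service time at rising utilization.
    for r in (0.0, 0.5, 0.8, 0.9, 0.95):
        print(f"r = {r:.2f}: {average_response_time(r, 1.0):.1f} x service time")
```

At 50% utilization the average response time is twice the service time, and it grows without bound as utilization approaches 100%. Running resources close to saturation therefore conflicts directly with the goal of short response times.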

75 75 The Scheduling Problem

76 76 File Organizations

77 77 SQL in a Distributed Environment

78 78 Software Performance

79 79 [Figure: Protocol Standards. Porting and installation steps: a portable program passes through the API, compiler, and linker/loader to become a "local" compiled program. Operation and inter-operation: the client process on the client machine and the server machine communicate through the FAP (message formats, protocol machine). Operating systems shown: Unix, VMS.]

80 80 Relevant FAP-Standards n CSMA/CD, Token Ring, etc.: Low-level protocols that specify how bits are physically transmitted across a shared medium. n IP/TCP, NetBIOS, HTTP: Transport-level protocols. n LU6.2: SNA's peer-to-peer protocol that allows both session-oriented and client-server-style communication under transaction protection. n OSI-TP: ISO's rendering of a protocol that provides a functionality very similar to LU6.2. n ASN.1: Protocol for exchanging data formatting and structuring information. Required for RPCs in a heterogeneous environment. n DRDA: Interoperability standard for IBM SQL-systems. n ODBC, JDBC: Interoperability standards for general SQL-systems.

81 81 Relevant API-Standards n SQL: Portability standard for accessing relational databases (lots of proprietary extensions). n APPC, CPI-C: Two of IBM's APIs for the LU6.2 protocol. n X/Open-XA, X/Open-XA+, etc.: APIs by the X/Open consortium on ISO's OSI-TP protocols. n IDL: OMG's interface definition language to let objects be integrated through an object request broker. n STDL: Language for programming TP-applications; based on the ACMS TP-monitor. n Java: The web's favorite programming language; comes with its own FAP-component.

82 82 OSI Standards and X/Open APIs

83 83 A Last Glance at TP-Standards Each resource manager (RM) registers with its local transaction manager (TM). Applications start and commit transactions by calling their local TM. At commit, the TM invokes every participating RM. If the transaction is distributed, the communications manager informs the local and remote TM about the incoming or outgoing transaction, so that the two TMs can use the OSI-TP protocol to commit the transaction.
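
The local part of this interplay can be sketched as follows. This is an illustrative model only, not the X/Open XA interface: the class and method names are hypothetical, and the prepare-then-commit voting round is an assumption about how the TM "invokes every participating RM" at commit. The distributed case (communications manager, remote TM, OSI-TP) is left out.

```python
class ResourceManager:
    """Hypothetical RM that records the calls it receives from the TM."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.log = []

    def prepare(self, txid) -> bool:
        self.log.append(("prepare", txid))
        return self.can_prepare          # vote: True = ready to commit

    def commit(self, txid):
        self.log.append(("commit", txid))

    def abort(self, txid):
        self.log.append(("abort", txid))


class TransactionManager:
    """Hypothetical local TM: RMs register once, then join transactions."""

    def __init__(self):
        self.registered = {}             # RM name -> RM
        self.participants = {}           # txid -> list of participating RMs
        self.next_txid = 1

    def register(self, rm):
        self.registered[rm.name] = rm

    def begin(self) -> int:
        txid = self.next_txid
        self.next_txid += 1
        self.participants[txid] = []
        return txid

    def join(self, txid, rm_name):
        self.participants[txid].append(self.registered[rm_name])

    def commit(self, txid) -> bool:
        """Invoke every participating RM; commit only if all vote yes (assumed)."""
        rms = self.participants.pop(txid)
        if all(rm.prepare(txid) for rm in rms):
            for rm in rms:
                rm.commit(txid)
            return True
        for rm in rms:
            rm.abort(txid)
        return False
```

An application would call begin(), do its work against RMs that have joined the transaction, and then call commit(); the return value tells it whether the transaction committed or was aborted.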

84 84 Summary n Transaction processing systems comprise all parts of a system, software and hardware. n Building such a system requires considering end-to-end arguments at all levels of abstraction. n The performance of distributed TP systems is influenced by the hardware architecture (what is shared), by software issues (which protocols are used), and by configuration aspects (what limits scalability). n The multitude of those influences gives rise to a constant dilemma: Should one restrict the variety to a few (proprietary) components for better tuning and performance, or should one embrace all the standards for openness, at the risk of poor scalability and performance?
