2 Motivation Distributed systems: notoriously difficult to build without appropriate assistance. The first ones were based on low-level message-passing mechanisms. Fewer abstractions meant simpler debuggers: there was not much difference between the code manipulated by the user and what actually got executed. Nowadays: Object-Oriented Middleware. Abstraction frameworks introduce code that is not meant to be seen by the user at development time, yet little care is taken to keep it hidden at runtime. Result: large discrepancies between the code the user writes and the code that executes. Middleware: development made easier, or debugging made harder?
3 Motivation (cont.) Related task: testing. Debuggers help fix errors; testing is paramount for finding them. Testing is easier if support is provided by the environment. Setting up test scenarios is also part of the debugging process: launching remote processes; simulating failures and observing system behavior. A basis for automated testing: remote interaction and data collection.
4 Our approach Simplest idea possible: OO Middleware allows developers to think of their distributed systems as if they were multithreaded, object-oriented systems (abstraction framework). We want to: preserve that illusion at debug time; offer a level of interactivity on par with today’s source-level debuggers. This involves capturing key points of system execution and displaying them to the user while hiding middleware-related code execution (debug-time complexity hiding).
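One way to picture the complexity-hiding step is as a filter over stack frames: frames that belong to middleware classes are dropped before the stack is shown to the user. The sketch below is only an illustration, not the tool's actual mechanism. The class name `FrameFilter` and the package prefixes are assumptions made for this example; a real filter would need rules for the specific ORB in use and for generated stubs and skeletons.

```java
import java.util.ArrayList;
import java.util.List;

final class FrameFilter {
    // Assumption: package prefixes treated as middleware internals.
    // The right list depends on the ORB; these two are illustrative only.
    static final String[] MIDDLEWARE_PREFIXES = { "org.omg.", "com.sun.corba." };

    // True if the frame's declaring class lives in a middleware package.
    static boolean isMiddleware(StackTraceElement frame) {
        for (String prefix : MIDDLEWARE_PREFIXES) {
            if (frame.getClassName().startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Returns the frames a user should see, in their original order.
    static List<StackTraceElement> userFrames(List<StackTraceElement> frames) {
        List<StackTraceElement> visible = new ArrayList<>();
        for (StackTraceElement frame : frames) {
            if (!isMiddleware(frame)) {
                visible.add(frame);
            }
        }
        return visible;
    }
}
```

A prefix list is the simplest rule that could work; it would miss generated stubs that live in the application's own packages, which is one reason this is only a sketch.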
5 Our approach (cont.) Also involves: implementing the distributed thread abstraction, i.e. threads that span multiple machines; this is the core element for causal relation tracking in synchronous-call Distributed Object Systems (DOSes) [Li, 2003], [Netzer, 1993]. Distributed thread visualization. Our hypothesis: it is an expected extension for symbolic debuggers; it could lead to valuable insight; it is a possible source of information for other algorithms.
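The distributed thread abstraction can be sketched as an identifier that travels with every synchronous call: the caller attaches the ID of its distributed thread to the request, and the local thread that serves the request adopts that ID for the duration of the call. The example below simulates this inside one JVM, with executor threads standing in for remote machines. All names (`DistributedThreadContext`, `DistributedThreadDemo`, `invokeRemote`) are hypothetical, and the ID is passed as a plain variable; in a CORBA setting it would presumably travel in request metadata, which the slides do not detail.

```java
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Supplier;

final class DistributedThreadContext {
    // ID of the distributed thread the current local thread is part of.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Caller side: reuse the current ID, or start a new distributed thread.
    static String idForOutgoingCall() {
        String id = CURRENT.get();
        if (id == null) {
            id = UUID.randomUUID().toString();
            CURRENT.set(id);
        }
        return id;
    }

    // Serving side: adopt the caller's ID; returns the previous binding.
    static String enter(String id) {
        String previous = CURRENT.get();
        CURRENT.set(id);
        return previous;
    }

    // Serving side: restore whatever was bound before the request.
    static void restore(String previous) {
        if (previous == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(previous);
        }
    }

    static String current() {
        return CURRENT.get();
    }
}

final class DistributedThreadDemo {
    // Simulates a synchronous remote call: the servant body runs on a
    // different local thread, and the caller blocks until the reply arrives.
    static <T> T invokeRemote(ExecutorService server, Supplier<T> servantBody) {
        final String id = DistributedThreadContext.idForOutgoingCall();
        Future<T> reply = server.submit(() -> {
            String previous = DistributedThreadContext.enter(id);
            try {
                return servantBody.get();
            } finally {
                DistributedThreadContext.restore(previous);
            }
        });
        try {
            return reply.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Because the caller blocks on every call, at most one local thread is active per distributed thread at any time, which is why the scheme fits the synchronous-call model in particular.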
6 Our approach (cont.) However: its usefulness is debatable [Burnett, 1997]; maybe it doesn’t lead to valuable insight after all. It scales poorly, which could be an issue for some applications. It is difficult to implement, with little support from runtime environments. It could be unworkable in some situations: too much interference; the “timeout” issue.
7 Initial implementation Targeted at Java/CORBA environments. Allows (* will be available shortly): dynamic distributed thread tracking and visualization; fine-grained control over the flow of execution (stopping, resuming and inspecting distributed threads); arbitrary state inspection; launching/killing remote processes; on-the-fly data visualization; stable property detection*. It would be ideal if it could also replay executions and detect some unstable predicates [Garg, 1997; Tomlinson, 1993].
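The slides do not say which stable properties the tool is meant to detect. Deadlock is the textbook example of one: once it holds, it keeps holding. As a rough illustration under that assumption, the sketch below checks a snapshot of a wait-for relation between distributed threads for a cycle. The class `WaitForGraph` and the thread names are hypothetical, and the sketch assumes each blocked thread waits on at most one other thread.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WaitForGraph {
    // waiter -> the distributed thread it is blocked on (at most one each).
    private final Map<String, String> waitsFor = new HashMap<>();

    void blockedOn(String waiter, String holder) {
        waitsFor.put(waiter, holder);
    }

    // True if the recorded wait-for relation contains a cycle.
    boolean hasDeadlock() {
        Set<String> cleared = new HashSet<>(); // nodes known not to reach a cycle
        for (String start : waitsFor.keySet()) {
            Set<String> path = new HashSet<>();
            String node = start;
            while (node != null && !cleared.contains(node)) {
                if (!path.add(node)) {
                    return true; // node seen twice on this chain: cycle
                }
                node = waitsFor.get(node);
            }
            cleared.addAll(path); // chain ended at a running or cleared thread
        }
        return false;
    }
}
```

The check is only meaningful on a consistent view of the system; how such a view is obtained is outside the scope of this sketch.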
8 Eclipse Provides the model: a sophisticated interface and interaction set. Perfect environment for such a tool: extensive support from the framework; open source implementation; widely adopted. Phase 1: implement GUI extensions for accommodating distributed threads and controlling remote applications. Other visual aids (call graphs, dynamic distributed thread visualization).
13 Limitations/considerations Works only with synchronous-call models; timeouts must be disabled for state inspection to work; currently tied to Java/JDI; could produce too much overhead; could fail badly with some middleware thread-handling schemes; limited support for remote process management; must still be improved before it attains acceptable performance levels.
14 Availability More information (including source code) can be found at the project web site: http://eclipse.ime.usp.br/projects/DistributedDebugging Contributions are more than welcome. Thank you!
15 References [Burnett, 1997] M. Burnett et al. A Bug's Eye View of Immediate Visual Feedback in Direct-Manipulation Programming Systems. In Empirical Studies of Programmers, Washington, D.C., October 1997. [Garg, 1997] V. K. Garg. Methods for Observing Global Properties in Distributed Systems. IEEE Concurrency, Vol. 5, pages 69-77, October 1997. [Li, 2003] J. Li. Monitoring and Characterization of Component-Based Systems with Global Causality Capture. 23rd ICDCS, Providence, Rhode Island, May 19-22, 2003. [Netzer, 1993] R. H. B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In Proc. Workshop on Parallel and Distributed Debugging, pages 1-10, 1993. [Tomlinson, 1993] I. Tomlinson and V. K. Garg. Detecting Relational Global Predicates in Distributed Systems. In Proceedings of the 1993 ACM/ONR Workshop on Parallel and Distributed Debugging, pages 21-31, USA.