The Top 10 Reasons Why Federated Can’t Succeed And Why it Will Anyway
But First… What is our purpose as a community?
- Produce (wonderful) new ideas
- Structure the field
- Educate the workforce
A Brief History of Federation
- Multibase, circa 1980
- Many attempts since
  - Functional
  - Relational
  - Object-oriented
  - Logic-based
  - XML
- Still not solved (think of last night)
- And never will be?
Number 10: Robustness
- Systems fail
  - Sources slow or unavailable
  - In a distributed system, more pieces => more failures
- Users don’t like failures
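One way to see why more pieces mean more failures: if a query needs every source it touches, and sources fail independently (a simplifying assumption; real outages are often correlated), the chance that the whole query can run is the product of the per-source availabilities. A minimal Python sketch, using a hypothetical 99% availability for each source:

```python
from math import prod


def federated_availability(source_availabilities):
    """Probability that every source a query needs is up at the same time.

    Assumes failures are independent, which is a simplification.
    """
    return prod(source_availabilities)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        # Each hypothetical source is available 99% of the time.
        print(n, round(federated_availability([0.99] * n), 3))
```

Under these assumed numbers, ten sources at 99% each leave the query runnable only about 90% of the time, and twenty sources only about 82%.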
Number 9: Security
- Different systems have different security mechanisms
  - Hard to create a single coherent view of permissions
- Distributed systems are more vulnerable
  - More points of failure
  - Hard to make security guarantees
- Data is often the corporate jewels
  - It must be protected
Number 8: Updates
- Recording change isn’t always an UPDATE
  - Application semantics must be accounted for
  - Application APIs must be reckoned with
- ACIDity isn’t always achievable
  - Not all data sources display ACID properties
    - Varying degrees of support
  - Strong transaction semantics not always possible or appropriate
- And always painful
  - Changes to multiple sources must be coordinated
  - Requirements for consistency vary
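Coordinating changes to multiple sources is classically done with a two-phase commit. The sketch below is a minimal, in-memory Python illustration of that protocol; Participant and two_phase_commit are hypothetical names, not any product's API. It assumes every source can vote and hold a prepared state, which is exactly what the slide warns not all sources support, and it ignores timeouts, logging, and coordinator failure.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical stand-in for one data source taking part in a commit."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: the source votes yes only if it can guarantee the change.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Apply the change on every source or on none (no crash handling)."""
    # Phase 1: collect votes; all() stops asking at the first "no".
    if all(p.prepare() for p in participants):
        # Phase 2a: unanimous yes, so every source commits.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: at least one "no", so every source rolls back.
    for p in participants:
        p.abort()
    return False


if __name__ == "__main__":
    sources = [Participant("orders"), Participant("inventory", can_commit=False)]
    print(two_phase_commit(sources))
    print([p.state for p in sources])
```

Here the hypothetical inventory source votes no, so the call returns False and both sources end in the aborted state.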
Number 7: Configurability
- Many architectures possible
  - Even with pre-existing sources, many choices
  - Little or no guidance on tradeoffs
- Lots of code to install
  - Federation engine, data source clients
  - Often choices here
- Lots of connections to define
  - Need tooling to support
Number 6: Administration
- Monitoring is hard
  - Not all sources have facilities to track events
  - Variety of mechanisms for different events, and different sources
  - Not always APIs
- Tuning is difficult
  - Need to understand what must change
  - Need to take appropriate actions
- Repairing is painful
  - Distributed debugging
  - Different vendors to deal with for fixes
Number 5: Semantic heterogeneity
- Hard to identify commonalities
  - Same terms, different meanings
  - Different terms, same meaning
  - Different structures representing different interpretations
- Can’t integrate data effectively without them
- Can’t make sensible queries
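The "different terms, same meaning" case can be handled, at least in part, by mapping each source's field names onto a shared vocabulary. A minimal Python sketch; the source names, field names, and FIELD_MAP table are invented for illustration. Renaming does nothing for "same terms, different meanings" (for example, two sources that both say amount but record it in different currencies), which needs value-level conversion and knowledge of what each source actually means.

```python
# Hypothetical mapping from each source's field names to one shared vocabulary.
FIELD_MAP = {
    "crm": {"cust_id": "customer_id", "cust_name": "name"},
    "billing": {"customerNumber": "customer_id", "fullName": "name"},
}


def to_common(source: str, record: dict) -> dict:
    """Rename a record's fields into the shared vocabulary.

    Fields with no mapping pass through unchanged.
    """
    mapping = FIELD_MAP[source]
    return {mapping.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    print(to_common("crm", {"cust_id": 7, "cust_name": "Ada"}))
    print(to_common("billing", {"customerNumber": 7, "fullName": "Ada"}))
```

Both calls produce the same shape of record, which is what lets a federated query join or union the two sources at all.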
Number 4: Insufficient Metadata
- Need metadata to integrate, configure, administer and query
- Every data source has different metadata
  - No uniform standard
  - Not always collected
- Tools to examine and exploit it are missing
Number 3: Performance (Data Movement)
- Distributed queries involve moving data
  - Geographic distribution is common
  - WAN is slow
- Large data volumes common
  - Large numbers of objects
  - Large objects
- Caching isn’t a complete answer
  - Changes can be frequent and hard to track
  - Storage is not unlimited
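A back-of-the-envelope estimate shows why WAN distribution hurts: moving S bytes over a link of B bits per second takes at least 8S/B seconds, before latency and protocol overhead are counted. The link speeds below are illustrative assumptions, not measurements.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Lower bound on the time to move data.

    Ignores latency, protocol overhead, and contention for the link.
    """
    return size_bytes * 8 / bandwidth_bits_per_s


GB = 10**9

if __name__ == "__main__":
    print(transfer_seconds(10 * GB, 100e6))  # assumed 100 Mbit/s WAN link: 800.0
    print(transfer_seconds(10 * GB, 10e9))   # assumed 10 Gbit/s LAN link: 8.0
```

The same 10 GB that crosses a fast local link in seconds takes over thirteen minutes on the slower wide-area link, which is why where the data lives matters so much to a federated plan.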
Number 2: Performance (Complexity)
- Decision-support applications run complex queries
  - Many choices for how to execute
  - Big differences in performance among choices
- Need data from diverse sources
  - May not have enough power in source
  - Performance at sources may vary
- Need expensive functions of data
  - Function may not be implemented everywhere
  - Flowing the data to the function is expensive
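The "many choices, big differences" point can be illustrated with a toy cost model comparing two plans for a filter: ship every row to the federation engine and filter there, or push the filter down to the source and ship only matching rows. Pushdown is only an option if the source implements the function. The function names and all cost constants are hypothetical, in arbitrary relative units; real optimizers use far richer models.

```python
def plan_costs(rows, selectivity, net_cost_per_row,
               engine_cpu_per_row, source_cpu_per_row, source_can_filter):
    """Estimate the cost of each candidate plan in arbitrary units."""
    costs = {
        # Move every row over the network, then evaluate the filter centrally.
        "ship_all_then_filter": rows * (net_cost_per_row + engine_cpu_per_row),
    }
    if source_can_filter:
        # Evaluate the filter at the source, then move only the matching rows.
        costs["push_filter_to_source"] = (
            rows * source_cpu_per_row + rows * selectivity * net_cost_per_row
        )
    return costs


def cheapest_plan(**params):
    """Return the name and estimated cost of the cheapest available plan."""
    costs = plan_costs(**params)
    best = min(costs, key=costs.get)
    return best, costs[best]


if __name__ == "__main__":
    params = dict(rows=1_000_000, selectivity=0.01, net_cost_per_row=1.0,
                  engine_cpu_per_row=0.1, source_cpu_per_row=0.5)
    print(cheapest_plan(source_can_filter=True, **params))
    print(cheapest_plan(source_can_filter=False, **params))
```

With these made-up numbers pushdown costs roughly half as much as shipping everything. If the source is slow (a higher source_cpu_per_row) or the filter keeps most rows, shipping everything can win instead, and if the source cannot run the function at all, only the expensive plan remains.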
Number 1: Performance (Pathlength)
- Simple queries (OLTP-like) incur huge overheads
  - Processing and networking costs
- Simple queries are common
  - Easier to write
  - Automatically produced
  - Workflows
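The pathlength problem is largely arithmetic: if every federated query pays a roughly fixed cost for parsing, planning, and network round trips regardless of its size (an assumption for this sketch), that cost dominates cheap queries and all but vanishes in expensive ones. The millisecond figures are hypothetical.

```python
def overhead_fraction(fixed_overhead_ms: float, work_ms: float) -> float:
    """Share of a query's elapsed time spent on fixed federation overhead."""
    return fixed_overhead_ms / (fixed_overhead_ms + work_ms)


if __name__ == "__main__":
    # Hypothetical 5 ms of federation overhead per query.
    print(round(overhead_fraction(5, 0.5), 2))   # OLTP-like lookup: 0.91
    print(round(overhead_fraction(5, 2000), 3))  # decision-support query: 0.002
```

The same overhead that is noise for a two-second analytic query accounts for over 90% of a half-millisecond lookup, and simple queries are the common case.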
So Why Will Federated Succeed?
- It has to
  - Integration is one of the top IT issues
    - And it’s not going away
  - Alternatives are expensive and/or painful
    - Write it by hand
    - EAI/Workflow
    - Consolidation (warehouse, data marts…)
So Why Will Federated Succeed? (2)
- Simple scenarios exist
  - Don’t need OLTP, high security, great robustness, … for all applications
  - Customers know their data, or must learn it anyway
- Needs are so great that compromise is possible
So Why Will Federated Succeed? (3)
- Progress on technology is being made
  - 20 years of distributed query processing
  - Plumbing in place
    - Commit protocols
    - Reliable messaging
    - Connectivity infrastructure
  - XML (basic community agreement)
    - XML data format
    - XML Schema
    - Web services
- We’re getting closer
What would we do if it ever did work?
- Retire
- Integrate the web?
  - Data grids
  - Data Google
  - P2P database?
For Discussion
- Is research in this area warranted?
- What are the most important research topics?
- Did we miss any?