Software Connector Classification and Selection for Data-Intensive Systems Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian 2nd Intl.

1 Software Connector Classification and Selection for Data-Intensive Systems Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian 2nd Intl. Workshop on Incorporating COTS Software into Software Systems (IWICSS 2007)

2 Agenda
Research Problem and Importance
Our Approach
  – Classification
  – Selection
  – Analysis
Evaluation
  – Precision, Recall, Accuracy Measurements
Related Work
Conclusion & Future Work

3 Research Problem and Importance
Content repositories are growing rapidly in size
At the same time, we expect more immediate dissemination of this data
How do we distribute it…
  – In a performant manner?
  – Fulfilling system requirements?

4 Software Architecture
The definition of a system in the form of its canonical building blocks
  – Software Components: the computational units in the system
  – Software Connectors: the communications and interactions between software components
  – Software Configurations: arrangements of components and connectors and the rules that guide their composition
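The three building blocks above can be sketched as plain data types. This is only an illustrative model, not something taken from the slides: the class names, the `attach` method, and the "no self-links" rule are assumptions chosen to show what a composition rule might look like.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A computational unit, e.g. a data producer or a data consumer."""
    name: str


@dataclass(frozen=True)
class Connector:
    """An interaction mechanism between components, e.g. a transfer protocol."""
    name: str


@dataclass
class Configuration:
    """An arrangement of components joined by connectors."""
    links: list = field(default_factory=list)

    def attach(self, source, connector, target):
        # Hypothetical composition rule: a connector joins two distinct components.
        if source == target:
            raise ValueError("a connector must join two distinct components")
        self.links.append((source, connector, target))

    def connectors_between(self, source, target):
        # All connectors that carry interactions from source to target.
        return [c for (s, c, t) in self.links if s == source and t == target]


producer = Component("Data Producer")
consumer = Component("Data Consumer")
config = Configuration()
config.attach(producer, Connector("GridFTP"), consumer)
```

Treating the connector as a first-class value, rather than burying it inside a component, is what makes it possible to swap one distribution technology for another and compare them.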

5 Data Distribution Systems
[Figure labels: Data Producer, Data Consumer, data, ???, Connector, Component]
Insight: Use Software Connectors to model data distribution technologies

6 Data Movement Technologies
Wide array of available OTS “large-scale” connector technologies
  – GridFTP, Aspera software, HTTP/REST, RMI, CORBA, SOAP, XML-RPC, BitTorrent, JXTA, UFTP, FTP, SFTP, SCP, Siena, GLIDE/PRISM-MW, and more
Which one is the best one?
How do we compare them
  – Given our current architecture?
  – Given our distribution scenarios & requirements?

7 Research Question
What types of software connectors are best suited for delivering vast amounts of data to users in these hugely distributed data systems, satisfying their particular scenarios in a manner that is performant and scalable?

8 Data Distribution Problem Space

9 Broad variety of distribution connector families
P2P, Grid, Client/Server, and Event-based
Though each connector family varies slightly in some form or fashion
  – They all share 3 common atomic connector constituents: Data Access, Stream, Distributor
Adapted from Mehta et al.’s Connector Taxonomy

10 Connector Tradeoff Space
Surveyed and classified the properties of 13 representative distribution connectors, across all 4 distribution connector families
  – Client/Server: SOAP, RMI, CORBA, HTTP/REST, FTP, UFTP, SCP, Commercial UDP Technology
  – Peer to Peer: BitTorrent
  – Grid: GridFTP, bbFTP
  – Event-based: GLIDE, Siena

11 Large Heterogeneity in Connector Properties

12 How do experts make these decisions?
Performed survey of 33 “experts”
Experts defined to be
  – Practitioners in industry, building data-intensive systems
  – Researchers in data distribution
  – Admitted architects of data distribution technologies
General consensus?
  – They don’t know the how and the why about which connector(s) are appropriate
  – They rely on anecdotal evidence and “intuition”
45% of respondents claimed to be uncomfortable being addressed as a data distribution expert.

13 Our Approach: DISCO
Develop a software framework for:
  – Connector Classification
    Build metadata profiles of connector technologies, describing their intrinsic properties (DCPs)
  – Connector Selection
    Adaptable, extensible algorithm development framework for selecting the “right” connectors (and identifying wrong ones)
  – Connector Selection Analysis
    Measurement of accuracy of results
  – Connector Performance Analysis
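To make the classification and selection steps concrete, here is a minimal sketch of a white-box, score-style selection over metadata profiles. It is not DISCO's actual algorithm, which the slides do not detail: the connector names, the attribute names, and the "count matching requirements" score function are all hypothetical placeholders.

```python
# Hypothetical metadata profiles (standing in for DCPs). The attribute names
# and values are placeholders, not measured properties of real connectors.
profiles = {
    "connector_a": {"reliable": True, "multicast": False, "encrypted": True},
    "connector_b": {"reliable": False, "multicast": True, "encrypted": False},
}


def score(profile, scenario):
    """Count how many scenario requirements the profile satisfies."""
    return sum(1 for attr, wanted in scenario.items() if profile.get(attr) == wanted)


def rank(profiles, scenario):
    """Return connector names ordered from highest to lowest score."""
    return sorted(
        profiles,
        key=lambda name: score(profiles[name], scenario),
        reverse=True,
    )


# A distribution scenario expressed as required attribute values (assumed form).
scenario = {"reliable": True, "encrypted": True}
ranking = rank(profiles, scenario)
```

Because the score function is an ordinary function over the profile, it can be inspected and replaced, which is the sense in which such an approach is "white box".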

14 DISCO in a Nutshell

15 Building DCPs of all 13 connectors (Classification)
Rely on Mehta et al. metadata to describe data distribution connectors
Carefully select metadata to include/exclude

16 Develop complementary selection algorithms

17 Preliminary Evaluation
We developed 13 connector profiles
  – Based on literature, expert reviews, and our own development experience
30 distribution scenarios
24 score functions (white box) and Bayesian domain profiles with 100 conditional probabilities (black box)
[Figure labels: Connector Profiles, Distribution Scenarios, Answer Key, Score, Bayesian, DISCO, Precision-Recall Analysis, Clustering]
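The slides do not describe how the Bayesian domain profiles are combined, so the following is only a generic naive Bayes sketch of how conditional probabilities could rank connectors for a scenario. The connector names, the `volume` attribute, and every probability value are made-up assumptions for illustration.

```python
def posterior(priors, likelihoods, observations):
    """Naive Bayes: P(connector | obs) is proportional to
    P(connector) times the product of P(obs_i | connector)."""
    unnormalized = {}
    for connector, prior in priors.items():
        p = prior
        for attr, value in observations.items():
            p *= likelihoods[connector][attr][value]
        unnormalized[connector] = p
    total = sum(unnormalized.values())
    return {c: p / total for c, p in unnormalized.items()}


# Hypothetical priors and conditional probabilities (not from the study).
priors = {"connector_a": 0.5, "connector_b": 0.5}
likelihoods = {
    "connector_a": {"volume": {"large": 0.8, "small": 0.2}},
    "connector_b": {"volume": {"large": 0.2, "small": 0.8}},
}

result = posterior(priors, likelihoods, {"volume": "large"})
best = max(result, key=result.get)
```

With these made-up numbers, `connector_a` receives a posterior of about 0.8 for a large-volume scenario.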

18 Precision-Recall Results
Error Rate
  – Probability of incorrectly labeling a connector as appropriate for a scenario
Precision
  – The fraction of selected connectors appropriate for a scenario
Recall
  – Probability of detecting a connector as appropriate for a scenario

                        Bayesian    Score-based
True Positive (TP)      101         63
False Positive (FP)     25          200
True Negative (TN)      245         67
False Negative (FN)     19          60

                        Bayesian    Score-based
Error Rate              11.28%      32.56%
Precision               80.16%      48.46%
Recall                  25.90%      16.15%
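As a check on how these percentages relate to the counts, the sketch below computes the measures from a confusion matrix; the function and key names are my own. With the Bayesian counts (TP 101, FP 25, TN 245, FN 19) it reproduces the reported error rate and precision. The reported recall of 25.90% matches TP divided by all 390 connector-scenario decisions, whereas the textbook definition TP / (TP + FN) gives about 84.17%, so both are computed. The score-based percentages appear to be reproduced only when that column's FP and TN counts are read as 67 and 200 rather than 200 and 67.

```python
def metrics(tp, fp, tn, fn):
    """Compute evaluation measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "error_rate": (fp + fn) / total,     # fraction of wrong decisions
        "precision": tp / (tp + fp),         # correct among selected
        "recall_textbook": tp / (tp + fn),   # standard definition of recall
        "tp_over_total": tp / total,         # matches the slide's "Recall" row
    }


def as_percent(value):
    """Format a fraction as a percentage rounded to two decimals."""
    return round(value * 100, 2)


# Bayesian counts as given on the slide (30 scenarios x 13 connectors = 390).
bayesian = metrics(tp=101, fp=25, tn=245, fn=19)

# Score-based counts with FP and TN swapped relative to the slide
# (an assumption made only to show which reading matches the percentages).
score_swapped = metrics(tp=63, fp=67, tn=200, fn=60)
```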

19 Related Work

20 Conclusions & Future Work
Conclusions
  – Domain experts (gurus) rely on tacit knowledge and often cannot explain design rationale
  – DISCO provides a quantification of & framework for understanding an ad hoc process
  – Bayesian algorithm has a higher precision rate
Future Work
  – Explore the tradeoffs between white-box and black-box approaches
  – Investigate the role of architectural mismatch in connectors for data system architectures

21 Thank You! Questions?

