1 Technologies for distributed systems Andrew Jones School of Computer Science Cardiff University.

1 1 Technologies for distributed systems Andrew Jones School of Computer Science Cardiff University

2 2 Need to bring data together To achieve breadth (e.g. coverage of more organisms) To achieve depth (e.g. more complete data on individual species)

3 3 Merging
1. The original databases are physically copied into a new combined database.
2. The user interacts with the new combined database.

4 4 Linking
1. Original databases remain separate, but are accessed via a single system such as a portal.
2. The user interacts with an access system which does not itself contain data. When the user requests data, it is fetched from the appropriate database.

5 5 Basic problems to solve
How to deal with data that is:
–On various database management systems
–Distributed across various machines
–Distributed across various machines of various types
–Based on various schemata (i.e. not all data expressed in the same form)
Also, how to resolve data quality problems:
–taxonomists vary in their opinions
–large taxonomic treatments are generally inconsistent
–individual databases generally have mistakes
–(So we need tools to help biologists detect and resolve such problems, such as LITCHI – not today’s topic!)

6 6 Essential elements of solution Ways of setting up communication between components Ways of expressing data suitably for it to be communicated between components Ways of describing and finding components such as data sources

7 7 Setting up communication
Possibilities include:
–CGI (Common Gateway Interface)-style HTTP requests (a standard for communicating requests to Web servers)
–Z39.50 (a standard for digital libraries)
–Web Services
–DiGIR

8 8 CGI-style HTTP requests Simple way of passing parameters. In one variant (GET), parameters are expressed as part of the URL, e.g. http://www.ildis.org/LegumeWeb?genus~Sabinea&species~florida (NB: POST preferred). Result: an HTML page (see next slide).
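
A minimal sketch of how a client might assemble the GET-style URL shown above. The helper name `build_query_url` is hypothetical, and the use of `~` between parameter name and value is simply copied from the example URL; most CGI-style services use `=` instead.

```python
from urllib.parse import quote

def build_query_url(base, params, sep="~"):
    """Join name/value pairs into a query string appended to the base URL.

    `sep` defaults to "~" only because the example URL on the slide uses it;
    this is an assumption about that one service, not a general convention.
    """
    query = "&".join(f"{quote(name)}{sep}{quote(value)}" for name, value in params.items())
    return f"{base}?{query}"

url = build_query_url(
    "http://www.ildis.org/LegumeWeb",
    {"genus": "Sabinea", "species": "florida"},
)
print(url)
```

The sketch only builds the request string; sending it and handling the HTML that comes back are left out.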

9 9 [Screenshot: the resulting HTML page]

10 10 “CGI” approach: strengths & limitations
Easy to set up
Not good for complex data
HTML is basically a formatting language, for saying how documents should be displayed, not what they contain
But we can pass around XML too
–E.g. SPICE
–Also HTTP is the basis of SOAP (see later)

11 11 SPICE
Species 2000 Interoperability Cooperation Environment
Allows choice between
–HTTP GET/XML response (essentially the CGI approach, but retrieving XML)
–CORBA
Uses wrappers to transform to common data model & SPICE protocols

12 12 SPICE architecture
[Diagram; components shown:]
–User (Web browser)
–CAS (Common Access System): user server module (HTTP), CAS knowledge repository, ‘query’ co-ordinator
–CORBA
–Wrappers (e.g. JDBC, e.g. CGI), each with a (in some cases, generic) CORBA ‘wrapping’ element of GSD wrapper
–GSDs

13 13 Z39.50 A standard for digital libraries (Most library systems are built around this standard) For interoperability in client-server architectures Standardised sets of attributes (items of data)

14 14 Z39.50 strengths & limitations Standard for digital libraries Works well for certain widespread, agreed data standards (‘profiles’) Very restrictive if you want to add on things like extra security Useless in cases where a data standard doesn’t yet exist

15 15 Web Services
Web services provide a simple way of making software available on the Internet.
All the communications in this diagram are SOAP messages.
[Diagram: Service Provider, Service Consumer, Service directory (e.g. UDDI)]
–Provider to directory: register service description (WSDL)
–Consumer to directory: directory query
–Directory to consumer: query responses (WSDL)
–Consumer to provider: XML service request, based on WSDL
–Provider to consumer: XML service response, based on WSDL
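
To give a feel for what an "XML service request" looks like, here is a rough sketch that builds a SOAP 1.1 envelope with Python's standard library. The service namespace `http://example.org/species`, the operation name `GetSpecies` and its two parameters are all hypothetical; in a real deployment they would come from the provider's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/species"              # hypothetical service namespace

def build_request(genus, species):
    """Return a SOAP envelope (as text) asking a made-up GetSpecies operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}GetSpecies")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}genus").text = genus
    ET.SubElement(operation, f"{{{SERVICE_NS}}}species").text = species
    return ET.tostring(envelope, encoding="unicode")

request_xml = build_request("Vicia", "faba")
print(request_xml)
```

The envelope would normally be sent to the provider in an HTTP POST; that step is omitted here.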

19 19 DiGIR Proprietary approach, especially designed for specimen records Uses Darwin Core data model The following slide is the DiGIR team’s high-level architecture diagram...

20 [Diagram: the DiGIR team’s high-level architecture]

21 21 XML (eXtensible Mark-up Language) Flexible mark-up language Like HTML, but tags describe the document’s contents, not how it’s to be displayed. XML is the basis of SOAP: ‘language independent’, i.e. a good data interchange format.

22 22 Simplified Species 2000 example
<!DOCTYPE SPECIESLIST[ ]>
Vicia Faba
Sabinea punicea Urban
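
The element markup of the example above has not survived, so the following is only an illustrative sketch of how such a species list might be tagged and read. Apart from `SPECIESLIST`, every element name (`SPECIES`, `GENUS`, `EPITHET`, `AUTHORITY`) is an assumption, not the actual Species 2000 format.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the SPECIESLIST name comes from the slide.
SPECIES_XML = """
<SPECIESLIST>
  <SPECIES><GENUS>Vicia</GENUS><EPITHET>faba</EPITHET></SPECIES>
  <SPECIES><GENUS>Sabinea</GENUS><EPITHET>punicea</EPITHET><AUTHORITY>Urban</AUTHORITY></SPECIES>
</SPECIESLIST>
"""

def parse_species(xml_text):
    """Turn each SPECIES element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for sp in root.findall("SPECIES"):
        records.append({
            "genus": sp.findtext("GENUS"),
            "epithet": sp.findtext("EPITHET"),
            "authority": sp.findtext("AUTHORITY"),  # None when the element is absent
        })
    return records

species = parse_species(SPECIES_XML)
for record in species:
    print(record)
```

Because the tags name the content (genus, epithet, authority) rather than its appearance, a receiving program can pick out the fields it needs without guessing at layout.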

23 23 Dimensions of interoperability System Syntactic Structural Semantic

24 24 Syntactic interoperability – some problems
Differences in machine-readable aspects of data representation (formatting), e.g.

Vicia faba
Faba faba
…

Genus  Epithet
Vicia  Faba
…

25 25 Syntactic interoperability – some solutions Typically fairly easy to write converters between formats “Wizards” (if we’re going to do data preparation first) XSLT (transforming between XML documents holding same information in different formats)
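
A small sketch of the "converter between formats" idea. It is written in plain Python rather than XSLT, and both XML layouts (a `names` list of full-name strings, and a `specieslist` with separate `genus` and `epithet` elements) are invented for illustration.

```python
import xml.etree.ElementTree as ET

def convert_names(xml_text):
    """Convert <names><name>Genus epithet</name>...</names> into
    <specieslist><species><genus/><epithet/></species>...</specieslist>."""
    source = ET.fromstring(xml_text)
    target = ET.Element("specieslist")
    for name in source.findall("name"):
        # Assumes every name is exactly "Genus epithet"; real data is messier.
        genus, epithet = name.text.split(maxsplit=1)
        sp = ET.SubElement(target, "species")
        ET.SubElement(sp, "genus").text = genus
        ET.SubElement(sp, "epithet").text = epithet
    return ET.tostring(target, encoding="unicode")

converted = convert_names("<names><name>Vicia faba</name><name>Faba faba</name></names>")
print(converted)
```

The same information is present before and after; only the formatting differs, which is what makes this a syntactic rather than a semantic problem.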

26 26 Structural interoperability – some problems Representational heterogeneity that involves data modelling constructs Schematic heterogeneity For example …

27 27 Structural heterogeneity example

Database 1

Id  Name
:   :
25  Vicia faba
26  Faba faba
:   :

Database 2

Id  Genus  Epithet
:   :      :
25  42     9
26  44     9
:   :      :

Id  GenusName
:   :
42  Vicia
43  Abrus
44  Faba
:   :

Id  EpithetName
:   :
8   vulgaris
9   faba
:   :

28 28 Structural interoperability – some solutions Database views XSLT (to some extent) Metadata & ontologies (associate terms in data sources with those in a shared vocabulary) “Wrapping” to map between heterogeneous data sources and a shared representation (common data model)
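
One possible reading of "wrapping", sketched with the rows from the structural heterogeneity example: each wrapper maps its own database layout onto a shared representation. The `SpeciesName` class and the wrapper function names are hypothetical, as is the choice of a genus/epithet pair as the common data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeciesName:
    """Assumed common data model: one genus plus one epithet."""
    genus: str
    epithet: str

def wrap_database1(rows):
    """Database 1 stores the full name as one string per row: (id, name)."""
    return [SpeciesName(*name.split(maxsplit=1)) for _id, name in rows]

def wrap_database2(species_rows, genus_table, epithet_table):
    """Database 2 stores foreign keys into separate genus and epithet tables."""
    return [
        SpeciesName(genus_table[genus_id], epithet_table[epithet_id])
        for _id, genus_id, epithet_id in species_rows
    ]

db1 = [(25, "Vicia faba"), (26, "Faba faba")]
db2_species = [(25, 42, 9), (26, 44, 9)]
db2_genus = {42: "Vicia", 43: "Abrus", 44: "Faba"}
db2_epithet = {8: "vulgaris", 9: "faba"}

from_db1 = wrap_database1(db1)
from_db2 = wrap_database2(db2_species, db2_genus, db2_epithet)
print(from_db1 == from_db2)
```

Once both sources are expressed in the shared form, the rest of the system no longer needs to know how either database is structured.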

29 29 Semantic interoperability – some problems
Specimen distribution data example
–Database A holds data for Vicia faba
–Database B holds data for Faba faba
Descriptive data example
–Database A: leaf length varies from 25.4 to 76.2 mm
–Database B: average leaf length 2 in

30 30 Semantic interoperability – some solutions
It’s not an entirely solved problem!
Useful general techniques:
–Use of ontologies (defining relationships between terms, e.g. units)
–Mapping functions
–Attached metadata
–…
Domain-specific techniques
–“Synonymy server”
–LITCHI (as an integration tool)
–…
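
A rough sketch of two of these techniques applied to the Database A / Database B example: a mapping function for units and a lookup standing in for a synonymy server. The record layout and the `SYNONYMS` table are assumptions; in particular, treating Vicia faba as the accepted name is a taxonomic opinion chosen only for illustration.

```python
MM_PER_INCH = 25.4

# Hypothetical synonymy table: maps a synonym to the name treated as accepted.
SYNONYMS = {"Faba faba": "Vicia faba"}

def accepted_name(name):
    """Stand-in for a synonymy server lookup."""
    return SYNONYMS.get(name, name)

def to_mm(value, unit):
    """Mapping function between the two units used in the example."""
    if unit == "mm":
        return value
    if unit == "in":
        return value * MM_PER_INCH
    raise ValueError(f"unknown unit: {unit}")

def normalise(record):
    """Rewrite a record using the accepted name and millimetres throughout."""
    length = record["leaf_length"]
    unit = length["unit"]
    values = {key: to_mm(val, unit) for key, val in length.items() if key != "unit"}
    return {"name": accepted_name(record["name"]), "leaf_length_mm": values}

record_a = {"name": "Vicia faba", "leaf_length": {"min": 25.4, "max": 76.2, "unit": "mm"}}
record_b = {"name": "Faba faba", "leaf_length": {"mean": 2, "unit": "in"}}

norm_a = normalise(record_a)
norm_b = normalise(record_b)
same_taxon = norm_a["name"] == norm_b["name"]
mean_in_range = (
    norm_a["leaf_length_mm"]["min"]
    <= norm_b["leaf_length_mm"]["mean"]
    <= norm_a["leaf_length_mm"]["max"]
)
print(same_taxon, mean_in_range)
```

Even after normalisation, a range and an average are different kinds of measurement, so the two records can be compared but not simply merged. That residue is part of why the problem is not entirely solved.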

31 31 Ontologies Agreed terminology Relationships between terms Example use: integrator can associate terms in a source database schema with those in an agreed federation schema
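
The example use can be pictured, very roughly, as a table that ties each source's own field names to the agreed federation terms. Everything named here (`source_a`, `source_b`, their field names, and the federation terms `genus` and `epithet`) is hypothetical, and a real ontology would also record relationships between terms rather than just one-to-one renaming.

```python
# Agreed federation vocabulary (assumed for illustration).
FEDERATION_TERMS = {"genus", "epithet"}

# For each source, which local schema term corresponds to which federation term.
TERM_MAP = {
    "source_a": {"GenusName": "genus", "EpithetName": "epithet"},
    "source_b": {"gen": "genus", "sp_epithet": "epithet"},
}

def to_federation(source, record):
    """Rename a source record's fields to the federation terms, dropping unmapped fields."""
    mapping = TERM_MAP[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

rec_a = to_federation("source_a", {"GenusName": "Vicia", "EpithetName": "faba"})
rec_b = to_federation("source_b", {"gen": "Vicia", "sp_epithet": "faba", "internal_id": 25})
print(rec_a == rec_b)
```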

32 32 Summary
Interoperation among distributed resources is essential for ‘added value’
Techniques exist for dealing with
–communication between heterogeneous systems (e.g. Web Services; wrapping)
–communication between systems with heterogeneous data (e.g. ontologies)
But not all the problems are solved!

