Architectural Constraints on Current Bioinformatics Integration Systems Norman Paton Department of Computer Science University of Manchester Manchester,

1 Architectural Constraints on Current Bioinformatics Integration Systems Norman Paton Department of Computer Science University of Manchester Manchester, UK @cs.man.ac.uk

2 Structure of Presentation. Current integration proposals: what they support; what they don't support, and why. Requirements for integration: what could be useful, and why. Grid opportunities: relevant Grid technologies; absent Grid technologies.

3 Current Integration Proposals

4 Classification (Feature: Values)
Data Location: In-situ, Replicated, Reorganised
Integration Model: None, Relational, Semi-Structured, Object-Oriented
Architecture: Thin Client, Client-Server, Multi-Tier
Analysis Support: Function Call, Query, Workflow
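
The classification is a small data structure. Each system takes one or more values for each of four features. The sketch below encodes it as Python types so that systems can be described and compared programmatically. The class and enum names are hypothetical and do not come from the talk. The SRS values are taken from the SRS classification slide.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encoding of the four-feature classification scheme.
class DataLocation(Enum):
    IN_SITU = "In-situ"
    REPLICATED = "Replicated"
    REORGANISED = "Reorganised"

class IntegrationModel(Enum):
    NONE = "None"
    RELATIONAL = "Relational"
    SEMI_STRUCTURED = "Semi-Structured"
    OBJECT_ORIENTED = "Object-Oriented"

class Architecture(Enum):
    THIN_CLIENT = "Thin Client"
    CLIENT_SERVER = "Client-Server"
    MULTI_TIER = "Multi-Tier"

class AnalysisSupport(Enum):
    FUNCTION_CALL = "Function Call"
    QUERY = "Query"
    WORKFLOW = "Workflow"

@dataclass(frozen=True)
class Classification:
    """One system's position in the scheme; each feature may take several values."""
    name: str
    data_location: frozenset[DataLocation]
    integration_model: frozenset[IntegrationModel]
    architecture: frozenset[Architecture]
    analysis_support: frozenset[AnalysisSupport]

    def describe(self) -> str:
        def fmt(values) -> str:
            # Sort the display strings so the output is deterministic.
            return ", ".join(sorted(v.value for v in values))
        return (f"{self.name}: location={fmt(self.data_location)}; "
                f"model={fmt(self.integration_model)}; "
                f"architecture={fmt(self.architecture)}; "
                f"analysis={fmt(self.analysis_support)}")

# SRS, using the values given on the "Classification of SRS" slide.
SRS = Classification(
    name="SRS",
    data_location=frozenset({DataLocation.REPLICATED}),
    integration_model=frozenset({IntegrationModel.NONE}),
    architecture=frozenset({Architecture.THIN_CLIENT}),
    analysis_support=frozenset({AnalysisSupport.FUNCTION_CALL, AnalysisSupport.QUERY}),
)

print(SRS.describe())
# SRS: location=Replicated; model=None; architecture=Thin Client; analysis=Function Call, Query
```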

5 SRS: Sequence Retrieval System (http://srs.ebi.ac.uk/)

6 SRS In Use (screenshot; labelled regions: List of Databases, Search Interfaces, Selected Databases)

7 SRS Results (screenshot; labelled region: Links to Result Records)

8 Classification of SRS (Feature: Values)
Data Location: Replicated
Integration Model: None
Architecture: Thin Client
Analysis Support: Function Call, Query

9 BioNavigator BioNavigator combines data sources and the tools that act over them. As tools act on specific kinds of data, the interface makes available only tools that are applicable to the data in hand. Online trial from: https://www.bionavigator.com/
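
The slide does not say how BioNavigator decides which tools apply to the data in hand. One simple way to get this behaviour is to tag each tool with the kinds of data it accepts, and then filter on the kind of the current object. The sketch below is a hypothetical illustration. The `Tool` class, the data-kind labels and the tool names are assumptions, not BioNavigator's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    # Kinds of data this tool can act on (hypothetical labels).
    accepts: frozenset[str]

def applicable_tools(tools: list[Tool], data_kind: str) -> list[Tool]:
    """Offer only the tools that accept the kind of data currently in hand."""
    return [t for t in tools if data_kind in t.accepts]

# Hypothetical tool registry, for illustration only.
TOOLS = [
    Tool("protein_similarity_search", frozenset({"protein_sequence"})),
    Tool("nucleotide_similarity_search", frozenset({"dna_sequence"})),
    Tool("translate_dna", frozenset({"dna_sequence"})),
    Tool("hydropathy_plot", frozenset({"protein_sequence"})),
]

print([t.name for t in applicable_tools(TOOLS, "dna_sequence")])
# ['nucleotide_similarity_search', 'translate_dna']
```

Because the filter is keyed on data kind, adding a new tool only requires registering what it accepts. The interface does not need to change.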

10 Initiating Navigation (screenshot; labelled steps: Select database, Enter accession number)

11 Viewing Selected Data (screenshot; labelled regions: Relevant display options, Navigate to related programs)

12 Chaining Analyses in Macros Chained collections of navigations can be saved as macros and restored for later use.
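
A macro of this kind can be modelled as a recorded list of step names. The list is serialised when saved and replayed in order when restored. The sketch below is a minimal version under that assumption. The step functions, the `STEPS` registry and the JSON storage format are hypothetical and are not BioNavigator's actual macro format.

```python
import json

# Hypothetical navigation steps: each takes the current data and returns new data.
def reverse_complement(seq: str) -> str:
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# Registry mapping saved step names back to functions (assumed design).
STEPS = {"reverse_complement": reverse_complement, "gc_content": gc_content}

class Macro:
    """A chained collection of steps that can be saved and restored."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def record(self, step_name: str) -> None:
        if step_name not in STEPS:
            raise KeyError(f"unknown step: {step_name}")
        self.steps.append(step_name)

    def save(self) -> str:
        # Only step names are stored, so a saved macro is plain text.
        return json.dumps(self.steps)

    @classmethod
    def load(cls, text: str) -> "Macro":
        return cls(json.loads(text))

    def run(self, data):
        # Feed each step's output into the next step.
        for name in self.steps:
            data = STEPS[name](data)
        return data

m = Macro()
m.record("reverse_complement")
m.record("gc_content")
restored = Macro.load(m.save())
print(restored.run("AACG"))  # 0.5
```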

13 Classification of BioNavigator (Feature: Values)
Data Location: Replicated
Integration Model: None
Architecture: Thin Client
Analysis Support: Function Call, Workflow

14 Current Public Integration Systems. Location: data is replicated, and so under control. Integration model: often minimal. Architecture: often two-tier. Analysis support: query and analysis access is carefully contained. Only a very careful instantiation of the classification yields sufficiently predictable performance.

15 GIMS

16 GIMS: Recent Experience (Feature: Values)
Data Location: Reorganised
Integration Model: Object-Oriented
Architecture: Multi-Tier
Analysis Support: Function Call

17 Example Analysis Data: Yeast genome sequence. Protein-protein interaction data. 350 transcriptome experiments. Overall database ~350Mb. Analysis: Correlate transcription of interacting proteins.
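
The slide does not give the correlation measure used. The sketch below assumes Pearson correlation between the expression profiles of each interacting pair. It takes the interaction data as a list of protein pairs and the transcriptome data as a mapping from gene to values, one value per experiment and in the same experiment order for every gene. The data layout, gene names and values are hypothetical, and this is not the GIMS implementation.

```python
from math import isclose, sqrt
from statistics import mean

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles (NaN if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def interaction_correlations(interactions, expression):
    """Correlate transcription for each interacting pair where both genes have data."""
    result = {}
    for a, b in interactions:
        if a in expression and b in expression:
            result[(a, b)] = pearson(expression[a], expression[b])
    return result

# Hypothetical toy data: three experiments per gene.
expression = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 4.0, 6.0], "geneC": [3.0, 2.0, 1.0]}
interactions = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneZ")]
scores = interaction_correlations(interactions, expression)
print(scores)  # geneZ has no expression data, so that pair is skipped
```

With 350 experiments per gene and many thousands of interactions, this pairwise loop is exactly the kind of computation that becomes hard to run in a single pass.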

18 Features of Experience. It is challenging to conduct an analysis in a single run; analyses must be broken into pieces. These are modest data sets compared with what is coming. The environment has been designed with analysis in mind. These analyses will never make it into the public release!
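
The talk does not say how the analyses were broken up. One generic pattern is to split the input into fixed-size batches, analyse each batch separately, and fold the partial results together. This assumes the analysis decomposes into independent pieces with a combinable result. The function names below are hypothetical.

```python
from itertools import islice

def batches(items, size: int):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def run_in_batches(items, size, analyse, combine, initial):
    """Apply `analyse` to each batch and fold the partial results with `combine`."""
    acc = initial
    for batch in batches(items, size):
        acc = combine(acc, analyse(batch))
    return acc

print(list(batches(range(5), 2)))                                  # [[0, 1], [2, 3], [4]]
print(run_in_batches(range(10), 3, sum, lambda a, b: a + b, 0))    # 45
```

Each batch could be checkpointed or run on a different machine. That is also where this approach stops being simple: the bookkeeping grows with the data.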

19 Requirements for Integration

20 Location: replication is transparent. Integration model: standards. Architecture: flexible, multi-tier. Analysis support: arbitrary analyses over diverse data sets. True integration in bioinformatics should not just be data-oriented; it should also involve the integration of analyses.

21 Three Tier Architecture. Clients handle user interaction and presentation. Application servers perform computation and analysis. Data servers manage and query databases. (Diagram: Client -> Application Server -> Data Server)

22 Three Tier Architecture. Scalability: replace or upgrade components as needed; replace or upgrade layers independently. Flexibility: the application server layer protects clients from changes in the database layer. Classical three tier architectures are configured statically, and are adapted slowly as needs evolve.
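
The flexibility claim can be shown in a few lines. The client talks only to the application server, and the application server talks to the data tier through a fixed interface. The data server can therefore be replaced without touching the client. All class names, the `get_sequence` method and the accession `ACC001` below are hypothetical and serve only to illustrate the layering.

```python
from typing import Protocol

class DataServer(Protocol):
    """Interface the application tier relies on (hypothetical)."""
    def get_sequence(self, accession: str) -> str: ...

class DictDataServer:
    """One data-tier implementation: records held in a dict."""
    def __init__(self, records):
        self._records = dict(records)

    def get_sequence(self, accession: str) -> str:
        return self._records[accession]

class TableDataServer:
    """A replacement data tier with a different internal layout but the same interface."""
    def __init__(self, rows):
        self._rows = list(rows)

    def get_sequence(self, accession: str) -> str:
        for acc, seq in self._rows:
            if acc == accession:
                return seq
        raise KeyError(accession)

class ApplicationServer:
    """Middle tier: performs the analysis; clients never touch the data tier directly."""
    def __init__(self, data: DataServer):
        self.data = data

    def sequence_length(self, accession: str) -> int:
        return len(self.data.get_sequence(accession))

def client(app: ApplicationServer, accession: str) -> str:
    """Presentation tier: formats the result for the user."""
    return f"{accession}: {app.sequence_length(accession)} residues"

records = {"ACC001": "MKTAYIAK"}
print(client(ApplicationServer(DictDataServer(records)), "ACC001"))
print(client(ApplicationServer(TableDataServer(records.items())), "ACC001"))
```

Both calls print the same line, because swapping the data tier is invisible to the client. The wiring itself is still fixed when the program starts, which matches the slide's point that classical three tier systems are configured statically.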

23 Grid Opportunities

24 Necessary and Missing. Necessary: directory services; discovery services; co-allocation; data replication; workload management; accounting and payment. Missing: databases; data models; heterogeneity resolution; personalisation; Web services; standards.

25 Dynamic Multi-Tier (Diagram: one Client, three Application Servers, two Data Servers.) Resources need to be identified, selected and scheduled dynamically.
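
In a dynamic multi-tier setting, the wiring between tiers is not fixed in advance. Each request looks up candidate servers in a directory and picks one at run time. The sketch below illustrates this with a hypothetical in-memory directory and a least-loaded selection policy. It is not a Grid middleware API. The `Resource` fields, the service names and the load metric are all assumptions. A real deployment would rely on the directory and discovery services listed as necessary on the previous slide.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str            # "application" or "data" tier
    services: set[str]   # capabilities the resource advertises (hypothetical labels)
    load: float          # hypothetical load metric; 0.0 means idle

@dataclass
class Directory:
    resources: list[Resource] = field(default_factory=list)

    def register(self, r: Resource) -> None:
        self.resources.append(r)

    def discover(self, kind: str, service: str) -> list[Resource]:
        """Identify: all resources of a tier that advertise the service."""
        return [r for r in self.resources if r.kind == kind and service in r.services]

    def select(self, kind: str, service: str) -> Resource:
        """Select: the least-loaded candidate (an assumed policy)."""
        candidates = self.discover(kind, service)
        if not candidates:
            raise LookupError(f"no {kind} server offers {service}")
        return min(candidates, key=lambda r: r.load)

d = Directory()
d.register(Resource("app1", "application", {"similarity_search"}, 0.8))
d.register(Resource("app2", "application", {"similarity_search"}, 0.2))
d.register(Resource("app3", "application", {"alignment"}, 0.1))
d.register(Resource("data1", "data", {"sequences"}, 0.5))
print(d.select("application", "similarity_search").name)  # app2
```

Scheduling, co-allocation and accounting would sit on top of this lookup. They are omitted here.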

26 Grid Classification (Feature: Values)
Data Location: In-situ, Replicated
Integration Model: None
Architecture: Multi-Tier
Analysis Support: Function Call, Workflow
The current Grid is not the answer, but the answer subsumes the current facilities of the Grid.

27 Summary. Current integration facilities in biology are cunningly restrictive, and make the most of limited distributed computational architectures. The Grid is bringing to the table: resource description facilities; resource scheduling and workflow management facilities. The Grid does not directly address current needs in biology, but its descendants may.

