Extending FuGE into other domains
Andrew Jones, School of Computer Science, University of Manchester


FuGE Status
Functional Genomics Experiment-[OM/ML]
Milestone 1 release: Sept 2005
  – UML (Object Model)
  – XML Schema
Milestone release being tested by MGED for MAGE-2
We are looking at testing it in PSI context
  – As a basis for GelML and Gel IML
  – For Investigation structure, sample prep etc.

mzData
mzData v 1.05 is the current release
Question: How do we integrate mzData into proteome workflows?
  – Details of sample prep
  – Other separation techniques
FuGE could be used in different ways:
  – Giving mzData a context within a workflow, OR
  – As a basis for future mzData versions

Experiment retrofitting mzData
FuGE can store Software, Equipment, Contact
mzData has similar concepts
FuGE has Protocol / ProtocolApplication
  – This is the mechanism for storing parameters and parameter values
  – Slightly different from parameter definition in mzData
There is not always a 1:1 mapping between an mzData concept and a FuGE concept

An mzData–FuGE hybrid format?
mzData could sub-class from FuGE elements
Would require some changes in structure
  – Some would appear relatively arbitrary
Few major benefits at this stage, given that the current structure is being implemented
  – May be worth considering for mzData v 2, if FuGE is successful in other domains

Referencing an mzData file with FuGE
Diagram elements: Protocol, ProtocolApplication, Material (inputMaterial), ExternalData (outputData), mzData file, file format definition
Protocol definition says “See ExternalData file for parameters” (rather than storing params in Protocol)
Parser will exist to extract data / parameters from mzData file
Material can be used to describe the sample. This connects the MS data with a separation workflow

Referencing an mzData file with FuGE
Reference to the Data object that lives elsewhere in the document
Reference to the Protocol object
etc…
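The referencing pattern described on these slides can be sketched roughly as follows. This is only an illustration in Python, not the actual FuGE schema: the element names, the attribute names (`Protocol_ref`, `Material_ref`, `Data_ref`, `location`), the identifiers and the file name are all assumptions chosen to mirror the slide labels.

```python
import xml.etree.ElementTree as ET

# Every element name, attribute name, identifier and file name here is
# hypothetical: chosen to mirror the slide labels, not the real FuGE schema.
def build_reference_document(mzdata_path: str) -> ET.Element:
    root = ET.Element("FuGE")

    # Material describes the sample that was analysed.
    ET.SubElement(root, "Material", identifier="Material.1")

    # ExternalData points at the mzData file instead of embedding its content.
    ET.SubElement(root, "ExternalData", identifier="Data.1", location=mzdata_path)

    # The Protocol defers to the external file for its parameters.
    protocol = ET.SubElement(root, "Protocol", identifier="Protocol.1")
    protocol.text = "See ExternalData file for parameters"

    # ProtocolApplication links protocol, input sample and output data by reference.
    app = ET.SubElement(root, "ProtocolApplication",
                        identifier="ProtocolApplication.1",
                        Protocol_ref="Protocol.1")
    ET.SubElement(app, "inputMaterial", Material_ref="Material.1")
    ET.SubElement(app, "outputData", Data_ref="Data.1")
    return root


doc = build_reference_document("example.mzData.xml")
print(ET.tostring(doc, encoding="unicode"))
```

The point of the sketch is that the mass spec parameters never enter the FuGE document; only identifiers and a file location do, so a separate mzData parser is still needed to read the values.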

mzData conclusions
Referencing an mzData file from FuGE will give mass spec data a context in a proteome workflow
Retrofitting the model is likely to require structural changes
  – Not likely to be welcomed if it is already being implemented
May be benefits for major future revisions using some FuGE concepts
  – Will be demonstrated by successful deployment of FuGE in other domains

Using FuGE for GelML, Gel IML

Existing Gel Models
PEDRo
Frank Gibson’s gel extension
GELI
Gla-PSI model (Jones 2003)
AGML
Swiss 2D PAGE
In-house models from software companies
… and various others that I’ve forgotten!
What features do we want for GelML?

Extending FuGE into a gel model
Concepts of FuGE to understand
  – Protocol / ProtocolApplication
  – Material
  – Data
Should be extended with more specific types
Software and Equipment could be extended
  – But use of ontology terms may suffice
Demo models of gel separation and gel data
  – Mainly structural; fine details must be worked out in PSI context

Stage 1 - replace FuGE Root
…
This allows GelML to inherit all FuGE functionality

Stage 2 - represent a gel
…
Extend from FuGE Material type
Add whatever attributes are required

Stage 3 - represent gel separation: example schema
A reference to the Gel material on which the separation takes place … closing tags
Extend from ProtocolApplication. This allows definition of input and output samples
Reference the details of the gel used
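The three stages amount to an inheritance hierarchy, which might be pictured with the following Python sketch. The classes are simplified stand-ins and not FuGE's real definitions; every attribute name (including `percent_acrylamide` and `ph_range`) is a hypothetical example of "whatever attributes are required".

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified stand-ins for FuGE core classes; attribute names are assumptions.
@dataclass
class Material:
    identifier: str
    name: str = ""

@dataclass
class ProtocolApplication:
    identifier: str
    protocol_ref: str
    input_materials: List[Material] = field(default_factory=list)
    output_materials: List[Material] = field(default_factory=list)

# Stage 2: a gel is a Material with gel-specific attributes added (hypothetical ones here).
@dataclass
class Gel2D(Material):
    percent_acrylamide: float = 0.0
    ph_range: str = ""

# Stage 3: a gel separation is a ProtocolApplication that also references its gel.
@dataclass
class GelSeparation(ProtocolApplication):
    gel: Optional[Gel2D] = None


sample = Material(identifier="Material.sample1", name="demo sample")
gel = Gel2D(identifier="Gel.1", name="demo gel", percent_acrylamide=12.0)
separation = GelSeparation(identifier="Sep.1", protocol_ref="Protocol.1",
                           input_materials=[sample], gel=gel)
print(isinstance(separation, ProtocolApplication), isinstance(gel, Material))
```

Because `GelSeparation` extends `ProtocolApplication`, it gets input and output samples without redefining them; only the gel reference is new.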

Example XML for gel separation
Define input materials for the separation
Reference to the Material used as input, specified elsewhere, etc.
Reference the Protocol that defines this gel separation
Reference the details of the gel used
KDa 50000KDa

Example XML – define output spots
…
Could also define output materials, e.g. PhysicalGelSpot
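The example XML itself is not preserved in this transcript. As a rough illustration of the by-reference style the annotations describe, the sketch below parses a made-up fragment and resolves its references. The element names, the `*_ref` attribute names and all identifiers are assumptions, not taken from GelML or FuGE.

```python
import xml.etree.ElementTree as ET

# A made-up fragment: element names, *_ref attributes and identifiers are hypothetical.
SAMPLE = """
<GelML>
  <Material identifier="Material.sample1"/>
  <Gel2D identifier="Gel.1"/>
  <Protocol identifier="Protocol.2DGE"/>
  <GelSeparation identifier="Sep.1" Protocol_ref="Protocol.2DGE" Gel_ref="Gel.1">
    <inputMaterial Material_ref="Material.sample1"/>
  </GelSeparation>
</GelML>
"""

def resolve(root: ET.Element, identifier: str) -> ET.Element:
    """Return the element whose 'identifier' attribute matches, wherever it lives."""
    for element in root.iter():
        if element.get("identifier") == identifier:
            return element
    raise KeyError(identifier)


root = ET.fromstring(SAMPLE.strip())
separation = root.find("GelSeparation")

# Follow each reference from the separation to the object defined elsewhere.
gel = resolve(root, separation.get("Gel_ref"))
protocol = resolve(root, separation.get("Protocol_ref"))
inputs = [resolve(root, m.get("Material_ref")) for m in separation.findall("inputMaterial")]
print(gel.tag, protocol.tag, [m.get("identifier") for m in inputs])
```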

1D Gels
Gel1D and GelLane defined separately
GelLane has its own ProtocolApplication to reference its inputMaterials
The separation within a lane produces PhysicalBands
  – These can go on for further treatment

Gel data: Gel IML
Demo model of gel images and spots
  – Will demo very simple example here
  – More complex gel spot model can be discussed
Some issues with representing data in XML

Representing a gel image
…
FuGE.ExternalData allows a reference to an external file
Could give additional attributes by extending this base type

Example Gel Image XML
Note: Pixel sizes are in fact redundant (they could be deduced from the image)

Sample XML for Gel Spots
<GelDataSet xmlns="…" identifier="PSI.GelDataSet.1001" storage=" Matrix of values goes here or matrix could be in an external file ">

Ways to take this forward
For separation (GelML)
  – Decide what additional attributes and associations are required
  – Extend ProtocolApplication and Material
For Gel IML
  – Extend Data, Dimension, DimensionElement
  – These give facilities for linking to Investigation structure
  – We have some demo models of these parts
  – Important: involve software manufacturers!

UML or XSD?
Depends on expertise of developers
FuGE exists as UML and XML Schema
XML Schema auto-generated from UML
  – Can build UML and output XML Schema, OR
  – Extend directly from XML Schema
Need weekly conference call involving interested participants

Benefits of using FuGE
Format gets model of Investigation for free
  – Method for linking experimental factors to Data dimensions
Protocol system is very flexible
  – E.g. for storing sample prep details
Ontology reference system works for simple term lists and complex concepts
Future integration with other ‘omics

…. some other issues for discussion

Issues with multidimensional data in XML
Two ways of representing multi-dimensional data in XML:
Option 1: Define data dimensions in XML and store data in a non-XML multi-dimensional array
  – Need more than just an XML parser to “understand” the data values
Option 2: Define data dimensions and store values with the leaf elements (DimensionElements)
  – For large data sets, XML is a highly inefficient structure for representing multi-dimensional data

Option 1: Data Dimensions for gel electrophoresis
Data
  Dimension (Variables), ordered: DE (1h), DE (2h), DE (4h)
  Dimension (Gel Spots), ordered: DE (Spot1), DE (Spot2), DE (Spot3) … DE (Spot3000)
  Dimension (Spot Measures), ordered: DE (YCoord), DE (Area), DE (XCoord), DE (Intensity)
  Matrix of values
Result is 3011 references: 3 dimensions + 3 variables + 3000 spots + 4 spot measures + 1 to matrix of values
Coordinates define where in matrix to find value: e.g. 1 hour time point, spot 255, Area is at position [ ]
Note: This does not imply that spots with the same ID number on different gels correspond with each other. We need a different mechanism for this!
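How coordinates could locate a value in the external matrix can be sketched as below. The slides do not say how the matrix is laid out, so row-major storage is an assumption here, as is the order of elements within each dimension; `flat_index` is a hypothetical helper, not part of FuGE.

```python
from typing import List, Sequence

def flat_index(dims: Sequence[List[str]], coord: Sequence[str]) -> int:
    """Row-major offset of `coord` in a matrix whose ordered axes are `dims`."""
    index = 0
    for elements, label in zip(dims, coord):
        # Shift by the size of this axis, then add the position along it.
        index = index * len(elements) + elements.index(label)
    return index


# Ordered dimension elements; the ordering within each list is assumed.
variables = ["1h", "2h", "4h"]
spots = [f"Spot{i}" for i in range(1, 3001)]
measures = ["YCoord", "Area", "XCoord", "Intensity"]
dims = [variables, spots, measures]

offset = flat_index(dims, ("2h", "Spot3", "Intensity"))
print(offset)  # 12011
```

This is why the dimensions need to be ordered: the position of each DimensionElement within its dimension is what turns a coordinate into an offset.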

Option 2: Data Dimensions for gel electrophoresis
Data
  DE (1h), DE (2h), DE (4h)
  DE (Spot1), DE (Spot2), DE (Spot3) … DE (Spot3000)
  Leaf elements: DE (Area) Value = [ ], DE (XCoord) Value = [ ], DE (Intensity) Value = [ ], DE (YCoord) Value = [ ]
Result is 3 variables × 3000 spots × 4 spot measures = 36,000 references
Values defined with the DimensionElements
As data volumes become larger, or the number of dimensions increases, pure XML is a highly inefficient structure for representing this kind of data.
Imagine extra variables are being tested: time course (4 time points), drug dose (5 doses) and strain (5), with an average of 3000 gel spots and, say, 4 spot measures
This is 4 × 5 × 5 × 3000 × 4 = 1.2 million references
The XML parser would be very slow!
Compared with 3018 references in the first case…
The first case works because the data matrix can be loaded into memory and positions can be accessed directly.
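The reference counts for the two options can be checked with a few lines of arithmetic. The two functions are hypothetical helpers that encode the counting rules as stated on the slides: Option 1 counts one reference per dimension, one per dimension element and one to the matrix; Option 2 counts one leaf per combination of elements.

```python
from math import prod
from typing import Sequence

def option1_references(dimension_sizes: Sequence[int]) -> int:
    # One per dimension, one per dimension element, plus one to the matrix of values.
    return len(dimension_sizes) + sum(dimension_sizes) + 1

def option2_references(dimension_sizes: Sequence[int]) -> int:
    # One value-bearing leaf element per combination of dimension elements.
    return prod(dimension_sizes)


small = [3, 3000, 4]        # time points, spots, spot measures
large = [4, 5, 5, 3000, 4]  # time points, doses, strains, spots, spot measures

print(option1_references(small), option2_references(small))  # 3011 36000
print(option1_references(large), option2_references(large))  # 3024 1200000
print(sum(large))                                            # 3018
```

Note that the slide's figure of 3018 for the larger case equals the sum of the dimension elements alone; counting dimensions and the matrix reference as in the first example gives 3024. Either way, the Option 1 count grows additively while the Option 2 count grows multiplicatively.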

Restriction vs. Extension
Note FuGE is supposed to be used by extending core classes
  – i.e. giving additional attributes or associations
May be cases where we want to restrict parts of FuGE
  – E.g. may not want a type of ProtocolApplication to have input or output data
Problem is that there is no good way of doing this in XML or UML or code!
  – xs:restriction is not well supported for complex types
  – May be best to do this with documentation

Use of general relationships vs. specific
Example: ProtocolApplication has input and output materials
Is a Gel2D an “inputMaterial” or do we want a specific association from GelSeparation?
Do we want a typed output, SeparatedGel, for use with a defined scanning event etc.?
Do we want to use the outputMaterials association for SeparatedGel or a new specific association?
  – Should outputMaterials include gel spots…?
These are all discussion issues

Some other issues to solve
How to relate spots across gels
  – May also be measures associated with these values (ratios, relative abundance, confidence indicators etc.)
  – This could be another Data object that contains some of the same Dimensions
How to relate different channels on a DIGE gel

Related Spots across gels
Data
  Dimension (RelatedSpots): RelatedSpotGroup 1, RelatedSpotGroup 2, RelatedSpotGroup 3
  Dimension (SingleSpots): SingleSpot1, SingleSpot2, SingleSpot3
  Dimension (Spot Measures): CompVolume, CompositeArea
  Matrix of values (ordered)
Typical value of SingleSpot = SpotGroup1.123, i.e. this is a reference to how to find the single spot in the other data set

Related Spots across gels
For the microarray case, it is implied that the features match across dimensions; this is not true for gel electrophoresis
One way to define it is to specify a list of composite spots

DIGE data in FuGE
Data
  Dimension (DIGE Conditions): Condition 1, Condition 2
  Dimension (DIGESingleSpots): Spot1, Spot2, Spot3
  Dimension (Spot Measures): Area, Volume, Composite SpotID, Ratio
  Matrix of values (ordered)