
May 8, 2006 MAGE v1 and MAGE v2 Michael Miller Lead Software Developer Rosetta Biosoftware NCI MAGE Jamboree.


1 MAGE v1 and MAGE v2
Michael Miller, Lead Software Developer, Rosetta Biosoftware
NCI MAGE Jamboree, May 8, 2006

2 Overview
Effective MAGE: using MAGE v1
– Perception vs. Reality
– Parsing
– Import
MAGE v2
– FuGE
– MAGE v2
Links
Acknowledgements

3 Effective MAGE: using MAGE v1
Perception vs. Reality
Perception
– MAGE is too complicated to be used
Reality
– Gene expression experiments are complex
– Any attempt to fully exchange the data and annotation for gene expression will be complex
– Any attempt will need to gather together the information and export it, and the receiving application will have to import it
– These are not MAGE problems
– There are many ways MAGE can be used effectively, some of which follow

4 Effective MAGE: using MAGE v1
Parsing
Design Principles
– UML class → Java class
– XML Instances → Row in Java Instance
    • Attributes are typed lists
    • Associations are lists of lists
    • Lists are backed by primitive arrays (not Object arrays)
– Parsing dirt simple
– Data Handling
    • Abstract the cube
    • Concrete the mapping
– Parsing application neutral

5 Effective MAGE: using MAGE v1
Parsing Design Principles
UML class → Java class
– All classes derive from a common abstract class, MAGEcache
– All associations derive from a common interface, MAGEobject

6 Effective MAGE: using MAGE v1
Parsing Design Principles
XML Instances → Rows in Java Instance
– Each element representing an instance of a UML class becomes a row in the Java class
– The attributes for that class are each set in the appropriate list on the attributes List
– Each nested UML class element, when it has finished being parsed, passes its current row up through the association class, where it is added to the appropriate associations List
– The associations List is a List of Lists
[Figure labels: UML Classes, Java Classes]
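
The row-per-element idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names IntColumn and ClassCache, and all of their methods, are hypothetical and are not taken from the presentation or from the MAGE toolkit. It shows one cache object per UML class, an attribute held as a typed list, and an association held as a list of lists backed by a primitive int array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical growable list of ints backed by a primitive array, not Object[]. */
final class IntColumn {
    private int[] data = new int[4];
    private int size;

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow geometrically
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }
}

/** Hypothetical per-class cache: one object per UML class, one row per XML element. */
final class ClassCache {
    final String className;
    // Attribute column: identifiers.get(row) is the identifier of that row.
    private final List<String> identifiers = new ArrayList<>();
    // Association column, a list of lists: children.get(row) holds child row numbers.
    private final List<IntColumn> children = new ArrayList<>();

    ClassCache(String className) {
        this.className = className;
    }

    /** Starts a new row for a parsed element and returns its row number. */
    int newRow(String identifier) {
        identifiers.add(identifier);
        children.add(new IntColumn());
        return identifiers.size() - 1;
    }

    /** Called when a nested element finishes: records its row on the parent row. */
    void addChild(int parentRow, int childRow) {
        children.get(parentRow).add(childRow);
    }

    String identifier(int row) {
        return identifiers.get(row);
    }

    int childCount(int row) {
        return children.get(row).size();
    }

    int child(int row, int i) {
        return children.get(row).get(i);
    }
}
```

Storing child row numbers as ints rather than object references is one way to keep a large document compact in memory, which appears to be the motivation for the primitive-array backing.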

7 Effective MAGE: using MAGE v1
Parsing Design Principles
Parsing dirt simple
– Implement SAX startElement() and endElement()
startElement
– Resolve Java class name and retrieve instance from the cache
– Call MAGEcache.startElement() with attributes, the containing class, and whether to treat the record as a reference
endElement
– Special case the DataInternal class to treat #PCData
– Call MAGEcache.endElement() to connect the containing class to the nested class

    // standard SAX interfaces
    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) {
        // check for ref
        boolean isRef = false;
        int index = -1;
        if (-1 != (index = localName.lastIndexOf("_ref"))) {
            isRef = true;
            // look up the actual class
            localName = localName.substring(0, index);
        }
        MAGEobject curObject = (MAGEobject) caches.get(localName);
        // the top of the stack is the containing element
        curObject.startElement(atts,
                (MAGEobject) stack.get(stack.size() - 1), isRef);
        stack.add(curObject);
    }

    public void endElement(String namespaceURI, String localName,
            String qName) throws SAXException {
        String pcData = chars.toString();
        if (0 < pcData.trim().length()) {
            // Have #PCData so simulate startElement() and endElement() for the
            // container and class PCData
            AttributesImpl atts = new AttributesImpl();
            startElement("", "PCData_assn", "PCData_assn", atts);
            atts.addAttribute("", "pcData", "pcData", "CDATA", pcData);
            startElement("", "PCData", "PCData", atts);
            // Do this now or the endElement() call will fall in here again!
            chars.setLength(0);
            endElement("", "PCData", "PCData");
            endElement("", "PCData_assn", "PCData_assn");
        }
        MAGEobject curObject = (MAGEobject) stack.remove(stack.size() - 1);
        curObject.endElement();
    }

8 Effective MAGE: using MAGE v1
Parsing Design Principles
Data Handling
– Abstract the DataCube as a linear list
– Concrete a set of Lists grouped by BioAssays with a List per QuantitationType, with each List’s size allocated to countDE
– Depending on the BioDataCube.order attribute, set up two arrays, for example for order ‘DBQ’:
    dimSize[] = {countDE, countBA, countQT}
    dimIndices[] = {1, 0, 2} (where 0=BA, 1=DE, 2=QT)
– Then loop for each value: values[] is the set of Lists per BioAssay per QuantitationType, nextValue is the next parsed value from the linearized DataCube (details left out)

    // dimSize[p] is the size of loop p (outermost first);
    // dimIndices[d] is the loop position of dimension d (0=BA, 1=DE, 2=QT)
    int counters[] = { 0, 0, 0 };
    for (counters[0] = 0; counters[0] < dimSize[0]; counters[0]++) {
        for (counters[1] = 0; counters[1] < dimSize[1]; counters[1]++) {
            for (counters[2] = 0; counters[2] < dimSize[2]; counters[2]++) {
                // list index = BioAssay index * countQT + QuantitationType index;
                // position within the list = DesignElement index
                values[counters[dimIndices[0]] * countQT + counters[dimIndices[2]]]
                        .set(nextValue, counters[dimIndices[1]]);
            }
        }
    }

– Mathmagically the values will end up in the correct place no matter the order
[Figure: the data cube, with axes QT (countQT), DE (countDE), and BA (countBA)]
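
The slide leaves the details out, so here is a self-contained sketch of the same idea for checking the index arithmetic. It rests on assumptions that are not in the presentation: the class name CubeDemux and the method demux are hypothetical, plain double arrays stand in for the typed Lists, and the order string is assumed to be a permutation of the letters B, D and Q.

```java
/** Hypothetical demultiplexer for a linearized data cube (0 = BA, 1 = DE, 2 = QT). */
final class CubeDemux {

    /**
     * Returns values[ba * countQT + qt][de]: one array per BioAssay per
     * QuantitationType, each of length countDE.
     */
    static double[][] demux(String order, int countBA, int countDE, int countQT,
            double[] linear) {
        if (order.length() != 3) {
            throw new IllegalArgumentException("order must have 3 letters: " + order);
        }
        if (linear.length != countBA * countDE * countQT) {
            throw new IllegalArgumentException("unexpected number of values");
        }
        int[] countByDim = { countBA, countDE, countQT };
        int[] dimSize = new int[3];    // size of each loop, outermost first
        int[] dimIndices = new int[3]; // dimIndices[d] = loop position of dimension d
        boolean[] seen = new boolean[3];
        for (int pos = 0; pos < 3; pos++) {
            int dim = "BDQ".indexOf(order.charAt(pos)); // B -> 0, D -> 1, Q -> 2
            if (dim < 0 || seen[dim]) {
                throw new IllegalArgumentException("bad order: " + order);
            }
            seen[dim] = true;
            dimSize[pos] = countByDim[dim];
            dimIndices[dim] = pos;
        }

        double[][] values = new double[countBA * countQT][countDE];
        int next = 0; // position in the linearized cube
        int[] counters = { 0, 0, 0 };
        for (counters[0] = 0; counters[0] < dimSize[0]; counters[0]++) {
            for (counters[1] = 0; counters[1] < dimSize[1]; counters[1]++) {
                for (counters[2] = 0; counters[2] < dimSize[2]; counters[2]++) {
                    int ba = counters[dimIndices[0]];
                    int de = counters[dimIndices[1]];
                    int qt = counters[dimIndices[2]];
                    values[ba * countQT + qt][de] = linear[next++];
                }
            }
        }
        return values;
    }
}
```

Because the three loops always run in the order the file was written while the target index is computed from the dimension codes, the same loop body serves every permutation of the order attribute.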

9 Effective MAGE: using MAGE v1
Parsing Design Principles
Parsing application neutral
– Well understood point
– Can implement efficiencies such as sliding windows or delayed parse, as long as application logic remains separate from parsing

10 Effective MAGE: using MAGE v1
Import
Design Principles
– Mapping
    • Between applications
    • From MAGE to Application
– Import
    • Parsing produces in-memory MAGE
    • Used in a similar way as a DOM interface
    • From MAGE to MAGE
– Pipelines
    • MAGE tailored to source of data
    • From small pieces comes completeness
– Export
    • Use startElement() and endElement() methods
    • Methods per association to add an association to either a reference or an owned instance

11 Effective MAGE: using MAGE v1
Import Design Principles
Mapping
– Between applications
    • For collaborative efforts or intradepartmental integration
    • Known source and target
    • First things first: determine where the source data and annotation will end up at the target WITHOUT considering how

    Table          Column           Table     Column
    HYBRIDIZATION  SAMPLE_ID        HYB       PREP_ID
                   PREFORMED_BY_ID            OPERATOR_ID
                   MACHINE_ID                 HARDWARE_ID
    SAMPLE         SAMPLE_NAME      PREP      IDENTIFIER
    MACHINE        SERIAL_NUMBER    HARDWARE  UUID
                   MAKE                       MODEL

Mapping
– From MAGE to Application
    • Map between table/column and location in the MAGE file
    • Problematic in those areas where choice is possible; mitigated by determining the producer of the MAGE file

    Table          Column           XPath
    HYBRIDIZATION  SAMPLE_ID        //Hybridization/@identifier
                   PREFORMED_BY_ID  //Hybridization//ProtocolApplication//Person_ref/@identifier = //Person/@identifier
                                      and //Person/Roles_assnlist/OntologyEntry/@value = "Performed Hyb"
                                      ? //Hybridization//ProtocolApplication//Person_ref/@identifier
                   MACHINE_ID       //Hybridization//ProtocolApplication//HardwareApplication//Hardware_ref/@identifier
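
A table/column to XPath mapping of this kind can be applied with the standard javax.xml.xpath API, as in the rough sketch below. The class XPathMapping, its extract method and the SAMPLE_XML fragment are hypothetical; the fragment is a simplified, made-up document that only loosely resembles MAGE-ML nesting, and only the two unconditional XPaths from the slide are used.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper that fills one table row from a column -> XPath mapping. */
final class XPathMapping {

    /** Simplified, made-up fragment; not a complete or valid MAGE-ML document. */
    static final String SAMPLE_XML =
            "<Hybridization identifier=\"Hyb:1\">"
            + "<ProtocolApplication>"
            + "<HardwareApplication>"
            + "<Hardware_ref identifier=\"HW:42\"/>"
            + "</HardwareApplication>"
            + "</ProtocolApplication>"
            + "</Hybridization>";

    /** Evaluates each column's XPath against the document and returns column -> value. */
    static Map<String, String> extract(String xml, Map<String, String> columnToXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : columnToXPath.entrySet()) {
                // evaluate(String, Object) returns the string value of the first match,
                // or an empty string when nothing matches
                row.put(entry.getKey(), xpath.evaluate(entry.getValue(), doc));
            }
            return row;
        } catch (Exception ex) {
            // Sketch only: wrap parser and XPath failures in an unchecked exception
            throw new IllegalStateException("could not apply mapping", ex);
        }
    }
}
```

Keeping the mapping as data rather than code means a different producer of the MAGE file can be handled by swapping the XPath strings, which is the mitigation the slide mentions.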

12 Effective MAGE: using MAGE v1
Import Design Principles
Import
– Parsing produces in-memory MAGE
    • Application neutral
    • Can have various handlers defined:
        – Translate into a different representation in memory (see MAGE to MAGE)
        – Adjust contents to application-specific requirements
        – Save to database by applying mapping rules
– Used in a similar way as a DOM interface
    • Not DOM, but handlers can traverse the structure and obtain contents for any instance of a class that was in the XML document
– From MAGE to MAGE
    • Between a model where there is a single Java class for all instances and a model where there is one Java class per instance (the current STK)
        – Generate a method that takes the representation of the other
    • Between a tab-delimited representation and Java representation (MAGE-TAB to MAGEstk)

13 Effective MAGE: using MAGE v1
Import Design Principles
Pipelines
– MAGE tailored to source of data
    • The mapping then becomes between the source and the target
    • Export is defined by the target MAGE mapping
    • Developer on export side doesn’t need to know MAGE; it is an exercise in string formatting

    <Hybridization identifier="{PREFIX}:Hyb:{HYB.IDENTIFIER}" name="{HYB_NAME}">
        <Protocol_ref identifier="{PROTOCOL.IDENTIFIER}" name="{PROTOCOL.COMMON_NAME}"/>
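
To illustrate the "exercise in string formatting", a placeholder-filling helper might look like the sketch below. The class MageTemplate and its methods are hypothetical, and the {NAME} placeholder syntax is taken from the slide's template; how the exporting application actually substituted values is not described in the presentation. The sketch also escapes XML special characters, an added assumption so that substituted values cannot break the attribute quoting.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical template filler for {PLACEHOLDER} tokens in an XML attribute template. */
final class MageTemplate {
    // Matches tokens such as {PREFIX} or {HYB.IDENTIFIER}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z0-9_.]+)\\}");

    /** Escapes the characters that are unsafe inside a double-quoted XML attribute. */
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Replaces every placeholder with its escaped value; fails on an unknown name. */
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = values.get(key);
            if (value == null) {
                throw new IllegalArgumentException("no value for {" + key + "}");
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            matcher.appendReplacement(out, Matcher.quoteReplacement(escapeAttribute(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Failing loudly on a missing value, rather than emitting an empty attribute, is a design choice in this sketch: identifiers are what later link the pieces together, so a silent gap would be hard to trace.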

14 Effective MAGE: using MAGE v1
Import Design Principles
Pipelines
– From small pieces comes completeness
    • Business rules for an application define the order of import and what placeholders can be created for records that aren’t received yet
    • Pipelines for different packages/elements that make up an experiment, for example:
        – One or more ArrayDesign XML documents
        – An Experiment/ExperimentDesign XML document
        – Multiple Array XML documents
        – Multiple MeasuredBioAssay/MeasuredBioAssayData (from feature extraction)
        – Multiple PhysicalBioAssay/Image XML documents
        – Multiple BioMaterial XML documents
    • Once pipelines for a particular experiment are all run, the experiment is completed
    • It is not even necessary to use only MAGE format or only XML format; identifier attributes provide the glue
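
One possible way to let identifiers "provide the glue" across separately imported documents is a registry that creates a placeholder the first time an identifier is referenced and completes it when the defining document arrives. The sketch below is an assumption about how such business rules could be coded; MageRecord and IdentifierRegistry are hypothetical names and are not part of the presentation or of MAGEstk.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical imported record; name stays null until its defining document arrives. */
final class MageRecord {
    final String identifier;
    String name;
    boolean placeholder;

    MageRecord(String identifier, boolean placeholder) {
        this.identifier = identifier;
        this.placeholder = placeholder;
    }
}

/** Hypothetical registry that links documents from different pipelines by identifier. */
final class IdentifierRegistry {
    private final Map<String, MageRecord> byIdentifier = new HashMap<>();

    /** A reference was seen: return the record, creating a placeholder if needed. */
    MageRecord reference(String identifier) {
        return byIdentifier.computeIfAbsent(identifier, id -> new MageRecord(id, true));
    }

    /** The full element was seen: fill in the record, reusing any placeholder. */
    MageRecord define(String identifier, String name) {
        MageRecord record =
                byIdentifier.computeIfAbsent(identifier, id -> new MageRecord(id, false));
        record.name = name;
        record.placeholder = false;
        return record;
    }

    /** True once every referenced record has also been defined. */
    boolean isComplete() {
        for (MageRecord record : byIdentifier.values()) {
            if (record.placeholder) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, the pipelines can run in any order the business rules allow, and isComplete() gives a simple check that all pieces of the experiment have arrived.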

