
Introduction to maxdLoad2 – EnvGen


1 Introduction to maxdLoad2 – EnvGen http://www.bioinf.man.ac.uk/microarray/
Presentation Overview
Phase One:
- Introduction to the software
- Modelling microarray experiments
- The user interface
Phase Two:
- Bulk data loading
- Customising the database
- MAGE-ML export

2 maxdLoad2: An extensible, MIAME-compliant database for microarray experiments
- A database schema and a software application.
- The second generation of maxdLoad.
- Integrated data loading, browsing, editing and searching.
- Written in Java™, so it runs on most computers.
- Supports any SQL92 database: Oracle, MySQL, Postgres, Sybase, Firebird.

3 Main Features
- Loading, browsing, editing and searching.
- Extensible: customisable attributes for each part of the schema.
- MIAME data capture.
- MAGE-ML data export.

4 Evolution of maxdLoad2
- The ‘maxd’ software has been in development since 2000.
- The analysis and visualisation suite ‘maxdView’ is based on a modular design, so new features can be added as ‘plugins’. It provides many normalisation, filtering and plotting features.
- The database component, maxdLoad, was based on the EBI’s original “ArrayExpress” reference model.
- In maxdLoad2, the database design has been modified to correspond more closely to MIAME and MAGE concepts.
- The major advance is the customisable/extensible attribute mechanism. This feature is being used for rapid prototyping by the MIAME/Env project.

5 MIAME and the MGED Ontology
- The default configuration is designed to capture all of the meta-data required by the MIAME specification.
- Where possible, the terminology defined by the MGED Ontology is used, e.g. HardwareType: DNA_sequence, homogenizer, wash_station, hybridization_chamber, vortexer, …
- Numerical values use the units supported by the MAGE Object Model.

6 System Architecture
- maxdLoad2 is NOT accessed via a web browser. It is a stand-alone application, written in Java (this makes it very portable).
- maxdLoad2 and the database server can run on the same machine, in which case no network connection or web server is needed.
- Alternatively, maxdLoad2 and the database server can be on separate machines connected via a network.
[Diagram: maxdLoad2, Database Server (e.g. Oracle, MySQL), Data]


8 Microarray Experiment Workflow
A typical microarray experiment is a sequence of steps starting with one or more ‘BioMaterials’ and ending up with a big pile of numbers.
These steps can be thought of as transformations (material A + treatment = material B) and combinations (image + scanning = data).
Each of the steps needs to be recorded in the database. Many of the steps will be standardised (for example, the protocol used for labelling), so they will only have to be defined once.
[Diagram: Material, Treatment(s) and Labelling (shown twice), then Hybridisation, Scanning and Data, along a time axis]

9 Why record everything?
The more meta-data that is captured, the better the chance of explaining things when it all goes wrong!
[Figure: axes “Expression of Gene X” and “Expression of Gene Y”; groups “Healthy” and “Diseased”; annotations “Different protocol? Different person? Different hardware?”]

10 Why all this structuring?
What’s wrong with just describing what happened as a nice big document?
- It is very hard for software to understand the process, and therefore difficult for the software to behave intelligently or to assist the user in any way.
- It makes reusing common bits of the description tricky. A general rule of thumb is “reuse is good, cut-and-paste is bad”.
Structured Objects: hard to generate, easy to understand.
Free Text: easy to generate, hard to understand.

11 What is the database made of?
- BioMaterials: Sources, Samples, TreatedSamples, Extracts, LabelledExtracts
- Protocols: SamplingProtocol, ScanningProtocol, LabellingProtocol, etc.
- Arrays: ArrayTypes, Features, Reporters & Genes
- Hybridisations: Experiments, Measurements, Images & Hybridisations

12 Bio-Materials
- Source: the original organism, population or culture
- Sample: some portion of a Source that has been isolated
- TreatedSample: a Sample that has had something done to it
- Extract: a portion of a TreatedSample selected for analysis
- LabelledExtract: an Extract that has been prepared for hybridisation
These elements are generally constructed in the order shown above. The methods used in preparation and production are recorded using their associated ‘Protocol’ elements.

13 Bio-Materials: Modelling An Experiment
- The various elements can be plugged together in different ways to represent the way the experiment is constructed.
- Components are wired together in ‘reverse’ order: connections are based on where things came from, rather than on the sequence in which they were generated.
- Pooling and splitting operations are represented by having one instance linked to more than one other instance, or vice versa.
[Diagram: Source, Sample, TreatedSample, Extract and LabelledExtract linked in a chain]
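
The ‘reverse’ wiring described on this slide can be illustrated with a small in-memory sketch. This is not maxdLoad2 code: the `BioMaterial` class, its `derived_from` field and all instance names are hypothetical, chosen only to show how each element points back at what it came from, and how pooling is simply one instance linked to several parents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BioMaterial:
    """Hypothetical stand-in for one Source/Sample/.../LabelledExtract instance."""
    kind: str
    name: str
    # 'Reverse' links: the instances this one was derived from.
    derived_from: list[BioMaterial] = field(default_factory=list)

    def sources(self) -> set[str]:
        """Follow the 'came from' links back to the original Source names."""
        if self.kind == "Source":
            return {self.name}
        found: set[str] = set()
        for parent in self.derived_from:
            found |= parent.sources()
        return found


# Three Source -> Sample -> TreatedSample chains, pooled into one Extract.
sources = [BioMaterial("Source", f"culture_{i}") for i in (1, 2, 3)]
samples = [BioMaterial("Sample", f"sample_{i}", [src])
           for i, src in enumerate(sources, 1)]
treated = [BioMaterial("TreatedSample", f"treated_{i}", [smp])
           for i, smp in enumerate(samples, 1)]
pooled_extract = BioMaterial("Extract", "pooled_extract", treated)
labelled = BioMaterial("LabelledExtract", "pooled_extract_labelled", [pooled_extract])

print(sorted(labelled.sources()))
```

Because every instance records only its parents, splitting needs no extra machinery either: several Extracts can list the same TreatedSample in `derived_from`.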

14 Bio-Materials: Modelling An Experiment
A more complicated example; here three TreatedSamples are pooled prior to labelling:
[Diagram: three Source, Sample, TreatedSample chains joined into a single Extract and LabelledExtract]

15 Bio-Materials: Modelling An Experiment
An even more complicated example; 3 Samples are taken from the same Source, and each is split in 2:
[Diagram: one Source with Sample and TreatedSample elements, leading to six Extract and LabelledExtract pairs]

16 Protocols
The Protocol links explain why the BioMaterial components have been connected in the way they are.
[Diagram: a Sample; TreatedSamples “Control” and “Shocked”; TreatmentProtocols “do nothing”, “heat_shock”, “wait 20 minutes” and “wait 40 minutes”; Extracts “Control +20 minutes”, “Control +40 minutes”, “Shocked +20 minutes” and “Shocked +40 minutes”. Each link marked “A” represents the “application of a protocol”.]

17 Arrays and ArrayTypes
- The hybridisation platform itself is modelled using the ArrayType, Array, Feature, Reporter and Gene elements.
- The ArrayType records information about the microarray design, such as what it is made of and where it came from.
- Each spot on an array is modelled as a “Feature” element (which is linked to the ArrayType).
- The Array represents a particular instance of an array design.
[Diagram: ArrayType, Array and Feature elements]

18 Features, Reporters and Genes
- Each “Feature” can be linked to a “Reporter”. The “Reporter” represents the biological entity (such as an oligo, cDNA clone or EST) that has been synthesised or printed in that “Feature”.
- Each “Reporter” can be linked to any number of “Gene” elements. The “Gene” is a general-purpose representation of the entity that is being detected by that “Reporter”.
[Diagram: ArrayType, Feature, Reporter and Gene elements]

19 Features etc. - Replicates and Composites
- More than one Feature can be linked to the same Reporter (replicate spots).
- Many Reporters can be linked to the same Gene (for composite sequences, i.e. many short oligos detecting the same sequence).
- The same Reporter can occur on more than one ArrayType.
[Diagram: an ArrayType with Features at Row 34, Col 17; Row 19, Col 28; and Row 3, Col 91, together with Reporter and Gene elements]
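
The many-to-one relationships on this slide can be sketched with two plain dictionaries. The reporter and gene names below are made up for illustration, and which spot carries which reporter is an assumption (the slide only gives the three spot positions); nothing here reflects the actual maxdLoad2 schema or tables.

```python
from collections import defaultdict

# Hypothetical data: spot position (row, col) -> Reporter printed there.
feature_to_reporter = {
    (34, 17): "reporter_1",
    (19, 28): "reporter_1",   # replicate spot of reporter_1
    (3, 91): "reporter_2",
}
# Hypothetical data: Reporter -> Gene(s) it detects.
reporter_to_genes = {
    "reporter_1": ["gene_X"],
    "reporter_2": ["gene_X"],  # composite: two reporters, one gene
}


def features_by_reporter(mapping):
    """Group spot positions by Reporter, exposing replicate spots."""
    grouped = defaultdict(list)
    for position, reporter in mapping.items():
        grouped[reporter].append(position)
    return dict(grouped)


def reporters_by_gene(mapping):
    """Group Reporters by Gene, exposing composite sequences."""
    grouped = defaultdict(list)
    for reporter, genes in mapping.items():
        for gene in genes:
            grouped[gene].append(reporter)
    return dict(grouped)


replicates = features_by_reporter(feature_to_reporter)
composites = reporters_by_gene(reporter_to_genes)
print(replicates)
print(composites)
```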

20 Hybridisations (and beyond)
- A ‘Hybridisation’ is the combination of an Array, one or more LabelledExtracts and a HybridisationProtocol.
- A scanner generates one or more ‘Image’s from the ‘Hybridisation’.
- ‘Image’s are processed to generate ‘Measurement’s.
- An ‘Experiment’ is a collection of related ‘Measurement’s.
[Diagram: Array, LabelledExtracts, Hybridisation, Image, Measurements and Experiment; protocols are omitted from this diagram]

21 Hybridisations – Storing the data
- A ‘Measurement’ represents the collection of results from analysing the scanned image of a microarray after hybridisation.
- Any number of ‘Property’s can be associated with a ‘Measurement’.
- Each ‘Property’ corresponds to one column in the file that came from the scanner (or to data generated by subsequent data analysis such as normalisation).
[Diagram: a Measurement linked to Property elements]


23 Connecting to a database
Connecting to a database requires the following information:
- The ‘Database’, which identifies the machine and server that is hosting the database, and the name of the database (one server might be hosting more than one database).
- The ‘Driver File’ and ‘Driver Name’, which tell maxdLoad2 which database driver to use (these drivers are database specific).
- The ‘User Name’ and ‘Password’, which identify which account on the database server should be used.
Information about one or more connections can be saved and accessed from the list on the left-hand side. The built-in help system provides full details on how to set up a new database connection.

24 The User Interface

25 The User Interface
[Annotated screenshot of the main window]
- One group of buttons controls which mode the software is in (create, browse, find, edit or load).
- A second group of buttons opens the form used to input or explore the data for each of the database components.
- The arrows show how the components are interconnected.
- The remaining buttons access the other main features: import, export, options and the built-in help system.

26 The User Interface
Clicking on one of the boxes opens a form in which the full details of an instance can be viewed (and edited).

27 The User Interface
There are three types of field: names, links and attributes.
- Names are used to identify instances.
- Links are used to combine instances together, for example a LabelledExtract is linked to an Extract and a LabellingProtocol.
- Attributes are used to store all other data about the instance.

28 The User Interface
- The links between instances are defined either by selecting the item from lists (for an existing item) or by recursively filling in another form (for a new item).
- Attribute data is entered by typing directly into the fields.
- Useful information about attributes (a description, a list of legal values) can be found by clicking on the attribute name.
- Clicking on the attribute name also provides access to a ‘quick copy’ function in data entry modes.
[Screenshot labels: Name, Links, Attributes]

29 The Navigator Tree
- A representation of the schema as a hierarchy can also be displayed (in a separate window).
- This view shows all of the links from one instance to all others.
- Instances can be selected by clicking on their name.
- When multiple links exist between instances (e.g. 1 ‘Extract’ linked to 5 ‘Sample’s), individual links are highlighted as the mouse passes over them.

30 The Navigator Tree
- The navigator view is useful during instance creation as an aid in keeping track of which instances have been provided and which have not.
- The red line shows the path taken to the current form.
- As instances are selected or created, they are tagged with green dots, and their names are shown.
- Instances which have not yet been specified are tagged with yellow dots.
[Screenshot labels: Current form, Completed form, Pending form]

31 Create Mode
- Required fields are coloured differently to optional fields. All of the required fields must be completed before the new instance can be created.
- Links to other instances which have recently been defined are chosen from pull-down lists.
- If a link to an instance which has not yet been defined is required, the ‘Create New’ button opens a new form which is used to define the new instance.
[Screenshot labels: Required field, Optional field, Pull-down list]

32 Create Mode
- A series of ‘Create Mode’ forms will be stacked on top of one another as each of the instances is defined.
- As each instance is created, its form is removed and the form underneath it reappears (now including a link to the newly created instance).
- In this way, the user can navigate the schema in the order that suits them, without cluttering the screen with incomplete forms.
(The stacking effect is not actually visible in the application.)

33 Linking instances together
Step 1. Create a new ‘Source’:
- Choose “Create” mode.
- Press the “Source” button to open a new blank form.
- Fill in all of the relevant fields.
- Press “Create” to store the new instance.
(The top-level panel reappears once the instance has been created.)

34 Linking instances together (2)
Step 2. Create a new ‘Sample’ (which will be linked to the ‘Source’):
- Press the “Sample” button to open a new blank form.
- Provide a name for the new ‘Sample’.
- Use the “Select” button to open a list of ‘Sources’ and pick the one that was created in the previous step.
- Now a ‘SamplingProtocol’ needs to be created; press the “Create New” button to open a suitable form.

35 Linking instances together (3)
Step 3. Create a new ‘SamplingProtocol’:
- Fill in the ‘SamplingProtocol’ form and press “Create” to add it to the database.
- The ‘Sample’ form will reappear, and the link to the ‘SamplingProtocol’ will now contain the name of the newly created protocol.
- Fill in the rest of the ‘Sample’ form and press “Create” to add the new ‘Sample’ to the database.

36 Browse Mode
- ‘Browse Mode’ is used for exploring the database and for examining the links between instances.
- Instances can be examined by selecting them from a list.
- The list can be filtered by name to make it easier to find a particular instance.
- The list can be sorted in chronological or alphabetical order.

37 Browse Mode: Find Linked
The “Find linked” feature can find the connections between the current instance and instance(s) in any other table (no matter how indirect the connection).
Example: Find the ScanningProtocol(s) linked to Submitter “Fred”:
1. Find the Experiment(s) which were submitted by “Fred”.
2. Find the Measurement(s) in those Experiments.
3. Find the Image(s) used by those Measurements.
4. Find the ScanningProtocol(s) used by those Images.
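
The four-step “Fred” example is essentially a graph search: start at one instance and keep following links until instances of the wanted table turn up. The sketch below shows that idea as a breadth-first search over an in-memory dictionary. How maxdLoad2 actually implements “Find linked” (presumably with database queries) is not described in the slides; the `find_linked` function, the `links` structure and the instance names are all hypothetical.

```python
from collections import deque


def find_linked(links, start, target_table):
    """Return every instance of target_table reachable from start.

    links maps an instance, written as (table, name), to the list of
    instances it is directly linked to.
    """
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        current = queue.popleft()
        if current[0] == target_table and current != start:
            found.add(current)
        for neighbour in links.get(current, []):
            if neighbour not in seen:   # visit each instance only once
                seen.add(neighbour)
                queue.append(neighbour)
    return found


# Hypothetical database contents mirroring the example on the slide.
links = {
    ("Submitter", "Fred"): [("Experiment", "exp1")],
    ("Experiment", "exp1"): [("Measurement", "m1"), ("Measurement", "m2")],
    ("Measurement", "m1"): [("Image", "img1")],
    ("Measurement", "m2"): [("Image", "img2")],
    ("Image", "img1"): [("ScanningProtocol", "scan_v1")],
    ("Image", "img2"): [("ScanningProtocol", "scan_v1")],
}

protocols = find_linked(links, ("Submitter", "Fred"), "ScanningProtocol")
print(protocols)
```

The search does not need to know the route in advance, which matches the slide's point that the connection can be arbitrarily indirect.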

38 Find Mode
‘Find Mode’ is used to search for instance(s). Instances can be located by specifying any combination of:
- One or more of their linked instances
- One or more attribute values (partial values are allowed)
- All or part of the name
This is done by filling in one or more fields in a form. The collection of matching instances is then displayed using a ‘Browse Mode’ form.

39 Edit Mode
- ‘Edit Mode’ is a combination of ‘Create Mode’ and ‘Browse Mode’.
- The ‘Edit Mode’ form is essentially the same as that of ‘Create Mode’.
- Names, links and attribute values can all be edited. (Note that changing the name of an instance will not break any links that have been established with that instance.)
Warning! No audit trail is kept: once a value is changed, the previous value is lost forever.

40 Deleting instances
- Deletion is safe. Database integrity is preserved because deletion is not allowed if the instance is currently ‘in use’, i.e. it is linked to by some other instance.
- Deletion can be ‘cascaded’. Deleting a ‘parent’ enables deletion of its children, e.g. if an ‘Experiment’ is deleted, all instances in every other table linked to the ‘Experiment’ can also be deleted automatically.
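
The two deletion rules can be sketched in a few lines. This is an illustration under stated assumptions, not the maxdLoad2 implementation: `delete`, `InUseError` and the sample data are hypothetical, and the cascade here only removes a child once nothing else links to it, which is one plausible reading of the slide.

```python
class InUseError(Exception):
    """Raised when an instance is still linked to by another instance."""


def delete(instances, links, name, cascade=False):
    """Delete name; links maps a parent to the set of children it links to."""
    referrers = [parent for parent, children in links.items() if name in children]
    if referrers:
        raise InUseError(f"{name} is in use by {sorted(referrers)}")
    children = links.pop(name, set())
    instances.discard(name)
    if cascade:
        for child in children:
            # Assumption: a child shared with a surviving parent is kept.
            if not any(child in kids for kids in links.values()):
                delete(instances, links, child, cascade=True)


# Hypothetical data: one Experiment, two Measurements sharing one Image.
instances = {"exp1", "m1", "m2", "img1"}
links = {"exp1": {"m1", "m2"}, "m1": {"img1"}, "m2": {"img1"}}

try:
    delete(instances, links, "m1")      # refused: exp1 still links to m1
except InUseError as error:
    print("refused:", error)

delete(instances, links, "exp1", cascade=True)
print(instances)                        # everything linked from exp1 is gone
```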


42 Loading data from files
- In addition to entering data by hand using ‘Create Mode’, it is also possible to create instances by extracting data directly from a text file or Excel spreadsheet.
- Data is extracted by tagging which lines and columns of the data source are ‘interesting’.
- The ‘Load Mode’ forms are essentially the same as ‘Create Mode’ forms. However, instead of supplying final values for things, the column(s) containing the values are identified.
- As this process can be automated, it is useful for integrating maxdLoad2 with other lab software, especially LIMS.

43 Loading data from files – The User Interface
- File parser settings: identify which lines of the data file should be ignored and which should be processed.
- Data value settings: a set of Column Specification rules which indicate how to extract data from one or more columns, optionally applying filtering and translation effects.
- Presets: easy saving and reuse of settings.

44 Loading data from files – File Parser Settings
These settings describe how the file should be converted into a row/column matrix:
- Text Encoding (most files are US-ASCII, but some are not…)
- Delimiter (how to split lines into columns)
- Ignore first, ignore last (skip header and footer lines)
- Ignore until, ignore after (regular expressions identifying start and end lines)
- Ignore beginning (to detect comment lines)
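
A minimal sketch of how settings like these could turn a raw file into a row/column matrix. The function and parameter names simply mirror the labels on the slide; the exact semantics (for instance, whether the line matching ‘ignore until’ is itself skipped) are assumptions, not documented maxdLoad2 behaviour.

```python
import re


def parse_rows(raw, encoding="us-ascii", delimiter="\t",
               ignore_first=0, ignore_last=0,
               ignore_until=None, ignore_after=None, ignore_beginning=None):
    """Convert raw file bytes into a list of rows (lists of column strings)."""
    lines = raw.decode(encoding).splitlines()
    lines = lines[ignore_first:len(lines) - ignore_last]
    rows = []
    started = ignore_until is None
    for line in lines:
        if not started:
            # Assumption: skip up to and including the matching line.
            started = re.search(ignore_until, line) is not None
            continue
        if ignore_after is not None and re.search(ignore_after, line):
            break                       # everything from here on is ignored
        if ignore_beginning is not None and line.startswith(ignore_beginning):
            continue                    # comment line
        if not line.strip():
            continue                    # blank line
        rows.append(line.split(delimiter))
    return rows


raw = b"ID,Parent,Sex\n# comment\n005a,003a,F\n006a,,M\nEND\nfooter\n"
rows = parse_rows(raw, delimiter=",", ignore_first=1,
                  ignore_after="^END", ignore_beginning="#")
print(rows)
```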

45 Loading data from files – Data Value Settings
Column Specification: these settings describe how the data values are extracted from the row/column matrix.
- Default and unwanted values can be specified (on a per-column basis).
- Values can be formed by combining multiple columns.
- ‘Regular Expressions’ can be used to modify the data format (e.g. changing ‘11/31/02’ to ‘31-11-02’).
- Values can be ‘translated’ (substituting one value for another).
- Values can be converted to upper- or lower-case.

46 Loading data from files - Presets
- Once a set of column specifications and parser options have been determined, they can be saved as a “Preset”.
- These settings can then be easily recalled next time a file with the same format is encountered.
- “Presets” are stored as plain-text files which can be shared between users.

47 Loading data from files: Examples
Source Data File (columns are numbered 1 to 6):
1     2       3           4    5       6
ID    Parent  Birth       Sex  Weight  Owner
005a  003a    03/10/2002  F    45g     John
006a          09/09/2002  M    47g     Paul
005b  003b    15/10/2002  F    61g     Mark
006b  003b    27/09/2002  M    56g     Luke
…
Column Specification and Extracted Data:
$6                       gives  John, Paul, Mark, Luke, …
$1 - $2                  gives  005a - 003a, 006a -, 005b - 003b, 006b - 003b, …
$2{default="UNKNOWN"}    gives  003a, UNKNOWN, 003b, …

48 Loading data from files: More Examples
(Same source data file as in the first set of examples.)
Column Specification and Extracted Data:
$4{"F"->"female"}{"M"->"male"}    gives  female, male, female, male, …
$1:$2{default="?"}/$6             gives  005a:003a/John, 006a:?/Paul, 005b:003b/Mark, 006b:003b/Luke, …

49 Loading data from files: Even More Examples
(Same source data file as in the first set of examples.)
Column Specification and Extracted Data:
$3{regex="2,[0-9]+"}    gives  10, 09, 10, 09, …
$5{regex="1,[0-9]+"}    gives  45, 47, 61, 56, …
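
The column specifications in the three example slides can be mimicked with a few helper functions. This sketch does not parse the `$N{...}` syntax itself; `col`, `translate` and `nth_match` are hypothetical helpers that reproduce the extracted values shown. In particular, reading `regex="2,[0-9]+"` as “keep the second match of `[0-9]+`” is an inference from the example output, not a documented rule.

```python
import re

# The four data rows from the example file (empty string = missing Parent).
ROWS = [
    ["005a", "003a", "03/10/2002", "F", "45g", "John"],
    ["006a", "",     "09/09/2002", "M", "47g", "Paul"],
    ["005b", "003b", "15/10/2002", "F", "61g", "Mark"],
    ["006b", "003b", "27/09/2002", "M", "56g", "Luke"],
]


def col(row, n, default=None):
    """Value of 1-based column n; an empty value falls back to default."""
    value = row[n - 1]
    if value == "" and default is not None:
        return default
    return value


def translate(value, mapping):
    """Substitute one value for another; unknown values pass through."""
    return mapping.get(value, value)


def nth_match(value, n, pattern):
    """The n-th (1-based) match of pattern in value, or '' if there is none."""
    matches = re.findall(pattern, value)
    return matches[n - 1] if len(matches) >= n else ""


parents = [col(r, 2, default="UNKNOWN") for r in ROWS]            # $2{default="UNKNOWN"}
sexes = [translate(col(r, 4), {"F": "female", "M": "male"}) for r in ROWS]
combined = [f"{col(r, 1)}:{col(r, 2, default='?')}/{col(r, 6)}" for r in ROWS]
months = [nth_match(col(r, 3), 2, r"[0-9]+") for r in ROWS]        # $3{regex="2,[0-9]+"}
weights = [nth_match(col(r, 5), 1, r"[0-9]+") for r in ROWS]       # $5{regex="1,[0-9]+"}
print(parents, sexes, combined, months, weights, sep="\n")
```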

50 Automated Data Loading
- A data loading script (in an XML format) can be used to automate the process of loading data from one or more files.
- The format is exactly the same as the column specification method used in manual data loading.
- Existing preset settings can be used directly.
- Loading scripts could be generated automatically.

51 Loading “Expression” Data
- The actual expression data (i.e. the results of processing the scanned image) are also loaded using the bulk data loading system.
- A collection of values (one per Feature) is called a ‘Property’.
- All of the data manipulation methods described previously are available when loading the expression data (this is useful for handling missing values).

52 ‘Measurement’s & ‘Property’s
- A ‘Measurement’ represents the collection of results from analysing the scanned image of a microarray after hybridisation.
- Any number of ‘Property’s can be associated with a ‘Measurement’.
- Each ‘Property’ corresponds to one column in the file that came from the scanner (or to data generated by subsequent data analysis such as normalisation).
- Extra ‘Property’s can be added to an existing ‘Measurement’ at any time.
[Diagram: a Measurement linked to Property elements]

53 Measurement meta-data
Quite a lot of information is required to define a ‘Property’:
- QuantitationType (Signal, Ratio, PValue, Error, DerivedValue, etc.)
- Scale (Linear, Log, FoldChange, etc.)
- Unit (Pixels, Percent, etc.)
- Origin (Feature or Background)
The ‘Property’ can optionally be linked to a ‘LabelledExtract’ instance which indicates from which channel the data was measured.
[Diagram: Measurement, Property and LabelledExtract, with multiplicities 1..n and 0..1]
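
The meta-data listed above can be pictured as a small record with controlled vocabularies. The sketch below is illustrative only: the `Property` class and its field names are hypothetical, and the allowed-value sets contain just the terms named on the slide (each list there ends in “etc.”, so the real vocabularies are larger).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Only the values named on the slide; the real lists are longer.
ALLOWED = {
    "quantitation_type": {"Signal", "Ratio", "PValue", "Error", "DerivedValue"},
    "scale": {"Linear", "Log", "FoldChange"},
    "unit": {"Pixels", "Percent"},
    "origin": {"Feature", "Background"},
}


@dataclass
class Property:
    """Hypothetical record for one column of expression data."""
    name: str
    quantitation_type: str
    scale: str
    unit: str
    origin: str
    values: list[float] = field(default_factory=list)   # one value per Feature
    labelled_extract: Optional[str] = None               # optional channel link

    def __post_init__(self):
        for attribute, allowed in ALLOWED.items():
            value = getattr(self, attribute)
            if value not in allowed:
                raise ValueError(f"{attribute}={value!r} is not one of {sorted(allowed)}")


signal = Property("spot_intensity", "Signal", "Linear", "Pixels", "Feature",
                  values=[1523.0, 87.5], labelled_extract="extract_cy5")
print(signal)
```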

54 MAGE Data Format
- The ‘Property’ element is designed to be directly compatible with the corresponding entities in the MAGE Object Model.
- When loading ‘Property’ data, the user is forced to provide sufficient detail for the data to be describable using the MAGE-OM.
- This ensures that the data can subsequently be uploaded to public repositories without further annotation.


56 Extensible Attributes
- maxdLoad2 provides fully customisable attributes for each table.
- The description of the attributes is stored in the database itself.
- Attribute descriptions can be referenced via URLs to facilitate sharing.
- Attribute descriptions can be altered without ‘breaking’ the database.

57 Attribute Descriptions
The attributes that will appear for each schema element are defined using an XML-based syntax: ….

58 Types of Attribute
- Free text string (multiple lines)
- List (multiple selection)
- Toggle
- Integer (value is type-checked)
- Grouping of fields
- Horizontal layout
- Free text string (single line)
- Comment

59 Attribute Descriptions
- Attributes can be tagged as OPTIONAL or REQUIRED.
- A default value can be provided.
- Integer and double attributes are type checked, and illegal values are indicated.
- Comment text can be displayed alongside the attributes.
<Integer name="Weight" completion="REQUIRED" minimum="1" comment="Specimen weight in grams" />
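
To make the type-checking idea concrete, the sketch below reads the `<Integer …/>` description from this slide with Python's standard XML parser and checks a typed-in value against it. The `validate` function and its messages are hypothetical; only the element and its `name`, `completion` and `minimum` attributes come from the slide, and other attribute types are not handled.

```python
import xml.etree.ElementTree as ET

SPEC = ('<Integer name="Weight" completion="REQUIRED" minimum="1" '
        'comment="Specimen weight in grams" />')


def validate(spec_xml, raw_value):
    """Return a list of problems with raw_value; an empty list means it is legal."""
    spec = ET.fromstring(spec_xml)
    name = spec.get("name")
    if raw_value is None or raw_value.strip() == "":
        if spec.get("completion") == "REQUIRED":
            return [f"{name} is required"]
        return []
    try:
        value = int(raw_value)
    except ValueError:
        return [f"{name} must be an integer"]
    minimum = spec.get("minimum")
    if minimum is not None and value < int(minimum):
        return [f"{name} must be at least {minimum}"]
    return []


for text in ("45", "", "45g", "0"):
    print(repr(text), validate(SPEC, text))
```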

60 External References
Attribute descriptions can refer to external elements using an HTTP URL. This makes it easy to access predefined ‘standard’ attribute definitions (such as the set of attributes which implement the MGED Ontology terms).
… …

61 Adding New Attributes
A new attribute is added by indicating where it should appear relative to an existing attribute:
Deleting attributes is easy:


63 MAGE-ML Export
- MAGE-ML is the ‘official’ standard for microarray data files.
- maxdLoad2 has an extensible mechanism for describing how attributes in the database get converted into MAGE-ML.
- Each of the schema elements (tables) has an associated output template which is used to generate MAGE-ML data for that element.
- A set of output templates which match the MIAME attribute definitions is provided as standard.
- When new attributes are added to an element, the template can be modified so that they will be included in MAGE-ML outputs.

64 Output Templates
The output templates are a mixture of three components:
- The desired ‘raw’ output (MAGE-ML tags for example).
- Variables which correspond to instances and attributes in the database.
- Control flow ‘commands’ for conditional and iterative actions.

65 Output Template Example
This is a portion of a template used to generate the MAGE-ML for describing an ‘Image’ instance. It instructs maxdLoad2 to generate a ‘BioMaterialMeasurement’ for each of the ‘LabelledExtract’ instances that are linked to the ‘Image’.
The variables will be replaced with values extracted from the database as the output file is created.
[Annotated template listing with callouts: Control Flow, ‘Raw’ Output, Variable]
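
Since the template listing itself did not survive in this transcript, the following sketch only illustrates the three ingredients named in the slides (raw output, variables, control flow) using Python's `string.Template`. It does not use maxdLoad2's template syntax, and the XML it emits is a simplified, hypothetical shape rather than valid MAGE-ML; `render_image` and the attribute names are made up.

```python
from string import Template
from xml.sax.saxutils import quoteattr

# 'Raw' output with $variables; quoteattr() supplies the surrounding quotes.
HEADER = Template("<Image name=$image_name>")
ITEM = Template("  <BioMaterialMeasurement extract=$extract_name/>")
FOOTER = "</Image>"


def render_image(image_name, labelled_extracts):
    """Emit one BioMaterialMeasurement per LabelledExtract linked to the Image."""
    lines = [HEADER.substitute(image_name=quoteattr(image_name))]
    for extract in labelled_extracts:            # iterative control flow
        lines.append(ITEM.substitute(extract_name=quoteattr(extract)))
    if not labelled_extracts:                    # conditional control flow
        lines.append("  <!-- no LabelledExtracts linked -->")
    lines.append(FOOTER)
    return "\n".join(lines)


output = render_image("scan_01", ["cy3_extract", "cy5_extract"])
print(output)
```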

66 Output Templates
- Most users will never see (or want to see) the MAGE-ML output templates.
- They only need to be manipulated when new attributes have been added and these new attributes are required to appear in the MAGE-ML that maxdLoad2 creates.
- In most cases, extending the output templates will be a simple cut & paste operation.
- Completely different output templates could potentially be defined for exporting data in other formats (e.g. for import into a LIMS, or for automated web-page generation).


68 Availability
- maxdLoad2 is an open-source product, released under the Perl Artistic Licence.
- The latest version is available at: http://www.bioinf.man.ac.uk/microarray/
- We would like to encourage people to contribute to testing and to future development work.
- Please join our mailing list for announcements: maxd_info@ecartis.cs.man.ac.uk

69 Useful Web Resources
- The MGED Society: http://www.mged.org
- The MIAME Specification: http://www.mged.org/Workgroups/MIAME/miame_1.1.html
- The MGED Ontology: http://mged.sourceforge.net/ontologies/index.php

70 Acknowledgments
- ‘maxd’ development is currently funded as part of a large UK-wide project themed on environmental genomics: http://envgen.nox.ac.uk/
- Microarray Bioinformatics Group at The University of Manchester: http://www.bioinf.man.ac.uk/microarray
- Prof. Andy Brass: andy.brass@man.ac.uk

