1 XML and Bioinformatics Rajvi Shah

2 What is XML? XML stands for eXtensible Markup Language. XML is a markup language much like HTML. XML was designed to describe data and to focus on what the data is.

3 Features of XML: XML is an easy, automatically parseable way to describe data. It allows more flexible and adaptable identification of information. XML is extensible: it lets us design our own customized markup languages.

4 Why XML? Data is stored in incompatible formats, which makes exchanging it difficult. XML offers a software- and hardware-independent way of sharing data. XML can be used to store and display data. With XML, data is available to more users.

5 Databases and XML Database content can be presented in XML: –An XML processor can access a DBMS or file system and convert the data to XML –A web server can serve the content as either XML or HTML
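
The conversion step described on this slide can be sketched in Python with only the standard library. This is a minimal illustration, not the presenter's implementation: the `protein` table, its columns, the `EXAMPLE1` row, and the `records`/`record` element names are all made up for the example, and the column names are assumed to be valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table, root_tag="records", row_tag="record"):
    """Serialize every row of `table` as XML, one child element per column."""
    # Table names cannot be bound as SQL parameters, so pass trusted names only.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            child = ET.SubElement(record, column)
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical in-memory database standing in for a real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (accession TEXT, length INTEGER)")
conn.execute("INSERT INTO protein VALUES (?, ?)", ("EXAMPLE1", 120))
xml_text = rows_to_xml(conn, "protein")
print(xml_text)
```

A web server could return `xml_text` directly, or transform it to HTML before serving it.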

6 Why XML for Bioinformatics? Biology is a complex discipline. There is a wide variety of data resources and repositories. Biological data is represented in multiple formats, e.g. FASTA, AGP, GFF. No standard protocol exists to interrogate biological data stores.
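
To make the format problem concrete, here is a rough sketch of converting one of these formats, FASTA, into XML. The `sequences` and `seq` element names and the `id`, `name`, and `length` attributes are assumptions chosen for illustration, not an established standard, and the parser handles only simple FASTA input.

```python
import xml.etree.ElementTree as ET


def parse_fasta(fasta_text):
    """Return a list of [identifier, description, residues] for each FASTA record."""
    records = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Header line: the first word is the identifier, the rest is free text.
            identifier, _, description = line[1:].partition(" ")
            records.append([identifier, description, ""])
        elif line and records:
            # Sequence line: append it to the most recent record.
            records[-1][2] += line
    return records


def fasta_to_xml(fasta_text):
    """Build a <sequences> element with one <seq> child per FASTA record."""
    root = ET.Element("sequences")
    for identifier, description, residues in parse_fasta(fasta_text):
        seq = ET.SubElement(root, "seq", id=identifier, length=str(len(residues)))
        if description:
            seq.set("name", description)
        seq.text = residues
    return root


# Made-up record; the residues are just a short illustrative fragment.
FASTA = ">seq1 example protein\nSKSESPKEPE\nQLRKLFIGG\n"
sequences = fasta_to_xml(FASTA)
print(ET.tostring(sequences, encoding="unicode"))
```

Once the data is in XML, any XML parser can read it without a format-specific reader.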

7 Why XML for Bioinformatics? No standard nomenclature exists for genomic, proteomic, cheminformatic, and other biological data. No standard data format exists to exchange biological data. No standard data model exists. These gaps make data difficult to use and exchange.

8 Large number of sources

9 XML Syntax Elements & Attributes <note date="12/11/2002"> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
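
The difference between elements and attributes can be seen by parsing a note document like the one on this slide with Python's standard library. This is a sketch: only the `date` attribute, the `from` tag, and the text content survive in the transcript, so the `to`, `heading`, and `body` element names are assumed here.

```python
import xml.etree.ElementTree as ET

# Note document; the element names other than <from> are assumed.
NOTE_XML = """<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(NOTE_XML)

# Attributes live on the start tag and are read with .get().
note_date = note.get("date")

# Child elements carry the content; map each tag name to its text.
fields = {child.tag: child.text for child in note}

print(note_date)
print(fields["from"], "->", fields["to"])
print(fields["body"])
```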

10 XML DTD A file containing a formal definition of the permitted structure of a document. A DTD describes: –What names can be used for element types –Where element types can occur –How element types fit together –The attributes of any element

11 An Example XML DTD <!DOCTYPE seq [ <!ATTLIST seq id ID #REQUIRED name CDATA #IMPLIED length CDATA #IMPLIED> ... ]>
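
The attribute rules in this DTD fragment (`id` required and of type ID, `name` and `length` optional) can be illustrated with a hand-written check. This is not DTD validation: the parsers in Python's standard library do not validate against a DTD, so a validating parser would be needed for that. The function name and the message wording are made up for the example.

```python
import xml.etree.ElementTree as ET

REQUIRED = {"id"}              # declared #REQUIRED
OPTIONAL = {"name", "length"}  # declared #IMPLIED


def check_seq_attributes(xml_text):
    """Return a list of problems found on <seq> elements, checked by hand."""
    problems = []
    seen_ids = set()
    root = ET.fromstring(xml_text)
    for seq in root.iter("seq"):
        attrs = set(seq.attrib)
        for missing in sorted(REQUIRED - attrs):
            problems.append(f"missing required attribute '{missing}'")
        for unknown in sorted(attrs - REQUIRED - OPTIONAL):
            problems.append(f"undeclared attribute '{unknown}'")
        # Attributes of type ID must be unique within the document.
        seq_id = seq.get("id")
        if seq_id is not None:
            if seq_id in seen_ids:
                problems.append(f"duplicate id '{seq_id}'")
            seen_ids.add(seq_id)
    return problems


good = '<seq id="s1" name="demo" length="4">ACGT</seq>'
bad = '<seqs><seq name="x"/><seq id="a"/><seq id="a" strand="+"/></seqs>'
print(check_seq_attributes(good))
print(check_seq_attributes(bad))
```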

12 XML Schema (A better DTD) Some developers were dissatisfied with the XML DTD: –The description of a document’s structure should itself be an XML document, not have its own special syntax, so that a schema could be manipulated with regular XML editing tools –An XML DTD doesn’t impose enough constraints on the data

13 Case Study: BigLab BigLab is the research department of BigPharma. Business requirements: –Get data –Align and analyze sequences –Send to BigPharma’s headquarters

14 A Piece of XML Schema SWISS-PROT P09651 SKSESPKEPEQLRKLFIGGLSFETTDESLRSHFEQWGTLTDCVVMRDPNTKRS RGFGFVTYATVEEVDAAMNARPHKVDGRVVEPKRAVSREDSQRPGAHLTVKKI FVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKKRGFAFVTFDDHDSVD KIVIQKYHTVNGHNCEVRKALSKQEMASASSSQRGRSGSGNFGGGRGGGFGGN DNFGRGGNFSGRGGFGGSRGGGGYGGSGDGYNGFGNDGGYGGGGPGYSGGSRG YGSGGQGYGNQGSGYGGSGSYDSYNNGGGRGFGGGSGSNFGGGGSYNDFGNYN NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGYGGSSSSSSYGSGRRF

15 Biological XML Some DTDs have been proposed publicly as XML formats for biological data: –GAME Drosophila Genome Project/Celera –BIOML ProteoMetrics –BSML VisualGenomics –CML OMF –GEML Gene Expression Data

16 Summary 1. XML is highly flexible. -It is simple to modify a DTD. -XML and DTD files are human readable, so they can be edited easily by people with limited computer skills. 2. XML is Internet-oriented and has very rich capabilities for linking data. -This can be used for interconnecting databases. 3. XML provides an open framework for defining standard specifications. -This is an important point because bioinformatics clearly lacks standardization.
