Web Services for N-Glycosylation Process Integrated Technology Resource for Biomedical Glycomics NCRR/NIH Satya S. Sahoo, Amit P. Sheth, William S. York,


1 Web Services for N-Glycosylation Process
Integrated Technology Resource for Biomedical Glycomics, NCRR/NIH
Satya S. Sahoo, Amit P. Sheth, William S. York, John A. Miller
Presentation at the International Symposium on Web Services for Computational Biology and Bioinformatics, VBI, Blacksburg, VA, May 26-27, 2005

2 Glycomics
• Study of the structure, function and quantity of the 'complex carbohydrates' synthesized by an organism
Glycosylation
• Carbohydrates added to the basic protein structure
[Figure: folded protein structure (schematic)]

3 Glycosylation – why is it important?
• The genome (comprised of DNA) and the proteome (proteins) are not the only factors in the life functions of an organism
• Carbohydrates attached to different protein structures (by glycosylation) are important for:
  • Identification of foreign entities by immune system cells
  • Markers to accurately diagnose diseases
  • Regulation of signaling activities
• Glycosylation is categorized by the way carbohydrates are attached to proteins. Example: N-glycosylation

4 N-Glycosylation Process (NGP)
[Figure: NGP workflow diagram. Labels: cell culture, extract, glycoprotein fraction, proteolysis, glycopeptides fraction, separation technique I, glycopeptides fraction (n), PNGase, peptide fraction (n), separation technique II, peptide fraction (n*m), mass spectrometry, ms data, ms/ms data, data reduction, ms peaklist, ms/ms peaklist, binning, peptide identification, N-dimensional array, peptide list, signal integration, data correlation, glycopeptide identification and quantification]
• By N-glycosylation Process, we mean the identification and quantification of glycopeptides

5 NGP – part of the Bioinformatics core, Integrated Technology Resource for Biomedical Glycomics
• This Resource was established by the National Center for Research Resources
• The aim is to develop the tools and technology to analyze glycoprotein and glycolipid expression of embryonic stem cells
• Our research provides bioinformatics support for four research groups:
  • Embryonic Stem Cell Culture Program
  • Glycomic Analysis of Glycoproteins
  • Glycomic Analyses of Glycosphingolipids and Sphingolipids
  • Transcript analysis by kinetic RT-PCR

6 NGP – need in Glycomics
• Unlike proteomics or genomics, high-throughput experimental protocols are still being established in glycomics
• NGP involves a multitude of heterogeneous tasks, including human-mediated tasks
• NGP attempts to encapsulate particular computational steps as platform-independent, scalable and Web-accessible tools: Web Services
• Enables glycobiologists to integrate automated data generation tasks with data processing tools (Web Services) into an end-to-end experimental lifecycle

7 N-Glycosylation identification - Problems
• Extremely difficult to identify glycosylated peptide sequences using standard analytical methods
• N-glycosylation occurs at particular sites on the protein structure: consensus sequences
[Figure: an example glycopeptide (schematic). Labels: peptide, glycan, consensus sequence N-X-S/T, PNGase F, asparagine, aspartate, D, J]
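A rough worked example of why the PNGase F step in the schematic matters for mass-based identification: converting asparagine to aspartate raises the residue mass by about 0.984 Da. The sketch below assumes standard monoisotopic residue masses, which are not given on the slide; the constant and function names are illustrative only.

```python
# Monoisotopic residue masses in Da (standard reference values, NOT taken from the slide).
ASN_RESIDUE_MASS = 114.042927  # asparagine residue, C4H6N2O2
ASP_RESIDUE_MASS = 115.026943  # aspartate residue, C4H5NO3


def deglycosylation_mass_shift(n_sites: int = 1) -> float:
    """Mass added to a peptide when n_sites Asn residues are converted to Asp."""
    return n_sites * (ASP_RESIDUE_MASS - ASN_RESIDUE_MASS)


if __name__ == "__main__":
    # Hypothetical usage: shift for a peptide with one converted site.
    print(f"{deglycosylation_mass_shift():.6f} Da per converted site")
```

A search engine that only knows the unmodified database sequence would miss this shift, which is one plausible reason for marking the converted position with a dedicated letter.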

8 NGP - implementation
• NGP currently implements a Web Process made up of two Web Services:
  • DB Modifier Web Service: modifies the search database by replacing N (in consensus sequences) with J
  • Collator Web Service: identifies a probable N-glycosylated peptide using three parameters:
    • Calculated molecular mass
    • Presence of 'J' in a peptide sequence
    • MASCOT* score assigned to a hit
• NGP also involves a proprietary Mass Spectrometer search engine service (MASCOT*) as an intermediate task
• Hence, the NGP Web Process identifies probable glycosylated peptides, enabling rapid processing of data from high-throughput experiments
*http://www.matrixscience.com/
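The two services described above can be sketched in a few lines. This is not the authors' implementation: it assumes the usual N-X-S/T consensus with X not proline, and the `Hit` record, the score threshold and the mass window are hypothetical stand-ins for however the Collator actually applies its three parameters.

```python
import re
from dataclasses import dataclass

# N followed by any residue except P, then S or T (assumed consensus definition).
# The lookahead keeps overlapping sites matchable.
SEQUON_N = re.compile(r"N(?=[^P][ST])")


def modify_db_sequence(sequence: str) -> str:
    """DB Modifier step: replace N in consensus sequences with J."""
    return SEQUON_N.sub("J", sequence)


@dataclass
class Hit:
    """Hypothetical record for one search-engine hit."""
    peptide: str
    calc_mass: float
    score: float


def collate(hits, min_score, mass_window):
    """Collator step: keep hits that contain J, score at least min_score,
    and have a calculated mass inside mass_window (low, high)."""
    low, high = mass_window
    return [
        h for h in hits
        if "J" in h.peptide and h.score >= min_score and low <= h.calc_mass <= high
    ]


if __name__ == "__main__":
    print(modify_db_sequence("MKNGSAANPTK"))
```

Only the first N in the usage line sits in a consensus sequence; the second is followed by proline and is left unchanged.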

9 NGP – Architecture (current)
[Figure: architecture diagram. Labels: ms/ms raw data; peak list file; primary sequence database; ModifyDB Web Service; MASCOT* Mass Spectrometer search engine; MASCOT* output file (contains both glycosylated and non-glycosylated peptide sequences); Collator Web Service; deglycosylated peptide list]
*http://www.matrixscience.com/

10 NGP Results
• A typical MASCOT output file is about 3 MB!
• High-throughput experiment protocols generate thousands of such files, so manual identification is not feasible
Excerpt from a MASCOT output file:
q1_p1=-1
q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,00020000000000000,0,0;"gi|51465537":0:190:196:1
q2_p2=1,626.361191,-0.034567,2,APARGR,18,00000000,1.33,00020000000000000,0,0;"gi|10140845":0:2:7:2
q2_p3=0,626.349945,-0.023321,2,APAVGGR,18,000000000,1.33,00020000000000000,0,0;"gi|51470766":0:212:218:1,"gi|51470768":0:212:218:1
q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,00010020000000000,0,0;"gi|47078238":0:364:368:2,"gi|47078240":0:328:332:2
q3_p4=0,634.351227,0.023897,4,MPLFK,12,0000000,25.24,00010020000000000,0,0;"gi|41197108":0:95:99:1,"gi|4557311":0:1:5:2
q3_p5=0,634.343811,0.031313,3,NNLFK,12,0000000,15.34,00010020000000000,0,0;"gi|31377725":0:539:543:1
q3_p6=0,634.368973,0.006151,3,LDIFK,12,0000000,15.34,00010020000000000,0,0;"gi|39725634":0:891:895:1
q3_p7=0,634.343811,0.031313,3,NNIFK,12,0000000,15.34,00010020000000000,0,0;"gi|7661646":0:212:216:1
q3_p8=0,634.368973,0.006151,3,LDLFK,12,0000000,15.34,00010020000000000,0,0;"gi|51474898":0:237:241:1
q3_p9=0,634.368958,0.006166,3,EVIFK,12,0000000,13.61,00010020000000000,0,0;"gi|28376662":0:67:71:1
q3_p10=0,634.368958,0.006166,3,VELFK,12,0000000,13.61,00010020000000000,0,0;"gi|51467300":0:493:497:1,"gi|51467535":0:99:103:1
q4_p1=-1
q5_p1=0,662.375122,0.004702,5,DLLFR,14,0000000,18.41,00020020000000000,0,0;"gi|21536369":0:84:88:1,"gi|21536367":0:17:21:1,"gi|4557871":0:647:651:1
q5_p2=0,662.375122,0.004702,3,DLFLR,14,0000000,12.81,00010020000000000,0,0;"gi|33695153":0:407:411:1,"gi|4504043":0:330:334:1,"gi|11968045":0:6:10:1
q5_p3=0,662.375122,0.004702,3,DIFIR,14,0000000,12.81,00010020000000000,0,0;"gi|4505725":0:924:928:1,"gi|29788751":0:1170:1174:1
q5_p4=0,662.349960,0.029864,3,NNFIR,14,0000000,11.84,00010020000000000,0,0;"gi|24416002":0:667:671:1
q5_p5=0,662.375122,0.004702,4,IDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|12957488":0:602:606:1,"gi|41148707":0:536:540:1,"gi|51464463":0:646:650:1
q5_p6=0,662.375122,0.004702,4,LDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|42657517":0:335:339:1
q5_p7=0,662.375107,0.004717,4,VELFR,14,0000000,9.98,00020020000000000,0,0;"gi|6912230":0:436:440:1
q5_p8=0,662.375122,0.004702,4,LDIFR,14,0000000,9.98,00020020000000000,0,0;"gi|8922081":0:2699:2703:1
q5_p9=0,662.349960,0.029864,4,NLNFR,64,0000000,5.89,00010020000000000,0,0;"gi|19923416":0:816:820:1
q5_p10=1,662.361191,0.018633,2,NRFAR,14,0000000,3.37,00010020000000000,0,0;"gi|4758704":0:97:101:1
q6_p1=0,674.359863,-0.006639,4,VSDNIK,35,00000000,11.27,00010020000000000,0,0;"gi|32130516":0:935:940:1
q6_p2=0,674.323456,0.029768,5,EGDLGGK,21,000000000,7.97,00020020000000000,0,0;"gi|13569928":0:1058:1064:1
q6_p3=0,674.359848,-0.006624,5,EATVAGK,21,000000000,7.88,00020020000000000,0,0;"gi|51475822":0:527:533:1
q6_p4=1,674.389740,-0.036516,3,QRMLK,14,0000000,7.46,00020010000000000,0,0;"gi|24307905":0:467:471:2,"gi|24307905":0:638:642:2
q6_p5=0,674.359863,-0.006639,5,LSSSPGK,56,000000000,7.38,00000020000000000,0,0;"gi|8922075":0:806:812:1
q6_p6=0,674.338730,0.014494,4,WDLGGK,42,00000000,6.40,00010020000000000,0,0;"gi|13375817":0:123:128:1
q6_p7=0,674.359879,-0.006655,4,QATDLK,56,00000000,6.21,00020010000000000,0,0;"gi|21361684":0:451:456:1
q6_p8=1,674.371094,-0.017870,3,QTNKGK,14,00000000,6.03,00020010000000000,0,0;"gi|41117716":0:85:90:1
q6_p9=1,674.389740,-0.036516,6,QMRIK,28,0000000,5.77,00020020000000000,0,0;"gi|28329439":0:269:273:1,"gi|28558993":0:278:282:1
q6_p10=1,674.389740,-0.036516,6,QMRLK,28,0000000,5.77,00020020000000000,0,0;"gi|40255096":0:300:304:1
q7_p1=0,695.348969,0.007855,4,YDASLK,14,00000000,8.86,00020020000000000,0,0;"gi|4758454":0:2761:2766:1
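To illustrate what automated processing of such a file involves, here is a minimal sketch of a parser for one record of the excerpt. The field meanings are an assumption read off the column positions (second field as calculated peptide mass, third as mass delta, fifth as peptide sequence, eighth as score); the function name and dictionary keys are hypothetical, and this is not the Collator's actual code.

```python
import re


def parse_hit(line: str):
    """Parse one 'qN_pM=...' record; return None for an empty hit ('-1')."""
    key, _, value = line.strip().partition("=")
    if value == "-1":
        return None
    # Peptide fields come before ';', protein references after it.
    fields, _, proteins = value.partition(";")
    parts = fields.split(",")
    query, rank = key.split("_")
    return {
        "query": int(query[1:]),        # 'q2'  -> 2
        "rank": int(rank[1:]),          # 'p1'  -> 1
        "calc_mass": float(parts[1]),   # assumed: calculated peptide mass
        "delta": float(parts[2]),       # assumed: observed minus calculated mass
        "peptide": parts[4],
        "score": float(parts[7]),       # assumed: score assigned to the hit
        "accessions": re.findall(r'"([^"]+)"', proteins),
    }


if __name__ == "__main__":
    record = 'q5_p4=0,662.349960,0.029864,3,NNFIR,14,0000000,11.84,00010020000000000,0,0;"gi|24416002":0:667:671:1'
    print(parse_hit(record))
```

Records parsed this way could then be filtered on sequence, mass and score instead of being inspected by hand.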

11 NGP Web Services – Adding Semantics
• Two ontologies developed as part of the NCRR-Glycomics project:
• GlycO: a domain ontology embodying knowledge of the structure and metabolism of glycans
  • Contains 770 classes that describe structural features of glycans
  • URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
• ProPreO: a comprehensive process ontology modeling experimental proteomics
  • Contains 296 classes
  • Models three phases of experimental proteomics*: separation techniques, analytical techniques and data analysis
  • URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo
*http://pedro.man.ac.uk/uml.html (PEDRO UML schema)

12 ProPreO - Experimental Proteomics Process Ontology
• ProPreO models the phases of a proteomics experiment using five fundamental concepts:
  • Data (example: a peaklist file from ms/ms raw data)
  • Data_processing_applications (example: MASCOT* search engine)
  • Hardware: embodies instrument types used in proteomics (example: ABI_Voyager_DE_Pro_MALDI_TOF)
  • Parameter_list: describes the different types of parameter lists associated with experimental phases
  • Task (example: component separation, used in chromatography)
*http://www.matrixscience.com/

13 Service description using WSDL-S
• Formalize description and classification of Web Services using ProPreO concepts
• Description of a Web Service using: Web Service Description Language

WSDL ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" ….. xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<schema targetNamespace="urn:ngp" xmlns="http://www.w3.org/2001/XMLSchema">
…..

WSDL-S ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" …… xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics" xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
<schema targetNamespace="urn:ngp" xmlns="http://www.w3.org/2001/XMLSchema">
……
<wsdl:message name="replaceCharacterRequest" wssem:modelReference="ProPreO#peptide_sequence">

[Figure: ProPreO process ontology, concepts defined in the process ontology. Labels: data, sequence, peptide_sequence]
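A registry or client could read these annotations with any XML parser. The sketch below is one way to do it under stated assumptions: the embedded document is a cut-down, well-formed stand-in for the ModifyDB WSDL-S (the slide elides most of the real file), the `xmlns:wsdl` namespace is the standard WSDL 1.1 one and is not shown on the slide, and `model_references` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of the wssem prefix, as declared on the slide.
WSSEM_NS = "http://www.ibm.com/xmlns/WebServices/WSSemantics"

# Minimal stand-in document; the wsdl namespace URI is an assumption.
WSDL_S_EXAMPLE = """<wsdl:definitions targetNamespace="urn:ngp"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
    xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
  <wsdl:message name="replaceCharacterRequest"
      wssem:modelReference="ProPreO#peptide_sequence"/>
</wsdl:definitions>"""


def model_references(wsdl_text: str) -> dict:
    """Map each annotated element's name to its wssem:modelReference value."""
    root = ET.fromstring(wsdl_text)
    # ElementTree stores namespaced attributes as '{namespace-uri}localname'.
    attr = f"{{{WSSEM_NS}}}modelReference"
    return {
        el.get("name"): el.get(attr)
        for el in root.iter()
        if el.get(attr) is not None
    }


if __name__ == "__main__":
    print(model_references(WSDL_S_EXAMPLE))
```

The returned mapping ties the message `replaceCharacterRequest` to the ontology concept `ProPreO#peptide_sequence`, which is the kind of link a semantic registry could classify on.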

14 Biological UDDI (BUDDI): WS Registry for Proteomics and Glycomics
• There are currently no registries that use semantic classification of Web Services in glycoproteomics
• BUDDI classification is based on proteomics and glycomics classification; BUDDI is part of the integrated glycoproteomics Web Portal called Stargate
• NGP to be published in BUDDI
• Can enable other systems such as myGrid to use NGP Web Services to build a glycomics workbench

15 Conclusions
• As part of the NCRR Integrated Technology Resource for Biomedical Glycomics, we implemented a Semantic Web Process for high-throughput glycomics in an open, web-centric environment
• Large domain-specific ontologies with process (ProPreO) and domain (GlycO) knowledge concepts were used to describe and classify Web Services at the semantic level
• Used the proposed Semantic Web Service specification (WSDL-S) to add semantics to the Web Service description
• Biological UDDI (BUDDI), part of Stargate, is being developed as a single-window resource to discover and publish Web Services in the glycoproteomics domain

16 Resources
• NCRR (Integrated Technology Resource for Biomedical Glycomics): http://cell.ccrc.uga.edu/world/glycomics/glycomics.php
• Bioinformatics core of Glycomics project: http://lsdis.cs.uga.edu/projects/glycomics/
• ProPreO process Ontology: http://lsdis.cs.uga.edu/projects/glycomics/propreo/
• GlycO domain Ontology: http://lsdis.cs.uga.edu/projects/glycomics/glyco/
• Stargate – GlycoProteomics Web Portal: http://128.192.9.86/stargate
• WSDL-S: joint UGA-IBM technical note http://lsdis.cs.uga.edu/library/download/WSDL-S-V1.pdf

17 Acknowledgement
Special thanks:
• James Atwood (CCRC, UGA)
• Meenakshi Nagarajan (LSDIS Lab, UGA)
• Blake Hunter (LSDIS Lab, UGA)

18 Extra Slides: Stargate subsystems – a bit of detail
• BUDDI: BioUDDI is envisioned as the 'yellow pages' for all WS in life sciences
  • The classification of WS uses biological taxonomy
  • Open resource for the worldwide community of life sciences research
• Format Converter: enables conversion of two available representation formats into an XML-based representation
  • IUPAC to LINUCS to GLYDE (an XML-based representation)
• Web Service Generator: enables existing Java applications to be exposed as Web Services
  • Generates the required files from a Java application to allow deployment as a Web Service
  • Enables the newly generated Web Service to be published on BioUDDI

19 Extra Slides: Stargate subsystems – a bit of detail
• Group Forum: members of the research group use it to foster a sense of community
  • Schedule meetings, discuss issues, collaborate on papers…
  • Post papers for peer review and publications on relevant topics
• Stargate Search: an integrated unit of Stargate
  • Enables search for research publications within the research group
  • Enables search on the internet
• Login: allows restrictions on the accessibility of selected parts of Stargate

20 Extra Slides: The take home message…
[Figure: Stargate overview. Labels: Internet, Forum, BUDDI, Search, Web Service Generator]

