O’Reilly Bioinformatics Conference San Diego, February 2003 Genomic Data Platform: RGD Curation System 1 Genomic Data Platform Rat Genome Database (RGD)


1 O’Reilly Bioinformatics Conference San Diego, February 2003 Genomic Data Platform: RGD Curation System 1 Genomic Data Platform Rat Genome Database (RGD) Curation System Jian Lu jianlu@mcw.edu Bioinformatics Research Center Medical College of Wisconsin, Milwaukee, USA

2 RGD web site

3 What Is Data Curation?
The process of:
- acquiring
- analyzing
- annotating
- updating

4 Major Issues in Data Curation
- Genomic data identification and categorization
- Literature identification and extraction
- Conflict in data identification, nomenclature and description among sources
- Integration of data and relationships among data
- Continuous updates and new types of data
- Volume of data
- Management of all curation activities
- Efficiency and accuracy

5 Objectives of Curation System
1. To develop an object-oriented approach to be used in a data curation strategy and to create a standard exchangeable data format for RGD object types
2. To create an integrated, cross-platform, multi-layer curation environment that will process individual and bulk data submissions
3. To provide web-based curation interfaces that will allow data to be easily accessible and searchable
4. To develop robust tools that will automatically download, extract and update data
5. To manage curation data for tracking, error-checking and reporting
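Objective 1 (an object type with a standard exchangeable format) can be illustrated with a minimal sketch. The slides do not show RGD's actual XML schema or class design, so the `GeneRecord` class, its fields, and the element names below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class GeneRecord:
    """Hypothetical gene object; field names are illustrative, not RGD's."""
    symbol: str
    name: str
    homologs: List[str] = field(default_factory=list)


def to_xml(gene: GeneRecord) -> str:
    """Serialize one record to an XML string (element names are assumptions)."""
    root = ET.Element("gene", attrib={"symbol": gene.symbol})
    ET.SubElement(root, "name").text = gene.name
    for homolog in gene.homologs:
        ET.SubElement(root, "homolog").text = homolog
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> GeneRecord:
    """Rebuild a record from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return GeneRecord(
        symbol=root.get("symbol"),
        name=root.findtext("name"),
        homologs=[h.text for h in root.findall("homolog")],
    )
```

A round trip such as `from_xml(to_xml(record))` returning an equal record is the property an exchange format needs, whatever the real schema looks like.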

6 Object Oriented Approach
Data objects (heterogeneous datasets):
- Genes
- SSLPs (simple sequence length polymorphisms)
- ESTs
- Phenotypes
- Quantitative trait loci
- Strains
- Sequences: primers and clones

7 Data Integration
[Diagram: data from literature reviews, submissions, public databases and collaborators is integrated into RGD as small scale datasets and bulk datasets. Data types shown: genes and descriptions, homologs, sequences, primers, phenotypes, map data, disease relationships, strains, QTLs.]

8 Consideration of Applications
- Data volume: individual vs. bulk
- Data activity: load vs. edit
- Data operation: online vs. batch
- Data processing: manual vs. automatic
- Data management: one vs. many

9 Applications
[Architecture diagram: an integrated, cross-platform, multi-layer system on top of the RGD Oracle RDBMS.]
- Web interfaces: Perl/CGI, Java Servlet, JSP
- Applications: literature curation, bulk data pipeline, data submission, online editing, curation management
- Perl modules: RGD::DB.pm, RGD::HTML.pm, RGD::Getrefs.pm, RGD::Loadrefs.pm, RGD::XML::Write.pm, RGD::PreLoad.pm, RGD::Load.pm
- Other labels: submission, bulk data, management, literature in XML, robust tools

10 Version Control
[Diagram labels: CVS client, CVS server, CVS, documents, WinCVS, Oracle JDeveloper, Java, JSP, Perl, HTML, source codes on Unix, applications on PC, web interface.]

11 Technology
System Environment
- Operating System: Sun Solaris 2.8
- Database: Oracle 8i
- Web Server: Apache 1.3.12
- JSP Engine: Tomcat 4.0.3
- CVS: GNU CVS 1.11.2
- Java: J2SE
Development Software
- Rational Rose
- Oracle 9i JDeveloper
Programming
- Perl
- XML
- Java
- JSP

12 Curation Applications: literature curation, bulk data pipeline, data submission, online editing, curation management

13 Work Flow
[Flow diagram; labels in extracted order: Main Menu; Online Editing; User Login; RGD DB; Literature Curation (XML, article download, article curation, data edit); Bulk Data Pipeline (bulk data DB, data entry and self checking); Curation Manager; Manage DB; Job Assignment; Curation Management (user account, curation status, data logs); Loading; Data Submission (submission DB, data entry, data edit, loading, checking).]

14 Curation Applications: literature curation, bulk data pipeline, data submission, online editing, curation management

15 Literature Curation

16 Literature Curation
- Robust handling of all literature loaded into RGD to the present time
- User-friendly web interface to load single or multiple articles through comma-separated lists of PubMed IDs
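The comma-separated PubMed ID input mentioned on this slide could be handled along the following lines. This is only a sketch: the function name and the validation rules (numeric IDs, duplicates dropped, empty entries ignored) are assumptions, not the behaviour of the RGD interface.

```python
from typing import List


def parse_pubmed_ids(raw: str) -> List[str]:
    """Split a comma-separated string into unique, numeric PubMed IDs."""
    ids: List[str] = []
    seen = set()
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        if not (token.isascii() and token.isdigit()):
            raise ValueError(f"not a PubMed ID: {token!r}")
        if token not in seen:  # keep first occurrence, preserve input order
            seen.add(token)
            ids.append(token)
    return ids
```

Rejecting malformed tokens before any download starts keeps a single typo from silently dropping an article from the batch.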

17 Literature Curation

18 Literature Curation

19 Literature Curation

20 GMOD Literature Curation (RGD, TAIR)
http://www.gmod.org/pubfetch/
http://www.gmod.org/pubsearch.shtml
http://www.gmod.org/pubtrack.shtml

21 Curation Applications: literature curation, bulk data pipeline, data submission, online editing, curation management

22 Bulk Data Pipeline
[Flow diagram, reconstructed from its labels.]
Data types: genes, SSLPs, maps, QTLs, ESTs, homologs, strains
1. Identify data object and format. Errors flagged: wrong data object, incorrect format, duplicate symbols, switched primer pairs
2. BLAST sequences. Flagged: RGD existing sequences, duplicate sequences
3. If existing: check each attribute against RGD. Flagged: RGD data conflicts. Curate conflicting data, then load
4. New data: load into RGD DB
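A few of the checks named on this slide (duplicate symbols, switched primer pairs, conflicts with existing data, new data to load) can be sketched as a single pass over a batch. The `Marker` record, the bucket names, and the use of a plain dictionary in place of a query against RGD are assumptions made for illustration; the BLAST step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical bulk record: a symbol plus a forward/reverse primer pair."""
    symbol: str
    forward: str
    reverse: str


def check_batch(batch, existing):
    """Sort incoming records into buckets before anything is loaded.

    `existing` maps symbol -> Marker already stored (a stand-in for a
    database lookup, which the slides do not describe in detail).
    """
    report = {
        "load": [],              # new data, safe to load
        "unchanged": [],         # identical to what is already stored
        "duplicate_symbol": [],  # symbol repeated within the batch
        "switched_primers": [],  # same primers, forward/reverse swapped
        "conflict": [],          # differs from stored data; needs a curator
    }
    seen = set()
    for rec in batch:
        if rec.symbol in seen:
            report["duplicate_symbol"].append(rec)
            continue
        seen.add(rec.symbol)
        old = existing.get(rec.symbol)
        if old is None:
            report["load"].append(rec)
        elif old == rec:
            report["unchanged"].append(rec)
        elif (rec.forward, rec.reverse) == (old.reverse, old.forward):
            report["switched_primers"].append(rec)
        else:
            report["conflict"].append(rec)
    return report
```

Collecting every problem into a report, rather than stopping at the first one, matches the slide's idea of routing conflicting data to a curator while clean records proceed to loading.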

23 Bulk Data Pipeline: Interface

24 Curation Applications: literature curation, bulk data pipeline, data submission, online editing, curation management

25 Data Submission

26 Data Submission

27 Data Submission

28 Data Submission

29 Data Submission

30 Curation Applications: literature curation, bulk data pipeline, data submission, online editing, curation management

31 Online Editing: Gene Nomenclature
- Case 1: replacing
- Case 2: merging
- Case 3: splitting
- Case 4: withdrawn
[Diagram: gene symbols A, B and C illustrating each case.]
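The four nomenclature cases can be sketched as operations on a small symbol registry. The slide's diagram did not survive extraction, so the semantics below (a replaced, merged, split or withdrawn symbol is retired and remembers its successors) are an assumption, and `SymbolRegistry` is a hypothetical class rather than part of RGD.

```python
class SymbolRegistry:
    """Tracks active gene symbols and where retired symbols went (illustrative)."""

    def __init__(self, symbols):
        self.active = set(symbols)
        self.retired = {}  # old symbol -> (event, tuple of successor symbols)

    def _retire(self, symbol, event, successors):
        if symbol not in self.active:
            raise KeyError(f"{symbol} is not an active symbol")
        self.active.remove(symbol)
        self.retired[symbol] = (event, tuple(successors))

    def replace(self, old, new):
        """Case 1: one symbol is replaced by another."""
        self._retire(old, "replaced", [new])
        self.active.add(new)

    def merge(self, sources, target):
        """Case 2: several symbols collapse into one target symbol."""
        for symbol in sources:
            if symbol != target:  # the target may be one of the sources
                self._retire(symbol, "merged", [target])
        self.active.add(target)

    def split(self, source, targets):
        """Case 3: one symbol becomes several."""
        self._retire(source, "split", targets)
        self.active.update(targets)

    def withdraw(self, symbol):
        """Case 4: a symbol is withdrawn with no successor."""
        self._retire(symbol, "withdrawn", [])
```

Keeping the retired symbols with their successors means an old symbol found in the literature can still be resolved to whatever replaced it.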

32 Online Editing: Gene Information
product, function, description, cellular localization, disease details, drugs, expression, mutations and over/under expression, pathway, regulation, role details, transcript structure, physical interaction, …

33 Online Editing: use-case study

34 Online Editing: use-case study (EditCurrent use case process)

35 Online Editing

36 Online Editing: merging

37 Curation Applications: literature curation, bulk data pipeline, data submission, online editing, curation management

38 Curation Management
[Columns reconstructed from the extracted labels.]
- User Account: view system usage, activate/deactivate users, grant user privilege, remove user
- Job Assignment: assign new jobs, relocate jobs, view reports
- Curation Status: view record status, modify record status, finalizing data processing
- Data Logs: view data history, data recovery
- Robust Tools: report curation status, synchronize data

39 Curation Management

40 Curation Management

41 Curation Management

42 Curation Management

43 Curation Management: Automatic Homolog Sync
[Flow diagram, reconstructed from its labels.]
1. Automatic downloading: mouse homolog file from MGD FTP, human homolog file from NCBI FTP
2. Data file parser
3. RGD DB homolog query
4. Compare homolog symbol:
   - Symbol change? Update
   - Duplicate symbol? Delete
   - RGD symbol not found in the files
5. Reporting files
6. Send out email to curators
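The comparison step of the homolog sync can be sketched as follows. The download, parsing and email steps are left out, and the data shapes are assumptions: RGD homologs are modelled as `(homolog_id, symbol)` rows and the parsed download as a dictionary keyed by the same ID. The slides do not say how records are actually matched.

```python
def sync_homologs(rgd_rows, source_symbols):
    """Compare stored homolog symbols with a freshly parsed source file.

    rgd_rows:       list of (homolog_id, symbol) rows as stored (assumed shape)
    source_symbols: dict of homolog_id -> current symbol from the download
    Returns (rows to keep, report lines for curators).
    """
    kept, report, seen = [], [], set()
    for homolog_id, symbol in rgd_rows:
        # Fall back to the stored symbol when the ID is missing from the file.
        current = source_symbols.get(homolog_id, symbol)
        row = (homolog_id, current)
        if row in seen:
            report.append(f"DELETE duplicate {homolog_id} {current}")
            continue
        seen.add(row)
        if homolog_id not in source_symbols:
            report.append(f"NOT FOUND {homolog_id} {symbol}")
        elif current != symbol:
            report.append(f"UPDATE {homolog_id} {symbol} -> {current}")
        kept.append(row)
    return kept, report
```

Rows whose ID is missing from the download are kept and only reported, on the assumption that a curator, not the script, should decide what happens to them.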

44 Curation Management: Automatic Homolog Sync

45 Curation Management: Reporting of Curation DB Status

46 Curation Management: Reporting of Curation DB Status

47 RGD Data Curation Statistics
- Over 1800 articles have been curated and the data transferred to the database, including 5195 genes, 250 QTLs, and 321 strains.
- About 350 genes contain information from the curated literature, resulting in over 400 genes containing a product, function, or gene description, 1450 genes that have homologs, 1488 genes that have a curated map position, 334 genes with known microsatellite markers, and 55 candidate genes for QTLs.
- Over 350,000 SSLPs, genes, QTLs, strains, homologs, and sequences have been processed for bulk data sets.
- Over 5,000 instances of data with potential conflict have been detected.

48 RGD Data Curation Statistics

49 RGD Data Curation Statistics

50 Acknowledgement
This work has been supported by grant HL64541 from the National Heart, Lung and Blood Institute (NHLBI) of the National Institutes of Health (NIH).

51 Acknowledgement
Rat Genome Database (http://rgd.mcw.edu)
- Peter Tonellato -- P.I.
- RGD Bioinformatics: Dean Pasko, Jiali Chen, Sheng-He Gu, Hanping Long, Jed Mathis, Aubrey Hughes, Norie Dela Cruz, Henry Fan
- Simon Twigger -- co P.I.
- RGD Curation: Mary Shimoyama, Chin-Fu Chen, Rajni Nigam, Gopal Gopinathrao, Susan Bromberg, Jessica Ginster, Nataliya Nenasheva, Charles W. Wang, Angela Zuniga-Meyer
The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org/)
- Sue Rhee -- P.I.
- Lukas Mueller
- Iris Xu