The Integrated Molecular Analysis of Genomes and their Expression Consortium’s Data Mining Tools: Introducing the IQ Peg Folta Lawrence Livermore National.


1 The Integrated Molecular Analysis of Genomes and their Expression Consortium’s Data Mining Tools: Introducing the IQ. Peg Folta, Lawrence Livermore National Laboratory. 3/12/02, TRANSCRIPTOME 2002, Seattle, WA.

2 I.M.A.G.E. maintains the world’s largest publicly available cDNA collection: 5,819,514 clones arrayed. I.M.A.G.E. clones account for 64% of human ESTs in GenBank. [Chart: cumulative clones arrayed]

3 The I.M.A.G.E. collection has been shaped by projects (C-GAP, MGC…). [Diagram labels: Species; Library; Method; Developmental state; Tissue; Clone sequence]

4 Informatics focus this year was on tools to characterize and query the collection:
- IMAGEne: mature clustering tool
- IMAGEne Tissue: allows searching of tissue type dominance in clusters
- IQ: Intelligent Query tool, allows mining of I.M.A.G.E. data
- Library/plate query: allows selective searching of libraries and plates
- Problem report and query: allows users to report or query problems related to I.M.A.G.E. clones
- Redesign of data management system

5 IMAGEne-Human Process. [Flow diagram labels: 2,289,020 quality I.M.A.G.E. sequences; 14,566 NCBI RefSeq; IMAGEne; 1,676,516 sequences; 623,294 sequences; remaining sequences; >50 base pairs of contiguous, non-repeat sequence; Known Clusters 14,566; Candidate Clusters w/consensus 67,521; I.M.A.G.E. Singletons 268,472; 279,262 lower quality I.M.A.G.E. sequences]
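The ">50 base pairs of contiguous, non-repeat sequence" criterion on this slide can be illustrated with a short sketch. This is not the IMAGEne implementation: the slide does not say how repeats are marked, so the code assumes repeat-masked input in which masked bases appear as lowercase letters, N, or X. The function names and the sample sequences are hypothetical.

```python
import re

# Threshold from the slide: more than 50 contiguous non-repeat bases.
MIN_RUN = 50


def longest_unmasked_run(seq: str) -> int:
    """Return the length of the longest stretch of uppercase A/C/G/T.

    Assumption (not from the slides): repeats have already been masked,
    so masked bases show up as lowercase letters, 'N', or 'X'.
    """
    runs = re.findall(r"[ACGT]+", seq)
    return max((len(run) for run in runs), default=0)


def passes_filter(seq: str, min_run: int = MIN_RUN) -> bool:
    """True if the sequence has more than min_run contiguous unmasked bases."""
    return longest_unmasked_run(seq) > min_run


# Hypothetical sequences for illustration only.
clean = "ACGT" * 15                                  # one run of 60 bases
interrupted = "ACGT" * 10 + "n" * 20 + "ACGT" * 10   # two runs of 40 bases

print(passes_filter(clean))
print(passes_filter(interrupted))
```

The strict comparison mirrors the ">50" wording on the slide; a run of exactly 50 bases would not pass under this reading.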

6 Initial query page: construct the query.

7 Clusters matching the query: choose your cluster.

8 Display of cluster

9 Known gene clusters with full-length I.M.A.G.E. clones have doubled in number. [Chart labels: Cluster coverage; Avg. gene length; 3392, 2763, 3380, 1896, 1578]

10 Known gene cluster distribution of full-length clones (avg. length = 948).

11 Candidate gene clusters: consensus sequences and contigs are generated by CAP4. [Chart values: 61,314; 4,971; 824; 95; 227; 40]

12 Candidate Gene cluster characteristics.

13 Singletons: wheat within the chaff. 305 full-insert sequences are singletons. 62,143 singletons have a 3’ polyA site. Avg. length is 547.

14 IMAGEne Tissue query allows searching for tissue proportions within clusters.
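As a rough illustration of what "tissue proportions within clusters" could mean computationally, the sketch below tallies the tissue of origin for each clone in a cluster and reports each tissue's share. The slides do not describe how IMAGEne Tissue computes this, so the function names, the 0.5 dominance threshold, and the sample data are all assumptions.

```python
from collections import Counter


def tissue_proportions(clone_tissues):
    """Map each tissue to its share of the clones in one cluster.

    clone_tissues is a list with one tissue label per clone
    (a hypothetical input format, not the I.M.A.G.E. schema).
    """
    counts = Counter(clone_tissues)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tissue: n / total for tissue, n in counts.items()}


def dominant_tissue(clone_tissues, threshold=0.5):
    """Return the most common tissue if its share reaches the threshold.

    The threshold is an illustrative choice, not a value from the slides.
    """
    props = tissue_proportions(clone_tissues)
    if not props:
        return None
    tissue, share = max(props.items(), key=lambda item: item[1])
    return tissue if share >= threshold else None


# Hypothetical cluster: four clones from three tissues.
cluster = ["brain", "brain", "liver", "kidney"]
print(tissue_proportions(cluster))
print(dominant_tissue(cluster))
```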

15 Introducing the Intelligent Query (IQ). For a given category (currently clone and library), a user can specify a query based on key database attributes. The user can specify the fields returned. Various result format options are available (HTML, text). The initial version was rolled out last summer. New functionality is to be added this year (additional categories, etc.).
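The three choices a user makes in IQ (filter attributes, fields returned, result format) can be mimicked in a few lines. This is only a sketch of the idea, not the IQ tool's interface: the record layout, attribute names, identifiers, and the run_query function are hypothetical, and the real tool queries a database rather than an in-memory list.

```python
from html import escape

# Hypothetical clone records; attribute names and values are placeholders.
CLONES = [
    {"clone_id": "clone-001", "species": "human", "tissue": "brain", "library": "lib-A"},
    {"clone_id": "clone-002", "species": "mouse", "tissue": "liver", "library": "lib-B"},
    {"clone_id": "clone-003", "species": "human", "tissue": "brain", "library": "lib-B"},
]


def run_query(records, criteria, fields, fmt="text"):
    """Filter records by attribute equality and return the chosen fields.

    criteria: dict of attribute -> required value
    fields:   list of attributes to include in the result
    fmt:      "text" (tab-separated) or "html" (a simple table)
    """
    rows = [
        [str(rec[f]) for f in fields]
        for rec in records
        if all(rec.get(key) == value for key, value in criteria.items())
    ]
    if fmt == "text":
        lines = ["\t".join(fields)] + ["\t".join(row) for row in rows]
        return "\n".join(lines)
    if fmt == "html":
        head = "".join(f"<th>{escape(f)}</th>" for f in fields)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
            for row in rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"
    raise ValueError(f"unsupported format: {fmt}")


# Clone-based query: human brain clones, returning ID and library.
print(run_query(CLONES, {"species": "human", "tissue": "brain"}, ["clone_id", "library"]))
```

Passing fmt="html" returns the same rows as a table, which corresponds to the HTML result option mentioned on the slide.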

16 Specify a clone-based query.

17 Next, specify which clone-centric results will be returned and in what format.

18 HTML version of clone-based query results.

19 Specify a library-based query.

20 Similarly, specify which library-centric results will be returned.

21 HTML version of library-based query results.

22 Other tools to mine I.M.A.G.E. information:
- Query plates from libraries.
- Query for reported problems.

23 Quality control for historical collection

Plates        Source                                     Well Error Rate
1-3705        Incyte                                     13
1-3705        LLNL Master                                10
1-3705        Research Genetics                          12
1-3705        Resource Center of Human Genome Project    10
1-3705        ATTC                                       11
3,796-6000    Incyte                                     7
3,796-6000    LLNL Master                                7
3,796-6000    Research Genetics                          10
3,796-6000    Resource Center of Human Genome Project    12

24 QC on-going: LLNL Replication Master vs. GenBank

Month      Well error rate    Plate error rate    Well error rate    Plate error rate
6/2000     1 (1,3)            0                   7 (4,11)           2
10/2000    1 (0,3)            0                   --                 2
12/00      0 (0,2)            2                   1 (0,3)            2
1/01       2 (1,4)            0                   6 (4,11)           3
2/01       1 (0,3)            0                   2 (1,5)            2
3/01       2 (1,5)            2                   --                 0
4/01       1 (0,3)            2                   2 (1,4)            0
5/01       0 (0,1)            0                   2 (1,5)            0
6/01       1 (0,3)            0                   1 (0,4)            0
7/01       1 (0,4)            0                   2 (1,6)            0
8/01       2 (1,3)            0                   3 (2,6)            0

25 Ongoing QC results. [Chart labels: On-going; Comparing master to GenBank; Error in replication @ LLNL]

26 Next for I.M.A.G.E. Informatics:
- Extensive expansion of query tools and data access
- IMAGEne non-species specific
- Analysis of human cluster candidate genes and singletons
- Redo of web site, easier to navigate
- MUCH influenced by public needs: you have a say!

27 Acknowledgements
LLNL
- Christa Prange, I.M.A.G.E. PI
- Tim Harsch, Amber Johnston, Julie Amundson
Sponsors
- DOE, Marv Stodolsky
- NIH, Bob Strausberg
This work was partially funded by the NIH and was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract no. W-7405-Eng-48.
image.llnl.gov

