ENCODE Data Coordination at UCSC Kate Rosenbloom ENCODE DCC Technical Project Manager UCSC Genome Bioinformatics Group September 2010 Genome Browser SAB.

1 ENCODE Data Coordination at UCSC Kate Rosenbloom ENCODE DCC Technical Project Manager UCSC Genome Bioinformatics Group September 2010 Genome Browser SAB Review

2 ENCODE in a nutshell: Y3Q3 2 species (human, mouse) 3 years of production phase (2 years left) 17 grants, 27 labs 20 experiment types 27 browser tracks 140 cell & tissue types 1559 datasets

3 Topics: 1.DCC role in ENCODE 2.Progress since last SAB review 3.Challenges 4.Browser impact

4 DCC role in ENCODE Define data formats and submission process Load, display, curate, review & release data Collect metadata and documentation Provide public website, viz, tools User outreach & support Support consortium communications (wiki) Support analysis group

5 Data submission website: http://encodesubmit.ucsc.edu (2438 submissions as of September 2010)

6 Lifecycle of a data submission
1. Lab uploads sub.tar.gz to the submission website (status: uploaded).
2. Pipeline validates and loads into database (status: loaded; if data fails validation or loading: validate failed, load failed).
3. Wrangler configures browser track (status: displayed, on test browser).
4. Lab approves track on test browser (status: approved).
5. Q/A reviews and releases (status: reviewing, then released to public browser).
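The lifecycle above can be read as a small state machine. The following is a minimal Python sketch of that reading, not the DCC pipeline's actual code: the status names come from the slide, while the `Status` enum, the `TRANSITIONS` table, and the `advance` and `run` helpers are hypothetical. The slide does not say how a failed submission is retried, so the return from a failed status to `uploaded` is an assumption.

```python
from enum import Enum


class Status(Enum):
    """Submission statuses named on the slide."""
    UPLOADED = "uploaded"
    VALIDATE_FAILED = "validate failed"
    LOAD_FAILED = "load failed"
    LOADED = "loaded"
    DISPLAYED = "displayed"
    APPROVED = "approved"
    REVIEWING = "reviewing"
    RELEASED = "released"


# Allowed moves between statuses. The retry edges (failed -> uploaded)
# are an assumption; the slide only shows that failures can occur.
TRANSITIONS = {
    Status.UPLOADED: {Status.LOADED, Status.VALIDATE_FAILED, Status.LOAD_FAILED},
    Status.VALIDATE_FAILED: {Status.UPLOADED},
    Status.LOAD_FAILED: {Status.UPLOADED},
    Status.LOADED: {Status.DISPLAYED},
    Status.DISPLAYED: {Status.APPROVED},
    Status.APPROVED: {Status.REVIEWING},
    Status.REVIEWING: {Status.RELEASED},
    Status.RELEASED: set(),
}


def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value!r} to {target.value!r}")
    return target


def run(path):
    """Walk a sequence of statuses, checking every step, and return the last one."""
    state = path[0]
    for target in path[1:]:
        state = advance(state, target)
    return state
```

A table of allowed transitions keeps the rules in one place, so a skipped step (for example, releasing a track the lab has not approved) is rejected rather than silently accepted.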

7 ENCODE track quality review Data format checks Description and metadata complete & correct Configurability Display at different zoom levels and visibilities Performance Does the data make biological sense ? Usability

8 Released Data: 27 tracks in hg18, representing 860 experiments as of September 2010

9 Progress since last SAB review (Feb '09) Features planned for Year 2: High-resolution wiggle (bigWig): DONE; RNA-seq display enhancements: DONE; NCBI accessioning of seq data: IN PROGRESS; Track search tool: IN REVIEW. Plus: Integrated regulatory track; hg19/GRCh37 migration; BAM support (spec, validation, display, c-tracks); Mid-course review, 4 data freezes, 2 analysis workshops, DCC site review; Mouse ENCODE

10 Integrated regulatory tracks UCSC-developed integrative ENCODE track: shows enrichment of histone modifications suggestive of enhancer and promoter activity, DNase clusters indicating open chromatin, regions of transcription factor binding, and transcription levels, derived from ENCODE data collected in multiple cell lines.

11

12 ENCODE data at NCBI GEO: Caltech RNA-seq

13 Mouse ENCODE experiment matrix 4 grants funded by the ARRA, 3 are now submitting data more cell types more factors

14 Finding data in the browser: Simple free-text search. Simple search looks at: track names and labeling; track description; metadata terms (specifically ENCODE controlled vocabulary)
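As an illustration of what a free-text search over these three kinds of fields might look like, here is a minimal Python sketch. It is not the browser's implementation: the `Track` class, the `simple_search` function, the sample track names, and the metadata keys are all hypothetical, and matching every query word as a case-insensitive substring is an assumed rule.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical track record holding the fields the slide lists."""
    name: str
    label: str
    description: str
    metadata: dict = field(default_factory=dict)


def simple_search(tracks, query):
    """Return tracks where every query word appears in some searched field."""
    words = query.lower().split()
    hits = []
    for track in tracks:
        # Search names, labeling, description, and metadata term values together.
        haystack = " ".join(
            [track.name, track.label, track.description, *track.metadata.values()]
        ).lower()
        if all(word in haystack for word in words):
            hits.append(track)
    return hits


# Invented sample data, for illustration only.
TRACKS = [
    Track("exampleHistoneA", "Example Histone A", "ChIP-seq signal",
          {"cell": "H1-hESC", "antibody": "H3K4me3"}),
    Track("exampleHistoneB", "Example Histone B", "ChIP-seq signal",
          {"cell": "K562", "antibody": "H3K4me3"}),
    Track("exampleRnaSeq", "Example RNA-seq", "Transcription levels",
          {"cell": "H1-hESC"}),
]

for track in simple_search(TRACKS, "H3K4me3 H1-hESC"):
    print(track.name)
```

Because all fields are searched as one block of text, a query word can match a track name in one case and a metadata value in another, which is convenient for users but less precise than a search on defined terms.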

15 Finding data in the browser: By metadata terms. Advanced search allows selection by defined metadata terms (currently only for ENCODE tracks). This search finds histone modification H3K4me3 as seen in H1-hESC cells.
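Selection by defined metadata terms can be sketched as exact matching on key/value pairs. The Python example below is an assumption-laden illustration, not the browser's code: the `advanced_search` function, the dictionary layout, the sample track names, and the metadata keys `cell` and `antibody` are hypothetical stand-ins for the ENCODE controlled vocabulary.

```python
def advanced_search(tracks, **terms):
    """Return tracks whose metadata equals every requested term exactly."""
    return [
        track
        for track in tracks
        if all(track["metadata"].get(key) == value for key, value in terms.items())
    ]


# Invented sample data, for illustration only.
TRACKS = [
    {"name": "exampleHistoneA",
     "metadata": {"cell": "H1-hESC", "antibody": "H3K4me3"}},
    {"name": "exampleHistoneB",
     "metadata": {"cell": "K562", "antibody": "H3K4me3"}},
    {"name": "exampleRnaSeq",
     "metadata": {"cell": "H1-hESC"}},
]

matches = advanced_search(TRACKS, antibody="H3K4me3", cell="H1-hESC")
for track in matches:
    print(track["name"])
```

Exact matching on controlled terms avoids the false positives of free-text search, at the cost of requiring the user to know which terms are defined.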

16 Results from track search The result of the search on the previous slide is a single track of histone modification H3K4me3 as seen in H1-hESC cells. Clicking ‘View in Browser’ will display this data.

17 Challenges Number of labs, difficulty of some Metadata expansion, special handling beyond normal browser data Multiple customers: NHGRI, analysis group, labs, user community Production vs. research Mission expansion: GEO/SRA, standards, ARRA, year 5 Reporting overhead Engineering staff -> hire ‘wranglers’ Funding delays

18 Impact on browser Expanded data – mostly useful, some not so much Pushes development of viz, tools, formats for large datasets Competes for staff and mgmt resources

19 People at the DCC PI: Jim Kent Technical project manager: Kate Rosenbloom Engineering / Wrangling: Tim Dreszer, Venkat Malladi, Brian Raney, Cricket Sloan, Melissa Cline Outreach, usability: Melissa, OpenHelix (contractor) Submissions website: Galt Barber GEO tools: Krishna Roskin Quality assurance: Katrina Learned, Vanessa Swing Browser management: Donna Karolchik, Bob Kuhn, Ann Zweig

20 Reporting: Monthly Quarterly Annual

21 Additional slides

22 Plans Track search tool ENCODE tutorial Portal upgrade Complete GEO submissions Analysis tracks ARRA grants (proteogenomics, epitope-tag) Release Mouse data

23 Browser features developed for ENCODE High resolution wiggle (bigWig) HTS formats (BAM and bigBed) BIG custom tracks View-based tracks Data selection matrix Metadata links Coming soon: Track Search

24 Initial tracks of mouse data (test browser)

25 GEO Submission Pipeline

26 ENCODE Portal: http://encodeproject.org

27 ENCODE Outreach 2009-2010 Publication: NAR 2010 Database issue (2011 update in press) Presentations: CSHL Statistical Analysis course June 2010, Stanford Computational Systems Bioinformatics, Aug 2010 Posters: CSHL Biology of Genomes May 2009, CSB 2010

28 OpenHelix ENCODE tutorial

