Advisory Board Meeting, Caltech 2004 Genome Sequence Updates. Paul Davis The Sanger Institute.


1 Advisory Board Meeting, Caltech 2004 Genome Sequence Updates. Paul Davis The Sanger Institute

2 The Finished Sequence
- Oct 2002:
  - Contiguous sequence (telomere – telomere)
  - Finishing error rate of 1:10,000
  - ~10,000 total.
  - Where to find these errors
  - Biased to coding sequence
  - "Stable" genome
  - Platform for research
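The "~10,000 total" figure on the slide above follows from the 1:10,000 finishing error rate once a genome length is fixed. A minimal sketch of that arithmetic, assuming a genome of roughly 100 Mb (the slide does not state the length; the constant below is an assumption):

```python
# Assumption: genome length of about 100 Mb; this value is not given on the slide.
ASSUMED_GENOME_LENGTH_BP = 100_000_000

# From the slide: one finishing error per 10,000 bases.
BASES_PER_ERROR = 10_000


def expected_errors(genome_length_bp: int, bases_per_error: int) -> int:
    """Return the expected number of residual errors at the given error rate."""
    return genome_length_bp // bases_per_error


print(expected_errors(ASSUMED_GENOME_LENGTH_BP, BASES_PER_ERROR))
```

Under that assumed length the estimate comes out at 10,000 errors, which matches the slide's rounded total.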

3 How Has the Genome Changed:
- New sequence data.
  - Variety of sources.
  - 3rd party clone data.
  - Repeat assembly data.
  - Transcript data.
  - EST - Kohara Y.
  - OST - Vidal M.
  - cDNA - NDB mRNAs/cDNAs.
  - IST - Vidal M.
  - Resolution of N's (St Louis).
  - 95 -> 8 for WS119.
  - Further sequencing/PCR.

4 Genomic Change Since Final Gap Closure Oct 2002
[Figure: timeline of releases WS95, WS100, WS105, WS110, WS115, WS120, with ABM 2003 marked. Labels: Unfolding of inverted repeat; Resolution of repeat misassembly; Latest sequence update; Sequence corrections.]

5 How Errors Are Identified.
- Gene predictions may have an incorrect structure compared to available experimental data.
  - Use of incorrect splice donor/acceptors on intron-exon boundaries to avoid problem.
  - Truncation of the prediction to avoid problems.
  - Splicing out of internal stop codons.
  - Extra intron to allow for frame shift.
- Curators check through generated lists of potential problems:
  - Introns confirmed by transcript data but not in a prediction.
  - Small introns.
  - 30bp minimum cut-off for intron size (proposed/implemented Jan 2004).
  - Transcript data matching introns.
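Two of the curator checks listed above can be sketched in a few lines: flagging introns below the 30 bp cut-off, and listing introns confirmed by transcript data that appear in no prediction. This is an illustration only, not the WormBase pipeline; the exon representation (1-based inclusive start/end pairs) and all function names are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); an assumed convention

MIN_INTRON_BP = 30  # cut-off quoted on the slide


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Derive introns as the gaps between consecutive exons of one prediction."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def small_introns(exons: List[Interval], min_len: int = MIN_INTRON_BP) -> List[Interval]:
    """Return introns shorter than the minimum length, as candidates for review."""
    return [(s, e) for s, e in introns_from_exons(exons) if e - s + 1 < min_len]


def confirmed_but_unpredicted(confirmed: List[Interval],
                              predicted: List[Interval]) -> List[Interval]:
    """Return transcript-confirmed introns that are absent from the prediction."""
    return sorted(set(confirmed) - set(predicted))


# Hypothetical prediction with an 11 bp intron that may be hiding a frame shift.
exons = [(100, 200), (212, 300), (400, 500)]
print(small_introns(exons))
print(confirmed_but_unpredicted([(301, 399), (501, 560)], introns_from_exons(exons)))
```

A very short intron is suspicious because it is one way a prediction can step over a single-base error, so the flagged list is a set of cases to inspect rather than a list of confirmed errors.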

6 How Errors Are Identified Cont.
- WormBase Users
- Identification of a single copy prediction that is a pseudogene that the user believes not to be a pseudogene through their research/observations.
- Or vice versa.
- Identification of a prediction that does not follow the "family" structure, missing out a motif/domain to avoid a problem region.
- Pseudogenes may be real or reflect a sequencing error.
- Each case is investigated individually.

7 WS119 Sequence Update.
- A list of potential sequencing errors was compiled.
  - Archived projects were checked
  - New sequence files were created
  - Incorporated into clone linkage groups
  - Database rebuilt
  - Sequencing errors resolved.
  - 21 clones affected
  - 16 indels
  - 11 insertions
  - 5 deletions
  - 4 substitutions
  - 1 assembly problem
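The counts on the slide above are internally consistent if insertions and deletions are read as the breakdown of the indels, and each affected clone is counted under one category. That reading is an assumption, since the slide's nesting is not preserved in the transcript. A short check:

```python
# Counts taken from the slide.
insertions = 11
deletions = 5
substitutions = 4
assembly_problems = 1

# Assumed reading: indels = insertions + deletions,
# and every affected clone falls into exactly one category.
indels = insertions + deletions
clones_affected = indels + substitutions + assembly_problems

print(indels, clones_affected)
```

Both sums agree with the slide's stated figures of 16 indels and 21 clones affected.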

8 Example of an Additional Intron to Accommodate Frame-shift
Single bp insertion into the genome causing a shift from frame 2 to frame 3. Base removed, allowing original predictions to be corrected.
[Figure: two views of the region, each with tracks labelled 1, 2, 3, ESTs and mRNA. Label: Investigated and corrected.]
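The effect described on this slide can be reproduced on a toy sequence: one extra base changes every downstream codon, and removing it restores the original reading. The sequence, coordinates, and the frame-numbering convention (frames 1 to 3 from the 1-based coordinate of a codon's first base) are assumptions chosen for illustration.

```python
from typing import List


def codons(seq: str) -> List[str]:
    """Split a sequence into complete triplets, reading from its first base."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq: str, index: int, base: str) -> str:
    """Return seq with one base inserted before the 0-based index."""
    return seq[:index] + base + seq[index:]


def delete_base(seq: str, index: int) -> str:
    """Return seq with the base at the 0-based index removed."""
    return seq[:index] + seq[index + 1:]


def frame_of(codon_start: int) -> int:
    """Frame (1, 2 or 3) of a codon from its 1-based start coordinate (assumed convention)."""
    return (codon_start - 1) % 3 + 1


original = "ATGGCTGAAACTTGA"               # hypothetical coding sequence
erroneous = insert_base(original, 6, "C")  # simulated single-base insertion

print(codons(original))    # reading frame intact
print(codons(erroneous))   # codons after the insertion are all different
print(frame_of(11), frame_of(12))  # a codon start pushed one base downstream changes frame
print(delete_base(erroneous, 6) == original)
```

A predictor faced with the erroneous sequence can only keep the downstream exon in frame by inventing an intron around the extra base, which is the artefact the slide shows being corrected.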

9 Example of False Introns and Truncation to Represent All Data.
[Figure: region with EST tracks and predictions A and B. Labels: Single EST highlighting problem; History objects; Single prediction; Investigated and corrected.]

10 Expectations of Future Changes
- Known sequencing errors still exist in problematic clones.
- Why?
  - Incomplete archive.
  - Some early clone projects not available.
  - Poor quality, unfinished projects.
- Strategy for resolving this issue.
  - Genomic PCR of known problems.
  - Recently sourced facilities to do this.
  - Priority for future frozen release.

11 Since The ABM 2003
- Efforts to correct all known sequencing errors.
- Cater for needs of different users.
  - Within the worm community there are different needs from the sequenced genome.
  - Bioinformatics groups wanting stability to perform global analyses.
  - Researchers wanting the latest, accurate sets of gene predictions.

12 How Are We Catering for Different Needs?
- WormBase 2 week release cycle.
  - Quick turnaround for corrections and data.
  - Good for research groups interested in subsets of genes.
  - Bad for global analysis groups as sequence changes throw out coordinates.
- Introduction of WormBase "Frozen" release versions.
  - These take place every 10 releases (~5 months).
  - 1st "Frozen" release was May 2003 (WS100).
  - Separate website (http://ws★★★.wormbase.org/).
  - Remain available on ftp site.
  - Subsequent releases WS110 & WS120.
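Two points from this slide lend themselves to a small sketch: the frozen-release cadence (every 10th release on a 2-week cycle, starting at WS100) and the way a sequence correction shifts downstream coordinates. The function names are hypothetical, and extending the every-10th rule beyond WS120 is an assumption rather than something the slide states.

```python
RELEASE_INTERVAL_WEEKS = 2   # from the slide
FROZEN_EVERY = 10            # from the slide
FIRST_FROZEN_RELEASE = 100   # WS100, from the slide


def is_frozen(release: int) -> bool:
    """True if the release number falls on the every-10th cadence from WS100."""
    return (release >= FIRST_FROZEN_RELEASE
            and (release - FIRST_FROZEN_RELEASE) % FROZEN_EVERY == 0)


def frozen_cycle_months() -> float:
    """Approximate months between frozen releases (average month of 30.44 days)."""
    days = FROZEN_EVERY * RELEASE_INTERVAL_WEEKS * 7
    return days / 30.44


def shift_coordinate(position: int, insert_at: int, inserted_bp: int) -> int:
    """Map a 1-based coordinate across an insertion placed before insert_at.

    Positions upstream of the insertion are unchanged; positions at or
    downstream of it move by the inserted length (negative for a deletion).
    """
    return position + inserted_bp if position >= insert_at else position


print([r for r in range(95, 121) if is_frozen(r)])
print(round(frozen_cycle_months(), 1))
print(shift_coordinate(5_000, 1_200, 1), shift_coordinate(800, 1_200, 1))
```

Twenty weeks works out to a little under five months, in line with the slide's "~5 months". The coordinate example shows why a single-base fix matters to global analyses: every annotation downstream of the change is off by one unless it is remapped, whereas a frozen release keeps coordinates fixed.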

13 Frozen Release Effects
- User benefits.
  - Allows bioinformatics groups to coordinate analyses.
  - Can reference a specific release.
  - Continued use of release through separate web server.
- Effects on WormBase.
  - Requires more resources.
  - Correction update cycle.

14 Genomic Change and Data Freezes.
[Figure: timeline of releases WS95, WS100, WS105, WS110, WS115, WS120 with frozen releases marked. Labels: Data incorporated into frozen release; Prior to data freezes; Out of sync.]

15 The End!

