Future directions in genetic epidemiology, impact on IT and Data requirements Nic Timpson MRC CAiTE Centre Department of Social Medicine.


1 Future directions in genetic epidemiology, impact on IT and Data requirements Nic Timpson MRC CAiTE Centre Department of Social Medicine

2 Step change Larsen

3 Why? -Technology -Paradigm shift -Genomic properties EUCCONET Data Management Workshop

4 Raw data Clinical meaning ???????

5 Two of the driving technologies:
- Chip-based genotyping
- Next generation sequencing (NGS)

6 Basic flat Illumina output…

7 Derivation of flat file data from image-based intensity reads:

8 HAPMAP - Illumina - Affymetrix. Files shown: CHR16_HAPMAP.recode.map, CHR16_HAPMAP.recode.ped, red_test_run_assoc.txt, genetic_map_chr16.txt

9

10 NOD2 Crohn's association (plot axis: Position (Mb))

11
Individual      | Platform         | Read length (bp) | Base coverage (fold) | Genomic coverage (%) | Cost ($US)
J. Craig Venter | Automated Sanger | 800              | 7.5                  | N/A                  | 70,000,000
James D. Watson | Roche/454        | 250              | 7.4                  | 95                   | 1,000,000
Yoruban male    | Illumina/Solexa  | 35               | 40.6                 | 99.9                 | 250,000
Yoruban male    | Life/APG         | 50               | 17.9                 | 98.6                 | 60,000

12 Consequent shifting budgets…
[Figure: data volume (bytes), based on n~5000, by data type: 1- Candidate, 2- CHIP (designer), 3- Affy 500, 4- Intensity data, 5- NGS data (*LC). Labelled volumes: ~2MB, ~10GB, ~20TB.]
[Figure: cost per genome: HGP ~$5+ billion; Venter & Watson ~$70 million and ~$1 million; NGS ~$60 000.]

13 Based on the storage of re-sequence data, one can consider the storage requirements for a next generation sequencing effort. Assuming a storage cost of about 1.5 bytes per bp of sequence reads, a low-coverage study of ~2000 samples (as per UK10K, for example) needs 2000 x 3 billion bp x 1.5 bytes = 9 TB, or roughly 10 TB. That does not include any subsequently parsed data. Double this just to hold the data in all the formats one might be able to use meaningfully, which yields ~20 TB. 20 TB is pretty small these days; if buying new storage capacity just to do this, one may be better off budgeting for 50-100 TB if buying bespoke. Cost: service costs can be as high as £1500 per TB. An NGS project on some 2000 individuals can cost as much as 40-50k on computing alone.
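The storage estimate on this slide can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and parameters are hypothetical, the defaults are the slide's own assumptions (3 billion bp per genome, 1.5 bytes per bp, a doubling to cover alternative formats), and the service cost uses the quoted upper figure of £1500 per terabyte.

```python
# Rough storage estimate for a low-coverage sequencing project.
# All defaults are the slide's assumptions, not measured values.

TB = 10**12  # bytes per (decimal) terabyte


def sequencing_storage_bytes(n_samples,
                             genome_bp=3_000_000_000,
                             bytes_per_bp=1.5,
                             format_multiplier=2):
    """Return (raw_bytes, total_bytes) for the sequence reads.

    raw_bytes   : reads only, at bytes_per_bp per base pair
    total_bytes : raw_bytes scaled up to hold the data in several formats
    """
    raw_bytes = n_samples * genome_bp * bytes_per_bp
    total_bytes = raw_bytes * format_multiplier
    return raw_bytes, total_bytes


raw, total = sequencing_storage_bytes(2000)
service_cost_gbp = total / TB * 1500  # quoted upper service cost per TB

print(f"raw reads:       {raw / TB:.1f} TB")
print(f"all formats:     {total / TB:.1f} TB")
print(f"service cost at 1500 GBP/TB: {service_cost_gbp:,.0f} GBP")
```

The exact products are 9 TB and 18 TB, which the slide rounds to 10 TB and 20 TB. Changing `n_samples` or `bytes_per_bp` shows how quickly the requirement grows with cohort size or coverage.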

14 Also receiving data on:
- Copy number variation across the genome
- Expression data (e.g. records of messenger RNA to track gene activity)
- Methylome (markers of the epigenome)
Not to mention phenotype data (a retrospective effort and an ever-increasing pool). Raises the issue of linkage and data USE…

15 Not just storage…

16

17

18 Varying matrix properties and overlaid ribbon plots (here MAF): Male vs Female; D vs r^2

19 CDKAL. Combinations of data processing/visualisation methods, e.g. follow-up of the dissection of the TCF2 locus and the counter results for T2D and prostate cancer. Other T2D loci? See: Amundadottir et al., Nature Genetics 2007

20 Not to mention iterative approaches!
- Generation of empirical distributions for the purpose of comparison, e.g. expression data
- Gene x gene (and possibly environment) interaction analysis, which may span the genome
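One common way to generate an empirical distribution for comparison is label permutation. The slide does not say which method was used, so the sketch below is an assumption chosen for illustration: it shuffles group labels many times and records the difference in group means each time. The function names and the expression values are hypothetical.

```python
import random
from statistics import mean


def permutation_null(group_a, group_b, n_perm=1000, seed=0):
    """Empirical null distribution of the difference in group means,
    built by repeatedly shuffling the pooled values across the two groups."""
    rng = random.Random(seed)          # seeded for reproducibility
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return null


def empirical_p(observed, null):
    """Two-sided empirical p-value, with the usual +1 correction so that
    the estimate is never exactly zero."""
    extreme = sum(1 for d in null if abs(d) >= abs(observed))
    return (extreme + 1) / (len(null) + 1)


# Hypothetical expression levels for one gene in two groups.
cases = [5.1, 6.3, 5.8, 6.9, 6.1]
controls = [4.2, 4.9, 5.0, 4.4, 4.7]

observed = mean(cases) - mean(controls)
null = permutation_null(cases, controls)
print(f"observed difference: {observed:.2f}")
print(f"empirical p-value:   {empirical_p(observed, null):.4f}")
```

Each gene or marker needs its own set of permutations, so the compute cost scales with the number of features times `n_perm`. That is why iterative approaches like this weigh on processing capacity as well as storage.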

21 Overall
- As one would expect, data requirements are increasing
- Genetic epidemiology has been boosted into a realm of real findings and exciting capability by the existence of new technology
- Increases may (or may not) be more rapid than once thought
- Storage and manipulation of large data sets present new challenges
- A new breed of analyst is emerging: the computer scientist with a passion for biology
- Perhaps Windows is dead…

