
1 Professional challenges and rewards of open data sharing
Andrew Singleton, Chief of the Molecular Genetics Section; Acting Chief, Laboratory of Neurogenetics, National Institute on Aging
singleta@mail.nih.gov

2 Rewards and Challenges:
I. Persuading collaborators of the pros of open data sharing
II. Ensuring sample collection and consent allow data posting and sharing
III. Striving to use standardized phenotype (and genotype) measures that can be applied across past, present and future studies
IV. Deciding what, where and how to post
V. Continued support of the data and reasonable responses to queries

3 Data sharing
This is my data... it took my time, effort and resources to generate – why should I share the spoils?
My post-docs and junior investigators need these data for their careers
My future funding is dependent on the findings from this dataset – why should I jeopardize this?
Others will not analyze the data appropriately

4 Pros of data sharing
There is simply too much for one group to do
Spreads the analytical burden
– Billions of genetic data points x thousands of phenotypes
‘Good will’ – the open access plan encourages collaboration
Generally, no single investigator can prove association unequivocally

5 Persuading collaborators of the pros: Size matters
[Figure: series labeled 2, 1.5, 1.3 and 1.2; Wang et al., NatRevGen, 2005]

6 Persuading collaborators of the pros: Follow-up size matters

7 Persuading collaborators of the pros: Size matters
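The "size matters" argument can be made concrete with a textbook normal-approximation sample size formula for comparing allele frequencies between cases and controls. The sketch below is an illustration only, not the calculation behind the slides: it assumes the labels 2, 1.5, 1.3 and 1.2 on slide 5 are odds ratios, and the control allele frequency (0.2), significance level (5e-8) and power (80%) are hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls) for a
    two-sided allelic test, using the two-proportion normal approximation.
    alpha and power defaults are assumptions, not values from the slides."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Each individual contributes two alleles.
    return math.ceil(alleles_per_group / 2)


# Assumed control allele frequency of 0.2; odds ratios taken from the slide labels.
for odds_ratio in (2.0, 1.5, 1.3, 1.2):
    print(odds_ratio, cases_needed(0.2, odds_ratio))
```

Under these assumptions the required number of cases grows steeply as the odds ratio approaches 1, which is the practical case for pooling samples across groups.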

8 Emerging technologies
Sharing promotes collaboration – critical for survival in the coming era
EMERGING TECHNOLOGIES (GW = genome-wide)
GW assay of epigenetic variation
Digital expression
GW assay of allelic expression
High density resequencing
GW sequencing

9 Rewards and Challenges:
I. Persuading collaborators of the pros of open data sharing
II. Ensuring sample collection and consent allow data posting and sharing
III. Striving to use standardized phenotype (and genotype) measures that can be applied across past, present and future studies
IV. Deciding what, where and how to post
V. Continued support of the data and reasonable responses to queries

10 Sample collection and consent
Covered well already – however, to reinforce:
– Intellectual property
– Tiered consents
Important to seek counsel on what is a real barrier and what is an artificial barrier
Funding decisions likely to be highly influenced by ability to share

11 Rewards and Challenges:
I. Persuading collaborators of the pros of open data sharing
II. Ensuring sample collection and consent allow data posting and sharing
III. Striving to use standardized phenotype (and genotype) measures that can be applied across past, present and future studies
IV. Deciding what, where and how to post
V. Continued support of the data and reasonable responses to queries

12 Standardized genotype
Different platforms (Affymetrix and Illumina)
Different chips and different versions
Different calling algorithms
Ensuring measures of genotyping quality across labs
Seek advice from an experienced user who has made all the mistakes

13 Standardized genotype
Different platforms (Affymetrix and Illumina)
Different chips and different versions
Different calling algorithms
Ensuring measures of genotyping quality across labs
Release of raw unfiltered data helps resolve these problems
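Two simple measures of genotyping quality that can be compared across labs or platforms are per-sample call rate and concordance on samples genotyped in both places. The following is a minimal sketch rather than any lab's actual pipeline: it assumes genotypes are stored as two-letter strings with "--" marking a no-call, and the function names and toy data are hypothetical.

```python
def call_rate(genotypes, missing="--"):
    """Fraction of SNPs with a genotype call for one sample."""
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


def concordance(calls_a, calls_b, missing="--"):
    """Fraction of SNPs called in both runs whose genotypes agree.
    Allele order is ignored, so 'AG' and 'GA' count as the same call."""
    both_called = [
        (a, b) for a, b in zip(calls_a, calls_b) if a != missing and b != missing
    ]
    if not both_called:
        return float("nan")
    agreeing = sum(1 for a, b in both_called if sorted(a) == sorted(b))
    return agreeing / len(both_called)


# Toy data: the same sample genotyped at five SNPs in two hypothetical labs.
lab_a = ["AA", "AG", "GG", "--", "AG"]
lab_b = ["AA", "GA", "GG", "AG", "AA"]

print(call_rate(lab_a), call_rate(lab_b), concordance(lab_a, lab_b))
```

Measures like these can only be recomputed by others if the underlying calls, and ideally the raw intensities, are released alongside the summary results.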

14 Standardized phenotype
Maximising impact of data (phenotypes!) – digital nature of data means pooling is achievable
Requires that phenotypic measures be directly compatible
– Association is likely to be complex, so reducing complexity is a must
Decide on a measure and stick to it

15 Rewards and Challenges:
I. Persuading collaborators of the pros of open data sharing
II. Ensuring sample collection and consent allow data posting and sharing
III. Striving to use standardized phenotype (and genotype) measures that can be applied across past, present and future studies
IV. Deciding what, where and how to post
V. Continued support of the data and reasonable responses to queries

16 What, when, where and how
P-values and OR
Genotype frequencies
Individual genotypes (i.e. per sample)
Calling metrics
Raw image data
What level of phenotypic data?

17 What, when, where and how
Embargoed release
– On publication
– X months following data production
– Immediately on data production

18 Where and how
Depositing at a central database
Depositing on an institutionally supported web-site or distributing in an ad hoc manner

19 Where and how
Depositing at a central database

20 Where and how
Depositing at a central database
The main time investment is a one-time deposit of data
Infrastructure burden removed
Some data-cleaning and manipulation
– Phenotypic data
– Genotypic data – imputation, clustering
Data tied in to NCBI databases

21 Continued support
Depositing on a local web-site or distributing in an ad hoc manner:
– Requires significant time and resources to distribute the data
– Being prepared to answer questions
– Being asked to re-analyze
Recommend release to dbGaP, and only supplementary release locally if there is substantial infrastructure in place

22 Our experience
In the last 18 months, generation of ~3 billion genotypes using the Illumina platform
Neurological disease (case-control)
Diverse populations to catalog variation
Epidemiological cohorts with focus on aging phenotypes

23 Our experience
Generation and analysis of genome wide SNP genotype data in PD, ALS, stroke, AD and controls
Six postdocs generated these data; all have one first author paper and five “joint 1st author” papers

24 Our experience
Paradigm – NOT to define all risk loci
Searching for high risk loci – (none found)
Generate GWSNP data in publicly available DNA samples
Generating data that can be mined and augmented by other interested researchers

25 Public release – facilitating discovery
More than 600,000,000 genotypes from PD, ALS, stroke and control cohorts posted publicly
Downloaded by more than 500 unique visitors
http://ccr.coriell.org/ninds/
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap
Two manuscripts published; five in press/preparation by ‘downloaders’

26 Public release – facilitating collaboration
Data were released ~6 weeks prior to publication – nothing ‘bad’ happened
Three ongoing collaborations based on the initial work
Collaborators want to share sample series and results from their own GWAS
This has brought novel analytic techniques and perspectives to our laboratory

27 Public release – novel techniques
We analyzed CNVs manually
Six people, 3 months
Data release directly led to a collaboration to create and test automated CNV calling

28 Bottom line
Data sharing is becoming standard
Requires time, resources and planning to be most effective
It repays the investment; collaborators realize you know your data best
Funding bodies are beginning to mandate data sharing
– If there is no plan for sharing, no adequate consent, no depth to the data – will have a lower funding priority

