Department of Biomedical Informatics Bioinformatics and Genetics Kun Huang Department of Biomedical Informatics OSUCCC Biomedical Informatics Shared Resource.

1 Department of Biomedical Informatics Bioinformatics and Genetics Kun Huang Department of Biomedical Informatics OSUCCC Biomedical Informatics Shared Resource The Ohio State University 2011

2 Outline: Introduction; Genetic variations; Technologies (array-based technology, massive sequencing); Genome-wide association study (GWAS): SNP array → exome sequencing → genome re-sequencing; Expression quantitative trait loci (eQTL); Allelic specific expression

3 Genetic Variations: SNP; In-Del; Transposon; Copy number variation; LOH; Gene fusion; …

4 Single Nucleotide Polymorphism (SNP): At least 1% of a population has a different nucleotide. There are many other classes of variants, and these are no less important (e.g., deletions and duplications); SNPs are simply the most abundant. First SNPs: RFLPs (D. Botstein, 1980). The single nucleotide polymorphism (SNP) [pronounced "snip"] is the most common form of genetic variation. As the name suggests, each SNP is a difference in a single nucleotide (A, T, C, or G) of an individual's DNA sequence, such as having AAGG instead of ATGG. There may be from 1 to 10 million SNPs in the entire human genome, but perhaps only a few thousand relate to disease outcomes. The numbers seem to change with every news report.

5 Critical SNP concepts: Marker SNP vs. functional SNP. SNPs highlight the spots to search (features, regions of interest). SNP patterns from a target population can be compared with SNP patterns from unaffected populations to find genetic variations shared only by the affected group. The most useful SNPs are known as "functional SNPs." A single functional SNP or certain combinations of functional SNPs may help explain variability in individual responses to a given drug or pinpoint the subtle genetic differences that predispose some to diseases such as arthritis, Alzheimer's, cancer, diabetes, and depression.

6 Critical SNP concepts: Understand evolution; DNA fingerprinting (forensic applications); Markers for polygenic traits; Genotype-specific medicine (personalized medicine)

7 Critical SNP concepts: 1. Humans are diploid and exhibit significant heterogeneity and heterozygosity. 2. DNA is essentially identical in every cell. 3. The closer two SNPs are, the less likely they are to have segregated independently in a population (linkage disequilibrium). 4. Multiple variants/alleles can be combined into haplotypes (polygenic markers: quantitative trait loci, or QTL).
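
Linkage disequilibrium between two biallelic SNPs is commonly summarized with D, D' and r². The following is a minimal sketch, assuming phased haplotypes are available and coded as pairs of 0/1 alleles; the function name `ld_stats` and the input layout are illustrative assumptions, not something taken from the slides.

```python
def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, each
    allele coded 0 or 1 (hypothetical input layout, phased data assumed).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n           # freq of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n           # freq of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # freq of the 1-1 haplotype

    d = p_ab - p_a * p_b                              # raw disequilibrium coefficient

    # r^2 normalizes D^2 by the product of the four allele frequencies.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    # D' normalizes D by the largest value it could take given the allele freqs.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime, r2
```

Two SNPs that always travel together give r² = 1, while SNPs whose alleles combine at random give r² = 0.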

8 HapMap: The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Six participating countries: Japan, the United Kingdom, Canada, China, Nigeria, and the United States. The goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. Data generated by the Project can be downloaded with minimal constraints: http://www.hapmap.org/index.html.en

9 NCBI SNP

10 In-Del; Transposon; Aneuploidy; … (Keiko et al., Genome Research 2008)

11 SNP Array: Affymetrix SNP 6.0 array. More than 906,600 SNPs: unbiased selection of 482,000 SNPs (historical SNPs from the SNP Array 5.0); selection of additional 424,000 SNPs (tag SNPs; SNPs from chromosomes X and Y; mitochondrial SNPs; new SNPs added to the dbSNP database; SNPs in recombination hotspots). More than 946,000 copy number probes.

12 SNP Array: Affymetrix SNP 5.0 array

13 Cytogenetics

14 CGH – Comparative Genomic Hybridization

15 $1000 genome project: Solexa, SOLiD, 454. Re-sequencing using massively parallel sequencers.

16 GWAS: Focus is on SNPs. Control vs. case. Chi-square based test: distribution of haplotypes in different conditions; contingency table. Other statistics or metrics can also be used.
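
The chi-square test on a contingency table can be sketched for the simplest case, a 2×2 table of allele counts (rows: case and control; columns: the two alleles). This is a minimal illustration using only the standard library; the function name and the example counts are assumptions, and real GWAS pipelines typically add genotype-level models and quality control.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table.

    table: [[a, b], [c, d]], e.g. allele counts for cases and controls.
    Assumes every row and column total is non-zero.
    Returns (chi2, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up counts: allele A vs allele G in cases (row 1) and controls (row 2).
chi2, p = chi_square_2x2([[60, 40], [40, 60]])  # chi2 is 8.0 for this table
```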

17 GWAS statistical challenges: Millions of SNPs mean millions of tests. Multiple testing must be compensated for, so the p-value cutoff is very stringent. Needs a lot of samples (thousands or more) to achieve the necessary power. Rare event detection is statistically challenging.
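
One common way to compensate for multiple tests is the Bonferroni correction, which divides the desired family-wise error rate by the number of tests. The slides do not name a specific correction, so the sketch below is an assumption chosen for illustration; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def significant_snps(p_values, alpha=0.05):
    """Return the SNP ids whose p-value passes the Bonferroni cutoff.

    p_values: dict mapping SNP id -> p-value (hypothetical input layout).
    """
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p < cutoff]

# With one million tests, a 0.05 family-wise rate requires p < 5e-8 per SNP.
cutoff = bonferroni_threshold(0.05, 1_000_000)
```

A cutoff this small is why thousands of samples are needed: modest effects rarely reach it in small cohorts.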

18 GWAS interpretation challenges: Association is NOT causation. Many SNPs are in intergenic regions (not in genes). For SNPs in genes, most do NOT affect protein coding: what are they doing? Due to the stringent cutoff, many potentially associated genes are not selected, and it is hard to infer high-level information such as pathways.

19 GWAS integration of bioinformatics information: Pathway information: not necessarily the same genes are targeted, but they could be in the same pathways. Other annotations: networks, GO terms. Frequent patterns: data mining using frequent item sets on SNPs. Frequent set mining on pathways (not just genes). The only phenotypes are disease vs. control: how about other phenotypes?

20 Quantitative Trait Locus (QTL): Quantitative phenotype: a phenotype attributed to multiple genes (polygenic effects). Examples: height, longevity. Multiple genes + environment. QTLs: stretches of DNA containing or linked to the genes that underlie a quantitative trait. Detection: copy number variation, SNPs. Statistical analysis: t statistic (compare the quantitative phenotype between two groups with different genotypes); multiple genotype groups: ANOVA (F statistic); mutual information.
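
The ANOVA comparison across genotype groups can be sketched as a one-way F statistic. This is a minimal standard-library illustration; the grouping by genotype (e.g. AA, AG, GG) and the toy phenotype values are assumptions. With exactly two groups, F equals the square of the pooled-variance t statistic.

```python
def anova_f(groups):
    """One-way ANOVA F statistic.

    groups: list of lists of phenotype values, one list per genotype group.
    Assumes at least two groups, more observations than groups, and
    non-zero within-group variation.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation explained by genotype (between groups).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation (within groups).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy phenotype values for three genotype groups.
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])  # F is 3.0 here
```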

21 Expression Quantitative Trait Locus (eQTL): Gene expression is a quantitative phenotype: a phenotype attributed to multiple genes (what are the possible ones?). Besides other genes: regulatory elements. eQTLs: most studies focus on SNP vs. gene expression. 3 million SNPs × 20,000 genes → 6 × 10^10 ANOVA tests.

22 Expression Quantitative Trait Locus (eQTL): Restrict to a small set of SNPs, e.g., for a gene, focus only on the SNPs in the gene. Cis-eQTL (local); trans-eQTL (distal). Direct and indirect effects. Second- and third-order effects. eQTL networks. (Lodish et al., Molecular Cell Biology)
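
Restricting the search to local (cis) SNPs can be sketched as a simple coordinate filter. The tuple layouts, the `window` parameter, and the toy coordinates below are illustrative assumptions; the slides only say to focus on SNPs in the gene, which corresponds to `window=0`.

```python
def cis_snps(gene, snps, window=0):
    """Return ids of SNPs lying within a gene (optionally padded by `window` bp).

    gene: (chrom, start, end)              -- hypothetical layout
    snps: list of (snp_id, chrom, position) -- hypothetical layout
    """
    chrom, start, end = gene
    return [
        snp_id
        for snp_id, snp_chrom, pos in snps
        if snp_chrom == chrom and start - window <= pos <= end + window
    ]

# Toy example: only the first SNP lies inside the gene.
gene = ("chr1", 1_000, 5_000)
snps = [("rsA", "chr1", 1_200), ("rsB", "chr1", 9_000), ("rsC", "chr2", 1_200)]
local = cis_snps(gene, snps)

# Testing every SNP against every gene gives the count quoted on the slide.
all_pairs = 3_000_000 * 20_000  # 60,000,000,000 = 6 x 10^10 tests
```

Testing only the cis candidates for each gene cuts the number of tests by orders of magnitude, at the cost of missing trans (distal) effects.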

23 RNA-seq: Paradigm changes by NGS. RNA-seq – not only gene expression, but also sequences.

24 TopHat (Trapnell et al., Bioinformatics 2009)

25 After TopHat: You got this: (figure). But you want this: (figure). Cufflinks.

26 Cufflinks: Assigns each read to its potential isoforms by maximizing a function that assigns a likelihood to all possible sets of relative abundances of the different isoforms. Open source software. Trapnell et al., Nat. Biotechnol. 2010
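
The idea of choosing relative isoform abundances that maximize a likelihood over ambiguous reads can be illustrated with a toy expectation-maximization loop. This is a deliberately simplified sketch and not the Cufflinks implementation: it ignores isoform length, fragment-size distributions and sequencing bias, and the function name and input layout are assumptions.

```python
def em_isoform_abundance(compat, n_isoforms, n_iter=200):
    """Estimate relative isoform abundances by EM.

    compat: list with one entry per read; each entry lists the indices of
            the isoforms that read is compatible with (hypothetical layout).
    Returns a list of abundances that sums to 1.
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform abundances
    n_reads = len(compat)
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms in
        # proportion to the current abundance estimates.
        for isoforms in compat:
            total = sum(theta[i] for i in isoforms)
            if total == 0:
                continue  # no compatible isoform has support; skip this read
            for i in isoforms:
                counts[i] += theta[i] / total
        # M-step: new abundances are the normalized fractional read counts.
        theta = [c / n_reads for c in counts]
    return theta

# Toy data: 6 reads unique to isoform 0, 2 unique to isoform 1, 4 ambiguous.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
abundances = em_isoform_abundance(reads, n_isoforms=2)
```

In this toy case the ambiguous reads carry no information about the split, so the estimate settles at the ratio of the unique reads (0.75 to 0.25).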

27 From sequence reads to isoforms: Primary aligner: Eland, BFAST, BOWTIE, … Junction finding strategy: TopHat, SOLiD Bioscope, … Isoform identification: Xing et al., NAR 2006; Jiang et al., Bioinformatics 2009; Cufflinks (Nat. Biotechnol. 2010); Scripture (Nat. Biotechnol. 2010); …

28 Allelic Specific Expression: Specific X-chromosome suppression. Much broader presence in the genome. Screen for functional SNPs.

29 Allelic Specific Expression: Screen for functional SNPs. A=48, G=89; A=99, G=105
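
One simple way to screen for allelic imbalance is an exact binomial test of the allele counts against the 50:50 ratio expected when both alleles are expressed equally. The sketch below assumes that the numbers on the slide (A=48, G=89 and A=99, G=105) are RNA-seq read counts for the two alleles of a heterozygous SNP in two samples; that interpretation, the test choice, and the function name are assumptions, since the slide does not say how the counts were evaluated.

```python
from math import comb

def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials at p = 0.5.

    Uses twice the smaller tail, capped at 1, which is valid here because
    the null distribution is symmetric.
    """
    lower = sum(comb(n, i) for i in range(0, k + 1))   # outcomes with <= k
    upper = sum(comb(n, i) for i in range(k, n + 1))   # outcomes with >= k
    return min(1.0, 2 * min(lower, upper) / 2 ** n)

# Counts taken from the slide, read here as allele-specific read counts.
p_first = binom_two_sided_p(48, 48 + 89)    # A=48, G=89
p_second = binom_two_sided_p(99, 99 + 105)  # A=99, G=105
```

Under these assumptions the first pair of counts departs clearly from 50:50, while the second pair is consistent with balanced expression.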

30 Allelic Specific Binding: Protein binding requires recognition of specific sequences (motifs). Mutations in the binding sites may lead to disruption of regulation and hence expression. Kasowski et al., Science, 2010.

31 Allelic Specific Methylation: One of the earliest known mechanisms for allelic specific expression.

32 Other Allelic Specific Events: Allelic specific splicing. Nembaware V, Lupindo B, Schouest K, Spillane C, Scheffler K, Seoighe C. Genome-wide survey of allele-specific splicing in humans. BMC Genomics. 2008 Jun 2;9:265.
