(Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer Center

1 (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer Center

2 UCSC Genome Browser
isPCR
BLAT
LiftOver
Track types
– BED minimum
– BED extended
– WIG
Track Display and Shuffle
Browser Navigation
Custom Session
– Export Figure
Custom Tracks

3 0-based coordinates
Sequence A|C|C|G|G|T|C|G|A, numbered with 1-based and 0-based positions
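
The difference between the two numbering schemes can be sketched in a few lines of Python. This is a minimal illustration that assumes the common convention of 1-based, fully closed intervals versus 0-based, half-open intervals (the BED convention described later in the deck); the function names are hypothetical.

```python
SEQUENCE = "ACCGGTCGA"  # the nine bases shown on the slide


def one_based_to_zero_based(start, end):
    """Convert a 1-based closed interval to a 0-based half-open interval."""
    return start - 1, end


def zero_based_to_one_based(start, end):
    """Convert a 0-based half-open interval to a 1-based closed interval."""
    return start + 1, end


# Bases 3 to 5 in 1-based numbering are C, G, G.
zb_start, zb_end = one_based_to_zero_based(3, 5)

# Python slices are also 0-based and half-open, so they can be used directly.
sub = SEQUENCE[zb_start:zb_end]

# In 0-based half-open coordinates the length is simply end - start.
length = zb_end - zb_start
```

Only the start changes between the two schemes; the end value is numerically the same because the half-open end is excluded.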

4 Human Genome Assemblies

5 BED Track Formats
track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On"
chr Pos ,0,0
chr Pos ,0,0
chr Pos ,0,0
chr Pos ,0,0
chr Neg ,0,255
chr Neg ,0,255
chr Neg ,0,255
chr Pos ,0,0
chr Neg ,0,255

6 BED Track Formats
Header: space separated parameters
– name=
– description=
– type= - Defines the track type. The track type attribute is required for BAM, BED detail, bedGraph, bigBed, bigWig, broadPeak, narrowPeak, Microarray, VCF and WIG tracks.
– visibility= - 0 (hide), 1 (dense), 2 (full), 3 (pack), 4 (squish).
– color= - Defines the main color for the annotation track.
– itemRgb=On
– colorByStrand= - Sets colors for + and - strands, in that order.
– useScore=
– group=
– priority= - When the group attribute is set, defines the display position of the track relative to other tracks.
– db= - When set, indicates the specific genome assembly for which the annotation data is intended.
– offset= - Defines a number to be added to all coordinates in the annotation track. The default is "0".
– maxItems= - Defines the maximum number of items the track can contain.
– url= - Defines a URL for an external link associated with this track.
– htmlUrl= - Defines a URL for an HTML description page to be displayed with this track.
– bigDataUrl= - Defines a URL to the data file for BAM, bigBed, bigWig or VCF tracks.
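
As a rough illustration of how such a header can be read programmatically, the sketch below splits a track line into its name=value attributes. It assumes values are either bare or double-quoted, as in the ItemRGBDemo line from the previous slide; `parse_track_line` and the `VISIBILITY` lookup are hypothetical helpers, not part of any UCSC tool.

```python
import shlex

# Numeric visibility codes as listed on the slide.
VISIBILITY = {"0": "hide", "1": "dense", "2": "full", "3": "pack", "4": "squish"}


def parse_track_line(line):
    """Return a dict of attribute name -> value for one track line."""
    tokens = shlex.split(line)  # keeps double-quoted values together
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    attributes = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not name=value
            attributes[key] = value
    return attributes


header = 'track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On"'
track = parse_track_line(header)

# Translate the numeric code into its display mode; unknown values pass through.
mode = VISIBILITY.get(track["visibility"], track["visibility"])
```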

7 BED Track Formats
For intervals
Header: space separated configuration parameters
– chrom - The name of the chromosome
– chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
– chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature.
– name - Defines the name of the BED line.
– score - A score between 0 and 1000.
– strand - Defines the strand, either '+' or '-'.
– thickStart - The starting position at which the feature is drawn thickly
– thickEnd - The ending position at which the feature is drawn thickly
– itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0).
– blockCount - The number of blocks (exons) in the BED line.
– blockSizes - A comma-separated list of the block sizes.
– blockStarts - A comma-separated list of block starts.
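
The field list above maps naturally onto a small parser. The following is a sketch only: the example line uses made-up coordinates and a made-up feature name, and `parse_bed_line` and `feature_length` are hypothetical helpers rather than an existing library API.

```python
# Field names in the order given on the slide.
BED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
]
INTEGER_FIELDS = {"chromStart", "chromEnd", "score", "thickStart", "thickEnd", "blockCount"}


def parse_bed_line(line):
    """Parse one whitespace-separated BED line into a dict (3 to 12 fields)."""
    values = line.split()
    if len(values) < 3:
        raise ValueError("a BED line needs at least chrom, chromStart and chromEnd")
    record = {}
    for field, value in zip(BED_FIELDS, values):
        record[field] = int(value) if field in INTEGER_FIELDS else value
    return record


def feature_length(record):
    """chromStart is 0-based and chromEnd is excluded, so length = end - start."""
    return record["chromEnd"] - record["chromStart"]


# Hypothetical 9-field line; the coordinates are illustrative only.
line = "chr1\t1000\t1500\tfeatureA\t900\t+\t1000\t1500\t255,0,0"
record = parse_bed_line(line)
```

Fields that are absent from a shorter line are simply missing from the returned dict, which mirrors the minimum (3-field) versus extended BED variants.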

8 WIG track format
# 150 base wide bar graph at arbitrarily spaced positions,
# threshold line drawn at y=11.76
# autoScale off viewing range set to [0:25]
# priority = 10 positions this as the first graph
# Note, one-relative coordinate system in use for this format
track type=wiggle_0 name="variableStep" description="variableStep format" visibility=full autoScale=off viewLimits=0.0:25.0 color=50,150,255 yLineMark=11.76 yLineOnOff=on priority=10
variableStep chrom=chr19 span=
# 200 base wide points graph at every 300 bases, 50 pixel high graph
# autoScale off and viewing range set to [0:1000]
# priority = 20 positions this as the second graph
# Note, one-relative coordinate system in use for this format
track type=wiggle_0 name="fixedStep" description="fixedStep format" visibility=full autoScale=off viewLimits=0:1000 color=0,200,100 maxHeightPixels=100:50:20 graphType=points priority=20
fixedStep chrom=chr19 start= step=300 span=
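
To make the fixedStep declaration concrete, the sketch below expands one into explicit intervals. The start position and data values are invented for illustration; only step=300 and the "200 base wide" span come from the slide. Both function names are hypothetical.

```python
def expand_fixed_step(chrom, start, step, span, values):
    """Expand a fixedStep block into (chrom, first_base, last_base, value).

    Coordinates are 1-based and closed, matching the one-relative
    coordinate system noted in the WIG comments.
    """
    intervals = []
    for index, value in enumerate(values):
        first = start + index * step
        last = first + span - 1
        intervals.append((chrom, first, last, value))
    return intervals


def to_bed_interval(interval):
    """Convert a 1-based closed interval to BED-style 0-based half-open."""
    chrom, first, last, value = interval
    return (chrom, first - 1, last, value)


# Hypothetical start and values; step and span follow the slide's comments.
intervals = expand_fixed_step("chr19", start=1001, step=300, span=200,
                              values=[10.0, 20.0, 30.0])
bed_like = [to_bed_interval(interval) for interval in intervals]
```

Because the span (200) is smaller than the step (300), consecutive data points leave a 100-base gap between them.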

9 Specific Tracks of interest UCSC genes RefSeq Genes RepeatMasker Conservation TF motif predictions dbSNP ENCODE Roadmap

10 Custom Sessions Create an account Customize the tracks displayed Add your own track (limited in size and time) Save and Share

11 Table Browser Subset gene, region, genome Output BED or fasta Intersection Filters

12 ENCODE / Roadmap Tracks Track search Cell Types / Tissue Types Raw Peaks HMM

13 UNIX commands
head
more (press q to exit)
cat
– Example: cat file
– Example: cat file1 file2
grep
– grep -v 'expression' (lines that do not match)
– grep -A 1 'expression' (each match plus 1 line after)
– grep -B 2 'expression' (each match plus 2 lines before)
– Example: grep -v '#' file.txt to remove comments
Expression metacharacters
– $ end of line
– ^ beginning of line
– [AB] A or B
– . any single character
– * zero or more repeats of the preceding character
– Example: 'CDKN.*' or 'chr[1-7]'

14 UNIX commands
cut
– cut -f 1
– cut -f 3 -d ':'
sort
– sort -n
– sort -nr (or sort -n -r)
– sort -k 2
uniq
– uniq
– uniq -c
wc
– wc -l file.txt
– Example: cut -f 1 file | sort | uniq -c

15 UNIX commands
sed
– sed 's/foo/bar/g' file : find and replace
awk
– awk '$3>2000' file : select rows with 3rd field > 2000
– awk '{if ($3>2000) print $1,$2}' file : for those rows, only print the first 2 columns
– awk '{sum+=$3} END {print sum}' file : print sum of column 3
– awk '{sum+=$3} END {print sum/NR}' file : print average of column 3
join
– join -j 1 sorted_file1 sorted_file2

16 Demo #1 and #2

17 DNA variants (Sequence differences) Highly Similar Genomes Phenotypic Differences (Physical traits) Human Genetic Variation

18 Variant Types Frazer et al Rahim, Harismendy et al (2008)

19 Variants from an individual genome Within any given individual there are ~4 million genetic variants encompassing ~12 Mb.

20 Variants from multiple genomes Within a given individual the majority of variants are common.

21 Next Generation DNA analysis
Whole genome sequencing
– Mutations (coding and non-coding)
– Translocations
– Copy Number Variants
Whole Exome Sequencing
– Mutations (coding)
– ~Copy number variants (trisomy, gene amplifications)
Gene Panel
– Mutations (coding)

22 Variant Frequencies
Common genetic variants – second allele present at greater than 3% frequency
Rare genetic variants – present at less than 3% frequency, and commonly at very low frequencies
Private variants – in limited families or single individuals
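
These three categories can be expressed as a simple classification rule. The sketch below is one possible reading: it uses the 3% cut-off from the slide, treats a variant carried by a single individual as private (an assumption, since the slide also allows "limited families"), and arbitrarily calls a frequency of exactly 3% rare. All names are hypothetical.

```python
COMMON_THRESHOLD = 0.03  # 3% cut-off taken from the slide


def minor_allele_frequency(alt_count, total_alleles):
    """Frequency of the less frequent of the two alleles."""
    frequency = alt_count / total_alleles
    return min(frequency, 1 - frequency)


def classify_variant(alt_count, total_alleles, carriers):
    """Label a biallelic variant as 'private', 'rare' or 'common'."""
    if carriers <= 1:
        # Assumption: seen in a single individual counts as private.
        return "private"
    maf = minor_allele_frequency(alt_count, total_alleles)
    # Exactly 3% is not covered by the slide; it is treated as rare here.
    return "common" if maf > COMMON_THRESHOLD else "rare"


# 100 diploid individuals = 200 alleles (hypothetical cohort).
labels = [
    classify_variant(alt_count=60, total_alleles=200, carriers=50),
    classify_variant(alt_count=2, total_alleles=200, carriers=2),
    classify_variant(alt_count=1, total_alleles=200, carriers=1),
]
```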

23 Map of Genetic Variation
Relationships between common SNPs in the human genome
Frazer et al (2007)
HapMap Project
Genotyped ~3.1 million SNPs in 270 individuals
– 90 Yoruba in Ibadan, Nigeria (YRI)
– 90 of European descent in Utah, USA (CEU)
– 45 Han Chinese in Beijing, China (CHB)
– 45 Japanese in Tokyo, Japan (JPT)

24 1000G Project

25 VCF format
##fileformat=VCFv4.1
##fileDate=
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=
##phasing=partial
##INFO=
##FILTER=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA
rs G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,
T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:
rs A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:
T. 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:
microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
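
A VCF data line can be unpacked along the column order given in the #CHROM header. The sketch below is a simplified reader, not a full VCF parser: it assumes tab-separated fields, and the record it parses is constructed for illustration (the CHROM, POS, ID and INFO values are placeholders, not the values from the slide). The function names are hypothetical.

```python
def parse_info(text):
    """Parse the INFO column; flags without '=' (such as DB) become True."""
    info = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_record(line, sample_names):
    """Split one tab-separated VCF data line into a nested dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info, fmt = fields[:9]
    keys = fmt.split(":")  # e.g. GT, GQ, DP, HQ
    samples = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {
        "chrom": chrom, "pos": int(pos), "id": variant_id, "ref": ref,
        "alt": alt.split(","),  # several ALT alleles are comma-separated
        "qual": qual, "filter": filt,
        "info": parse_info(info), "samples": samples,
    }


# Illustrative record with two samples; all values are placeholders.
line = "\t".join([
    "1", "12345", ".", "G", "A", "29", "PASS", "NS=2;DP=9;AF=0.25;DB",
    "GT:GQ:DP:HQ", "0|0:48:1:51,51", "1|0:48:8:51,51",
])
record = parse_vcf_record(line, ["NA00001", "NA00002"])
```

In the GT field, `|` separates phased alleles and `/` unphased ones, with 0 standing for REF and 1, 2, ... for the ALT alleles in order.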

26 Linkage Disequilibrium (LD)
Given two biallelic sites there are four combinations that can be observed with the following distributions.
SNP 1 = A/G
SNP 2 = A/C
SNP1-SNP2 | Case r² = 1 | Case r² = 0
AA | 70 | 25
AC | 0 | 25
GA | 0 | 25
GC | 30 | 25
LD measures the level of correlation between SNPs
LD is the consequence of recombination at preferential sites
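
The r² statistic can be computed directly from the four haplotype counts using the standard definition r² = D² / (pA(1-pA) pB(1-pB)), where D = pAB - pA·pB. The sketch below applies it to the 70/0/0/30 case and to a uniform 25/25/25/25 table, which is one distribution that gives r² = 0; the function name and argument names are hypothetical.

```python
def r_squared(n_11, n_12, n_21, n_22):
    """r² between two biallelic sites from haplotype counts.

    n_11: first allele at both sites (e.g. A-A)
    n_12: first allele at site 1, second at site 2 (e.g. A-C)
    n_21: second allele at site 1, first at site 2 (e.g. G-A)
    n_22: second allele at both sites (e.g. G-C)
    """
    total = n_11 + n_12 + n_21 + n_22
    p_11 = n_11 / total
    p_1 = (n_11 + n_12) / total   # frequency of the first allele at site 1
    p_2 = (n_11 + n_21) / total   # frequency of the first allele at site 2
    d = p_11 - p_1 * p_2          # disequilibrium coefficient D
    denominator = p_1 * (1 - p_1) * p_2 * (1 - p_2)
    if denominator == 0:
        raise ValueError("r² is undefined when a site is monomorphic")
    return d * d / denominator


perfect_ld = r_squared(70, 0, 0, 30)   # only A-A and G-C haplotypes occur
no_ld = r_squared(25, 25, 25, 25)      # alleles combine independently
```

When only two of the four haplotypes occur, knowing the allele at one SNP fully determines the allele at the other, which is what r² = 1 expresses.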

27 LD Bin structure example
LD bin = groups of SNPs with r² ≥ 0.8
The majority of common SNPs in the human genome are in LD bins
Genotypes of a set of ~500,000 “tag SNPs” provide information (r² ≥ 0.8) regarding a large fraction (90%) of all 8 million common SNPs present in humans.
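
One way to picture how tag SNPs arise from LD bins is a greedy grouping over pairwise r² values. The sketch below is a simplified illustration under stated assumptions, not necessarily the procedure behind the numbers on the slide: SNP identifiers and r² values are invented, and a SNP joins a bin when its r² with the bin's tag SNP is at least 0.8.

```python
def tagged_by(snp, candidates, r2, threshold):
    """SNPs in candidates that snp can stand in for (including itself)."""
    return [
        other for other in candidates
        if other == snp or r2.get(frozenset((snp, other)), 0.0) >= threshold
    ]


def greedy_ld_bins(snps, r2, threshold=0.8):
    """Repeatedly pick the SNP that tags the most remaining SNPs.

    r2 maps frozenset({snp_a, snp_b}) to their pairwise r²; missing
    pairs are treated as r² = 0. Returns a list of (tag_snp, members).
    """
    remaining = list(snps)
    bins = []
    while remaining:
        best = max(remaining,
                   key=lambda s: len(tagged_by(s, remaining, r2, threshold)))
        members = tagged_by(best, remaining, r2, threshold)
        bins.append((best, members))
        remaining = [s for s in remaining if s not in members]
    return bins


# Hypothetical SNPs and pairwise r² values.
pairwise_r2 = {
    frozenset(("snpA", "snpB")): 0.90,
    frozenset(("snpA", "snpC")): 0.85,
    frozenset(("snpB", "snpC")): 0.50,
    frozenset(("snpC", "snpD")): 0.10,
}
bins = greedy_ld_bins(["snpA", "snpB", "snpC", "snpD"], pairwise_r2)
tag_snps = [tag for tag, _ in bins]
```

Here genotyping two tag SNPs (snpA and snpD) would be enough to capture all four SNPs at r² ≥ 0.8.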

28 GWAS principle Tests if common SNPs tagging an interval in the human genome are “associated” with a disease From phenotype to genotype

29 GWAS results
WTCCC (2007)
PR interval
Large number of tests requires a low p-value ( )
Sample sizes determine the variant frequencies and effect sizes that can be detected (Power)
Q traits
1319 studies
>4000 associated SNPs
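
The link between the number of tests and the required p-value can be illustrated with a Bonferroni correction, one common way to control for multiple testing. In this sketch the family-wise level of 0.05 and the one million tests are assumptions chosen for illustration, not values taken from the slide, and the SNP names and p-values are invented.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_snps(p_values, alpha, n_tests):
    """Names of SNPs whose p-value passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [snp for snp, p in p_values.items() if p < threshold]


# Assumed: 0.05 family-wise level and 1,000,000 independent tests.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical association results.
p_values = {"snpA": 1e-9, "snpB": 1e-5, "snpC": 0.03}
hits = significant_snps(p_values, alpha=0.05, n_tests=1_000_000)
```

A p-value of 0.03 would look significant for a single test, but it is far above the corrected cut-off once a million SNPs are tested.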

30 GWAS highlights
Many genes/loci not previously known to be involved in the diseases studied
Newly identified pathways suggest that molecular sub-phenotypes of common diseases may exist
Many common diseases have the same associated genes, suggesting similar etiologies

31 GWAS limitations
Genetic
– Small effect sizes: only explain a small fraction (1-25%) of the heritability
– Missing heritability can be hiding in
  – Rare variants with large effects
  – Epistasis (Gene x Gene interactions)
  – Gene x Environment interaction (overlooked in heritability studies)
Clinical
– Limited prognostic value: classic markers (family history, life style) work better
– Limited by ethnicity
Functional
– Proxy SNPs are not the functional ones
– Genes associated by proximity: variants are mostly outside genes
– Cell type and condition unknown

32 Demo #3

33 Cancer Types

34 Clinical Data Collected
age_at_initial_pathologic_diagnosis 100%
history_of_colon_polyps 82%
preoperative_pretreatment_cea_level 60%
icd_10 89%
pretreatment_history 100%
icd_o_3_histology 99%
primary_lymph_node_presentation_assessment 98%
ajcc_cancer_staging_handbook_edition 80%
icd_o_3_site 99%
primary_tumor_pathologic_spread 100%
anatomic_site_colorectal 88%
informed_consent_verified 100%
prior_diagnosis 100%
bcr_patient_uuid 100%
kras_gene_analysis_performed 89%
race 57%
braf_gene_analysis_performed 87%
kras_mutation_codon 4%
residual_tumor 82%
braf_gene_analysis_result 6%
kras_mutation_found 9%
synchronous_colon_cancer_present 87%
circumferential_resection_margin 10%
loss_expression_of_mismatch_repair_proteins_by_ihc 74%
tissue_source_site 100%
colon_polyps_present 42%
lymph_node_examined_count 98%
tumor_stage 96%
date_of_form_completion 100%
lymphatic_invasion 87%
tumor_tissue_site 100%
date_of_initial_pathologic_diagnosis 100%
lymphnode_pathologic_spread 100%
venous_invasion 83%
days_to_birth 100%
microsatellite_instability 16%
vital_status 100%
days_to_death 89%
non_nodal_tumor_deposits 43%
weight 51%
days_to_initial_pathologic_diagnosis 100%
number_of_abnormal_loci 12%
anatomic_organ_subdivision 2%
days_to_last_followup 96%
number_of_first_degree_relatives_with_cancer_diagnosis 85%
loss_expression_of_mismatch_repair_proteins_by_ihc_result 18%
days_to_last_known_alive 61%
number_of_loci_tested 12%
distant_metastasis_pathologic_spread 98%
number_of_lymphnodes_positive_by_he 94%
ethnicity 55%
number_of_lymphnodes_positive_by_ihc 9%
gender 100%
patient_id 100%
height 47%
perineural_invasion_present 33%
histological_type 99%
person_neoplasm_cancer_status 86%
Categories: Personal and history, Histology, Clinical, Molecular

35 Clinical Data Collected
[Figure: Patients versus Days after Dx; annotation: Decreasing Intrinsic sensitivity]

36 Molecular Data Collected
Molecule | Method | Measured entity | Data
RNA | microarrays | 15,000 transcripts | Expression levels
RNA | RNA-Seq | All known and novel transcripts | Expression levels, isoform quantification, editing, Novel transcripts, Fusion Transcripts
DNA | microarrays | 100k to 1M SNP | Copy Number Aberrations, LoH, Polymorphisms
DNA | Sanger Sequencing | 30 M Base pairs | Coding Mutations
DNA | whole exome sequencing | 50 M Base pairs | Coding Mutations, Copy Number Aberrations
DNA | whole genome | 3 billion base pairs | Coding and Regulatory Mutations, Copy Number Aberrations, Rearrangements
DNA | Methylation Array | 450,000 CpG | Methylation levels
DNA | Methylation Array | 27,000 CpG | Methylation levels

37 Demo #4
