Using public resources to understand associations Dr Luke Jostins Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21st – 26th June 2015.


Using public resources to understand associations Dr Luke Jostins Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21st – 26th June 2015 Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Durban, South Africa

Course topics (Genetics, Epidemiology, Bioinformatics):
– Introductions
– Whole genome sequencing and fine-mapping
– Meta-analysis and power of genetic studies
– GWAS results and interpretation
– GWAS QC
– Basic principles of measuring disease in populations
– Population genetics
– Principal components analyses
– Basic genotype data summaries and analyses
– GWAS association analyses
– Public databases and resources for genetics

2003: The Human Genome Project (HGP) publishes the sequence of a “reference” human genome

You can download the human genome sequence from here: It looks like this: The sequence alone is not that useful!
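Reference sequences are commonly distributed as FASTA text: a header line starting with ">" followed by lines of bases. The sketch below shows one way such text could be read in Python. The function name `parse_fasta` and the record name `toy_chr` are made up for illustration, and the input is a toy string, not real genome data.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first word after ">" as the record name
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            # Sequence line: collect it under the most recent header
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


# Toy example only: this is not real genome sequence
toy = ">toy_chr example record\nACTCATGCAT\nCGATGCGATG\n"
sequences = parse_fasta(toy)
print(sequences["toy_chr"])
print(len(sequences["toy_chr"]))
```

A string of bases like this says nothing about genes or variation by itself, which is why the annotation resources described next are needed.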

Other projects generated resources to shed more light on genomes.
Genome variation
– How does the genome sequence vary from person to person?
– Genotype (HapMap) or sequence (1000 Genomes) many more individuals
Genome function
– How does the DNA sequence make and regulate RNA and proteins?
– Gene prediction models (GENCODE, RefSeq), assays of non-coding function (ENCODE)
All these resources can be accessed for free on the internet.

Genome browsers
Most of the data that we will discuss is available from “genome browsers”. These are websites that put public data together in one place and make it searchable and browsable. The two main genome browsers are Ensembl and the University of California Santa Cruz (UCSC) genome browser.

Much of the data Ensembl UCSC genome browser

Genome variation

HapMap documents common variation within and across human populations
– ~2M single nucleotide polymorphisms (SNPs) genotyped in ~1000 individuals from 11 populations
– Used genotyping microarrays

You can download the HapMap data from here: It looks like this:

1000 Genomes Project sequences thousands more human genomes
– Samples from 25 populations, documenting over 40M SNPs, insertions, deletions and inversions
– Used high-throughput sequencing

You can download the 1000 Genomes data from here: It looks like this:

Ensembl can give us HapMap and 1000 Genomes information on particular SNPs:

Genome function

Annotating function onto sequence ACTCATGCATCGATGCGATG

Annotating function onto sequence (diagram labels, added step by step): transcription start site, transcript end, start codon, end codon, splice sites, exons, introns, 5’ UTR, 3’ UTR, coding sequence, mRNA, promoter, enhancer, transcription factor binding

Functional annotation projects
Gene model builders (e.g. RefSeq, GENCODE)
– Use computational models to predict where transcription starts and stops, and where splicing occurs, to make predicted transcripts.
– Use both sequence data and experimental data.
Discovery of non-coding regulatory elements (e.g. ENCODE)
– Use experimental data (RNA-Seq, ChIP-Seq, histone modifications) to discover functional regulatory elements outside of genes.
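To make the idea of a gene model concrete, here is a minimal sketch of how a predicted transcript could be represented and used to label a position as intergenic, intronic, UTR or coding. The `Transcript` class, the `classify_position` function and all coordinates are hypothetical; real gene models from RefSeq or GENCODE also carry strand, multiple transcripts per gene and much more.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Simplified forward-strand transcript (1-based, inclusive coordinates)."""
    exons: list      # (start, end) tuples, sorted by start
    cds_start: int   # first base of the start codon
    cds_end: int     # last base of the stop codon


def classify_position(tx, pos):
    """Return a coarse annotation label for a genomic position."""
    tx_start, tx_end = tx.exons[0][0], tx.exons[-1][1]
    if pos < tx_start or pos > tx_end:
        return "intergenic"
    in_exon = any(start <= pos <= end for start, end in tx.exons)
    if not in_exon:
        return "intron"
    # Exonic bases before the start codon or after the stop codon are untranslated
    if pos < tx.cds_start:
        return "5' UTR"
    if pos > tx.cds_end:
        return "3' UTR"
    return "coding sequence"


# Hypothetical coordinates, chosen only for illustration
toy_tx = Transcript(exons=[(100, 200), (300, 400), (500, 650)],
                    cds_start=150, cds_end=600)
for position in (50, 120, 180, 250, 620):
    print(position, classify_position(toy_tx, position))
```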

You can download GENCODE data from here: And ENCODE data from here: They look like this:

UCSC can show us functional information on a gene: transcripts, promoter activity and transcription factor binding

Ensembl tells us what impact a variant has on nearby genes. This variant causes a frameshift in the gene NOD2.

Putting it all together
We discover a protective effect of the A allele of SNP rs334 on severe malaria.
How common is this variant?
– Search for rs334 in Ensembl, click “Variation”, then “Human”, then “rs334”, then “Population Genetics”.
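The same lookup can also be scripted rather than clicked through. The sketch below assumes the public Ensembl REST service at rest.ensembl.org and its `variation/:species/:id` endpoint with the optional `pops=1` parameter; the helper names `variation_url` and `fetch_variation` are made up here, and the structure of the returned JSON should be checked against the Ensembl REST documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public REST server


def variation_url(rsid, species="human", with_populations=True):
    """Build the URL for a variant record, optionally with population data."""
    url = f"{ENSEMBL_REST}/variation/{species}/{rsid}"
    if with_populations:
        url += "?" + urlencode({"pops": 1})
    return url


def fetch_variation(rsid):
    """Download the variant record as a dict (requires internet access)."""
    request = Request(variation_url(rsid),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


# Only builds and prints the URL; call fetch_variation("rs334") to download
print(variation_url("rs334"))
```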

According to 1000 Genomes, Europeans and Asians do not carry the A variant, but 9% of Africans do.
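Population frequencies like these are derived from genotype counts. As a minimal sketch, the code below computes an allele frequency and a carrier fraction from counts of the three genotypes at a biallelic SNP. The function names and the counts are invented for illustration and are not 1000 Genomes data.

```python
def allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the alternative allele among all chromosomes sampled."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Heterozygotes carry one copy, alternative homozygotes carry two
    alt_alleles = n_het + 2 * n_hom_alt
    return alt_alleles / (2 * n_individuals)


def carrier_fraction(n_hom_ref, n_het, n_hom_alt):
    """Fraction of individuals with at least one copy of the alternative allele."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    return (n_het + n_hom_alt) / n_individuals


# Made-up counts for 100 individuals, for illustration only
counts = dict(n_hom_ref=82, n_het=17, n_hom_alt=1)
print(allele_frequency(**counts))
print(carrier_fraction(**counts))
```

An allele frequency and the fraction of people carrying the allele are different quantities, so it is worth checking which one a browser page is reporting.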

Putting it all together
Does this variant lie within a predicted gene?
– Search for rs334 in the UCSC genome browser and click “rs334”.

According to RefSeq and GENCODE, this variant lies within a transcript for the gene HBB

Putting it all together
Does this variant alter the HBB gene?
– Search for rs334 in Ensembl, click “Variation”, then “Human”, then “rs334”, then “Genes and Regulation”.

The A allele changes the 7th amino acid in the HBB protein from glutamic acid (E) to valine (V).
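The amino acid change can be reproduced by translating the start of the HBB coding sequence with and without the substitution. This is a sketch under stated assumptions: the 30-base sequence in `HBB_CDS_START` is the commonly quoted start of the HBB coding sequence and should be checked against Ensembl, and the change is written on the coding strand (GAG to GTG), which, assuming HBB is annotated on the reverse genomic strand, corresponds to the A allele reported on the forward strand. The helper names are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def substitute(cds, codon_number, position_in_codon, new_base):
    """Replace one base; codon_number and position_in_codon are 1-based."""
    index = (codon_number - 1) * 3 + (position_in_codon - 1)
    return cds[:index] + new_base + cds[index + 1:]


# Assumed start of the HBB coding sequence (first 10 codons); verify in Ensembl
HBB_CDS_START = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"

# Change the middle base of codon 7 from A to T on the coding strand
mutant = substitute(HBB_CDS_START, codon_number=7, position_in_codon=2, new_base="T")

print(translate(HBB_CDS_START))
print(translate(mutant))
```

Comparing the two printed protein strings position by position shows which residue differs.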

Practical task
For each of the following variants please find:
– The allele frequencies in different populations
– The gene it is in (if any)
– The consequence it has on the gene (if any). This could be an amino acid change, or presence in a regulatory region.
– Any other interesting information you can find on those pages. You can even search Google for these SNPs to see if you can learn anything else!
The variants are:
– rs
– rs
– rs
– rs