Understanding the Human Genome: Lessons from the ENCODE project

1 Understanding the Human Genome: Lessons from the ENCODE project
University of Brawijaya, 4th December 2013
Austen Ganley, INMS


3 Glossary
Genome, Genes, DNA/RNA, Protein, Cell, Transcription, Chromatin, Histones, Nucleosomes, Non-coding RNA, Sequencing, Microarray, Transcription start site, Active/open, Inactive/repression

4 [Figure: gene structure diagram, labelled: promoter, transcriptional start site, exon, intron, transcriptional terminator]

5 Introduction
• Individual scientists worked together
• Aim was to understand 1% of the human genome (2007), and then 100% (2012)
• Looked at: transcription, chromatin/transcription factors, replication, evolution

6 Genes
• Now estimated to be about 21,000 protein-coding genes (taking up about 3% of the whole genome)
• In addition, there are about 9,000 small RNAs (a class that includes microRNAs), and about 10,000 long non-coding RNAs

7 Transcription
Transcription was measured by two different methods:
• Whole genome microarrays
• RNA-sequencing

8 Detecting transcription using tiled microarrays

9 Transcription
Transcription was measured by two different methods:
• Whole genome microarrays
• RNA-sequencing
They found that at least 62% of the whole genome is transcribed (remember, genes only account for about 3% of the whole genome)
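A figure such as "at least 62% of the genome is transcribed" is, at its simplest, a coverage calculation: merge all transcribed intervals and divide the covered length by the genome length. The sketch below illustrates that arithmetic only. The coordinates and the 1,000 bp genome are made-up toy values, not ENCODE data, and `merge_intervals` and `covered_fraction` are hypothetical helper names.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


# Toy example: hypothetical transcribed regions on a 1,000 bp "genome".
transcribed = [(0, 200), (150, 400), (600, 820)]
fraction = covered_fraction(transcribed, 1000)
print(f"{fraction:.0%} of the toy genome is transcribed")
```

Merging first matters because overlapping transcripts would otherwise be counted twice.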

10 Transcriptional start sites
• Goal is to identify the transcription start sites
• Not easy to do!
• Use a technique called CAGE (Cap Analysis Gene Expression)

11 CAGE
Makes use of the 5’ cap on mRNA
• First, mRNA is reverse-transcribed to form cDNA (an RNA-DNA hybrid)
• Then, biotin is attached to the 5’ cap, and the cDNA is fragmented
• The biotinylated fragments are isolated (representing the 5’ end of the mRNA), and these fragments are sequenced
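Once the CAGE fragments have been sequenced and mapped, the 5’ end of each tag marks a candidate transcription start site. The sketch below shows one simple way such positions could be grouped into start-site clusters. It assumes the tags are already mapped to a single strand of a single chromosome. The `max_gap` and `min_tags` thresholds and the function name are illustrative assumptions, not the parameters of any published CAGE pipeline.

```python
from collections import Counter


def cluster_cage_tags(five_prime_positions, max_gap=20, min_tags=3):
    """Group mapped CAGE tag 5' ends into clusters of nearby positions.

    Returns a list of (start, end, tag_count, peak_position) tuples.
    """
    counts = Counter(five_prime_positions)
    clusters = []
    current = []  # distinct positions in the cluster being built
    for pos in sorted(counts):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    results = []
    for positions in clusters:
        total = sum(counts[p] for p in positions)
        if total >= min_tags:
            # Report the most frequently used position as the peak.
            peak = max(positions, key=lambda p: counts[p])
            results.append((positions[0], positions[-1], total, peak))
    return results


# Toy tag positions (hypothetical): two clusters and one stray tag at 500.
tags = [100, 101, 101, 101, 105, 500, 900, 902, 902, 910]
tss_clusters = cluster_cage_tags(tags)
for start, end, total, peak in tss_clusters:
    print(f"TSS cluster {start}-{end}: {total} tags, peak at {peak}")
```

The minimum tag count is there to discard isolated tags, which are more likely to be noise than a genuine start site.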

12 About 60,000 transcription start sites found
• Only half of these match known genes
• What do the other ones do? May explain the high level of transcription
• The transcription start sites are often far upstream of the gene start, and can overlap genes

13 Overlapping Genes
Transcriptional start sites from the DONSON gene
An overlapping gene, starting far upstream
• The DONSON gene is a known gene
• However, some transcripts start in the ATP5O gene, and include some ATP5O exons
• Two genes are skipped over

14 Chromatin: histones and nucleosomes
• Nucleosomes are formed from DNA that is packaged around histones
• Histones are a set of proteins that usually associate as an octamer

15 DNase I hypersensitive sites (DHS)
• DNase I preferentially digests nucleosome-depleted regions (DNase I hypersensitive sites)
• These are associated with gene transcription
• Chromatin is digested with DNase I, which only digests nucleosome-free regions
• The remaining DNA is isolated, and put on a microarray or sequenced
• This finds the open, active regions of the genome
Image credits: Hebbes Lab, University of Portsmouth, UK; Gilbert, Developmental Biology, Sinauer

16 DNase I hypersensitive sites
• In total, there are about 3 million DNase I hypersensitive sites in the genome, covering about 15% (versus about 40,000 genes covering about 4%)
• Transcriptional start sites are regions of DNase I hypersensitivity, as expected
• Most DNase I hypersensitive sites are not associated with a transcriptional start site, though
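The distinction between hypersensitive sites at a transcriptional start site and those elsewhere can be expressed as a simple distance test. The sketch below splits a list of DHS intervals into "proximal" and "distal" sets. The 1,000 bp window, the coordinates, and the function name are assumptions chosen for illustration, not ENCODE's definition.

```python
from bisect import bisect_left


def split_dhs_by_tss(dhs_sites, tss_positions, window=1000):
    """Split DHS intervals into TSS-proximal and distal sets.

    A DHS (start, end) counts as proximal if any TSS lies within
    `window` bp of the interval.
    """
    tss_sorted = sorted(tss_positions)
    proximal, distal = [], []
    for start, end in dhs_sites:
        # First TSS at or after the left edge of the extended interval.
        i = bisect_left(tss_sorted, start - window)
        if i < len(tss_sorted) and tss_sorted[i] <= end + window:
            proximal.append((start, end))
        else:
            distal.append((start, end))
    return proximal, distal


# Toy data (hypothetical coordinates on one chromosome).
tss = [5000, 40000]
dhs = [(4800, 5200), (12000, 12300), (25000, 25400), (41500, 41800)]
proximal, distal = split_dhs_by_tss(dhs, tss)
print(f"{len(proximal)} of {len(dhs)} DHS are near a TSS")
```

In this toy data set only one of the four sites is near a start site, mirroring the observation that most hypersensitive sites are distal.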

17 Transcription start sites
[Figure: genome tracks showing genes, transcribed regions, transcription start sites, and DNase I hypersensitive regions]

18 Histone Modification Effects
• Modifications occur on the histone tails
• They alter the strength of DNA-histone binding, and influence the binding of other proteins to the DNA
• Thus they can activate or silence gene expression

19 The “Histone Code”
The combination of histone modifications determines a gene’s transcriptional status (the histone code)
• Some modifications are associated with active gene expression: H3K4me2, H3K4me3, H3ac, H4ac
• Some are associated with repression: H3K27me3, H3K4me1

20 ChIP (Chromatin immunoprecipitation)
Method to find where your protein of interest binds
• Cross-link the sample, and fragment the DNA into pieces
• Immunoprecipitate using an antibody to your protein of interest
• Reverse the cross-links, and isolate the DNA
• To find where in the genome the protein was bound: hybridise the DNA to a microarray (ChIP-chip) OR sequence it (ChIP-seq)
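After ChIP-seq, binding sites show up as regions where the immunoprecipitated sample has more reads than a control. The sketch below compares per-bin read counts against an input control, assuming reads have already been counted in fixed-size bins. The pseudocount, the two-fold threshold, the counts, and the function name are illustrative assumptions. Real peak callers use more careful statistics.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin ChIP / input ratio after scaling input to ChIP depth.

    Assumes both lists cover the same bins and that the input
    sample contains at least one read.
    """
    scale = sum(chip_counts) / sum(input_counts)
    return [
        (chip + pseudocount) / (ctrl * scale + pseudocount)
        for chip, ctrl in zip(chip_counts, input_counts)
    ]


# Toy read counts per bin (hypothetical).
chip = [5, 4, 60, 6, 5]
control = [10, 8, 12, 10, 10]
ratios = fold_enrichment(chip, control)

# Bins with at least two-fold enrichment are candidate binding sites.
enriched_bins = [i for i, r in enumerate(ratios) if r >= 2.0]
print("Enriched bins:", enriched_bins)
```

Scaling by total read count keeps a deeper-sequenced sample from looking enriched everywhere, and the pseudocount avoids division by zero in empty control bins.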

21 Histone modification profiles
• They found that histone modifications associated with active transcription were found around transcription start sites
• They found that histone modifications associated with gene repression were depleted around transcription start sites
• This is as expected
• Around DNase I hypersensitive sites not near transcription start sites, they found almost the opposite pattern
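Profiles like these are typically built by lining up the signal around many anchor points (for example, all transcription start sites) and averaging position by position. The sketch below shows that aggregation on a toy signal. The signal values, anchor positions, and function name are made up for illustration.

```python
def average_profile(signal, anchors, flank=2):
    """Average a per-position signal in windows centred on anchor positions.

    Anchors too close to either end of `signal` are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for anchor in anchors:
        if anchor - flank < 0 or anchor + flank >= len(signal):
            continue
        for offset in range(width):
            totals[offset] += signal[anchor - flank + offset]
        used += 1
    if used == 0:
        return []
    return [total / used for total in totals]


# Toy histone-mark signal (hypothetical) with two anchor positions.
mark_signal = [0, 1, 4, 9, 4, 1, 0, 0, 2, 6, 10, 6, 2, 0]
anchor_sites = [3, 10]
profile = average_profile(mark_signal, anchor_sites)
print(profile)
```

Running the same function with distal hypersensitive sites as anchors, instead of start sites, would give the second profile to compare against.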

22 Enrichment of active histone marks and depletion of inactive histone marks at a transcription start site
Enrichment of inactive histone marks but little enrichment of active histone marks at a DNase I hypersensitive site

23 Histone modification profiles
• They also found other patterns
• Combining all the results (plus results for transcription factor binding), they say that the human genome is divided into seven different types of chromatin states
• Which state a region is in depends on the combination of histone modifications and transcription factor binding present

24 The seven chromatin states

25 The seven chromatin states
• Enhancer (yellow)
• Gene body (green)
• Inactive region (grey)
• Promoter (red)

26 Grand Summary: ENCODE
Transcription start sites:
• twice as many transcription start sites as traditional “genes”
• transcripts span large regions, even between genes
DNase I hypersensitive sites:
• more than just at transcription start sites
• two types: those found at TSS, and those found at other regions
• these have different chromatin profiles
Transcription:
• a lot of non-coding transcription (~60% of the genome transcribed), much more than needed just to transcribe all the genes
Histone modifications:
• active marks correlate with TSS/DHS
• distal DHS have a different histone modification profile
Chromatin states:
• the genome can be divided into seven different types
• these are determined by the combination of histone modifications and transcription factor binding that occur
ENCODE Overview:
• genome can be generalised into seven different states
• the function of some of these states is known, e.g. promoter
• the function of others is not known, but may explain the high level of transcription and open chromatin structure
