RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute.


1 RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute

2 Outline. I. Non-coding RNA: The genome’s dark matter; Family classification; Genome annotation. II. ncRNA genes in the human genome: Rogues’ gallery; miRNAs; Regulatory elements

3 T. thermophilus - Ramakrishnan et al., Cell, 2002

4 Protein/RNA genes DNA RNA protein X

5 ncRNA genes …. code for functional RNAs. Many cellular machines contain RNA. Ribosome: rRNA; Spliceosome: snRNAs (U1, U2, U4, U5, U6); Telomerase: Telomerase RNA; SRP: SRP RNA

6 How many genes in the human genome?

7 Gene sweep CSHL. Rules: $1 in 2000, $5 in 2001 and $20 in 2002. A gene is a set of connected transcripts. A transcript is a set of exons connected via transcription. At least one transcript must be expressed outside of the nucleus and one transcript must encode a protein. One bet per person, per year. Results: 165 bets; Mean, Lowest, Highest; Answer: 21000; Winner: Lee Rowen

8 ncRNA genes Genomic dark matter Ignored by gene prediction methods Not in EnsEMBL Computational complexity ~10% of human gene count?

9 The RNA World. Origin of life / central dogma paradox: DNA needs proteins to replicate; Proteins coded for by DNA. RNA can be code and machinery: Selex, aptamers. RNAs are remnants: Ancient; Essential

10 Biological sequence analysis Protein easy RNA hard

11 Gene finding. Rules: ATG; TAA, TGA, TAG; GT…..AG. Compositional features: Exon lengths; Intron lengths; Codon bias. General genomic properties. Homology. ? ?
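The start/stop "rules" on this slide can be illustrated with a minimal open reading frame scan. This is only a sketch under simple assumptions: forward strand only, no splicing (the GT…..AG rule is ignored), and the function name `find_orfs`, the `min_codons` parameter and the toy sequence are all hypothetical, not taken from the presentation.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) pairs of forward-strand ORFs.

    An ORF here begins at ATG and runs to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons before the stop (hypothetical length filter).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)


# Toy sequence (made up for illustration): ATG AAA TTT GGG TAA in frame 2.
toy = "CCATGAAATTTGGGTAACC"
print(find_orfs(toy))
```

Real gene finders combine such signals with the compositional features and homology evidence listed on the slide; none of these protein-coding signals exist for ncRNA genes, which is the point the talk goes on to make.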

12 Protein sequence analysis
Query:   1 MKFYTIKLPKFLGGIVRAMLGSFRKD 26
           M+  TIKLPKFL  IVR   G+ + D
Sbjct: 390 MRIMTIKLPKFLAKIVRMFKGNKKSD 467
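As a small worked example, the identity level of the alignment on this slide can be counted directly from the two aligned strings. The helper name `count_identities` is hypothetical; the sequences are copied from the slide and the alignment is assumed to be gap-free, as shown.

```python
query = "MKFYTIKLPKFLGGIVRAMLGSFRKD"
sbjct = "MRIMTIKLPKFLAKIVRMFKGNKKSD"


def count_identities(a, b):
    """Count positions where two equal-length aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


identities = count_identities(query, sbjct)
percent = 100.0 * identities / len(query)
print(f"{identities}/{len(query)} identical ({percent:.1f}%)")
```

Even this simple measure is informative at the primary-sequence level for proteins; the later slides argue that RNA homologues often need structure-aware comparison instead.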

13

14 RNA sequence analysis

15

16 Why are families useful? Alignments of related sequences; Phylogenetic trees; Homologue detection; Genome annotation; Secondary structure prediction
S. cerevisiae      UCCUCGUGAGAGGG
P. canadensis      GUCUC.UGAGAGAU
P. strasburgensis  CUCUC.UGAGAGAG
K. thermotolerans  UUCUCGUGAGAGAA
SS                 >>>>
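The alignment on this slide can be used to sketch why families help with secondary structure prediction: paired columns stay complementary even when the bases change. The structure line in the transcript is truncated, so the string `ASSUMED_SS` below is an assumption (a five-pair stem closing a four-column loop, with `>` opening and `<` closing a pair), and the function names are hypothetical.

```python
ALIGNMENT = {
    "S. cerevisiae":     "UCCUCGUGAGAGGG",
    "P. canadensis":     "GUCUC.UGAGAGAU",
    "P. strasburgensis": "CUCUC.UGAGAGAG",
    "K. thermotolerans": "UUCUCGUGAGAGAA",
}

# Assumption: the slide's SS line is cut off after ">>>>"; this full
# structure string is a guess that fits the four sequences.
ASSUMED_SS = ">>>>>....<<<<<"

# Watson-Crick pairs plus G-U wobble pairs.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"),
           ("C", "G"), ("G", "U"), ("U", "G")}


def paired_columns(ss):
    """Turn a bracket string into a sorted list of (i, j) column pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(ss):
        if ch == ">":
            stack.append(i)
        elif ch == "<":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def covariation_report(alignment, ss):
    """For each paired column, check pairing and count distinct base pairs."""
    report = {}
    for i, j in paired_columns(ss):
        observed = [(seq[i], seq[j]) for seq in alignment.values()]
        report[(i, j)] = {
            "all_pair": all(p in ALLOWED for p in observed),
            "distinct_pairs": len(set(observed)),
        }
    return report


for cols, info in covariation_report(ALIGNMENT, ASSUMED_SS).items():
    print(cols, info)
```

Column pairs with more than one distinct base pair that all remain complementary are the compensatory changes that covariance models are designed to score.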

17

18 RNA models Covariance models (profile-SCFGs) Analogue to profile-HMMs Statistical representation of the alignment with structure Homologue detection Multiple sequence alignment (Sean Eddy)

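19 Protein sequence analysis - HMMs. Alignment: ERELKKQKKLSNR, ERELKK..KQSNR, ERELKRQRKQSNR, KAAAQRQKMIKNR. [Figure: profile-HMM with begin (B), match (M), insert (I), delete (D) and end (E) states, shown with the sequence EREKKKRKQSNR]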

20 RNA sequence analysis - SCFGs [Figure: SCFG parse of a small hairpin, with MP and ML states emitting the paired and unpaired bases]

21 RNA models - problems. Problems: Speed; Memory; Sensitivity. Speed: 30 billion bases in DBs; O(N^3) wrt model length; small model: 300 b/s; 28S rRNA: 200 b/day
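The figures on this slide translate into search times with simple arithmetic. The sketch below assumes a single processor, a constant scan rate, and 365.25 days per year; the constant and function names are hypothetical.

```python
DB_BASES = 30e9            # 30 billion bases in the databases (from the slide)
SMALL_MODEL_RATE = 300.0   # bases per second for a small model
LARGE_MODEL_RATE = 200.0   # bases per day for 28S rRNA
SECONDS_PER_DAY = 86400
DAYS_PER_YEAR = 365.25     # assumption used for the year conversion

# Time to scan the whole database once with each model.
small_days = DB_BASES / SMALL_MODEL_RATE / SECONDS_PER_DAY
large_years = DB_BASES / LARGE_MODEL_RATE / DAYS_PER_YEAR


def relative_cost(n_old, n_new):
    """Cost ratio under the slide's O(N^3) scaling in model length."""
    return (n_new / n_old) ** 3


print(f"small model: about {small_days:.0f} days")
print(f"28S rRNA:    about {large_years:.0f} years")
print(f"doubling model length costs {relative_cost(100, 200):.0f}x")
```

On these numbers a small model needs roughly three years of single-CPU time and the 28S rRNA model several hundred thousand years, which is why compute farms and faster family-specific heuristics come up in the following slides.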

22 Sanger supercomputers

23

24

25 Rfam ncRNA families Structure annotated alignments Species distributions Keyword searches Sequence searches > regions in EMBL 76

26 ncRNA families. What we have: tRNA; 5S, 5.8S rRNAs; Spliceosomal RNAs; SRP, RNaseP; Telomerase, tmRNA, vault; E. coli screens; Some snoRNAs; Some miRNAs; Some UTR elements; Self-splicing introns; …… more. What we don’t: 18S, 23S rRNAs; Other large things (Xist etc); Lots of snoRNAs; Lots of miRNAs; Many small families; Unknowns

27 Genome annotation. General: One tool fits all; Compute drain; Automatic; Eukaryotic complications; Comprehensive; Great for prokaryotes. Specific: Heuristics; One family, one gene finder; Increased speed; Increased sensitivity; tRNAscan-SE, BRUCE, SRPscan, snoscan

28 Outline. I. Non-coding RNA: The genome’s dark matter; Family classification; Genome annotation. II. ncRNA genes in the human genome: Rogues’ gallery; miRNAs; Regulatory elements

29

30 Outline. I. Non-coding RNA: The genome’s dark matter; Family classification; Genome annotation. II. ncRNA genes in the human genome: Rogues’ gallery; miRNAs; Regulatory elements

31 International Human Genome Sequencing Consortium, Nature, 2001

32 X chromosome inactivation in mammals X XX Y X Dosage compensation

33 Xist – X inactive-specific transcript Avner and Heard, Nat. Rev. Genetics (1):59-67

34 International Human Genome Sequencing Consortium, Nature, 2001

35 microRNAs A novel class of ncRNA gene Products are ~22 nt RNAs Precursors are nt hairpins Gene regulation by pairing to mRNA Unknown before 2001

36 Timeline
Late 70’s – lin-4 and let-7 regulate developmental timing in worm
1993 – lin-4 codes for a ~22 nt RNA, complementary to 3’ UTR of lin-14
2000 – …. so does let-7 (stRNAs)
2000 – let-7 is conserved in bilaterally symmetric animals
2001 – ~100 miRNAs discovered by cloning in worm, fly and human
2002 – miRNAs conserved in plants
2002 – Science magazine’s breakthrough of the year
2002 – miRNA Registry established
2003 – miRNAs may account for 1% of total gene count in animals
2003 – a few targets of miRNAs identified
2004 – miRNA Registry has 719 miRNAs

37 “miRNA” in PubMed

38 miRNA biogenesis Adapted from DP Bartel, Cell 116: (2004)

39 miRNAs targets DP Bartel, Cell :

40 PNAS 99: (2002)

41 miRNA Registry 3.0 Searchable database of published miRNAs 719 entries from human, mouse, rat, worm, fly, and plants Naming service Pre-publication Unique names for distinct miRNAs Confidentiality for unpublished data

42

43 Genomic context. 180 known miRNAs in human: 130 intergenic, 50 intronic; 60 polycistronic, 70 monocistronic

44 ncRNA gene contexts AAAAAAA tRNA, snRNAs,SRP, RNase P ….. Xist miRNAs miRNAs, snoRNAs

45 Inside-out genes protein

46 Inside-out genes: degradation; Gas5, UHG, U17HG, U19H; snoRNA

47 Cis-regulatory RNA elements: PrfA in Listeria. PrfA; 37 °C, 25 °C; Virulence gene expression

48 UTR elements in human. IRE: regulation of iron metabolism; SECIS: UGA -> SeC; Histone 3’ UTR: 3’ end formation; Vimentin 3’ UTR: mRNA localisation; CAESAR: CTGF repression; …. many more

49 ncRNAs in human genome
tRNA 600
18S rRNA S rRNA 200
28S rRNA 200
5S rRNA 200
snoRNA 300
miRNA 250
U1 40
U2 30
U4 30
U5 30
U6 20
U4atac 5
U6atac 5
U11 5
U12 5
SRP RNA 1
RNase P RNA 1
Telomerase RNA 1
RNase MRP 1
Y RNA 5
Vault 4
7SK RNA 1
Xist 1
H19 1
BIC 1
Antisense RNAs 1000s?
Cis reg regions 100s?
Others ?

50

51 Summary. ncRNA genes …. have diverse and essential roles, may be relics of ancient RNA-based life, provide major computational challenges, and are often ignored! >10% of human gene count? Family classifications are useful for …. finding homologues, predicting structure, and automatic genome annotation

52 Just plain weird Vault is huge 13 Md 30 x 55 nm Described in proteins MVP TEP1 vPARP vRNA Conserved in higher euks

53

54 Thanks: Alex Bateman, Mhairi Marshall, Simon Moxon, Ajay Khanna, Sean Eddy, Informatics support group, Ian Holmes, Bjarne Knudsen, Robbie Klein, David Bartel, Tom Tuschl, Victor Ambros

55 Bibliography
Computational genomics of non-coding RNA genes. Sean R. Eddy, Cell 109: (2002)
Non-coding RNAs: the architects of eukaryotic complexity. John S. Mattick, EMBO Reports 2: (2001)
MicroRNAs: Genomics, biogenesis, mechanism and function. David P. Bartel, Cell 116: (2004)
Rfam: An RNA family database. Sam Griffiths-Jones et al., Nucl. Acids Res. 31: (2003)

56

