Gene architecture and sequence annotation


1 Gene architecture and sequence annotation
Week 2

2 Last week: How to search genomic databases such as NCBI and Ensembl
How to obtain sequence files

3 This week we will learn to identify genetic architecture within sequence files
Sequence of the Cystic Fibrosis Gene: CFTR

4 This week we will learn the differences between the two types of nucleic acid sequences
Genomic: the sequence of nucleotides on a chromosome
Expressed: the sequence of nucleotides in mRNA/cDNA

5 The expression of genomic information
DNA → RNA → protein
Bioinformatics and Functional Genomics, 2nd Edition. (2014).

6 DNA RNA protein genome transcriptome proteome
Bioinformatics and Functional Genomics, 2nd Edition. (2014).

7 DNA RNA protein phenotype protein sequence databases cDNA ESTs UniGene
genomic DNA databases Bioinformatics and Functional Genomics, 2nd Edition. (2014).

8 Learning Objectives:
Understand sequence differences between genomic and expressed sequences
Use programs to determine the correct open reading frame (ORF) of an expressed sequence
Annotate sequence files

9 Genomic DNA is one source of nucleic acid sequence
Strachan, T. & Read, A.P. Human Molecular Genetics. (New York; Wiley-Liss, 1999).

10 The chemical properties of DNA are important for sequence analysis
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

11 DNA is composed of two anti-parallel strands
5’ is the beginning of the sequence and 3’ is the end of the sequence.
By convention, a DNA sequence is written with 5’ at the left side and 3’ at the right side.
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

12 DNA is composed of two anti-parallel strands
5’ is the beginning of the sequence and 3’ is the end of the sequence.
By convention, a DNA sequence is written with 5’ at the left side and 3’ at the right side.
Strand 1: 5’ GAT…
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

13 DNA is composed of two anti-parallel strands
5’ is the beginning of the sequence and 3’ is the end of the sequence.
By convention, a DNA sequence is written with 5’ at the left side and 3’ at the right side.
Strand 1: 5’ GAT…
Strand 2: 5’ AGT…
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

14 DNA has strict base pairing rules that determine the sequence of the complementary strand
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

15 Transcription is the process of making RNA from a DNA template
protein Bioinformatics and Functional Genomics, 2nd Edition. (2014).

16 During transcription an RNA molecule is synthesized from genomic DNA
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

17 RNA polymerase adds bases to the 3’ end of the growing RNA molecule
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

18 The rules of complementary base pairing are followed for RNA transcription
During RNA transcription uracil (U) is incorporated instead of thymine (T). Uracil base pairs with adenine. In bioinformatics we ignore this fact: every U is written as T.
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

19 The template strand is anti-parallel to the growing mRNA molecule
Template strand = antisense strand
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

20 The template strand is anti-parallel to the growing mRNA molecule
Non-template strand = sense strand. This strand has the same sequence as the mRNA molecule.
Template strand = antisense strand.
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
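
The relationship between the two strands and the mRNA can be sketched in a few lines of Python. This is only an illustration: the short sequences and the function name `transcribe` are made up for the example and do not come from the lecture.

```python
# Base pairing used when RNA is synthesized from a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3):
    """Return the mRNA (written 5'->3') made from a template strand
    that is itself written 5'->3'."""
    # The template is read 3'->5', so walk it in reverse.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3))

sense = "ATGGTCTTACGC"      # hypothetical non-template (sense) strand
template = "GCGTAAGACCAT"   # its reverse complement (antisense/template)

mrna = transcribe(template)
print(mrna)
# Writing U as T, the mRNA matches the sense strand.
print(mrna.replace("U", "T") == sense)
```

The second print statement checks the point made on the slide: once U is written as T, the mRNA reads the same as the non-template strand.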

21 Genes can be found on both strands of a chromosome
Forward strand 5’ 5’ Reverse strand

22 The original RNA molecule undergoes processing that changes the sequence
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

23 The original RNA molecule is processed
Exons are segments of DNA that are found in mature mRNA Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

24 The original RNA molecule is processed
Introns are segments that are transcribed but then removed from the pre-mRNA through splicing. They are not found in mature mRNA.
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

25 The original RNA molecule is processed
The sequence in red is the coding sequence (often abbreviated CDS) Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

26 The original RNA molecule is processed
The sequence in red is the coding sequence (often abbreviated CDS) Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

27 In the mRNA the exons are joined together as one continuous sequence
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

28 Translation is the process by which an mRNA molecule is used to make a protein
+1 is the first translated nucleotide, usually the A of the start codon ATG (ATG = methionine)

29 Translation is the process by which an mRNA molecule is used to make a protein
The red indicates all the sequence within the mRNA that will be used during translation to code for protein

30 The sequences within an mRNA that do not directly code for protein are called Untranslated Regions
5’ UTR: untranslated region before the start codon; does not code for protein
3’ UTR: untranslated region after the stop codon; does not code for protein
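
As a rough sketch, an mRNA can be cut into its three parts once the CDS coordinates are known. The toy transcript, its coordinates and the helper name `split_mrna` below are assumptions made for illustration, not data from a real record.

```python
def split_mrna(mrna, cds_start, cds_end):
    """Split an mRNA into (5' UTR, CDS, 3' UTR).

    cds_start and cds_end are 1-based, inclusive positions;
    cds_end is the last base of the stop codon.
    """
    five_utr = mrna[:cds_start - 1]
    cds = mrna[cds_start - 1:cds_end]
    three_utr = mrna[cds_end:]
    return five_utr, cds, three_utr

# Hypothetical toy transcript: 6 nt 5' UTR + 18 nt CDS + 8 nt 3' UTR.
mrna = "GGCACG" + "ATGGTCTTACGCTGGTAA" + "GCTTTAAT"
five_utr, cds, three_utr = split_mrna(mrna, 7, 24)
print(five_utr, cds, three_utr)
```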

31 mRNA is converted to cDNA using reverse transcription
Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).

32 Because it is cDNA, not mRNA, that is sequenced, we use T, not U, in sequence files
Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).

33 How do we identify introns/exons in our sequence files?

34 We will use KRAS as an example

35 The KRAS gene produces 4 transcripts (splice variants)
Table

36 This is the transcript diagram for this gene region

37 The Transcript Diagram shows the organization of the transcripts generated from the gene locus

38 Use the link under the “Transcript ID” column to identify the exons and introns in a specific transcript

39 The exon/intron map for a specific transcript
The lines are intronic sequence

40 The exon/intron map for a specific transcript
The lines are intronic sequence Bars are exonic sequence: filled bars mean coding sequence and unfilled bars are UTR sequence

41 The exon/intron map for a specific transcript
The number of introns is always the number of exons minus 1: 5 exons means 4 introns.
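
The exons-minus-one rule follows from introns being the gaps between neighbouring exons. A minimal sketch, using invented exon coordinates (not the KRAS values) and a hypothetical helper `introns_from_exons`:

```python
def introns_from_exons(exons):
    """Given sorted, non-overlapping exon coordinates as (start, end)
    pairs (1-based, inclusive), return the intron coordinates."""
    introns = []
    # Each intron sits between the end of one exon and the start of the next.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        introns.append((prev_end + 1, next_start - 1))
    return introns

# Hypothetical coordinates for a 5-exon transcript.
exons = [(1, 120), (501, 622), (1400, 1578), (2100, 2259), (3000, 3400)]
introns = introns_from_exons(exons)
print(len(exons), "exons,", len(introns), "introns")
print(introns)
```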

42 The RefSeq link will direct you to the NCBI nucleotide record for that gene

43 NCBI nucleotide record

44 NCBI nucleotide record continued

45 NCBI nucleotide record also contains the sequence

46 Every nucleotide within the sequence has an exact position
60 Each nucleotide has a number associated with its position

47 NCBI nucleotide contains the annotation of the sequence

48 The numbers refer to nucleotide positions
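
Feature positions in a nucleotide record are written as `start..end`, counted from 1 and including both ends, whereas Python slices count from 0 and exclude the end. The sketch below shows one way to convert between the two. The helper names and the location `4..39` are assumptions for illustration; the sequence is the short example used for the reading-frame slides.

```python
def parse_location(location):
    """Turn a simple 'start..end' location string into two integers."""
    start, end = location.split("..")
    return int(start), int(end)

def extract(sequence, location):
    """Return the bases covered by a 1-based, inclusive location."""
    start, end = parse_location(location)
    # Shift the start by one; the end needs no shift because
    # Python slices already exclude the end index.
    return sequence[start - 1:end]

sequence = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
cds = extract(sequence, "4..39")   # hypothetical CDS annotation
print(cds)
```

Real records can also contain compound locations such as joins or complements, which this sketch does not handle.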

49 Viewing features within the sequence file

50 Once you select a sequence feature, the nucleotide sequence of the feature becomes highlighted

51 CDS stands for coding sequence; selecting it also shows the translation of the nucleotide sequence into an amino acid sequence

52 The genetic code DNA RNA protein
Bioinformatics and Functional Genomics, 2nd Edition. (2014).

53 The genetic code is based on three nucleotides “coding” for one amino acid
Codons
Korf, I., Yandell, M. & Bedell, J. BLAST: an essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).
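
The triplet code can be written down as a lookup table from codon to amino acid. The sketch below builds the standard genetic code from its usual compact listing (bases in T, C, A, G order) and translates a coding sequence. The function name `translate` and the use of `*` for a stop codon are choices made for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)

print(CODON_TABLE["ATG"])   # M (methionine)
print(translate("ATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"))
```

The sequence translated here is the ATG-to-TAA stretch of the example used for the reading-frame slides.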

54 An Open Reading Frame (ORF) begins with ATG and ends with TAA, TAG or TGA
Korf, I., Yandell, M. & Bedell, J. BLAST: an essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).

55 To find the coding sequence you must identify the start and stop codons within the sequence

56 Which start codon is right?

57 Which start codon is right?
The correct ORF is usually the one that gives the longest translated sequence

58 Any sequence has 6 possible reading frames
Two strands of DNA × three frames per strand (the triplet code has three nucleotides in a codon) = 6 reading frames

59 Any sequence has 6 possible reading frames
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1
5’ C GCA TGG TCT TAC GCT GGA GCT CTC ATG GAT CGG TTT AA 3’ FRAME +2
5’ CG CAT GGT CTT ACG CTG GAG CTC TCA TGG ATC GGT TTA A 3’ FRAME +3
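
Splitting a sequence into the three forward frames only requires starting the codon count at position 1, 2 or 3. A minimal sketch using the slide's sequence; the helper name `codons_in_frame` is made up for the example, and incomplete codons at either end are simply dropped.

```python
def codons_in_frame(seq, offset):
    """Return the complete codons of seq when reading starts at offset (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
for offset in range(3):
    print(f"FRAME +{offset + 1}:", " ".join(codons_in_frame(seq, offset)))
```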

60 The next three reading frames are based on the reverse complement sequence
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement

61 Generating the reverse complement sequence
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
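
The two steps shown here (complement each base, then reverse so the result reads 5’ to 3’) can be sketched as follows. The function names are chosen for this example, and only the four unambiguous bases A, C, G and T are handled.

```python
# Translation table implementing the base pairing rules A<->T and G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Complement each base; the result reads 3'->5'."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Complement, then reverse so the result reads 5'->3'."""
    return complement(seq)[::-1]

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
print(complement(seq))
print(reverse_complement(seq))
```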

62 The 6 possible reading frames
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
5’ TTA AAC CGA TCC ATG AGA GCT CCA GCG TAA GAC CAT GCG 3’ FRAME -1
5’ T TAA ACC GAT CCA TGA GAG CTC CAG CGT AAG ACC ATG CG 3’ FRAME -2
5’ TT AAA CCG ATC CAT GAG AGC TCC AGC GTA AGA CCA TGC G 3’ FRAME -3

63 The correct reading frame will have the largest ORF
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1
M V L R W S S H G S V Ter (amino acids, starting at the ATG)
An ORF always begins with ATG: ATG (M) is the start codon.
An ORF always ends with a stop codon: TAA, TAG or TGA are the three stop codons, and they do not code for an amino acid.
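
A simplified version of this search can be written directly: scan all six frames for stretches that start with ATG and end at the first in-frame stop codon, then keep the longest. This is a sketch of the idea only, not how the ORF-finder program is implemented, and the function names are invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orfs_in_frame(seq, offset):
    """Yield every ATG...stop stretch (stop codon included) in one frame."""
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i                      # first ATG opens the ORF
        elif start is not None and codon in STOP_CODONS:
            yield seq[start:i + 3]         # first in-frame stop closes it
            start = None

def longest_orf(seq):
    """Return (length, frame label, ORF sequence) for the longest ORF, or None."""
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    candidates = []
    for sign, strand in (("+", seq), ("-", reverse_complement)):
        for offset in range(3):
            for orf in orfs_in_frame(strand, offset):
                candidates.append((len(orf), f"{sign}{offset + 1}", orf))
    return max(candidates, default=None)

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
print(longest_orf(seq))
```

For the slide's sequence the only other complete ORF is an 18-nucleotide one in frame -1, so frame +1 wins.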

64 Using the ORF-finder program to identify ORFs
Or Google “ORF-finder”

65 Using ORF-finder

66 Using ORF-finder

67 Using ORF-finder

68 Results from ORF-finder

69 There are 6 possible reading frames

70 For our purposes, the largest ORF is the correct one

71 Selecting an ORF gives you the translation

72 ORFs begin with a start codon and end with a stop codon

73 ORF-finder results match with NCBI nucleotide

74 Some sequences found in the genomic DNA are removed from the mRNA

75 Some sequences found in the genomic DNA are removed from the mRNA
Introns are the sequences that are removed.
The mature mRNA sequence contains only exonic sequence.
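
In sequence terms, splicing amounts to joining the exon intervals of the genomic sequence in order. A minimal sketch with an invented toy gene (three 6-nucleotide exons separated by two 12-nucleotide introns); the helper name `splice` and all coordinates are assumptions for illustration.

```python
def splice(genomic, exons):
    """Join exon intervals (1-based, inclusive, in order) into one sequence."""
    return "".join(genomic[start - 1:end] for start, end in exons)

# Hypothetical toy gene: exon, intron, exon, intron, exon.
genomic = "ATGGTC" + "GTAAGTTTTCAG" + "TTACGC" + "GTGAGTCCCTAG" + "TGGTAA"
exons = [(1, 6), (19, 24), (37, 42)]

mature = splice(genomic, exons)
print(len(genomic), len(mature))
print(mature)
```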

76 An mRNA sequence includes 5’UTR, ORF, 3’UTR
5’ UTR: untranslated region before the start codon; does not code for protein
Coding sequence (red)
3’ UTR: untranslated region after the stop codon; does not code for protein

77 There are 6 possible reading frames in a nucleic acid sequence

78 The correct ORF is usually the largest

79 ORFs start with ATG and end with a stop codon

80 Worksheet

