DNA/RNA Protein Expression Interaction

Presentation transcript:

Molecular Data
- DNA/RNA
- Protein
- Expression
- Interaction

A sequence
A sequence is a linear set of characters (sequence elements) representing nucleotides or amino acids.
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/

Character representation of sequences
- DNA or RNA: use 1-letter codes (e.g., A, C, G, T)
- Protein: use 1-letter codes; these can be converted to and from 3-letter codes
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/

The I.U.B. Code
Proposed by the International Union of Biochemistry.
A, C, G, T, U
R = A, G (puRine)
Y = C, T (pYrimidine)
S = G, C (Strong hydrogen bonds)
W = A, T (Weak hydrogen bonds)
M = A, C (aMino group)
K = G, T (Keto group)
B = C, G, T (not A)
D = A, G, T (not C)
H = A, C, T (not G)
V = A, C, G (not T/U)
N = A, C, G, T/U (iNdeterminate)
X or - are sometimes used

DNA code
Amino Acid      Abbreviation  DNA Codons
Alanine         Ala           GCA, GCC, GCG, GCT
Cysteine        Cys           TGC, TGT
Aspartic Acid   Asp           GAC, GAT
Glutamic Acid   Glu           GAA, GAG
Phenylalanine   Phe           TTC, TTT
Glycine         Gly           GGA, GGC, GGG, GGT
Histidine       His           CAC, CAT
Isoleucine      Ile           ATA, ATC, ATT
Lysine          Lys           AAA, AAG
Leucine         Leu           TTA, TTG, CTA, CTC, CTG, CTT
Methionine      Met           ATG
Asparagine      Asn           AAC, AAT
Proline         Pro           CCA, CCC, CCG, CCT
Glutamine       Gln           CAA, CAG
Arginine        Arg           CGA, CGC, CGG, CGT, AGA, AGG
Serine          Ser           TCA, TCC, TCG, TCT, AGC, AGT
Threonine       Thr           ACA, ACC, ACG, ACT
Valine          Val           GTA, GTC, GTG, GTT
Tryptophan      Trp           TGG
Tyrosine        Tyr           TAC, TAT
Stop            .             TAA, TAG, TGA
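A codon table like this one maps naturally onto a dictionary. The following is a minimal sketch, assuming the standard genetic code and 1-letter amino acid codes with `*` for stop; the names `CODON_TABLE` and `translate` are illustrative and do not come from the slides.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every DNA codon (e.g. "ATG") to its 1-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # Met, Ala, then a stop codon
```

The sketch reads the sequence in the first reading frame only and raises a `KeyError` on ambiguity codes such as N.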

Fasta format >gi|17978494|ref|NM_078467.1| Homo sapiens cyclin-dependent kinase inhibitor AGCTGAGGTGTGAGCAGCTGCCGAAGTCAGTTCCTTGTGGAGCCGGAGCTGGGCGCGGATTCGCCGAGGC ACCGAGGCACTCAGAGGAGGTGAGAGAGCGGCGGCAGACAACAGGGGACCCCGGGCCGGCGGCCCAGAGC CGAGCCAAGCGTGCCCGCGTGTGTCCCTGCGTGTCCGCGAGGATGCGTGTTCGCGGGTGTGTGCTGCGTT CACAGGTGTTTCTGCGGCAGGCGCCATGTCAGAACCGGCTGGGGATGTCCGTCAGAACCCATGCGGCAGC AAGGCCTGCCGCCGCCTCTTCGGCCCAGTGGACAGCGAGCAGCTGAGCCGCGACTGTGATGCGCTAATGG CGGGCTGCATCCAGGAGGCCCGTGAGCGATGGAACTTCGACTTTGTCACCGAGACACCACTGGAGGGTGA CTTCGCCTGGGAGCGTGTGCGGGGCCTTGGCCTGCCCAAGCTCTACCTTCCCACGGGGCCCCGGCGAGGC CGGGATGAGTTGGGAGGAGGCAGGCGGCCTGGCACCTCACCTGCTCTGCTGCAGGGGACAGCAGAGGAAG ACCATGTGGACCTGTCACTGTCTTGTACCCTTGTGCCTCGCTCAGGGGAGCAGGCTGAAGGGTCCCCAGG TGGACCTGGAGACTCTCAGGGTCGAAAACGGCGGCAGACCAGCATGACAGATTTCTACCACTCCAAACGC CGGCTGATCTTCTCCAAGAGGAAGCCCTAATCCGCCCACAGGAAGCCTGCAGTCCTGGAAGCGCGAGGGC CTCAAAGGCCCGCTCTACATCTTCTGCCTTAGTCTCAGTTTGTGTGTCTTAATTATTATTTGTGTTTTAA TTTAAACACCTCCTCATGTACATACCCTGGCCGCCCCCTGCCCCCCAGCCTCTGGCATTAGAATTATTTA AACAAAAACTAGGCGGTTGAATGAGAGGTTCCTAAGAGTGCTGGGCATTTTTATTTTATGAAATACTATT TAAAGCCTCCTCATCCCGTGTTCTCCTTTTCCTCTCTCCCGGAGGTTGGGTGGGCCGGCTTCATGCCAGC TACTTCCTCCTCCCCACTTGTCCGCTGGGTGGTACCCTCTGGAGGGGTGTGGCTCCTTCCCATCGCTGTC ACAGGCGGTTATGAAATTCACCCCCTTTCCTGGACACTCAGACCTGAATTCTTTTTCATTTGAGAAGTAA ACAGATGGCACTTTGAAGGGGCCTCACCGAGTGGGGGCATCATCAAAAACTTTGGAGTCCCCTCACCTCC TCTAAGGTTGGGCAGGGTGACCCTGAAGTGAGCACAGCCTAGGGCTGAGCTGGGGACCTGGTACCCTCCT GGCTCTTGATACCCCCCTCTGTCTTGTGAAGGCAGGGGGAAGGTGGGGTCCTGGAGCAGACCACCCCGCC TGCCCTCATGGCCCCTCTGACCTGCACTGGGGAGCCCGTCTCAGTGTTGAGCCTTTTCCCTCTTTGGCTC CCCTGTACCTTTTGAGGAGCCCCAGCTACCCTTCTTCTCCAGCTGGGCTCTGCAATTCCCCTCTGCTGCT GTCCCTCCCCCTTGTCCTTTCCCTTCAGTACCCTCTCAGCTCCAGGTGGCTCTGAGGTGCCTGTCCCACC CCCACCCCCAGCTCAATGGACTGGAAGGGGAAGGGACACACAAGAAGAAGGGCACCCTAGTTCTACCTCA GGCAGCTCAAGCAGCGACCGCCCCCTCCTCTAGCTGTGGGGGTGAGGGTCCCATGTGGTGGCACAGGCCC CCTTGAGTGGGGTTATCTCTGTGTTAGGGGTATATGATGGGGGAGTAGATCTTTCTAGGAGGGAGACACT 
GGCCCCTCAAATCGTCCAGCGACCTTCCTCATCCACCCCATCCCTCCCCAGTTCATTGCACTTTGATTAG CAGCGGAACAAGGAGTCAGACATTTTAAGATGGTGGCAGTAGAGGCTATGGACAGGGCATGCCACGTGGG CTCATATGGGGCTGGGAGTAGTTGTCTTTCCTGGCACTAACGTTGAGCCCCTGGAGGCACTGAAGTGCTT AGTGTACTTGGAGTATTGGGGTCTGACCCCAAACACCTTCCAGCTCCTGTAACATACTGGCCTGGACTGT TTTCTCTCGGCTCCCCATGTGTCCTGGTTCCCGTTTCTCCACCTAGACTGTAAACCTCTCGAGGGCAGGG ACCACACCCTGTACTGTTCTGTGTCTTTCACAGCTCCTCCCACAATGCTGAATATACAGCAGGTGCTCAA TAAATGATTCTTAGTGACTTTAAAAAAAAAAAAAAAAAAAA
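In a FASTA file the header is a single line starting with `>`, and the sequence follows on one or more lines. A minimal reader might look like the sketch below; the function name `parse_fasta` and the two short records are hypothetical, made up purely for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input, not taken from the slide.
example = """>seq1 hypothetical example record
AGCTGAGGTG
TGAGCAGC
>seq2
ACGT
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```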

Sequence Content
- Mononucleotide frequencies
- Dinucleotide frequencies
- GC content
- CpG islands

GC content is non-random Lander et al

GC content and expression

Determining mononucleotide frequencies
Alphabet: A T C G
1. Count how many times each nucleotide appears in the sequence
2. Divide (normalize) by the total number of nucleotides
fA = mononucleotide frequency of A (the frequency with which A is observed)
pA = mononucleotide probability that a nucleotide will be an A
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/
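The count-and-normalize procedure can be sketched in a few lines. The function names below are illustrative, and the sketch assumes the sequence contains at least one of the four nucleotides (otherwise the division fails).

```python
from collections import Counter


def mononucleotide_frequencies(seq, alphabet="ATCG"):
    """Count each nucleotide and divide by the total number of nucleotides."""
    counts = Counter(seq.upper())
    # Only characters from the alphabet contribute to the total.
    total = sum(counts[base] for base in alphabet)
    return {base: counts[base] / total for base in alphabet}


def gc_content(seq):
    """GC content is the sum of the G and C mononucleotide frequencies."""
    freqs = mononucleotide_frequencies(seq)
    return freqs["G"] + freqs["C"]


freqs = mononucleotide_frequencies("ATTCGACCAGAG")
print(freqs)
print(gc_content("ATTCGACCAGAG"))
```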

Determining dinucleotide frequencies
1. Make a 4 x 4 matrix, one element for each ordered pair of nucleotides
2. Set all elements to zero
3. Go through the sequence linearly, adding one to the matrix entry corresponding to the pair of sequence elements observed at each position
4. Divide by the total number of dinucleotides
fAC = dinucleotide frequency of AC (the frequency with which AC is observed out of all dinucleotides)
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/
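As a rough sketch of these steps, the 4 x 4 matrix can be held as a nested dictionary and filled with a sliding window of size 2. The function names are assumptions made for this example; pairs containing characters outside the alphabet are simply skipped.

```python
def dinucleotide_counts(seq, alphabet="ATCG"):
    """Fill a 4 x 4 matrix (nested dict) with counts of each ordered pair."""
    counts = {a: {b: 0 for b in alphabet} for a in alphabet}
    seq = seq.upper()
    # Slide a window of size 2 along the sequence.
    for first, second in zip(seq, seq[1:]):
        if first in counts and second in counts:
            counts[first][second] += 1
    return counts


def dinucleotide_frequencies(seq, alphabet="ATCG"):
    """Normalize the counts by the total number of dinucleotides observed."""
    counts = dinucleotide_counts(seq, alphabet)
    total = sum(sum(row.values()) for row in counts.values())
    return {a + b: counts[a][b] / total for a in alphabet for b in alphabet}


counts = dinucleotide_counts("ATTCGACCAGAG")
for first in "ATCG":
    print(first, [counts[first][second] for second in "ATCG"])
```

A sequence of length n contributes n - 1 overlapping dinucleotides, so the 12-nucleotide sequence ATTCGACCAGAG yields 11 pairs.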

Dinucleotide counts
Example sequence: ATTCGACCAGAG
- Create a 4 x 4 matrix with rows and columns labeled A, T, C, G
- Set all cells to zero
- Use a window of size 2 and add 1 to the corresponding cell of the matrix each time a dinucleotide is encountered
[Figure: the 4 x 4 dinucleotide count matrix for ATTCGACCAGAG, partially filled in]

Observed and expected frequencies http://www.maths.lth.se/bioinformatics/publications/BasicE_2005.pdf
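The figures for this slide are not reproduced here. One common convention, assumed for this sketch, takes the expected frequency of a dinucleotide XY to be the product of the mononucleotide frequencies of X and Y, and reports the ratio of observed to expected. The function name is hypothetical.

```python
from collections import Counter


def observed_expected_ratio(seq, dinucleotide):
    """Observed frequency of XY divided by fX * fY (independence assumption)."""
    seq = seq.upper()
    x, y = dinucleotide
    mono = Counter(seq)
    pairs = Counter(a + b for a, b in zip(seq, seq[1:]))
    observed = pairs[x + y] / (len(seq) - 1)
    expected = (mono[x] / len(seq)) * (mono[y] / len(seq))
    # Raises ZeroDivisionError if X or Y never occurs in the sequence.
    return observed / expected


print(observed_expected_ratio("ATTCGACCAGAG", "CG"))
```

A ratio well below 1 means the dinucleotide is rarer than its composition would suggest; a ratio above 1 means it is over-represented.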

Dinucleotide frequencies in genome http://www.lapcs.univ-lyon1.fr/~piau/mps/Poster-CpG.pdf

Sequence features
A sequence feature is a pattern that is observed to occur in more than one sequence and (usually) to be correlated with some function.
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/

Sequence features
- promoters
- transcription initiation sites
- transcription termination sites
- polyadenylation sites
- ribosome binding sites
- protein features
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/

Consensus sequences
A consensus sequence is a sequence that summarizes or approximates the pattern observed in a group of aligned sequences containing a sequence feature.
Consensus sequences are regular expressions.
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/
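To illustrate the link to regular expressions, a consensus written in the I.U.B. code can be turned into a character-class pattern for Python's `re` module. This is only a sketch: the `IUPAC` table and `consensus_to_regex` are names chosen for this example, and the table covers DNA letters only.

```python
import re

# Each I.U.B. code letter and the DNA bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate a consensus sequence into an equivalent regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        # Ambiguous positions become character classes such as [AC].
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


pattern = re.compile(consensus_to_regex("GTMKAC"))
print(pattern.pattern)
print(pattern.search("AAGTATACGG").start())
```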

Occurrences
Example: recognition sites for restriction enzymes
- EcoRI recognizes GAATTC
- AccI recognizes GTMKAC
Basic algorithm:
1. Start with the first character of the sequence to be searched
2. See if the enzyme site matches starting at that position
3. Advance to the next character of the sequence to be searched
4. Repeat the previous two steps until all positions have been tested
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/
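The basic algorithm can be sketched as a position-by-position scan. The test sequence below is made up for illustration, and `find_sites` is a hypothetical name; ambiguity codes in the site are expanded with the I.U.B. table.

```python
# I.U.B. code letters and the DNA bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_sites(sequence, site):
    """Return every 0-based start position where the site matches."""
    sequence = sequence.upper()
    positions = []
    # Test each possible start position in turn.
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        if all(base in IUPAC[code] for base, code in zip(window, site)):
            positions.append(start)
    return positions


dna = "TTGAATTCAAGTCGACGAATTC"  # made-up test sequence
print(find_sites(dna, "GAATTC"))  # EcoRI
print(find_sites(dna, "GTMKAC"))  # AccI
```

This scan takes time proportional to the sequence length times the site length, which is fine for short recognition sites.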

Statistics of pattern appearance
Goal: determine the significance of observing a feature (pattern).
Method: estimate the probability that the pattern would occur randomly in a given sequence.
Three different methods:
- Assume all nucleotides are equally frequent
- Use measured frequencies of each nucleotide (mononucleotide frequencies)
- Use measured frequencies with which a given nucleotide follows another (dinucleotide frequencies)
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/

Example 1
What is the probability of observing the sequence feature ART (A followed by a purine, either A or G, followed by a T)?
Using observed mononucleotide frequencies:
pART = pA (pA + pG) pT
Using equal mononucleotide frequencies, pA = pC = pG = pT = 1/4:
pART = 1/4 * (1/4 + 1/4) * 1/4 = 1/32
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/
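The calculation generalizes to any pattern written in the I.U.B. code: multiply, position by position, the summed frequencies of the allowed nucleotides. A small sketch using exact fractions follows; `pattern_probability` is an illustrative name, and the "observed" frequencies are those of the short sequence ATTCGACCAGAG, used here only as sample input.

```python
from fractions import Fraction

# I.U.B. code letters and the DNA bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pattern_probability(pattern, freqs):
    """Probability of a pattern, treating positions as independent."""
    prob = Fraction(1)
    for code in pattern:
        # An ambiguous position contributes the sum over its allowed bases.
        prob *= sum(freqs[base] for base in IUPAC[code])
    return prob


equal = {base: Fraction(1, 4) for base in "ACGT"}
print(pattern_probability("ART", equal))  # 1/32

# Mononucleotide frequencies of ATTCGACCAGAG (A=4, T=2, C=3, G=3 of 12).
observed = {"A": Fraction(4, 12), "T": Fraction(2, 12),
            "C": Fraction(3, 12), "G": Fraction(3, 12)}
print(pattern_probability("ART", observed))  # 7/216
```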

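Example 1: using dinucleotide frequencies
pART = pA (p*AA p*AT + p*AG p*GT)
where p*XY is the conditional probability that nucleotide Y follows nucleotide X, estimated from the dinucleotide frequencies.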
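As a sketch of this calculation, the conditional probabilities can be estimated from a training sequence as the count of the pair XY divided by the number of times X is followed by any nucleotide. That estimator, the function names, and the short sequence below are all assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction


def conditional_probabilities(seq):
    """Estimate p*XY = count(XY) / count(X followed by anything)."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])  # nucleotides that have a successor
    return {x + y: Fraction(n, firsts[x]) for (x, y), n in pairs.items()}


def p_art(seq):
    """pART = pA (p*AA p*AT + p*AG p*GT), estimated from seq."""
    p_a = Fraction(seq.count("A"), len(seq))
    cond = conditional_probabilities(seq)

    def p(pair):
        # Pairs never observed get probability zero.
        return cond.get(pair, Fraction(0))

    return p_a * (p("AA") * p("AT") + p("AG") * p("GT"))


print(p_art("AATAGTAGTC"))  # 9/40 for this made-up sequence
```

The two terms in parentheses correspond to the two sequences that satisfy the pattern, AAT and AGT.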

Example 2
What is the probability of observing the sequence feature ARYT (A followed by a purine {either A or G}, followed by a pyrimidine {either C or T}, followed by a T)?
Using equal mononucleotide frequencies, pA = pC = pG = pT = 1/4:
pARYT = 1/4 * (1/4 + 1/4) * (1/4 + 1/4) * 1/4 = 1/64
http://www.cmu.edu/bio/education/courses/03310/LectureNotes/