BNFO 615 Data Analysis in Bioinformatics Instructor Zhi Wei.


1 BNFO 615 Data Analysis in Bioinformatics Instructor Zhi Wei

2 Outline  Cell  Genome  Gene  mRNA  Proteins  Systems biology

3 Outline  Cell  Genome  Gene  mRNA  Proteins  Systems biology

4 Cells  Fundamental working units of every living system.  Every organism is composed of one of two radically different types of cells: prokaryotic cells or eukaryotic cells.  Prokaryotes and Eukaryotes are descended from the same primitive cell. All extant prokaryotic and eukaryotic cells are the result of a total of 3.5 billion years of evolution.

5 Prokaryotes vs. Eukaryotes  Different structures  Different components  Different biological processes

6 Prokaryotes vs. Eukaryotes
Prokaryotes                                  Eukaryotes
Single cell                                  Single or multi cell
No nucleus                                   Nucleus
No organelles                                Organelles
One piece of circular DNA                    Chromosomes
No mRNA post-transcriptional modification    Exons/introns splicing

7 Prokaryotes vs. Eukaryotes
Prokaryotes: bacteria, archaea
  E. coli cell: 5×10^6 base pairs
  > 90% of DNA encodes protein
  5400 genes
  Lacks a membrane-bound nucleus
  Circular DNA
  Histones are unknown
Eukaryotes: plants, animals, protista, and fungi
  Yeast cell: 12.4×10^6 base pairs
  A small fraction of the total DNA encodes protein; many repeats of non-coding sequences
  5800 genes
  All chromosomes are contained in a membrane-bound nucleus
  DNA is divided between 16 chromosomes
  A set of five histones: DNA packaging and gene expression regulation

8 Cells: chemical composition  Chemical composition by weight: 70% water; 7% small molecules (salts, lipids, amino acids, nucleotides); 23% macromolecules (proteins, polysaccharides, lipids)

9 We have different cells  Cells differ in size, shape and weight  Q: what is the biggest cell in the human body?

10 Cell Cycle  Born, eat, replicate, and die. Lodish et al. Molecular Cell Biology (5th ed.). W.H. Freeman & Co., 2003.

11 Sexual Reproduction vs. Cell Division  Cell division: cells reproduce by duplicating their contents and dividing in two.  Sexual reproduction: formation of a new individual by a combination of two haploid sex cells (gametes). Gametes for fertilization usually come from separate parents. Both gametes are haploid, with a single set of chromosomes. The new individual is called a zygote, with two sets of chromosomes (diploid). Meiosis is a process that converts a diploid cell into haploid gametes and changes the genetic information to increase diversity in the offspring.

12 Meiosis vs. Mitotic Cell Division

13 Outline  Cell  Genome  Gene  mRNA  Proteins  Systems biology

14 Genome  A genome is an organism’s complete set of DNA (including its genes).  However, in humans less than 3% of the genome actually codes for genes.  Part of the rest of the genome serves as control regions (though that’s also a small part).  The function of the rest of the genome is unknown (junk DNA? An open question).

15 Comparison of Different Organisms
Organism    Genome size (bp)    Num. of genes
E. coli     0.05×10^8           5,400
Yeast       0.12×10^8           5,800
Worm        0.15×10^8           [...],400
Fly         1.8×10^8            [...],600
Human       30×10^8             [...],000
Plant       1.3×10^8            [...],000

16 Outline  Cell  Genome  Gene  mRNA  Proteins  Systems biology

17 What is a gene?  Genomic DNA: promoter, protein-coding sequence, terminator  DNA: deoxyribonucleic acid

18 Example of a Gene: Gal4 DNA ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTCAAG TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTGGGAGTGTCGCTAC TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACATCTGACAGAAGTGGAATCAAGG CTAGAAAGACTGGAACAGCTATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATT TTGAAAATGGATTCTTTACAGGATATAAAAGCATTGTTAACAGGATTATTTGTACAAGAT AATGTGAATAAAGATGCCGTCACAGATAGATTGGCTTCAGTGGAGACTGATATGCCTCTA ACATTGAGACAGCATAGAATAAGTGCGACATCATCATCGGAAGAGAGTAGTAACAAAGGT CAAAGACAGTTGACTGTATCGATTGACTCGGCAGCTCATCATGATAACTCCACAATTCCG TTGGATTTTATGCCCAGGGATGCTCTTCATGGATTTGATTGGTCTGAAGAGGATGACATG TCGGATGGCTTGCCCTTCCTGAAAACGGACCCCAACAATAATGGGTTCTTTGGCGACGGT TCTCTCTTATGTATTCTTCGATCTATTGGCTTTAAACCGGAAAATTACACGAACTCTAAC GTTAACAGGCTCCCGACCATGATTACGGATAGATACACGTTGGCTTCTAGATCCACAACA TCCCGTTTACTTCAAAGTTATCTCAATAATTTTCACCCCTACTGCCCTATCGTGCACTCA CCGACGCTAATGATGTTGTATAATAACCAGATTGAAATCGCGTCGAAGGATCAATGGCAA ATCCTTTTTAACTGCATATTAGCCATTGGAGCCTGGTGTATAGAGGGGGAATCTACTGAT ATAGATGTTTTTTACTATCAAAATGCTAAATCTCATTTGACGAGCAAGGTCTTCGAGTCA A sequence of A,C,G,T

19 Example of a Gene: Gal4 AA MKLLSSIEQACDICRLKKLKCSKEKPKCAKCLKNNWECRYSPKTKRSPLTRAHLTEVESR LERLEQLFLLIFPREDLDMILKMDSLQDIKALLTGLFVQDNVNKDAVTDRLASVETDMPL TLRQHRISATSSSEESSNKGQRQLTVSIDSAAHHDNSTIPLDFMPRDALHGFDWSEEDDM SDGLPFLKTDPNNNGFFGDGSLLCILRSIGFKPENYTNSNVNRLPTMITDRYTLASRSTT SRLLQSYLNNFHPYCPIVHSPTLMMLYNNQIEIASKDQWQILFNCILAIGAWCIEGESTD IDVFYYQNAKSHLTSKVFESGSIILVTALHLLSRYTQWRQKTNTSYNFHSFSIRMAISLG LNRDLPSSFSDSSILEQRRRIWWSVYSWEIQLSLLYGRSIQLSQNTISFPSSVDDVQRTT TGPTIYHGIIETARLLQVFTKIYELDKTVTAEKSPICAKKCLMICNEIEEVSRQAPKFLQ MDISTTALTNLLKEHPWLSFTRFELKWKQLSLIIYVLRDFFTNFTQKKSQLEQDQNDHQS YEVKRCSIMLSDAAQRTVMSVSSYMDNHNVTPYFAWNCSYYLFNAVLVPIKTLLSNSKSN AENNETAQLLQQINTVLMLLKKLATFKIQTCEKYIQVLEEVCAPFLLSQCAIPLPHISYN NSNGSAIKNIVGSATIAQYPTLPEENVNNISVKYVSPGSVGPSPVPLKSGASFSDLVKLL SNRPPSRNSPVTIPRSTPSHRSVTPFLGQQQQLQSLVPLTPSALFGGANFNQSGNIADSS A sequence of 20 amino acids {A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V}

20 The Central Dogma

21 DNA → RNA: Gene Transcription (promoter)
5’ G A T T A C A ... 3’
3’ C T A A T G T ... 5’

22 Gene Transcription: transcription factor, binding site, RNA polymerase
Transcription factors recognize transcription factor binding sites and bind to them, forming a complex. RNA polymerase binds the complex.
5’ G A T T A C A ... 3’
3’ C T A A T G T ... 5’

23 Gene Transcription
The two strands are separated.
5’ G A T T A C A ... 3’
3’ C T A A T G T ... 5’

24 Gene Transcription
An RNA copy of the 5’ → 3’ sequence is created from the 3’ → 5’ template.
5’ G A T T A C A ... 3’
3’ C T A A T G T ... 5’
RNA: G A U U A C A

25 Gene Transcription
5’ G A T T A C A ... 3’
3’ C T A A T G T ... 5’
pre-mRNA: 5’ G A U U A C A ... 3’
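
The transcription step on slides 21 to 25 can be sketched in a few lines of Python. This is only an illustration of the base-pairing rule, not a model of RNA polymerase; the function name `transcribe` and the lookup table are assumptions introduced here.

```python
# Pairing rules used when RNA is built against a DNA template strand.
# (Illustrative table; A on the template pairs with U in the RNA.)
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (read 5'->3') made from a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


template = "CTAATGT"   # template strand from the slides, written 3'->5'
rna = transcribe(template)
print(rna)             # GAUUACA, the coding strand GATTACA with U for T
```

Because the template is written 3'→5', pairing it base by base already yields the RNA in 5'→3' order, so no reversal is needed.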

26 RNA Processing (Eukaryotes)  5’ cap, polyadenylation, exon, intron, splicing, UTR  [Figure: mRNA with 5’ cap, 5’ UTR, exons and introns, 3’ UTR, poly(A) tail]

27 Mammalian Gene Structure
[Figure: gene drawn 5’ → 3’ with promoter, 5’ UTR, exons, introns, 3’ UTR; coding vs. non-coding regions]
Only 1.5% of human DNA is coding!
  Regulatory regions: up to 50 kb upstream of the +1 site
  Exons: protein-coding and untranslated regions (UTR); 1 to 178 exons per gene (mean 8.8); 8 bp to 17 kb per exon (mean 145 bp)
  Introns: splice donor (GU) and acceptor (AG) sites, junk DNA; average 1 kb to 50 kb per intron
  Gene size: largest 2.4 Mb (dystrophin); mean 27 kb

28 Identifying Genes in Sequence Data  Predicting the start and end of genes, as well as the introns and exons in each gene, is one of the basic problems in computational biology.  Gene prediction methods look for open reading frames (ORFs).  These are (relatively long) DNA segments that start with the start codon, end with one of the stop codons, and do not contain any other in-frame stop codon in between.  Splice site prediction has received a lot of attention in the literature.  Comparative genomics
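
A minimal sketch of the ORF definition above, scanning only the forward strand in its three reading frames. Real gene finders also scan the reverse strand and model splicing; the function name `find_orfs`, the `min_codons` threshold, and the example sequence are all assumptions made for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start and end are 0-based; end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop (start codon included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next start
    return orfs


# Hypothetical sequence: ATG + five codons + TAG, with flanking bases.
example = "CCATGGCTTGTTTACGAATTTAGGC"
for start, end, frame in find_orfs(example):
    print(frame, example[start:end])   # 2 ATGGCTTGTTTACGAATTTAG
```

In practice `min_codons` would be set much higher (ORFs are "relatively long"), since short start-to-stop stretches occur by chance.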

29 Outline  Cell  Genome  Gene  mRNA  Proteins  Systems biology

30 RNA  RNA is chemically similar to DNA. It is usually only a single strand. T(hymine) is replaced by U(racil).  Some forms of RNA can form secondary structures by “pairing up” with themselves. This can change their properties dramatically. DNA and RNA can pair with each other. Linear and 3D view:

31 RNA, continued  Several types exist, classified by function.  mRNA – this is what is usually being referred to when a bioinformatician says “RNA”. It carries a gene’s message out of the nucleus.  tRNA – transfers genetic information from mRNA to an amino acid sequence.  rRNA – ribosomal RNA. Part of the ribosome, which is involved in translation.

32 Messenger RNA  Basically, an intermediate product  Transcribed from the genome and translated into protein  Number of copies correlates well with number of proteins for the gene.  Unlike DNA, the amount of messenger RNA (as well as the number of proteins) differs between different cell types and under different conditions.

33 Complementary base-pairing  mRNA is transcribed from the DNA  mRNA (like DNA, but unlike proteins) binds to its complement
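
Complementary binding can be illustrated by computing the strand that would pair with a given mRNA. This is a sketch only: the helper names `reverse_complement_rna` and `can_pair` are made up here, and real hybridization also depends on length, temperature, and mismatches.

```python
# Watson-Crick pairs for RNA: A-U and G-C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna: str) -> str:
    """Return the sequence (5'->3') that pairs perfectly with `rna`."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def can_pair(strand_a: str, strand_b: str) -> bool:
    """True if the two strands are exact reverse complements."""
    return reverse_complement_rna(strand_a) == strand_b.upper()


mrna = "GAUUACA"
probe = reverse_complement_rna(mrna)
print(probe)                   # UGUAAUC
print(can_pair(mrna, probe))   # True
```

The reversal is needed because paired strands run antiparallel: the 5' end of one strand sits opposite the 3' end of the other.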

34 Quantify mRNA levels

35 Outline  Cell  Genome  Gene  mRNA  Proteins  Systems biology

36 Proteins: Workhorses of the Cell  Proteins are polypeptide chains of amino acids.  There are 20 different amino acids; their different chemical properties cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.  Proteins do all essential work for the cell: build cellular structures, digest nutrients, execute metabolic functions, and mediate information flow within a cell and among cellular communities.

37 Genes Make Proteins  genome -> genes -> proteins (form cellular structures and carry out life functions) -> pathways & physiology

38 Genes Encode for Proteins
Codon table (first letter on the left, second letter across the top, third letter on the right):
   U              C              A              G
U  UUU Phe        UCU Ser        UAU Tyr        UGU Cys        U
   UUC Phe        UCC Ser        UAC Tyr        UGC Cys        C
   UUA Leu        UCA Ser        UAA STOP       UGA STOP       A
   UUG Leu        UCG Ser        UAG STOP       UGG Trp        G
C  CUU Leu        CCU Pro        CAU His        CGU Arg        U
   CUC Leu        CCC Pro        CAC His        CGC Arg        C
   CUA Leu        CCA Pro        CAA Gln        CGA Arg        A
   CUG Leu        CCG Pro        CAG Gln        CGG Arg        G
A  AUU Ile        ACU Thr        AAU Asn        AGU Ser        U
   AUC Ile        ACC Thr        AAC Asn        AGC Ser        C
   AUA Ile        ACA Thr        AAA Lys        AGA Arg        A
   AUG Met/START  ACG Thr        AAG Lys        AGG Arg        G
G  GUU Val        GCU Ala        GAU Asp        GGU Gly        U
   GUC Val        GCC Ala        GAC Asp        GGC Gly        C
   GUA Val        GCA Ala        GAA Glu        GGA Gly        A
   GUG Val        GCG Ala        GAG Glu        GGG Gly        G
Phe = phenylalanine, Ser = serine, Tyr = tyrosine, Cys = cysteine, Leu = leucine, Trp = tryptophan, Pro = proline, His = histidine, Arg = arginine, Gln = glutamine, Ile = isoleucine, Thr = threonine, Asn = asparagine, Lys = lysine, Met = methionine, Val = valine, Ala = alanine, Asp = aspartic acid, Glu = glutamic acid, Gly = glycine
Triplet → one amino acid: 4^3 = 64 combinations mapped to 20 amino acids
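
The codon table can be written as a lookup and used to translate an mRNA. The sketch below builds the 64 entries of the standard genetic code from a compact string of one-letter amino acid codes (`*` marks STOP); the names `CODON_TABLE` and `translate` are chosen here for illustration.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG
# (third letter varies fastest); "*" marks a STOP codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first letter U
    "LLLLPPPPHHQQRRRR"   # first letter C
    "IIIMTTTTNNKKSSRR"   # first letter A
    "VVVVAAAADDEEGGGG"   # first letter G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from position 0, stopping at the first STOP codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))                 # 64
print(translate("GCUUGUUUACGAAUUAG"))   # ACLRI = Ala Cys Leu Arg Ile
```

The trailing two bases (`AG`) do not form a complete codon and are ignored, which matches the reading of the example sequence on the following slides.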

39 Open Reading Frames G C U U G U U U A C G A A U U A G

40 Synonymous Mutation
Original:        GCU UGU UUA CGA AUU AG  →  Ala Cys Leu Arg Ile
Mutant (A → G):  GCU UGU UUG CGA AUU AG  →  Ala Cys Leu Arg Ile

41 Missense Mutation
Original:        GCU UGU UUA CGA AUU AG  →  Ala Cys Leu Arg Ile
Mutant (U → G):  GCU UGG UUA CGA AUU AG  →  Ala Trp Leu Arg Ile

42 Nonsense Mutation
Original:        GCU UGU UUA CGA AUU AG  →  Ala Cys Leu Arg Ile
Mutant (U → A):  GCU UGA UUA CGA AUU AG  →  Ala STOP

43 Frameshift
Original:              GCU UGU UUA CGA AUU AG  →  Ala Cys Leu Arg Ile
Mutant (one U deleted): GCU UGU UAC GAA UUA G  →  Ala Cys Tyr Glu Leu
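
The four mutation types on slides 40 to 43 can be told apart by comparing the translated products. The following is a rough sketch under simplifying assumptions: translation starts at position 0, the mutant differs by a single change, and the names `translate` and `classify` are invented for this example.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
CODON_TABLE = dict(zip(
    ("".join(c) for c in product("UCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def translate(mrna: str) -> str:
    """Translate from position 0 until a STOP codon or the end."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein += aa
    return protein


def classify(original: str, mutant: str) -> str:
    """Label a mutant relative to the original (simplified rules)."""
    if len(original) != len(mutant):
        shift = abs(len(original) - len(mutant)) % 3
        return "frameshift" if shift else "in-frame indel"
    before, after = translate(original), translate(mutant)
    if before == after:
        return "synonymous"
    if len(after) < len(before):
        return "nonsense"      # premature STOP truncates the protein
    if len(after) > len(before):
        return "stop-loss"     # a STOP codon was mutated away
    return "missense"


original = "GCUUGUUUACGAAUUAG"
print(classify(original, "GCUUGUUUGCGAAUUAG"))   # synonymous
print(classify(original, "GCUUGGUUACGAAUUAG"))   # missense
print(classify(original, "GCUUGAUUACGAAUUAG"))   # nonsense
print(classify(original, "GCUUGUUACGAAUUAG"))    # frameshift
```

The mutant sequences are the ones shown on the slides; only the classification logic is new.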

44 Protein Structure  Proteins work together with other proteins or nucleic acids as "molecular machines": structures fit together and function in highly specific, lock-and-key ways.  Four levels of structure: Primary structure: the sequence of the protein. Secondary structure: local structure in regions of the chain (alpha helix, beta sheet). Tertiary structure: three-dimensional structure. Quaternary structure: multiple subunits.

45 Assigning Function to Proteins  While genes have been identified in the human genome, relatively few have known functional annotation.  Determining the function of a protein can be done in several ways: sequence similarity to other (known) proteins; using domain information; using three-dimensional structure; based on high-throughput experiments (when it functions and what it interacts with).
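
As a toy illustration of the first approach (sequence similarity), percent identity between two already-aligned, equal-length protein fragments can be computed directly. This is only a sketch: real annotation pipelines use alignment tools such as BLAST, which handle gaps and substitution scoring. The function name and the variant sequence are hypothetical; the first fragment is the start of the Gal4 protein shown earlier.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues (no gaps allowed)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


gal4_start = "MKLLSSIEQA"     # first 10 residues of Gal4 (from the slides)
hypothetical = "MKLLSTIEQA"   # made-up variant with one substitution
print(percent_identity(gal4_start, hypothetical))   # 90.0
```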

46 Summary: DNA (gene) → RNA → protein  Replication (DNA → DNA), transcription (DNA → RNA), translation (RNA → protein)

47 Outline  Cell  Genome  Gene  mRNA  Proteins  Systems biology

48 Biological pathway/gene networks  Instead of having brains, cells make decisions through complex networks of interactions, called pathways: synthesize new materials; break other materials down for spare parts; signal to eat or die.  In order to fulfill their function, proteins interact with other proteins in a number of ways, including: regulation; signaling; pathways, for example A -> B -> C; post-translational modifications; forming protein complexes.
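
A pathway such as A -> B -> C can be represented as a directed graph, which makes questions like "what is downstream of A?" easy to answer. The sketch below uses a plain dictionary and breadth-first search; the names `PATHWAY` and `downstream` are illustrative assumptions, and real networks would carry many more nodes and edge types.

```python
from collections import deque

# Each key activates (points to) the proteins in its list.
PATHWAY = {"A": ["B"], "B": ["C"], "C": []}


def downstream(network: dict, source: str) -> list:
    """Return every node reachable from `source`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in network.get(node, []):
            if target not in seen and target != source:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(downstream(PATHWAY, "A"))   # ['B', 'C']
print(downstream(PATHWAY, "C"))   # []
```

Tracking visited nodes keeps the search from looping if the network contains feedback, which is common in regulatory pathways.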

49 An Example

50 Systems Biology  We now have many sources of data, each providing a different view on the activity in the cell Sequence (genes) DNA motifs Gene expression Protein interactions Protein-DNA interaction Etc.  Putting it all together: Systems Biology

51 Next week  Introduction to R programming You need to do in-class exercises

52 Acknowledgments  Ziv Bar-Joseph: for some of the slides adapted or modified from his lecture slides at Carnegie Mellon University  Neil Jones: for some of the slides adapted or modified from his slides for the book An Introduction to Bioinformatics Algorithms

