Bio-Medical Informatics


Bio-Medical Informatics. Instructor: Hanif Yaghoobi. Website: site444703.44.webydo.com. E-mail: Hyiautcourse@gmail.com. Personal e-mail: hanifeyaghoobi@gmail.com

About this Course. Activities during the semester (5 points): 1) Homework, 2) MATLAB exercises. Final project: 3 points. Final exam: 12 points.

Shortliffe: “Medical informatics is the rapidly developing scientific field that deals with resources, devices and formalized methods for optimizing the storage, retrieval and management of biomedical information for problem solving and decision making.” Edward Shortliffe, MD, PhD, 1995

Organisms are classified into two types. Eukaryotes: contain a membrane-bound nucleus and organelles (plants, animals, fungi, …). Prokaryotes: lack a true membrane-bound nucleus and organelles (single-celled, includes bacteria). Not all single-celled organisms are prokaryotes!

Cells. A cell is a complex system enclosed in a membrane. Organisms are unicellular (bacteria, baker’s yeast) or multicellular. Humans: 60 trillion cells, 320 cell types. Example: animal cell. Image source: www.ebi.ac.uk/microarray/biology_intro.htm

DNA Basics – cont. DNA in Eukaryotes is organized in chromosomes.

Chromosomes. In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes. Humans: 22 pairs of autosomes and 1 pair of sex chromosomes. Human karyotype image: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html

www.biotec.or.th/Genome/whatGenome.html

What is DNA? DNA: Deoxyribonucleic Acid. Each strand is a polynucleotide (an oligomer when short): a chain of nucleotides. In the cell, two complementary strands pair to form the double helix. 4 different bases: Adenine (A), Cytosine (C), Guanine (G), Thymine (T).

Nucleotide Bases. Purines (A and G) and pyrimidines (C and T). The difference is in the base structure. Image source: www.ebi.ac.uk/microarray/biology_intro.htm

DNA

The Central Dogma: Protein Synthesis. Genome -> Transcription -> Transcriptome -> Translation -> Proteome -> Cell Function. Gene Expression Level.

Genome: the chromosomal DNA of an organism. The number of chromosomes and the genome size vary quite significantly from one organism to another. Genome size and number of genes do not necessarily determine organism complexity.

Genome Comparison
ORGANISM                               CHROMOSOMES   GENOME SIZE (bp)   GENES
Homo sapiens (Humans)                  23            3,200,000,000      ~30,000
Mus musculus (Mouse)                   20            2,600,000,000      ~30,000
Drosophila melanogaster (Fruit Fly)    4             180,000,000        ~18,000
Saccharomyces cerevisiae (Yeast)       16            14,000,000         ~6,000
Zea mays (Corn)                        10            2,400,000,000      ???

DNA Basics – cont. The DNA in each chromosome can be read as a discrete signal over the alphabet {a,t,c,g} (for example: atgatcccaaatggaca…).

DNA Basics – cont. In genes (protein-coding regions), during the construction of proteins from amino acids, the nucleotides (letters) are read as triplets (codons). Every codon signals one amino acid for protein synthesis (there are 20 amino acids), except the stop codons, which end translation.

DNA Basics – cont. There are 6 ways of splitting a DNA sequence into codons, called the reading frames (3 offsets × 2 strand directions). Example: …CATTGCCAGT…
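The six reading frames can be enumerated directly. The sketch below is illustrative only: the function names are hypothetical (not from the course material), and it assumes an upper-case sequence containing only A, C, G and T.

```python
# Hypothetical helper names; a minimal sketch, not the course's own code.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def codons(seq, offset):
    """Split seq into complete triplets, starting at offset 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_reading_frames(seq):
    """Map (strand, offset) to the list of codons read in that frame."""
    strands = {"+": seq.upper(), "-": reverse_complement(seq)}
    return {
        (strand, offset): codons(strand_seq, offset)
        for strand, strand_seq in strands.items()
        for offset in range(3)
    }


frames = six_reading_frames("CATTGCCAGT")
for frame, triplets in frames.items():
    print(frame, triplets)
```

For the slide's fragment CATTGCCAGT, the forward frame at offset 0 reads CAT, TGC, CAG, while the reverse strand is ACTGGCAATG and is split the same way.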

DNA Basics – cont. Example: …CATTGCCAGT… Start codon: ATG. Stop codons: TAA, TGA, TAG. (Figure: a gene made of exons separated by an intron.)

Understanding Genome Sequences ~3,289,000,000 characters: aattgtgctctgcaaattatgatagtgatctgtatttactacgtgcatat attttgggccagtgaatttttttctaagctaatatagttatttggacttt tgacatgactttgtgtttaattaaaacaaaaaaagaaattgcagaagtgt tgtaagcttgtaaaaaaattcaaacaatgcagacaaatgtgtctcgcagt cttccactcagtatcatttttgtttgtaccttatcagaaatgtttctatg tacaagtctttaaaatcatttcgaacttgctttgtccactgagtatatta tggacatcttttcatggcaggacatatagatgtgttaatggcattaaaaa taaaacaaaaaactgattcggccgggtacggtggctcacgcctgtaatcc cagcactttgggagatcgaggagggaggatcacctgaggtcaggagttac agacatggagaaaccccgtctctactaaaaatacaaaattagcctggcgt ggtggcgcatgcctgtaatcccagctactcgggaggctgaggcaggagaa tcgcttgaacccgggagcggaggttgcggtgagccgagatcgcaccgttg cactccagcctgggcgacagagcgaaactgtctcaaacaaacaaacaaaa aaacctgatacatggtatgggaagtacattgtttaaacaatgcatggaga tttaggttgtttccagtttttactggcacagatacggcaatgaatataat tttatgtatacattcatacaaatatatcggtggaaaattcctagaagtgg aatggctgggtcagtgggcattcatattgagaaattggaaggatgttgtc aaactctgcaaatcagagtattttagtcttaacctctcttcttcacaccc ttttccttggaagaaagctaaatttagacttttaaacacaaaactccatt ttgagacccctgaaaatctgggttcaaagtgtttgaaaattaaagcagag gctttaatttgtacttatttaggtataatttgtactttaaagttgttcca . . . Goal: Identify components encoded in the DNA sequence

Open Reading Frame. ATGCTCAGCGTGACCTCA . . . CAGCGTTAA translates to M L S V T S . . . Q R STP. A protein-encoding DNA sequence consists of a sequence of 3-letter codons. It starts with the START codon (ATG) and ends with a STOP codon (TAA, TAG, or TGA).
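The codon-by-codon translation on this slide can be reproduced with a lookup table. This is a minimal sketch assuming the standard genetic code; the compact table construction and the name `translate` are illustrative choices, not part of the slides.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf):
    """Translate a DNA sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGCTCAGCGTGACCTCA"))  # start of the slide's ORF
print(translate("CAGCGTTAA"))           # end of the slide's ORF, including the stop
```

The first call yields the slide's M L S V T S, and the second yields Q R before the TAA stop codon ends translation.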

Finding Open Reading Frames. ATGCTCAGCGTGACCTCA . . . CAGCGTTAA translates to M L S V T S . . . Q R STP. Try all possible starting points: 3 possible offsets, 2 possible strands. A simple algorithm finds all ORFs in a genome. Many of these are spurious (not real genes). How do we focus on the real ones?
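One possible version of the "simple algorithm" is sketched below: scan each of the 3 offsets on each of the 2 strands, open an ORF at the first ATG and close it at the next in-frame stop codon. The function names and the `min_codons` parameter are assumptions made for illustration; real gene finders add many more filters.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, start, orf_sequence) for every ATG...stop run.

    start is the 0-based position on the strand being scanned.
    min_codons is a hypothetical length filter (stop codon included).
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, strand_seq[start:i + 3]))
                    start = None  # close it and keep scanning this frame
    return orfs


# The slide's ORF with two flanking bases added on each side (made-up flanks).
example = "CC" + "ATGCTCAGCGTGACCTCACAGCGTTAA" + "GG"
print(find_orfs(example))
```

Because every ATG followed by an in-frame stop is reported, this kind of scan produces many spurious ORFs on a real genome, which is the problem the following slides address.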

Using Additional Genomes. Basic premise: “What is important is conserved”. Evolution = Variation + Selection; variation is random, selection reflects function. Idea: instead of studying a single genome, compare related genomes. A real open reading frame will be conserved.

Phylogenetic Tree of Yeasts: S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, C. glabrata, S. castellii, K. lactis, A. gossypii, K. waltii, D. hansenii, C. albicans, Y. lipolytica, N. crassa, M. graminearum, M. grisea, A. nidulans, S. pombe. Scale: ~10M years. Kellis et al, Nature 2003

Evolution of Open Reading Frame
S. cerevisiae   ATGCTCAGCGTGACCTCA . . .
S. paradoxus    ATGCTCAGCGTGACATCA . . .
S. mikatae      ATGCTCAGGGTGACA--A . . .
S. bayanus      ATGCTCAGG---ACA--A . . .
Annotations: conserved positions; variable positions; a deletion; a frame shift changes the interpretation of the downstream sequence.
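The annotations on this alignment can be approximated in a few lines. The sketch below marks each column as conserved or variable and flags a sequence as frame-shifted when it contains a gap run whose length is not a multiple of 3. That gap rule is a simplifying assumption for illustration, not the method of Kellis et al.

```python
import re

# The four aligned fragments shown on the slide.
ALIGNMENT = {
    "S. cerevisiae": "ATGCTCAGCGTGACCTCA",
    "S. paradoxus": "ATGCTCAGCGTGACATCA",
    "S. mikatae": "ATGCTCAGGGTGACA--A",
    "S. bayanus": "ATGCTCAGG---ACA--A",
}


def gap_runs(aligned_seq):
    """Return (start, length) for every run of '-' in an aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def has_frame_shift(aligned_seq):
    """A gap run whose length is not a multiple of 3 shifts the reading frame."""
    return any(length % 3 != 0 for _, length in gap_runs(aligned_seq))


def column_status(alignment):
    """Label each alignment column 'conserved' (all rows equal) or 'variable'."""
    rows = list(alignment.values())
    return ["conserved" if len(set(col)) == 1 else "variable" for col in zip(*rows)]


for species, seq in ALIGNMENT.items():
    print(species, gap_runs(seq), "frame shift" if has_frame_shift(seq) else "in frame")
print(column_status(ALIGNMENT))
```

In this fragment the 3-base deletion in S. bayanus keeps the frame, while the 2-base gaps in S. mikatae and S. bayanus shift it.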

Examples. (Figure: aligned ORFs labelled as confirmed ORF or spurious ORF, with annotations for frame shift, ATG not conserved, variable positions, and sequencing error.) A greedy algorithm to find conserved ORFs is surprisingly effective (> 99% accuracy) on verified yeast data [Kellis et al, Nature 2003].

Defining Conservation. Columns are labelled conserved or variable. Naïve approach: consensus between all species. Problem: coarse-grained; ignores distances between species; ignores the tree topology. Goal: more sensitive and robust methods. (Figure: example alignment column A A C G T A C C A, with % conservation values 100, 33, 55, 55.)
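The naïve consensus score can be written down directly: for one alignment column, take the share of species that carry the most common residue. This is a sketch of that naïve approach only; the function name is hypothetical, and the columns used in the example are made up rather than taken from the figure.

```python
from collections import Counter


def percent_conserved(column):
    """Share (in %) of species carrying the most common residue in one column."""
    most_common_count = Counter(column).most_common(1)[0][1]
    return 100.0 * most_common_count / len(column)


# Made-up columns, one character per species.
for column in ("AAAA", "AACA", "ACGT"):
    print(column, round(percent_conserved(column)))
```

The score treats every species equally, which is exactly the weakness listed on the slide: two mismatches among close relatives count the same as two among distant ones, and the tree topology plays no role.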

Bioinformatics – an area of emerging knowledge. Each cell of the body contains the whole DNA of the individual (about 40,000 genes in the human genome, each comprising from 50 to a million base pairs: A, T, C or G). The central dogma of molecular biology: DNA -> RNA -> proteins. Transcription: DNA (about 5%) -> mRNA; in detail, DNA -> pre-RNA -> splicing -> mRNA (only the exons). Translation: mRNA -> proteins. Proteins make cells alive and specialised (e.g. blue eyes). Genome -> proteome. N.Kasabov, 2003

Bioinformatics: the area of science that is concerned with the development and application of methods, tools and systems for storing and processing biological information to facilitate knowledge discovery. Interdisciplinary: Information and Computer Science, Molecular Biology, Biochemistry, Genetics, Physics, Chemistry, Health and Medicine, Mathematics and Statistics, Engineering, Social Sciences. Biology, Medicine -> Information Science -> IT, Clinics, Pharmacy. Links to health informatics, clinical DSS, and the pharmaceutical industry. N.Kasabov, 2003

Bioinformatics: challenging problems for computer and information sciences. Discovering patterns (features) in DNA and RNA sequences (e.g. genes, promoters, ribosome binding sites (RBS), splice junctions). Analysis of gene expression data and prediction of protein abundance. Discovery of gene networks: genes that are co-regulated over time. Protein discovery and protein function analysis. Predicting the development of an organism from its DNA code (?). Modeling the full development (metabolic processes) of a cell (?). Implications: health, social, … N.Kasabov, 2003

Problems in Computational Modeling for Bioinformatics. An abundance of genome data, RNA data, protein data and metabolic pathway data is now available (see http://www.ncbi.nlm.nih.gov), and this is just the beginning of computational modeling in bioinformatics. Complex interactions: between proteins, genes and DNA code, and between the genome and the environment; much is yet to be discovered. Stability and repetitiveness: genes are relatively stable carriers of information. Many sources of uncertainty: alternative splicing; mutations in genes caused by ionising radiation (e.g. X-rays), chemical contamination, replication errors, viruses that insert genes into host cells, aging processes, etc. Mutated genes express differently and cause the production of different proteins. It is extremely difficult to model dynamic, evolving processes. N.Kasabov, 2003

Bioinformatics Important Challenges: Transcription, Translation, Protein Function, Protein 3D Structure, Gene Prediction, Gene Function

Public databases. DNA sequence: {A,T,C,G}. Transcription: microarray, gene expression level. Translation: protein sequence, e.g. KMLSLLMARTYW.

Gene Expression. Gene expression level: the number of mRNA copies in the cell for each gene. A gene may be over-expressed or under-expressed.

Microarray: What can it be used for? How does it work? What are the advantages? An example application.
In the past, biologists followed a "one gene, one experiment" philosophy. Today, with microarray technology, scientists can investigate the expression of many genes at once using this high-throughput method, which provides the potential to monitor thousands of genes at the same time. Hybridization is the fundamental concept behind how this works. Microarrays are impacting many fields, such as genomics, drug discovery, and toxicological research. Currently the cost is high; demand, competition and time will bring the costs down. Microarrays are mostly used to measure RNA expression levels, and the main use is comparative expression analysis, to understand under what conditions and to what level a gene is expressed. Arrays with 5,000 to 10,000 genes are common. The challenge is in … Note the differences between the oligonucleotide and cDNA methods.
Oligonucleotide method: photolithography is used to construct arrays with high information content on glass slides. The slides are turned upside down over the hybridization chamber so that fluorescently tagged nucleic acids can hybridize. The laser is focused through the back of the glass onto the target solution. The fluorescence emission is collected by a lens and passed through filters to the detector. The array may be moved, or the laser, or both.
cDNA method: gene-specific polynucleotides are arrayed on the matrix. The matrix is probed by fluorescently tagged cDNA made from the expressed RNA. RNA is used as a template to make cDNA by reverse transcription using fluorescence-labeled nucleotides.
Printing methods: a robot is used to place material on the slide (contact method). Non-contact methods such as capillary tube and ink-jet are being tried. The DNA is cross-linked to the matrix by UV light to fix it. Some of the DNA is changed to single-strand by heat or alkali.

Microarrays can be used for: comparison of transcription levels between two cells. Examples of comparisons: cells from a young mouse vs cells from an old mouse; drug efficacy: treated cells vs untreated cells.

How it works: based on hybridization. (Figure: an mRNA strand, e.g. G A C U G A, pairing base by base with complementary probes on the chip; A pairs with T or U through two hydrogen bonds (=), and C pairs with G through three (≡).)
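The base-pairing rule behind hybridization can be checked mechanically. The sketch below tests whether a DNA probe is perfectly complementary (antiparallel) to an RNA target; the function name is hypothetical, and the all-or-nothing rule is a strong simplification, since real hybridization tolerates some mismatches and depends on temperature and salt conditions.

```python
# Allowed (DNA probe base, RNA target base) pairs.
WATSON_CRICK = {("A", "U"), ("T", "A"), ("C", "G"), ("G", "C")}


def hybridizes(dna_probe, rna_target):
    """True if the probe pairs with every base of the target.

    Both sequences are written 5'->3'; the strands pair antiparallel,
    so the probe is compared against the reversed target.
    """
    if len(dna_probe) != len(rna_target):
        return False
    pairs = zip(dna_probe.upper(), reversed(rna_target.upper()))
    return all(pair in WATSON_CRICK for pair in pairs)


print(hybridizes("TCAGTC", "GACUGA"))  # perfect complement of the slide's mRNA
print(hybridizes("TCAGTT", "GACUGA"))  # one mismatch
```

Only probes whose sequence complements the labelled sample retain it after washing, which is why the position of a fluorescent spot identifies the gene.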

Probes and the printing process Head slides (100) Microtiter Plates

Pins Print Head

The printing tip is like a fountain pen tip. The tips are machined in a way that promotes capillary action. Liquid is drawn into the tip by dipping it into DNA solution in a 96-well plate. The drops are also formed by capillary action, and the size of the drops is determined partly by the sharpness of the point. Printing is performed by lowering the tip very close to the surface of the slide.

Print Head with Pins

Microarray Technology 23/2/2008

pseudo-colour image sample (labelled) probe (on chip) [image from Jeremy Buhler]

Experimental design. Track what’s on the chip: which spot corresponds to which gene. Duplicate experimental spots: reproducibility. Controls: for DNAs spotted on glass, a positive probe (induced or repressed) and a negative probe (bacterial genes on human chip); for oligos on glass or synthesised on chip (Affymetrix), point mutants (hybridisation plus/minus).

Images from scanner. Resolution: standard 10 µm [currently, max 5 µm]; a 100 µm spot on the chip = 10 pixels in diameter. Image format: TIFF (tagged image file format), 16 bit (65,536 levels of grey); a 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed). Other formats exist, e.g. SCN (used at Stanford University). Separate image for each fluorescent sample: channel 1, channel 2, etc.
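The figures on this slide follow from simple arithmetic, which the sketch below reproduces. The function names are hypothetical, and the calculation assumes square pixels and no file header or compression.

```python
def pixels_across(length_um, pixel_um=10):
    """Number of pixels spanning a length, given the pixel size in micrometres."""
    return round(length_um / pixel_um)


def scan_size_bytes(width_cm, height_cm, pixel_um=10, bits_per_pixel=16):
    """Uncompressed size of one scanned channel (1 cm = 10,000 µm)."""
    width_px = pixels_across(width_cm * 10_000, pixel_um)
    height_px = pixels_across(height_cm * 10_000, pixel_um)
    return width_px * height_px * bits_per_pixel // 8


print(pixels_across(100))                 # diameter of a 100 µm spot, in pixels
print(2 ** 16)                            # grey levels in a 16-bit image
print(scan_size_bytes(1, 1))              # 1 cm x 1 cm at 10 µm
print(scan_size_bytes(1, 1, pixel_um=5))  # same area at 5 µm
```

At 10 µm a 1 cm x 1 cm scan is 1000 x 1000 pixels at 2 bytes each, i.e. 2,000,000 bytes; halving the pixel size to 5 µm quadruples that.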

Images in analysis software. The two 16-bit images (cy3, cy5) are compressed into 8-bit images. Goal: display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image. RGB image: blue values (B) are set to 0; red values (R) are used for cy5 intensities; green values (G) are used for cy3 intensities. This gives a qualitative representation of the results.
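The overlay described above can be sketched with plain lists. The linear 16-bit to 8-bit conversion (dropping the low byte) is an assumption made for illustration; analysis packages may use a different scaling, and the function names are hypothetical.

```python
def to_8bit(value16):
    """Compress a 16-bit intensity (0..65535) to 8 bits (0..255), linearly."""
    return value16 >> 8


def overlay_pixel(cy3, cy5):
    """Build one 24-bit RGB pixel: R from cy5, G from cy3, B fixed at 0."""
    return (to_8bit(cy5), to_8bit(cy3), 0)


def overlay_image(cy3_image, cy5_image):
    """Combine two equally sized 16-bit images (lists of rows) into RGB tuples."""
    return [
        [overlay_pixel(cy3, cy5) for cy3, cy5 in zip(cy3_row, cy5_row)]
        for cy3_row, cy5_row in zip(cy3_image, cy5_image)
    ]


# Made-up 1 x 3 images: cy5 only, both channels equal, cy3 only.
cy3 = [[0, 65535, 65535]]
cy5 = [[65535, 65535, 0]]
print(overlay_image(cy3, cy5))
```

Equal intensities in both channels give equal red and green values, which the eye reads as yellow; this is why the overlay is only a qualitative view of the data.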

Images: examples. Pseudo-color overlay of cy3 and cy5.
Spot color   Signal strength       Gene expression
yellow       Control = perturbed   unchanged
red          Control < perturbed   induced
green        Control > perturbed   repressed
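The colour table can be turned into a small classifier on the log2 ratio of the two channels. This is a sketch only: the two-fold cut-off (`threshold=1.0`) and the function name are assumptions, not values from the slides, and intensities are assumed to be positive.

```python
import math


def classify_spot(control, perturbed, threshold=1.0):
    """Classify a spot from its two intensities using log2(perturbed / control).

    threshold=1.0 means a two-fold change; this cut-off is an assumption.
    """
    log_ratio = math.log2(perturbed / control)
    if log_ratio >= threshold:
        return "induced"    # shown red: control < perturbed
    if log_ratio <= -threshold:
        return "repressed"  # shown green: control > perturbed
    return "unchanged"      # shown yellow: control = perturbed


print(classify_spot(1000, 4000))
print(classify_spot(4000, 1000))
print(classify_spot(1000, 1100))
```

Using the logarithm makes induction and repression symmetric: a four-fold increase and a four-fold decrease sit at +2 and -2.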

Data: DNA Microarray assay (gene 1, gene 2, gene 3). It is far from trivial to predict gene expression from the sequence code alone. The current availability of microarray measurements of thousands of gene expression levels, taken during the course of an experiment or after the knockout of a gene, provides a wealth of complementary information that may be exploited to unravel the complex interplay between genes.

Data Required: Gene Expression Matrix. (Figure: a matrix of expression values with genes g1 to g4 as rows and arrays a1 to a4 as columns.) Two kinds of data: time series and snapshot.
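A gene expression matrix of this shape can be held in a small dictionary. Every number below is hypothetical and only stands in for the values lost from the figure; the names `gene_profile` and `array_snapshot` are likewise illustrative.

```python
# All expression values are hypothetical, for illustration only.
ARRAYS = ["a1", "a2", "a3", "a4"]
EXPRESSION = {
    "g1": [3.0, 1.0, 2.0, 1.5],
    "g2": [2.0, 2.5, 0.5, 1.0],
    "g3": [1.0, 1.0, 4.0, 3.0],
    "g4": [0.5, 2.0, 2.0, 2.5],
}


def gene_profile(gene):
    """One row: the expression of a single gene across all arrays."""
    return dict(zip(ARRAYS, EXPRESSION[gene]))


def array_snapshot(array):
    """One column: the expression of every gene on a single array."""
    column = ARRAYS.index(array)
    return {gene: values[column] for gene, values in EXPRESSION.items()}


print(gene_profile("g1"))
print(array_snapshot("a2"))
```

When the arrays are successive time points, a row is a time series for one gene; when the arrays are independent conditions, each column is a snapshot of the cell's state.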

World Health Organization