Bioinformatics Lecture to accompany BLAST/ORF finder activity

Presentation transcript:

Bioinformatics Lecture to accompany BLAST/ORF finder activity. Start with an orientation to the activity, for taking notes effectively. Slide difference between online and presentation.

Genetics Review: Central Dogma, Transcription, Translation, Operon, Promoter, Shine-Dalgarno/Kozak Consensus Sequence

Source: Daniel Horspool, Wikipedia editor

What the heck is bioinformatics? Interdisciplinary field combining statistics and computer science to understand biology. Came about with the huge increase in biological data with the sequencing of proteins and genomes. Essentially, we generate too much data for humans to parse through and interpret, so computers are used.

Know your -omics. Bioinformatics is primarily used to study the “omics”:
– Genomics: Study of gene sequences, functions, and dynamics. Genome: All DNA in an organism.
– Proteomics: Study of protein sequences, structures, and functions. Proteome: All proteins in an organism.
– Transcriptomics: Study of RNA. Transcriptome: All RNA, including mRNA, tRNA, and rRNA, in an organism.
– Metabolomics: Study of metabolites and biological processes. Metabolome: All metabolites in an organism.

Why is this important in everyday life? This kind of –omic research has far reaching implications for medicine and science. Targeted cancer therapies, personalized medicine, rapid infection identification, genetic manipulation, etc. are already in use, though it is early. In the future, who knows what we can find in the depths of the genome of humans and other organisms.

Infection in-first-quick-dna-test-diagnoses-a-boys-illness.html?hpw&rref=us A 14-year-old male was infected by a bacterium that could not be cultured. Sequencing and analysis have become fast enough that the bacterium’s DNA could be sequenced to identify both the organism and the treatment: penicillin.

Cancer sequencing-could-change-cancer-treatment-forever/ ok-video-who-knew-cancer-has-an-off-switch-video/ New cancer treatments sequence the cancer DNA to find out what went wrong or to find potential “on/off switches” for specific, targeted therapies.

Microbiome. Many bacteria cannot be cultured. Bioinformatics and metagenomic sequencing allow for insight into such systems as the gut, mouth, and skin microbiomes. This has implications for the treatment of diabetes, skin conditions, dental problems, gastrointestinal disorders, etc. -and-intelligence/the-impact-of-human-microbiome-variations-on-systems-pharmacology-and-personalized-therapeutics/

SPIDERGOAT!!! HOLY CRAP! SPIDERGOAT!!!! Researchers at Utah State University managed to insert a gene into a goat so that it makes a protein that can be extracted to produce spider silk.

Genetics as a Discovery Science. The great new frontier in biology lies on the small scale. Bioinformatics is used to discover new genes, microorganisms, proteins, etc. by parsing through the gold mine of genetic data.

Important Areas: Sequencing, Assembly, Gene Finding, Annotation, Alignment, Databases, Protein Structure/Interaction Prediction, Metagenomics

Sequencing. In order to obtain genetic information, the DNA must first be sequenced. There are many methods of doing this, each with its own strengths and weaknesses. Most don’t read the DNA continuously but in fragments (reads), which have to be assembled into longer stretches of sequence called contigs.

Sanger Sequencing. This was one of the first methods of sequencing and was the standard for about 25 years. It has now been supplanted by next gen sequencing, but is still used for small scale projects. It works by adding a small amount of fluorescently or radioactively labeled chain-terminating nucleotides (dideoxynucleotides), which cannot be bonded to again, so that synthesis stops wherever one is incorporated, producing fragments of varying length. These fragments then undergo gel electrophoresis, which separates them by length so the sequence can be read off in order.
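
To make the chain-termination idea concrete, here is a minimal simulation sketch. It is a teaching toy, not a model of any real instrument: the function names, the `stop_prob` parameter, and the choice to pass in the newly synthesized strand directly (ignoring template complementarity and the per-base lanes or dyes) are all assumptions made for illustration.

```python
import random

def sanger_fragments(strand, n_molecules=2000, stop_prob=0.05, seed=0):
    """Simulate chain termination. Each molecule is extended base by base
    along `strand` until a labeled terminator happens to be incorporated."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for i, base in enumerate(strand):
            if rng.random() < stop_prob:
                # Record the fragment length and its labeled final base.
                fragments.append((i + 1, base))
                break
    return fragments

def read_gel(fragments):
    """Order fragments by length, as electrophoresis does, and read the
    labeled end base of each length class from shortest to longest."""
    by_length = dict(fragments)  # length -> terminal base
    return "".join(by_length[n] for n in sorted(by_length))

if __name__ == "__main__":
    print(read_gel(sanger_fragments("ATGGCCTAA")))
```

With enough molecules, every fragment length is represented, so reading the end labels in order of length recovers the sequence.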

Next Gen Sequencing and the $1000 genome. Right now, sequencing is still too expensive for entire genomes, so much of the work done now sequences smaller sections. The goal of the $1000 genome is to make sequencing cheap and fast enough that every person could feasibly have their genome sequenced.

Assembly Most sequencing technologies don’t produce one long read, so the fragments must be assembled. This process often requires powerful computers to do in a timely fashion. There are still a few problems with this, especially when the sequence has repeated regions.
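
The sketch below illustrates one simple way fragments can be joined: a greedy merge of the reads with the longest suffix/prefix overlap. This is an assumption chosen for clarity, not the method any particular assembler uses, and the function names and toy reads are made up. It also shows why repeats cause trouble: a repeated region creates overlaps that are equally good in more than one place.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: what remains are separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Comparing every read against every other read is what makes assembly computationally heavy at genome scale.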

Gene Finding. Once the section of DNA has been sequenced, genes are found using pattern recognition. Gene-finding programs look for a start codon downstream of a Shine-Dalgarno (prokaryotes) or Kozak (eukaryotes) consensus sequence. They then look for potential stop codons, as well as introns and exons (eukaryotes).

Annotation. Experimentally figuring out what a particular gene does is costly and at times difficult or impossible to do. Annotation is used to make an educated guess about what it does by comparing it to similar sequences, following the axiom of structure yielding function, i.e. something with a similar sequence is likely to have a similar function.

Protein Structure Prediction. As with gene function, determining a protein’s structure experimentally is difficult and costly, so many algorithms have been developed to attempt to predict structure from a sequence. They are pretty good at predicting secondary structure, but not as good at tertiary or quaternary structure. Work is also under way on predicting protein interactions.
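
As a rough illustration of this kind of pattern recognition in prokaryotes, the sketch below reports ATG start codons that have a Shine-Dalgarno-like motif a short distance upstream. The exact motif `AGGAGG`, the 4 to 10 base spacing, and the function name are assumptions for the example; real sites vary, and real gene finders use statistical models rather than exact string matches.

```python
import re

def sd_start_sites(dna, motif="AGGAGG", min_gap=4, max_gap=10):
    """Return 0-based positions of ATG codons that have `motif` ending
    between min_gap and max_gap bases upstream of the ATG."""
    dna = dna.upper()
    hits = []
    for m in re.finditer("ATG", dna):
        start = m.start()
        # The motif must lie entirely inside this upstream window.
        lo = max(0, start - max_gap - len(motif))
        hi = start - min_gap
        window = dna[lo:hi] if hi > 0 else ""
        if motif in window:
            hits.append(start)
    return hits

if __name__ == "__main__":
    print(sd_start_sites("TTAGGAGGTTCACATATGAAATAA"))
```

An ATG with no such motif upstream is still a possible start; the motif only makes a candidate more convincing.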

Alignment. In order to search for, annotate, or find genes, they have to be aligned against others. This is where things like BLAST come in. BLAST, or Basic Local Alignment Search Tool, uses a pairwise alignment algorithm to compare a sequence to the other sequences in a particular database one at a time, whether that sequence be protein or nucleotide. BLAST is the most commonly used alignment search tool, but it isn’t the best: it is a fast heuristic, and slower methods can be more sensitive.

Multiple Alignment. By aligning multiple sequences at once, the phylogeny of organisms or proteins can be determined, as long as those sequences really are similar. Multiple alignments of 16S rRNA gene sequences are often used to determine prokaryotic phylogeny, where the definition of species is funky. 16S is used because it is highly conserved and is found in all prokaryotic organisms. Later we will do one with a eukaryotic phylogeny.
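
To show what a pairwise local alignment is, here is a small sketch of the classic Smith-Waterman dynamic programming method. Note the swap: BLAST itself does not run this exhaustive algorithm (it uses a faster seed-and-extend heuristic), and the scoring values below are arbitrary assumptions picked for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG"))
```

The alignment is "local" because only the best-matching region is reported; the unrelated flanks of each sequence are ignored.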
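
Once sequences are in a multiple alignment, one simple way to start on a phylogeny is to count how often each pair differs, column by column. The sketch below computes that proportion (the p-distance). The three short sequences are invented for illustration and are not real 16S data, and the function names are hypothetical; tree-building programs go further by applying evolutionary models and clustering to distances like these.

```python
from itertools import combinations

def p_distance(s1, s2):
    """Fraction of aligned, ungapped columns where two sequences differ."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_table(alignment):
    """Map each pair of sequence names to its p-distance."""
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}

if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAAC-A"}
    table = distance_table(toy)
    for pair, d in sorted(table.items(), key=lambda kv: kv[1]):
        print(pair, round(d, 3))
```

The pair with the smallest distance would be grouped together first in a distance-based tree.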

Databases. Bioinformatics also seeks to compile huge collections of similar data in order to consolidate information in one place. The data also has to be organized in a way that is useful to other scientists. These databases include things like UniProt, SEED, KEGG, RCSB/PDB, etc. All of those listed make great resources for creating your own activities.

Metagenomics. Many bacteria can’t be cultured, so much of the picture of the microbiome is hazy. Metagenomics is the shotgun or grenade approach: sequence everything below a certain cell size. This gives insight into the genetics of unculturable organisms.

Activity. Go on the SM 2 website and locate the instructions for the activity. There are 6 different prokaryotic operons, each containing multiple gene sequences. Organize yourselves into 6 groups and try to figure out the probable function of each gene and what organism it comes from. Then search NCBI for papers on one of the genes’ functions. Discuss with the group and the class what you learned.

NCBI ORF Finder. This looks for potential genes (open reading frames) in a particular DNA sequence. It does this by finding a start codon and then continuing until it hits an in-frame stop codon. A Shine-Dalgarno sequence upstream of the start codon supports a prokaryotic gene call, but ORF Finder itself reports ORFs from the start and stop codons alone. It searches in 6 “frames”: three on the given strand, as codons are triplets, and three more on the reverse-complement strand for “backwards sequences”. All are possible sites for translation.
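
The six-frame search can be sketched in a few lines. This is a simplified stand-in written for illustration, not NCBI's implementation: it assumes ATG as the only start codon, the standard stop codons, and a hypothetical `min_codons` cutoff.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def find_orfs(dna, min_codons=2):
    """Return (strand, frame, start, end, sequence) for each ATG...stop ORF.
    start and end are 0-based offsets on the strand that was scanned."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first ATG in this frame
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, seq[start:i + 3]))
                    start = None  # close the ORF and look for the next ATG
    return orfs

if __name__ == "__main__":
    for orf in find_orfs("ATGAAACCCTAG"):
        print(orf)
```

Scanning the reverse complement is what catches genes encoded on the other strand, which would read "backwards" on the strand you pasted in.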

What do the BLAST scores mean? E-value: The number of matches with an equal or better score that you would expect to find by chance in a database of that size; the smaller it is, the more significant the match. Query Cover: % of the query sequence length that is included in the aligned segments with that result. Identity: % of aligned positions where the two sequences are identical. Total score: Sum of the alignment scores from all segments of the same database sequence that match the query sequence. This will only differ from the max score if several parts of the database sequence match different parts of the query sequence.

Take home message. With faster sequencing and increased genetic and proteomic research, bioinformatics becomes more and more important. Right now we are literally producing too much data to go through. Bioinformatics provides computational methods to utilize these massive datasets for practical use in medicine and science. This is the direction in which all genetic and proteomic research is going, and it will eventually become standard practice to teach it when teaching genetics. For now, it makes a great addition to genetics teaching by getting the students working with sequences. Activities can be designed around the Central Dogma to teach how genetic information flows in living systems and how scientists today gather and use that information.
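
The sketch below shows how these numbers relate, using the standard Karlin-Altschul relationships: a bit score S' = (λS − ln K) / ln 2 and an expected count E = m · n · 2^(−S'). The λ and K values in the example are placeholders, not BLAST's actual parameters for any scoring matrix, and the query cover helper assumes a single aligned segment, which is a simplification.

```python
import math

def bit_score(raw_score, lam, K):
    """Normalize a raw alignment score S into bits."""
    return (lam * raw_score - math.log(K)) / math.log(2)

def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least this well."""
    return query_len * db_len * 2 ** (-bits)

def percent_identity(aln_query, aln_subject):
    """Identical positions as a percentage of the alignment length."""
    matches = sum(q == s and q != "-" for q, s in zip(aln_query, aln_subject))
    return 100 * matches / len(aln_query)

def query_cover(q_start, q_end, query_len):
    """Percentage of the query inside one aligned segment (1-based, inclusive)."""
    return 100 * (q_end - q_start + 1) / query_len

if __name__ == "__main__":
    bits = bit_score(50, lam=0.3, K=0.1)  # lam and K are made-up values
    print(e_value(bits, query_len=100, db_len=1000))
    print(percent_identity("ACGT-A", "ACCTTA"))
    print(query_cover(11, 60, 100))
```

Because E scales with the database size, the same alignment gets a larger (less significant) E-value when searched against a bigger database.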