Pattern Matching
Rhys Price Jones, Anne R. Haake

What is pattern matching? Pattern matching is the procedure of scanning a nucleic acid or protein sequence for matches to short sequence patterns (Staden 1990).

Why search for patterns?
Usually the sequences of interest (the query sequences) are known to be indicators of some important biological function
Search for patterns in nucleotide sequence
  – DNA or RNA
Search for patterns in amino acid sequence

Motif: multiple uses of the word
Def: a pattern; typically used to refer to a short (up to ten bases or residues) repeated or conserved pattern in nucleic acids or proteins
Def: a short conserved sequence in a protein; usually associated with function
  – in a broader sense, motif is used for all localized regions of homology, regardless of size

Some examples of patterns in DNA sequence:
Restriction sites: recognition sites for the restriction endonucleases
Intron splice sites
Codons specifying ORFs
Promoters
DNA binding sites for regulatory proteins

Restriction Sites
Why identify them?
Exact or inexact matches?
Examples: Restriction sites

Splice Sites
Splice donor and splice acceptor are consensus sequences
  – A statistical determination of the pattern; approximates the pattern
(C or A)AG/GT(A or G)AGT: "donor" splice site
(T or C)n N (C or T)AG/G: "acceptor" splice site
Splice site example
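The donor consensus (C or A)AG/GT(A or G)AGT and the acceptor consensus (T or C)n N (C or T)AG/G can be written as regular expressions. The sketch below is only an illustration: the function names are made up, and the minimum length of the pyrimidine run (`MIN_PYRIMIDINES = 5`) is an assumed value, since the slide gives only "n". Because these are consensus sequences, real splice sites often deviate from them, so an exact regex will miss many true sites.

```python
import re

# Donor consensus (C or A)AG/GT(A or G)AGT; "/" marks the exon/intron boundary.
# A lookahead is used so that overlapping candidates are all reported.
DONOR = re.compile(r"(?=([CA]AG)(GT[AG]AGT))")

# Acceptor consensus (T or C)n N (C or T)AG/G.
# MIN_PYRIMIDINES is an assumed value for "n", chosen only for illustration.
MIN_PYRIMIDINES = 5
ACCEPTOR = re.compile(rf"(?=([TC]{{{MIN_PYRIMIDINES},}}[ACGT][CT]AG)(G))")


def find_donor_sites(seq):
    """Return 0-based indices of the first intron base (the G of GT)."""
    return [m.start() + len(m.group(1)) for m in DONOR.finditer(seq.upper())]


def find_acceptor_sites(seq):
    """Return 0-based indices of the first exon base after the intron's AG."""
    hits = {m.start() + len(m.group(1)) for m in ACCEPTOR.finditer(seq.upper())}
    return sorted(hits)  # several start positions can point at the same boundary


if __name__ == "__main__":
    print(find_donor_sites("CCAAGGTAAGTCC"))
    print(find_acceptor_sites("TTTTTTTCCAGG"))
```

The acceptor results are de-duplicated because every start position inside a long pyrimidine run yields the same boundary.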

Splice Sites
Remember that they are consensus sequences
Why are splice sites of interest?
  – Gene finding
  – Mutations in the consensus sequence at the splice junctions are common in many inherited disorders
Ex: thalassemias, muscular dystrophy, Tay-Sachs, neurofibromatosis, Darier’s disease…
One of the thalassemias: mutation at splice acceptor
YYYNCAG| normal
YYYNCGG| mutant

Codons Specifying ORFs
ORFs (open reading frames)
Start codon … a.a.’s and no stop codon
Prokaryotic start codons: usually ATG, GTG or TTG, but this is species-specific
Eukaryotic start: ATG
Code table
More on this, too, when we discuss gene finding
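An ORF scan can be sketched as a walk through each reading frame, opening an ORF at a start codon and closing it at the first in-frame stop codon. This is a minimal sketch, not a gene finder: it scans only the given strand, the function name and the `min_codons` parameter are illustrative, and the stop codons (TAA, TAG, TGA) are those of the standard genetic code, which the slide does not list.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(seq, start_codons=("ATG",), min_codons=1):
    """Return (start, end, frame) tuples for ORFs on the given strand.

    start is the 0-based index of the start codon, end is the index just
    past the stop codon, and frame is 0, 1 or 2.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # open a new ORF at the first start codon seen
            elif start is not None and codon in STOP_CODONS:
                # count codons from the start codon up to (not including) the stop
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
    # Prokaryotic-style scan allowing the alternative start codons
    print(find_orfs("GTGAAATAA", start_codons=("ATG", "GTG", "TTG")))
```

Passing `start_codons=("ATG", "GTG", "TTG")` reflects the prokaryotic case; the default of ATG alone matches the eukaryotic start.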

Promoters
Prokaryotic promoters: consensus sequences
  – TTGACA (−35 region) and TATAAT (−10 region)
Eukaryotic promoters
  – TATA box at –25 relative to the transcriptional start site; consensus is 5’-TATAWAW-3’ (W = A or T)
  – Initiator sequence (Inr); consensus is 5’-YYCARR-3’ (Y is C or T; R is G or A); the +1 nucleotide (start) is usually the A of the Inr sequence
Bind basal transcription factors
  – We’ll revisit this when we discuss gene finding
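Consensus strings such as TATAWAW and YYCARR use IUPAC ambiguity codes, which map directly onto regular-expression character classes. The sketch below covers only the codes that appear on this slide plus N; the function names are illustrative. Since these are consensus sequences, an exact match is a rough first pass and not a promoter predictor.

```python
import re

# Subset of the IUPAC nucleotide codes: the ones used on the slide, plus N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "W": "[AT]",    # weak pairing
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regex that reports overlaps."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def find_consensus(seq, consensus):
    """Return (0-based position, matched text) for every match in seq."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_consensus("GGTATAAATGG", "TATAWAW"))  # TATA box consensus
    print(find_consensus("TTCAGA", "YYCARR"))        # Inr consensus
```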

Transcription Factor Binding Sites
Regulatory transcription factors are sequence-specific DNA-binding proteins; their sites are often found in or near gene promoter regions
The DNA sequence is called the response element
What are the DNA sequences like?
Response elements

Some examples of patterns in protein sequences (motifs):
Prediction of secondary and tertiary structure
  – e.g. transcription factors: helix-turn-helix, b-zip, zinc-finger
Examples
Presence of active sites of enzymes
Presence of cell localization signals

Exact vs Inexact (Approximate) Pattern Matching
Exact Pattern Matching
  – Limited use in bioinformatics
  – Well-known algorithms (last week)
  – A common use of exact pattern matching is to compare a sequence against a large number of possible known patterns, such as in the identification of restriction sites
Approximate
  – Most of the other examples of pattern matching in bioinformatics
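Comparing one sequence against a set of known patterns, as in restriction site identification, can be sketched with plain substring search. The three enzymes and recognition sites below (EcoRI, BamHI, HindIII) are well-known examples chosen for illustration and do not come from the slides; the function name is also made up. All three sites are palindromic, so scanning one strand is enough for them.

```python
# A tiny, illustrative table of enzymes and their recognition sites.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_restriction_sites(seq, sites=RESTRICTION_SITES):
    """Map each enzyme name to the 0-based positions where its site occurs."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos)
            pos = seq.find(site, pos + 1)  # step by one to allow overlaps
        hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    print(find_restriction_sites("AAGAATTCGGATCCGAATTC"))
```

For a large pattern set, a multi-pattern algorithm would scale better than repeated `str.find` calls; this loop is kept simple on purpose.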

Other uses of exact pattern matching? Check PCR primers? Annotation? (text matching)

Why search for patterns?
Pattern matching in sequences is also the basis of searching through a sequence database
  – Sequence alignment

Pairwise Sequence Alignment An alignment between 2 sequences is a pairwise match between sequences. Pairwise sequence comparison is the primary means of linking biological function to the genome and of propagating known information from one genome to another (Gibas & Jambeck).
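To make pairwise comparison concrete, here is a minimal sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The slides do not name an algorithm, so this is a swapped-in standard technique, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("GATTACA", "GATCACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept, which is enough for the score; recovering the alignment itself would need the full table or a traceback structure.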

Why are inexact pattern matches relevant in sequence alignments?
Sequencing errors
Mutation
  – 2 primary types: point mutations (affect a single nucleotide) and segmental mutations (affect a few to hundreds of adjoining nucleotides)
  – substitutions (transitions, transversions)
  – insertions, deletions
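Substitutions split into transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse). A minimal sketch of counting both between two already-aligned, equal-length DNA sequences follows; the function name is illustrative, and the code assumes the sequences contain only A, C, G and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitutions(ref, alt):
    """Count transitions and transversions between two equal-length sequences.

    Assumes both sequences contain only A, C, G and T and are already aligned
    without gaps (insertions and deletions are not handled here).
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    counts = {"transitions": 0, "transversions": 0}
    for x, y in zip(ref.upper(), alt.upper()):
        if x == y:
            continue  # no substitution at this position
        same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
        counts["transitions" if same_class else "transversions"] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitutions("ACGT", "GCTT"))
```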

Mutations
Point mutations usually occur from a nucleotide mismatch that becomes “fixed” during the process of replication
  – Escapes the DNA repair mechanism
Significant when they occur within a coding region and also cause a change in functionality
  – Non-synonymous mutation
  – Synonymous mutation: the mutated sequence codes for the same amino acid as before the mutation
  – Allowance for synonymous mutation is due to wobble and the degeneracy of the code
Code Table
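Whether a point mutation in a codon is synonymous can be checked by translating the codon before and after the change. The sketch below builds the standard genetic code table from the usual compact 64-letter string (bases in T, C, A, G order, `*` for stop); the function name and its return shape are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon, position, new_base):
    """Return (mutant codon, amino acid before, amino acid after, kind).

    position is the 0-based index within the codon (0, 1 or 2).
    """
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    kind = "synonymous" if before == after else "non-synonymous"
    return mutant, before, after, kind


if __name__ == "__main__":
    print(classify_point_mutation("CTT", 2, "C"))  # third-position change
    print(classify_point_mutation("GAG", 1, "T"))  # second-position change
```

Third-position changes are often synonymous because of the degeneracy of the code, which is what the first call illustrates.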

Evolutionary Considerations
Through time, mutations tend to be preserved if they are not deleterious
Functionally important sequences tend to be conserved
Non-functional or non-coding sequences diverge at a high rate

Evolutionary Considerations
The tendency of functionally important sequences to remain relatively unchanged over time is the basis for sequence analysis
  – Allows us to draw evolutionary connections among genes that are related in sequence