Analyzing Promoter Sequences with Multilayer Perceptrons Glenn Walker ECE 539.



Background (DNA)
Deoxyribonucleic acid (DNA) is a long molecule made up of combinations of four smaller molecules (bases): adenine (A), cytosine (C), guanine (G), and thymine (T). These four bases are combined in an order unique to each living organism, and that order contains the information needed to make all the parts an organism requires to survive. DNA is two-stranded, and the strands are complementary (A pairs with T, C pairs with G):
AGTCAATTGAGACCGATTAGAGATT
TCAGTTAACTCTGGCTAATCTCTAA
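The complementary pairing of the two strands can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `complement` is chosen here for illustration, and it computes the base-by-base complement without reversing the strand, matching how the two strands are written above.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (no reversal)."""
    return strand.upper().translate(COMPLEMENT)


top = "AGTCAATTGAGACCGATTAGAGATT"
bottom = complement(top)
print(bottom)  # TCAGTTAACTCTGGCTAATCTCTAA
```

Complementing twice returns the original strand, which is why either strand carries the full information.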

Background (DNA)
Genes are sections of DNA that range from a few hundred to tens of thousands of base pairs. Genes contain instructions on how to make proteins, the molecules necessary for building and maintaining organisms.
[Figure: a piece of DNA with three different genes separated by “junk” DNA]

Background
Promoters are sequences of DNA to which RNA polymerase can bind and begin transcription of a gene. Transcription is the process of making a complementary RNA copy of the DNA, which is then translated into a protein.
[Figure: a promoter sequence, where RNA polymerase binds and begins transcription, followed by the actual gene information]

Problem
Knowing gene locations is desirable for medical reasons. One way to find genes is to look for promoter regions. How do we find promoter regions?

One Solution
Promoter regions are highly conserved: different regions often contain similar patterns. We can train neural networks to recognize promoter regions. We choose a multilayer perceptron.

Neural Network Configuration
The multilayer perceptron (MLP) is a very common neural network configuration. We used an MLP with 3 layers: an input layer, a hidden layer, and an output layer.
Number of inputs: 115 or 58
Number of hidden nodes: 4, 8, 16, 20, 24, 28, or 32
Number of outputs: 1

Neural Network Configuration
Two ways of presenting input were tried: one used 58 inputs and the other 115. Different numbers of hidden nodes were tried to find the best network structure. Only one output was used, to indicate whether the input was a promoter sequence (1) or not (0).
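A three-layer MLP of this shape can be sketched in a few lines. This is an illustrative forward pass only, under assumptions the slides do not state: sigmoid activations, a bias weight on every neuron, initial weights drawn uniformly from [-0.5, 0.5], and a 0.5 threshold on the single output. The input size of 57 and the 8 hidden nodes are just one of the configurations listed; no training is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """One weight row per neuron: n_in input weights plus a trailing bias weight."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Apply one fully connected sigmoid layer; the appended 1.0 feeds the bias weight."""
    extended = list(inputs) + [1.0]
    return [sigmoid(sum(w * x for w, x in zip(row, extended))) for row in weights]


def mlp_forward(hidden_w, output_w, inputs):
    """Input layer -> hidden layer -> single output in (0, 1)."""
    hidden = layer_forward(hidden_w, inputs)
    return layer_forward(output_w, hidden)[0]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
n_inputs, n_hidden = 57, 8        # assumed: one configuration from the slides
hidden_w = init_layer(n_inputs, n_hidden, rng)
output_w = init_layer(n_hidden, 1, rng)

score = mlp_forward(hidden_w, output_w, [0.2] * n_inputs)
label = 1 if score >= 0.5 else 0  # 1 = promoter, 0 = not a promoter
print(score, label)
```

With untrained random weights the label is meaningless; the point is only the data flow from the inputs through the hidden nodes to the single output.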

Neural Network Inputs
The inputs consisted of 106 sets of 57 bases of DNA. 53 were promoters and 53 were not. One of the input promoter sequences:
TACTAGCAATACGCTTGCGTTCGGTGGTTAAGTATGTATAATGCGCGGGCTTGTCGT
The input was presented to the neural network in two ways:
Binary, two bits per base: A = 00, C = 01, G = 10, T = 11
Scaled, one value per base (57 input neurons): A = 0.2, C = 0.4, G = 0.6, T = [value missing]
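The two input encodings can be sketched as follows. This is an illustration, not the original code: the function names are invented here, and the scaled value for T is assumed to be 0.8 by continuing the 0.2 / 0.4 / 0.6 pattern. The sketch produces 114 and 57 values per sequence; the slides list 115 and 58 inputs, and since the source of the extra input is not stated, it is not modeled.

```python
# Two bits per base, as listed on the slide.
BINARY = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}

# One scaled value per base. T = 0.8 is an ASSUMPTION (pattern continuation).
SCALED = {"A": 0.2, "C": 0.4, "G": 0.6, "T": 0.8}


def encode_binary(seq):
    """Flatten a DNA sequence into a list of bits, two per base."""
    return [bit for base in seq for bit in BINARY[base]]


def encode_scaled(seq):
    """Map each base to a single value in (0, 1)."""
    return [SCALED[base] for base in seq]


promoter = "TACTAGCAATACGCTTGCGTTCGGTGGTTAAGTATGTATAATGCGCGGGCTTGTCGT"
print(len(promoter))                  # 57
print(len(encode_binary(promoter)))   # 114
print(len(encode_scaled(promoter)))   # 57
print(encode_binary("TAC"))           # [1, 1, 0, 0, 0, 1]
```

The binary form treats the four bases as unordered symbols spread over two inputs, while the scaled form imposes an artificial ordering (A < C < G < T) on a single input, which is one plausible reason the two encodings could classify differently.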

Neural Network Training
Each configuration was run 10 times. Within each of the 10 runs, 106 training runs were performed (leave-one-out cross-validation): for each, 105 of the sequences were used for training and the remaining one for testing. The test sequence was changed for each of the 106 runs so that each sequence served as the test sequence exactly once. Ten repetitions were necessary because the MLP weights were initialized to random values, which could lead to different classifications for the same input sequence.
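The evaluation protocol described above (hold out each sequence once, repeat the whole procedure 10 times with different random seeds) can be sketched as below. The names `leave_one_out`, `evaluate`, and `train_and_predict` are hypothetical; `train_and_predict` stands in for training an MLP on the 105 sequences and classifying the held-out one, which is not implemented here.

```python
import random


def leave_one_out(n_samples):
    """Yield (training indices, test index) so each sample is tested exactly once."""
    for test_idx in range(n_samples):
        train_idx = [i for i in range(n_samples) if i != test_idx]
        yield train_idx, test_idx


def evaluate(samples, labels, train_and_predict, n_repeats=10, seed=0):
    """Mean classification rate over n_repeats full leave-one-out passes."""
    rates = []
    for repeat in range(n_repeats):
        rng = random.Random(seed + repeat)  # a fresh seed per repeat, e.g. for weight init
        correct = 0
        for train_idx, test_idx in leave_one_out(len(samples)):
            train_x = [samples[i] for i in train_idx]
            train_y = [labels[i] for i in train_idx]
            prediction = train_and_predict(train_x, train_y, samples[test_idx], rng)
            correct += int(prediction == labels[test_idx])
        rates.append(correct / len(samples))
    return sum(rates) / len(rates)


def always_promoter(train_x, train_y, test_x, rng):
    """Placeholder classifier that ignores its inputs and predicts 'promoter'."""
    return 1


samples = list(range(106))        # stand-ins for the 106 encoded sequences
labels = [1] * 53 + [0] * 53      # 53 promoters, 53 non-promoters
print(evaluate(samples, labels, always_promoter))  # 0.5
```

The placeholder scores exactly 0.5 on the balanced data set, which gives a chance-level baseline against which a trained network's classification rate can be read.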

Hidden Nodes vs. Classification Rate

Scaled Input vs. Classification Rate

Compared to Others
Walker (NN): 78%
O’Neill (NN): 83%
Towell (KBANN): > 90%
O’Neill (Rule-based): 70%
ID3 (Decision tree): 76%

Conclusion
Not the best, but not the worst. A hybrid technique such as KBANN, which scored above 90%, would likely improve results. The MLP is a very useful tool for the field of bioinformatics.

References
Harley, C. B. and Reynolds, R. P. (1987). Analysis of E. coli promoter sequences. Nucleic Acids Research, 15(5).
O’Neill, M. C. (1991). Training back-propagation neural networks to define and detect DNA-binding sites. Nucleic Acids Research, 19(2).
Quinlan, J. (1986). Induction of decision trees. Machine Learning, 1.
Towell, G. G., Shavlik, J. W., and Noordewier, M. O. (1990). Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks. AAAI-90.