Download presentation
Presentation is loading. Please wait.
1
CSIS-400: Bioinformatics Dr. Eric Breimer
2
Good News & Bad News Remind me to tell you the good news
3
Quizzes At least 6 pop-quizzes – The lowest one gets dropped 20 minutes long – Not difficult if you come to class I may announce them ahead of time – if attendance is not a problem
4
Projects 3 Projects – Kind of hard Open due dates – Submit early and I’ll give you feedback – Then, submit again to get a better grade – Final submission a week before classes end
5
Exams First exam (early October) – Fail it and you should probably drop the course – Things are only going to get harder Second exam is a warm up for the final – (i.e. late November) Third exam is cumulative – during finals week.
6
Lecture My PowerPoint slides are sometimes bare – Not enough detail to study from Textbook is a reference manual – Look things up, not a tutorial Most of the material comes from lecture. – Miss lecture a lot and you will be lost
7
Grading 25% Cumulative final exam 15% Higher exam grade 10% Lower exam grade 20% Quiz average – 5 quizzes (equally weights) 30% Project average – 3 projects (different weights)
8
Bioinformatics Computer Science/Math meets Biology/Genetics Data-driven Science – Data collection – wet lab – 5% – Analyzing the data – computer lab – 95% Human Genome Project – best example of bioinformatics Machine Learning & Data Mining – used heavily in bioinformatics
9
Science (remind me to tell you a story) What is science? Make an observation Develop a hypothesis – explains the observation Run some experiments – to test your hypothesis Collect enough data to support your hypothesis – and it becomes a Theory
10
Science (again) 1. Observation 2. Hypothesis 3. Experiments 4. Supportive Data 5. Theory
11
Bad Science Collect some data Look at the data exhaustively and try to find some property/hypothesis Develop tools and run experiments to help you discover some property/hypothesis Then develop a hypothesis that ‘makes sense’ (scientifically speaking)
12
Bad Science (example) For years, material science and engineering followed the ‘bad science’ model. The search for better goop 1. Mix materials to make new types of goop 2. Test all the new types of goop 3. Pick the best goop 4. Then, try to explain scientifically: Why is this goop so good?
13
Bad Science (again) 1. Collect Data 2. Experiment 3. Design Tools 4. Analyze Data 5. Hypothesis/Theory
14
Some History Researcher were shunned for conducting bad science – Shunned by academia – Not shunned by industry Then came the Human Genome Project Industry (Celera) kicked Academia’s (NIH) butt. Q:What was their secret weapon? A: Bad Science Please don’t shun me!
15
Analogy Problem: You are in a cage with 500 lbs. hungry tiger Only options: 1. Reason with the tiger 2. Shoot the tiger Nobody wants to hurt a tiger, but… Some problems can’t be reasoned with…
16
Analogy The Human Genome data is like a 500 lbs. hungry tiger. A high-throughput computer is like a shotgun. Pure scientists will die reasoning with the problem Meanwhile, Industry will use computers and ‘bad science’ to perform amazing genetic experiments.
17
The Genome Project (overly simplified version) Biologists/Chemists figured out a way of reading the genetic material in a cell (sequencing) Unfortunately, they couldn’t read it from beginning to end They could only read it in small chunks And, the process was prone to error.
18
The Genome Project (overly simplified version) Over time, Biologists/Chemists sequenced a lot of DNA. Lots of different organisms Lots of different segments Before any ‘real’ hypotheses could be made they had to 1. Assemble the data 2. Correct the errors 3. Put the segments together
19
The Genome Project (overly simplified version) Thus, algorithms, techniques, and methodologies were pioneered just to put together the segments. Assembling entire Genomes is just the beginning The next goal is to generate a complete mapping between sequence and function
20
Heart formation Mapping Which pieces do What function? Glucose metabolism Liver function Brain development Vision Cancer prob. Blaa I’m not good at making stuff up
21
Analogy Imagine a C++ Program Imagine a really big one 4 million lines long… int main(void) { int x = 0; int y = 0; cout << "Enter two numbers: "; cin >> x >> y; int z = x + y; cout >> z; }
22
Analogy Imagine compiling it into assembly language 00424D90 mov eax,dword ptr [ecx] 00424D92 mov edx,7EFEFEFFh 00424D97 add edx,eax 00424D99 xor eax,0FFh 00424D9C xor eax,edx 00424D9E add ecx,4 00424DA1 test eax,81010100h 00424DA6 je main_loop (00424d90) 00424DA8 mov eax,dword ptr [ecx-4]
23
Analogy Imagine what the machine code would look like Shown here in Hex format
24
Analogy Now picture the hex code as a binary sequence – 1010101010001101101011110011000101010100011001001101010 1101101101010101010100011010101000110110101111001100010 1010100011001001010101000110110101110101000110110101111 0011000101010100011001001101010110110110101010101010001 1101010100011011010111100110001010101000110010010101010 0011011010111010100011011010111100110001010101000110010 0110101011011011010101010101000110101010001101101011110 0110001010101000110010010101010001101101011101010001101 1010111100110001010101000110010011010101101101111010101 0001101101011110011000101010100011001001010101000110110 1011101010001101101011110011000101010100011001001101010 1101101101010101010100011010101000110110101111001100010 1010100011001001010101000110110101110101000110110101111 0011000101010100011001001
25
Analogy Now change some of the bits (2% error rate) – 1010101010001101100011110011000101010100011101001101010 1101101101010101010100011000101000110110101111001100010 1000100011001011010101000110110101110101000110110101111 0011000101010100011001001101010110110110101010101010001 1101010100011011010111100110001011101000110010010101010 0011011010111010100011011010110100110001010101000010010 0100101001011011010101010101000110111010001101101011110 0110001010101000110010010101010001101101011101010001101 1010110100110001010101000110010011010101100101111010101 0001101101011110011000101010100011001001010111000110110 1011101010001101101010010011000101010100011001001101010 1111101101010101010100011010101000110110101111001100010 1010100011001001010101000110110101110101010110110101111 0011000101010100001001001
26
Analogy Lets randomly sample segments of the sequences – 1010101010001101100011110011000101010100011101001101010 1101101101010101010100011000101000110110101111001100010 1000100011001011010101000110110101110101000110110101111 0011000101010100011001001101010110110110101010101010001 1101010100011011010111100110001011101000110010010101010 0011011010111010100011011010110100110001010101000010010 0100101001011011010101010101000110111010001101101011110 0110001010101000110010010101010001101101011101010001101 1010110100110001010101000110010011010101100101111010101 0001101101011110011000101010100011001001010111000110110 1011101010001101101010010011000101010100011001001101010 1111101101010101010100011010101000110110101111001100010 1010100011001001010101000110110101110101010110110101111 0011000101010100001001001 1000110110001111001100010101010001 1110101000110110101001001
27
Analogy Finally, lets collect 1,000,000 random segments 1010001000110011011010101000110110 101000100101010101000110110 00101010101010011010101000110110 011010111010101000110110 00101010101010011010101101010101 …
28
Analogy Complete Problem: – Given 1,000,000 random samples. Remember there are random errors – Try to re-construct the original 4 million line program You don’t have to really re-construct it You just have to tell me EXACTLY what it does.
29
Analogy Complete Here’s the catch: – I’m not even going to tell you what programming language I used – In fact, the only thing I’ll tell you is that it’s a language you’ve never seen in your entire life.
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.