CSE 5290: Algorithms for Bioinformatics Fall 2009

1 CSE 5290: Algorithms for Bioinformatics Fall 2009
Suprakash Datta Office: CSEB 3043 Phone: ext 77875 Course page: 4/29/2019 CSE 5290, Fall 2009

2 My research
Computer Networks ...
Clustering of biological data, e.g.
  Flow cytometry data
  Microarray data
Genomic Signal Processing
  Convert biological sequences to numerical sequences and apply signal processing tools
  Exon prediction, retroviral insertions
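To make "convert biological sequences to numerical sequences and apply signal processing tools" concrete, here is a minimal sketch. The slide does not say which mapping or transform is used, so everything below is an assumption: it uses binary indicator sequences (one 0/1 sequence per nucleotide) and a plain discrete Fourier transform evaluated at frequency N/3, a combination often discussed in connection with exon prediction because coding regions tend to show period-3 structure. The function names and the toy sequences are hypothetical.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four binary (0/1) sequences, one per nucleotide."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a numeric sequence."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def period3_power(dna):
    """Spectral power at frequency N/3, summed over the four indicator sequences."""
    k = len(dna) // 3
    return sum(dft_power(seq, k) for seq in indicator_sequences(dna).values())


# Toy inputs (assumptions for illustration, not real genomic data).
periodic = "ATG" * 30   # perfect period-3 repeat
uniform = "A" * 90      # no period-3 structure

print(period3_power(periodic))  # large value
print(period3_power(uniform))   # essentially zero
```

In practice the power would be computed in a sliding window along the sequence, and a real predictor would need far more than this single feature.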

3 Administrivia
Lectures: Tue-Thu 1:00 - 2:30 pm (Ross S 537)
Office hours: Wed 1-4 pm, or by appointment.
TA: none.
Webpage: (All announcements/handouts will be published on the webpage; check often for updates)
Textbook: An Introduction to Bioinformatics Algorithms, Neil C. Jones and Pavel A. Pevzner, MIT Press, August 2004.

4 Administrivia - contd.
Described in more detail on webpage
Grading:
  Midterms: 30%
  Homework: 30%
  Project: 40%
Grades: will be on ePost.
Project details are on the webpage.

5 Course objectives
Familiarity with computational problems in Biology
Applying algorithmic ideas
Understand real-life computational challenges
Improve understanding of algorithms

6 What I expect from you
Some familiarity with undergraduate algorithms
Interest in computational problems
Willingness to pick up a little Biology
Active interest in your project and assignments

7 What is bioinformatics?
No consensus!
Genomics
Proteomics
Evolutionary biology
Clinical trial informatics
Epidemiology?
Medical image processing?
Artificial life?
From “Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.”

8 Why Bioinformatics?
Make an impact!
Interdisciplinary work
Work with real data sets
Use algorithmic skills

9 Biological (genomic) data
ATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAACTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTGATATGCTTTGCGCCGTCAAAGTTTTGAACGAGAAAAATCCATCCATTACCTTAATAAATGCTGATCCCAAATTTGCTCAAAGGAATCGATTTGCCGTTGGACGGTTCTTATGTCACAATTGATCCTTCTGTGTCGGACTGGTCTAATTACTTTAAATGTGGTCTCCATGTTGCACTCTTTTCTAAAGAAACTTGCACCGGAAAGGTTTGCCAGTGCTCCTCTGGCCGGGCTGCAAGTCTTCTGTGAGGGTGATGTACCATGGCAGTGATTGTCTTCTTCGGCCGCATTCATTTGTGCCGTTGCTTTAGCTGTTGTTAAAGCGAATATGGGCCCTGGTTATCATATCCAAGCAAAATTTAATGCGTATTACGGTCGTTGCAGAACATTATGTTGGTGTTAACAATGGCGGTATGGATCAGGCTGCCTCTGTTTGGTGAGGAAGATCATGCTCTATACGTTGAGTTCAAACCGCAGTTGAAGGCTACTCCGTTTAAATTTCCGCAATTAAAAAACCATGAATAGCTTTGTTATTGCGAACACCCTTGTTGTATCTAACAAGTTTGAAACCGCCCCAACCAACTATAATTTAAGAGTGGTAGAAGTCACCAGCTGCAAATGTTTTAGCTGCCACGTACGGTGTTGTTTTACTTTCTGGAAAAGAAGGATCGAGCACGAATAAAGGTAATCTAAGAGTTCATGAACGTTTATTATGCCAGATATCACAACATTTCCACACCCTGGAACGGCGATATTGAATCCGGCATCGAACGGTTAACAAAGGCTAGTACTAGTTGAAGAGTCTCTCGCCAATAAGAAACAGGGCTTTAGTGTTGACGATGTCGCACAATCCTTGAATTGTTCTCGCGAAATTCACAAGAGACTACTTAACAACATCTCCAGTGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGCGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC
ATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAATTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGCTTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTACGAACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACAAAAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGAAATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCATTCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCATCCCATAGAGAAGATCTTTCGGTTCGAAGACATTCCTACGCATAATAAGAATAGGAGGGAATAATGCCAGACAATCTATCATTACATTAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACACAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCCACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCT 4/29/2019 CSE 5290, Fall 2009

10 Annotated data
ATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCACTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGAGTTGTATGTATTTGGCCTTATGTAGCTCGCGCCCGTTCGAGATAAGGATGTTTCTAGAAATCCGTAAAGATATAGAGATGTACACACATCTACATTTGTAACTCTATTTATAGTTAGAAACTTGTCCTCGAGGTCTCTCTATAAACCTTTTTGTACGTCCATAAATGTGGAAATCTACCGATCTTTTTGTCTCCGTATATAGGGAAACAGCGTTTTCGCTATACCTGGGTACAAACAGAGTTTTGTAGCTCCACGTTTCGCTGTCTCGTTCCGTGGAGCCCTGGGGGTCCTTAGACATATACTCTTTTTACATAGTTGGATGGGGGTCTGTACCTAGTTAGCTCTAGCTCGAGACAGCGATAGAGAATTTTGTATACTTGTCCGTTTACGTGTACCCGGCGATCTGTCTATGTCTATGGATAGCCACGTTTATGTCGTTTTGTAGGTCGTTGTATATCGATATATAGAGCGCGGATAATTAGGTAGGTCGACCGCGCTGTGGCTCTATCTCTAGTTATTTGTAGGTCGATGTGTAGATGTAATTCTAGCTGGACATCCATACCTACCTGTGTTTGTAGGTATTTCCATAAAACCACAGCGATGTTTGTAGAAAACGCGCGCTACCCCTACACCGCTATATACATAATATATCTCTGTACAAAGATGTATAGAGATAAAGACACAGTTCGAAACCTATCGACTTGGACAAACAGTTGTTTATTTTTAAGTCGCTCGACCGAACTAGTTACACCGAGATCGATTTGTTTCTCTATACACCTCTCTCTGTGGAGAAACAGAGCGAGAAGTAGATTTCGAGAAGCCACCGGGACAATTACAGAAAGCGGTAGATTTACATACAAAGAAGGAGACTTATCGATACACATAGAGGTATCGATAACGATGTATACCTACATCCAGCTCCATACCTAAAGGTAGAAAGACATGTGTCGACATGTTTACGTTTAGATATGGACCTAGATATGTCTACGGACACGAGACTTACACGTCTAGATAGTTGTGTATTTCTCTCTGGACAGATCTGTAAAGGTACGTCTACATGTCGATATCGGTCTGTCGGTAGATAAAAATCTATATTTAGACCGATAACTAGGTCTCGATGTCGTTCTCTAACGATGGACCTGTAGACCGAAAAAGAACTTTTTGTTTTCCACAAGTCTAGACTTTTTTGGGTCTAGACACTTGTGGAATTCGAGATAGG
GCTCTCCCTCTGGCTCTATAGATCGACGGGTATCGCTATCGAGACGTGGGTCGAGATGGGTATCTCGCTATACATGGATTTCCAACCTTGTAGGTGTCTCTCGAAGGCGGTAGGGACACAAAATAGCTGTAGCTACAACTACGTATCGATACATAAAGAGCTACAAATTGGGCAGCTGTCTATATGAATTATAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGCTTGGCAAGTTGCCAACTGACGAGATGCAGTAAAAAGAGATTGCCGTCTTGAAACTTTTTGTCCTTTTTTTTTTCCGGGGACTCTACGAACCCTTTGTCCTACTGATTAATTTTGTACTGAATTTGGACAATTCAGATTTTAGTAGACAAGCGCGAGGAGGAAAAGAAATGACAAAAATTCCGATGGACAAGAAGATAGGAAAAAAAAAAAGCTTTCACCGATTTCCTAGACCGGAAAAAAGTCGTATGACATCAGAATGAAATTTTCAAGTTAGACAAGGACAAAATCAGGACAAATTGTAAAGATATAATAAACTATTTGATTCAGCGCCAATTTGCCCTTTTCCATTCCATTAAATCTCTGTTCTCTCTTACTTATATGATGATTAGGTATCATCTGTATAAAACTCCTTTCTTAATTTCACTCTAAAGCATACCGCCAGGTACGTACGTATACAGAAATACATGTATCTGTGGATATCCGTACATCGAGCCACATATCCCTTTAACTGGCGAAATATACTTATACCGAAAATTAGAGGGAACGCGGTATATGTACGACCGACACAATGAAACTAGATTGCGTAATTTCTAGTGTAAACAAATATGGCTATCTAAATGTCTCTAGGTACATCGAAAGAAAGTTACATATATTTAAATCGATAACTACGTAGATGGGTTTCTAGTTGTAGAGCGACAAATCTCGAAAGCTCTTTTTGGAGAGGTAGATATATAGTATATATCGCTGTCGAAGTATACAAATATCTACTTCGATAACTAACCAACGGTATCGGTCTAGAAAAGTCTCGCCAGGTCCGTAAACAAAGAGGTACATAACGAGACCGGTGGGTGTTTCGGTATACACTTGTGGGTATCGAGACATGTATGTTTGTGTGTAACTATATCCAAGGTCTTTGTGGACTTGTAGAGGTGTATCTGTCGATATTTACGTC 4/29/2019 CSE 5290, Fall 2009

11 Importance of algorithms
- Compare human vs. mouse (blocks of 1,000 nucleotides)
  • 3,000,000*3,000,000 comparisons, each 1,000*1,000 operations (with dynamic programming)
  • At 1 trillion operations per second, it would take 104 days
- Search all regulatory motifs of length 20 (11^20) in the human genome
  • 426 years
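The 104-day figure for the human vs. mouse comparison can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the genome length of 3 billion nucleotides is an assumed round figure implied by the slide's 3,000,000 blocks of 1,000 nucleotides, and the constant names are made up for illustration.

```python
# Assumed round figures taken from (or implied by) the slide.
GENOME_LENGTH = 3_000_000_000   # nucleotides per genome (assumption)
BLOCK_LENGTH = 1_000            # nucleotides per block
OPS_PER_SECOND = 10**12         # 1 trillion operations per second

blocks_per_genome = GENOME_LENGTH // BLOCK_LENGTH   # 3,000,000 blocks
block_pairs = blocks_per_genome ** 2                # every human block vs. every mouse block
ops_per_pair = BLOCK_LENGTH ** 2                    # dynamic-programming table for one pair
total_ops = block_pairs * ops_per_pair

seconds = total_ops / OPS_PER_SECOND
days = seconds / 86_400

print(f"{total_ops:.1e} operations, about {days:.0f} days")
# prints "9.0e+18 operations, about 104 days"
```

The point of the slide survives the check: even at a trillion operations per second, the brute-force all-against-all comparison takes months, so better algorithms matter more than faster hardware.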

12 Clustering flow cytometry data
1 million vectors
Each of length 25 (real numbers)
Need quick output!
Results should be biologically meaningful!
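A rough size estimate shows why "quick output" is a real constraint at this scale. The sketch below assumes 64-bit floating-point values and considers a method that needs all pairwise distances; both are assumptions for illustration, since the slide does not name a storage format or a clustering algorithm.

```python
N_VECTORS = 1_000_000
DIMENSIONS = 25
BYTES_PER_VALUE = 8   # assumes 64-bit floats

# Memory for the raw data itself.
raw_bytes = N_VECTORS * DIMENSIONS * BYTES_PER_VALUE

# Number of distinct vector pairs, as needed by any all-pairs method.
pairwise_distances = N_VECTORS * (N_VECTORS - 1) // 2
pairwise_bytes = pairwise_distances * BYTES_PER_VALUE

print(f"raw data: {raw_bytes / 1e6:.0f} MB")
print(f"pairwise distances: {pairwise_distances:.2e} values, "
      f"{pairwise_bytes / 1e12:.1f} TB if stored")
```

The data set itself fits in memory comfortably, but a full distance matrix does not, which suggests favouring methods whose cost grows roughly linearly in the number of vectors.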

13 R: introduction
Why R?
Lots of available libraries (statistics, machine learning, ...)
Very good visualization capability
Free
Multiplatform
Easy to publish code
Biologists use it!

14 R - contd
Grew out of a popular statistics package
Used extensively by statisticians and computational biologists
Lots of resources (see class web page)
Some similarities with MATLAB

15 R - strengths and weaknesses
Strengths
  Allows very quick testing of ideas
  Libraries available for most purposes
  Allows integration with C code
Weaknesses
  Not as efficient as MATLAB on matrix operations
  Not very good at handling large data sets

16 Next class
Ch 3 of text
In the meantime…
Read Ch 1 and 2 on your own.
Get familiar with R

