The Center for Computational Genomics and Bioinformatics Christopher Dwan Mike Karo Tim Kunau.


1 The Center for Computational Genomics and Bioinformatics Christopher Dwan Mike Karo Tim Kunau

2 Outline
–Perspective
–Processing tasks & requirements
–Computational solutions
–Interesting issues

3 Funding chart

4

5 The “Bioinformatics” component
“Pipeline” data processing and storage
–100Kb data
–<5 sec processing time
–10,000+ / month
–The problem: Interface (batch & dependency management)
Similarity search
–Search against one or more ~10GB databases
–The problem: Data movement & memory
»(much easier on dedicated resources)
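The batch and dependency management problem named on this slide can be sketched as grouping pipeline steps into batches that are safe to run in parallel. This is a minimal illustration, not the Center's actual interface: the step names and the `batches` helper are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
deps = {
    "parse": set(),
    "clean": {"parse"},
    "annotate_a": {"clean"},
    "annotate_b": {"clean"},
    "store": {"annotate_a", "annotate_b"},
}

def batches(dependencies):
    """Yield lists of steps whose dependencies have all completed.

    Every step in one batch could be submitted to a batch system at the
    same time; the next batch is released only after the current one is done.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    while sorter.is_active():
        ready = sorter.get_ready()
        yield sorted(ready)
        sorter.done(*ready)

plan = list(batches(deps))
for number, steps in enumerate(plan, start=1):
    print(number, steps)
```

With 10,000+ small jobs a month, computing the batches up front keeps the per-job scheduling logic out of the individual tools.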

6 The “bioinformatics” component
“Unigene” assembly
–Traditional long run, big memory compute problem
–Comes at the end of the other two types
–The problem: algorithms
Clustering / Pattern Discovery
–Conference driven
–Causes us to redo the other tasks

7 The “bioinformatics” component
“Data warehouses”
–Mirroring and cross checking other public resources
–Local Oracle implementation of public databases for local users (GenBank / Swiss-PROT / Medicago …)

8 The “bioinformatics” component
Microarray data
–Image data (~1MB per image) requires processing and storage
–Unknown normalization, errors, etc. require that we simply keep all the raw data.
–Web based display of results
–Visualization…

9 Computational resources
~100 CPU Opportunistic Condor “Flock”
–Not dedicated
–Configuration can change without warning
–No permanent local data storage
–Machines sit on desks.
–“Flocking” with Madison, CS dept, other labs
–Reciprocity can hurt a LOT.
Server farms
–Intel / Alpha
–Hard to find money to buy dedicated machines, esp. on single organism projects.

10 Software and user issues
An intuitive interface to parallel and batch systems gives uninformed users a great deal of power.
–Tools from outside: Poor scalability
–Tools from inside: Poor portability

11 Heuristic algorithms
Many bioinformatics tools are heuristic rather than complete searches.
These searches can return different results on different machines (dynamic thresholds, 32 vs. 64 bit math, …)
How do we tell “different” from “erroneous”?
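The 32 vs. 64 bit point can be made concrete with a deliberately contrived sketch: the same running score, accumulated at two precisions, falls on opposite sides of a cutoff. The scores, the threshold, and the helper names are assumptions chosen for illustration and do not come from any particular search tool; 32-bit arithmetic is emulated by rounding through the standard-library `struct` module.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(scores, single_precision=False):
    """Sum scores, optionally rounding the running total to 32 bits."""
    total = 0.0
    for score in scores:
        total += score
        if single_precision:
            total = to_float32(total)
    return total

# Contrived values: 2**24 + 1 is exact in 64-bit but not in 32-bit floats.
scores = [16777216.0, 1.0]
threshold = 16777217.0

wide = accumulate(scores)
narrow = accumulate(scores, single_precision=True)

print("64-bit total:", wide, "passes:", wide >= threshold)
print("32-bit total:", narrow, "passes:", narrow >= threshold)
```

Neither result is a bug in the usual sense: both machines followed their arithmetic correctly, which is why a disagreement between hit lists is not automatically evidence of an error.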

12 Thank you:
–The Condor team at Madison
–Sanger Center

13 Collaborations are the key
Christopher Dwan cdwan@ahc.umn.edu
Mike Karo mek@ahc.umn.edu
Tim Kunau kunau@ahc.umn.edu

