Distributed Protein Structure Analysis By Jeremy S. Brown and Travis E. Brown


The Problem  An exhaustive search of proteins against a known database  Each string is between 400 and 600 characters long  Comparing 10,000 strings against 1,000,000 random strings would take 44 days on a 1 GHz processor
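
To put the 44-day figure in perspective, the short calculation below derives the comparison rate it implies. Only the three numbers from the slide are used; the function name is illustrative, and the result is an implied average, not a measured benchmark.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_comparison_rate(queries, database_size, days):
    """Return (total comparisons, comparisons per second) for an exhaustive search."""
    total = queries * database_size      # every query is compared with every entry
    seconds = days * SECONDS_PER_DAY
    return total, total / seconds


total, rate = implied_comparison_rate(10_000, 1_000_000, 44)
print(f"{total:.1e} comparisons, about {rate:,.0f} per second")
```

The search amounts to 10^10 string comparisons, so 44 days corresponds to roughly 2,600 comparisons per second, or about 380 microseconds per comparison of two 400 to 600 character strings.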

Why an exhaustive search?  The initial intent was to analyze proteins to determine their three-dimensional structure  An exhaustive search is required to ensure that the match found is the best match

Solution  Distribute the search among many PCs to obtain an answer faster  Distribution raises new problems, however

How to Distribute  Distributing the search strings is not enough  The search space must also be distributed  Must find an efficient way to distribute 1 GB of data without duplication
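
One simple way to split a search space without duplication is to cut the record indices into contiguous, disjoint ranges. The sketch below is an assumption about how such a split could look, not the scheme the authors used; `partition` and its parameters are hypothetical names.

```python
def partition(num_records, num_clients):
    """Split record indices 0..num_records-1 into contiguous, disjoint ranges."""
    base, extra = divmod(num_records, num_clients)
    ranges = []
    start = 0
    for i in range(num_clients):
        # The first `extra` clients take one additional record each,
        # so range sizes differ by at most one.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


for client, records in enumerate(partition(1_000_000, 3)):
    print(client, records.start, records.stop)
```

Because each range begins where the previous one ends, every record is sent to exactly one client and the union of the ranges covers the whole database.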

Program Details  Client/server architecture  Uses a proprietary protocol over TCP/IP to distribute data  Server uses an SQL database to store a list of ‘jobs’ and ‘known’ sequences

Program Details  Server issues a single ‘job’ to each client upon request  Client may also request a batch of data for comparison.  Server marks which data has been sent to clients and avoids resending that data to new clients
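
The server-side bookkeeping described above can be sketched in memory as follows. This is a minimal illustration under stated assumptions: the real server kept this state in an SQL database and spoke a proprietary protocol, neither of which is shown in the slides, and `JobServer`, `request_job`, and `request_batch` are hypothetical names.

```python
from collections import deque


class JobServer:
    """Hands out jobs and data batches, never sending a batch twice."""

    def __init__(self, jobs, batches):
        self._jobs = deque(jobs)        # search strings waiting for a client
        self._unsent = deque(batches)   # database batches not yet distributed
        self.sent_to = {}               # batch id -> client that received it

    def request_job(self, client_id):
        """Issue a single job per request, or None when none remain."""
        return self._jobs.popleft() if self._jobs else None

    def request_batch(self, client_id):
        """Issue the next unsent batch and record who received it."""
        if not self._unsent:
            return None
        batch_id = self._unsent.popleft()
        self.sent_to[batch_id] = client_id
        return batch_id


server = JobServer(jobs=["query-1", "query-2"], batches=[0, 1, 2])
print(server.request_job("client-a"), server.request_batch("client-a"))
print(server.request_batch("client-b"), server.sent_to)
```

Recording the recipient of each batch is what lets the server skip data that is already held by some client when a new client connects.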

The Server

The Client

Problems  Server is slow at updating its database  This cost is incurred only once per client, however

Performance Analysis  1 client = 44 days (best algorithm)  2 clients = 26 days  3 clients = 19 days  Adding clients reduces run time, with speedup growing almost linearly, though distribution is expensive
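
The speedup and parallel efficiency implied by these three run times can be computed directly. The sketch uses only the figures from the slide; the function name is illustrative.

```python
def speedup_and_efficiency(days_by_clients):
    """Map client count -> (speedup over one client, efficiency per client)."""
    baseline = days_by_clients[1]
    result = {}
    for n, days in sorted(days_by_clients.items()):
        speedup = baseline / days
        result[n] = (speedup, speedup / n)
    return result


measured = {1: 44, 2: 26, 3: 19}
for n, (s, e) in speedup_and_efficiency(measured).items():
    print(f"{n} client(s): speedup {s:.2f}x, efficiency {e:.0%}")
```

Two clients give a speedup of about 1.69x (roughly 85% efficiency) and three give about 2.32x (roughly 77%), so the scaling is close to linear but loses some ground to distribution cost.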

Graphs

Notes about the graphs  Graphs do not include the initial distribution, since this is done only once per client  If search data distribution were included, efficiency would start at about 70% and increase to ~90% over time

Verifying Data Accuracy  Add an entry into the search table with a known score  Ensure that the result returned by the client is the known entry in the database
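
The check described above can be sketched as a small self-test: plant an entry whose score is known in advance and confirm that the search returns it. The scoring function here is a placeholder (a count of matching positions), not the comparison algorithm from the talk, and all names are hypothetical.

```python
def score(query, candidate):
    # Placeholder scoring (an assumption): number of positions that match.
    return sum(a == b for a, b in zip(query, candidate))


def best_match(query, database):
    """Exhaustively score every entry and return the highest-scoring one."""
    return max(database, key=lambda entry: score(query, entry))


def verify_with_planted_entry(query, database):
    """Plant a copy of the query, whose score is known to be len(query)."""
    planted = query
    augmented = list(database) + [planted]
    result = best_match(query, augmented)
    return result == planted and score(query, result) == len(query)


print(verify_with_planted_entry("ACDEFG", ["ACDXXX", "GGGGGG"]))
```

If the search ever returns something other than the planted entry, either the comparison or the distribution of data has lost or corrupted a record.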

Lessons Learned  Reading and writing single elements to an SQL database can be very expensive.  Even the best designs aren’t perfect, especially when the problem is not fully understood.
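
The first lesson can be illustrated by contrasting per-row writes with a single batched write. The slides do not say which database or schema was used, so `sqlite3`, the `batches` table, and both function names are stand-ins chosen for a self-contained sketch.

```python
import sqlite3


def mark_sent_one_by_one(conn, batch_ids, client_id):
    """One statement and one commit per row: the costly pattern."""
    for batch_id in batch_ids:
        conn.execute(
            "UPDATE batches SET sent_to = ? WHERE id = ?", (client_id, batch_id)
        )
        conn.commit()


def mark_sent_in_bulk(conn, batch_ids, client_id):
    """All rows in a single transaction, committed once on success."""
    with conn:
        conn.executemany(
            "UPDATE batches SET sent_to = ? WHERE id = ?",
            [(client_id, batch_id) for batch_id in batch_ids],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batches (id INTEGER PRIMARY KEY, sent_to TEXT)")
conn.executemany("INSERT INTO batches (id) VALUES (?)", [(i,) for i in range(1000)])
conn.commit()

mark_sent_in_bulk(conn, range(500), "client-a")
mark_sent_one_by_one(conn, range(500, 1000), "client-b")
```

Both functions leave the table in the same state; the difference is the number of round trips and commits, which is where per-element access to a database typically becomes expensive.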

Notes  Biggest problem was distribution of data  Distribution was very costly, so we tried to reuse data that we had already distributed  Program is pluggable, so any comparison algorithm can be used
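
A pluggable design of this kind can be sketched by passing the comparison algorithm to the search as a function. The slides do not show the actual plug-in interface, so the `Comparator` type, `exhaustive_search`, and the two example comparators below are assumptions for illustration.

```python
import difflib
from typing import Callable, Iterable, Optional, Tuple

# Any function that scores two strings (higher is better) can be plugged in.
Comparator = Callable[[str, str], float]


def matching_positions(a: str, b: str) -> float:
    """Count the positions at which the two strings agree."""
    return float(sum(x == y for x, y in zip(a, b)))


def similarity_ratio(a: str, b: str) -> float:
    """Standard-library similarity in [0, 1] from difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def exhaustive_search(
    query: str, candidates: Iterable[str], compare: Comparator
) -> Tuple[Optional[str], float]:
    """Score every candidate with `compare` and keep the best one."""
    best, best_score = None, float("-inf")
    for candidate in candidates:
        current = compare(query, candidate)
        if current > best_score:
            best, best_score = candidate, current
    return best, best_score


database = ["ACXX", "ACDE", "XXXX"]
print(exhaustive_search("ACDE", database, matching_positions))
print(exhaustive_search("ACDE", database, similarity_ratio))
```

The search loop and the data distribution stay the same regardless of which comparator is supplied, which is what makes swapping algorithms cheap.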

Questions/Comments