Published by Jeffrey Martin; modified over 9 years ago
1
source: Alex Szalay
2
Example: Sloan Digital Sky Survey
The SDSS telescope array is systematically mapping ¼ of the entire sky.
Discoveries are made by querying the database, not through a zero-sum wrestling match for telescope time.
Managed by an RDBMS (MS SQL Server), equipped with a hierarchical triangular mesh (HTM) index, among other customizations.
15 TB in the final release in 2007; 818 GB in the RDBMS (13.6B tuples).
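A hierarchical triangular mesh indexes positions on the sky by recursively subdividing the eight spherical triangles of an octahedron into four children each. Every level appends two bits to a triangle ("trixel") ID, so nearby directions tend to share long ID prefixes and a spatial search can become a range query over integers. The sketch below is a minimal illustration under stated assumptions, not SDSS's implementation. The function name `htm_id`, the root numbering 8 to 15, and the child ordering are assumptions chosen for illustration, and the resulting IDs are not guaranteed to match SDSS htmID values.

```python
import math

Vec = tuple[float, float, float]

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def _midpoint(a: Vec, b: Vec) -> Vec:
    # Midpoint of the great-circle arc between two unit vectors.
    return _normalize((a[0] + b[0], a[1] + b[1], a[2] + b[2]))

def _inside(p: Vec, v0: Vec, v1: Vec, v2: Vec, eps: float = 1e-12) -> bool:
    # p lies in the counterclockwise spherical triangle (v0, v1, v2)
    # iff it is on the inner side of all three great-circle edges.
    return (_dot(_cross(v0, v1), p) >= -eps and
            _dot(_cross(v1, v2), p) >= -eps and
            _dot(_cross(v2, v0), p) >= -eps)

# Octahedron vertices; its eight faces are the level-0 trixels.
# Hypothetical numbering: 8-11 southern, 12-15 northern hemisphere.
_V = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
      (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)]
_ROOTS = {
    8: (_V[1], _V[5], _V[2]), 9: (_V[2], _V[5], _V[3]),
    10: (_V[3], _V[5], _V[4]), 11: (_V[4], _V[5], _V[1]),
    12: (_V[1], _V[0], _V[4]), 13: (_V[4], _V[0], _V[3]),
    14: (_V[3], _V[0], _V[2]), 15: (_V[2], _V[0], _V[1]),
}

def htm_id(p: Vec, depth: int) -> int:
    """Return the ID of the trixel containing direction p at the given depth."""
    p = _normalize(p)
    for tid, (v0, v1, v2) in _ROOTS.items():
        if _inside(p, v0, v1, v2):
            break
    else:
        raise ValueError("point not covered by any root triangle")
    for _ in range(depth):
        # Split the current triangle at its edge midpoints into four children.
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for k, (c0, c1, c2) in enumerate(children):
            if _inside(p, c0, c1, c2):
                break
        else:
            raise ValueError("point not covered by any child triangle")
        # Each level appends two bits, so an ID encodes its whole ancestry.
        tid = tid * 4 + k
        v0, v1, v2 = c0, c1, c2
    return tid
```

Because a child's ID is its parent's ID shifted left by two bits plus the child index, all trixels under a given ancestor occupy one contiguous integer range. An ordinary B-tree index on the ID column can exploit that.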
3
source: Alex Szalay
4
Drowning in data; starving for information
Acquisition eventually outpaces analysis.
Medicine: online publishing, digital charts
Astronomy: big telescopes (more in a bit)
Genetics: PCR, shotgun sequencing
Oceanography: ??
Marine microbiology: ??
Empirical X, Analytical X, Computational X, X-informatics
“Increase Data Collection Exponentially in Less Time, with FlowCAM”
5
Cyber-Observatories
Arctic Observing Network (AON)
Ocean Observatories Initiative (OOI)
National Ecological Observatory Network (NEON)
The WATERS Network
The Long-Term Ecological Research (LTER) network
The Geosciences Network (GEON)
EarthScope/Incorporated Research Institutions for Seismology (IRIS)
Virtual Solar-Terrestrial Observatory (VSTO)
Linked Environments for Atmospheric Discovery (LEAD)
6
source: Alex Szalay
7
source: Jim Gray
8
Relational Databases (In Codd We Trust…)
At IBM's San Jose Research Laboratory (the predecessor of IBM Almaden) in the 1960s and 1970s, Ted Codd worked out a formal basis for tabular data representation, organization, and access [1].
The early systems were buggy and slow (and sometimes reviled), but programmers had to write only 5% of the code they previously did.
Key idea: programs that manipulate tabular data exhibit an algebraic structure. Codd proposed a relational algebra to manipulate these data sets in their logical form, independently of their physical representation.
[1] E. F. Codd, “A Relational Model of Data for Large Shared Data Banks”, Communications of the ACM 13(6), pp. 377-387, 1970.
Figure labels: physical data independence; logical data independence
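The algebraic structure Codd identified can be made concrete with a toy implementation. Relations are treated as collections of rows, and three core operators (selection σ, projection π, and natural join ⋈) compose into queries. Because the operators obey algebraic laws, a query can be rewritten into an equivalent, cheaper form. Pushing a selection below a join is the classic example, and the last line below checks it. This is a minimal in-memory sketch assuming rows are Python dicts with uniform keys. The function names and the `objects`/`photometry` tables are hypothetical and are not taken from SDSS or any real DBMS.

```python
from typing import Callable

Row = dict[str, object]
Relation = list[Row]

def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """σ: keep the rows that satisfy a predicate."""
    return [row for row in rel if pred(row)]

def project(rel: Relation, attrs: list[str]) -> Relation:
    """π: keep only the named attributes, dropping duplicate rows (set semantics)."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(left: Relation, right: Relation) -> Relation:
    """⋈: combine rows that agree on every attribute the relations share."""
    if not left or not right:
        return []
    # Assumes every row in a relation has the same attribute names.
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

# Hypothetical example tables.
objects = [{"obj_id": 1, "type": "star"},
           {"obj_id": 2, "type": "galaxy"},
           {"obj_id": 3, "type": "star"}]
photometry = [{"obj_id": 1, "mag": 17.2},
              {"obj_id": 2, "mag": 19.0},
              {"obj_id": 3, "mag": 21.5}]

# "IDs of stars brighter than magnitude 20", written purely in the algebra.
bright_stars = project(
    select(natural_join(objects, photometry),
           lambda r: r["type"] == "star" and r["mag"] < 20.0),
    ["obj_id"])
print(bright_stars)  # → [{'obj_id': 1}]

# An algebraic rewrite rule: a selection on one input can be pushed below the join.
is_star = lambda r: r["type"] == "star"
assert select(natural_join(objects, photometry), is_star) == \
       natural_join(select(objects, is_star), photometry)
```

Nothing in the query says how rows are stored or which join algorithm runs. A real optimizer is free to pick indexes and evaluation order, which is the data-independence point of the slide.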
9
source: Raghu Ramakrishnan
10
Characteristics of Cloud Computing
Virtual – Physical location and underlying infrastructure details are transparent to users.
Scalable – Able to break complex workloads into pieces to be served across an incrementally expandable infrastructure.
Efficient – “Service-Oriented Architecture” for dynamic provisioning of shared compute resources.
Flexible – Can serve a variety of workload types, both consumer and commercial.
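The "Scalable" point describes a scatter/gather pattern: split a workload into independent pieces, process the pieces on however many workers are available, and combine the partial results. The sketch below shows that structure with a local thread pool standing in for an expandable pool of machines, which is an assumption for illustration. The helper names `partition` and `scatter_gather` are hypothetical. In CPython, threads do not speed up CPU-bound Python code, so the point here is the decomposition, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def partition(items: list[T], n: int) -> list[list[T]]:
    """Split items into at most n contiguous, roughly equal chunks."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def scatter_gather(items: Iterable[T],
                   work: Callable[[list[T]], R],
                   combine: Callable[[list[R]], R],
                   workers: int) -> R:
    """Run `work` on each chunk in parallel, then merge results with `combine`."""
    chunks = partition(list(items), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))
    return combine(partials)

# Adding workers changes how the job is split, not its answer.
for w in (1, 2, 4, 8):
    assert scatter_gather(range(1000), sum, sum, workers=w) == 499500
```

The pattern works only when `combine` can merge partial results regardless of how the input was cut, as with sums or counts. That property is what allows the infrastructure to grow incrementally without changing the program.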
11
Cloud Computing as Hosted Data Management Services
Yahoo:
  Distributed Hash Tables: key/value pairs
  Distributed Ordered Tables: ordered ranges
  PNUTS: relational-style storage, indexing and query
Amazon:
  S3: simple storage
  SimpleDB: quasi-relational features
Google: APIs for storage, visualization, document processing, images, mail
Microsoft: CloudDB: relational-style features
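The distinction between distributed hash tables and distributed ordered tables is about how keys are assigned to partitions. Hashing spreads keys evenly and makes point lookups cheap, but it destroys key order, so a range scan has to visit every partition. Range partitioning keeps keys sorted within contiguous ranges, so a scan touches only the overlapping partitions and returns results already in order. The following is a minimal in-memory sketch of that trade-off. The class names, the SHA-1 placement, and the fixed boundary keys are assumptions for illustration and are not Yahoo's (or PNUTS's) actual APIs.

```python
import bisect
import hashlib

class HashPartitionedTable:
    """Key/value store spread over partitions by a hash of the key."""

    def __init__(self, n_partitions: int):
        self.parts = [dict() for _ in range(n_partitions)]

    def _part(self, key: str) -> dict:
        # A stable hash (not Python's salted hash()) so placement is reproducible.
        h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
        return self.parts[h % len(self.parts)]

    def put(self, key, value):
        self._part(key)[key] = value

    def get(self, key):
        return self._part(key).get(key)

    def scan(self, lo, hi):
        # Hashing destroys key order: visit every partition, then sort.
        return sorted((k, v) for p in self.parts for k, v in p.items() if lo <= k < hi)

class RangePartitionedTable:
    """Ordered table split into contiguous key ranges at fixed boundary keys."""

    def __init__(self, boundaries: list[str]):
        self.boundaries = sorted(boundaries)
        # One (sorted keys, values) pair per key range.
        self.parts = [([], []) for _ in range(len(self.boundaries) + 1)]

    def _index(self, key: str) -> int:
        return bisect.bisect_right(self.boundaries, key)

    def put(self, key, value):
        keys, vals = self.parts[self._index(key)]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            vals[i] = value
        else:
            keys.insert(i, key)
            vals.insert(i, value)

    def get(self, key):
        keys, vals = self.parts[self._index(key)]
        i = bisect.bisect_left(keys, key)
        return vals[i] if i < len(keys) and keys[i] == key else None

    def scan(self, lo, hi):
        # Only ranges overlapping [lo, hi) are touched; output is in key order.
        out = []
        for p in range(self._index(lo), self._index(hi) + 1):
            keys, vals = self.parts[p]
            i, j = bisect.bisect_left(keys, lo), bisect.bisect_left(keys, hi)
            out.extend(zip(keys[i:j], vals[i:j]))
        return out

fruits = ["apple", "banana", "grape", "kiwi", "mango", "orange", "pear"]
hashed, ranged = HashPartitionedTable(4), RangePartitionedTable(["g", "n"])
for f in fruits:
    hashed.put(f, len(f))
    ranged.put(f, len(f))
print(ranged.scan("b", "m"))  # → [('banana', 6), ('grape', 5), ('kiwi', 4)]
```

Both tables return the same answers. The difference is how many partitions each query has to touch, which is what matters once partitions live on different machines.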
12
Workflow at CMOP (diagram; labels recovered from the slide)
Stages: Cloning/cDNA/…, sequencing plates, inspection, cleaning (e.g., trim bad reads at the end), BLAST, post processing, link shared knowledge, analyze synopsis
Data products: FASTA files, hit tables, hit tables + metadata (in the cloud)
Sites: OHSU, Washington University, PNNL
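The cleaning stage ("trim bad reads at the end") can be sketched as a small FASTA filter. FASTA carries no per-base quality scores, so this sketch approximates "bad" as trailing ambiguous `N` bases. It also drops reads that become shorter than a minimum length. The trimming rule, the `min_length` default of 50, and the function names are hypothetical assumptions. The actual CMOP pipeline likely used sequencer quality data, which the slide does not describe.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # ignore stray sequence lines before any header
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def clean_reads(records: list[tuple[str, str]], min_length: int = 50,
                bad_bases: str = "Nn") -> list[tuple[str, str]]:
    """Trim trailing ambiguous bases; drop reads that end up too short (hypothetical rule)."""
    cleaned = []
    for header, seq in records:
        trimmed = seq.rstrip(bad_bases)
        if len(trimmed) >= min_length:
            cleaned.append((header, trimmed))
    return cleaned

def write_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

raw = ">read1\nACGTACGT\nACGTNNNN\n>read2\nNNNNNNNN\n"
cleaned = clean_reads(read_fasta(raw), min_length=8)
print(write_fasta(cleaned), end="")  # prints ">read1" then "ACGTACGTACGT"
```

Keeping parsing, cleaning, and serialization as separate functions lets the cleaning rule be swapped (for example, for quality-based trimming) without touching the I/O.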