Source: Alex Szalay. Example: Sloan Digital Sky Survey The SDSS telescope array is systematically mapping ¼ of the entire sky Discoveries are made by.

Presentation transcript:

1 source: Alex Szalay

2 Example: Sloan Digital Sky Survey
- The SDSS telescope array is systematically mapping ¼ of the entire sky
- Discoveries are made by querying the database, not through a zero-sum wrestling match for telescope time
- Managed by an RDBMS (MS SQL Server), equipped with a hierarchical triangular mesh index, among other customizations
- 15 TB in the final release in 2007
- 818 GB in the RDBMS (13.6B tuples)
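To make "discoveries are made by querying the database" concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the `objects` table, its column names, the sample rows, and the helper `bright_objects_in_box` are hypothetical and are not the SDSS schema. An ordinary B-tree index on (ra_deg, dec_deg) stands in for the hierarchical triangular mesh index, which it does not reproduce.

```python
import sqlite3

# Toy catalog. Table name, columns, and rows are illustrative only,
# not the real SDSS schema or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    "obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, r_mag REAL)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [
        (1, 180.10, 0.20, 17.5),
        (2, 180.40, 0.35, 21.2),
        (3, 181.90, -0.10, 16.8),
        (4, 180.25, 0.05, 19.9),
    ],
)
# A plain B-tree index as a stand-in for a real spatial index.
conn.execute("CREATE INDEX idx_position ON objects (ra_deg, dec_deg)")


def bright_objects_in_box(conn, ra_min, ra_max, dec_min, dec_max, max_mag):
    """Return ids of objects inside a sky box that are brighter than max_mag.

    Smaller magnitude means brighter, so the filter is r_mag < max_mag.
    """
    rows = conn.execute(
        "SELECT obj_id FROM objects "
        "WHERE ra_deg BETWEEN ? AND ? "
        "AND dec_deg BETWEEN ? AND ? "
        "AND r_mag < ? "
        "ORDER BY obj_id",
        (ra_min, ra_max, dec_min, dec_max, max_mag),
    ).fetchall()
    return [row[0] for row in rows]


print(bright_objects_in_box(conn, 180.0, 180.5, 0.0, 0.5, 20.0))
```

The question is expressed declaratively; which index, if any, is used to answer it is left to the database engine.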

3 source: Alex Szalay

4 Drowning in data; starving for information
Acquisition eventually outpaces analysis
- Medicine: Online publishing, digital charts
- Astronomy: Big telescopes (more in a bit)
- Genetics: PCR, Shotgun Sequencing
- Oceanography: ??
- Marine Microbiology: ??
Empirical X → Analytical X → Computational X → X-informatics
“Increase Data Collection Exponentially in Less Time, with FlowCAM”

5 Cyber-Observatories
- Arctic Observing Network (AON)
- Ocean Observatories Initiative (OOI)
- National Ecological Observatory Network (NEON)
- The Waters Network
- The Long-Term Ecological Research (LTER) network
- The Geosciences Network (GEON)
- Earthscope/Incorporated Research Institutions for Seismology (IRIS)
- Virtual Solar-Terrestrial Observatory (VSTO)
- Linked Environments for Atmospheric Discovery (LEAD)

6 source: Alex Szalay

7 source: Jim Gray

8 Relational Databases (In Codd we Trust…)
At IBM Almaden in the 60s and 70s, Ted Codd worked out a formal basis for tabular data representation, organization, and access [1].
The early systems were buggy and slow (and sometimes reviled), but programmers only had to write 5% of the code they previously did.
Key Idea: Programs that manipulate tabular data exhibit an algebraic structure; Codd proposed a relational algebra to manipulate these data sets in their logical form, independently of their physical representation.
- physical data independence
- logical data independence
[1] E. F. Codd, “A Relational Model of Data for Large Shared Data Banks”, Communications of the ACM 13(6), pp 377-387, 1970
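The algebraic structure mentioned on this slide can be sketched in a few lines of Python. This is an illustrative toy, not Codd's formalism or any database's implementation: relations are modeled as lists of dicts, and the function names `select`, `project`, and `natural_join`, along with the sample data, are assumptions made for the example.

```python
def select(relation, predicate):
    """Selection: keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: pair rows that agree on every shared attribute."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    result = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


# Hypothetical sample relations.
observations = [
    {"obj_id": 1, "band": "r", "mag": 17.5},
    {"obj_id": 2, "band": "r", "mag": 21.2},
    {"obj_id": 1, "band": "g", "mag": 18.1},
]
bands = [
    {"band": "r", "color": "red"},
    {"band": "g", "color": "green"},
]

# Each operator takes relations and returns a relation, so they compose.
answer = project(
    select(natural_join(observations, bands), lambda row: row["mag"] < 20),
    ["obj_id", "color"],
)
print(answer)
```

The composed expression refers only to attribute names and values. Nothing in it depends on how the rows are stored or ordered, which is the intuition behind physical data independence.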

9 source: Raghu Ramakrishnan

10 Characteristics of Cloud Computing
- Virtual – Physical location and underlying infrastructure details are transparent to users
- Scalable – Able to break complex workloads into pieces to be served across an incrementally expandable infrastructure
- Efficient – “Services Oriented Architecture” for dynamic provisioning of shared compute resources
- Flexible – Can serve a variety of workload types – both consumer and commercial

11 Cloud Computing as Hosted Data Management Services
Yahoo
- Yahoo Distributed Hash Tables: Key/value pairs
- Yahoo Distributed Ordered Tables: Ordered ranges
- PNUTS: Relational-style storage, indexing and query
Amazon
- S3: Simple Storage
- SimpleDB: Quasi-Relational features
Google
- APIs for: Storage, Visualization, Document processing, Images, Mail
Microsoft
- CloudDB: Relational-style features
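The difference between a key/value store and an ordered table, as listed for Yahoo above, can be sketched in a single process. This is a hedged illustration only: the class `OrderedTable`, its methods, and the sample keys are hypothetical, nothing here is distributed, and it does not represent any Yahoo API. A plain Python dict plays the role of the hash table.

```python
import bisect


class OrderedTable:
    """Toy ordered table: point lookups plus range scans over sorted keys."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


# Hash-table style: only exact-key lookups are cheap.
hash_table = {"read:003": "ACGT", "read:001": "TTGA"}
print(hash_table.get("read:001"))

# Ordered-table style: the same lookups, plus efficient range scans.
table = OrderedTable()
for key, value in [
    ("read:003", "ACGT"),
    ("read:001", "TTGA"),
    ("read:010", "GGCC"),
    ("hit:002", "score=42"),
]:
    table.put(key, value)

print(table.scan("read:000", "read:005"))
```

A hash table answers "what is stored under this key?", while an ordered table can also answer "what is stored between these two keys?" without examining every entry.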

12 Workflow at CMOP
[Workflow diagram. Labels: Cloning/cDNA/…; Sequencing plates; Inspection; FASTA files; OHSU; Washington University; BLAST; FASTA files; PNNL; Post processing; Hit tables; Cleaning (e.g., trim bad reads at the end); Link; Shared Knowledge; Analyze; synopsis; Cloud; Hit tables + metadata]

