Non-Traditional Databases. Reading 1. Scientific data management at the Johns Hopkins institute for data intensive engineering and science Yanif Ahmad,


1 Non-Traditional Databases

2 Reading
1. Scientific data management at the Johns Hopkins institute for data intensive engineering and science. Yanif Ahmad, Randal Burns, Michael Kazhdan, Charles Meneveau, Alex Szalay, Andreas Terzis. SIGMOD Record, Volume 39, Issue 3, February 2011. http://dl.acm.org/citation.cfm?id=1942776.1942782&coll=DL&dl=ACM&CFID=66206057&CFTOKEN=48992457
2. Migrating a (large) science database to the cloud. Ani Thakar, Alex Szalay. HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, June 2010. http://dl.acm.org/citation.cfm?id=1851539&bnc=1
Farkas, CSCE 824 - Spring 2011

3 Reading
3. "One Size Fits All": An Idea Whose Time Has Come and Gone. M. Stonebraker, U. Cetintemel. ICDE '05: Proceedings of the 21st International Conference on Data Engineering, IEEE Computer Society, Washington, DC, USA, 2005. http://www.computer.org/portal/web/csdl/abs/proceedings/icde/2005/2285/00/22850002abs.htm

4 Traditional Database Management Systems
Focus on business data management
Provide uniform capabilities regardless of the data characteristics
Need: capabilities to meet new application requirements

5 Examples of New Needs
Stream data processing
Large-scale scientific databases
Data warehousing

6 Streaming Data
Sensor-based applications
– Real-time systems: sophisticated alerting, location-based services
– Historical data
Financial applications
– Support for applications such as electronic trading, legal compliance, real-time market analysis, etc.
Performance requirements

7 Performance: SDMS vs. RDMS
Empirical results (see reference paper #3)
Issues:
– Inbound processing model
– Correct primitives for stream processing (aggregates, “timeout,” “slack”)
– Seamless integration of DBMS processing with application processing (client-server vs. embedded applications)
– Transactional behavior (weaker notion of recovery, tolerance, no ACID requirements)
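The following minimal Python sketch makes the "timeout" and "slack" primitives concrete with a tumbling-window average. It is loosely modeled on the windowed stream operators discussed in reference paper #3, but it is not taken from that paper. The class name TumblingAverage is hypothetical. Measuring slack in event-time units and timeout in processing-time units is our own assumption. The sketch behaves as follows:
– A window closes once a tuple arrives whose timestamp is at least `slack` units past the window's end.
– A tuple whose window has already been emitted is dropped.
– tick() flushes every open window once no tuple has arrived for `timeout` units.

```python
from collections import defaultdict


class TumblingAverage:
    """Tumbling-window average with 'slack' and 'timeout' (illustrative sketch)."""

    def __init__(self, size, slack, timeout):
        self.size = size        # window length, in event-time units
        self.slack = slack      # tolerated disorder, in event-time units (assumption)
        self.timeout = timeout  # idle processing time before open windows are flushed
        self.windows = defaultdict(list)   # window start -> buffered values
        self.max_ts = None                 # largest event timestamp seen so far
        self.closed_until = float("-inf")  # end of the latest emitted window
        self.last_arrival = None           # processing time of the latest tuple

    def _close_ready(self, force=False):
        """Emit (window_start, average) for every window that may be closed."""
        out = []
        for start in sorted(self.windows):
            end = start + self.size
            if force or end + self.slack <= self.max_ts:
                values = self.windows.pop(start)
                out.append((start, sum(values) / len(values)))
                self.closed_until = max(self.closed_until, end)
        return out

    def push(self, ts, value, now):
        """Add one tuple with event time ts, observed at processing time now."""
        self.last_arrival = now
        start = ts - ts % self.size
        if start < self.closed_until:
            return []  # arrived after its window was emitted: dropped
        self.windows[start].append(value)
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        return self._close_ready()

    def tick(self, now):
        """Flush all open windows if no tuple has arrived for `timeout` units."""
        if self.last_arrival is not None and now - self.last_arrival >= self.timeout:
            return self._close_ready(force=True)
        return []
```

Example: with size=10 and slack=2, a tuple stamped 8 that arrives after one stamped 11 still lands in window [0, 10). That window is emitted only when a tuple stamped 12 or later shows up. The operator never blocks indefinitely waiting for stragglers, which is the kind of weaker, non-ACID behavior this slide contrasts with a traditional DBMS.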

8 Security for Streaming Data?
How do the security needs of streaming data differ from those of traditional (e.g., relational) data?
How can security be enforced?
– Security punctuation
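Security punctuations are access-control policy markers embedded in the stream itself, so the rules travel with the data instead of living only in a catalog. The Python sketch below is greatly simplified and hypothetical. Published proposals give a punctuation richer fields, for example which tuples it describes and a timestamp. Here a punctuation carries only a set of roles. The names SecurityPunctuation, DataTuple, and enforce are our own. In this sketch, each punctuation replaces the previous policy for all data tuples that follow it. Tuples seen before any punctuation are denied by default.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Union


@dataclass(frozen=True)
class SecurityPunctuation:
    """Hypothetical policy marker: applies to every data tuple after it."""
    allowed_roles: FrozenSet[str]


@dataclass(frozen=True)
class DataTuple:
    """Hypothetical sensor reading carried on the stream."""
    sensor_id: str
    value: float


StreamItem = Union[SecurityPunctuation, DataTuple]


def enforce(stream: Iterable[StreamItem], role: str) -> Iterator[DataTuple]:
    """Yield only the data tuples that the most recent punctuation grants to role."""
    allowed = False  # deny by default until a punctuation grants access
    for item in stream:
        if isinstance(item, SecurityPunctuation):
            allowed = role in item.allowed_roles  # new policy replaces the old one
        elif allowed:
            yield item
```

Because enforcement happens inline as items flow past, the filter needs no stored copy of the stream. This matters for data that is processed once and then discarded.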

9 Scientific Databases
Massive amounts of data
Heterogeneous data
– Sensor data, satellite data, scientific simulation data, etc.
Goal: better understanding of physical phenomena
– Genomic databases, geological exploration, astronomy, etc.

10 Scientific Databases
Need efficient analysis and querying capabilities
– Multi-dimensional indexing (e.g., genomic sequence indexing)
– Specific applications (e.g., visualization of seismic data)
– Specific aggregations (e.g., data mining for biological correlation)
– Efficient data archiving, staging, lineage, and error propagation techniques
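As one concrete flavor of genomic sequence indexing, here is a minimal Python sketch of a k-mer (q-gram) inverted index. Every length-k substring is mapped to the positions where it occurs. An exact-match query then looks up the pattern's first k-mer to get candidate positions and verifies each candidate. This is a standard textbook technique chosen for illustration, not a method described on the slides. The function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every length-k substring to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return dict(index)


def find_exact(index, sequences, pattern, k):
    """Exact matches of pattern: candidates from its first k-mer, then verified."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k characters long")
    return [
        (seq_id, pos)
        for seq_id, pos in index.get(pattern[:k], [])
        if sequences[seq_id].startswith(pattern, pos)
    ]
```

For example, on {"chr1": "ACGTACGTTG", "chr2": "TTACGTA"} with k=3, the k-mer "ACG" has three candidate positions. Only two of them extend to "ACGTA". The index avoids scanning every sequence at query time, at the cost of storing one entry per position.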

11 Example: Scientific Data Management
Reference #1
Basic research:
1. formation of hypotheses and theories
2. designing experiments for their validation
3. collecting data by experimentation
4. analyzing data to guide new insights for further research

12 Scientific Computing
Steps 3 and 4 are data-intensive
Need to improve computational power
– Parallel processing
– Grid and supercomputers
– Special application logic
– Preservation of scientific data

13 Current Technologies and Scientific Databases
Reference #2: How can a large-scale scientific database be migrated to a cloud environment?
Difficult engineering process
Limited capabilities available to the database user
Based on a commercial cloud

14 Data Warehousing
Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
– Data mart (single subject area)
– Enterprise data warehouse (integrated data marts)
– Metadata

15 Data Warehousing
Difference between OLTP and OLAP
Data management: updates, indexing, dependencies, etc.
OLAP needs read-optimized storage
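One common form of read-optimized storage for OLAP is a column store, which keeps each attribute contiguous so an analytical query reads only the columns it references. The Python sketch below contrasts row and column layouts on a single aggregate query. It uses a deliberately simple cost model that is our own assumption: a row layout touches every field of every tuple, while a column layout touches only the referenced columns. The schema and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact-table schema, for illustration only.
SCHEMA = ("order_id", "region", "product", "quantity", "price")


def to_columns(rows):
    """Transpose row tuples into one list per attribute (column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(SCHEMA)}


def revenue_by_region_rows(rows):
    """SUM(quantity * price) GROUP BY region over a row layout."""
    totals, cells_read = defaultdict(float), 0
    for row in rows:
        cells_read += len(row)  # cost model: the whole tuple is read
        totals[row[1]] += row[3] * row[4]
    return dict(totals), cells_read


def revenue_by_region_columns(cols):
    """Same query over a column layout: only three columns are scanned."""
    totals, cells_read = defaultdict(float), 0
    for region, qty, price in zip(cols["region"], cols["quantity"], cols["price"]):
        cells_read += 3  # cost model: one cell from each referenced column
        totals[region] += qty * price
    return dict(totals), cells_read
```

Consider a three-row table such as [(1, "east", "bolt", 2, 10.0), (2, "west", "nut", 1, 5.0), (3, "east", "gear", 3, 1.0)]. Both functions return the same totals, but the row version touches 15 cells and the column version touches 9. The gap grows with the number of attributes a query does not use. Updates, by contrast, must touch every column, which is why this layout suits read-mostly OLAP rather than OLTP.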

16 Next Class: Geographical Databases

