Published by Andrew Walker. Modified over 9 years ago.
1
FROM NOSQL TO NEWSQL TO IMMUTABLE DATABASES
Prof. Stefan Keller, Geometa Lab, University of Applied Sciences Rapperswil (Switzerland)
19th AGILE, Helsinki, June 14, 2016
Pre-Conference Workshop "GIS with NoSQL"
2
From NoSQL to NewSQL to Immutable Databases
  1. NoSQL
  2. NewSQL
  3. What is special about spatial?
  4. Immutable databases
  5. Use Cases
June 12, 2016: From NoSQL to NewSQL to Immutable Databases
3
Disclaimer
Full Professor for Information Systems
  Background in Geography and Computer Science
University of Applied Sciences Rapperswil HSR (Switzerland) since 2000
  Department of Informatics
  Lectures on Databases and Geoinformation Systems
Geometa Lab at the Institute for Software at HSR
  Geoinformation Systems (GIS), Data Curation, Open Data (OpenStreetMap), Data Engineering, Geo-Visualization
Disclaimer: I'm ...
  the local organiser of the Swiss Postgres Conf. in Rapperswil (pgday.ch)
  member no. 7 of the Swiss PostgreSQL Users Group
4
Reasons for NoSQL
Demands from Web 2.0 and Big Data
  Huge amounts of data ("Big Data") to store
  High availability
  Rapid software development, cutting costs (open source)
  In some use cases many write operations, in others many read(-only) operations
Definition of "store": a database management system in a broader sense
Definitions of Big Data: the 4 V's (IBM, Zikopoulos & Eaton 2011)
  Volume: scale of data
  Variety: different forms of data
  Velocity: analysis of streaming data
  Veracity: uncertainty of data
5
Looking at Contemporary GI Research
Geo-Visualization incl. decision making
Crowd-Sourced Data
  Tracks
  Social Media
  Volunteered Geographic Information (OpenStreetMap)
Just Big Geo Data
  Imagery
  Spatio-Temporal Data
  OpenStreetMap
6
Problems and Solutions
Problems of RDBMS on the logical layer:
  Impedance mismatch: O/R mapping
  Schema changes
Problems of RDBMS on the physical layer:
  Availability
  Scaling (scale up)
Solution approaches
  Scaling out
  Distribute
  Parallelize
  Simple to use for developers
7
NoSQL Databases: Name and Categories
A better name for NoSQL would be "Not only SQL"
Categories:
  1. Column Oriented Stores
  2. Graph Stores
  3. Key/Value Stores
  4. Document Stores
  5. Others
Not mentioned in the common NoSQL literature:
  Object Oriented Stores (zoodb)
  Array Stores
  In-Memory Stores
8
1. Column Oriented Stores
Column Oriented Databases, Wide Column Stores
  Each key is associated with multiple attributes (i.e. columns)
  Nice properties for compression and indexing
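Why a column layout tends to compress well can be shown with a small sketch. It is only an illustration: the sample rows and the helper name `run_length_encode` are hypothetical, and run-length encoding stands in for whatever compression a real column store applies.

```python
from itertools import groupby

# Hypothetical sample records, stored row by row.
rows = [
    {"id": 1, "country": "CH", "kind": "city"},
    {"id": 2, "country": "CH", "kind": "city"},
    {"id": 3, "country": "CH", "kind": "lake"},
    {"id": 4, "country": "FI", "kind": "lake"},
]

# Column layout: one list per attribute instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_country = run_length_encode(columns["country"])
```

Values of one attribute sit next to each other, so repeated values form runs that a simple encoder can collapse.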
9
2. Graph Databases
Inspired by Euler, Graph Theory: G = (E,V)
Relationship as a first class object
Nice fit for Semantic Nets (RDF, Property Graph):
  Nodes (e.g. Person) and
  Relationships (e.g. lives, likes, owns, ...)
  Properties on both
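A property graph of this kind can be sketched with plain Python structures. The node ids, property values and the helper `targets` are hypothetical and chosen for illustration; real graph stores offer their own query languages.

```python
# Nodes carry a label and properties.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "rapperswil": {"label": "Place", "name": "Rapperswil"},
    "bike": {"label": "Thing", "name": "Bike"},
}

# Relationships are first-class: (source, type, target, properties).
relationships = [
    ("alice", "lives", "rapperswil", {"since": 2000}),
    ("alice", "owns", "bike", {}),
    ("alice", "likes", "rapperswil", {}),
]


def targets(source, rel_type):
    """Return the ids of all nodes reached from `source` via `rel_type`."""
    return [dst for src, rel, dst, _props in relationships
            if src == source and rel == rel_type]


home = [nodes[node_id]["name"] for node_id in targets("alice", "lives")]
```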
10
3. Key/Value Databases
Key/Value Pairs
  Key-value pairs (KVP), dictionary or associative array, map, hash
  An abstract data type (ADT). Most modern scripting languages support dictionaries/associative arrays as a primary container type
  In memory or on disk
  Nice fit for OpenStreetMap (VGI)
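As a minimal sketch, a key/value store reduces to put and get on a dictionary. The keys, the tag values and the helper names `put` and `get` are made up for illustration; the values mimic OpenStreetMap-style tags, which are themselves key/value pairs.

```python
store = {}


def put(key, value):
    """Store a value under a key, overwriting any previous value."""
    store[key] = value


def get(key, default=None):
    """Look a value up by key; the store does not interpret the value."""
    return store.get(key, default)


# Hypothetical elements with their tags.
put("node/1", {"amenity": "cafe", "name": "Example Cafe"})
put("node/2", {"highway": "bus_stop"})

cafe_name = get("node/1")["name"]
missing = get("node/999")
```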
11
4. Document Stores
Similar to Key/Value databases but the value is a document
Document has
  Identifier
  Nested structure
  Fields can have data types (number, string, more?)
Flexible schema
  Every document can have different fields
  Fields can be added
Documents are often stored in JSON or BSON format
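The flexible schema can be illustrated with two JSON documents that share only some fields. All field names and values below are hypothetical; the sketch uses only the standard `json` module and no particular document store.

```python
import json

# Two documents in one collection; each has an identifier,
# but otherwise they carry different (partly nested) fields.
documents = [
    {"_id": "poi-1", "name": "Example Castle",
     "location": {"type": "Point", "coordinates": [8.8, 47.2]}},
    {"_id": "poi-2", "name": "Example Lake", "tags": ["water", "swimming"]},
]

# A field can be added to one document without touching the others.
documents[1]["depth_m"] = 136

serialized = json.dumps(documents)
restored = json.loads(serialized)
field_sets = [sorted(doc.keys()) for doc in restored]
```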
12
NoSQL – Common Properties
  Open source roots
  Often non-relational, "schema-less"
  Polyglot persistence
  High availability
  Simple to use for developers:
    Simple API, REST/HTTP
    JavaScript/JSON
    Easy configuration ("zero", defaults)
13
High Availability and Scalability
Availability: BASE instead of ACID
  ACID = Atomicity, Consistency, Isolation, Durability
  BASE = Basically Available, Soft state, Eventual consistency
  => Favor availability and partition tolerance over consistency (CAP theorem)
But BASE is not the only solution! Still recommended is
  Replication: servers can work together to allow a second server to take over if the primary server fails
Scalability
  Caching
  Load balancing: servers (clusters) can allow several computers to serve the same data
=> Performance: a complex topic! See next...
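Eventual consistency under asynchronous replication can be sketched as follows. The classes `Primary` and `Replica`, the function `replicate` and the key are hypothetical; the sketch ignores failures, ordering across several primaries and conflict handling.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.pending = []  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replica):
    """Ship pending changes in write order, then clear the queue."""
    for key, value in primary.pending:
        replica.data[key] = value
    primary.pending.clear()


primary, replica = Primary(), Replica()
primary.write("place:1", "v1")
stale = replica.read("place:1")   # replica has not caught up yet
replicate(primary, replica)
fresh = replica.read("place:1")   # replica has converged
```

Between the write and the replication step a read on the replica is stale, which is the "soft state" that BASE accepts in exchange for availability.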
14
Better Performance (DeWitt, Madden and Stonebraker, 2006)
Better performance...
  1. through hardware (HW) acceleration
  2. through parallelism
  3. through software
The problem
  Large volumes of data use disk and large main memory
  I/O bottleneck (or memory access bottleneck)
  speed(disk) << speed(RAM) << speed(CPU)
The opportunities
  Many CPUs / cores; GPUs
  Cheap disk and memory
15
A Solution (VoltDB)
  SQL access from within pre-compiled Java stored procedures (= unit of transaction)
  In memory, single-threaded, without locking
  Horizontal partitioning down to the individual hardware thread to scale
  Synchronous replication for high availability
  Continuous snapshots and command logging for durability (crash recovery)
Note: Problem of benchmarking
  Many benchmarks available
  "The HSR Texas Geospatial Database Benchmark"
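The idea of partitioned, serial execution can be sketched in a few lines. This is not VoltDB's API: the class `Partition`, the router `partition_for` and the procedure `add_points` are hypothetical names, and the sketch runs in one Python process rather than on one hardware thread per partition.

```python
import zlib


class Partition:
    """Owns a slice of the data and runs one procedure at a time."""

    def __init__(self):
        self.rows = {}

    def run(self, procedure, *args):
        # Serial execution inside a partition: no locks are needed.
        return procedure(self.rows, *args)


def partition_for(key, partitions):
    """Route a key to its partition with a stable hash."""
    return partitions[zlib.crc32(key.encode("utf-8")) % len(partitions)]


def add_points(rows, key, amount):
    """A 'stored procedure': the whole function is the unit of transaction."""
    rows[key] = rows.get(key, 0) + amount
    return rows[key]


partitions = [Partition() for _ in range(4)]
for key, amount in [("alice", 5), ("bob", 3), ("alice", 2)]:
    partition_for(key, partitions).run(add_points, key, amount)

alice_total = partition_for("alice", partitions).rows["alice"]
```

Because every key always lands in the same partition, procedures touching a single key never need to coordinate with other partitions.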
16
Aggregation Databases ("NoSQL Distilled", Sadalage & Fowler, 2012)
"Aggregation Databases":
  Dictionaries => Key/Value stores
  Nested structures => Document stores
  Array structures => Array stores
Aggregate: a notion from Domain-Driven Design: "... a collection of related objects that we wish to treat as a unit"
  Unit: for data manipulation, for consistency, for storage
Properties of aggregations/collections
  RDBMSs have no concept of aggregates
  Aggregates reduce the need for ACID
  Aggregates maintain relationships naturally
  Good for clusters, can be distributed easily
17
NoSQL – Intermediate summary
Arrays (not explicitly mentioned by Sadalage & Fowler)
  Multi-Dimensional Arrays
  SQL/MDA
  Some databases support Multi-Dimensional Arrays
Summary:
  NoSQL databases mark the end of the era of relational databases
  But NoSQL databases will not become the new dominators
  Relational will still be popular, and used in the majority of situations
  Relational, however, will no longer be the automatic choice
18
NewSQL: Who said you can't be big and nimble?
Slide 18, Prof. Stefan Keller, 2015 © CC-BY
The "Surfing Elephant" - a (commercial) video: https://vimeo.com/61706961
19
NewSQL - Weaknesses of NoSQL
  No SQL, no common / standardized language
  No database join (*)
  No transactions (ACID)
  Schema-less => means implicit schema in code
  Long running processes
  (Eliot Horowitz 2014, CTO MongoDB)
(*) Remark on joins:
  There's lookup or mergeCursors (MongoDB)
  But better use aggregate structures
RDBMS to the rescue! Database evolution!
20
NewSQL – The PostgreSQL case
PostgreSQL 9.5!
  JSON / JSON Binary
  Key/Value type (hstore)
  Window functions for temporal queries
  MongoDB connects to PostgreSQL
  2015: PostgreSQL pulls data out of MongoDB (mongoose_fdw)
  Greenplum (data warehouse) and cstore (column store) became open source
PostgreSQL near future:
  "Plugin" mechanism for data types and indexes
  New "Block Range Index" (BRIN)
  Distribution of processing / parallelization: sequential scans, joins and aggregates
  Consistent, read-scaling clusters
  Execute sorts, joins, UPDATE, DELETE on a remote PostgreSQL server (postgres_fdw)
21
What is special about spatial?
"Spatial is not special!" (James Fee 2009)
"GIS software" is not special - but geoinformation is!
  E.g. constraints (autocorrelation)
Remember this definition of the 4 V's?
  Volume: scale of data
  Variety: different forms of data
  (Velocity: analysis of streaming data)
  Veracity: uncertainty of data
Geoinformation has always been Big Data - at least parts of it!
We combine Geographic Information and Computer Science
22
Looking at Contemporary GI Research
Geo-Visualization
Crowd-Sourced Data
  Tracks
  Social Media
  Volunteered Geographic Information (OpenStreetMap)
Just Big Geo Data
  Imagery
  OpenStreetMap
... applying Computer Science, Data Science, Mathematics, Statistics...
23
Architectural and Type implementation choices
Architectural (pattern): split management (inserts/updates) and queries => see later
Data type choices
  Affect both the logical and the physical layer of a store
  What is a data type anyway?
    Defines possible values and the operations on them (including how values are stored and indexed).
    More implementation oriented: syntax, constructor(s), type casts, operators, accessors, functions and helper functions - and indexes
  Data type candidates: N-Dim Arrays, Pointcloud, Vector Tiles
24
A partial survey of stores with geo support

Name | Index strategy | Data types | Query types
Amazon DynamoDB | geohash | point | BBOX, radius
GeoCouch (CouchDB/Couchbase) | R-tree | point/line/poly | BBOX, radius
IBM Cloudant (CouchDB) | R*-tree | GeoJSON types | BBOX, radius, arbitrary shape
Lucene/Solr | geohash | point (JTS adds more) | BBOX, radius (JTS adds polygon)
Orchestrate.io | geohash | point | BBOX, radius
Microsoft DocumentDB | - | - | -
MongoDB | geohash/quadtree | GeoJSON types | BBOX, radius, arbitrary shape

Source: "The NoSQL Geo Landscape", Raj Singh, IBM 2014, slideshare.net
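Several of the stores above index points by geohash. The following sketch shows the standard geohash encoding (interleaved longitude/latitude bisection, five bits per base-32 character); the function name `geohash_encode` and the sample coordinates are chosen for illustration, and how each listed product uses geohashes internally is not covered here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a WGS84 point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


short_hash = geohash_encode(57.64911, 10.40744, precision=3)
long_hash = geohash_encode(57.64911, 10.40744, precision=9)
```

A shorter geohash is always a prefix of a longer one for the same point, so nearby points tend to share prefixes and a prefix lookup on an ordinary ordered index can serve as a coarse bounding-box filter.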
25
Immutable Databases
Splitting architecture for inserts/updates and queries!
  Search engines (e.g. Lucene Solr/ElasticSearch) for queries alongside an RDBMS for inserts/updates
Immutable databases
  Writes are not update-in-place
  All data is retained by default
  Provides built-in auditing and the ability to query history
  Immutable data means strong consistency combined with horizontal read scalability plus built-in caching.
Again: Evolution versus Revolution?
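The "no update-in-place" idea can be sketched as an append-only log with versioned reads. The class `ImmutableStore`, its methods and the sample key are hypothetical and do not model any specific product.

```python
class ImmutableStore:
    """Append-only store: a write adds a new fact, nothing is overwritten."""

    def __init__(self):
        self._log = []  # list of (version, key, value), in write order

    def put(self, key, value):
        version = len(self._log) + 1
        self._log.append((version, key, value))
        return version

    def get(self, key, as_of=None):
        """Return the value of `key`, optionally as of an earlier version."""
        result = None
        for version, logged_key, value in self._log:
            if as_of is not None and version > as_of:
                break
            if logged_key == key:
                result = value
        return result

    def history(self, key):
        """Return all (version, value) pairs ever written for `key`."""
        return [(v, value) for v, k, value in self._log if k == key]


db = ImmutableStore()
v1 = db.put("road/42", {"maxspeed": 50})
v2 = db.put("road/42", {"maxspeed": 30})

current = db.get("road/42")
earlier = db.get("road/42", as_of=v1)
```

Because old versions never change, a result computed as of a given version can be cached and served by any number of readers.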
26
Immutable Databases
  Use Case 1: Analysing log files from OpenStreetMap views
  Use Case 2: Vector Tiles
27
UC1: OpenStreetMap Hotspots and Trends (Big Data, Geo-Viz)
Slide 27, Prof. Stefan Keller, 2015 © CC-BY
Map tile views of www.osm.org (aka G* Maps) in Switzerland, January 2014 – March 2015
http://bit.ly/OSM_Tile_Views_CH_Timeline_2014
Lukas Martinelli+ Geometa Lab at HSR
28
UC1: Trending Places in OpenStreetMap
Prof. Stefan Keller, 2015 © CC-BY, Page 28
Bot tweeting daily about trending places in #OpenStreetMap worldwide with a 2-day delay. A @GeometaLab experiment.
29
UC2 - Vector Tiles
Tiling vector data
  Natural partition schema, ready for parallelization
  Geometry compression scheme
  Applied by Google, Mapbox (Mapbox Vector Tiles), and Esri Vector Tiles
New data type?
  Vector or Raster/Grid?
  Also good for analysis? See QA Tiles by Mapbox
Delivering worldwide OpenStreetMap data
  OSM2VectorTiles.org - a project by Klokan Tech. and Geometa Lab HSR
  Vector tiles worldwide from OpenStreetMap
  Regular updates using PostgreSQL
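How tiling yields a natural partition schema can be sketched with the common Web Mercator XYZ tile scheme. Assuming that scheme is an illustration choice, since the slide does not name one; the function names `lonlat_to_tile` and `partition_by_tile` and the sample points are hypothetical.

```python
import math
from collections import defaultdict

MAX_LAT = 85.0511  # approximate latitude limit of the Web Mercator square


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or lat = -MAX_LAT stay inside the grid.
    return min(n - 1, max(0, x)), min(n - 1, max(0, y))


def partition_by_tile(points, zoom):
    """Group (lon, lat) points by tile; each group can be processed alone."""
    tiles = defaultdict(list)
    for lon, lat in points:
        tiles[lonlat_to_tile(lon, lat, zoom)].append((lon, lat))
    return dict(tiles)


groups = partition_by_tile([(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)], zoom=1)
```

Each tile key identifies an independent unit of work, which is what makes the scheme convenient for parallel processing.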
30
Recommendation
Use modern software development technologies
  Encapsulate data access through services
  Apply continuous integration and testing
  Release early and often
Choose the right tool for the problem:
  Use different data storage technologies for varying needs
  Use different architectures for different needs
GI specific
  References and inter-aggregate relations are still hard to maintain.
  So use spatial relationships if possible instead of links (Semantic Net!)
31
Looking at Contemporary GI Research Revisited
Geo-Visualization incl. decision making
Crowd-Sourced Data
  Tracks
  Social Media
  Volunteered Geographic Information (OpenStreetMap)
Just Big Geo Data
  Imagery
  Spatio-Temporal Data
  OpenStreetMap
... applying Computer Science, Data Science, Cognitive, Mathematics, ...
32
Key Points
There's more to come from Computer Science through better performance with hardware, parallelism and (mainly) software:
  Polyglot persistence!
  New data types
  New databases
Many GIS aspects have been Big Data since the beginning
There's more to come from GIS
=> Database technology and GIS have never been as exciting as now!
33
Prof. Stefan Keller
University of Applied Sciences Rapperswil (Switzerland)
www.gis.ch/geometalab
DISCUSSION