The NoSQL movement or the dawn of the post-relational age.

1 The NoSQL movement or the dawn of the post-relational age

2 What is the buzz? Job Trends; Search Trends; Twitter search

3 Something for your CV

4 NoSQL: "Not only SQL" or "No SQL" (no SQL support). Support for the full SQL language imposes constraints on datastores. So does ACID compliance. So does the need for a fixed database schema. Many applications need more specialised datastores. NoSQL is a movement for choice in database architecture. Resources: CouchBase survey; Mike Loukides at O'Reilly - an excellent overview; Polyglot Persistence by Martin Fowler; Wikipedia; Comparison - a rather terrifying set of resources; Tim Anglade's compilation of interviews.

5 NoSQL is not new. Despite the widespread adoption of the relational data model for business applications, there have always been a wide variety of specialised databases: Geographic Information Systems for complex spatial relationships (ArcGIS, e.g. BCC KnowYourPlace); OLAP (OnLine Analytic Processing) for analysis of transaction data; free-text databases, e.g. LexisNexis for legal documents; multi-dimensional sparse arrays (Pick and MUMPS); object-oriented databases, e.g. ZOPE for the Plone CMS. These databases were directed at the need for complex and flexible data structures.

6 Forces for change. Volume of data: Facebook has over 30 petabytes (30,000 terabytes, or 30 million gigabytes). Volume of transactions: of the order of 1 million writes/sec. Changeability/flexibility of schema: constant beta. Complexity of data: UK Legislation.

7 Use case: Terabytes of data need to be stored reliably with no schema requirements. Reliability is a big problem when volumes are large. In a farm of, say, 1000 servers, each with 8 spindles, there is a high probability that at least one disk will be down at any time. Random-access update is too slow: append new data and merge in batch. Platforms: BigTable from Google; HBase from Apache; Dynamo from Amazon. See also Doug Cutting on Apache's Hadoop.
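The reliability claim can be checked with a little arithmetic. The sketch below is only illustrative: the server and spindle counts come from the slide, but the per-disk downtime fraction `P_DOWN` is an assumed figure, and the disks are assumed to fail independently.

```python
def prob_any_disk_down(n_disks: int, p_down: float) -> float:
    """Probability that at least one of n independent disks is down."""
    # P(at least one down) = 1 - P(all disks up)
    return 1.0 - (1.0 - p_down) ** n_disks


N_DISKS = 1000 * 8  # servers x spindles, as on the slide
P_DOWN = 0.001      # ASSUMPTION: each disk is unavailable 0.1% of the time

p_any = prob_any_disk_down(N_DISKS, P_DOWN)
print(f"P(at least one of {N_DISKS} disks is down) = {p_any:.4f}")
```

Under these assumptions the probability is above 0.99, so the store has to treat a failed disk as the normal case rather than the exception.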

8 Use case: Batch data analysis. Where very large transaction datasets need to be filtered and summarised, for example to analyse log files by IP location. In the past these could have been overnight jobs; now they need to be done in minutes at most. Map-Reduce is an architecture for large-scale distributed computation. MapReduce should really be called MapMergeReduce. Each MapReduce task is written in Java (or a high-level language like Pig). The framework (such as Hadoop) coordinates the distribution of the map, merge and reduce jobs and the dataflows. The input is a database of key/value pairs which are split ('sharded') over many spindles on many servers. The user's map operation runs on every server hosting the shards and transforms each key/value input into zero, one or more key/value outputs. Merge (shuffle) merges all pairs for the same key and distributes them (e.g. by hashing the keys) to multiple Reduce servers. This too can be user-configurable. The user's reduce takes each group of values for the same key and produces zero, one or more key/values for each group. Successive MapMergeReduce operations can be chained together in a pipeline.
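The three stages can be sketched in a single process. This is a toy illustration in Python, not Hadoop code: the function names, the log format and the sample data are all assumptions, and it counts hits per IP address rather than per IP location. A real framework would run the map and reduce steps on many servers.

```python
from collections import defaultdict


def map_log_line(key, line):
    """Map: emit (ip, 1) for a log line; emit nothing for a blank line."""
    parts = line.split()
    if parts:
        yield parts[0], 1


def merge(pairs):
    """Merge (shuffle): group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key, values):
    """Reduce: collapse each group of values into one total."""
    yield key, sum(values)


def map_merge_reduce(records, mapper, reducer):
    """Run map, merge and reduce in sequence over (key, value) records."""
    mapped = (out for key, value in records for out in mapper(key, value))
    grouped = merge(mapped)
    return [out for key in sorted(grouped) for out in reducer(key, grouped[key])]


# Hypothetical input: (line number, log line) pairs.
LOG = [
    (0, "10.0.0.1 GET /index.html"),
    (1, "10.0.0.2 GET /about.html"),
    (2, "10.0.0.1 GET /contact.html"),
    (3, ""),
]

hits = map_merge_reduce(LOG, map_log_line, reduce_count)
print(hits)
```

Because the output is again a list of key/value pairs, it could be fed into a second `map_merge_reduce` call, which is how a pipeline of stages would be chained.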

9 Use case: Document storage and retrieval. Document store: complex hierarchical documents present problems for storing in a relational database. Every repeated part of the document would be stored in its own table (shredding); each repeated part would need to be linked to its parent with a key; and reconstructing the document would require multiple joins over data distributed all over the file system. Platforms: eXist - open source XML store, query with XQuery; MarkLogic - commercial XML store; CouchDB - JSON store, query with JavaScript; MongoDB - JSON store (Telemetric data processing).
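To make the shredding problem concrete, here is a small Python sketch. The order document, its field names and the two "tables" are invented for illustration; a document store would simply keep the nested `order` as it is.

```python
# A hypothetical hierarchical document with one repeated part ("lines").
order = {
    "id": 1,
    "customer": "Ann",
    "lines": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def shred(doc):
    """Split one nested document into a parent row and keyed child rows."""
    parent_row = {"id": doc["id"], "customer": doc["customer"]}
    child_rows = [
        {"order_id": doc["id"], "pos": i, **line}
        for i, line in enumerate(doc["lines"])
    ]
    return parent_row, child_rows


def reconstruct(parent_row, child_rows):
    """Rebuild the document: the equivalent of a join on order_id."""
    lines = [
        {k: v for k, v in row.items() if k not in ("order_id", "pos")}
        for row in sorted(child_rows, key=lambda r: r["pos"])
        if row["order_id"] == parent_row["id"]
    ]
    return {**parent_row, "lines": lines}


parent, children = shred(order)
rebuilt = reconstruct(parent, children)
print(rebuilt == order)
```

Each further level of nesting would add another table, another foreign key and another join, which is the cost the slide describes.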

10 Use case: Fast put/get of keyed data. Key-value store: where complex data is to be stored but the database is not interested in its internal structure, for example session data, user profiles or shopping carts. The only operations are value = store.get(key), store.put(key, value) and store.delete(key). Platforms: Project Voldemort, Rhino.
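The three operations map directly onto a dictionary. The class below is a minimal in-memory sketch in Python, not the API of Project Voldemort or any other product; the class name and the shopping-cart data are assumptions. Values are stored as opaque bytes to reflect that the store never looks inside them.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """In-memory key-value store; values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
cart = {"user": "ann", "items": ["A1", "B7"]}

# The application serialises the value; the store only sees bytes.
store.put("cart:ann", json.dumps(cart).encode("utf-8"))
loaded = json.loads(store.get("cart:ann").decode("utf-8"))
store.delete("cart:ann")
```

Keeping the interface this narrow is what makes it straightforward to partition the keys across many servers.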

11 Use case: Page caching. Key-value cache: where the generation of a page takes a significant time, it is better to cache the pages as key/value pairs where the key is a URI and the value is the HTML page. As much of the cache as possible is kept in RAM for rapid access. Issues: cache flushing. For example, this site views summarized data from an eXist document store: AidView. Platforms: Memcached.
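A page cache of this kind can be sketched in a few lines of Python. This is an in-process illustration, not Memcached's protocol or client API: the `PageCache` class, the least-recently-used eviction policy and the `slow_render` function are all assumptions made for the example.

```python
from collections import OrderedDict


class PageCache:
    """Cache rendered pages by URI, evicting the least recently used."""

    def __init__(self, render, max_pages=1000):
        self._render = render        # slow function: uri -> html
        self._max = max_pages        # bound on pages held in RAM
        self._pages = OrderedDict()  # uri -> html, oldest first

    def get(self, uri):
        if uri in self._pages:
            self._pages.move_to_end(uri)  # mark as recently used
            return self._pages[uri]
        html = self._render(uri)          # cache miss: generate the page
        self._pages[uri] = html
        if len(self._pages) > self._max:
            self._pages.popitem(last=False)  # drop the oldest entry
        return html

    def flush(self, uri=None):
        """Discard one page, or the whole cache when no URI is given."""
        if uri is None:
            self._pages.clear()
        else:
            self._pages.pop(uri, None)


render_calls = []


def slow_render(uri):
    """Stand-in for an expensive page generation step."""
    render_calls.append(uri)
    return f"<html>{uri}</html>"


cache = PageCache(slow_render, max_pages=2)
cache.get("/a")
cache.get("/a")   # served from the cache, no second render
cache.flush("/a") # underlying data changed, so flush the page
cache.get("/a")   # rendered again
```

The `flush` method is where the hard part lives in practice: deciding which cached pages are stale when the underlying data changes.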

12 Use case: Linked data. Graph database: where data is composed of simple, highly interrelated facts. For example, there is an RDF version of Wikipedia called DBpedia. Some use available databases such as MySQL, but the specific form of the data and of the queries on it suggests native triple (usually quad) stores to support RDF: Jena, Sesame, Virtuoso - query with SPARQL. RDF has a rigid data model, [graph] subject - predicate - object, and is widely used for linked data. Custom graph stores: Neo4J - non-standard interfaces.
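The subject - predicate - object model is simple enough to sketch directly. The Python class below is a toy, not the API of Jena, Sesame or Virtuoso, and the facts are invented rather than taken from DBpedia. Its `match` method mimics a single SPARQL triple pattern, with `None` playing the role of a variable.

```python
class TripleStore:
    """Toy store of (subject, predicate, object) facts."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


facts = TripleStore()
facts.add("Bristol", "locatedIn", "England")
facts.add("Bath", "locatedIn", "England")
facts.add("Bristol", "type", "City")

# Roughly: SELECT ?s WHERE { ?s locatedIn England }
in_england = [s for s, _, _ in facts.match(predicate="locatedIn", obj="England")]
print(in_england)
```

A real triple store indexes the triples by several orderings of subject, predicate and object so that patterns like this do not need a full scan.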

13 XML/XQuery for graphs. Tutorial for using Neo4j to compute relationships in a graph. Friends relationship: some friends as XML; a bit of XQuery; the knows relationship expanded. Permissions: People; Roles; a bit of XQuery; People and permissions. Shortest path is difficult: Dijkstra's algorithm is tricky to implement in functional languages.
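For comparison, a shortest path over a friends graph is short to write in an imperative language. The sketch below is Python rather than XQuery, and it uses breadth-first search rather than Dijkstra's algorithm, which is sufficient here only because every "knows" edge is assumed to have the same weight. The graph and the names in it are invented.

```python
from collections import deque

# Hypothetical "knows" relationship: person -> people they know.
KNOWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # person -> who we reached them from
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, []):
            if friend in previous:
                continue  # already reached by a path at least as short
            previous[friend] = person
            if friend == goal:
                # Walk back from the goal to the start.
                path = [friend]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(friend)
    return None


path = shortest_path(KNOWS, "alice", "erin")
print(path)
```

The mutable `previous` map and queue are exactly the kind of state that is awkward to express in a functional language, which may be part of the difficulty the slide refers to.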

14 Dan McCreary's Overview The CIO's Guide to NoSQL

15 Risks: lack of standardisation; new technology; design cul-de-sac when requirements change; lack of available developer skills. RDBMSs like Oracle and SQL Server are changing too, but just get more complex. A dissenting view (warning: NSFW).
