“Big Data” - Technical Architecture Roni Schuling - Enterprise Architecture Tom Scroggins – IS Domain Architecture Principal Financial Group.

1 “Big Data” - Technical Architecture Roni Schuling - Enterprise Architecture Tom Scroggins – IS Domain Architecture Principal Financial Group

2 “Big Data” - Technical Architecture AGENDA: Foundational Definitions & where these technologies came from (Big Data, NoSQL, Hadoop); Business & Technical Drivers; How they are being used in many companies; Predictions for the future; Challenges & Obstacles; Questions

3 “Big Data” - Technical Architecture Foundational Definition – Big Data: Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. Big data is often characterized by the 3Vs: the extreme volume of data, the wide variety of data types, and the velocity at which the data must be processed. Other characteristics are also cited, such as viscosity, complexity, and ambiguity. Data in a corporation that cannot be processed using traditional data management techniques and technologies can be broadly classified as Big Data.

4 “Big Data” - Technical Architecture

5 Big Data ≠ Hadoop; Big Data ≠ NoSQL; Hadoop ≠ NoSQL. Hadoop & NoSQL are key technologies for working with Big Data effectively.

6 “Big Data” - Technical Architecture

7 Foundational Definition - NoSQL: A NoSQL database, also called Not Only SQL, is an approach to data management and database design that is useful for very large sets of distributed data. NoSQL seeks to solve the scalability and big data performance issues that relational databases were not designed to address. NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data, or data stored remotely on multiple virtual servers in the cloud. However, NoSQL is not just about Big Data.
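To make “large sets of distributed data” concrete, here is a minimal Python sketch of how many key-value style NoSQL stores spread records across servers: each key is hashed to pick the node that owns it. The node names, the `put`/`get` helpers, and the use of simple modulo hashing are illustrative assumptions, not the behavior of any particular product.

```python
import hashlib

# Hypothetical node names; a real cluster would read these from its configuration.
NODES = ["node-a", "node-b", "node-c"]

def node_for_key(key: str, nodes=NODES) -> str:
    """Pick the owning node by hashing the key (simple modulo hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Each node holds its own slice of the data, modeled here as a plain dict.
cluster = {name: {} for name in NODES}

def put(key: str, value) -> None:
    cluster[node_for_key(key)][key] = value

def get(key: str):
    return cluster[node_for_key(key)].get(key)

put("customer:1001", {"name": "Ada"})
put("customer:1002", {"name": "Grace"})
print(get("customer:1001"))  # {'name': 'Ada'}
```

Modulo hashing remaps most keys when a node is added or removed; production stores commonly use consistent hashing or range partitioning for that reason.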

8 “Big Data” - Technical Architecture Where this technology came from - NoSQL: Eras from 1970 through 2014+: Flat Files; Rise of Relational Databases; Rise of Object Databases; Relational Database Dominance; Polyglot Persistence, in which the enterprise will have a variety of different data storage technologies for different kinds of data & application needs. Many innovators in the 2005 to 2010 timeframe: 2005, Document DB inspired by Lotus Notes; the need to store tabular data in a distributed system; 2007, Key Value Store to replicate data for 24x7 availability.

9 “Big Data” - Technical Architecture Market view of what’s out there – we do NOT have all of these at PFG today. There are over 150 NoSQL databases in the market – these are just a few of the top ones.

10 “Big Data” - Data Architecture at PFG Foundational Definition - Hadoop: Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system (HDFS) facilitates rapid data transfer among nodes and allows the system to continue operating uninterrupted when a node fails, because data blocks are replicated across several nodes. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
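Hadoop's processing model, MapReduce, can be sketched in a single Python process: a map step emits key/value pairs, the framework groups (shuffles) them by key, and a reduce step aggregates each group. This is only a simulation of the model for illustration; real Hadoop jobs are commonly written against its Java `Mapper`/`Reducer` API and run across the cluster, with HDFS supplying the input.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str):
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate one key's values into a single result.
    return key, sum(values)

# Illustrative input; on a cluster each mapper would read its own split of a file.
lines = ["big data on hadoop", "hadoop stores big files"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["hadoop"], counts["big"])  # 2 2
```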

11 “Big Data” - Data Architecture at PFG Where this technology came from - Hadoop: 1995 to 2005: the Yahoo! Search team builds 4+ generations of systems to crawl & index the WWW (20 billion pages!). 2003 to 2004: Google publishes the Google File System & MapReduce papers. Around 2005 to 2006: Yahoo! staffs ‘Juggernaut’, an open source DFS & MapReduce effort; Doug Cutting builds Nutch DFS & MapReduce and joins Yahoo!; Juggernaut & Nutch join forces, and Hadoop is born! From 2006 through 2010 and 2014+: other Internet companies add tools / frameworks to enhance Hadoop; service providers step into the market to provide training, support, & hosting; then come ‘Enterprise Grade’ security, analytic tool interoperability, and mass adoption.

12 “Big Data” - Technical Architecture The Hadoop Vendor Landscape - 2014

13 “Big Data” - Technical Architecture

14 Business Drivers: Provide access to all data needed for analytics (internal or external). Provide the ability to realistically interact with greater ‘depths’ of data, i.e., tens of years instead of a couple of months. Provide a greater “speed to insight” for all types of requests. Lower the total cost of ownership across the enterprise for analytics. Allow for exploration of our data in ways we never anticipated, to gain a differentiating understanding of customers and markets. There’s an imbalance today…

15 “Big Data” - Technical Architecture Technical Drivers: Current technical capabilities don’t align with changing expectations.

16 “Big Data” - Technical Architecture How they are being used today. NoSQL: Not focused on Big Data… yet. Many companies are using, or at least experimenting with, MongoDB: as a document store for web applications that only need to persist content for the lifespan of an interaction, and as a store for user preferences that personalize what is presented on a web page. Hadoop: Beginning to organize social streams of data. Interrogating our web logs to better understand the behavior of people interacting with a website. Merging that semi-structured web activity with other structured legacy data. Massive storage of data for exploration and discovery, often with interoperability with analytic consumption tools.
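As an illustration of interrogating web logs and merging that semi-structured activity with structured legacy data, the sketch below parses log lines with a regular expression, skips lines that do not parse, and enriches each event from a customer lookup. The log layout, user ids, and `customers` table are hypothetical; at Hadoop scale the same parse-and-join logic would run as distributed jobs rather than a Python loop.

```python
import re
from collections import Counter

# Simplified, hypothetical log layout: <ip> <user_id> [<timestamp>] "<method> <path>" <status>
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<user>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3})'
)

raw_logs = [
    '10.0.0.1 u42 [2014-05-01T10:00:00] "GET /retirement/calculator" 200',
    '10.0.0.2 u77 [2014-05-01T10:01:10] "GET /insurance/quote" 200',
    '10.0.0.1 u42 [2014-05-01T10:02:30] "POST /retirement/enroll" 500',
    'malformed line that does not match',
]

# Hypothetical structured "legacy" data keyed by the same user id.
customers = {"u42": {"segment": "retirement"}, "u77": {"segment": "insurance"}}

events = []
for line in raw_logs:
    match = LOG_PATTERN.match(line)
    if match is None:
        continue  # semi-structured input: skip lines that don't parse
    record = match.groupdict()
    # Enrich the web event with the structured customer attribute.
    record["segment"] = customers.get(record["user"], {}).get("segment", "unknown")
    events.append(record)

hits_per_segment = Counter(e["segment"] for e in events)
print(hits_per_segment)  # Counter({'retirement': 2, 'insurance': 1})
```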

17 “Big Data” - Technical Architecture Plans for the future. NoSQL: Database for web applications that need speed of development and nimbleness. Layering of NoSQL solutions on top of Hadoop to improve searchability and performance. Exploration of graph NoSQL solutions for analytics on hierarchical data. Hadoop: Expansion of web activity data (more logs, more data in logs, more use cases). Speech-to-text translation of call recordings, plus text analysis / natural language processing to determine call topics and caller sentiment. Extraction of text from documents to aid in analysis. ‘Data Lake’ solutioning, both for ingestion and archive.

18 “Big Data” - Technical Architecture Lake of Data Data Refinery

19 “Big Data” - Technical Architecture Data Refinery

20 “Big Data” - Technical Architecture Many Kinds of data in our organization Conceptually for illustration – not a vetted/approved picture of the PFG environment

21 “Big Data” - Technical Architecture Conceptual Workload Isolation Today… Conceptually for illustration – not a vetted/approved picture of the PFG environment

22 “Big Data” - Technical Architecture Conceptual Workload Isolation in the Future… Conceptually for illustration – not a vetted/approved picture of the PFG environment

23 “Big Data” - Technical Architecture

24 Big Data technologies are broader than just Hadoop & NoSQL – but those are the key starting points for us. Market view of what’s out there – we do NOT have all of these at PFG today.

25 “Big Data” - Technical Architecture Challenges and Obstacles to overcome: Security; Governance; Clear Use Cases; Integration Points; Hosting Models

26 “Big Data” - Technical Architecture Q&A Kapur.Gurwinder@principal.com

27 NoSQL Data Architecture & Best Practices Data View - Overview: We are in a database revolution. Existing paradigms are being challenged: models, hardware, software, languages. Will tweaking current data solutions be enough?

28 NoSQL Data Architecture & Best Practices Data View - Overview

29 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms

30 Relational Model PROs: Most flexible queries & updates; Reuse data structures in any context; Great DB-to-DB integration; Mature tools; Standard query language; Easy to hire expertise. CONs: Design-time, static relationships; Design-time, static structures (design first, then load data); Hard to normalize model; Requires code to integrate relational data with object-oriented code; Cannot query for relevance.
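The relational trade-offs above can be seen in a small Python sketch using the standard-library `sqlite3` module: the schema is fixed before data is loaded, any table can be joined to any other with standard SQL, and application code is still needed to turn joined rows back into objects. The `customer`/`policy` tables and their contents are made-up examples.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Customer:
    name: str
    policies: list = field(default_factory=list)

conn = sqlite3.connect(":memory:")
# Design-time, static structure: tables are defined before any data is loaded.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE policy (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         product TEXT NOT NULL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO policy VALUES (10, 1, 'Life'), (11, 1, 'Dental'), (12, 2, 'Disability');
""")

# Standard SQL join across tables.
rows = conn.execute("""
    SELECT c.id, c.name, p.product
    FROM customer c JOIN policy p ON p.customer_id = c.id
    ORDER BY c.id, p.id
""").fetchall()

# The glue code the CONs mention: fold flat rows back into objects.
customers = {}
for cust_id, name, product in rows:
    customers.setdefault(cust_id, Customer(name)).policies.append(product)

print(customers[1])  # Customer(name='Ada', policies=['Life', 'Dental'])
```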

31 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms: Dimensional Model PROs: Queries facts in context; Self-service, ad hoc queries; High-performance platforms; Mature tools and integration; Standard query language; Turns data into information. CONs: Expensive platforms; Design-time, static relationships; Design-time, static structures (design first, then load data); Cannot query for relevance; Cannot query for answers that are not built into the model.
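A dimensional (star) model puts measures in a fact table and context in dimension tables. The sketch below, again using `sqlite3`, with hypothetical table names and values, shows an ad hoc BI query that slices premiums by quarter and product line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the context of each fact (hypothetical names).
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_line TEXT);
    -- The fact table holds the measure plus a foreign key to each dimension.
    CREATE TABLE fact_premium (date_key INTEGER, product_key INTEGER, premium REAL);

    INSERT INTO dim_date VALUES (20140101, 2014, 1), (20140401, 2014, 2);
    INSERT INTO dim_product VALUES (1, 'Life'), (2, 'Dental');
    INSERT INTO fact_premium VALUES
        (20140101, 1, 100.0), (20140101, 2, 40.0),
        (20140401, 1, 120.0), (20140401, 2, 35.0);
""")

# Ad hoc query: slice the facts by any combination of dimension attributes.
result = conn.execute("""
    SELECT d.quarter, p.product_line, SUM(f.premium)
    FROM fact_premium f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.quarter, p.product_line
    ORDER BY d.quarter, p.product_line
""").fetchall()
print(result)
```

Because this toy schema has no customer dimension, a question such as “premium by customer segment” cannot be answered without changing the model, which is the “answers not built into the model” limitation listed above.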

32 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms: What’s wrong (aka challenging) with SQL DBs?

33 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms: Key Value / Column Family Models PROs: Fast puts and gets; Massive scalability; Easy to shard & replicate; Data colocation; Simple to model; Inexpensive; Data in transactional context; Developer in control. CONs: Carefully design key; Shred JSON into flat columns; Secondary indexes required to query outside of hierarchical key; No standard query API or language; Hand code all joins in app; Immature tools and platform; Hard to integrate and hire.
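The key-value / column-family CONs are easiest to see in code. In this sketch a Python `dict` stands in for the store and a sorted-prefix scan emulates the ordered key scans many such stores offer; the `#`-delimited key design, the helper names, and the data are illustrative assumptions. Queries that follow the key hierarchy are a single scan, while a query by state needs an application-maintained secondary index, and the customer-to-policy join is hand-coded.

```python
# A plain dict stands in for the key-value store (hypothetical; no real product API).
store = {}

def put(key: str, value: dict) -> None:
    store[key] = value

def scan_prefix(prefix: str):
    # Emulates an ordered range scan over keys that share a prefix.
    return [(k, store[k]) for k in sorted(store) if k.startswith(prefix)]

# Hierarchical key design: a customer's policies sort directly under the customer key.
put("customer#1001", {"name": "Ada", "state": "IA"})
put("customer#1001#policy#10", {"product": "Life"})
put("customer#1001#policy#11", {"product": "Dental"})
put("customer#1002", {"name": "Grace", "state": "NE"})
put("customer#1002#policy#12", {"product": "Disability"})

# Cheap: a query that follows the key hierarchy is one prefix scan.
ada_policies = [v["product"] for _, v in scan_prefix("customer#1001#policy#")]

# Not cheap: querying by a non-key attribute needs an index the app builds itself.
by_state = {}
for k, v in store.items():
    if "state" in v:
        by_state.setdefault(v["state"], []).append(k)
iowa_customers = [store[k]["name"] for k in by_state.get("IA", [])]

# Hand-coded join: derive each policy's parent key and look it up.
policy_owners = [
    (store[k.split("#policy#")[0]]["name"], v["product"])
    for k, v in scan_prefix("customer#")
    if "#policy#" in k
]
print(ada_policies, iowa_customers, policy_owners)
```

Key design matters here: a bare prefix of `customer#1001` would also match a hypothetical `customer#10010`, which is why the policy scan includes the `#policy#` delimiter.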

34 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms: Document Model PROs: Fast development; “Schemaless”, run-time designed, rich JSON and/or XML data structures; Queries everything in context; Self-service, ad hoc queries; Turns data into information; Can query for relevance. CONs: Defensive programming for unexpected data structures; Expensive platforms, immature tools, and hard to integrate; Non-standard query languages, and hard to hire expertise; Not as fast as Column-Family / Key-Value databases.
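“Defensive programming for unexpected data structures” looks roughly like this in Python: documents in one collection may omit a field, set it to `null`, or store a single value where others store a list, so every read has to tolerate those shapes. The documents and field names are hypothetical.

```python
import json

# Documents in the same collection need not share a schema (hypothetical data).
raw = """[
  {"_id": 1, "name": "Ada", "address": {"city": "Des Moines"}, "policies": ["Life"]},
  {"_id": 2, "name": "Grace", "policies": "Dental"},
  {"_id": 3, "name": "Linus", "address": null}
]"""
docs = json.loads(raw)

def city_of(doc: dict) -> str:
    # The field may be missing, null, or an unexpected type.
    address = doc.get("address") or {}
    return address.get("city", "unknown") if isinstance(address, dict) else "unknown"

def policies_of(doc: dict) -> list:
    # Normalize "missing", "single value", and "list" into a list.
    value = doc.get("policies", [])
    return value if isinstance(value, list) else [value]

for doc in docs:
    print(doc["name"], city_of(doc), policies_of(doc))
```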

35 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms: Graph Model PROs: Unlimited flexibility (model any structure); Run-time definition of types & relationships; Relate anything to anything in any way; Query relationship patterns; Standard query language (SPARQL); Creates maximum context around data. CONs: Hard to model at such a low level; Hard to integrate with other systems; Immature tools; Hard to hire expertise; Cannot query for relevance because original document context is not preserved.
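To show what “query relationship patterns” means, here is a tiny in-memory triple store with a matcher for SPARQL-style basic graph patterns, where terms starting with `?` are variables. It is a teaching sketch with made-up data, not a SPARQL engine; a real triple store would accept something like `SELECT ?person WHERE { ?person :worksFor ?org . ?org :subsidiaryOf :HoldCo }` directly.

```python
# Every fact is a (subject, predicate, object) triple (hypothetical data).
triples = {
    ("Ada", "worksFor", "Acme"),
    ("Grace", "worksFor", "Acme"),
    ("Acme", "subsidiaryOf", "HoldCo"),
    ("Ada", "manages", "Grace"),
}

def match(pattern):
    """Return variable bindings for one triple pattern; '?x' terms are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results

def query(patterns):
    """Join several patterns on shared variables, one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            # Substitute variables already bound by earlier patterns.
            concrete = tuple(b.get(t, t) for t in pattern)
            for new in match(concrete):
                extended.append({**b, **new})
        bindings = extended
    return bindings

# Who works for a subsidiary of HoldCo?
rows = query([("?person", "worksFor", "?org"), ("?org", "subsidiaryOf", "HoldCo")])
print(sorted(r["?person"] for r in rows))  # ['Ada', 'Grace']
```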

36 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms: What’s wrong (aka challenging) with NoSQL DBs?

37 NoSQL Data Architecture & Best Practices Data View – Five Data Paradigms

38 NoSQL Data Architecture & Best Practices Data View Modeling Takeaways: Each model has a specialized purpose. Dimensional: business intelligence reporting and analytics. Relational: flexible queries, joins, updates; mature, standard. Column / Key-Value: simple, fast puts and gets; massively scalable. Document: fast development; “schemaless” JSON/XML; searchable. Graph / RDF: modeling anything at runtime, including relationships.

39 NoSQL Data Architecture & Best Practices Data View – How do you choose? How much durability do you need? Durable data survives system failures & can be recovered after unwanted deletion. How much atomicity do you need? An atomic transaction is all or nothing, whether it covers sets of data and/or sets of commands. How much isolation do you need? Isolation prevents concurrent transactions from affecting each other. How much consistency do you need (or when do you need it)? Consistency exists when data is committed and consistent with all data rules at a point in time.
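As a baseline for these questions, here is what atomicity looks like when the database provides it. This sketch uses Python's standard `sqlite3` module and a hypothetical `account` table with a `CHECK` constraint: a transfer either applies both updates or, when the constraint fails, the connection's context manager rolls back both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src: str, dst: str, amount: int) -> bool:
    try:
        # The connection's context manager commits on success and rolls back on error.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("A", "B", 30)        # both updates commit: A=70, B=80
failed = transfer("A", "B", 500)   # debit violates CHECK; the credit to B is rolled back too
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(ok, failed, balances)  # True False {'A': 70, 'B': 80}
```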

40 NoSQL Data Architecture & Best Practices Data View – How do you choose? Durability: Can you live with writing advanced code to compensate? (Trusting all developers to properly check for partial transaction failures and the current physical layout of the data cluster, and to write code to propagate data across the cluster.) Can you live with lost data? (No logs, archives, mirroring, etc.) Can you live with accidental deletion of data? (No point-in-time recovery feature.) Can you live with scripting your own backup & recovery solutions?

41 NoSQL Data Architecture & Best Practices Data View – How do you choose? Atomicity: Can you live with modifying only a single document at a time? Can you live with partially successful transactions? (You can achieve higher availability because transactions can partially succeed.) Can you live with inconsistent and incomplete data? (Is it OK not to know whether data anomalies are caused by bugs in your code or are only temporarily inconsistent because they haven’t been synchronized yet?) Can you live with writing advanced code to compensate? (Custom solutions for atomic rollback, handling of transactions that fail, and finding & fixing inconsistent data.)
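Without multi-document transactions, “custom solutions for atomic rollback” usually mean compensating actions, the approach often called the saga pattern. The sketch below keeps two hypothetical account documents in a dict, pairs each step with an undo step, and runs the undo steps in reverse when a later step fails. It deliberately ignores concurrency and process crashes, which a real implementation would also have to handle, for example by persisting its progress.

```python
# Two independent documents in a store that only guarantees single-document writes.
store = {"acct:A": {"balance": 100}, "acct:B": {"balance": 50}}

class InsufficientFunds(Exception):
    pass

def debit(key, amount):
    if store[key]["balance"] < amount:
        raise InsufficientFunds(key)
    store[key]["balance"] -= amount

def credit(key, amount):
    store[key]["balance"] += amount

def transfer(src, dst, amount) -> bool:
    # Each step is paired with a compensating action that undoes it.
    steps = [
        (lambda: credit(dst, amount), lambda: debit(dst, amount)),
        (lambda: debit(src, amount), lambda: credit(src, amount)),
    ]
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except InsufficientFunds:
        # Application-level rollback: undo completed steps in reverse order.
        for compensate in reversed(done):
            compensate()
        return False
    return True

ok = transfer("acct:A", "acct:B", 30)       # A=70, B=80
failed = transfer("acct:A", "acct:B", 500)  # B is credited, A's debit fails, B's credit is undone
print(ok, failed, store)
```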

42 NoSQL Data Architecture & Best Practices Data View – How do you choose? Isolation: Can you live with modifying only a single document at a time? Can you live with inaccurate queries? (Without isolation, query results can be inaccurate because concurrent transactions can change data while a query is processing it.) Can you live with race conditions and deadlocks? Can you live with writing advanced code to compensate? (Your own versioning system; code to hide concurrent updates, inserts and deletes from queries; handling of race conditions and deadlocks.)
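“Your own versioning system” is often implemented as optimistic concurrency control: each document carries a version number, and a write succeeds only if the version the writer read is still current. In this sketch, `VersionedStore`, `VersionConflict`, and the document contents are hypothetical, and a `threading.Lock` stands in for the single-document atomicity the store itself would provide.

```python
import threading

class VersionConflict(Exception):
    pass

class VersionedStore:
    """Each document carries a version; a write succeeds only if the version is unchanged."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # stands in for the store's single-document atomicity

    def read(self, key):
        with self._lock:
            doc, version = self._docs.get(key, ({}, 0))
            return dict(doc), version

    def write(self, key, doc, expected_version):
        with self._lock:
            _, current = self._docs.get(key, ({}, 0))
            if current != expected_version:
                raise VersionConflict(f"{key}: expected v{expected_version}, found v{current}")
            self._docs[key] = (dict(doc), current + 1)
            return current + 1

store = VersionedStore()
store.write("pref:u42", {"theme": "dark"}, expected_version=0)

# Two clients read the same version...
doc1, v1 = store.read("pref:u42")
doc2, v2 = store.read("pref:u42")

# ...the first write wins; the second is detected instead of silently overwriting.
doc1["theme"] = "light"
store.write("pref:u42", doc1, v1)
doc2["lang"] = "en"
conflict = None
try:
    store.write("pref:u42", doc2, v2)
except VersionConflict as err:
    conflict = str(err)  # the caller would re-read, re-apply its change, and retry
print(conflict)
```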

43 NoSQL Data Architecture & Best Practices Data View – How do you choose? Consistency - Do you need complete consistency? Not necessarily. Instead, you may prefer: absolute fastest performance at lowest hardware cost; highest global data availability at lowest hardware cost; working with one document at a time; writing advanced code to create your own consistency model; eventually consistent data. In exchange, you accept: some inconsistent data that can’t be reconciled; some missing data that can’t be recovered; some inconsistent query results.
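Eventual consistency can be sketched with two replicas that accept writes independently and later reconcile. This hypothetical example uses last-write-wins by timestamp during an anti-entropy exchange; the replicas converge, but the earlier concurrent write is silently discarded, which is the “inconsistent data that can’t be reconciled” and “missing data” trade-off listed above. Real systems may use vector clocks, quorums, or application-level merges instead.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    # key -> (timestamp, value); the timestamp decides which write wins.
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange state so both replicas converge; last-write-wins by timestamp."""
    for key in set(a.data) | set(b.data):
        candidates = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        winner = max(candidates)  # highest timestamp wins; the other value is lost
        a.data[key] = b.data[key] = winner

# Hypothetical replicas in two data centers.
east, west = Replica("east"), Replica("west")
east.write("addr:u42", "12 Main St", ts=1)
west.write("addr:u42", "34 Oak Ave", ts=2)  # concurrent update at the other site

before = (east.read("addr:u42"), west.read("addr:u42"))  # replicas disagree
anti_entropy(east, west)
after = (east.read("addr:u42"), west.read("addr:u42"))   # replicas converge
print(before, after)
```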

44 NoSQL Data Architecture & Best Practices Data View – How do you choose? What do you need most? Highest performance for queries and transactions; highest data availability across multiple data centers; less data loss (e.g., durability); more query accuracy & fewer deadlocks (e.g., isolation); more data integrity (e.g., atomicity); less code to compensate for lack of ACID compliance.

45 NoSQL Data Architecture & Best Practices Key Points: RDBMSs will always have an important place in our architecture. NoSQL implementations will benefit our future architecture. Once you have a list of NoSQL databases that meet your modeling needs, choose the one that best meets your needs for velocity and volume. It is not a one-or-the-other, ‘all in’ choice.

