Introduction to Kognitio 06 January 2012 Michael Hiskey Vice President, Marketing and Business Development Sachin Sangtani Senior Technical Consultant.


1 Introduction to Kognitio
06 January 2012
Michael Hiskey, Vice President, Marketing and Business Development
Sachin Sangtani, Senior Technical Consultant

2 Kognitio is an analytical accelerator
- Built from the ground up to satisfy large and complex analytics on big data sets
- A massively parallel, in-memory analytical data warehouse that interoperates with your existing infrastructure

3 Why rip out and replace systems?
Kognitio:
- Accelerate your analytical ability without disruption to existing systems
- Lower your hardware and software costs while increasing performance

4 Agenda
- Company Overview
- WX2 Overview
  - DW Origins and evolution
  - In Memory: What, Why, Where it helps
- Technical Overview
  - Software Features
  - Performance
  - Integration
  - Appliance
  - DR/Backup/Recovery
  - Timescale
- Product Roadmap
- What To Look For In Potential Clients
- Q&A

5 About Kognitio
- Software company founded in the UK
- 20+ year heritage focused on high performance analytical solutions
  - Solid reference client successes
- WX2 Analytical Data Warehouse
  - In-memory processing
  - Massively parallel (MPP)
  - Cloud deployment model
  - Mature product (v7)
  - Open standards & linear scalability
  - Multi-threaded high-volume data loads
  - Ease of administration & low TCO: NO indexes, partitions, aggregates, etc.
- Flexible delivery model
  - Software, Appliance
  - DaaS(TM): Data warehousing as a Service
[Diagram labels: Commodity x86/Linux servers; BI / Analytics; Underlying Systems, Hadoop Clusters, Enterprise Data Warehouses; Direct; 3rd Party Services/Apps; Kognitio; legacy EDW; ETL]

6 Kognitio WX2 Overview

7 Traditional data warehouse/BI solution stack
[Diagram labels: ETL; Product Data; Customer Data; Transactional Data; Data Warehouse; Operational Systems; Data Mart; Depend. Database; Cube; BI Analysis Application and BI Reporting Application (each shown three times)]
"Single version of the truth"
However:
- Data duplication in marts and cubes
- Massive cube proliferation
- Complex and time-consuming data extract operations
- High admin. cost (DBAs, Sys Admins, Cube)
- Often specialized HW and storage required
Pros
- Relatively simple
- Single copy of the data
Cons
- Slow (performance limited by data warehouse)
- Unpredictable performance; depends on number of users, query load, data feeds, etc.

8 Why do we need such complex solutions?
- Poor performance: data held on slow mechanical disk
- Queries run against disk-based data (I/O contention)
- Data resident in high-speed memory
- Queries run against data in memory
- Persistent data store on disk as well
[Diagram labels: Mechanical Disk; RAM (fast access); Data; Queries; Results]

9 Add MPP to in-memory
- Industry-standard x86 servers, storage, memory
- RAM merged together across servers into a shared fabric
[Diagram labels: Queries; Results]
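A rough way to picture "RAM merged together across servers" is a scatter-gather scan: each node holds one slice of the table in memory, aggregates only its own slice, and a coordinator merges the partial results. The sketch below is an illustration under stated assumptions, not Kognitio's implementation: threads stand in for servers, and the names `partition`, `local_aggregate`, and `mpp_average` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_nodes):
    """Distribute rows round-robin across n_nodes in-memory partitions."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def local_aggregate(part):
    """Each node scans only its own partition and returns a partial (sum, count)."""
    total = 0.0
    count = 0
    for amount in part:
        total += amount
        count += 1
    return total, count

def mpp_average(rows, n_nodes=4):
    """Scatter the scan to every node, then merge the partial results."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, parts))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

amounts = [float(x) for x in range(1, 101)]
print(mpp_average(amounts, n_nodes=4))  # prints 50.5
```

Because every node does the same amount of work on its own slice, adding nodes shrinks each slice, which is the intuition behind the linear-scalability claim on the next slide.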

10 Parallelism in every direction = linear scalability
- Massively parallel processing (MPP)
- Allows systems to be "scaled out" to accommodate any data size and analytical workload
So what's required?
- Array of industry-standard servers
- Standard operating system
- An in-memory capable relational database management system (such as Kognitio WX2)

11 How do MPP systems differ from traditional databases?
[Diagram labels: $$$; Traditional Relational Database Projects; Target High Performance Database Projects]
- Traditional systems have to develop cubes, indices, and tune for performance
- Traditional systems have to focus on making subsets of data available at acceptable performance levels
- High performance MPP (Massively Parallel Processing): no cubes, no indices. Performance to burn. Focus on results

12 How are cost savings achieved? In development
© Kognitio 2010
Let's compare a typical implementation using traditional resources with that of in-memory MPP platforms.
[Chart labels: Business & data understanding; Data preparation; 100%; 50%; Circa >55% saving; Circa >60% saving; Traditional Relational Database Projects; Target High Performance Database Projects]
- With high performance and no indices, the need to create sophisticated schemas that match current business needs while also anticipating change is almost eliminated. Saving 60% of pre-business analytics.
- With parallel loading and very high speed processing, ETL becomes ELT for easier data manipulation.
- Fast analytics saves users from developing queries based on what can be achieved rather than what they want to do.

13 Why? Because traditional database management needs all this…
[Diagram labels: New Data Extract Request; User Community; Operational Systems Management; New Data Feed; Database Administration; Analytical Database; Data Load; ETL; DB Development; $$; Index Management; Partition; Table Space Management; Temp Table Management; Aggregation; $$$$$; New Analytical Scope; Significant DBA Cost]

14 We provide data agility… eliminating partitioning, indexing, space and aggregation management
[Diagram labels: Analytical Database; Data Load; ETL; DB Development; $$; New Data Feed; Database Administration; New Data Extract Request; User Community; Operational Systems Management; Simply Load and Go; New Analytical Scope]

15 Technical Overview

16 Typical Analysis/Reporting Query

    -- Balance information of targeted accounts obtained from transaction table
    select C.Client_ID, D.Demog_Group, D.Demog_Desc,
           1+avg(F.Credit_Limit_Changes) CL_Issued,
           sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end)
             - sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Balance,
           sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end) Total_Credit,
           sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Total_Debit,
           min(case when T.Trans_Type='C' then date ' ' - T.Effective_Date else 365*10 end) Days_Last_Credit,
           min(case when T.Trans_Type='D' then date ' ' - T.Effective_Date else 365*10 end) Days_Last_Debit
    from DEMO_FS.V_FIN_ACCOUNT F,
         DEMO_FS.V_FIN_CLIENT C,
         DEMO_FS.V_FIN_CLIENT_ACCOUNT_LINK L,
         DEMO_FS.V_FIN_ADD_CLIENT A,
         DEMO_FS.V_FIN_DEMOG_DESCS D,
         DEMO_FS.V_FIN_CC_TRANS T,
         -- Query to produce campaign planning
         ( select Account_ID,
                  count(Trans_Year) Years_Present,
                  sum(No_Trans) No_Trans,
                  sum(Total_Spend) Total_Spend,
                  case count(Trans_Year) when 1 then 'One-off' else 'Repeat' end Behavior_Flag
           from ( select *
                  from ( select Account_ID,
                                Extract(Year from Effective_Date) Trans_Year,
                                count(Transaction_ID) No_Trans,
                                sum(Transaction_Amount) Total_Spend,
                                avg(Transaction_Amount) Avg_Spend
                         from DEMO_FS.V_FIN_CC_TRANS
                         where extract(year from Effective_Date)
                           and actionid in ( select actionid
                                             from DEMO_FS.V_FIN_actions
                                             where actionoriginid =1)
                         group by Account_ID, Extract(Year from Effective_Date )
                       ) Acc_Summary
                  where No_Trans in (3,4,5,6)
                    and Avg_Spend>1000
                    and Trans_Year between 2004 and 2008
                ) Target_Accs
           group by Account_ID
         ) Campaign_Grouping
    where Campaign_Grouping.Account_ID=L.Account_ID
      and L.Client_ID=C.Client_ID
      and C.Client_ID=A.Client_ID
      and A.Demog_Code=D.Demog_Code
      and D.Demog_code in (1,4,5,9,10,11,50,55)
      and Campaign_Grouping.Account_ID=F.Account_ID
      and Campaign_Grouping.Account_ID=T.Account_ID
      and T.Effective_Date < date ' '
    group by C.Client_ID, Demog_Group, Demog_Desc
    order by Days_Last_Debit;

Query characteristics:
- 6 tables plus inline subqueries
- 4 nested subqueries
- CASE statements
- Numerous predicates
- Aggregation
- BETWEEN, IN, NOT EQUAL TO
- Multiple passes through fact
- 11 BILLION row fact :: seconds *
* on different sized machines / different volumes
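The credit, debit, and balance columns in the query above all rely on one pattern: a CASE expression inside SUM, so that a single pass over the transaction rows yields several conditional totals. The sketch below isolates that pattern on a tiny, hypothetical `trans` table using Python's built-in `sqlite3` module. It runs on SQLite rather than WX2, and the table, column names, and figures are invented for illustration only.

```python
import sqlite3

# Hypothetical miniature of the transaction table; not the DEMO_FS schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans (account_id INTEGER, trans_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO trans VALUES (?, ?, ?)",
    [
        (1, "C", 100.0), (1, "D", 30.0), (1, "C", 50.0),
        (2, "D", 20.0), (2, "C", 80.0),
    ],
)

# One scan of trans produces credits, debits, and balance per account.
query = """
SELECT account_id,
       SUM(CASE WHEN trans_type = 'C' THEN amount ELSE 0 END) AS total_credit,
       SUM(CASE WHEN trans_type = 'D' THEN amount ELSE 0 END) AS total_debit,
       SUM(CASE WHEN trans_type = 'C' THEN amount ELSE 0 END)
     - SUM(CASE WHEN trans_type = 'D' THEN amount ELSE 0 END) AS balance
FROM trans
GROUP BY account_id
ORDER BY account_id
"""
balances = conn.execute(query).fetchall()
for row in balances:
    print(row)
```

With the sample rows, account 1 ends with a balance of 120.0 and account 2 with 60.0.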

17 WX2 :: Building Block
[Diagram labels: Single x86 Linux Server/Blade; Linux; WX2; Data Storage; RAM; Tables & Views; Processing]
- Tables/Views pinned in memory
- Query Processing
- Messaging Queue/Resource Management
- Database Processes (Compiler, Optimizer, etc.)
- Data Files (Persistent data)
- Database Software
- Operating System

18 WX2 :: Appliance
[Diagram labels: Linux; WX2; Data Storage; Tables & Views; Processing]
- Commodity x86 Linux hardware
- Standard form factor for most data centers
- Redundant network and hardware components
- Standard appliance built on HP blades; delivered pre-configured
- 10GbE networking
- Heavy use of RAM and CPU
Appliance can be:
- Carved into multiple instances
- Strung together with other appliances to scale horizontally
- Used together or separately for configuring resiliency
- Rapids: High Performance (high use of RAM)
- Rivers: Medium Capacity (mostly RAM, some disk)
- Lakes: High Capacity (simple reporting, lower performance)

19 WX2 :: Appliance
- High speed data loads: into RAM ~8 TB/hour; onto disk ~1.5 TB/hour
- Linear File System
- Create/Refresh images in RAM
- High speed access to hot data
- Complex/nested views/images
- ELT
- Manage massive amounts of data in RAM
- Utilize RAM for query processing
- Access RAM-based views/images
- Process in memory, no disk I/O
- All nodes in appliance participate equally
- MPP
- Message Passing Kernel optimizes communication
- Queries executed in machine code (jump offsets to access columns)
- Machine-code-level utilization of offsets to optimize access of RAM
- Mature RAM management techniques
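The phrase "jump offsets to access columns" suggests that, for fixed-width rows held in RAM, a column's position is a constant byte offset from the start of each row, so a scan can read the value directly without parsing the rest of the row. The sketch below only illustrates that idea with Python's `struct` module. The row layout, `AMOUNT_OFFSET`, and the function names are hypothetical, and nothing here reflects how WX2 actually generates machine code.

```python
import struct

# Hypothetical fixed-width row: account_id (int32), amount (float64), year (int16).
# "<" means little-endian with no padding, so each row is 4 + 8 + 2 = 14 bytes.
ROW = struct.Struct("<idh")
AMOUNT = struct.Struct("<d")
AMOUNT_OFFSET = 4  # bytes to skip past account_id inside each row

def build_buffer(rows):
    """Pack rows back to back into one contiguous in-memory buffer."""
    buf = bytearray(ROW.size * len(rows))
    for i, row in enumerate(rows):
        ROW.pack_into(buf, i * ROW.size, *row)
    return buf

def sum_amount(buf):
    """Scan the buffer, reading only the amount column via its fixed offset."""
    total = 0.0
    for base in range(0, len(buf), ROW.size):
        (amount,) = AMOUNT.unpack_from(buf, base + AMOUNT_OFFSET)
        total += amount
    return total

buf = build_buffer([(1, 10.5, 2004), (2, 20.0, 2005), (3, 1.5, 2006)])
print(sum_amount(buf))  # prints 32.0
```

A compiled query would bake such offsets into generated instructions. The Python loop shows the addressing arithmetic only, not the performance.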

20 WX2 :: Software :: Performance
- Row-based scanning technology in common with other DWA technologies
- All server nodes participate equally and maximally in a query
- Enormous brute force processing from arrays of commodity servers with lots of CPU cores
- In-memory data can feed CPU cores without I/O wait
- ~650 million rows per second per server
  - 10 servers = 6.5 BILLION rows/second
  - 100 servers = 65 BILLION rows/second
- Load rates of over 8TB/hour to RAM; 1.5TB/hour to disk
- Effective and mature memory and resource management
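The scan-rate figures on this slide follow from simple multiplication under the linear-scaling assumption. The sketch below reproduces that arithmetic and adds two back-of-envelope helpers. The function names are hypothetical, and the estimates assume a single full pass at the quoted per-server rate, which a real multi-join query would not match.

```python
ROWS_PER_SEC_PER_SERVER = 650_000_000  # "~650 million rows per second per server"
LOAD_TB_PER_HOUR_RAM = 8               # "over 8TB/hour to RAM"

def aggregate_scan_rate(servers):
    """Rows scanned per second across all servers, assuming linear scaling."""
    return servers * ROWS_PER_SEC_PER_SERVER

def single_pass_seconds(rows, servers):
    """Idealized time for one full scan of `rows` rows."""
    return rows / aggregate_scan_rate(servers)

def load_hours(terabytes, tb_per_hour=LOAD_TB_PER_HOUR_RAM):
    """Idealized time to load `terabytes` of data at the quoted rate."""
    return terabytes / tb_per_hour

for n in (10, 100):
    print(n, "servers:", aggregate_scan_rate(n), "rows/second")
```

For 10 and 100 servers this gives 6,500,000,000 and 65,000,000,000 rows per second, matching the slide.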

21 Getting smaller, getting faster
- Retail analytics with ~24 billion EPOS records
- POC in 2005 required a 125 blade server system
  - Platform was physically located in Germany
  - WX2 installation and data load were done remotely from the UK; no Kognitio or customer employees on site
  - Installation took one day, data load 4 hrs
  - System scanned all 24 billion records in 0.8s
  - Complex basket analysis queries took seconds
- Clients in 2008/9 purchase 64 blade appliances for similar production volume
- Today can demo on 16 blade servers with better performance!
- On-going increase in CPU cores and RAM per server
- WX2 requires no tweaks or changes for different scale systems; Kognitio benefits greatly by exploiting the commodity computing development curve
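The 2005 POC figures imply a scan rate that can be worked out directly. The sketch below derives the aggregate and per-blade rates from the numbers on this slide, using integer arithmetic to avoid rounding. The variable names are hypothetical, and the calculation assumes the 0.8s figure was one full scan spread evenly across all 125 blades.

```python
RECORDS = 24_000_000_000   # "~24 Billion EPOS records"
SCAN_MILLISECONDS = 800    # "scanned all 24 billion records in 0.8s"
BLADES_2005 = 125          # blades in the 2005 POC
BLADES_TODAY = 16          # blades in the current demo

# Aggregate rows scanned per second across the whole 2005 system.
aggregate_rate = RECORDS * 1000 // SCAN_MILLISECONDS

# Implied rows per second per blade in 2005.
per_blade_rate = aggregate_rate // BLADES_2005

# How many times fewer blades the current demo uses.
blade_reduction = BLADES_2005 / BLADES_TODAY

print(aggregate_rate, per_blade_rate, blade_reduction)
```

That works out to 30 billion rows per second overall, or 240 million per blade in 2005, on a system with roughly 7.8 times as many blades as the current demo.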

22 WX2 :: Software :: Standards :: Integration
[Diagram labels: SQL: ODBC/JDBC; MDX; ETL / ELT; DataSources; SQDR; DataStage]

23 WX2 :: Resiliency Disk Node Hardware Network Software

24 WX2 :: DR/Backup/Recovery
[Diagram labels: Production; DR]
- Flexibility in system configurations: Instance B need not have the same configuration as Instance A (primary instance)
- Parallel operations for bulk import and export of data
- Multi-versioning file system: row changes are kept until a reclaim/repack event, and historical changes can be queried
  - Exploited by incremental backup (change-only backup)
  - Incremental backup can lock transaction history via transaction marker
- Queries against in-memory data isolated from disk I/O of backup operations
Approaches
- Full + Incremental
- Simultaneous load to two environments
  - Dual feed
  - Dual ETL instances
- Incremental backup to smaller environment
- Hybrid
  - Incremental backup of dimensions; dual feed of facts
- Snapshot/clone disk volumes (Prod has to be stopped)
- SAN to SAN mirror
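The link between multi-versioning and change-only backup can be pictured with a toy append-only store: every write adds a new row version tagged with a transaction id, history stays queryable, and an incremental backup copies only the versions written after the last recorded marker. The sketch below is a simplified illustration with hypothetical names (`VersionedTable`, `incremental_backup`). It is not the WX2 file system and leaves out deletes and concurrency.

```python
class VersionedTable:
    """Append-only row versions; nothing is overwritten until repack()."""

    def __init__(self):
        self.versions = []  # list of (txn_id, key, value)
        self.next_txn = 1

    def write(self, key, value):
        """Record a new version of a row and return its transaction id."""
        txn = self.next_txn
        self.versions.append((txn, key, value))
        self.next_txn += 1
        return txn

    def snapshot(self, as_of_txn=None):
        """Rows as they looked at as_of_txn (the latest state if None)."""
        state = {}
        for txn, key, value in self.versions:
            if as_of_txn is not None and txn > as_of_txn:
                break
            state[key] = value
        return state

    def incremental_backup(self, marker):
        """Return only the versions written after marker, plus the new marker."""
        changes = [v for v in self.versions if v[0] > marker]
        new_marker = self.versions[-1][0] if self.versions else marker
        return changes, new_marker

    def repack(self):
        """Reclaim space by keeping only the latest version of each key."""
        latest = {}
        for txn, key, value in self.versions:
            latest[key] = (txn, key, value)
        self.versions = sorted(latest.values())

table = VersionedTable()
table.write("a", 1)
marker = table.write("b", 2)   # pretend a full backup was taken here
table.write("a", 3)
changes, new_marker = table.incremental_backup(marker)
print(changes, new_marker)
```

Here the incremental backup contains only the single version written after the marker, while `table.snapshot(marker)` still returns the older value of row "a" until `repack()` discards it.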

25 Timescale
[Timeline labels: Company; Clients (post-2005); Product Releases; Competition: approximate year of founding (Netezza, Vertica, ParAccel, Greenplum, Teradata, SAP HANA, Oracle Exalytics); 1979]

26 Thank you Q&A

27 Thank You!
connect: kognitio.com, kognitio.tel, kognitio.com/blog, twitter.com/kognitio, linkedin.com/companies/kognitio, tinyurl.com/kognitio, youtube.com/user/kognitiowx2
contact: Michael Hiskey, Vice President, Marketing & Business Development; Sach Sangtani, Senior Technical Consultant

