
1 The Big-Data “App-Store”: Democratizing Data-Science with On-Demand Analytics
Principal Investigator: Rangan Sukumar
Commercialization Manager: David Sims
ORNL is managed by UT-Battelle for the US Department of Energy

2 Customer Pain in the Big Data Era… Big Data to Big Decisions
– For similar data volumes, our IT and ‘Analytics’ budget is ~$1.5 Billion and ORNL’s HPC operations budget is ~$100 million. Why?
– How can we use computers for better policy, integrity and quality?
– “…Our data has grown big over the years but we still want answers in a few seconds….”
– “…We want to do what Kroger did with their coupon loyalty program to understand our patients better. How about the Netflix-like recommendation algorithm?...”
– “…We have data about criminals in 20 different databases that do not talk to each other….”
– “…Our business is acquiring a competitor and we want to consolidate customers and suppliers from the mergers….”

3 Technology Description
Abstraction, Interrogation, Association, Prediction, Validation, Simulation
– Querying and Retrieval, e.g. Databases, Google Index
– Data-fusion, e.g. Mashup
– Predictive Modeling, e.g. Fraud Prevention
– Better Data Collection
There is a need for scalable data-science tools that are faster, cheaper and more efficient.
– Descriptive Analytics: What happened?
– Diagnostic Analytics: Why did it happen?
– Predictive Analytics: What will happen?
– Prescriptive Analytics: How can we make it happen?
Hindsight, Insight, Foresight (concept adapted from Gartner’s Webinar on Big Data)

4 State-of-the-practice
Suppose a company wanted these capabilities…
– Information Technology (IT) infrastructure investment (~$100K)
– Software license (~$10-50K/year)
– Data scientist and staff ($100K/year)
– Expensive “utility-bills” for data-centers (~$10K/year)
– 6-24 month lead time to build “apps”

5 Opportunity
Non-profits, city officials, research projects, small businesses… are unable to afford data-science.
Vision for a Start-up: Democratize Data-Science

6 Technology Opportunity: Our Solution
The Big Data “App-Store”
– Offer “algorithms-as-a-service”
– Offer on-demand pay-for-use subscription model
Big-Data as a service: SaaS (e.g. Citrix), PaaS (e.g. Amazon EC2), DaaS (e.g. Acxiom), DbaaS (e.g. EMC)
Top Free Apps and Top Pay-per-use Apps: PageRank, Community, Shortest Path, Coupon, K-Means, K-NN
Prices shown: $50/TB-hour, $40/TB-hour, $100/TB-hour
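As a rough illustration of the pay-for-use model on this slide, the sketch below computes a job's charge from a per-TB-hour rate. It assumes that "$/TB-hour" means rate times data volume times run time, which the slide does not spell out. The function name `pay_per_use_cost` and the 2 TB, 3 hour job are hypothetical, and the transcript does not say which rate belongs to which app.

```python
def pay_per_use_cost(rate_per_tb_hour: float, data_tb: float, hours: float) -> float:
    """Charge for one job, ASSUMING cost = rate x terabytes x hours."""
    if rate_per_tb_hour < 0 or data_tb < 0 or hours < 0:
        raise ValueError("rate, data size and run time must be non-negative")
    return rate_per_tb_hour * data_tb * hours


# The three rates shown on the slide (app-to-rate mapping is not recoverable).
RATES_PER_TB_HOUR = (50.0, 40.0, 100.0)

# Hypothetical job: 2 TB of data processed for 3 hours.
job_costs = [pay_per_use_cost(rate, data_tb=2.0, hours=3.0)
             for rate in RATES_PER_TB_HOUR]

for rate, cost in zip(RATES_PER_TB_HOUR, job_costs):
    print(f"${rate:.0f}/TB-hour -> ${cost:,.0f} for a 2 TB, 3 hour job")
```

Under that assumption a small, short job costs a few hundred dollars, which is the contrast the deck draws with the up-front costs of the state-of-the-practice slide.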

7 Technology Description: Details
Knowledge Catalysts Suite
– SEEKER: Schema Exploration and Evolving Knowledge Recorder. Data, Schema and Automated Meta-data Integration
– SNAKE: Social/Network Analytic Knowledge Extractor. Data association, Record-Linkage, Saliency extraction
– PAUSE: Predictive Analytics Using Software-Endpoints. Pattern Discovery, Pattern Recognition
Enables…
– Automated Data Harmonization: Seamless inter and intra-agency enterprise data integration; Schema-free flexible data transformation and representation
– Active Master Data Management: Entity-resolution (discovering the ‘Golden Record’); Graph-theoretic and domain-aware discovery
– High-performance Data Analysis: Scalable Machine Learning Algorithms; Rule-based Models; Anomaly-detection Models; Predictive Models; Social-network Models

8 Technology Description: In Action
– Descriptive Analytics: What happened?
– Diagnostic Analytics: Why did it happen?
– Predictive Analytics: What will happen?
– Prescriptive Analytics: How to stop it from happening?
Accuracy of model: ~80%
Specificity: ~76%
Sensitivity: ~85%
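The accuracy, specificity and sensitivity figures on this slide are standard confusion-matrix ratios. The sketch below shows how they are computed. The counts are hypothetical, chosen only so the ratios land near the slide's numbers; the deck does not report the underlying confusion matrix, and `classification_metrics` is an illustrative name.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of true positives that were caught
        "specificity": tn / (tn + fp),   # share of true negatives that were cleared
    }


# HYPOTHETICAL counts: 100 positive cases and 100 negative cases.
metrics = classification_metrics(tp=85, fp=24, tn=76, fn=15)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With balanced classes, accuracy is the average of sensitivity and specificity, which is why 85% and 76% sit on either side of roughly 80%.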

9 Technology Leadership: Capability
– ORNL’s HPC/Data resources are ahead of the market, therefore our software is ahead as well.
– 50+ ‘Apps’ with many more in the works…
– Demonstrated prototype on ~10 TBs of data at a time.
– Parallelized formulation of algorithms and code-optimization
– Results are x faster depending on the algorithm over state-of-the-art.
– Handle 1000x the size of data that desktops can handle for the same latency.

10 Technology Leadership: Intellectual property
– 2 provisional patents, 1 patent pending
– 7 ORNL-IDSA disclosures
– 2 software copyright applications
– 16 peer-reviewed publications

11 Technology Application: Healthcare
George Thomas, Chief Architect, Healthdata.gov, Department of Health and Human Services
Don Berwick, Administrator, Centers for Medicare and Medicaid Services
27 Sep 2011
Oak Ridge is not just a Department of Energy supplier. They work with other government agencies that want to contract with them to do essentially analytics and data mining.
The one place I saw analytics working was in our early work on predictive analytics for fraud… At CMS we did do some early trials with ORNL. CMS gave them access to privacy-protected Medicare information. They have tremendous analytic capacity, and it was stunning what they did.
15 May 2012
The agency looked to the Energy Department's Oak Ridge National Laboratory for help creating a consolidated warehouse environment that can utilize more data tools and services. After a year of testing technologies, such as Hadoop, at Oak Ridge, CMS is bringing the strategy to its Baltimore data center to see how the tests work within a different security infrastructure, at a different scale and for CMS's specific operational needs.
Tony Trenkle, Chief Information Officer, Centers for Medicare and Medicaid Services
22 Oct 2012
…Based on its analysis of the provider files submitted by the participating states, Oak Ridge National Labs (ORNL) has identified probable risk factors and high-risk vendors. The findings from the pilot have been validated and risk-scored by CMS in partnership with Rainmakers, an independent contractor.
1 Oct 2013
Collaborative Forum Pilot: Provider Risk Assessment, Office of Management and Budget
Appeared in Health Leaders Media
Appeared in Fierce Government IT
Filed as part of the 5th Semi-annual Report to the Congress…

12 Technology Leadership: Feedback
– ~100 downloads as of 20 March 2015
– Feb 8, 2015: Six Free-Apps Made
– https://github.com/ssrangan/gm-sparql

13 Innovation/Commercialization Roadmap
– 2015: 50 ‘Apps’ available today
– 2016: Support for futuristic hardware architectures
– 2017: Semantic, Logical and Statistical Reasoning Apps
– 2018: Artificial Intelligence: Hypothesis Generation
– 2015: Create an “App-Store” web-interface
– 2016: Apply to healthcare use-cases
– 2017: Develop common core for Big Data Market
– 2018: Partner with cloud providers such as Amazon, Microsoft and Google.
Research, Commercialization, Startup

14 Commercialization Path
HEALTHCARE: Health insurance and health claim processing companies
– Description: Fraud, waste, abuse models reduce cost of healthcare.
– Target Customers: Emdeon, Summit Health, AETNA, CIGNA, Blue Cross etc.
– Current Practice: No pro-active actions. Insurance companies are looking for CMS to publish “black-list”.
BIG DATA: “Big Data as a Service” Startups
– Description: “Algorithms-as-a-service”
– Target Customers: OpenCore, Qubole, AYASADI, Acxiom, GoodData, etc.
– Current Practice: Analysis requires expensive software licenses and insights rely on the creativity of analysts.
CLOUD COMPUTING: Third-party cloud integrators for Amazon AWS, Microsoft Azure and Google Compute Engine
– Description: Inter- and Intra-agency analytics on the cloud.
– Target Customers: AppNexus, JasperSoft, Cloudreach, CodeEnvy, Twin Strata, New Relic, Algorithmica
– Current Practice: Business intelligence tools such as Tableau, Pentaho, etc. for data retrieval and reports.

15 Competitive Differentiation

16 Market Opportunity
– Fraud, Waste and Abuse Market (2017): $7.5 B
– Health Applications (2017): $6.7 B
– Major Health Insurance Companies: 35 (1200)
– Health Informatics Companies (>1000 employees): 102
– FWA Estimates for 2012 in Healthcare: $270 B
– Health informatics market penetration goal of 20% by 2017 yields ~ $20 Million : $53 CAGR 16%
– “Big Data as a Service” (2017): $20 B
– Number of startups: ~100
– Number of cloud-service providers: ~10
– “Algorithms-as-a-service” market penetration goal of 10% by 2017 yields ~ $30 Million : $60 CAGR 38%
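The CAGR percentages quoted on this slide are compound annual growth rates. The sketch below shows the standard formula in both directions. The transcript does not make clear which dollar figures the 16% and 38% rates apply to, so the example grows a normalized value of 1.0 over an assumed five-year horizon; the function names and the horizon are illustrative only.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Value after `years` of compounding at annual rate `cagr` (0.16 means 16%)."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that takes start_value to end_value in `years` years."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# ASSUMED five-year horizon and a normalized starting value of 1.0.
YEARS = 5
growth_at_16 = project_value(1.0, 0.16, YEARS)
growth_at_38 = project_value(1.0, 0.38, YEARS)

print(f"16% CAGR over {YEARS} years multiplies a market by {growth_at_16:.2f}")
print(f"38% CAGR over {YEARS} years multiplies a market by {growth_at_38:.2f}")
```

Over five years a 16% rate roughly doubles a market and a 38% rate grows it about fivefold, which conveys the gap between the two segments without relying on the garbled dollar amounts.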

17 Summary
Axes: Value (choice), from Historical/Practice to Pro-active/Predictive; Size (inevitable), from small to large
– BI Dashboards, Reports
– “Excel, Workstation, R, SAS, Databases” + Analyst
– 90% of the market
– 5% of the market
– Big Data BI: Hadoop, Tableau, Pentaho
– Infrastructure: Clusters
– Database: SQL, NoSQL, MPP, Hadoop, Graphs
– Programming: MPI, RDF, CUDA etc.
– Algorithms: Matrix inversion, machine learning
– Application dictates choice of tools
– Data dictates choice of tools and algorithms
– In 5 years: 5% of the market
– Transition path

18 Thank You
Questions?

19 Technology Description: Details
Knowledge discovery is a “greedy” and “never ending” thirst.
– Big Data produces “Bigger Data”
– “Lifecycle” management vs. “Project” management
Big Data comes with Big Expectations
– Data sets are expected to answer more than one question.
– “We have lots of data – we do not know what questions to ask.”
If Big Data => Smarter decisions, we need “Smarter Methods”
– Discover “newer” insights in context with evolving new knowledge
– Methods that can work well when there is more noise than signal.

