IEEE BigData Overview October 9 2013 NIST Big Data Public Working Group NBD-PWG Based on September 30, 2013 Presentations at one day workshop at NIST Leaders.

Presentation transcript:

IEEE BigData Overview, October 9, 2013
NIST Big Data Public Working Group (NBD-PWG)
Based on the September 30, 2013 presentations at a one-day workshop at NIST
Leaders of activity:
– Wo Chang, NIST (should present, but shut down)
– Robert Marcus, ET-Strategies
– Chaitanya Baru, UC San Diego
Note: the web site is shut down, and I relied on incomplete documentation (Geoffrey Fox).

NBD-PWG Charter
Launch date: June 26, 2013. Public meeting with interim deliverables: September 30, 2013. Edit and send out for comment: November-December 2013.
The focus of the NBD-PWG is to form a community of interest from industry, academia, and government, with the goal of developing consensus definitions, taxonomies, secure reference architectures, and a technology roadmap. The aim is to create vendor-neutral, technology- and infrastructure-agnostic deliverables that enable big data stakeholders to pick and choose the best analytics tools for their processing and visualization requirements on the most suitable computing platforms and clusters, while allowing value-added from big data service providers and the flow of data between the stakeholders in a cohesive and secure manner.
Note: the group identifies common/best practice; this includes, but is not limited to, discussing standards (the S in NIST).

NBD-PWG Subgroups & Co-Chairs
Requirements and Use Cases SG
– Geoffrey Fox, Indiana U.; Joe Paiva, VA; Tsegereda Beyene, Cisco
Definitions and Taxonomies SG
– Nancy Grady, SAIC; Natasha Balac, SDSC; Eugene Luster, R2AD
Reference Architecture SG
– Orit Levin, Microsoft; James Ketner, AT&T; Don Krapohl, Augmented Intelligence
Security and Privacy SG
– Arnab Roy, CSA/Fujitsu; Nancy Landreville, U. MD; Akhil Manchanda, GE
Technology Roadmap SG
– Carl Buffington, Vistronix; Dan McClary, Oracle; David Boyd, Data Tactic

Requirements and Use Case Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains.
Tasks
– Gather use case input from all stakeholders.
– Derive Big Data requirements from each use case.
– Analyze/prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment.
– Work with the Reference Architecture subgroup to validate requirements and reference architecture.
– Develop a set of general patterns capturing the “essence” of use cases (to do).

Use Case Template
26 fields completed for 51 areas:
– Government Operation: 4
– Commercial: 8
– Defense: 3
– Healthcare and Life Sciences: 10
– Deep Learning and Social Media: 6
– The Ecosystem for Research: 4
– Astronomy and Physics: 5
– Earth, Environmental and Polar Science: 10
– Energy: 1
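As a quick consistency check, the per-category counts on this slide can be tallied to confirm that they add up to the 51 use cases cited. This is a minimal Python sketch; the name `use_case_counts` is illustrative only and does not come from the NBD-PWG material.

```python
# Per-category use case counts, copied from the slide.
# The variable name is an illustrative assumption.
use_case_counts = {
    "Government Operation": 4,
    "Commercial": 8,
    "Defense": 3,
    "Healthcare and Life Sciences": 10,
    "Deep Learning and Social Media": 6,
    "The Ecosystem for Research": 4,
    "Astronomy and Physics": 5,
    "Earth, Environmental and Polar Science": 10,
    "Energy": 1,
}

# Sum across all categories and compare with the stated total.
total = sum(use_case_counts.values())
print(f"{len(use_case_counts)} categories, {total} use cases")
```

The total matches the figure of 51 given in the slide title.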

51 Detailed Use Cases: Many TB’s to Many PB’s
– Government Operation: National Archives and Records Administration, Census Bureau
– Commercial: Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search, Digital Materials, Cargo shipping (as in UPS)
– Defense: Sensors, Image surveillance, Situation Assessment
– Healthcare and Life Sciences: Medical records, Graph and Probabilistic analysis, Pathology, Bioimaging, Genomics, Epidemiology, People Activity models, Biodiversity
– Deep Learning and Social Media: Driving Car, Geolocate images, Twitter, Crowd Sourcing, Network Science, NIST benchmark datasets
– The Ecosystem for Research: Metadata, Collaboration, Language Translation, Light source experiments
– Astronomy and Physics: Sky Surveys compared to simulation, Large Hadron Collider at CERN, Belle Accelerator II in Japan
– Earth, Environmental and Polar Science: Radar Scattering in Atmosphere, Earthquake, Ocean, Earth Observation, Ice sheet Radar scattering, Earth radar mapping, Climate simulation datasets, Atmospheric turbulence identification, Subsurface Biogeochemistry (microbes to watersheds), AmeriFlux and FLUXNET gas sensors
– Energy: Smart grid
The next step involves matching extracted requirements and the reference architecture. Alternatively, develop a set of general patterns capturing the “essence” of use cases.

Definitions and Taxonomies Subgroup
The focus is to gain a better understanding of the principles of Big Data. It is important to develop a consensus-based common language and vocabulary for the terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is critical to identify essential actors with their roles and responsibilities, and to subdivide them into components and sub-components according to how they interact and relate to each other, based on their similarities and differences.
Tasks
– For Definitions: compile terms used by all stakeholders regarding the meaning of Big Data from various standards bodies, domain applications, and diversified operational environments.
– For Taxonomies: identify key actors with their roles and responsibilities from all stakeholders, and categorize them into components and subcomponents based on their similarities and differences.

Data Science Definition (Big Data less consensus)
Data Science is the extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and analytical hypothesis analysis.
A Data Scientist is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle.
Big Data refers to digital data volume, velocity and/or variety whose management requires scalability across coupled horizontal resources.

Reference Architecture Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus-based approach to orchestrating vendor-neutral, technology- and infrastructure-agnostic analytics tools and computing environments. The goal is to enable Big Data stakeholders to pick and choose technology-agnostic analytics tools for processing and visualization in any computing platform and cluster, while allowing value-added from Big Data service providers and the flow of data between the stakeholders in a cohesive and secure manner.
Tasks
– Gather and study available Big Data architectures representing various stakeholders, different data types, and use cases, and document the architectures using the Big Data taxonomies model based upon the identified actors with their roles and responsibilities.
– Ensure that the developed Big Data reference architecture and the Security and Privacy Reference Architecture correspond to and complement each other.

List of Surveyed Architectures
Vendor-neutral and technology-agnostic proposals
– Bob Marcus, ET-Strategies
– Orit Levin, Microsoft
– Gary Mazzaferro, AlloyCloud
– Yuri Demchenko, University of Amsterdam
Vendors’ Architectures
– IBM
– Oracle
– Booz Allen Hamilton
– EMC
– SAP
– 9sight
– LexisNexis

[Figure: reference architecture diagram. Labels shown:]
– System Orchestrator
– Data Provider
– Big Data Application Provider: Visualization, Access, Analytics, Curation, Collection
– Big Data Framework Provider: Processing Frameworks (analytic tools, etc.); Platforms (databases, etc.); Infrastructures; Physical and Virtual Resources (networking, computing, etc.)
– Data Consumer
– Management; Security & Privacy
– Axes: Information Value Chain; IT Value Chain
– Scaling labels (each shown three times): Horizontally Scalable (VM clusters); Vertically Scalable
– Flow labels: DATA; SW
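The roles and components named in the diagram can also be written down as a small data structure, which makes it easy to look up which role a component belongs to. This is only a sketch: the `Role` class, the `components` field, and the `find_role` helper are illustrative names, not part of the NBD-PWG material, and the component lists are in the order the slide text gives them, with no processing order implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Role:
    """One role box from the diagram (hypothetical representation)."""
    name: str
    components: List[str] = field(default_factory=list)


# Roles and their labelled components, as read from the slide.
reference_architecture = [
    Role("System Orchestrator"),
    Role("Data Provider"),
    Role("Big Data Application Provider",
         ["Visualization", "Access", "Analytics", "Curation", "Collection"]),
    Role("Big Data Framework Provider",
         ["Processing Frameworks", "Platforms", "Infrastructures",
          "Physical and Virtual Resources"]),
    Role("Data Consumer"),
]

# Labels drawn around the roles rather than inside any one of them.
cross_cutting = ["Management", "Security & Privacy"]


def find_role(component: str) -> Optional[str]:
    """Return the name of the role that lists the given component, if any."""
    for role in reference_architecture:
        if component in role.components:
            return role.name
    return None


print(find_role("Analytics"))
print(find_role("Platforms"))
```

A flat list of roles is used here because the slide text does not preserve the arrows between the boxes; a graph structure would need the original figure.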

Security and Privacy Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifying which key organizations are working on these standards.
Tasks
– Gather input from all stakeholders regarding security and privacy concerns in Big Data processing, storage, and services.
– Analyze/prioritize a list of challenging security and privacy requirements from ~10 special use cases that may delay or prevent adoption of Big Data deployment.
– Develop a Security and Privacy Reference Architecture that supplements the general Big Data Reference Architecture.

CSA (Cloud Security Alliance) BDWG: Top Ten Big Data Security and Privacy Challenges
1) Secure computations in distributed programming frameworks
2) Security best practices for non-relational datastores
3) Secure data storage and transactions logs
4) End-point input validation/filtering
5) Real time security monitoring
6) Scalable and composable privacy-preserving data mining and analytics
7) Cryptographically enforced access control and secure communication
8) Granular access control
9) Granular audits
10) Data provenance

Top 10 S&P Challenges: Classification
Infrastructure Security
– Secure Computations in Distributed Programming Frameworks
– Security Best Practices for Non-Relational Data Stores
Data Privacy
– Privacy Preserving Data Mining and Analytics
– Cryptographically Enforced Data Centric Security
– Granular Access Control
Data Management
– Secure Data Storage and Transaction Logs
– Granular Audits
– Data Provenance
Integrity and Reactive Security
– End-point Validation and Filtering
– Real Time Security Monitoring
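The classification on this slide amounts to a mapping from four categories to the ten challenges. Below is a minimal Python sketch of that mapping with a reverse lookup; the names `classification` and `category_of` are illustrative assumptions, and the challenge titles are the ones used on this slide.

```python
from typing import Dict, List, Optional

# Four categories and their challenges, as grouped on the slide.
classification: Dict[str, List[str]] = {
    "Infrastructure Security": [
        "Secure Computations in Distributed Programming Frameworks",
        "Security Best Practices for Non-Relational Data Stores",
    ],
    "Data Privacy": [
        "Privacy Preserving Data Mining and Analytics",
        "Cryptographically Enforced Data Centric Security",
        "Granular Access Control",
    ],
    "Data Management": [
        "Secure Data Storage and Transaction Logs",
        "Granular Audits",
        "Data Provenance",
    ],
    "Integrity and Reactive Security": [
        "End-point Validation and Filtering",
        "Real Time Security Monitoring",
    ],
}


def category_of(challenge: str) -> Optional[str]:
    """Return the category a challenge is filed under, or None if unknown."""
    for category, challenges in classification.items():
        if challenge in challenges:
            return category
    return None


# Flatten the mapping to check that every challenge appears exactly once.
all_challenges = [c for group in classification.values() for c in group]
print(len(all_challenges), len(set(all_challenges)))
print(category_of("Data Provenance"))
```

The flattening step confirms that the four groups together hold ten distinct challenges, matching the "Top Ten" title.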

Use Cases
Retail/Marketing
– Modern Day Consumerism
– Nielsen Homescan
– Web Traffic Analysis
Healthcare
– Health Information Exchange
– Genetic Privacy
– Pharma Clinical Trial Data Sharing
Cyber-security
Government
– Military
– Education

Technology Roadmap Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus vision with recommendations on how Big Data should move forward, by performing a gap analysis of the materials gathered from all other NBD subgroups. This includes setting standardization and adoption priorities, based on an understanding of what standards are available or under development, as part of the recommendations.
Tasks
– Gather input from the NBD subgroups and study the taxonomies for the actors’ roles and responsibilities, the use cases and requirements, and the secure reference architecture.
– Gain an understanding of what standards are available or under development for Big Data.
– Perform a thorough gap analysis and document the findings.
– Identify possible barriers that may delay or prevent adoption of Big Data.
– Document vision and recommendations.

Some Identified Features
Table columns: Technology Roadmap Feature; Roles; Readiness; Ref Architecture Mapping
– Storage Framework: TBD; Capabilities
– Processing Framework: TBD; Capabilities
– Resource Managers Framework: TBD; Capabilities
– Infrastructure Framework: TBD; Capabilities
– Information Framework: TBD; Data Services
– Standards Integration Framework: TBD; Data Services
– Applications Framework: TBD; Capabilities
– Business Operations: TBD; Vertical Orchestrator

Interaction Between Subgroups
[Diagram of the five subgroups: Technology Roadmap; Requirements & Use Cases; Definitions & Taxonomies; Reference Architecture; Security & Privacy]
Due to time constraints, activities were carried out in parallel.