Overview SCALE14x 2016

Agenda/Schedule
- Apache Bigtop Overview
- Apache Spark Overview/Getting Started
- Lunch Break
- Apache Ignite
- Workshop, tutorial, open time (click on Agenda button)

What is Bigtop? Setting the standard for testing, packaging and integration of leading big/fast data components

Components as Building Blocks
and many others…

Dependency Hell!!
hdfs, zookeeper, hbase, kafka, spark, mapred, oozie, hive, etc.
Build all the Things!!!
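Untangling a component stack like this one is, at heart, a build-ordering problem. The sketch below shows one way to derive a valid build order with a topological sort. The dependency edges are illustrative assumptions (the slide does not list them), and `build_order` is a hypothetical helper, not part of Bigtop.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# ASSUMPTION: these edges are made up for illustration. Each component maps
# to the set of components that must be built before it.
DEPENDS_ON = {
    "zookeeper": set(),
    "hdfs": {"zookeeper"},
    "mapred": {"hdfs"},
    "hbase": {"hdfs", "zookeeper"},
    "kafka": {"zookeeper"},
    "spark": {"hdfs"},
    "hive": {"hdfs", "mapred"},
    "oozie": {"hive", "mapred"},
}


def build_order(depends_on):
    """Return the components so that every dependency precedes its dependents."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
print(" -> ".join(order))
```

Several orders can satisfy the same graph, so the printed sequence is only one valid answer. What matters is that no component appears before something it depends on.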

The BOM
Bill of Materials (BOM)
* List of >=1 components
* Gradle for build/actions
* Produce sets of debs/rpms
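As a rough illustration of the idea on this slide, a BOM can be modeled as a list of one or more components that expands into one packaging task per component and package format. This is a minimal sketch, not Bigtop's actual BOM format or Gradle logic. The `Component` class, the `package_tasks` helper, and the `<component>-<format>` task naming are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str  # component name as it would appear in the BOM


# A toy BOM; the component names are borrowed from the slides.
BOM = [Component("zookeeper"), Component("spark"), Component("hive")]


def package_tasks(bom, formats=("deb", "rpm")):
    """Expand a BOM into one task name per component and package format.

    ASSUMPTION: the "<component>-<format>" naming is illustrative only.
    """
    if not bom:
        raise ValueError("a BOM must list at least one component")
    return [f"{component.name}-{fmt}" for component in bom for fmt in formats]


tasks = package_tasks(BOM)
for task in tasks:
    print(task)
```

Keeping the component list as plain data is what lets a build tool iterate over it and produce a consistent set of packages from a single definition.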

Bigtop Origins
- Yahoo!, 2010: created, fostered early Hadoop community; working on Hadoop 0.20 stack
- 2011: from Yahoo! to Cloudera, solving early problems of packaging and maintaining the first commercially supported Hadoop distro

Early value add
- Provide a common foundation for proper integration of a growing number of Hadoop family components
- Foundation provides a solid base for validating applications running on top of the stack(s)
- Provide neutral packaging and deployment/config

Early Mission Accomplished
- Foundation for commercial Hadoop distros/services
- Leveraged by app providers…

What now? We are done, right?!?

Industry/Ecosystem Evolution & New Community Needs/Ideas

Where should we spend our time? Which users should benefit?

Moving beyond out-of-the-box (OOB) MapReduce…

Lambda/Stream Architectures HDFS + Zookeeper +

Get out from the Apache dome

New focus and target end users
- Data engineers vs distro builders
- Enhance operations/deployment
- Reference implementations & tutorials

Laying new foundation with 1.0+
Self-starter, non-kitchen-sink building
- Making Gradle tooling smarter
- Jenkins job autogen
- Leveraging containers for parallelization

Data data data…
Smarter/realistic test data
- bigpetstore
- bigtop-bazaar
- weather data gen
Tutorial/learning data sets
- githubarchive.org
- more TBD…
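To give a feel for what a test data generator does, here is a minimal sketch of a seeded generator for pet-store-style transactions. It is not the bigpetstore implementation. The record fields, the product list, and the `generate_transactions` helper are all hypothetical and chosen only for illustration.

```python
import random
from datetime import datetime, timedelta

# ASSUMPTION: a made-up product catalog, not taken from any real data set.
PRODUCTS = ["dog food", "cat litter", "fish tank", "bird seed"]


def generate_transactions(n, seed=0):
    """Generate n synthetic transactions; the same seed gives the same data."""
    rng = random.Random(seed)  # local RNG so runs are reproducible
    start = datetime(2016, 1, 1)
    minutes_in_30_days = 60 * 24 * 30
    rows = []
    for i in range(n):
        rows.append({
            "id": i,
            "customer": rng.randrange(100),
            "product": rng.choice(PRODUCTS),
            "quantity": rng.randint(1, 5),
            "time": start + timedelta(minutes=rng.randrange(minutes_in_30_days)),
        })
    return rows


sample = generate_transactions(5, seed=42)
for row in sample:
    print(row)
```

Seeding the generator is the design choice that matters here: a failing test can be rerun against exactly the same input.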

Deployment/Mgmt
Updated Puppet modules
- newest best practices
- next level enhanced security options
Wider range of starter deployment topologies
Include some handling of test/tutorial data

More components…

Sounds interesting, how can I help?
* Join mailing list, ask questions, suggest features, etc.
* Contribute (components, tutorials, docs)
* Report bugs

Thank You, Q&A Nate