Committed to Deliver… We are Leaders in Hadoop Ecosystem. We support, maintain, monitor and provide services over Hadoop whether you run Apache Hadoop,

Presentation transcript:

Committed to Deliver…

• We are leaders in the Hadoop ecosystem.
• We support, maintain, monitor and provide services over Hadoop, whether you run Apache Hadoop, the Facebook version or the Cloudera version, in your own data center or on a cluster of machines on Amazon EC2, Rackspace, etc.
• We provide a scalable end-to-end solution: a solution that can scale to large data sets (terabytes or petabytes).
• Low-cost solution: based on an open source framework currently used by Google, Yahoo and Facebook.
• Solution optimized for minimum SLA and maximum performance.

• Project Initiation
  ◦ Project planning
  ◦ Requirement collection
  ◦ POC using Hadoop technology
• Team Building
  ◦ Highly skilled Hadoop experts
  ◦ Dedicated team for the project
• Agile Methodology
  ◦ Small iterations
  ◦ Easy to implement changing requirements
• Support
  ◦ Long-term relationship to support the developed product
  ◦ Scope to change based on business/technical need

Our combined experience has led to the adoption of a unique methodology that ensures quality work. We:
• Evaluate the available hardware and understand the client's requirements.
• Peek through the data.
• Analyze the data, prototype using M/R code, and show the results to our clients.
• Iterate: continuous improvement develops a better understanding of the data.
• Develop the various tasks in parallel:
  ◦ Data collection
  ◦ Data storage in HDFS
  ◦ M/R analytics jobs
  ◦ Scheduler to run M/R jobs and coordinate them
  ◦ Transforming output into OLAP cubes (dimension and fact tables)
  ◦ A custom interface to retrieve the M/R output

• We are experts in time-series data, in other words, time-stamped data.
• We have ample experience in writing efficient, fast and robust Map/Reduce code that implements ETL functions.
• We have brought Hadoop up to enterprise standard by providing features such as high availability, data collection and data merging.
• Writing Map/Reduce is not enough. We wrote layers on top of Hadoop that use Hive and Pig to transform data into OLAP cubes for easy UI consumption.

• A brief overview of our clients follows.

[Architecture diagram. Components: Collector, Hadoop Cluster, Map/Reduce Output, Thrift Service, Training Data, UI Display, Web UI]

[Architecture diagram. Components: External News Collector, Map/Reduce Categorization, Index, Map/Reduce (Filtering, Term Freq Collection), Map/Reduce (Training Set), Training Data, DFS Client, Hive Interface]

• We were asked to analyze the client's sales data and extract valuable information from it.
• The data was in a 9-tuple format.
• We were asked to provide information such as unique subscriber counts (used address) and per-day transaction amounts.
• We deployed the Hadoop cluster on three machines:
  ◦ Deployed our collector to pump data from the DB into HDFS.
  ◦ Wrote M/R jobs to generate OLAP cubes.
  ◦ Provided a Hive interface to extract the results and show them in the UI.

[Diagram. Input record fields: OrderID, ID, Mobile Num, Payable Amount, Delivery Charges, Mode of Payment, Order Status, Order Site. Output cube fields: Day Granularity, Actual Number of Customers, Forecast Number of Customers, Total Aggregated Amount, Forecast Aggregated Amount. Additional labels: ID, Payable Amount]
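The per-day aggregation described for this engagement (unique subscribers and transaction amounts) can be sketched as a map step and a reduce step. This is only an in-memory illustration, not the code that was delivered: the record keys `order_date`, `mobile_num` and `payable_amount` are assumed names, and the slides do not say which field identifies a subscriber or carries the date.

```python
from collections import defaultdict


def map_order(record):
    """Map step: emit (day, (subscriber, amount)) for one order record.

    The keys used here are hypothetical; the slides do not spell out
    the exact tuple layout.
    """
    return record["order_date"], (record["mobile_num"], record["payable_amount"])


def reduce_day(day, values):
    """Reduce step: count distinct subscribers and sum amounts for one day."""
    subscribers = {mobile for mobile, _ in values}
    total = sum(amount for _, amount in values)
    return {"day": day, "unique_subscribers": len(subscribers), "total_amount": total}


def run_job(records):
    """Simulate the shuffle phase in memory by grouping mapper output by key."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_order(record)
        grouped[key].append(value)
    # Emit one output row per day, in day order.
    return [reduce_day(day, grouped[day]) for day in sorted(grouped)]
```

On a real cluster the grouping done in `run_job` would be handled by the framework's shuffle, and each output row would correspond to one row of a day-granularity cube.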

• We delivered an end-to-end reporting solution to Guavus.
• The data was provided by Sprint Network (a Tier 1 company); we had to develop a reporting engine to analyze it and generate OLAP cubes.
• We were asked to evaluate petabytes of data and provide an ETL solution.
• We deployed the Hadoop cluster on 10 Linux machines.
• We wrote our own collector, which read binary data and pushed it into the Hadoop cluster.
• We wrote M/R jobs (which run for 4 hrs) that execute every day. The idea was to provide analytics on stream data.
• We generated OLAP cubes and stored the results in Infinity DB (a column DB) and Hive.

[Architecture diagram. Components: Reporting UI / Web Interface, Report Generation Task (Map/Reduce Framework), Data Collector, Query Engine, Hadoop Configuration, Distributed Storage Framework (Hadoop / HDFS), Infinity DB / Hive / Pig]

[Architecture diagram. Components: Hadoop Infrastructure, Map/Reduce Tasks, Data Collector, Monitor / Overall Scheduler, Infinity DB / Hive / Pig, Rubix Framework, UI Display]

• For HT we are developing a syndication clustering algorithm.
• We had a large number of old news documents and were asked to cluster them. Clustering them manually was nearly impossible.
• We implemented a Map/Reduce clustering algorithm using cosine similarity and clustered the documents.

[Diagram. A list of XML news files is transformed into integer vectors (V1, V2, ..., VN); one XML news file maps to one vector. Cosine similarity is applied between vectors, the minimum-distance pair of vectors is found, and a list of closely related stories is created. Labels: HADOOP PLATFORM, MAP Functionality, REDUCE Functionality, Cluster Algorithm, C-Bayes Classification, Categorize Documents]
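The vector and cosine-similarity steps in the diagram can be illustrated with a small single-machine sketch. It assumes plain whitespace tokenization and raw term counts as the integer vector; the slides do not say how the XML files were tokenized or weighted, and the function names below are illustrative rather than taken from the project.

```python
import math
from collections import Counter
from itertools import combinations


def to_vector(text, vocabulary):
    """Turn one document into an integer term-count vector over a shared vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_pair(docs):
    """Return the pair of document names with the highest cosine similarity,
    which is the pair with the minimum cosine distance."""
    vocabulary = sorted({term for text in docs.values() for term in text.lower().split()})
    vectors = {name: to_vector(text, vocabulary) for name, text in docs.items()}
    return max(
        combinations(sorted(vectors), 2),
        key=lambda pair: cosine_similarity(vectors[pair[0]], vectors[pair[1]]),
    )
```

Comparing every pair is quadratic in the number of documents, which is one reason to distribute the comparisons across map and reduce tasks when the archive is large.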

• Office Location:
  ◦ India: A-82, Sector 57, Noida, UP,
  ◦ Japan: ,Higashi Tabata Kita-ku, Tokyo, Japan
• General Inquiries
• Sales Inquiries