Presentation on theme: "Committed to Deliver… We are leaders in the Hadoop ecosystem. We support, maintain, monitor and provide services over Hadoop whether you run Apache Hadoop," — Presentation transcript:

1 Committed to Deliver…

2
• We are leaders in the Hadoop ecosystem.
• We support, maintain, monitor, and provide services over Hadoop, whether you run Apache Hadoop, the Facebook branch, or the Cloudera distribution in your own data center, or on a cluster of machines on Amazon EC2, Rackspace, etc.
• We provide a scalable end-to-end solution: one that can scale to large data sets (terabytes or petabytes).
• Low-cost solution: based on the open-source framework currently used by Google, Yahoo, and Facebook.
• Solutions optimized for minimum SLA and maximum performance.

3
– Project Initiation
  ◦ Project planning
  ◦ Requirement collection
  ◦ POC using Hadoop technology
– Team Building
  ◦ Highly skilled Hadoop experts
  ◦ Dedicated team for the project
– Agile Methodology
  ◦ Small iterations
  ◦ Easy to implement changing requirements
– Support
  ◦ Long-term relationship to support the developed product
  ◦ Scope to change based on business/technical needs

4 Our combined experience has led to the adoption of a unique methodology that ensures quality work. We:
• Evaluate the available hardware and understand the client's requirements.
• Peek through the data.
• Analyze the data, prototype using M/R code, and show the results to our clients.
• Iterate, continuously improving and developing a better understanding of the data.
• Develop the various tasks in parallel:
  ◦ Data collection
  ◦ Data storage in HDFS
  ◦ M/R analytics jobs
  ◦ A scheduler to run the M/R jobs and coordinate them
  ◦ Transforming output into OLAP cubes (dimension and fact tables)
  ◦ A custom interface to retrieve the M/R output
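The OLAP transformation step above can be sketched in a few lines. This is a minimal, self-contained Python sketch run locally (the field names and sample rows are hypothetical, not taken from the actual project): aggregated M/R output is split into a dimension table of surrogate keys and a fact table of measures.

```python
from collections import defaultdict

# Hypothetical M/R output rows: (day, site, amount) tuples.
rows = [
    ("2011-01-01", "web", 120.0),
    ("2011-01-01", "mobile", 80.0),
    ("2011-01-02", "web", 200.0),
]

# Dimension table: assign a surrogate key to each distinct site.
site_dim = {site: i for i, site in enumerate(sorted({r[1] for r in rows}))}

# Fact table: total amount per (day, site key) cell of the cube.
fact = defaultdict(float)
for day, site, amount in rows:
    fact[(day, site_dim[site])] += amount
```

On a real cluster this split would typically be done by a final M/R job whose output files are then loaded by the UI layer.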

5
• We are experts in time-series data; in other words, we process time-stamped data.
• We have ample experience writing efficient, fast, and robust Map/Reduce code that implements ETL functions.
• We have hardened Hadoop to enterprise standards, providing features like high availability, data collection, and data merging.
• Writing Map/Reduce is not enough: we wrote layers on top of Hadoop that use Hive and Pig to transform data into OLAP cubes for easy UI consumption.
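The ETL-style Map/Reduce pattern over time-stamped data described above can be sketched as follows. This is a minimal local simulation in plain Python rather than a real Hadoop job, and the log format, field names, and values are hypothetical: the mapper parses and filters raw records, a local shuffle groups by key, and the reducer aggregates per hour.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw time-stamped log lines: "timestamp,metric,value".
raw = [
    "2011-03-01 10:05:00,bytes,500",
    "2011-03-01 10:45:00,bytes,300",
    "2011-03-01 11:10:00,bytes,200",
    "bad line",  # the mapper filters out malformed records
]

def map_fn(line):
    # Extract (hour-bucket, value); skip records that fail to parse.
    try:
        ts, _, value = line.split(",")
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:00")
        yield hour, int(value)
    except ValueError:
        return

def reduce_fn(key, values):
    return key, sum(values)

# Local shuffle-and-reduce, standing in for the Hadoop framework.
groups = defaultdict(list)
for line in raw:
    for k, v in map_fn(line):
        groups[k].append(v)
hourly = dict(reduce_fn(k, vs) for k, vs in groups.items())
```

The same mapper/reducer pair, ported to Hadoop's Java API, would scale this aggregation across a cluster.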

6
• We provide a brief overview of our clients.

7 [Architecture diagram: Collector, Hadoop Cluster, Map/Reduce, Output, UI Display, Thrift Service, Training Data, Web UI.]

8 [Architecture diagram: External News Collector, Map/Reduce categorization, Index, Map/Reduce (filtering, term-frequency collection), Map/Reduce (training set), Training Data, DFS Client, Hive Interface.]

9
• We were asked to analyze their sales data and extract valuable information from it.
• The data was in a 9-tuple format (fields listed on the next slide).
• We were asked to provide information such as the unique subscriber count (using email addresses) and per-day transaction amounts.
• We deployed the Hadoop cluster on three machines:
  ◦ Deployed our collector to pump data from the DB into HDFS.
  ◦ Wrote M/R jobs to generate OLAP cubes.
  ◦ Provided a Hive interface to extract results and show them in the UI.

10
Input order record: OrderID | EmailID | Mobile Num | Payable Amount | Delivery Charges | Mode of Payment | Order Status | Order Site
Output cube columns: Day Granularity | Actual Number of Customers | Forecast Number of Customers | Total Aggregated Amount | Forecast Aggregated Amount
Per-customer columns: Email ID | Payable Amount
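The two requested metrics — unique subscriber count by email address and per-day transaction amounts — can be sketched locally in Python. The sample records below are hypothetical and use only a subset of the order fields above; in the actual project this logic ran as M/R jobs over HDFS.

```python
from collections import defaultdict

# Hypothetical order records: (OrderID, EmailID, Payable Amount, Day).
orders = [
    (1, "a@x.com", 100.0, "2011-05-01"),
    (2, "b@x.com", 50.0, "2011-05-01"),
    (3, "a@x.com", 75.0, "2011-05-02"),
]

# Unique subscriber count, keyed by email address.
unique_subscribers = len({email for _, email, _, _ in orders})

# Per-day transaction amounts: the "Total Aggregated Amount" column of the cube.
daily_amount = defaultdict(float)
for _, _, amount, day in orders:
    daily_amount[day] += amount
```

In Hive, the same results correspond to a COUNT(DISTINCT email) and a SUM grouped by day.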

11
• We delivered an end-to-end reporting solution to Guavus.
• The data was provided by the Sprint network (a Tier 1 company); we had to develop a reporting engine to analyze it and generate OLAP cubes.
• We were asked to evaluate petabytes of data and provide an ETL solution.
• We deployed the Hadoop cluster on 10 Linux machines.
• We wrote our collector, which read binary data and pushed it into the Hadoop cluster.
• We wrote M/R jobs (which run for 4 hours) every day; the idea was to provide analytics on stream data.
• We generate OLAP cubes, storing results in Infinity DB (a column DB) and Hive.

12 [Architecture diagram: Reporting UI / Web Interface, Report Generation Task (Map/Reduce Framework), Data Collector, Query Engine, Hadoop Configuration, Distributed Storage Framework (Hadoop/HDFS), Infinity DB / Hive / Pig.]

13 [Architecture diagram: Hadoop Infrastructure, Map/Reduce Tasks, Data Collector, Monitor / Overall Scheduler, Infinity DB / Hive / Pig, Rubix Framework, UI Display.]

14
• For HT we are developing a syndication clustering algorithm.
• They have a large number of old news documents and we were asked to cluster them; manual clustering was nearly impossible.
• We implemented a clustering Map/Reduce algorithm using cosine similarity and clustered the documents.

15 [Pipeline diagram, Hadoop platform: a list of XML news files is transformed into integer vectors V1…VN (one XML news file maps to one vector). Map functionality: apply cosine similarity between vectors. Reduce functionality: get the minimum-distance pair of vectors and create a list of closely related stories (cluster algorithm). A C-Bayes classification step categorizes the documents.]
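The core of the pipeline above — vectorize each document, compare vectors by cosine similarity, and pick the closest pair as the seed of a cluster — can be sketched locally in Python. The sample texts are hypothetical stand-ins for the XML news files, and term-frequency vectors stand in for the integer vectors in the diagram.

```python
import math
from collections import Counter

# Hypothetical news texts standing in for the XML files.
docs = {
    "story1": "cricket match india australia cricket",
    "story2": "india wins cricket match",
    "story3": "stock market falls sharply",
}

def vectorize(text):
    # Term-frequency vector: one news file -> one vector.
    return Counter(text.split())

def cosine(u, v):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

vecs = {name: vectorize(text) for name, text in docs.items()}

# All pairwise similarities; the highest-similarity (minimum-distance)
# pair becomes a cluster of closely related stories.
pairs = [(cosine(vecs[a], vecs[b]), a, b)
         for a in vecs for b in vecs if a < b]
best = max(pairs)
```

On Hadoop, the map phase would emit the pairwise similarities and the reduce phase would select the minimum-distance pair, as in the diagram.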

16 Office Locations:
• India: A-82, Sector 57, Noida, UP, 201301
• Japan: 2-8-6-405, Higashi Tabata, Kita-ku, Tokyo, Japan
General inquiries: info@techcompiler.com
Sales inquiries: sales@techcompiler.com
