2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010) April 26, 2010 Raleigh, NC, USA In association with the 19th Annual World Wide Web Conference.

Presentation transcript:

2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010) April 26, 2010 Raleigh, NC, USA In association with the 19th Annual World Wide Web Conference (WWW2010)

Making Sense of Mountains of Data
Data sources: continuous arrival of high-volume information (evolving, highly variant; structured, semi-structured, and unstructured)
–Billions of mobile devices
–Clickstream, CRM
–Claim data (text, picture, video)
–Call data records
–Location tracking (GPS), iPhone, vehicle use data
–$ transaction tracking (across borders and IP providers)
–Feeds: Census Bureau data, market data, weather data
–Sensor data
–Online transaction processing systems
–Web data (for search)
–Web buzz data (for reputation analysis)
Volume: petabytes -> exabytes
Analytics:
–Auto/cross-correlation analytics, predictive analytics
–Deep and wide analytics
–Fine-grained: individual product and customer at a time and place
Delivery: dashboards, embedded analytics, financial planning, mashups, scorecards, search
Feedback/action

Massive Data Analytic Platforms
Google: original MapReduce implementation
Microsoft: Dryad
Yahoo!, Facebook, and many others: Hadoop
Ecosystems: Hive, Pig, Jaql, Zookeeper
Alternatives to Map/Reduce, e.g. Pregel
[Diagram: MapReduce data flow with map (M) and reduce (R) tasks, Partition and Sort stages, and nodes labeled C]
Easy parallelism; scalability; fault tolerance; elasticity and flexibility; cost/performance
1000s of processors, petabytes of data … and growing

Chairpeople Perspective
Other parallel systems technology and customers
–Parallel database: enterprise data warehousing
–Parallel ETL (extraction, transformation, load)
–Search and text analytics
Hadoop and related technologies
–Finance, Telco, Healthcare, Retail, Government, …

Questions Posed in Call For Papers
–What kinds of problems are people trying to solve?
–How are existing massive scale-out platforms used, and what extensions would be helpful?
–Other kinds of platforms for different problems?
–How to integrate with existing environments such as data warehouses?
–Challenges in managing massive datasets?
–Legal/moral challenges associated with mining these data sets?

Agenda (morning)
9:00 - 10:30: Session 1
–Introduction and Welcome
–Invited Talk: "Hadoop: An Industry Perspective". Dr. Amr Awadallah, CTO, VP-Engineering, Cloudera
10:30 - 11:00: Coffee Break*
11:00 - 12:30: Session 2
–Distributed Indexing of Web Scale Datasets for the Cloud. Ioannis Konstantinou, Evangelos Angelou, Dimitrios Tsoumakos, Nectarios Koziris; National Technical University of Athens
–Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce. Joos-Hendrik Böse (Intl. Comp. Sci. Institute), Artur Andrzejak and Mikael Högqvist (Zuse Institute Berlin, ZIB)
–Efficient Updates for a Shared Nothing Analytics Platform. Katerina Doka (National Technical University of Athens, Greece), Dimitrios Tsoumakos (University of Cyprus), Nectarios Koziris (National Technical University of Athens, Greece)
12:30 - 1:30: Lunch*

Agenda (afternoon)
1:30 - 3:30: Session 3
–Invited Talk: "Large Scale Applications on Hadoop in Yahoo". Dr. Vijay Narayanan, Yahoo! Labs Silicon Valley
–Extracting User Profiles from Large Scale Data. Michal Shmueli-Scheuer, Haggai Roitman, David Carmel, Yosi Mass, David Konopnicki; IBM Research, Haifa
–A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids. Sudha Sadasivam, G. Baktavatchalam; PSG College of Technology
3:30 - 4:00: Coffee Break*
4:00 - 5:30: Session 4
–Towards Scalable RDF Graph Analytics on MapReduce. Padmashree Ravindra, Vikas Deshpande, Kemafor Anyanwu; North Carolina State University
–SPARQL Basic Graph Pattern Processing with Iterative MapReduce. Jaeseok Myung, Jongheum Yeon, Sang-goo Lee; Seoul National University
–Parallelizing Random Walk with Restart for Large-Scale Query Recommendation. Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng; National Chiao Tung University, Hsinchu, Taiwan

Acknowledgements
Workshop Chairs
–Ullas Nambiar, IBM India Research Lab, New Delhi, India
–John McPherson, IBM Almaden Research Center, USA
–David Konopnicki, IBM Haifa Research Lab, Israel
Steering Committee
–Rakesh Agrawal, Microsoft Search Labs, Mountain View, CA, USA
–Alon Halevy, Google Inc., Mountain View, CA, USA
Invited Speakers
–Amr Awadallah, CTO, VP-Engineering, Cloudera, "Hadoop: An Industry Perspective"
–Vijay Narayanan, Yahoo! Labs Silicon Valley, "Large Scale User Modeling on Hadoop"
Program Committee
–Amr Awadallah, Cloudera, USA
–Andrew McCallum, University of Massachusetts Amherst, USA
–Assaf Schuster, Technion - Israel Institute of Technology
–Gautam Das, University of Texas, Arlington, USA
–Jimeng Sun, IBM Watson Research Center, USA
–John Shafer, Microsoft Search Labs, USA
–Kevin Chang, University of Illinois at Urbana-Champaign, USA
–Kun Liu, Yahoo! Labs, USA
–Louiqa Raschid, University of Maryland, College Park, USA
–Michal Shmueli-Scheuer, IBM Haifa Research Lab, Israel
–Michael Sheng, University of Adelaide, Australia
–Mong Li Lee, National University of Singapore, Singapore
–Rajeev Gupta, IBM India Research Lab, India
–Vanja Josifovski, Yahoo Research, USA
–Yannis Sismanis, IBM Almaden Research Center, USA
–Yi Chen, Arizona State University, USA
–Wen-syan Li, SAP, China