2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010) April 26, 2010 Raleigh, NC, USA In association with the 19th Annual World Wide Web Conference.


1 2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010) April 26, 2010 Raleigh, NC, USA In association with the 19th Annual World Wide Web Conference (WWW2010)

2 Making Sense of Mountains of Data
Dashboards, embedded analytics, financial planning, mashups, scorecards, search
– Billions of mobile devices
– Clickstream, CRM
– Claim data (text, picture, video)
– Call data records
– Location tracking (GPS), iPhone, vehicle use data
– $ transaction tracking (across borders and IP providers)
– Feeds: Census Bureau data, market data, weather data
– Sensor data
– Online transaction processing systems
– Web data (for search)
– Web buzz data (for reputation analysis)
Structured, semi-structured, and unstructured data: petabytes -> exabytes
Continuous arrival of high-volume information (evolving, highly variant; structured, semi-structured, and unstructured)
Auto/cross-correlation analytics, predictive analytics
Deep and wide analytics
Fine grained: individual product and customer at a time and place
Feedback/action

3 Massive Data Analytic Platforms
– Google: original MapReduce implementation
– Microsoft: Dryad
– Yahoo!, Facebook, and many others: Hadoop
– Ecosystems: Hive, Pig, Jaql, ZooKeeper
– Alternatives to MapReduce, e.g., Pregel
[Diagram: MapReduce dataflow with map tasks (M), combiners (C), partition and sort stages, and reduce tasks (R)]
Easy parallelism, scalability, fault tolerance, elasticity, flexibility, cost/performance
1000s of processors, petabytes of data …and growing
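Slide 3 sketches the stages of the MapReduce dataflow: map tasks (M), partitioning, sorting, reduce tasks (R), and a C step that most likely denotes a combiner. As a hedged illustration only, the following single-process Python sketch pushes a word count through those stages. The function names (map_phase, combine, partition, reduce_phase, word_count) are hypothetical, reading C as a combiner is an assumption, and the code does not use the Google, Dryad, or Hadoop APIs.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(split):
    # Map (M): emit (word, 1) for every word in one input split.
    return [(word, 1) for word in split.lower().split()]


def combine(pairs):
    # Combine (C), assumed: pre-aggregate counts on the mapper side
    # so less data has to be shuffled to the reducers.
    local = defaultdict(int)
    for word, n in pairs:
        local[word] += n
    return list(local.items())


def partition(pairs, num_reducers):
    # Partition: route every key to exactly one reducer. Python's built-in
    # hash() for str is randomized per process, so use a deterministic hash.
    buckets = [[] for _ in range(num_reducers)]
    for word, n in pairs:
        buckets[sum(word.encode()) % num_reducers].append((word, n))
    return buckets


def reduce_phase(pairs):
    # Sort by key, then Reduce (R): group equal keys and sum their counts.
    pairs.sort(key=itemgetter(0))
    return {word: sum(n for _, n in group)
            for word, group in groupby(pairs, key=itemgetter(0))}


def word_count(splits, num_reducers=2):
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:  # in a cluster, one map task per split
        combined = combine(map_phase(split))
        for i, part in enumerate(partition(combined, num_reducers)):
            buckets[i].extend(part)  # simulated shuffle
    result = {}
    for bucket in buckets:  # in a cluster, one reduce task per bucket
        result.update(reduce_phase(bucket))
    return result


if __name__ == "__main__":
    print(word_count(["the cloud stores data", "the data grows", "data data data"]))
```

In Hadoop or Google's MapReduce, each split and each partition would be handled by a separate task, typically on a separate machine, and the framework itself performs the shuffle, the sort, and the re-execution of failed tasks; this sketch only mimics how the data moves between stages.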

4 Chairpeople's Perspective
Other parallel systems technology and customers:
– Parallel databases – enterprise data warehousing
– Parallel ETL (extraction, transformation, load)
– Search and text analytics
Hadoop and related technologies:
– Finance, telco, healthcare, retail, government, …

5 Questions Posed in the Call for Papers
– What kinds of problems are people trying to solve?
– How are existing massive scale-out platforms used, and what extensions would be helpful?
– What other kinds of platforms suit different problems?
– How can these platforms be integrated with existing environments such as data warehouses?
– What are the challenges in managing massive datasets?
– What legal and moral challenges are associated with mining these datasets?

6 Agenda (morning)
9:00 - 10:30: Session 1
– Introduction and Welcome
– Invited Talk: "Hadoop: An Industry Perspective"
  Dr. Amr Awadallah, CTO, VP-Engineering, Cloudera
10:30 - 11:00: Coffee Break*
11:00 - 12:30: Session 2
– "Distributed Indexing of Web Scale Datasets for the Cloud"
  Ioannis Konstantinou, Evangelos Angelou, Dimitrios Tsoumakos, Nectarios Koziris; National Technical University of Athens
– "Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce"
  Joos-Hendrik Böse¹, Artur Andrzejak², Mikael Högqvist²; ¹Intl. Comp. Sci. Institute, ²Zuse Institute Berlin (ZIB)
– "Efficient Updates for a Shared Nothing Analytics Platform"
  Katerina Doka³, Dimitrios Tsoumakos⁴, Nectarios Koziris³; ³National Technical University of Athens, Greece, ⁴University of Cyprus
12:30 - 1:30: Lunch*

7 Agenda (afternoon)
1:30 - 3:30: Session 3
– Invited Talk: "Large Scale Applications on Hadoop in Yahoo"
  Dr. Vijay Narayanan, Yahoo! Labs Silicon Valley
– "Extracting User Profiles from Large Scale Data"
  Michal Shmueli-Scheuer, Haggai Roitman, David Carmel, Yosi Mass, David Konopnicki; IBM Research, Haifa
– "A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids"
  Sudha Sadasivam, G. Baktavatchalam; PSG College of Technology
3:30 - 4:00: Coffee Break*
4:00 - 5:30: Session 4
– "Towards Scalable RDF Graph Analytics on MapReduce"
  Padmashree Ravindra, Vikas Deshpande, Kemafor Anyanwu; North Carolina State University
– "SPARQL Basic Graph Pattern Processing with Iterative MapReduce"
  Jaeseok Myung, Jongheum Yeon, Sang-goo Lee; Seoul National University
– "Parallelizing Random Walk with Restart for Large-Scale Query Recommendation"
  Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng; National Chiao Tung University, Hsinchu, Taiwan

8 Acknowledgements
Workshop Chairs
– Ullas Nambiar, IBM India Research Lab, New Delhi, India
– John McPherson, IBM Almaden Research Center, USA
– David Konopnicki, IBM Haifa Research Lab, Israel
Steering Committee
– Rakesh Agrawal, Microsoft Search Labs, Mountain View, CA, USA
– Alon Halevy, Google Inc., Mountain View, CA, USA
Invited Speakers
– Amr Awadallah, CTO, VP-Engineering, Cloudera, "Hadoop: An Industry Perspective"
– Vijay Narayanan, Yahoo! Labs Silicon Valley, "Large Scale User Modeling on Hadoop"
Program Committee
– Amr Awadallah, Cloudera, USA
– Andrew McCallum, University of Massachusetts Amherst, USA
– Assaf Schuster, Technion - Israel Institute of Technology
– Gautam Das, University of Texas, Arlington, USA
– Jimeng Sun, IBM Watson Research Center, USA
– John Shafer, Microsoft Search Labs, USA
– Kevin Chang, University of Illinois at Urbana-Champaign, USA
– Kun Liu, Yahoo! Labs, USA
– Louiqa Raschid, University of Maryland, College Park, USA
– Michal Shmueli-Scheuer, IBM Haifa Research Lab, Israel
– Michael Sheng, University of Adelaide, Australia
– Mong Li Lee, National University of Singapore, Singapore
– Rajeev Gupta, IBM India Research Lab, India
– Vanja Josifovski, Yahoo Research, USA
– Yannis Sismanis, IBM Almaden Research Center, USA
– Yi Chen, Arizona State University, USA
– Wen-syan Li, SAP, China

