Big Data Analytics in Parallel Systems

Slide 1: Big Data Analytics in Parallel Systems

Big Data Analytics: data mining + parallelism + text
    Machine learning models
    Graph algorithms
    Search engine technology
Why perform analytics inside a parallel DBMS?
    Queries, speed, user/space management, consistency
    Security, fault tolerance, concurrency
How?
    CS: scalable algorithms, external data structures, relational algebra, SQL query optimization, UDFs
    Programming: C++, C, Java, Unix
    Math: linear algebra, graphs, numerical methods

Speaker notes: (Read title.) Our motivation: databases are getting larger, most data mining algorithms work on flat files, and moving data in and out of the DBMS is time-consuming and error-prone. (Then read the bullet points.)

Contributor: C. Ordonez
Email: carlos@uh.edu
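The "scalable algorithms" and "UDFs" bullets can be illustrated with a small sketch. The slides do not show any code, so everything here is an assumption made for illustration: the names `partial_summary`, `merge`, and `mean_and_variance` are hypothetical, and the in-memory lists stand in for table partitions held by separate workers. The idea shown is that each partition is reduced to a tiny summary (count, sum, sum of squares) and only the summaries are combined, which is roughly how an aggregate UDF can be evaluated in parallel without moving raw rows out of the system.

```python
from functools import reduce


def partial_summary(rows):
    """Summarize one partition as (n, L, Q): count, sum, sum of squares."""
    n = len(rows)
    L = sum(rows)
    Q = sum(x * x for x in rows)
    return (n, L, Q)


def merge(a, b):
    """Combine two partition summaries; the operation is associative."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(summary):
    """Derive the mean and population variance from a merged summary."""
    n, L, Q = summary
    mean = L / n
    variance = Q / n - mean * mean
    return mean, variance


# Hypothetical data: three partitions, as if stored on three workers.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Each worker would compute its own summary; only the summaries are merged.
total = reduce(merge, (partial_summary(p) for p in partitions))
mean, variance = mean_and_variance(total)
```

Because `merge` is associative and commutative, the partial summaries can be combined in any order, which is the property a parallel aggregate needs.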

Slide 2: Recent projects

    Percentage cubes for DSS
    Graph analytics: beating Spark
    PCA with multicore CPUs
    R on streaming network data

Speaker notes: (Read slide title.) Our research covered the entire spectrum of data mining, from exploratory OLAP analysis up to predictive models. (Then read the titles of each application.)

Slide 3: Why we are different

Parallel analytics on big data
Applications:
    Dimensionality reduction (PCA, factor analysis)
    Classification, regression, time series, histograms
    Graphs (PageRank, reachability, clique detection)
    Patterns (association rules, OLAP cubes)
Applications:
    Corporate databases and data lakes
    Medical: microarray data, heart disease, cancer
    Network data, files, documents
Expertise on both:
    Parallel data systems
    Machine learning, advanced statistics, graphs

Speaker notes: (Read slide title.) We already have a set of fundamental algorithms working on several public and commercial DBMSs. Our application areas are mainly biomedical, but the algorithms can be applied wherever there is a large database that needs to be analyzed. (Then read the bullet points.)
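As an illustration of the graph bullet (reachability), the sketch below computes which vertices can reach which others by repeatedly joining an edge table with itself, in the spirit of a relational-algebra formulation. This is not taken from the slides: the function name `reachable_pairs`, the tuple-based edge table, and the example graph are all assumptions chosen to keep the example small.

```python
def reachable_pairs(edges):
    """Return all (src, dst) pairs where dst is reachable from src.

    Works like an iterated join: newly found pairs (src, mid) are joined
    with edges (mid, dst) until no new pairs appear.
    """
    edges = set(edges)

    # Index the edge table by source vertex, like a hash-join build side.
    by_src = {}
    for s, d in edges:
        by_src.setdefault(s, set()).add(d)

    closure = set(edges)   # every pair found so far
    frontier = set(edges)  # pairs found in the previous round only
    while frontier:
        joined = {(s, d) for (s, m) in frontier for d in by_src.get(m, ())}
        frontier = joined - closure  # keep only pairs not seen before
        closure |= frontier
    return closure


# Hypothetical edge table: 5 -> 1 -> 2 -> 3 -> 4
edges = [(1, 2), (2, 3), (3, 4), (5, 1)]
pairs = reachable_pairs(edges)
```

Joining only the newly discovered pairs in each round avoids recomputing the full join, and the loop terminates on cyclic graphs because a pair is never added to the frontier twice.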