Big Data Analytics in Parallel Systems


1 Big Data Analytics in Parallel Systems
Big Data Analytics:
    Data mining + parallelism + text
    Machine learning models
    Graph algorithms
    Search engine technology
Why perform analytics inside a parallel DBMS?
    Queries, speed, user/space management, consistency
    Security, fault tolerance, concurrency
How?
    CS: scalable algorithms, external data structures, relational algebra, SQL query optimization, UDFs
    Programming: C++, C, Java, Unix
    Math: linear algebra, graphs, numerical methods
Speaker notes: (Read title.) Our motivation: databases are getting larger, most data mining algorithms work on flat files, and moving data in and out of the DBMS is time-consuming and error-prone. (Then read the bullet points.)
Contributor: C. Ordonez
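The motivation on this slide (moving data out of the DBMS is slow and error-prone) can be illustrated with a minimal sketch. It is not taken from the presentation: it uses Python's built-in sqlite3 module as a stand-in for a parallel DBMS, and the table name `readings`, the column `x`, and the helper `column_stats_in_db` are hypothetical. The idea shown is that the count, sum, and sum of squares of a column are computed by one SQL aggregate query, so only three numbers leave the database instead of every row.

```python
import sqlite3


def column_stats_in_db(conn, table, column):
    """Return (n, mean, population variance) using a single SQL aggregate query.

    Assumption: `table` and `column` are trusted identifiers, since SQL
    identifiers cannot be bound as query parameters.
    """
    query = (
        f"SELECT COUNT({column}), SUM({column}), SUM({column} * {column}) "
        f"FROM {table}"
    )
    n, total, total_sq = conn.execute(query).fetchone()
    if not n:
        raise ValueError("no non-NULL rows to summarize")
    mean = total / n
    # Population variance from the sufficient statistics n, sum(x), sum(x^2).
    variance = total_sq / n - mean * mean
    return n, mean, variance


def demo():
    # In-memory database with a hypothetical one-column table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (x REAL)")
    values = (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
    conn.executemany("INSERT INTO readings VALUES (?)", [(v,) for v in values])
    return column_stats_in_db(conn, "readings", "x")


if __name__ == "__main__":
    print(demo())
```

In a parallel DBMS the same aggregate query would typically be evaluated on each partition and the partial sums combined, which is why summaries of this form suit in-database analytics; the sketch above runs on a single node only.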

2 Recent projects
    Percentage cubes for DSS
    Graph analytics: beating Spark
    PCA with multicore CPUs
    R on streaming network data
Speaker notes: (Read slide title.) Our research covered the entire spectrum of data mining, from exploratory OLAP analysis to predictive models. (Then read the title of each application.)
Contributor: C. Ordonez

3 Why we are different
Parallel analytics on big data
Applications:
    Dimensionality reduction (PCA, factor analysis)
    Classification, regression, time series, histograms
    Graphs (page rank, reachability, clique detection)
    Patterns (association rules, OLAP cubes)
Applications:
    Corporate databases and data lakes
    Medical: microarray data, heart disease, cancer
    Network data, files, documents
Expertise on both
    Parallel data systems
    Machine learning, advanced statistics, graphs
Speaker notes: (Read slide title.) We already have a set of fundamental algorithms working on several public and commercial DBMSs; our application areas are mainly biomedical, but the algorithms can be applied wherever a large database needs to be analyzed. (Then read the bullet points.)
Contributor: C. Ordonez
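As a concrete instance of the graph analytics listed on this slide, here is a minimal single-machine sketch of PageRank by power iteration. It is an illustration only, not the authors' parallel or in-DBMS implementation; the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank for a directed graph given as (source, target) pairs."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical four-node graph: a -> b -> c -> a, plus d -> c.
    example = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
    for node, score in sorted(pagerank(example).items()):
        print(node, round(score, 4))
```

Each iteration is a join of the edge list with the current rank vector followed by a group-by sum on the target node, which is the shape that lets the computation be expressed in relational algebra or SQL.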

