Big Data, Data Mining, Tools

Presentation on theme: "Big Data, Data Mining, Tools"— Presentation transcript:

1 Big Data, Data Mining, Tools

2 N = ALL



5 Data Sources...

6 Data Creation, Storage, Costs

7 Infrastructure

8 NoSQL Flavors

Not Only SQL (sort of)
Greater scalability
Designed for distributed computing on commodity (not cheap) hardware
Variety of flavors

10 Topic: Algorithms

11 Tools

12 Speaking of the Cloud

13 High Level Flow Example

14 Hadoop MapReduce
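As a rough illustration of the MapReduce model named on this slide, the sketch below runs a word count through explicit map, shuffle, and reduce steps in plain Python. It is a single-process toy under stated assumptions, not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    print(word_count(["the fog the mud", "fog everywhere"]))
```

In a real cluster the map and reduce functions run on many machines and the shuffle moves data across the network; here all three steps run in one process so the data flow is easy to follow.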

15 HDFS
Distributed file system
Write-once / read-many
Fault tolerance / redundancy
Processing logic close to data

16 Traditional word count in Java

17 Hive
CREATE TABLE docs (line STRING);
CREATE TABLE word_counts AS
SELECT word, count(1) AS count
FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
GROUP BY word
ORDER BY word;

18 Hive with Some Structure
Data:
123 F
456 M
789 M
111 M
222 M
333 F
444 F
555 M
create table if not exists p_genders (
  p_id string,
  gender string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
SELECT * from p_genders;
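To make the "some structure" idea concrete, here is a minimal Python sketch of what the table definition above expresses: each tab-delimited row is split into a `p_id` and a `gender` field. It assumes the slide's eight sample rows are stored one per line with a tab between the two columns; the names `SAMPLE` and `load_p_genders` are hypothetical, and the per-gender tally at the end is an illustrative follow-up that does not appear on the slide.

```python
import csv
import io
from collections import Counter

# The eight sample rows from the slide, assumed to be tab-delimited.
SAMPLE = "123\tF\n456\tM\n789\tM\n111\tM\n222\tM\n333\tF\n444\tF\n555\tM\n"


def load_p_genders(text):
    """Parse tab-delimited text into records with the two columns
    declared for p_genders (both kept as strings, as in the DDL)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [{"p_id": row[0], "gender": row[1]} for row in reader if row]


rows = load_p_genders(SAMPLE)

# Illustrative only: roughly what a GROUP BY gender count would return.
gender_counts = Counter(record["gender"] for record in rows)

if __name__ == "__main__":
    for record in rows:
        print(record["p_id"], record["gender"])
    print(dict(gender_counts))
```

The point of the sketch is that the schema is applied when the data is read: the file itself stays plain text, and the column names and delimiter live in the table definition.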

19 Pig Latin
A = load 'S3://pmb4bucket/input/bleakhouse/bleakhouse.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
store D into 's3://pmb4hadoop/output/bleakhouse';

20 Complex Event Processing

21 Tools

22 Data Scientist
Not just a bean counter - it’s about modeling
General skill set:
Math (linear algebra, statistics, calculus, discrete math)
Business sense
Programming skills
Communication
etc.

23 Our Schedule
Setting the goals for a data mining project
Setting up KNIME
Gathering and preparing data
Visualization
Machine learning
Naïve Bayes
Clustering and classification
Dimension reduction

24 But first…
