B. Ramamurthy: Data Analytics (Data Science)


1 B. Ramamurthy

2 Data Analytics (Data Science)
[Diagram labels: EDA; Data intuition/understanding; Big-data analytics; Stats; Algs; Discoveries/intelligence; Statistical inference; Decisions/answers/results]

3
- Pipelines to prepare data
- Three types:
  1. Data preparation algorithms such as sorting, workflows
  2. Optimization algorithms such as stochastic gradient descent, least squares…
  3. Machine learning algorithms…

4
- Machine learning algorithms come from artificial intelligence
- No underlying generative process
- Built to predict or classify something
- Three basic algorithms: linear regression, k-NN, k-means
- We already looked at linear regression as a case study for R/RStudio
- We will start with k-means…

5
- K-means is unsupervised: there is no prior knowledge of the "right answer"
- The goal of the algorithm is to determine the definition of the right answer by finding clusters in the data
- Example data: satisfaction survey data, incident report data
- Assume data {age, gender, income, state, household, size}; your goal is to segment the users
- K-means is the simplest of the clustering algorithms
- Let's understand k-means using an example

6
- {Age, income range, education, skills, social, paid work}
- Let's take just the age: {23, 25, 24, 23, 21, 31, 32, 30, 31, 30, 37, 35, 38, 37, 39, 42, 43, 45, 43, 45}
- Classify this data using k-means
- Let's assume K = 3, i.e. 3 groups
- Give me a guess of the centroids? Let's assume the initial centroids are {21, 30, 40}
- First let's calculate by hand and then use RStudio
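
The hand calculation for the age data can be checked with a short script. The slides do this step in RStudio; the sketch below uses Python instead, and the function name `kmeans_1d` and its tie-breaking rule are assumptions made for illustration, not part of the slides. It repeats two steps until the centroids stop moving: assign each age to its nearest centroid, then move each centroid to the mean of its group.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        # Assumption: a tie goes to the lower-indexed centroid
        # (age 35 is equally far from 30 and 40 in the first pass).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


ages = [23, 25, 24, 23, 21, 31, 32, 30, 31, 30,
        37, 35, 38, 37, 39, 42, 43, 45, 43, 45]

centroids, clusters = kmeans_1d(ages, [21, 30, 40])
print(centroids)  # [23.2, 31.5, 41.0]
for group in clusters:
    print(group)
```

With the initial centroids {21, 30, 40}, the groups settle after the first update: the ages in the twenties, the ages 30 to 35, and the ages 37 and above.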

7
- Supervised ML
- You know the "right answers", or at least have data that is labeled: the training set
- One set of objects has already been classified or labeled (training set)
- Another set of objects is yet to be labeled or classified (test set)
- Your goal is to automate the process of labeling the test set
- The intuition behind k-NN is to consider the most similar items, with similarity defined by their attributes, look at their existing labels, and assign the object a label

8 Let's look at an example

Age   Loan (x1000)   Default
25    40             N
35    60             N
45    80             N
20                   N
35    120            N
52    18             Y
23    95             Y
40    62             Y
60    100            Y
48    220            Y
33    150            Y
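
The k-NN intuition can be sketched on this table. The following Python example is an illustration under stated assumptions, not the method used in the slides: the query point (age 48, loan 142), the choice of k = 3, the plain Euclidean distance on unscaled values, and the name `knn_predict` are all hypothetical. The age-20 row is left out because its loan value is missing in the transcript.

```python
from collections import Counter
from math import dist

# (age, loan in thousands, default label) taken from the table above.
# The age-20 row is omitted: its loan value is missing in the transcript.
training = [
    (25, 40, "N"), (35, 60, "N"), (45, 80, "N"), (35, 120, "N"),
    (52, 18, "Y"), (23, 95, "Y"), (40, 62, "Y"), (60, 100, "Y"),
    (48, 220, "Y"), (33, 150, "Y"),
]


def knn_predict(query, rows, k=3):
    """Label a query (age, loan) by majority vote of its k nearest rows."""
    # Sort labeled rows by Euclidean distance to the query, keep the closest k.
    nearest = sorted(rows, key=lambda r: dist(query, r[:2]))[:k]
    # The most common label among those neighbours wins the vote.
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical applicant: age 48, loan 142 (x1000).
print(knn_predict((48, 142), training, k=3))  # Y
```

For this query the three nearest rows are (33, 150, Y), (35, 120, N) and (60, 100, Y), so the vote is two to one for "Y". Because the loan values are much larger than the ages, they dominate the unscaled distance; rescaling the attributes first is a common refinement.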

