Data Resource Management – MGMT 4170

An overview of where we are right now

[Architecture diagram. Components shown: Transaction Processing System (RDBMS, relational data); ETL Packages; Data Warehouse (denormalized historical data); OLAP Cube 1 (Sales Cube), OLAP Cube 2 (Warehouse Cube), OLAP Cube 3 (Store Cube); DM Model 1, DM Model 2, DM Model 3; OLAP clients (Excel, application, web based). Roles shown: Operational Users, SQL Developer, ETL Developer, DW Developer, Cube Developer, Data Miner.]

Information Retrieval Evolution

[Diagram, copyright Rafal Lukawiecki. Labels: Role of Software (Passive, Interactive, Proactive); Presentation, Exploration, Discovery; Business Insight; Static Reports; Dynamic Reports (Parameters); Cross tabbing, Pivoting; OLAP; Data mining; Predictive Analysis. Caption: DM Enables Predictive Analysis.]

Basic Data Mining Terminology

Data mining needs historical data, hence the importance of the data warehouse. The DW does not only provide historical data; it is designed to respond to business needs through OLAP cubes and data mining models. The data mining software uses this historical data to build prediction models for things like customer behavior or product sales. We build a model, test it on cases where we know the answer to verify its predictive power, and then apply the model in situations where we do not know the answer.

Data Mining Modeling

[Diagram: data warehouse historical data, Data Mining Algorithm, Data Mining Model.]

We build a model from the data we have in the data warehouse. We then test the model to verify its predictive power. We can then apply the model to new data.
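The build, test, apply cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the historical cases, the single `cars_owned` input, and the threshold "model" are all hypothetical and far simpler than a real data mining algorithm.

```python
# Hypothetical historical cases: (cars_owned, bought_product) pairs.
historical = [(0, 0), (0, 0), (1, 0), (1, 1), (2, 1),
              (2, 1), (3, 1), (0, 0), (3, 1), (1, 0)]


def build_model(cases):
    """Build: pick the cars_owned threshold that classifies the most cases correctly."""
    best_threshold, best_correct = 0, -1
    for threshold in range(0, 5):
        correct = sum((cars >= threshold) == bool(bought) for cars, bought in cases)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, cars):
    """Apply: predict buyer (1) when cars_owned reaches the threshold, else non-buyer (0)."""
    return 1 if cars >= threshold else 0


def accuracy(threshold, cases):
    """Test: share of cases with a known answer that the model gets right."""
    hits = sum(predict(threshold, cars) == bought for cars, bought in cases)
    return hits / len(cases)


# Build on one part of the history, test on the part held back.
training, holdout = historical[:7], historical[7:]
model = build_model(training)
holdout_accuracy = accuracy(model, holdout)

# Apply the model to a new case where the answer is not known.
new_prediction = predict(model, 2)
print(model, round(holdout_accuracy, 2), new_prediction)
```

Holding back part of the history is what makes the test meaningful: the model is scored on cases it did not see while it was being built.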

Historical Data

Data Mining Models

What skills a data miner needs

Good business sense and good-to-excellent working relationships with the business folks.
Good-to-excellent knowledge of Integration Services and SQL.
A good understanding of statistics and probability.
Data mining experience.

Basic Data Mining Terminology

Dependent variable(s): The variable we are trying to predict, like the likelihood of purchase.
Independent variable(s): The variables that provide the data used to build the model, like home ownership, education level, cars owned, etc.
Algorithm: The programmatic technique used to identify the relationships or patterns in the data.

Basic Data Mining Terminology

Continuous variables: Variables that hold decimal numbers or measured quantities are continuous. A column in an employee table such as Salary that contains a variety of actual salary values is a continuous variable.
Discrete variables: Variables that take one of a limited set of values are discrete. For example, you can add a column called SalaryRange to the table during data preparation, containing integers that represent encoded salary ranges (1 = "0 to $25,000"; 2 = "between $25,000 and $50,000"; and so on). SalaryRange is a discrete variable.
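As a small illustration of turning the continuous Salary column into the discrete SalaryRange column, here is a minimal Python sketch. The function name `salary_range` and the choice to put a boundary value such as exactly $25,000 into the lower range are assumptions; the slide does not say which range a boundary value belongs to.

```python
import math


def salary_range(salary, width=25_000):
    """Encode a continuous salary as an integer range code.

    1 = 0 to 25,000; 2 = above 25,000 up to 50,000; and so on.
    Assumption: a salary exactly on a boundary falls in the lower range.
    """
    if salary < 0:
        raise ValueError("salary must not be negative")
    # ceil() maps (0, 25000] to 1 and (25000, 50000] to 2; max() covers a salary of 0.
    return max(1, math.ceil(salary / width))


# Hypothetical employee salaries, encoded during data preparation.
salaries = [18_500.00, 25_000.00, 40_250.75, 62_000.00]
print([salary_range(s) for s in salaries])
```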

Data Mining Models

The basic algorithms include: classification, estimation, prediction, affinity grouping, clustering, and description and profiling.

Classification

Definition: Classification is the task of assigning each item in a set to one of a predetermined set of classes based on its attributes or behaviors (for example, buyer or non-buyer). We can identify classes of consumers who have common geographic, demographic, economic, and behavioral attributes and can be expected to respond to certain opportunities in a similar way.

Classification assigns an item to a specific class based on a discrete variable value like 0 or 1. Determining whether someone is likely to respond to a direct mail piece involves putting them in the category of Likely Responder or not.

Algorithms to build the models: Decision Trees, Neural Networks, Naïve Bayes.
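To make the idea concrete, here is a minimal sketch of a Naïve Bayes classifier for the direct mail example, written in plain Python. The training cases, the two attributes (home ownership and education), and the add-one smoothing are assumptions chosen for illustration; a data mining tool's own implementation will differ.

```python
from collections import Counter, defaultdict

# Hypothetical training cases: (home_owner, education) -> responded (1) or not (0).
cases = [
    (("yes", "college"), 1),
    (("yes", "college"), 1),
    (("yes", "high_school"), 1),
    (("no", "college"), 0),
    (("no", "high_school"), 0),
    (("no", "high_school"), 0),
]


def train(cases):
    """Count how often each class occurs and how often each attribute value occurs per class."""
    class_counts = Counter(label for _, label in cases)
    value_counts = defaultdict(Counter)  # keyed by (label, attribute position)
    for attributes, label in cases:
        for index, value in enumerate(attributes):
            value_counts[(label, index)][value] += 1
    return class_counts, value_counts


def classify(model, attributes):
    """Return the class with the highest prior times per-attribute likelihoods."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for index, value in enumerate(attributes):
            seen = value_counts[(label, index)]
            # Add-one smoothing, assuming two possible values per attribute.
            score *= (seen[value] + 1) / (count + 2)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(cases)
print(classify(model, ("yes", "high_school")))  # discrete class label, 0 or 1
```

The result is always one of the predetermined classes (here 0 or 1), which is what distinguishes classification from estimation.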

Estimation (likelihood to respond)

Definition: Estimation is the continuous version of classification. That is to say, where classification returns a discrete value like 0 or 1, estimation returns a continuous number.

For example, a promotions manager with a budget for 200,000 pieces and a list of 12 million prospects would use the predicted Response_Likelihood variable to limit the target subset. Including only those prospects with a Response_Likelihood greater than some number, say 0.80, would give the promotions manager a target list of the top 200,000 prospects.

Most of the estimation algorithms are based on regression analysis techniques. As a result, this category is often called regression.

Algorithms to use: Decision Trees, Neural Networks.
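The promotions example comes down to filtering and ranking prospects by their estimated score. A minimal Python sketch follows; the function `select_targets`, the prospect ids, and the scores are hypothetical, and the scores are assumed to have been produced already by an estimation model.

```python
import heapq


def select_targets(scored_prospects, budget, min_likelihood=0.80):
    """Return up to `budget` prospect ids with the highest Response_Likelihood above the cutoff."""
    eligible = [
        (score, prospect_id)
        for prospect_id, score in scored_prospects.items()
        if score > min_likelihood
    ]
    # nlargest keeps only the best `budget` entries, highest score first.
    return [prospect_id for _, prospect_id in heapq.nlargest(budget, eligible)]


# Hypothetical Response_Likelihood scores for five prospects.
scores = {"A": 0.91, "B": 0.42, "C": 0.85, "D": 0.80, "E": 0.97}
print(select_targets(scores, budget=2))
```

With 12 million prospects and a budget of 200,000 pieces, the same call would use `budget=200_000`; the cutoff and the budget together decide how long the final list is.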

Prediction (Predicting a value)

Definition: Prediction seeks to determine a value as accurately as possible before the value is known. This future-oriented element is what places prediction in its own category.

For example, a lending company offering mortgages might want to predict the market value of a piece of property before it is sold, regardless of the amount that has actually been offered for it. In order to build a predictive data mining model, the company needs a training set that includes predictive attributes that are known prior to the sale, such as total square footage, number of bathrooms, city, and school district, along with the actual sale price of each property in the training set. The data mining algorithm uses this training set to build a model based on the relationships between the predictive variables and the known historical sale price. The model can then be used to predict the sale price of a new property based on the known input variables about that property.

One feature of predictive models is that their accuracy can be tested. At some point in the future, the actual sale amount of the property will become known and can be compared to the predicted value.

Algorithms to use: Decision Trees, Neural Networks.

When prediction involves time series data, it is often called forecasting. Time Series is the first-choice algorithm for predicting time series data, like monthly sales forecasts.
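The property example can be sketched with a single input variable. The code below fits an ordinary least squares line of sale price against square footage, which is a deliberately simple stand-in for the decision tree and neural network algorithms named above. The training figures and the later "actual" sale price are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set: square footage and known historical sale price.
square_feet = [1000, 1500, 2000, 2500]
sale_price = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_line(square_feet, sale_price)

# Predict the price of a new property before it is sold.
predicted = slope * 1800 + intercept

# Later, once the actual sale amount is known, the prediction can be checked.
actual = 330_000  # hypothetical
error = actual - predicted
print(round(predicted), round(error))
```

A real model would use several predictive attributes at once (bathrooms, city, school district); the single-variable line only shows the train, predict, and compare steps.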

Association (market basket analysis)

Definition: Association looks for correlations among items.

E-commerce systems are big users of association models in an effort to increase sales. This can take the form of an association modeling process known as market basket analysis. The online retailer first builds a model based on the contents of recent shopping carts and makes it available to the web server. As the shopper adds products to the cart, the system feeds the contents of the cart into the model. The model identifies items that commonly appear with the items currently in the cart. Most recommendation systems are based on association algorithms.

Algorithms to use: Association, Decision Trees.
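A minimal Python sketch of the shopping cart flow described above: build a model from recent carts, then feed the current cart into it. It simply counts how often pairs of items appear together, which is a simplification of a full association algorithm. The carts, item names, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical contents of recent shopping carts.
recent_carts = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "pad"},
    {"laptop", "mouse", "pad"},
]


def build_pair_counts(carts):
    """Build the model: count how often each pair of items appears in the same cart."""
    pair_counts = Counter()
    for cart in carts:
        for a, b in combinations(sorted(cart), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def recommend(pair_counts, cart, top_n=2):
    """Score items not yet in the cart by how often they co-occur with items in it."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a in cart and b not in cart:
            scores[b] += count
        elif b in cart and a not in cart:
            scores[a] += count
    return [item for item, _ in scores.most_common(top_n)]


model = build_pair_counts(recent_carts)
print(recommend(model, {"laptop"}))
```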

Clustering (Segmentation)

Definition: Clustering can be thought of as auto-classification. Clustering algorithms group cases into clusters whose members are as similar to one another, and as different from the members of other clusters, as possible. The clusters are not predetermined, and it is up to the data miner to examine the clusters to understand what makes them unique.

When applied to customers, this process is also known as customer segmentation. The idea is to segment the customers into smaller, homogeneous groups that can be targeted with customized promotions and even customized products.

One form of clustering is to identify frequent sequences in the data. For example, a consumer electronics product manufacturer's website might identify several clusters of users based on their browsing behavior.

Algorithms to use: Clustering, Sequence Clustering.
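As an illustration of customer segmentation, the sketch below runs k-means, one common clustering technique, on a single attribute. It is used here as a stand-in, not as the specific Clustering algorithm named above. The annual spend figures and the starting centroids are hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Group 1-D points around centroids by repeated assign-and-average steps."""
    clusters = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical annual spend per customer; the segments are not given in advance.
annual_spend = [120, 150, 130, 900, 950, 1000]

# Start from the first two customers as initial centroids (an arbitrary choice).
centroids, clusters = k_means(annual_spend, centroids=[120, 150])
print(centroids, clusters)
```

The algorithm only produces the groups; deciding that one cluster means "occasional buyers" and the other "high spenders" is left to the data miner.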
