DATA MINING
CS157A
Swathi Rangan
A Brief History of Data Mining
The term “Data Mining” was only introduced in the 1990s, but its roots trace back along three family lines: classical statistics, artificial intelligence, and machine learning. Data mining is the union of historical and recent developments in these three fields.
Data Mining Overview
Data mining is the process of automatically searching large volumes of data for relationships and patterns. It attempts to discover information, in the form of patterns or rules, from vast amounts of data, and is often described as “knowledge discovery” in databases. It is a valuable tool for business.
Goals of Data Mining
Prediction: using some variables or fields in the data set to predict unknown or future values of other variables. Data mining can show how certain attributes within the data will behave in the future.
Ex: Analyzing buying transactions to predict what customers will buy under certain discounts, how much in sales a store will generate, and whether deleting a product would yield more profit.
Ex: A credit card company predicting whether a person is a good credit risk by looking at known attributes such as age, income, debts, and past debt repayment history.
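As a minimal sketch of prediction, the Python code below fits a straight line by ordinary least squares to past (discount, units sold) pairs and uses it to predict sales at a new discount level. The data, the function names `fit_line` and `predict`, and the choice of a linear model are illustrative assumptions, not part of the original slides.

```python
# Hypothetical sales history: (discount in percent, units sold). Made-up values.
history = [(0, 100), (5, 120), (10, 140), (15, 160)]

def fit_line(points):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Predict y for a new x using the fitted line."""
    intercept, slope = model
    return intercept + slope * x

model = fit_line(history)
print(predict(model, 20))  # prints 180.0
```

Real sales data is rarely this clean; the point is only that known variables (discount) are used to estimate an unknown one (future sales).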
Goals of Data Mining (cont.)
Identification: data patterns can be used to identify the existence of an item, event, or activity.
Ex: Hackers or intruders trying to break into a system may be identified by the programs they execute, the files they access, etc.
Ex: In biological applications, the existence of a gene may be identified by certain sequences of nucleotide symbols in the DNA sequence.
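A minimal sketch of identification, assuming the gene of interest is recognized by a single fixed nucleotide motif: the Python code below reports every position where that motif occurs in a DNA string. The motif `TATAAA`, the sample sequence, and the function name `find_motif` are hypothetical choices for illustration, and the sketch shows only exact pattern matching.

```python
import re

# Hypothetical signature motif: the "gene" counts as present if this appears.
SIGNATURE = "TATAAA"

def find_motif(dna, motif):
    """Return the start positions of every (possibly overlapping) occurrence."""
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]

sequence = "ggcTATAAAagcTATAAAtt"
positions = find_motif(sequence, SIGNATURE)
print("gene present" if positions else "gene absent", positions)
```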
Goals of Data Mining (cont.)
Classification: data mining can partition data so that different categories can be identified based on combinations of parameters, for example by finding rules that partition the given data into groups.
Ex: Customers in a supermarket can be categorized into discount-seeking shoppers, shoppers in a rush, loyal regular shoppers, and infrequent shoppers.
Ex: A credit card company deciding whether or not to give a credit card to an applicant.
Example
To decide whether or not to give a credit card to an applicant, the company assigns a credit worthiness level of excellent, good, average, or bad to its current customers. Rules that predict this level can then be derived from the data and applied to new applicants. Considering two attributes, education level and income, the rules could take the following form:
∀ person P, P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent
∀ person P, P.degree = bachelors or (P.income ≥ 25,000 and P.income ≤ 75,000) ⇒ P.credit = good
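The two credit rules in this example can be written directly as a rule-based classifier. This Python sketch assumes the rules are checked in the order listed, with the first matching rule winning, and returns the hypothetical label "unclassified" when neither rule fires, since the slide gives no rule for that case. The `Person` class and the sample applicants are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Person:
    degree: str
    income: int

def credit_level(p: Person) -> str:
    # Rule 1: masters degree and income above 75,000 -> excellent
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    # Rule 2: bachelors degree, or income between 25,000 and 75,000 -> good
    if p.degree == "bachelors" or 25_000 <= p.income <= 75_000:
        return "good"
    # Neither rule fires; the example defines no rule for this case.
    return "unclassified"

applicants = [Person("masters", 90_000), Person("bachelors", 20_000),
              Person("high school", 40_000), Person("high school", 10_000)]
for a in applicants:
    print(a, credit_level(a))
```

Because the rules overlap (a masters graduate earning 50,000 satisfies rule 2), the order in which they are checked is a design choice that affects the result.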
Goals of Data Mining (cont.)
Optimization: data mining can help optimize the use of limited resources such as time, space, money, or materials, and maximize output variables such as sales or profits under a given set of constraints.
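One classic way to frame this kind of constrained optimization is the 0/1 knapsack problem. The Python sketch below, assuming a store must choose which products to stock in a limited amount of shelf space, picks the subset that maximizes total profit by dynamic programming. The product names, space units, and profits are made-up values, and knapsack is a swapped-in illustrative technique rather than one named in the slides.

```python
def best_selection(items, capacity):
    """Solve the 0/1 knapsack problem by dynamic programming.

    items: list of (name, space, profit) tuples; capacity: total space available.
    Returns (max_profit, chosen_names).
    """
    # best[c] holds the best (profit, names) found so far using at most c units of space.
    best = [(0, []) for _ in range(capacity + 1)]
    for name, space, profit in items:
        # Walk capacities downward so each product is chosen at most once.
        for c in range(capacity, space - 1, -1):
            candidate = best[c - space][0] + profit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - space][1] + [name])
    return best[capacity]

# Hypothetical products: (name, shelf space units, profit).
products = [("coffee", 3, 40), ("tea", 2, 25), ("snacks", 4, 55), ("juice", 1, 10)]
print(best_selection(products, 6))
```

With 6 units of shelf space, the best choice in this made-up data is tea and snacks, for a profit of 80.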
What Data Mining Can Do
– Enables companies to determine relationships among “internal” and “external” factors.
– Predicts cross-sell opportunities and makes recommendations.
– Segments markets and personalizes communications.
– Predicts outcomes of future situations.
The Process of Data Mining
There are three main steps in the data mining process:
– Preparation: data is selected from the warehouse and “cleansed”.
– Processing: algorithms are used to process the data. This step uses modeling to make predictions.
– Analysis: the output is evaluated.
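The three steps can be sketched end to end in Python. In this minimal example, which rests on several assumptions, preparation drops incomplete records, processing learns a single income threshold that best predicts loan repayment (a one-level decision stump), and analysis measures the threshold's accuracy on held-out records. The record layout `(income, repaid)`, the data values, and the function names are hypothetical.

```python
# Hypothetical records: (income, repaid). None marks a missing value.
raw_train = [(20_000, False), (None, True), (35_000, False),
             (50_000, True), (80_000, True), (45_000, None)]
raw_test = [(30_000, False), (60_000, True), (55_000, False)]

def prepare(raw):
    """Preparation: keep only complete records (a very simple form of cleansing)."""
    return [r for r in raw if None not in r]

def process(train):
    """Processing: learn the income threshold that best predicts repayment
    on the training data (a one-level decision stump)."""
    best_t, best_acc = None, -1.0
    for t in sorted({income for income, _ in train}):
        acc = sum((income >= t) == repaid for income, repaid in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def analyze(threshold, test):
    """Analysis: fraction of held-out records the threshold classifies correctly."""
    return sum((income >= threshold) == repaid for income, repaid in test) / len(test)

threshold = process(prepare(raw_train))
accuracy = analyze(threshold, prepare(raw_test))
print(threshold, accuracy)
```

With this data the learned threshold is 50,000 and the held-out accuracy is 2/3; evaluating on records not used for training is what keeps the analysis step honest.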
Reasons for Growing Popularity
– Growing data volume: an enormous amount of existing and newly appearing data requires processing.
– Limitations of human analysis: humans lack objectivity when analyzing dependencies in data.
– Low cost of machine learning: the data mining process costs less than hiring highly trained professionals to analyze data.
Data Mining Techniques
– Association rules: discover interesting associations between attributes contained in a database.
– Clustering: finds appropriate groupings of elements in a set of data.
– Sequential patterns: look for patterns in which one event leads to another, later event.
– Classification: finds rules that partition data into predefined classes, so that new items can be assigned to a class.
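Association rules are usually judged by two measures: support (the fraction of transactions containing both items) and confidence (the fraction of transactions containing the left-hand item that also contain the right-hand item). The brute-force Python sketch below, assuming rules between single items only and a small made-up set of market-basket transactions, computes both. It is not the full Apriori algorithm, which also handles larger itemsets efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def association_rules(baskets, min_support, min_confidence):
    """Return rules lhs -> rhs between single items as (lhs, rhs, support, confidence)."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Count each unordered pair once per basket; sorting gives a canonical order.
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(basket), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, support, confidence in association_rules(transactions, 0.5, 0.6):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```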
Applications of Data Mining
Data mining is applied in the following areas:
– Stock market prediction: predicting future trends.
– Bankruptcy prediction: predictions based on computer-generated rules and models.
– Foreign exchange market: data mining is used to identify trading rules.
– Fraud detection: constructing algorithms and models that help recognize a variety of fraud patterns.