Data Mining with Oracle using Classification and Clustering Algorithms Proposed and Presented by Nhamo Mdzingwa Supervisor: John Ebden.


Presentation Outline  Problem Statement  Objective  Background  Expected Results  Possible Extensions  Plan of action  Timeline  Literature Survey  Questions

Problem Statement
The commercial world is reacting quickly to the growth and potential of data mining (DM): a wide range of tools is now being marketed as DM suites. Examples include:
- Oracle DM
- DB2's Intelligent Miner
- Informix's Data Mine
- SQL Data miner
- Ghost miner
- Clementine 9.0 (SPSS)
- SAS
- Gornish systems, etc.

Problem
It is vital to know which algorithms a DM suite uses and which algorithm to apply to a particular data set. A second question is how well each algorithm performs, in terms of accuracy, efficiency and effectiveness, within a particular DM suite such as Oracle DM.

Objective
- Investigate two types of algorithms (classification and clustering) available in Oracle Data Mining (ODM).
- Apply the two algorithms to actual data.
- Analyse and evaluate the results in terms of performance.
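Evaluating results "in terms of performance" needs concrete measures. The sketch below shows two common ones for classifiers, accuracy and a confusion matrix, plus a rough wall-clock timer for efficiency. It assumes the actual and predicted class labels are already available as Python lists; the helper names and the sample labels are hypothetical and are not part of Oracle DM.

```python
import time
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of records whose predicted class matches the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; keys where the two differ are errors."""
    return Counter(zip(actual, predicted))

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) as a rough efficiency measure."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical labels for five test records.
actual    = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no",  "yes", "yes"]
print(accuracy(actual, predicted))                        # 0.6
print(confusion_matrix(actual, predicted)[("yes", "no")])  # 1
```

Accuracy alone can hide which classes are confused with each other, which is why the confusion matrix is kept alongside it.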

What is Data Mining? (Background)
Simply put, DM is knowledge discovery: the automatic discovery of hidden patterns and relationships within very large amounts of data. It is a powerful and relatively new technology that lets businesses make proactive, knowledge-driven decisions by predicting future trends from existing data. The data, which implicitly holds this knowledge, is normally stored in databases and data warehouses, typically terabytes in size.

Automatic discovery is implemented by the algorithms provided by DM suites. For example, Oracle offers:
- Adaptive Bayes Network, supporting decision trees (classification)
- Naive Bayes (classification)
- Model Seeker (classification)
- k-Means (clustering)
- O-Cluster (clustering)
- Predictive variance (attribute importance)
- Apriori (association rules)
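To show what a classification algorithm from this list actually computes, here is a minimal pure-Python sketch of Naive Bayes. It assumes categorical attributes and uses add-one (Laplace) smoothing; it is an illustration of the general technique, not Oracle's implementation, and the function names and the small weather-style dataset are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, cls)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict_naive_bayes(model, row):
    """Return the class maximising log P(class) + sum_i log P(x_i | class)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)
        for i, value in enumerate(row):
            # Add-one smoothing keeps an unseen value from zeroing the whole product.
            count = value_counts[(i, cls)][value]
            score += math.log((count + 1) / (n_cls + len(domains[i])))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# Hypothetical training data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("rainy", "mild")))  # yes
print(predict_naive_bayes(model, ("sunny", "hot")))   # no
```

Working in log probabilities avoids numeric underflow when many attributes are multiplied together.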

Algorithms are grouped as either supervised or unsupervised learning strategies:
- Supervised learning (input attributes and one or more output attributes):
  - Classification: Naive Bayes, Model Seeker, Adaptive Bayes
  - Estimation
  - Prediction: Predictive variance
- Unsupervised learning (input attributes but no output attributes):
  - Clustering: k-Means, O-Cluster
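Clustering is unsupervised because no output attribute is given: the algorithm groups records using the input attributes alone. The sketch below is plain Lloyd-style k-Means in pure Python, assuming numeric attributes and Euclidean distance. It is not Oracle's k-Means (or O-Cluster) implementation, and the function names and sample points are hypothetical.

```python
import math
import random

def _nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def k_means(points, k, iterations=100, seed=0):
    """Alternate assigning points to the nearest centroid and moving each
    centroid to the mean of its assigned points, until nothing moves."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iterations):
        # Assignment step.
        assignment = [_nearest(p, centroids) for p in points]
        # Update step: mean of each cluster's members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Final assignment consistent with the returned centroids.
    assignment = [_nearest(p, centroids) for p in points]
    return centroids, assignment

# Hypothetical two-attribute records forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Points in the same group receive the same cluster label; which group is called 0 and which 1 depends on the random initialisation, so the labels themselves carry no meaning.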

The data mining process involves a series of steps to define a business problem, gather and prepare the data, build and evaluate mining models, and apply the models and disseminate the new information.
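The prepare, build, evaluate and apply steps can be sketched end to end as below. This is a hedged illustration only: the majority-class baseline in `build` is a hypothetical stand-in for a real mining algorithm such as Naive Bayes, and the helper names, the hold-out split and the sample records are assumptions rather than Oracle DM's workflow.

```python
import random
from collections import Counter

def prepare(records, test_fraction=0.3, seed=42):
    """Prepare: drop incomplete records, shuffle, split into build and test sets."""
    clean = [r for r in records if None not in r]
    rng = random.Random(seed)
    rng.shuffle(clean)
    cut = round(len(clean) * (1 - test_fraction))
    return clean[:cut], clean[cut:]

def build(train):
    """Build: a majority-class baseline (hypothetical stand-in for a real algorithm)."""
    majority = Counter(label for *_, label in train).most_common(1)[0][0]
    return lambda attributes: majority

def evaluate(model, test):
    """Evaluate: accuracy on held-out records not used while building."""
    if not test:
        raise ValueError("test set is empty")
    hits = sum(model(tuple(attrs)) == label for *attrs, label in test)
    return hits / len(test)

# Hypothetical records: (age band, income, class); one has a missing value.
records = [("young", "low", "no"), ("young", "high", "yes"), ("old", "low", "no"),
           ("old", "high", "no"), ("mid", "low", "no"), ("mid", "high", "yes"),
           ("young", "low", "no"), ("old", "low", "no"), ("mid", "low", "no"),
           ("old", "high", "no"), ("mid", None, "no")]
train, test = prepare(records)
model = build(train)
print(len(train), len(test))  # 7 3
print(round(evaluate(model, test), 2))
# Apply: score new, unlabelled records.
print([model(r) for r in [("young", "high"), ("old", "low")]])  # ['no', 'no']
```

Holding out a test set keeps the evaluation honest, and a trivial baseline like this gives a floor that any real algorithm should beat.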

Expected Results
The aim is to determine conclusively which algorithm is most effective and suitable for mining a given dataset, since datasets differ.

Possible Extensions to the Project
Test the same algorithms with DM tools offered by other vendors, e.g. the DM suite in SQL, and check whether the results are similar. If they are not, investigating why the results differ could be a further extension.

Plan of Action
- Carry out a literature search, mainly to gain background knowledge and an understanding of the field.
- Get to know the Oracle DM Suite by working through the DM tutorials provided by Oracle. I will be working on the server Ora1, which already has JDeveloper, the Oracle 10g database and Oracle 9i DM installed.

Timeline
Continuing from the literature search and tutorials already done:
- Investigate clustering and classification algorithms (theory): 2nd term, 15 to 30 April
- Find suitable computerised case studies of the use of the above algorithms, with or without Oracle: 2nd term, end of May
- Search databases for test data (possibilities: AIDS data and faculty data): 2nd term, end of May
- Apply the algorithms to the data found, then critically analyse and assess the results: second semester
- Write up paper: September vacation and 3rd term
- Final project write-up: due 7/11

Literature Survey
Richard J. Roiger and Michael W. Geatz, Data mining: a tutorial-based primer. Boston, Massachusetts, Addison Wesley, 2003. This book will provide the background and practical knowledge required for the project research, and it presents different data mining methodologies that may be useful.

David Hand, Heikki Mannila and Padhraic Smyth, Principles of data mining. Cambridge, Massachusetts, MIT Press.
Jesus Mena, Data mining your website. Digital Press.
Jiawei Han and Micheline Kamber, Data mining: concepts and techniques. San Francisco, California, Morgan Kaufmann, 2001.
Robert P. Trueblood and John N. Lovett, Jnr., Data Mining and Statistical Analysis Using SQL. USA, Apress.

Questions? Thank you