DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski.


2 Agenda
- General description of the problem
- Functionality
- Data Mining aspects
- Algorithm and optimisation
- Data Base aspects
- General entities scheme

3 General Description
- Universal tool
- Different kinds of objects (e.g. preprocessed photos, hospital patient data)
- Finding similar objects
- Decision problems

4 Functionality
- Independent system, user operated
- Using sets of data already provided or uploading new types
- Influence on the way data is processed
- Possible usage in bigger systems as a processing engine
- Additional module used as a helping tool in more complex systems

5 General Use Case

6 Data Mining General Ideas
- Description of an object
- Definition of a distance
- K-NN algorithm
  - Brief explanation of the algorithm
- Optimization
  - Problem of comparing a large number of objects
  - Optimized solution: using the grouping idea

7 Definitions Objects

8 K-NN
K-Nearest Neighbors: the idea behind k-NN
- Aim: finding the k objects most similar to the one being analyzed and then assigning the appropriate decision
- Method: calculating the distance from the analyzed object to the others in the database and finding the closest ones
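The aim and method on the k-NN slide can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the Euclidean distance, the majority vote, and the names `knn_classify` and `database` are assumptions introduced here.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance; an assumed stand-in for the project's distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, objects, k=3, distance=euclidean):
    """Return (decision, neighbors) for `query`.

    `objects` is a list of (attributes, decision) pairs.
    The decision is the most common one among the k closest objects.
    """
    # Method: compute the distance from the query to every stored object.
    ranked = sorted(objects, key=lambda item: distance(query, item[0]))
    neighbors = ranked[:k]
    # Aim: assign the decision shared by most of the k nearest objects.
    votes = Counter(decision for _, decision in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Hypothetical toy data: two attributes per object, decisions "A" and "B".
database = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
]
decision, neighbors = knn_classify((1.1, 1.0), database, k=3)
```

Note that every query compares against all stored objects, which is the cost the optimization slides address.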

9 K-NN Graphical representation

10 Definitions
Distance
- Calculations in multidimensional space
- Coefficients:
  - Alpha
  - w_i: weights expressing the importance of particular attributes
  - n: number of all attributes
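The distance formula itself did not survive in the transcript. The sketch below shows one plausible reading of the listed coefficients, assuming a weighted Minkowski-style distance in which Alpha is the exponent, w_i are per-attribute weights, and n is the number of attributes. Both the formula and the function name `weighted_distance` are assumptions, not the authors' definition.

```python
def weighted_distance(x, y, weights, alpha=2.0):
    """Assumed form: (sum_i w_i * |x_i - y_i| ** alpha) ** (1 / alpha).

    x, y    -- attribute vectors of equal length n
    weights -- w_i, one non-negative weight per attribute
    alpha   -- exponent (alpha=2 gives a weighted Euclidean distance)
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must all have length n")
    total = sum(w * abs(a - b) ** alpha for a, b, w in zip(x, y, weights))
    return total ** (1.0 / alpha)


# With unit weights and alpha=2 this is the ordinary Euclidean distance.
d_plain = weighted_distance((0, 0), (3, 4), (1, 1), alpha=2.0)
# A zero weight removes an attribute from the comparison entirely.
d_masked = weighted_distance((0, 0), (3, 4), (0, 1), alpha=2.0)
```

Under this reading, raising a weight makes differences in that attribute count for more when objects are compared.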

11 Optimization
- The reason: the cost of computing multidimensional distances from one object to all others
- Solution: improved k-NN
- Result: better efficiency, because narrowing the set of possibly similar objects reduces the number of distance computations
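One way the narrowing could work is sketched below. It assumes that objects are already assigned to groups, that each group is represented by its centroid, and that a query is compared only with members of the nearest group or groups. The step slides that follow were images, so these details, along with the names `group_centers` and `grouped_knn`, are assumptions rather than the authors' exact procedure.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_centers(groups):
    """Compute one centroid per group; done once, before any query arrives."""
    centers = {}
    for gid, members in groups.items():
        points = [attrs for attrs, _ in members]
        centers[gid] = tuple(sum(c) / len(points) for c in zip(*points))
    return centers


def grouped_knn(query, groups, centers, k=3, n_groups=1):
    """k-NN restricted to the `n_groups` groups whose centroids are closest.

    Returns (decision, number_of_objects_compared).
    """
    # Cheap step: one distance per group instead of one per object.
    nearest = sorted(centers, key=lambda gid: euclidean(query, centers[gid]))
    candidates = [item for gid in nearest[:n_groups] for item in groups[gid]]
    # Ordinary k-NN, but only over the narrowed candidate set.
    neighbors = sorted(candidates, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(decision for _, decision in neighbors)
    return votes.most_common(1)[0][0], len(candidates)


# Hypothetical pre-grouped data: group id -> list of (attributes, decision).
groups = {
    "g1": [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A")],
    "g2": [((5.0, 5.0), "B"), ((5.2, 4.9), "B")],
}
centers = group_centers(groups)
decision, compared = grouped_knn((1.1, 1.0), groups, centers, k=3)
```

A search restricted this way is approximate: a true nearest neighbor lying just across a group boundary can be missed, which is why `n_groups` can be raised above 1.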

12 Step 1 - Group-oriented plane division

13 Step 2 - new object appears

14 Step 3

15 Step 4

16 Step 5

17 Grouping problem
- The problem: assigning objects to appropriate groups according to the chosen distance definition
- Solution: a clustering algorithm
- Brief example: the k-means algorithm
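As a brief illustration of the k-means example, here is a minimal sketch of the standard assign-then-update iteration. It assumes Euclidean distance and, to stay deterministic, takes the first k points as initial centers; practical implementations usually initialize randomly. The function name `k_means` and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Return (centers, assignment); assignment[i] is the group index of points[i]."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: the first k points serve as initial centers.
    centers = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the group with the closest center.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed its group
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty group keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, assignment


points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1)]
centers, assignment = k_means(points, k=2)
```

The resulting groups and centers are the kind of input a group-restricted neighbor search would start from.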

18 DataBase – entities

19 DataBase
- The general structure of the database results from optimization issues
- Because the system is meant to be universal, the database may contain many different tables of objects
- System tables are needed for defining experiments
- Group Member as a temporary table?

20 Summary There is still a lot of work to do...