DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski.


1 DataBases & Data Mining Joined Specialization Project „Data Mining Classification Tool” By Mateusz Żochowski & Jakub Strzemżalski

2 Agenda: general description of the problem; functionality; data mining aspects; algorithm and optimisation; database aspects; general entity scheme.

3 General Description: universal tool; different kinds of objects (e.g. preprocessed photos, hospital patient data); finding similar objects; decision problems.

4 Functionality: an independent, user-operated system; works on data sets already provided or on newly uploaded object types; the user can influence the way data is processed; possible use in bigger systems as a processing engine; can serve as an additional helper module in more complex systems.

5 General Use Case

6 Data Mining. General ideas: description of an object; definition of a distance. K-NN algorithm: brief explanation of the algorithm. Optimization: the problem of comparing a large number of objects; an optimized solution using the grouping idea.

7 Definitions: Objects

8 K-NN (K-Nearest Neighbors). The idea behind k-NN. Aim: find the k objects most similar to the one being analyzed and then assign it an appropriate decision. Method: calculate the distance from the analyzed object to the others in the database and find the closest ones.
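A minimal sketch of the aim and method described on this slide. It assumes numeric attribute tuples, plain Euclidean distance, and a majority vote among the k neighbours; the names knn_classify and patients and the sample values are hypothetical and do not come from the presentation.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, objects, k=3, distance=euclidean):
    """Return (decision, neighbours) for the query object.

    objects is a list of (attributes, decision) pairs. The decision is a
    majority vote among the k closest objects (an assumed voting rule).
    """
    # Method: compute the distance to every stored object and keep the closest k.
    ranked = sorted(objects, key=lambda item: distance(query, item[0]))
    nearest = ranked[:k]
    # Aim: assign the decision that is most common among those neighbours.
    votes = Counter(decision for _, decision in nearest)
    return votes.most_common(1)[0][0], nearest


# Hypothetical sample data: two attributes per object plus a decision label.
patients = [
    ((1.0, 1.0), "healthy"),
    ((1.2, 0.8), "healthy"),
    ((0.9, 1.1), "healthy"),
    ((5.0, 5.0), "ill"),
    ((5.2, 4.9), "ill"),
]

decision, neighbours = knn_classify((1.1, 1.0), patients, k=3)
print(decision)
```

Note that this exhaustive version computes one distance per stored object for every query, which is the cost the later optimization slides aim to reduce.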

9 K-NN: graphical representation

10 Definitions: Distance. Calculations in multidimensional space. Coefficients: alpha; w_i, the weights, underlining the importance of particular attributes; n, the number of all attributes.
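The distance formula itself did not survive in the transcript. One common definition that uses exactly these symbols is a weighted Minkowski distance, d(x, y) = (sum over i = 1..n of w_i * |x_i - y_i|^alpha)^(1/alpha). The sketch below assumes that form; treating alpha as the Minkowski exponent is a guess, and the function name weighted_distance is hypothetical.

```python
def weighted_distance(a, b, weights, alpha=2.0):
    """Weighted Minkowski distance (an ASSUMED form, not taken from the slides).

    a, b     -- attribute tuples of length n
    weights  -- w_i, one non-negative weight per attribute
    alpha    -- exponent; 2 gives a weighted Euclidean distance, 1 a weighted
                Manhattan distance
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must all have length n")
    total = sum(w * abs(x - y) ** alpha for x, y, w in zip(a, b, weights))
    return total ** (1.0 / alpha)


# Equal weights and alpha = 2 reduce to the ordinary Euclidean distance.
print(weighted_distance((0, 0), (3, 4), (1, 1)))
# A zero weight removes an attribute from the comparison entirely.
print(weighted_distance((0, 0), (3, 4), (1, 0)))
```

Under this reading, raising a weight makes differences in that attribute count for more when objects are compared, which matches the slide's description of weights as underlining the importance of particular attributes.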

11 Optimization. The reason: the cost of computing multidimensional distances from one element to all others. Solution: an improved k-NN. Result: better efficiency, because the narrowed set of possibly similar objects reduces the number of distance computations.

12 Step 1: group-oriented plane division

13 Step 2: a new object appears

14 Step 3

15 Step 4

16 Step 5
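Steps 3 to 5 survive only as slide titles, so the sketch below is an assumed reading of the grouping idea rather than the authors' procedure: objects are divided into groups in advance, a new object is compared with the group centres first, and the full distance computation is run only against members of the nearest group or groups. The function grouped_knn, the n_groups parameter, and the sample data are hypothetical.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grouped_knn(query, groups, k=2, n_groups=1, distance=euclidean):
    """Return (neighbours, computations) using a group-restricted search.

    groups is a list of (centre, members) pairs and members is a list of
    (attributes, decision) pairs. Only the n_groups groups whose centres are
    closest to the query are searched, so the result may differ from an
    exhaustive k-NN when true neighbours lie in a group that was skipped.
    """
    # Compare the new object with the group centres only.
    by_centre = sorted(groups, key=lambda g: distance(query, g[0]))
    # Narrow the candidate set to the members of the closest group(s).
    candidates = [m for _, members in by_centre[:n_groups] for m in members]
    # Run the ordinary nearest-neighbour ranking on the narrowed set.
    candidates.sort(key=lambda m: distance(query, m[0]))
    # One distance per centre plus one per candidate.
    computations = len(groups) + len(candidates)
    return candidates[:k], computations


# Hypothetical pre-computed groups: (centre, members).
groups = [
    ((1.0, 1.0), [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A")]),
    ((5.0, 5.0), [((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]),
    ((9.0, 1.0), [((9.0, 1.0), "C"), ((9.1, 0.9), "C")]),
]

neighbours, computations = grouped_knn((5.1, 5.0), groups, k=2)
print(neighbours, computations)
```

In this toy data set the search needs 6 distance computations (3 centres plus 3 group members) instead of the 8 an exhaustive scan would use; the saving grows with the number of objects per group, at the price of a possibly approximate answer near group borders.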

17 Grouping problem. The problem: assigning objects to appropriate groups according to the chosen distance definition. Solution: a clustering algorithm. Brief example: the k-means algorithm.
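A minimal k-means sketch to illustrate the brief example named on this slide. It assumes numeric attribute tuples and Euclidean distance (the mean is the natural group centre only for that distance), and it initialises the centres with the first k points purely for reproducibility; none of these choices come from the presentation.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centres, distance):
    """Put every point into the group of its closest centre."""
    groups = [[] for _ in centres]
    for p in points:
        nearest = min(range(len(centres)), key=lambda i: distance(p, centres[i]))
        groups[nearest].append(p)
    return groups


def k_means(points, k, iterations=10, distance=euclidean):
    """Return (centres, groups) after at most `iterations` refinement rounds."""
    # Simplistic initialisation (an assumption): the first k points.
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = assign(points, centres, distance)
        # Move each centre to the mean of its members; keep it if the group is empty.
        new_centres = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centre
            for centre, members in zip(centres, groups)
        ]
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    return centres, assign(points, centres, distance)


# Hypothetical sample data forming two well-separated groups.
points = [(1.5, 0.5), (5.5, 4.5), (1.0, 1.0), (5.0, 5.0), (0.5, 1.5), (4.5, 5.5)]
centres, groups = k_means(points, k=2)
print(centres)
```

The result depends on the initial centres, and k-means can settle in a poor local optimum, so a real system would likely use a better initialisation or several restarts.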

18 Database: entities

19 Database. The general structure of the database results from optimization issues. Because the system is universal, the database may contain many different tables of objects. System tables are needed for defining experiments. Open question: should Group Member be a temporary table?

20 Summary: There is still a lot of work to do...

