GROUP 6
KIIZA FELIX 2013/BIT/110
MUHANGUZI EUSTUS 2013/BIT/104/PS
TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS
HAMSTONE NATOSHA 2013/BIT/122/PS
GILBERT MUMBERE 2013/BIT/120/PS
ATWINE EVANS 2013/BIT/106/PS

What is a cluster: A cluster is a group of objects that are very similar to each other but highly different from the objects in other clusters; forming such groups is called clustering. Example: one cluster of customers who buy Coca-Cola soda and another of customers who buy Pepsi. In computing, the word "cluster" also has a different meaning: a group of independent servers, normally in close proximity to one another, interconnected through a dedicated network to work as one centralized data-processing resource. The rest of this presentation uses the data-mining meaning.

What is cluster analysis: Cluster analysis is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval and bioinformatics.

Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
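The slides do not name a specific clustering algorithm. Purely as an illustration of "finding groups of objects", here is a minimal sketch of k-means, a common partitioning method that this presentation does not itself describe. The function names, the toy points and the use of Euclidean distance are assumptions chosen for the example.

```python
import math
import random

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, iterations=100, seed=0):
    """Sketch of k-means: group points into k clusters, return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = k_means(points, k=2)
```

Points in the same group end up with the same label, and the two groups get different labels. Other algorithms (hierarchical, density-based) pursue the same goal differently.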

Examples of clustering applications: MARKETING. Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix. Approach: collect different attributes of customers based on their geographical and lifestyle-related information; find clusters of similar customers; measure the clustering quality by observing the buying patterns of customers in the same cluster versus those in different clusters.
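The last step of the marketing approach (comparing buying patterns within a cluster against buying patterns across clusters) can be sketched as below. This is one possible way to quantify it, not the method from the slides: it assumes each customer already has a cluster label and a single product bought, and the function name and data are hypothetical.

```python
from itertools import combinations

def agreement_rates(customers):
    """Return (same-cluster agreement, cross-cluster agreement).

    Each customer is a (cluster_label, product_bought) pair. Agreement is the
    fraction of customer pairs that bought the same product.
    """
    within = [0, 0]  # [matching pairs, total pairs] inside one cluster
    across = [0, 0]  # same counts for pairs that span two clusters
    for (c1, p1), (c2, p2) in combinations(customers, 2):
        bucket = within if c1 == c2 else across
        bucket[0] += p1 == p2
        bucket[1] += 1
    return within[0] / within[1], across[0] / across[1]

# Hypothetical toy data: cluster labels from some earlier clustering step.
customers = [
    (0, "coca-cola"), (0, "coca-cola"), (0, "pepsi"),
    (1, "pepsi"), (1, "pepsi"), (1, "pepsi"),
]
same, different = agreement_rates(customers)
```

Here `same` is 4/6 and `different` is 3/9, so customers in the same cluster buy alike more often than customers in different clusters, which is what a useful market segmentation should show.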

Document Clustering: Goal: find groups of documents that are similar to each other based on the important terms appearing in them. Approach: identify frequently occurring terms in each document; form a similarity measure based on the frequencies of different terms; use it to cluster. Gain: information retrieval can use the clusters to relate a new document or search term to clustered documents.
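A similarity measure "based on the frequencies of different terms" could look like the following sketch. It uses cosine similarity between term-frequency vectors, a common choice that the slide does not name; the tokenizer, function names and sample documents are assumptions.

```python
import math
import re
from collections import Counter

def term_frequencies(text):
    """Count lower-cased word occurrences in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0 to 1 here)."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical sample documents.
docs = [
    "stock market prices fell as the market closed",
    "market prices and stock trading",
    "the team won the football match",
]
tfs = [term_frequencies(d) for d in docs]
sim_01 = cosine_similarity(tfs[0], tfs[1])  # shares "stock", "market", "prices"
sim_02 = cosine_similarity(tfs[0], tfs[2])  # shares only "the"
```

The two finance documents score about 0.57 and the finance/sport pair about 0.22; a clustering algorithm would then group documents whose pairwise similarity is high.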

ILLUSTRATING DOCUMENT CLUSTERING: Clustering points: 3204 articles of the Los Angeles Times. Similarity measure: how many words are common to these documents (after some word filtering).
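The similarity measure on this slide (how many words two documents share after some word filtering) can be sketched as follows. The slide does not say which words were filtered, so the stop-word list below is a hypothetical stand-in, as are the function names and sample sentences.

```python
import re

# Hypothetical stop-word list; the slide only says "some word filtering".
STOP_WORDS = {"a", "an", "and", "as", "in", "of", "the", "to"}

def content_words(text):
    """Lower-case the text, split it into words and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def common_word_count(doc_a, doc_b):
    """Similarity = number of distinct filtered words the documents share."""
    return len(content_words(doc_a) & content_words(doc_b))

# Hypothetical sample articles.
a = "The city council approved the new budget"
b = "Council members debated the budget for the city"
c = "The team won the match in the final minute"
```

`common_word_count(a, b)` is 3 ("city", "council", "budget"), while `common_word_count(a, c)` is 0; without the filter, "the" alone would make every pair look somewhat similar.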

Examples of clustering applications: Biology: classification of plants and animals given their features; Libraries: book ordering; Insurance: identifying groups of motor insurance policy holders with a high average claim cost; City-planning: identifying groups of houses according to their house type, value and geographical location; Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones; WWW: document classification; clustering weblog data to discover groups of similar access patterns.

What is meant by good clustering: A good clustering method will produce high-quality clusters in which the intra-class (that is, intra-cluster) similarity is high and the inter-class (inter-cluster) similarity is low. The quality of a clustering result also depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
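The "high intra-cluster similarity, low inter-cluster similarity" criterion can be checked numerically. The sketch below, which assumes Euclidean distance as the (dis)similarity measure and uses hypothetical points and labelings, compares the average distance between points in the same cluster with the average distance between points in different clusters.

```python
import math
from itertools import combinations

def mean_intra_inter(points, labels):
    """Average distance between points in the same cluster vs. different clusters."""
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(math.dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Hypothetical data: two visible blobs of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # groups each blob together
bad = [0, 1, 0, 1, 0, 1]   # mixes the blobs
good_intra, good_inter = mean_intra_inter(points, good)
bad_intra, bad_inter = mean_intra_inter(points, bad)
```

The good labeling has small distances inside clusters (high intra-cluster similarity) and large distances between clusters (low inter-cluster similarity); the bad labeling does worse on both counts. `math.dist` requires Python 3.8 or later.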

Requirements of clustering in data mining: The main requirements that a clustering algorithm should satisfy are: scalability; the ability to deal with different types of attributes; discovering clusters with arbitrary shape; minimal requirements for domain knowledge to determine input parameters; the ability to deal with noise and outliers; interpretability and usability.

Problems: There are a number of problems with clustering. Among them: current clustering techniques do not address all the requirements adequately (and concurrently); dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity; the effectiveness of the method depends on the definition of “distance” (for distance-based clustering); if an obvious distance measure doesn’t exist, we must “define” it, which is not always easy, especially in multi-dimensional spaces; the result of the clustering algorithm (which in many cases can itself be arbitrary) can be interpreted in different ways.
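To see why the effectiveness of distance-based clustering depends on how "distance" is defined, the small sketch below assigns the same point to its nearest cluster center under two standard distances, Euclidean and Manhattan. The centers and the point are hypothetical values picked to make the difference visible.

```python
import math

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_center(point, centers, distance):
    """Name of the center closest to point under the given distance function."""
    return min(centers, key=lambda name: distance(point, centers[name]))

# Hypothetical cluster centers and query point.
centers = {"A": (3.0, 3.0), "B": (5.0, 0.0)}
point = (0.0, 0.0)
by_euclidean = nearest_center(point, centers, euclidean)  # A: 4.24 vs 5.0
by_manhattan = nearest_center(point, centers, manhattan)  # B: 5.0 vs 6.0
```

The same point joins cluster A under one measure and cluster B under the other, so the choice of distance directly changes the clustering result.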