GROUP 6 KIIZA FELIX 2013/BIT/110 MUHANGUZI EUSTUS 2013/BIT/104/PS TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS HAMSTONE NATOSHA 2013/BIT/122/PS GILBERT MUMBERE.

GROUP 6 KIIZA FELIX 2013/BIT/110 MUHANGUZI EUSTUS 2013/BIT/104/PS TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS HAMSTONE NATOSHA 2013/BIT/122/PS GILBERT MUMBERE 2013/BIT/120/PS ATWINE EVANS 2013/BIT/106/PS

What is a cluster:  cluster refers to forming groups of objects that are very similar to each other but are highly different from the objects in other clusters Example:  having a cluster of customers who buy coca cola soda and another buying Pepsi.  It is a group of independent servers which are normally in close proximity to one another interconnected through a dedicated network to work as one centralized data processing resource.

What is cluster analysis: Cluster analysis is the task of grouping a set of objects in such a way that objects in the same group called a cluster are more similar in some sense or another to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups

Examples of clustering applications: MARKETING Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix. Approach: Collect different attributes of customers based on their geographical and lifestyle related information. Find clusters of similar customers. Measure the clustering quality by observing buying patterns of customers in same cluster vs. those from different clusters.

Document Clustering: Goal: To find groups of documents that are similar to each other based on the important terms appearing in them. Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster. Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered

ILLUSTRATING DOCUMENT CLUSTERING Clustering Points: 3204 Articles of Los Angeles Times. Similarity Measure: How many words are common in these documents (after some word filtering).

Examples of clustering applications: Biology: classification of plants and animals given their features; Libraries: book ordering; Insurance: identifying groups of motor insurance policy holders with a high average claim cost; City-planning: identifying groups of houses according to their house type, value and geographical location; Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones; WWW: document classification; clustering weblog data to discover groups of similar access patterns.

What is meant by good clustering: A good clustering method will produce high quality cluster in which the intra-class that is intra cluster;similary is high the inter-class similarity is low The quality of a clustering result also depends on both the similarity measure used by the method and its implementation The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.

Requirements of clustering in data mining: The main requirements that a clustering algorithm should satisfy are: scalability dealing with different types of attributes; discovering clusters with arbitrary shape; minimal requirements for domain knowledge to determine input parameters; ability to deal with noise and outliers; interpretability and usability.

Problems: There are a number of problems with clustering. Among them: current clustering techniques do not address all the requirements adequately (and concurrently); dealing with large number of dimensions and large number of data items can be problematic because of time complexity; the effectiveness of the method depends on the definition of “distance” (for distance-based clustering); if an obvious distance measure doesn’t exist we must “define” it, which is not always easy, especially in multi-dimensional spaces; the result of the clustering algorithm (that in many cases can be arbitrary itself) can be interpreted in different ways

GROUP 6 KIIZA FELIX 2013/BIT/110 MUHANGUZI EUSTUS 2013/BIT/104/PS TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS HAMSTONE NATOSHA 2013/BIT/122/PS GILBERT MUMBERE.

Similar presentations

Presentation on theme: "GROUP 6 KIIZA FELIX 2013/BIT/110 MUHANGUZI EUSTUS 2013/BIT/104/PS TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS HAMSTONE NATOSHA 2013/BIT/122/PS GILBERT MUMBERE."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

GROUP 6 KIIZA FELIX 2013/BIT/110 MUHANGUZI EUSTUS 2013/BIT/104/PS TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS HAMSTONE NATOSHA 2013/BIT/122/PS GILBERT MUMBERE.

Similar presentations

Presentation on theme: "GROUP 6 KIIZA FELIX 2013/BIT/110 MUHANGUZI EUSTUS 2013/BIT/104/PS TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS HAMSTONE NATOSHA 2013/BIT/122/PS GILBERT MUMBERE."— Presentation transcript:

Similar presentations

About project

Feedback