ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.


1 ML ALGORITHMS

2 Algorithm Types
Classification (supervised)
– Given: a set of classified examples (“instances”)
– Produce: a way of classifying new examples
– Instances: described by a fixed set of features (“attributes”)
– Classes: discrete (“classification”) or continuous (“regression”)
– Interested in: results (classifying new instances)? Or the model (how the decision is made)?
Clustering (unsupervised)
– There are no classes
Association rules
– Look for rules that relate features to other features
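To make the "given classified instances, produce a way of classifying new examples" idea concrete, here is a minimal sketch using a 1-nearest-neighbour rule. The slides do not name a specific classifier, so the choice of nearest neighbour, the function name `nearest_neighbor_classify`, and the toy height/weight data are all assumptions made for illustration.

```python
import math


def nearest_neighbor_classify(training, new_instance):
    """Return the label of the training instance closest to new_instance.

    training: list of (features, label) pairs; features is a tuple of numbers.
    """
    _, closest_label = min(
        training, key=lambda pair: math.dist(pair[0], new_instance)
    )
    return closest_label


# Hypothetical classified instances: (height_cm, weight_kg) -> class label.
training_set = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

# Classify a new, unlabeled instance.
predicted = nearest_neighbor_classify(training_set, (178.0, 80.0))
print(predicted)
```

Here the "model" is simply the stored training set, which is convenient if only results matter but says little about how the decision is made.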

3 Classification

4 Clustering

5 Similarity among members of a cluster should be high, and similarity among objects of different clusters should be low.
The objectives of clustering
– knowing which data objects belong to which cluster
– understanding the common characteristics of the members of a specific cluster
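One way to obtain clusters with high internal similarity is k-means, sketched below. The slides do not name a clustering algorithm, so k-means, the helper names `assign` and `k_means`, the toy points, and the hand-picked starting centroids are assumptions. Note that k-means needs the number of clusters to be chosen up front.

```python
import math


def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
        )
        clusters[nearest].append(p)
    return clusters


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            # Mean of each coordinate; keep the old centroid if a cluster is empty.
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


# Hypothetical unlabeled data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [points[0], points[3]])
```

The returned `clusters` answer the first objective (which objects belong together), while the `centroids` give a rough summary of each cluster's common characteristics.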

6 Clustering vs Classification
There is some similarity between clustering and classification: both are about assigning appropriate class or cluster labels to data records. However, clustering differs from classification in two aspects.
– First, in clustering there are no pre-defined classes. The number of classes or clusters and the class or cluster label of each data record are not known before the operation.
– Second, clustering is about grouping data rather than developing a classification model. Therefore, there is no distinction between training examples and the data records to be labeled: the entire data population is used as input to the clustering process.

7 Association Mining

8 Overfitting
Memorization vs generalization
To fix, use
– Training data: to form rules
– Validation data: to decide on the best rule
– Test data: to determine system performance
Cross-validation
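The two remedies on this slide can be sketched as follows: a three-way split into training, validation, and test data, and k-fold cross-validation, where each record is held out exactly once. The 60/20/20 proportions, the function names, and the integer stand-in records are assumptions for illustration, not values from the slides.

```python
import random


def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of data and cut it into train / validation / test."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def k_fold_indices(n_items, k):
    """Yield (train_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_items))
    for fold in range(k):
        held_out = indices[fold::k]  # every k-th item, starting at `fold`
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        yield train, held_out


records = list(range(10))  # stand-in for ten labeled instances
train, validation, test = three_way_split(records)
folds = list(k_fold_indices(len(records), k=5))
```

In use, rules would be formed on `train`, the best rule chosen by its score on `validation`, and the final performance reported once on `test`. With cross-validation, the score is averaged over the held-out part of each fold.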

9 Baseline Experiments
In order to evaluate the performance of the classifiers used in experiments, we use baselines:
– Majority based random classification (Kappa=0)
– Class distribution based random classification (Kappa=0)
The Kappa statistic is used as a measure to assess the improvement of a classifier’s accuracy over a predictor employing chance as its guide: $\kappa = (P_0 - P_c) / (1 - P_c)$, where $P_0$ is the accuracy of the classifier and $P_c$ is the expected accuracy that can be achieved by a randomly guessing classifier on the same data set. Kappa ranges between -1 and 1, where -1 is total disagreement (i.e., total misclassification) and 1 is perfect agreement (i.e., a 100% accurate classification). A Kappa score over 0.4 indicates a reasonable agreement beyond chance.
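As a worked illustration, the sketch below computes Kappa as (P_0 - P_c) / (1 - P_c), with P_0 the observed accuracy and P_c the chance-level accuracy. The slide does not say how P_c is estimated. The code assumes the usual Cohen's kappa estimate from the marginal label frequencies; the function name `cohen_kappa` and the toy labels are likewise hypothetical.

```python
from collections import Counter


def cohen_kappa(true_labels, predicted_labels):
    """Kappa = (P_0 - P_c) / (1 - P_c).

    P_0: observed accuracy.
    P_c: agreement expected by chance, estimated from how often each
         class occurs in the true labels and in the predictions (assumption).
    Undefined (division by zero) when P_c equals 1.
    """
    n = len(true_labels)
    p_0 = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    p_c = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_0 - p_c) / (1 - p_c)


truth = ["a"] * 6 + ["b"] * 4

# Majority baseline: always predict the most frequent class.
majority_baseline = ["a"] * 10
kappa_majority = cohen_kappa(truth, majority_baseline)

# A hypothetical classifier that gets 8 of the 10 instances right.
classifier_output = ["a"] * 5 + ["b"] + ["b"] * 3 + ["a"]
kappa_classifier = cohen_kappa(truth, classifier_output)
```

For the majority baseline, P_0 and P_c are both 0.6, so Kappa is 0 even though raw accuracy is 60%. The hypothetical classifier has P_0 = 0.8 and P_c = 0.52, giving a Kappa of about 0.58.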

