Presentation on theme: "Cluster Evaluation: Metrics that can be used to evaluate the quality of a set of document clusters." — Presentation transcript:

1 Cluster Evaluation: Metrics that can be used to evaluate the quality of a set of document clusters.

2 Precision, Recall & FScore. From Zhao and Karypis, 2002. These metrics are computed for every (class, cluster) pair. Terms: class L_r of size n_r; cluster S_i of size n_i; n_ri documents in S_i from class L_r.
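
A minimal sketch of how these counts can be computed, assuming each document's class and assigned cluster are available as two parallel lists; the `labels` and `clusters` lists below are illustrative toy data, not data from the paper.

```python
# Counts used by the metrics below: n_r (class sizes), n_i (cluster sizes),
# and n_ri (documents of class L_r that landed in cluster S_i).
from collections import Counter

labels   = ["sports", "sports", "news", "news", "news"]   # class L_r of each document (toy data)
clusters = [0, 0, 0, 1, 1]                                 # cluster S_i of each document (toy data)

n_r  = Counter(labels)                   # n_r: size of class L_r
n_i  = Counter(clusters)                 # n_i: size of cluster S_i
n_ri = Counter(zip(labels, clusters))    # n_ri: documents of class L_r inside cluster S_i

print(n_ri[("sports", 0)])               # 2 sports documents ended up in cluster 0
```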

3 Precision. Loosely equated to accuracy. Roughly answers the question: "How many of the documents in this cluster belong there?" P(L_r, S_i) = n_ri / n_i

4 Recall. Roughly answers the question: "Did all of the documents that belong in this cluster make it in?" R(L_r, S_i) = n_ri / n_r

5 FScore. Harmonic mean of Precision and Recall. Tries to give a good combination of the other two metrics. Calculated with the equation: F(L_r, S_i) = 2 * P(L_r, S_i) * R(L_r, S_i) / (P(L_r, S_i) + R(L_r, S_i))
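
Using the counts from the sketch above, the three per-pair metrics follow directly from the definitions on slides 3 to 5; this is an illustration, not code from the Zhao and Karypis paper.

```python
def precision(r, i):
    """P(L_r, S_i) = n_ri / n_i: fraction of cluster i that belongs to class r."""
    return n_ri[(r, i)] / n_i[i]

def recall(r, i):
    """R(L_r, S_i) = n_ri / n_r: fraction of class r that made it into cluster i."""
    return n_ri[(r, i)] / n_r[r]

def fscore(r, i):
    """Harmonic mean of precision and recall for the (class, cluster) pair."""
    p, q = precision(r, i), recall(r, i)
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)

print(precision("sports", 0), recall("sports", 0), fscore("sports", 0))
```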

6 FScore - Entire Solution. We calculate a per-class FScore, taking each class's best score over all clusters: F(L_r) = max_i F(L_r, S_i). We then combine these scores into a weighted average over all n documents: FScore = sum_r (n_r / n) * F(L_r).
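
Continuing the sketch, one reading of the entire-solution FScore: each class keeps its best score over any cluster, and the per-class scores are weighted by class size. The aggregation here is an interpretation of the slide, not verified against the paper's exact definition.

```python
n = len(labels)   # total number of documents

def class_fscore(r):
    """F(L_r): best FScore that class r achieves over any cluster."""
    return max(fscore(r, i) for i in n_i)

overall_fscore = sum(n_r[r] / n * class_fscore(r) for r in n_r)
print(overall_fscore)
```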

7 FScore Caveats. The Zhao and Karypis paper focused on hierarchical clustering, so the definitions of Precision/Recall and FScore might not apply as well to "flat" clustering. The metrics rely on the use of class labels, so they cannot be applied in situations where there is no labeled data.

8 Possible Modifications. Calculate a per-cluster (not per-class) FScore: F(S_i) = max_r F(L_r, S_i). Combine these scores into a weighted average: sum_i (n_i / n) * F(S_i).
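
The per-cluster variant proposed here can be sketched the same way, swapping the roles of classes and clusters and weighting by cluster size; again this is one interpretation of the slide.

```python
def cluster_fscore(i):
    """F(S_i): best FScore that cluster i achieves over any class."""
    return max(fscore(r, i) for r in n_r)

per_cluster_fscore = sum(n_i[i] / n * cluster_fscore(i) for i in n_i)
print(per_cluster_fscore)
```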

9 Rand Index. Yeung et al., 2001. A measure of partition agreement. Answers the question "How similar are these two ways of partitioning the data?" To evaluate clusters, we compute the Rand Index between the actual class labels and the clusters.

10 Rand Index. a = # pairs of documents that are in the same S_i and the same L_r; b = # pairs of documents that are in the same L_r but not the same S_i; c = # pairs of documents in the same S_i but not the same L_r; d = # pairs of documents that are in neither the same L_r nor the same S_i. The Rand Index is the fraction of pairs on which the two partitions agree: (a + d) / (a + b + c + d).
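
A direct sketch of the pair counts and the resulting index, reusing the toy `labels` and `clusters` lists from the first sketch.

```python
from itertools import combinations

a = b = c = d = 0
for (l1, k1), (l2, k2) in combinations(zip(labels, clusters), 2):
    same_class, same_cluster = (l1 == l2), (k1 == k2)
    if same_class and same_cluster:
        a += 1                      # same L_r, same S_i
    elif same_class:
        b += 1                      # same L_r, different S_i
    elif same_cluster:
        c += 1                      # same S_i, different L_r
    else:
        d += 1                      # different L_r and different S_i

rand_index = (a + d) / (a + b + c + d)
print(rand_index)
```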

11 Adjusted Rand Index. The Rand index has a problem: the expected value for any two random partitions is relatively high, and we'd like it to be close to 0. The Adjusted Rand index puts the expected value at 0, gives a more dynamic range, and is probably a better metric. See Appendix B of Yeung et al., 2001.
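
In practice the adjusted index is usually taken from a library rather than re-derived; for example, scikit-learn ships the standard Hubert and Arabie adjustment.

```python
from sklearn.metrics import adjusted_rand_score

ari = adjusted_rand_score(labels, clusters)   # ~0 for random partitions, 1 for identical ones
print(ari)
```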

12 Rand Index Caveat. Penalizes good but finer-grained clusters: imagine a sports class that produces 2 clusters, one for ball sports and one for track sports. To fix that issue, we could hard-label each cluster and treat all clusters with the same label as the same cluster (clustering the clusters), as sketched below.
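
One way to implement the "clustering the clusters" fix, as a sketch: hard-label each cluster with its majority class and score the merged labeling instead of the raw cluster ids. The majority-vote rule is an assumption; the slide does not say how the hard labels are chosen.

```python
# Majority class of each cluster (assumed hard-labeling rule).
majority_label = {i: max(n_r, key=lambda r: n_ri[(r, i)]) for i in n_i}

# Replace each cluster id by its hard label, so fine-grained clusters of the
# same class collapse into one, then score the merged partition.
merged_clusters = [majority_label[i] for i in clusters]
print(adjusted_rand_score(labels, merged_clusters))
```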

13 Problems. The metrics so far depend on class labels. They also give undeservedly high scores as k approaches n, because almost every instance ends up alone in a cluster.

14 Label Entropy. My idea? (I haven't seen it anywhere else.) Calculate an entropy value per cluster: H(S_i) = -sum_r (n_ri / n_i) * log(n_ri / n_i). Combine the entropies into a weighted average: sum_i (n_i / n) * H(S_i).
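
A sketch of the label-entropy idea, reusing the counts and n from the earlier sketches; the log base (2 here) is an assumption, since the slide does not specify one.

```python
import math

def cluster_entropy(i):
    """H(S_i) = -sum_r p(L_r | S_i) * log2 p(L_r | S_i), with p = n_ri / n_i."""
    probs = [n_ri[(r, i)] / n_i[i] for r in n_r if n_ri[(r, i)] > 0]
    return -sum(p * math.log2(p) for p in probs)

label_entropy = sum(n_i[i] / n * cluster_entropy(i) for i in n_i)
print(label_entropy)   # 0 means every cluster is pure; larger means more mixed clusters
```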

15 Log Likelihood of Data. Calculate the log likelihood of the data according to the clusterer's model. If the clusterer doesn't have an explicit model, treat clusters as classes and train a class-conditional model of the data based on these class labelings, then use the new model to calculate the log likelihood.
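
A minimal sketch of the fallback described above, assuming the documents are numeric feature vectors in a NumPy array `X` and that each cluster is modeled with a diagonal Gaussian; both the toy data and the choice of model are assumptions, not something specified on the slide.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

X = np.random.default_rng(0).normal(size=(5, 4))   # toy document vectors, one row per document
assignments = np.array(clusters)                   # hard cluster assignments from the clusterer

# Class-conditional model: one diagonal Gaussian per cluster, weighted by cluster size.
log_terms = []
for i in np.unique(assignments):
    members = X[assignments == i]
    mu = members.mean(axis=0)
    sigma = members.std(axis=0) + 1e-6             # floor to avoid zero variance
    prior = len(members) / len(X)
    # log prior + sum over dimensions of log N(x_d; mu_d, sigma_d), for every document
    log_terms.append(np.log(prior) + norm.logpdf(X, mu, sigma).sum(axis=1))

# Log likelihood of the whole data set under the induced mixture model.
log_likelihood = logsumexp(np.vstack(log_terms), axis=0).sum()
print(log_likelihood)
```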

