Document Similarity Measures. Content: Precision, Recall and F-measure; Dice Coefficient; Jaccard Coefficient; Cosine Similarity; Asymmetric Similarity.

Presentation transcript:

Document Similarity Measures
Content:
• Precision, Recall and F-measure
• Dice Coefficient
• Jaccard Coefficient
• Cosine Similarity
• Asymmetric Similarity
• Euclidean Distance
• Manhattan (City Block) Distance

Similarity Measures: Requirements

Precision Recall and F-measure
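The slide body did not survive extraction, so the following is only a minimal sketch of the standard definitions, not the presenter's own material. It assumes documents are identified by ids and that the retrieved and relevant results are given as sets; the function name `precision_recall_f` and the `beta` parameter are illustrative choices.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F_beta) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents that are both retrieved and relevant

    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0

    # F-measure: weighted harmonic mean of precision and recall.
    b2 = beta * beta
    denom = b2 * precision + recall
    f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
p, r, f = precision_recall_f({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d5"})
```

With `beta=1.0` this gives the usual F1 score; larger values of `beta` weight recall more heavily.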

Dice Coefficient

Denotation of Dice Coefficient
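As a hedged illustration of the usual set-based form of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), the sketch below treats each document as a set of whitespace-separated terms. The tokenization and the handling of two empty documents are assumptions made for this example.

```python
def dice(a, b):
    """Dice coefficient between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty documents count as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical documents sharing two of their three terms.
doc1 = "the cat sat".split()
doc2 = "the cat ran".split()
score = dice(doc1, doc2)  # 2 * 2 / (3 + 3)
```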

Jaccard Coefficient
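A minimal sketch of the common set-based Jaccard coefficient, |A ∩ B| / |A ∪ B|, under the same assumption that a document is reduced to its set of terms. The function name and the empty-set convention are illustrative.

```python
def jaccard(a, b):
    """Jaccard coefficient between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty documents count as identical
    return len(a & b) / len(union)


# Hypothetical documents: 2 shared terms out of 4 distinct terms overall.
score = jaccard("the cat sat".split(), "the cat ran".split())
```

For the same pair of sets, the Jaccard value J and the Dice value D are related by J = D / (2 - D), so the two measures always rank pairs in the same order.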

Cosine Similarity

Calculating Cosine Similarity
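The calculation on this slide was not preserved, so the following sketch shows one common way to compute cosine similarity: represent each document as a raw term-frequency vector and divide the dot product by the product of the vector lengths. Lowercasing, whitespace tokenization and unweighted counts are assumptions; a weighting scheme such as tf-idf could be substituted.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical example texts.
score = cosine_similarity("the cat sat on the mat", "the cat lay on the rug")
```

Because term counts are non-negative, the result lies between 0 (no shared terms) and 1 (identical term proportions).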

Asymmetric Similarity
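The slide does not say which asymmetric measure it presented. As an assumption, the sketch below uses the Tversky index, one well-known asymmetric measure, defined as |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|). The parameter defaults are illustrative only.

```python
def tversky(a, b, alpha=1.0, beta=0.0):
    """Tversky index of a against b; asymmetric whenever alpha != beta."""
    a, b = set(a), set(b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    if denom == 0:
        return 0.0  # assumption: undefined cases are scored as 0
    return common / denom


# Hypothetical query and document term sets.
query = {"data", "mining"}
document = {"data", "mining", "cluster", "analysis"}

query_in_doc = tversky(query, document)  # how much of the query the document covers
doc_in_query = tversky(document, query)  # how much of the document the query covers
```

With `alpha=1` and `beta=0` the score reduces to the fraction of the first set contained in the second, which is why swapping the arguments changes the result.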

Distance Based Similarity Measures
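As a sketch of the two distances named in the contents list, the code below computes the Euclidean and Manhattan distances between equal-length numeric vectors. The conversion from distance to similarity, 1 / (1 + d), is one common convention and is an assumption here, not something the slides confirm.

```python
import math


def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def manhattan(x, y):
    """City-block distance: sum of the absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def distance_to_similarity(d):
    """Map a distance in [0, inf) to a similarity in (0, 1] (assumed convention)."""
    return 1.0 / (1.0 + d)


# Hypothetical feature vectors.
x, y = (1, 2, 3), (4, 6, 3)
d_euclid = euclidean(x, y)     # sqrt(9 + 16 + 0)
d_manhattan = manhattan(x, y)  # 3 + 4 + 0
```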

References
Leach and Gillet (2003), Chapters 3, 4, 5 and 6.
Eugene F. Krause (1987). Taxicab Geometry. Dover.
P.-N. Tan, M. Steinbach & V. Kumar (2005). "Introduction to Data Mining", Addison-Wesley, chapter 8, page 500.
Elena Deza & Michel Marie Deza (2009). Encyclopedia of Distances, page 94. Springer.
Hazewinkel, Michiel, ed. (2001). "Mahalanobis distance", Encyclopedia of Mathematics. Springer.