Document Similarities, Anand Bahety and Cody Dunne. Project idea: find similar segments of documents.

1 Document Similarities
Anand Bahety, Cody Dunne

2 Project Idea
Find similar segments of documents

3 Project Idea (cont…)
Inexact matching
– Local alignment (Smith-Waterman, BLAST)
– Based on characters
    Meaningless to score character differences
– Based on words
    Need a good scoring function
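Word-based local alignment can be sketched as Smith-Waterman run over word tokens instead of characters. This is a minimal illustration, not the project's implementation: it assumes a linear gap penalty and a pluggable per-word `score` function, and the names `smith_waterman` and `exact_score` (with its +2/-1 weights) are hypothetical.

```python
def smith_waterman(a, b, score, gap=-1.0):
    """Local alignment of token lists a and b.

    score(x, y) gives the substitution score for aligning token x with y;
    gap is the (assumed linear) penalty for skipping one token.
    Returns (best_score, pairs), where pairs holds (token_a or None,
    token_b or None) for the best-scoring local alignment.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                          # start a fresh alignment
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align the two words
                H[i - 1][j] + gap,                            # skip a word of a
                H[i][j - 1] + gap,                            # skip a word of b
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the alignment score drops to zero
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


def exact_score(x, y):
    """Hypothetical baseline: +2 for the same word (ignoring case), -1 otherwise."""
    return 2.0 if x.lower() == y.lower() else -1.0


doc1 = "the cat sat quietly on the mat".split()
doc2 = "a dog sat on the mat today".split()
print(smith_waterman(doc1, doc2, exact_score))
# (7.0, [('sat', 'sat'), ('quietly', None), ('on', 'on'), ('the', 'the'), ('mat', 'mat')])
```

With exact matching only, every non-identical word costs the same, which is why the slide calls for a better word-level scoring function.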

4 Project Idea (cont…)
Scoring function based on word relationships
– Part of speech
    Noun -> pronoun (ok)
    Noun -> verb (worse)
– Synonyms: positive score
– Antonyms: negative score
– Network of word relationships
    WordNet: publicly available lexical database of English
– Gaps
    Different numbers of adjectives/adverbs
    Prepositions, pronouns
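The scoring ideas above can be sketched as a per-word substitution score plus a per-word gap cost. This is a hypothetical illustration: the tiny hand-written lexicon stands in for WordNet, and every numeric weight (3 for identical words, 2 for synonyms, -2 for antonyms, and so on) is an assumption chosen only to respect the slide's ordering (synonyms positive, antonyms negative, noun -> pronoun better than noun -> verb, cheap gaps for adjectives, adverbs, prepositions, and pronouns).

```python
# Hypothetical toy lexicon standing in for WordNet: word -> part of speech.
POS = {
    "car": "noun", "automobile": "noun", "it": "pronoun", "drive": "verb",
    "hot": "adj", "warm": "adj", "cold": "adj", "red": "adj", "quickly": "adv",
}
SYNONYMS = [{"car", "automobile"}, {"hot", "warm"}]
ANTONYMS = [{"hot", "cold"}]
# Part-of-speech swaps the slide treats as "ok" (noun <-> pronoun).
COMPATIBLE_POS = {frozenset({"noun", "pronoun"})}
# Word classes assumed cheap to skip, following the "Gaps" bullet.
CHEAP_GAP_POS = {"adj", "adv", "prep", "pronoun"}


def related(x, y, groups):
    """True if x and y appear together in any group of related words."""
    return any(x in g and y in g for g in groups)


def word_score(x, y):
    """Substitution score for aligning word x with word y (weights are assumptions)."""
    x, y = x.lower(), y.lower()
    if x == y:
        return 3.0
    if related(x, y, SYNONYMS):
        return 2.0     # synonyms: positive score
    if related(x, y, ANTONYMS):
        return -2.0    # antonyms: negative score
    px, py = POS.get(x), POS.get(y)
    if px is None or py is None:
        return -1.0    # word missing from the lexicon: plain mismatch
    if px == py:
        return 0.5     # same part of speech, otherwise unrelated
    if frozenset({px, py}) in COMPATIBLE_POS:
        return 0.0     # noun -> pronoun (ok)
    return -1.5        # e.g. noun -> verb (worse)


def gap_cost(word):
    """Penalty for leaving a word unmatched; adjectives, adverbs, etc. are cheap."""
    return -0.5 if POS.get(word.lower()) in CHEAP_GAP_POS else -2.0


print(word_score("car", "automobile"), word_score("hot", "cold"))
print(word_score("car", "it"), word_score("car", "drive"))
```

In a word-level Smith-Waterman recurrence, `word_score` would take the place of a fixed match/mismatch score, and `gap_cost(word)` would replace the constant gap penalty for the word being skipped, so an extra adjective costs less than a dropped noun.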

5 Related Work
Document versioning (Versioning Machine, etc.)
Detecting plagiarism (Bagdis, etc.)

6 Potential Pitfalls
False positives
    "The Great Wall of China is very famous." vs. "The Fantastic Wall by XYZ is very famous."
– Pick correct word meanings
False negatives
– Database isn't perfect/complete
Incomplete scoring function
– Only examines particular types of words
– Depends on order
Limited to English
– EuroWordNet
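The false-positive example can be made concrete. This sketch assumes a score-only Smith-Waterman over lowercase word tokens with match +2, mismatch -1, and gap -1 (illustrative values, not the project's), plus a hypothetical sense-blind synonym table that treats "great" and "fantastic" as synonyms. It shows that the two sentences already share a sizable local alignment on exact matches, and that ignoring word sense makes the false positive stronger.

```python
def local_alignment_score(a, b, score, gap=-1.0):
    """Best Smith-Waterman local-alignment score of token lists a and b."""
    prev = [0.0] * (len(b) + 1)   # previous DP row
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(0.0,
                         prev[j - 1] + score(a[i - 1], b[j - 1]),  # align words
                         prev[j] + gap,                            # skip a word of a
                         cur[j - 1] + gap)                         # skip a word of b
            best = max(best, cur[j])
        prev = cur
    return best


def tokens(sentence):
    """Lowercase words with trailing punctuation stripped."""
    return [w.strip(".,").lower() for w in sentence.split()]


def exact(x, y):
    return 2.0 if x == y else -1.0


# Hypothetical synonym pair, applied regardless of which sense is meant.
SYNONYMS = {frozenset({"great", "fantastic"})}


def sense_blind(x, y):
    if x == y or frozenset({x, y}) in SYNONYMS:
        return 2.0
    return -1.0


s1 = tokens("The Great Wall of China is very famous.")
s2 = tokens("The Fantastic Wall by XYZ is very famous.")
print(local_alignment_score(s1, s2, exact))        # 7.0
print(local_alignment_score(s1, s2, sense_blind))  # 10.0
print(local_alignment_score(s1, s1, exact))        # 16.0 (self-alignment, for scale)
```

Against the self-alignment score of 16, the exact-match score of 7 rises to 10 once "Great" (part of a proper name here) is scored as the adjective meaning "excellent", which is why picking the correct word meaning matters.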

