Published by Sybil Hall. Modified over 8 years ago.
1
Finding Text Trends
• Word usage tracks changes in interest
• Segment documents by time period
• Phrase frequency = number of documents in a period that contain the phrase
• A phrase must have minimum support
• A trend is the sequence of a phrase's frequencies over successive periods
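A minimal Python sketch of the frequency history described above. It assumes, for illustration only, that a phrase is a contiguous run of words and that "support" means reaching min_support documents in at least one period; the names phrase_trends and contains are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Phrase = Tuple[str, ...]
Doc = Tuple[int, List[str]]  # (period, words); a period could be a year

def contains(words: Sequence[str], phrase: Phrase) -> bool:
    """True if `phrase` appears as a contiguous run of words (simplifying assumption)."""
    n = len(phrase)
    return any(tuple(words[i:i + n]) == phrase for i in range(len(words) - n + 1))

def phrase_trends(docs: List[Doc], phrases: List[Phrase],
                  periods: List[int], min_support: int) -> Dict[Phrase, List[int]]:
    """Map each supported phrase to its per-period document counts (its trend)."""
    trends: Dict[Phrase, List[int]] = {}
    for phrase in phrases:
        # Each document counts at most once, however often the phrase recurs in it.
        counts = Counter(period for period, words in docs if contains(words, phrase))
        history = [counts[p] for p in periods]
        # Assumed support rule: at least `min_support` documents in some period.
        if max(history, default=0) >= min_support:
            trends[phrase] = history
    return trends
```

For example, with documents from 1995 to 1997, ("data", "mining") might yield the history [1, 2, 1]; that list is the trend later matched against a shape query.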
2
Approach
• Reuse current data mining tools
• Two major components:
  - phrase identification
  - trend identification via shape queries
• Each word is treated as a "transaction"; its "timestamp" becomes a variable
• Match each phrase's frequency history to a query written in a visual query language
• Visualize results
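One way to picture the mapping onto an existing sequential-pattern miner, sketched in Python. It assumes each document plays the role of a "customer" and each word becomes a one-item transaction whose position in the document serves as its timestamp; to_transactions is a hypothetical helper, not part of any existing tool.

```python
from typing import List, Tuple

# (customer_id, timestamp, item): the kind of row a sequential-pattern miner consumes.
Row = Tuple[str, int, str]

def to_transactions(doc_id: str, words: List[str]) -> List[Row]:
    """Turn one document into one single-word transaction per word position."""
    return [(doc_id, position, word) for position, word in enumerate(words)]
```

Once text is in this shape, a frequent phrase is simply a frequent sequence of items, so the mining tool itself needs no changes.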
3
Identifying Phrases
• Phrases are defined recursively:
  - (IBM) = a word = a 1-phrase
  - … = a 2-phrase
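A recursive definition maps naturally onto a recursive data type. The Python sketch below assumes, as a hypothetical representation, that a phrase is either a single word or a tuple of sub-phrases, and that a phrase occurs in a document when its words appear in order, possibly with other words in between; words_of and occurs_in are illustrative names only.

```python
from typing import Sequence, Tuple, Union

# Hypothetical representation: a word, or a tuple of (possibly nested) sub-phrases.
Phrase = Union[str, Tuple["Phrase", ...]]

def words_of(phrase: Phrase) -> Tuple[str, ...]:
    """Flatten a possibly nested phrase into its words, left to right."""
    if isinstance(phrase, str):
        return (phrase,)  # base case: a word is a 1-phrase
    return tuple(w for sub in phrase for w in words_of(sub))

def occurs_in(phrase: Phrase, doc: Sequence[str]) -> bool:
    """True if the phrase's words appear in `doc` in order (gaps allowed)."""
    it = iter(doc)
    # `w in it` advances the shared iterator, so order is enforced.
    return all(w in it for w in words_of(phrase))
```

For instance, ("ibm", ("data", "mining")) flattens to ("ibm", "data", "mining") and occurs in "ibm sells data and text mining tools".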
4
Kludge Timestamps
• Trying to use existing tools
• Queries may restrict matches to document sections:
  - same sentence
  - same paragraph
  - same section
• Word "timestamps" are fudged with offsets:
  - sentence: + 1,000
  - paragraph: + 100,000
  - section: + 10,000,000
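A minimal Python sketch of the timestamp fudging, using the offsets listed above. It assumes that consecutive words within a sentence advance the clock by 1, that only the largest applicable offset is added at each boundary, and that no sentence is empty; fudge_timestamps and the step constants are hypothetical names.

```python
from typing import List, Tuple

# Offsets from the slide; WORD_STEP = 1 is an assumption.
WORD_STEP = 1
SENTENCE_STEP = 1_000
PARAGRAPH_STEP = 100_000
SECTION_STEP = 10_000_000

# sections -> paragraphs -> sentences -> words
Document = List[List[List[List[str]]]]

def fudge_timestamps(doc: Document) -> List[Tuple[str, int]]:
    """Give each word a timestamp whose jumps encode structural boundaries."""
    out: List[Tuple[str, int]] = []
    ts = 0
    for section in doc:
        for p_idx, paragraph in enumerate(section):
            for t_idx, sentence in enumerate(paragraph):
                for w_idx, word in enumerate(sentence):
                    if out:  # every word after the very first advances the clock
                        if w_idx > 0:
                            ts += WORD_STEP        # next word, same sentence
                        elif t_idx > 0:
                            ts += SENTENCE_STEP    # first word of a new sentence
                        elif p_idx > 0:
                            ts += PARAGRAPH_STEP   # first word of a new paragraph
                        else:
                            ts += SECTION_STEP     # first word of a new section
                    out.append((word, ts))
    return out
```

Because every boundary adds at least 1,000, two words in different sentences are always at least 1,000 apart, which is what the gap-based query tricks rely on (provided sentences stay under 1,000 words).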
5
Query Tricks
• Minimum gap = 1,000 → consecutive elements fall in different (sequential) sentences
• Maximum gap = 999 → same sentence
• Maximum gap = 99,999 → same paragraph
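The gap constraints above can be checked directly on fudged timestamps. This Python sketch assumes the constraints apply between consecutive elements of a matched phrase, in the style of sequential-pattern mining; gaps_ok and the preset dictionaries are hypothetical names.

```python
from typing import Optional, Sequence

def gaps_ok(timestamps: Sequence[int], min_gap: int = 0,
            max_gap: Optional[int] = None) -> bool:
    """True if every consecutive pair of element timestamps respects the gap limits."""
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap < min_gap:
            return False
        if max_gap is not None and gap > max_gap:
            return False
    return True

# Presets mirroring the slide's tricks.
SAME_SENTENCE = {"max_gap": 999}
SAME_PARAGRAPH = {"max_gap": 99_999}
DIFFERENT_SENTENCES = {"min_gap": 1_000}
```

With timestamps 1 and 1001 (adjacent words split by a sentence boundary), the pair passes SAME_PARAGRAPH and DIFFERENT_SENTENCES but fails SAME_SENTENCE.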
6
Shape Definition Language
• Describes trends in word frequency, e.g.:
  - rising
  - falling
  - spike
• Has a graphical front-end
• Queries can be "blurry": the shape over the significant interval matters, while finer details are neglected
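To make the idea of a "blurry" shape query concrete, here is a simplified Python sketch. It is not the Shape Definition Language's actual syntax: it assumes each step between periods is encoded as up, down, or stable (changes within a tolerance count as stable), and that a shape is a regular expression searched anywhere in that encoding. transitions, matches, and the SHAPES patterns are hypothetical.

```python
import re
from typing import Sequence

def transitions(freqs: Sequence[int], tolerance: int = 0) -> str:
    """Encode each step as 'u' (up), 'd' (down) or 's' (stable within tolerance)."""
    symbols = []
    for prev, cur in zip(freqs, freqs[1:]):
        delta = cur - prev
        if delta > tolerance:
            symbols.append("u")
        elif delta < -tolerance:
            symbols.append("d")
        else:
            symbols.append("s")
    return "".join(symbols)

# Hypothetical shape patterns, not SDL syntax.
SHAPES = {
    "rising": r"u[us]*u",   # at least two rises, no fall in between
    "falling": r"d[ds]*d",  # at least two falls, no rise in between
    "spike": r"u+d+",       # a rise immediately followed by a fall
}

def matches(freqs: Sequence[int], shape: str, tolerance: int = 0) -> bool:
    """True if the shape occurs somewhere in the frequency history."""
    return re.search(SHAPES[shape], transitions(freqs, tolerance)) is not None
```

Raising the tolerance is what makes the query blurry: the history [10, 12, 11, 14] fails "rising" with tolerance 0 because of the small dip, but matches with tolerance 1.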
7
Test Application
• U.S. Patent database
• Database searched by a non-expert user
• Identified rising trends for several phrases
• The transition from a specific query to mining was not described
8
Problems
• Tended to identify too many phrases
• Worked on pruning phrases:
  - non-maximal subsets
  - near-maximal phrases
  - syntactic sub-phrases
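A Python sketch of one possible pruning rule, under stated assumptions: a phrase is a tuple of words, a sub-phrase is an ordered (not necessarily contiguous) subsequence, and a phrase is dropped when some longer phrase containing it keeps at least a fraction ratio of its support. With ratio = 0 only maximal phrases survive; a ratio near 1 is an assumed reading of "near-maximal". prune and is_subphrase are hypothetical names.

```python
from typing import Dict, Tuple

Phrase = Tuple[str, ...]

def is_subphrase(short: Phrase, long: Phrase) -> bool:
    """True if `short` is a strictly shorter ordered subsequence of `long`."""
    if len(short) >= len(long):
        return False
    it = iter(long)
    return all(word in it for word in short)  # consumes `it`, enforcing order

def prune(support: Dict[Phrase, int], ratio: float = 0.0) -> Dict[Phrase, int]:
    """Drop phrases dominated by a longer phrase holding >= ratio of their support."""
    kept: Dict[Phrase, int] = {}
    for phrase, count in support.items():
        dominated = any(
            is_subphrase(phrase, other) and other_count >= ratio * count
            for other, other_count in support.items()
        )
        if not dominated:
            kept[phrase] = count
    return kept
```

With supports {data: 50, mining: 40, data mining: 38, ibm: 30}, ratio 0 keeps only "data mining" and "ibm"; ratio 0.9 also keeps "data", since most of its 50 documents do not contain "data mining".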