Finding Text Trends
• Word usage tracks interest changes
• Segment documents by time period
• Phrase frequency = number of documents
• Phrase must have support
• Trend is sequence of frequencies
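A minimal sketch of the idea on this slide: count, for each time period, how many documents contain a phrase, then read the per-period counts as the trend. The function name `phrase_trends`, the input layout, and the rule "keep a phrase if it reaches `min_support` in at least one period" are assumptions made for illustration; the slides do not say how the support threshold is applied.

```python
from collections import defaultdict

def phrase_trends(docs, min_support=2):
    """docs: list of (period, phrases) pairs, one pair per document.

    Returns (periods, trends), where trends maps each supported phrase
    to its list of document counts, one count per period in sorted order.
    """
    periods = sorted({period for period, _ in docs})
    index = {period: i for i, period in enumerate(periods)}
    counts = defaultdict(lambda: [0] * len(periods))
    for period, phrases in docs:
        # set() ensures a document is counted once per phrase.
        for phrase in set(phrases):
            counts[phrase][index[period]] += 1
    # Assumed support rule: enough documents in at least one period.
    trends = {p: seq for p, seq in counts.items() if max(seq) >= min_support}
    return periods, trends

if __name__ == "__main__":
    sample = [
        (1995, ["neural network"]),
        (1995, ["neural network", "data mining"]),
        (1996, ["data mining"]),
        (1996, ["data mining", "neural network"]),
    ]
    print(phrase_trends(sample))
```

Each value in `trends` is one "sequence of frequencies" in the sense of the last bullet.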

Approach
• Reuse current data mining tools
• Two major components
  - phrase identification
  - trend identification via shape queries
• Each word treated as a “transaction”
  - “timestamp” becomes a variable
• Match history to query
  - visual query language
• Visualize results

Identifying Phrases
• Phrases defined recursively
  - (IBM) = a word = a 1-phrase
  - [...] = a 2-phrase

Kludge timestamps
• Trying to use existing tools
• Queries may specify document sections
  - same sentence
  - same paragraph
  - same section
• Word “timestamps” fudged
  - sentence + 1,000
  - paragraph + 100,000
  - section + 10,000,000
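One way the fudged timestamps could be assigned, as a sketch only. The nested-list document layout, the name `assign_timestamps`, and the choice to add just the largest applicable jump at each boundary are assumptions; the slides give only the three offsets.

```python
SENTENCE_JUMP = 1_000
PARAGRAPH_JUMP = 100_000
SECTION_JUMP = 10_000_000

def assign_timestamps(document):
    """document: list of sections, each a list of paragraphs,
    each a list of sentences, each a list of words.

    Returns a list of (word, timestamp) pairs. Words inside a sentence
    get consecutive timestamps; crossing a boundary adds a jump.
    """
    stamped = []
    t = 0
    for s_i, section in enumerate(document):
        if s_i > 0:
            t += SECTION_JUMP
        for p_i, paragraph in enumerate(section):
            if p_i > 0:
                t += PARAGRAPH_JUMP
            for n_i, sentence in enumerate(paragraph):
                if n_i > 0:
                    t += SENTENCE_JUMP
                for word in sentence:
                    t += 1
                    stamped.append((word, t))
    return stamped

if __name__ == "__main__":
    doc = [[[["text", "mining"], ["finds", "trends"]]]]
    print(assign_timestamps(doc))
```

The sketch assumes sentences shorter than 1,000 words and paragraphs spanning less than 100,000 timestamp units, so that the gaps between levels never overlap.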

Query tricks
• Minimum gap = 1,000
  - same but sequential sentences
• Maximum gap = 999
  - same sentence
• Maximum gap = 99,999
  - same paragraph
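The gap tricks can be sketched as simple checks on a pair of word timestamps, assuming timestamps where a sentence boundary adds 1,000, a paragraph boundary 100,000, and a section boundary 10,000,000. The helper names below are hypothetical and are not taken from the mining tool the slides refer to.

```python
def within_gap(t_first, t_second, min_gap=None, max_gap=None):
    """True if t_second follows t_first and the gap meets both bounds."""
    gap = t_second - t_first
    if gap <= 0:
        return False
    if min_gap is not None and gap < min_gap:
        return False
    if max_gap is not None and gap > max_gap:
        return False
    return True

def same_sentence(t_first, t_second):
    # Maximum gap = 999: no sentence boundary was crossed.
    return within_gap(t_first, t_second, max_gap=999)

def same_paragraph(t_first, t_second):
    # Maximum gap = 99,999: no paragraph boundary was crossed.
    return within_gap(t_first, t_second, max_gap=99_999)

def different_sentences(t_first, t_second):
    # Minimum gap = 1,000: at least one sentence boundary was crossed.
    return within_gap(t_first, t_second, min_gap=1_000)

if __name__ == "__main__":
    print(same_sentence(1, 2), different_sentences(2, 1003))
```

Combining `different_sentences` with `same_paragraph` expresses "sequential sentences in the same paragraph" under these assumptions.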

Shape Definition Language
• For describing trends in word frequency
  - rising
  - falling
  - spike
• Has graphical front-end
• Can be “blurry”
  - shape significant
  - interval details neglected
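A rough illustration of "blurry" shape matching on a frequency sequence. This is not the Shape Definition Language itself, whose syntax the slides do not show; the names `transitions`, `collapse`, `matches`, and the `SHAPES` table are hypothetical. The sketch reduces a trend to its sequence of up/down moves and ignores how long each move lasts.

```python
def transitions(freqs, tolerance=0):
    """Label each step between consecutive frequencies."""
    steps = []
    for prev, cur in zip(freqs, freqs[1:]):
        diff = cur - prev
        if diff > tolerance:
            steps.append("up")
        elif diff < -tolerance:
            steps.append("down")
        else:
            steps.append("stable")
    return steps

def collapse(steps):
    """Blur the trend: drop stable steps and merge repeated moves,
    so only the overall shape is left, not the interval details."""
    shape = []
    for step in steps:
        if step == "stable":
            continue
        if not shape or shape[-1] != step:
            shape.append(step)
    return shape

# Assumed encodings of the three shapes named on the slide.
SHAPES = {
    "rising": ["up"],
    "falling": ["down"],
    "spike": ["up", "down"],
}

def matches(freqs, shape_name, tolerance=0):
    return collapse(transitions(freqs, tolerance)) == SHAPES[shape_name]

if __name__ == "__main__":
    print(matches([1, 4, 9, 3, 1], "spike"))
```

Raising `tolerance` makes small fluctuations count as stable, which blurs the match further.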

Test Application
• U.S. patent database
• Database searched by unknowledgeable user
• Identified rising trends for several phrases
• Transition from specific query to mining not described

Problems
• Tended to identify too many phrases
• Worked on pruning of phrases
  - non-maximal subset
  - near maximal phrase
  - syntactic sub-phrases
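A sketch of the first pruning idea only, under the assumption that "non-maximal" means a phrase that appears as a contiguous run of words inside a longer identified phrase. The slides do not define the criteria, so this reading and the function names are illustrative; the near-maximal and syntactic sub-phrase cases are not covered.

```python
def is_subphrase(short, long):
    """True if `short` occurs as a contiguous run of words in `long`
    and is strictly shorter than it."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))

def prune_non_maximal(phrases):
    """Keep only phrases not contained in any longer phrase.

    phrases: list of tuples of words.
    """
    return [
        p for p in phrases
        if not any(is_subphrase(p, q) for q in phrases)
    ]

if __name__ == "__main__":
    found = [("data",), ("data", "mining"), ("text", "data", "mining")]
    print(prune_non_maximal(found))
```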
