Where does this new information belong? From developing mining algorithms to supporting knowledge discovery (Bettina Berendt)


1 Where does this new information belong? From developing mining algorithms to supporting knowledge discovery. Bettina Berendt – thanks for joint work with and support from Ilija Subašić, Mathias Verbeke, Siegfried Nijssen, Luc De Raedt. K.U. Leuven

2 The problem: Yes we can!

3 The solution? Automatic topic detection
Periods: Period 1, Period 2, Period 3, Period 4
Topics: Healthcare agenda; Green energy plan; Opposition to healthcare reform; Another healthcare vote; Climate agenda; A healthcare vote; Peace Nobel Prize; Copenhagen climate summit
Example topic words with weights: Health 0.017; Care 0.015; Insurance 0.013; American 0.013; Uninsured 0.009; Families 0.008; Working 0.005

4 Same event/document; different interpretations & categorisations: Visionary president; Party-politics (right and left); Obama's overall agenda; Damp-rag president; Rhetorics

5 Similar problems in science and learning: Topic detection in time-indexed corpora of news texts. Conference programme! Text mining; Stream mining; Media studies

6 Similar problems in other areas: Music collections, multimedia collections (see Andreas Nürnberger's talk at SML 2010)

7 The solution? Context-aware systems / personalisation: Female; Political activist; Has problems with anger management. "You probably do / should think about it this way: ..."

8 What users want
Examples of categories: left / right; squares / circles; green / not green; "is (nearly) green"
... to structure the world how they see it
... to re-use their categories (that they worked so hard to find)
... to be able to see through their eyes → interactivity
... to acknowledge that others see the world differently → semantics → social similarity / diversity → perspective-taking
... to provide data mining methods to do all that!

9 The problem: automatic topic detection → Research agenda: support sense-making = provide methods / tools for Knowledge Discovery (in the full sense): interactivity; semantics; social similarity / diversity; perspective-taking ... to provide data mining methods to do all that!

10 The problem: automatic topic detection → Research agenda: support sense-making = provide methods / tools for Knowledge Discovery (in the full sense) → Our solution approach: interactivity; semantics; social similarity / diversity; perspective-taking ... to provide data mining methods to do all that!

11 STORIES: functionality basics

12

13 STORIES: mining basics (1): Graphical summarisation of multiple text documents
Document / text pre-processing: template recognition; multi-document named entities; stopword removal, lemmatization
Document summarization strategy: “fact (assertion) recognition”; no topics, but salient concepts & relations; time window; word-span window
Selection approach for concepts: concepts = words or named entities; salient concept = high TF & involved in a salient relation, time-indexed
Similarity measure to determine salient relations: bursty co-occurrence
Burstiness measure: time relevance, a “temporal co-occurrence lift”
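A minimal sketch of how a "temporal co-occurrence lift" could be computed. The slide does not give the formula, so this assumes one plausible reading: the share of documents in a period in which two concepts co-occur within a word-span window, divided by the same share over the whole corpus. The function names and the `span` parameter are hypothetical, not taken from STORIES.

```python
from collections import Counter


def cooccurrence_counts(docs, span=5):
    """Count, per concept pair, the documents in which both concepts
    occur within a window of `span` consecutive tokens (assumed window rule)."""
    counts = Counter()
    for tokens in docs:
        pairs = set()
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + span]:
                if left != right:
                    pairs.add(tuple(sorted((left, right))))
        counts.update(pairs)  # each document counts once per pair
    return counts


def time_relevance(period_docs, corpus_docs, span=5):
    """Assumed lift: relative co-occurrence frequency in the period divided by
    relative co-occurrence frequency in the whole corpus.
    `period_docs` is expected to be a subset of `corpus_docs`."""
    period = cooccurrence_counts(period_docs, span)
    corpus = cooccurrence_counts(corpus_docs, span)
    n_period, n_corpus = len(period_docs), len(corpus_docs)
    return {
        pair: (count / n_period) / (corpus[pair] / n_corpus)
        for pair, count in period.items()
        if corpus[pair]  # skip pairs unseen in the corpus (avoids division by zero)
    }


# Toy, already pre-processed documents (token lists), invented for illustration.
period_docs = [
    ["obama", "healthcare", "vote"],
    ["healthcare", "vote", "senate"],
]
corpus_docs = period_docs + [
    ["obama", "vote", "climate"],
    ["obama", "nobel", "prize"],
]
lift = time_relevance(period_docs, corpus_docs)
```

Under this reading, a value above 1 marks a pair that co-occurs more often in the period than in the corpus overall, which is one way to call a relation "bursty".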

14 STORIES: mining basics (2): Graph analysis for query recommendation. Aim: highlight subgraphs that represent an event. Topological properties. Change: subgraph new in this period
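The "subgraph new in this period" criterion can be illustrated with a small sketch. It assumes that a story graph is a set of undirected concept pairs per period, and that a new subgraph is a connected component of the edges absent from the previous period. Both are illustrative assumptions; the slide does not specify the procedure, and `new_subgraphs` is a hypothetical name.

```python
from collections import defaultdict


def normalise(edges):
    """Treat edges as undirected by sorting each concept pair."""
    return {tuple(sorted(edge)) for edge in edges}


def new_subgraphs(current_edges, previous_edges):
    """Return the connected components (as node sets) formed by edges
    that appear in the current period but not in the previous one."""
    fresh = normalise(current_edges) - normalise(previous_edges)
    adjacency = defaultdict(set)
    for a, b in fresh:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, nodes = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in nodes:
                continue
            nodes.add(node)
            stack.extend(adjacency[node] - nodes)
        seen |= nodes
        components.append(nodes)
    return components


# Toy edge lists, invented for illustration.
previous = [("obama", "healthcare")]
current = [
    ("healthcare", "obama"),
    ("obama", "nobel"),
    ("nobel", "prize"),
    ("copenhagen", "summit"),
]
components = new_subgraphs(current, previous)
```

Each returned node set is a candidate event whose concepts could be suggested as a query.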

15 STORIES: evaluation
1. Information retrieval quality: Edges – events: up to 80% recall, ca. 30% precision
2. Search quality: Subgraphs index coherent document clusters
3. Learning effectiveness: Document search with story graphs leads to averages of 67-75% accuracy on judgments of story fact truth; on average, 1.3-4.7 queries with 3.4-5.2 nodes/words per query
4. Comparison with other temporal text mining methods: New (and only) framework for cross-method comparison; recall- & precision-style metrics → different method rankings
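For the edges-versus-events figures, a hedged sketch of how recall and precision could be computed. The matching rule is an assumption made here for illustration (an edge matches an event if both of its concepts appear in that event's concept set); the slides do not state how edges were judged against events.

```python
def edge_event_scores(edges, events):
    """edges: iterable of (concept, concept) pairs from a story graph.
    events: iterable of concept sets describing ground-truth events.
    Returns (precision, recall) under the assumed matching rule."""
    edges = [frozenset(edge) for edge in edges]
    events = [frozenset(event) for event in events]
    # Precision: share of edges that correspond to at least one event.
    matched_edges = sum(any(edge <= event for event in events) for edge in edges)
    # Recall: share of events covered by at least one edge.
    matched_events = sum(any(edge <= event for edge in edges) for event in events)
    precision = matched_edges / len(edges) if edges else 0.0
    recall = matched_events / len(events) if events else 0.0
    return precision, recall


# Toy data, invented for illustration.
edges = [
    ("obama", "nobel"),
    ("obama", "healthcare"),
    ("care", "insurance"),
    ("green", "families"),
]
events = [
    {"obama", "nobel", "prize"},
    {"obama", "healthcare", "vote"},
    {"copenhagen", "climate", "summit"},
]
precision, recall = edge_event_scores(edges, events)
```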

16 Damilicious: functionality basics
* Apply my grouping "rfid" (Security/privacy, Group 2, ...) to the following new search result:
* Show users and how similarly they group
* Apply U4's grouping to my new search result:

17 Damilicious: mining basics (1): Methods and process
1. Query
2. Automatic clustering
3. Manual regrouping
4. Re-use
   1. Learn classifier & present way(s) of grouping
   2. Transfer the constructed concepts
Features/methods for the conceptual/predictive clustering: Lingo phrases, Lingo clustering, Ripper; co-citation, bibliometric coupling, word or LSA similarity, combinations; k-means, hierarchical
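The re-use step (learn a classifier from a user's grouping, then transfer it to a new search result) can be sketched as follows. This is not the Lingo/Ripper pipeline named on the slide: as a deliberately simple stand-in it uses bag-of-words centroids and cosine similarity. All function names and the toy titles are hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def learn_grouping(grouping):
    """grouping: {group label: [document titles]} as produced by manual
    regrouping. Returns one word-count centroid per group."""
    model = {}
    for label, docs in grouping.items():
        centroid = Counter()
        for doc in docs:
            centroid.update(bag_of_words(doc))
        model[label] = centroid
    return model


def apply_grouping(model, new_docs):
    """Assign each new document to the group with the most similar centroid."""
    return {
        doc: max(model, key=lambda label: cosine(bag_of_words(doc), model[label]))
        for doc in new_docs
    }


# Toy grouping for the query "rfid"; titles are invented for illustration.
my_grouping = {
    "Security/privacy": ["rfid privacy threats", "security of rfid tags"],
    "Supply chain": ["rfid in supply chain logistics", "warehouse logistics tracking"],
}
model = learn_grouping(my_grouping)
transferred = apply_grouping(
    model, ["privacy and security protocols", "logistics tracking systems"]
)
```

Applying another user's grouping to one's own results would use the same two calls with that user's `grouping` dictionary.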

18 Damilicious: mining basics (2): Measures of grouping and user diversity
“How similarly do two users group documents?”
For each query q, consider their groupings gr. For several queries: aggregate.
Diversity = 1 – similarity = 1 – normalized mutual information (NMI, an entropy-based measure)
Example: NMI = 0
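A minimal sketch of the diversity measure, 1 – NMI, for two users' groupings of the same documents. Two choices below are assumptions because the slide leaves them open: mutual information is normalised by the arithmetic mean of the two entropies (other normalisations exist), and the aggregate over several queries is a plain mean over the queries both users grouped.

```python
from collections import Counter
from math import log


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """a, b: group labels for the same documents, in the same order."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both users put everything into a single group
    return mutual_information(a, b) / ((h_a + h_b) / 2)  # assumed normalisation


def diversity(a, b):
    return 1 - nmi(a, b)


def user_diversity(groupings_u, groupings_v):
    """groupings_*: {query: labels}. Aggregate by averaging over shared queries
    (assumed aggregation)."""
    shared = sorted(set(groupings_u) & set(groupings_v))
    return sum(diversity(groupings_u[q], groupings_v[q]) for q in shared) / len(shared)


# Toy groupings: same partition with different labels vs. independent partitions.
u = {"rfid": [0, 0, 1, 1], "text mining": [0, 1, 0, 1]}
v = {"rfid": [1, 1, 0, 0], "text mining": [0, 0, 1, 1]}
overall = user_diversity(u, v)
```

NMI ignores label names, so the two "rfid" groupings count as identical (diversity 0), while the two "text mining" groupings are independent (NMI = 0, diversity 1).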

19 Damilicious: evaluation
Clustering: Does it generate meaningful document groups?
– yes (tradition in bibliometrics) – but: data?
– Small expert evaluation of CiteseerCluster
Choosing the clustering and classification methods for conceptual clustering
– Experiments: different features, clustering methods, classification methods → quality of reconstruction and extension-over-time (NMI)
Technology acceptance
– End-user experiment (clustering & regrouping)
– 5-person formative user study (transfer of own results)

20 Conclusions and (some) questions
Sense-making involves
– Extracting information from texts
– Extracting structural information between entities
– Creating, using and modifying categories
– Interacting with external representations
– Acknowledging diversity and perspective-taking
– ...
KD approach: Text mining; Graph mining; Semantics; Interactivity; Usage mining and “model-processing” (conceptual / predictive clustering)
Appropriate mining methods, measures, ...? More/better evaluation methods and frameworks? Use cases?

21 Thank you! Questions?

22 To Read
Subašić, I. & Berendt, B. (2009). Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowledge and Information Systems. DOI 10.1007/s10115-009-0227-x
Berendt, B. & Subašić, I. (2009). STORIES in time: a graph-based interface for news tracking and discovery. In N. Cristianini & M. Turchi (Eds.), Proceedings of Intelligent Analysis and Processing of Web News Content (IAPWNC) at The 2009 IEEE / WIC / ACM International Conferences Web Intelligence (WI'09) / Intelligent Agent Technology (IAT'09). 15 September 2009, Milan, Italy. (Proceedings of WI-IAT.2009, DOI 10.1109/WI-IAT.2009.342, pp. 531-534)
Verbeke, M., Berendt, B., & Nijssen, S. (2009). Data mining, interactive semantic structuring, and collaboration: A diversity-aware method for sense-making in search. In G. Boato & C. Niederee (Eds.), Proceedings of First International Workshop on Living Web, collocated with the 8th International Semantic Web Conference (ISWC-2009), Washington D.C., USA, October 26, 2009. CEUR Workshop Proceedings Vol-515.
Berendt, B. (2010). Diversity in search: what, how, and what for? Talk at Barcelona Media / Yahoo! Research and UPF, 4 March 2010.
Berendt, B., Krause, B., & Kolbe-Nusser, S. (2010). Intelligent scientific authoring tools: Interactive data mining for constructive uses of citation networks. Information Processing & Management, 46(1), 1-10.

