Computing & Information Sciences Kansas State University University of Kansas Center for Business Analytics research Seminar Laboratory for Knowledge Discovery in Databases
Dynamic Topic Modeling for Spatiotemporal Event Extraction: Probabilistic Approaches and The Dim Sum Process
William H. Hsu, Laboratory for Knowledge Discovery in Databases, Kansas State University
University of Kansas CBAR, Wednesday, 04 September 2013
Acknowledgements: Kansas State: Wesam Elshamy, Ming Yang, Surya Teja Kallumadi, Majed Alsadhan; Illinois: Chengxiang Zhai, Jiawei Han, Kevin Chang, Dan Roth; iQGateway: Praveen Koduru, Krishna Kumar Vallyatodi
Motivation: Thematic Mapping [1]: Summarizing News from the Web (based on NLP Group NER Toolkit © Stanford University; Simile © Massachusetts Institute of Technology; Google Maps © Tele Atlas, Inc. and Google, Inc.)
Motivation: Thematic Mapping [2]: HealthMap (© 2006 – 2013 Brownstein, J. & Freifeld, C.)
Motivation: Thematic Mapping [4]: TextMap & Topic Models (© 2011 – 2012 TextMap.org)
Motivation: Thematic Mapping [5]: Existing Systems & Limitations. Volkova, S., Caragea, D., Hsu, W. H., Drouhard, J., & Fowles, L. (2010). Boosting Biomedical Entity Extraction by using Syntactic Patterns for Semantic Relation Discovery. Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2010). See also: Volkova, S. (2010). Entity Extraction, Animal Disease-related Event Recognition and Classification from Web. M.S. thesis, Kansas State University.
Outline
- Simultaneous Topic Enumeration & Formation
- Topic Modeling: Static (Atemporal) to Dynamic
- Continuous Time vs. Variable Number of Topics
- Dim Sum Process for Hybrid STEF Dynamic Topic Modeling
- Test Bed News Monitoring: Geotagging & Timelines
- Recent Results
- STEF & Heterogeneous Info Network Analysis
Timeline Formation: General Task Illustrated. Elshamy (2012)
Simultaneous Topic Enumeration & Formation (STEF). Adapted from Elshamy (2012). Time t: 3 extant topics; time t + k: 2 extant topics.
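The STEF picture (3 topics alive at time t, 2 at time t + k) amounts to tracking when each topic is born and when it dies. The sketch below is only a bookkeeping illustration under that reading: the `Topic` record, the half-open lifetime convention, and the topic names and lifetimes are all hypothetical, chosen to reproduce the slide's counts, and none of it is taken from Elshamy (2012).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    name: str
    born: float            # time the topic first appears
    died: Optional[float]  # time the topic stops being active; None = still active


def extant_topics(topics: List[Topic], t: float) -> List[str]:
    """Names of topics alive at time t, using the half-open lifetime [born, died)."""
    return [tp.name for tp in topics
            if tp.born <= t and (tp.died is None or t < tp.died)]


# Hypothetical lifetimes that mirror the slide: 3 topics at t = 2.5, 2 at t + k = 4.5.
topics = [
    Topic("outbreak report", born=0.0, died=None),
    Topic("culling", born=1.0, died=4.0),
    Topic("trade ban", born=2.0, died=None),
    Topic("vaccine trial", born=0.5, died=2.0),
]
```

Here `extant_topics(topics, 2.5)` lists three topics and `extant_topics(topics, 4.5)` lists two. The enumeration half of STEF is the hard part: a dynamic topic model has to infer these birth and death times from text rather than being given them.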
Topic Modeling [1]: Basic Task (Static). Elshamy (2012)
Topic Modeling [2]: Understanding Plate Notation. Adapted from Elshamy (2012)
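One way to read a plate diagram is as nested loops: each plate repeats its contents once per topic, document, or word. The sketch below assumes the diagram shows the standard LDA generative process (Blei, Ng & Jordan, 2003), which the slide does not state explicitly. The function names, the toy vocabulary, and the hyperparameter values are hypothetical. Dirichlet draws are built from normalized Gamma draws because Python's `random` module has no Dirichlet sampler.

```python
import random
from typing import List, Tuple


def dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Sample a Dirichlet vector by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs: List[float], rng: random.Random) -> int:
    """Sample an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs: int, doc_len: int, n_topics: int, vocab: List[str],
                    alpha: float, eta: float, seed: int = 0
                    ) -> Tuple[List[List[float]], List[List[str]]]:
    rng = random.Random(seed)
    # Topic plate (K): each topic k gets a word distribution beta_k ~ Dir(eta).
    betas = [dirichlet([eta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):                          # document plate (D)
        theta = dirichlet([alpha] * n_topics, rng)   # theta_d ~ Dir(alpha)
        words = []
        for _ in range(doc_len):                     # word plate (N)
            z = categorical(theta, rng)              # topic assignment z_{d,n}
            w = categorical(betas[z], rng)           # observed word w_{d,n}
            words.append(vocab[w])
        corpus.append(words)
    return betas, corpus


betas, corpus = generate_corpus(n_docs=3, doc_len=8, n_topics=2,
                                vocab=["cattle", "virus", "cull", "vote", "poll"],
                                alpha=0.5, eta=0.5)
```

Inference runs this story in reverse: given only the words, it recovers plausible `betas`, per-document `theta`, and assignments `z`. The hyperparameters control sparsity. A smaller `alpha` concentrates each document on fewer topics, and a smaller `eta` concentrates each topic on fewer words.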
Topic Modeling [3]: Hyperparameters (Another Model). Adapted from Elshamy (2012)
Continuous Time vs. Variable Number of Topics. Elshamy (2012). Figure panels: State of the Field; Goal.
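The "variable number of topics" side of this contrast is usually handled with Bayesian nonparametric priors. A standard example is the Chinese restaurant process (CRP), in which each new item either joins an existing cluster or opens a new one. The sketch below is a plain CRP offered for intuition only. It is not the Dim Sum Process, and the function name and parameters are hypothetical.

```python
import random
from typing import List


def chinese_restaurant_process(n: int, concentration: float, seed: int = 0) -> List[int]:
    """Assign n items to clusters without fixing the number of clusters in advance."""
    rng = random.Random(seed)
    assignments: List[int] = []
    counts: List[int] = []  # counts[k] = number of items already in cluster k
    for i in range(n):
        # Existing cluster k has probability counts[k] / (i + concentration);
        # a brand-new cluster has probability concentration / (i + concentration).
        weights = counts + [concentration]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster (a newly born topic)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


assignments = chinese_restaurant_process(50, concentration=1.0)
```

Under this prior, the expected number of clusters after n items is the sum of concentration / (concentration + i) for i = 0, ..., n - 1. That number grows roughly logarithmically in n, so new topics keep appearing as the corpus grows, but at a slowing rate.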
Events from Text: Markov Model for Topic Detection & Tracking. Adapted from Elshamy (2012)
Continuous-time Dynamic Topic Model (cDTM). Elshamy (2012)
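In the continuous-time dynamic topic model (Wang, Blei & Heckerman, 2008), a topic's unnormalized word parameters drift as Brownian motion. The variance of the drift between two observations grows with the elapsed time, so documents can arrive at irregular timestamps. The sketch below simulates that drift for a single topic and maps parameters to word probabilities with a softmax. The function names and the `variance_rate` parameter are hypothetical, and the sketch covers only the generative drift, not the model's variational inference.

```python
import math
import random
from typing import List


def softmax(x: List[float]) -> List[float]:
    """Map natural parameters to a probability distribution over words."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def brownian_topic_path(beta0: List[float], timestamps: List[float],
                        variance_rate: float, seed: int = 0) -> List[List[float]]:
    """Evolve one topic's natural parameters across sorted, possibly irregular
    timestamps. Between times s < t, each coordinate receives independent Gaussian
    noise with variance variance_rate * (t - s), so longer gaps allow more drift.
    Returns the topic's word distribution at each timestamp."""
    if any(b < a for a, b in zip(timestamps, timestamps[1:])):
        raise ValueError("timestamps must be sorted")
    rng = random.Random(seed)
    params = list(beta0)
    dists = [softmax(params)]
    for prev, cur in zip(timestamps, timestamps[1:]):
        sd = math.sqrt(variance_rate * (cur - prev))
        params = [b + rng.gauss(0.0, sd) for b in params]
        dists.append(softmax(params))
    return dists


path = brownian_topic_path([0.0, 1.0, -1.0], [0.0, 0.5, 3.0], variance_rate=0.1)
```

Because the variance scales with the gap rather than with a step count, this formulation avoids choosing a fixed time granularity. However, the number of topics stays fixed, which is the limitation the next two slides address.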
Discrete Time Online Hierarchical Dirichlet Process (oHDP). Elshamy (2012)
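HDP-based topic models, including the online variational HDP of Wang, Paisley & Blei (2011), are commonly written with a stick-breaking construction and truncated at a finite number of topics for inference. The sketch below draws truncated GEM(gamma) weights: each step breaks off a Beta(1, gamma) fraction of the remaining stick. The function name and arguments are hypothetical. The sketch shows only the prior over topic weights, not the online inference algorithm.

```python
import random
from typing import List


def stick_breaking(gamma: float, truncation: int, seed: int = 0) -> List[float]:
    """Truncated GEM(gamma) stick-breaking weights for `truncation` topics."""
    if truncation < 1:
        raise ValueError("truncation must be >= 1")
    rng = random.Random(seed)
    weights: List[float] = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)            # the final piece takes the leftover mass
    return weights


weights = stick_breaking(gamma=1.0, truncation=10)
```

A smaller `gamma` puts most of the mass on the first few sticks, which corresponds to fewer effective topics. A larger `gamma` spreads mass over more topics, so the truncation level needs to be large enough that the leftover piece stays small.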
Continuous-time Infinite Dynamic Topic Model (CIDTM). Elshamy (2012)
HealthMap Redux: Thematic Mapping, Health Informatics, & Epidemiology (© 2006 – 2013 Brownstein, J. & Freifeld, C.)
Thematic Mapping Tasks [1]: Entities
Example: CNN, 2007 Foot-and-Mouth Disease
“Tests have confirmed a second foot-and-mouth outbreak in southern England, the government announced, raising fears that the highly contagious animal virus is spreading. Chief Veterinary Officer Debby Reynolds said Tuesday that tests showed a herd of cattle had been infected. The animals were culled Monday evening after showing signs of the disease.”
Update Summarization: “A second foot-and-mouth disease infection in a herd of cattle in southern England was responded to by culling on Monday evening and announced by Debby Reynolds on Tuesday.” (This is the second infection since an earlier report, hence “update”.)
Compare, Recognizing Textual Entailment: “A foot-and-mouth disease infection was reported the day after culling.” (True.)
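Both the update summary and the entailment check depend on anchoring relative expressions such as “Tuesday” and “Monday evening” to calendar dates. The sketch below uses a simple heuristic that is common for past-tense news copy: a bare weekday name refers to the most recent such day on or before the article's publication date. The publication date is hypothetical because the slide gives only the year. The function name is also hypothetical, and the heuristic ignores time-of-day phrases and future-tense mentions.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_weekday(mention: str, reference: date) -> date:
    """Heuristic: in past-tense news copy, a bare weekday name refers to the most
    recent such day on or before the reference (publication) date."""
    target = WEEKDAYS.index(mention.strip().lower())
    days_back = (reference.weekday() - target) % 7
    return reference - timedelta(days=days_back)


# Hypothetical publication date (a Wednesday), used only for illustration.
published = date(2007, 9, 12)
announced = resolve_weekday("Tuesday", published)
culled = resolve_weekday("Monday", published)
```

With these dates, `(announced - culled).days` is 1. That is the fact the entailment hypothesis “reported the day after culling” relies on, and it can only be checked once both weekday mentions are resolved against the same reference date.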
Thematic Mapping Tasks [2]: Aspects (© 2008 C. Zhai, University of Illinois)
Thematic Mapping Tasks [3]: Location & Disambiguation. Current off-the-shelf applications run into ambiguity problems. (© 2008 W. Elshamy)
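A minimal illustration of the ambiguity problem: a place name such as “Paris” or “Georgia” maps to several gazetteer entries, and context has to choose among them. The sketch below scores each candidate by how many of its cue words appear in the surrounding text. The mini-gazetteer, its cue words, and the function name are hypothetical. The sketch is not the method behind the slide. Real geotaggers typically combine richer features such as population priors, co-mentioned places, and document-level consistency.

```python
import re
from typing import Dict, List, Optional

# Hypothetical mini-gazetteer: each ambiguous name maps to candidate readings,
# each with a few cue words that tend to co-occur with that reading.
GAZETTEER: Dict[str, List[dict]] = {
    "paris": [
        {"id": "Paris, France", "cues": {"france", "french", "seine", "europe"}},
        {"id": "Paris, Texas", "cues": {"texas", "tx", "lamar"}},
    ],
    "georgia": [
        {"id": "Georgia (U.S. state)", "cues": {"atlanta", "savannah", "macon"}},
        {"id": "Georgia (country)", "cues": {"tbilisi", "caucasus"}},
    ],
}


def disambiguate(name: str, text: str) -> Optional[str]:
    """Pick the candidate whose cue words overlap most with the context text.
    Ties, including zero overlap, fall back to the first listed candidate."""
    candidates = GAZETTEER.get(name.lower())
    if not candidates:
        return None
    context = set(re.findall(r"[a-z]+", text.lower()))
    best = max(candidates, key=lambda c: len(c["cues"] & context))
    return best["id"]
```

For example, `disambiguate("Paris", "Cattle auction near Paris in Lamar County, Texas")` selects the Texas reading. The tie-breaking rule is where a population or prominence prior would normally go.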
Thematic Mapping Tasks [4]: Time & Timelines. Search phrase: “smallpox” (© 2007 – 2009 Google, Inc.)
Thematic Mapping Tasks [5]: Timeline Reconstruction. Murphy, Hsu, Elshamy, Kallumadi, & Volkova (2012)
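Once events carry resolved dates, the simplest form of timeline reconstruction is to merge extractions from overlapping articles, drop exact repeats, and order the result chronologically. The sketch below shows only that final assembly step. The function name and the example extractions are hypothetical, and the sketch does not reproduce the method of Murphy et al. (2012), which the slide cites but does not describe.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple


def build_timeline(events: Iterable[Tuple[date, str]]) -> Dict[date, List[str]]:
    """Group (date, description) pairs into a chronologically ordered timeline,
    dropping exact repeats of the same pair reported by overlapping articles."""
    by_day: Dict[date, List[str]] = {}
    for day, desc in events:
        bucket = by_day.setdefault(day, [])
        if desc not in bucket:
            bucket.append(desc)
    # dicts keep insertion order (Python 3.7+), so sorting once yields a timeline
    return dict(sorted(by_day.items()))


# Hypothetical extractions from two overlapping articles.
extracted = [
    (date(2007, 9, 11), "second outbreak announced"),
    (date(2007, 9, 10), "herd culled"),
    (date(2007, 9, 11), "second outbreak announced"),
]
timeline = build_timeline(extracted)
```

Exact-match deduplication is the weakest part of this sketch. Different outlets paraphrase the same event, so a practical system would cluster near-duplicate descriptions, for example by topic or entity overlap, before merging them.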
Recent Results [1]: Meth Lab Mapping. Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)
Recent Results [2]: Visual Analytics. Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)
Recent Results [3]: Topic Proportions. Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)
Sentiment Analysis Tasks: Polarity (© 1999 – 2012 dslreports.com)
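A common baseline for polarity is lexicon-based scoring: sum the scores of known sentiment words and flip a word's sign when a negator appears shortly before it. The sketch below implements that baseline with a tiny hypothetical lexicon suited to ISP reviews like those on dslreports.com. The lexicon, the negator list, the window size, and the function name are all assumptions. The sketch is not the approach used in the slides.

```python
import re
from typing import Dict

# Hypothetical toy lexicon; a real system would use a curated or learned one.
LEXICON: Dict[str, int] = {"fast": 1, "reliable": 1, "great": 1,
                           "slow": -1, "outage": -1, "terrible": -1}
NEGATORS = {"not", "never", "no", "isn't", "wasn't", "don't"}


def polarity(text: str, window: int = 3) -> int:
    """Sum lexicon scores, flipping the sign of any sentiment word that has a
    negator among the `window` tokens before it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            negated = any(t in NEGATORS for t in tokens[max(0, i - window):i])
            score += -LEXICON[tok] if negated else LEXICON[tok]
    return score
```

The fixed negation window is a crude stand-in for scope detection. For example, it cannot tell that “not only fast but reliable” is positive, which is one reason the later aims call for better recognition of scope and polarity.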
Aggregation & OLAP: Wikipedia Infobox as Fact Table
Infobox: Albert Einstein (© 2001 – 2010 Wikimedia Foundation)
Q: Where can this information be found?
A: It depends on how much formatting the source page has. Is it marked up (machine-readable)? Is the markup semantically rich?
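Treating an infobox as a fact table means turning each page's `| key = value` template lines into one row, then aggregating rows along a chosen dimension, as an OLAP roll-up would. The sketch below does this for a simplified, hypothetical infobox snippet. Real infobox wikitext contains links, nested templates, and multi-line values that this naive parser ignores. The function names are hypothetical, and the values used are limited to well-known facts.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_infobox(wikitext: str) -> Dict[str, str]:
    """Turn '| key = value' lines into a flat dict (one fact-table row).
    Naive: ignores nested templates, links, and multi-line values."""
    row: Dict[str, str] = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            row[key.strip()] = value.strip()
    return row


def count_by(rows: Iterable[Dict[str, str]], field: str) -> Counter:
    """Roll-up: count rows per value of one dimension; comma-separated
    multi-valued fields contribute one count per value."""
    counts: Counter = Counter()
    for row in rows:
        for value in row.get(field, "").split(","):
            if value.strip():
                counts[value.strip()] += 1
    return counts


# Simplified, hypothetical infobox text (real pages use richer markup).
einstein = "{{Infobox scientist\n| name = Albert Einstein\n| birth_place = Ulm\n| fields = Physics\n}}"
curie = "{{Infobox scientist\n| name = Marie Curie\n| birth_place = Warsaw\n| fields = Physics, Chemistry\n}}"
rows = [parse_infobox(einstein), parse_infobox(curie)]
```

Here `count_by(rows, "fields")` gives Physics 2 and Chemistry 1. This is where the slide's question matters: the roll-up is only as good as the markup. Free-text values or inconsistent field names across pages break the fact-table view.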
Opinion Mapping Example [1]: Health Blogs on Chronic Disease
Opinion Mapping Example [2]: New Entities & Relationships
Opinion Mapping Example [3]: Polarity (© 2012 Twitrratr)
Opinion Mapping Example [4]: Aims & Approach
Aim 1: Extend algorithms to detect new
- entities: diseases, treatments, complications
- relationships: adverse reactions, controversies
Aim 2: Domain-specific ontology covering symptoms, disease attributes, treatments, complications, and comparisons
Aim 3: Better recognition of scope and polarity
User Groups: Goals & Primary Use Cases
Goal: thematic opinion map (choropleth, etc.)
User groups: experienced users (policymakers, health professionals); individual stakeholders (patients, activists, voters)
Primary use case: infographics as IE views (© 2011 Mediabistro)
Are Germans really the happiest Twitter users by country, and Tennesseans the happiest by U.S. state?
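A thematic opinion map of the choropleth kind needs two steps: aggregate per-post polarity scores into one value per region, then bin those values into a small number of shading classes. The sketch below uses per-region means and equal-interval classes. The function names, the choice of equal-interval binning, and the toy regions and scores are hypothetical. Quantile or natural-breaks classing are common alternatives.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def region_means(scored: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average polarity per region from (region, score) pairs."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for region, score in scored:
        sums[region] += score
        counts[region] += 1
    return {r: sums[r] / counts[r] for r in sums}


def equal_interval_classes(values: Dict[str, float], n_classes: int) -> Dict[str, int]:
    """Assign each region a class 0..n_classes-1 by splitting [min, max] evenly."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / n_classes or 1.0  # guard: all regions have the same value
    return {r: min(int((v - lo) / width), n_classes - 1) for r, v in values.items()}


# Hypothetical toy scores: (region, polarity of one post).
scored = [("A", 0.5), ("A", -0.5), ("B", -1.0), ("C", 1.0)]
classes = equal_interval_classes(region_means(scored), n_classes=4)
```

The slide's closing question is the natural caveat here. A region's mean from a handful of posts is noisy, and who posts from where is not a random sample, so a map like this should also report per-region counts before anyone concludes which country or state is happiest.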