Event Trend Detector Ryan Ward, Skylar Edwards, Jun Lee, Stuart Beard, Spencer Su CS 4624 Multimedia, Hypertext, and Information Access Instructor: Edward A. Fox


Event Trend Detector
Ryan Ward, Skylar Edwards, Jun Lee, Stuart Beard, Spencer Su
CS 4624 Multimedia, Hypertext, and Information Access
Instructor: Edward A. Fox
May 4, 2018
Virginia Tech, Blacksburg, VA 24061
(Spencer)

Table of Contents
- Project Overview
- Current Status
- Trend Detection
- Clustering
- Challenges
- What's Left
- Acknowledgements
(Spencer)

Project Introduction
- The project collects news articles from Reddit and Google and identifies trends in how frequently entities are mentioned.
- It builds on a previous CS4624 project that identifies similarities (clusters) in top Reddit news stories.
- We were tasked with improving the clustering algorithm and the UI, and with implementing trend detection.
- The project is viewable outside Torgersen 2030.
(Spencer)

Work Completed
- Clustering algorithm
- Trend detection
- Google News article collection
- Updated UI
(Spencer)

Cluster Display

Tagged Entities Before Cleaning
For trend detection, we used a library called sner to extract all named entities. Here is what the entities looked like before we cleaned the data.
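The slides do not list the cleaning steps, so the following is only a minimal sketch of one plausible step. It assumes the tagger output is a list of (token, tag) pairs with "O" marking non-entity tokens, which is the usual shape of Stanford NER style output. The function name `merge_entities` and the sample data are hypothetical.

```python
from itertools import groupby


def merge_entities(tagged_tokens):
    """Merge runs of adjacent tokens that share a tag into one entity.

    tagged_tokens: list of (token, tag) pairs; tokens tagged "O" are dropped.
    Returns a list of (entity_text, tag) pairs.
    """
    entities = []
    # groupby yields one group per run of consecutive pairs with the same tag.
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == "O":
            continue  # not a named entity
        text = " ".join(token for token, _ in group)
        entities.append((text, tag))
    return entities


# Hypothetical tagger output for "Virginia Tech is in Blacksburg ."
sample = [
    ("Virginia", "ORGANIZATION"),
    ("Tech", "ORGANIZATION"),
    ("is", "O"),
    ("in", "O"),
    ("Blacksburg", "LOCATION"),
    (".", "O"),
]
cleaned = merge_entities(sample)
```

With this kind of merging, "Virginia" and "Tech" are stored as the single entity "Virginia Tech" rather than as two separate rows.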

Cleaned Tagged Entities

Tagged Entity Database Table Example

Output graph
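A trend graph of this kind needs one count per entity per day, with every entity plotted against the same dates. The sketch below shows one way to build such series. The input shape, a list of (entity, date) mentions, and the name `daily_counts` are assumptions for illustration, not the project's actual schema.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(mentions, start, end):
    """Count mentions per entity per day over a shared date range.

    mentions: list of (entity, date) pairs.
    Returns (days, series), where series maps each entity to a list of
    counts aligned with days. Days without mentions get a count of 0,
    so all entities share the same x-axis.
    """
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    counts = Counter(mentions)  # keys are (entity, date) pairs
    entities = sorted({entity for entity, _ in mentions})
    series = {e: [counts[(e, d)] for d in days] for e in entities}
    return days, series


# Hypothetical mentions extracted from collected articles.
mentions = [
    ("Virginia Tech", date(2018, 5, 1)),
    ("Virginia Tech", date(2018, 5, 1)),
    ("Virginia Tech", date(2018, 5, 3)),
    ("Blacksburg", date(2018, 5, 2)),
]
days, series = daily_counts(mentions, date(2018, 5, 1), date(2018, 5, 3))
```

Filling missing days with zeros is what keeps the x-axes of different entities aligned when their mentions fall on different dates.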

Clustering Implementation
- Document similarity matrix
  - Determines subgraph connectivity
  - Subgraphs are recalculated for a dynamic similarity threshold
- Threshold filtering
  - Sizes of the subgraphs change based on different similarity threshold settings
  - Decrease the threshold in each iteration to decrease the number of clusters
  - Subsequently, the number of centroids also decreases
- Goal is to create "the most acceptable" number of clusters with the highest similarities
(Jun)
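The threshold loop described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: the slide does not name a similarity measure, so bag-of-words cosine similarity is assumed, and the function names, starting threshold, step size, and sample headlines are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(docs):
    """Pairwise similarity between documents (assumed measure)."""
    vectors = [Counter(d.lower().split()) for d in docs]
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def components(sim, threshold):
    """Connected subgraphs when edges join documents with sim >= threshold."""
    n, seen, clusters = len(sim), set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(comp))
    return clusters


def cluster(sim, max_clusters, start=0.9, step=0.1, floor=0.1):
    """Lower the threshold until at most max_clusters subgraphs remain."""
    threshold = start
    clusters = components(sim, threshold)
    while len(clusters) > max_clusters and threshold - step >= floor - 1e-9:
        threshold = round(threshold - step, 10)
        clusters = components(sim, threshold)
    return threshold, clusters


docs = [
    "flood hits virginia town",
    "virginia town flood damage",
    "election results announced today",
    "election results delayed today",
]
threshold, clusters = cluster(similarity_matrix(docs), max_clusters=2)
```

Starting high and lowering the threshold means the first grouping that reaches the target cluster count is also the one with the strongest within-cluster similarities, which matches the stated goal.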

Changes to Algorithm
- TensorFlow -> scikit-learn
  - Tools perform K-means clustering
  - Difficulty manipulating data for cluster representation
- Creating subgraphs with iterations
  - Testing various threshold percentages (high -> low)
  - Using a clique as the representative
- New articles will be…
  - Added to existing clusters, or
  - Used to create new clusters
(Spencer)
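The last point, placing a new article into an existing cluster or starting a new one, might look like the sketch below. The slide uses a clique as the cluster representative. For brevity this sketch swaps in the mean similarity to all cluster members, which is a simpler stand-in and not the project's method. The name `assign_article` and all values are hypothetical.

```python
def assign_article(clusters, new_id, sim_to_new, threshold):
    """Add new_id to the best-matching cluster, or start a new cluster.

    clusters: list of lists of article ids (modified in place).
    sim_to_new: dict mapping an existing article id to its similarity
        with the new article; missing ids count as 0.0.
    A cluster's score is the mean similarity of its members to the new
    article (an assumed stand-in for a clique-based representative).
    """
    best_index, best_score = None, 0.0
    for index, members in enumerate(clusters):
        score = sum(sim_to_new.get(m, 0.0) for m in members) / len(members)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score >= threshold:
        clusters[best_index].append(new_id)
    else:
        clusters.append([new_id])
    return clusters


clusters = [[0, 1], [2, 3]]
# Article 4 is similar to the first cluster, so it joins it.
assign_article(clusters, 4, {0: 0.8, 1: 0.6, 3: 0.1}, threshold=0.5)
# Article 5 is not similar enough to any cluster, so it starts its own.
assign_article(clusters, 5, {0: 0.1, 4: 0.2}, threshold=0.5)
```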

Challenges Faced
- Apache configuration and version issues
- Matching the x-axis across trend graphs
- Using pre-built libraries, which were sometimes incompatible
- Deciding the number of clusters for the display system
(Stuart)

Acknowledgements
Client: Liuqing Li
Supported by NSF (IIS-1619028 and 1619371)
References:
- Google Trends: https://trends.google.com/trends/story/US_cu_J7SG6GEBAADA3M_en
- Google Trends: https://trends.google.com/trends/explore?q=trend
- Cluster Methods: http://www.sthda.com/english/articles/25-cluster-analysis-in-r-practical-guide/111-types-of-clustering-methods-overview-and-quick-start-r-code/

Questions?