Tweets Metadata May 4, 2015 CS 4624 - Multimedia, Hypertext and Information Access Department of Computer Science Virginia Polytechnic Institute and State University

Presentation transcript:

Tweets Metadata
May 4, 2015
CS 4624 - Multimedia, Hypertext and Information Access
Department of Computer Science
Virginia Polytechnic Institute and State University
Blacksburg, VA 24061

// Project Personnel
● Principal Investigator:
o Dr. Edward Fox, Virginia Tech Department of Computer Science
● Clients:
o Mohamed Magdy, Virginia Tech Department of Computer Science
● Student Team Members:
o Chris Conley
o Alex Druckenbrod
o Karl Meyer
o Samuel Muggleworth

// Background
● CTRnet, QCRI, etc. collect and archive tweets surrounding events
● A central database for tweet collections was desired
● Michael Shuffett started the project in 2014
● His implementation supports the CTRnet and QCRI data formats
Figure 1: Michael Shuffett’s Upload information page

// Goals
● Define a tweet and tweet-collection metadata standard
○ Enable collection sharing and consistency
○ Standard exists at collection and tweet levels
● Implement methods for merging such collections
● Create a web-app tool that lets a user upload new collections and merge existing ones
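The slides do not describe how merging works internally. As a minimal sketch, assuming each collection is a list of tweet records and that a hypothetical `tweet_id` field identifies a tweet, merging can be viewed as a union that drops duplicate IDs:

```python
def merge_collections(*collections):
    """Union several tweet collections, keeping the first record seen per tweet ID.

    Assumption: each tweet is a dict with a 'tweet_id' key (hypothetical name).
    """
    seen = set()
    merged = []
    for collection in collections:
        for tweet in collection:
            tweet_id = tweet["tweet_id"]
            if tweet_id not in seen:
                seen.add(tweet_id)
                merged.append(tweet)
    return merged
```

Keeping the first record seen is one possible policy; a real tool might instead prefer the record with the most complete metadata.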

// Code Base, Technologies
● Shuffett’s TweetID tool as starting point
o Implemented upload and merging specifically geared towards IDEAL and QCRI collections
● Technologies used:
o Python scripting
o SQLite - database
o HTML5/CSS/Bootstrap - web development
o jQuery/Flask-script - client side scripting
o Jinja2 - dynamic HTML templating/rendering
o Flask/WTForms - forms and upload
Figure 2: Shuffett’s tweet listing page

// Discoveries
● All files are converted to .tsv
● The tool expects a ‘text’ field, but never writes it to the database
● Uses collection.name as the primary key in the database
● Supports only:
o One schema type
- 9 fields: 7 data fields + 2 \N fields
o Two file types
- .csv or .tsv
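The conversion of uploads to .tsv can be illustrated with Python's standard `csv` module. This is a sketch of the general technique, not the tool's actual code; the function name `csv_to_tsv` is hypothetical.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Re-serialize comma-separated text as tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # lineterminator is set explicitly so rows end in "\n" rather than "\r\n".
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()
```

Parsing with `csv.reader` rather than replacing commas directly keeps quoted fields that contain commas, such as tweet text, intact.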

// What we did
● Determined standard
o What data is necessary?
● Updated interface
● Created collection.id field as primary key
● Allowed multiple schemas/formats
o Request schema from uploader
o Map the upload to our schema and record only the standardized data
o Original merge process unchanged
Figure 3: Our Updated Collection Page
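A minimal sketch of the two changes above, under stated assumptions: the field names in `STANDARD_FIELDS`, the table layout, and the functions `map_row` and `store_collection` are all hypothetical and are not taken from the project's code. The uploader-supplied schema is modeled as a dict from standard field name to the column name used in the uploaded file.

```python
import sqlite3

# Hypothetical standard tweet fields, for illustration only.
STANDARD_FIELDS = ("tweet_id", "text", "created_at", "from_user")


def map_row(row, schema):
    """Keep only standardized fields; fields missing from the schema become None."""
    return {
        std: row.get(schema[std]) if std in schema else None
        for std in STANDARD_FIELDS
    }


def store_collection(conn, name, rows, schema):
    """Insert a collection keyed by a generated id (not by name) and its tweets."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collection (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweet ("
        "collection_id INTEGER REFERENCES collection(id), "
        "tweet_id TEXT, text TEXT, created_at TEXT, from_user TEXT)"
    )
    cur = conn.execute("INSERT INTO collection (name) VALUES (?)", (name,))
    collection_id = cur.lastrowid
    for row in rows:
        mapped = map_row(row, schema)
        conn.execute(
            "INSERT INTO tweet VALUES (?, ?, ?, ?, ?)",
            (collection_id,) + tuple(mapped[f] for f in STANDARD_FIELDS),
        )
    conn.commit()
    return collection_id
```

With a generated id as the key, two collections may share a name without colliding, which a name-based key would not allow.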

// Standards
Figure 4: Tweet metadata as captured from our test database.
Figure 5: Collection metadata as captured from our test database.

// Lessons Learned
● Attempted to switch to Python 3
o Learned about incompatible libraries
● Commenting code is very important
o Continuing a project is much easier with documentation
● Be aware of database size
o Better to implement for a large database from the start
o I.e., avoid iterating through the entire database

// Future Plans
● Implement autonomous capabilities
o More complex schemas
o Less manual mapping/schema info
● Become considerably more flexible
o Less reliant on specific formatting
o Accept more file types
● Persist merges
● Scale the technologies appropriately depending on the size of the project

// References
- "Developer Policy." Developer Policy. Twitter, Inc., 22 Oct. Web.
- Shuffett, Michael. "Twitter Metadata." Twitter Metadata. VTechWorks, 10 May. Web. 09 Feb.
- "Twitter Terms of Service." Twitter, Inc., 8 Sept. Web.
- "QCRI." Qatar Computing Research Institute. Web.
- "CTRnet - Events Archive." Virginia Polytechnic Institute and State University. Web.

??? Questions ???