Katrina Database SearchKat

Presentation transcript:

Katrina Database SearchKat
University: Virginia Tech
Course: CS 4624 (Multimedia, Hypertext, and Information Access)
Professor: Edward A. Fox
Date: April 30, 2015
Group: Matthew Chittum, Kyle He, Gary Li, Tanvir Rahman
Location: Blacksburg, VA 24061, USA

Introduction
A cross-disciplinary project combining linguistics with computer science.
Goal: create a searchable database of interviews with Hurricane Katrina victims.
Supports thematic searching: search by word association (synonyms, antonyms, etc.), implemented as query expansion with client-specific word associations.

Client
Dr. Katie Carmichael
Assistant Professor in the English Department, College of Liberal Arts and Human Sciences at VT
Ph.D. from Ohio State University; master’s and bachelor’s degrees from Tulane University
Works in 407 Shanks Hall, 181 Turner St NW, Blacksburg, VA 24061, United States
Phone: 540-231-7712
Email: katcarm@vt.edu
Search ‘Katie Carmichael’ at http://www.vt.edu/
Source: http://www.vtnews.vt.edu/articles/2014/09/091714-clahs-katiecarmichael.html

Project Phases
Phase 1: Removing Markings
Phase 2: Word Count
Phase 3: Search Implementation
  Basic Searching
  Thematic Searching / Query Expansion

Phase 1: Remove Markings
Remove unnecessary markings from documents.
[Figure: a document before and after marking removal]
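
The slides do not say which markings were removed, so the following is only a minimal sketch of this kind of cleanup. The two patterns (bracketed notes and parenthesized timestamps) and the function name `remove_markings` are assumptions made for illustration, not the team's actual rules.

```python
import re

# Hypothetical marking patterns; the presentation does not list the real ones.
MARKING_PATTERNS = [
    re.compile(r"\[[^\]]*\]"),                    # bracketed notes such as [laughs]
    re.compile(r"\(\d{1,2}:\d{2}(?::\d{2})?\)"),  # timestamps such as (01:23)
]


def remove_markings(line: str) -> str:
    """Strip the assumed transcript markings from one line of text."""
    for pattern in MARKING_PATTERNS:
        line = pattern.sub("", line)
    # Collapse the extra whitespace left behind by the removals.
    return re.sub(r"\s+", " ", line).strip()


if __name__ == "__main__":
    print(remove_markings("We left [pause] on Sunday (01:23) morning."))
```

Keeping the patterns in one list would make it easy to add or drop a marking type once the client confirms what counts as unnecessary.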

Phase 2: Word Count
Find the frequency of each word across all documents.
Using the results, Dr. Carmichael picked certain words to be used in query expansion.
Example: ‘people’ appeared 31 times in document X.
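
A word count of this kind can be sketched with the Python standard library. The tokenization rule (lowercase letters and apostrophes) and the names `count_words` and `count_corpus` are assumptions for illustration; the presentation does not describe the tool the team actually used.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count lowercase word occurrences in one document (assumed tokenization)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def count_corpus(documents: dict[str, str]) -> tuple[dict[str, Counter], Counter]:
    """Return per-document counts and the total counts across all documents."""
    per_doc = {name: count_words(text) for name, text in documents.items()}
    total = Counter()
    for counts in per_doc.values():
        total.update(counts)
    return per_doc, total


if __name__ == "__main__":
    # Hypothetical two-document corpus.
    docs = {"x.txt": "People helped people.", "y.txt": "The people left."}
    per_doc, total = count_corpus(docs)
    print(per_doc["x.txt"]["people"], total["people"])
```

The per-document counts correspond to statements such as "‘people’ appeared 31 times in document X", while the totals give a ranked list from which candidate expansion words could be picked.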

Word Groupings
Social class: academic, college, contest, income, palmisano, tulane, tuition
Victims/refugees: damage, damaged, damages, denied, destroyed, devastating, refugee, refugees

Phase 3: Search Implementation
First run of searching:
Customize the display categories shown with search results.
Display the line of text that contains the search term.
Problem: many results are too short and carry very little contextual information.
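
To illustrate displaying the line that contains the search term, here is a small plain-Python sketch. It does not use Solr. The function name `matching_lines` and the optional `context` parameter, which pulls in neighboring lines, are assumptions; the slides do not say how, or whether, the short-result problem was addressed this way.

```python
def matching_lines(text: str, term: str, context: int = 0) -> list[str]:
    """Return each line containing `term`, joined with `context` neighbors on each side."""
    lines = text.splitlines()
    needle = term.lower()
    results = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            results.append(" ".join(lines[start:end]))
    return results


if __name__ == "__main__":
    # Hypothetical transcript excerpt.
    text = "We stayed.\nThe water rose fast.\nThen we left."
    print(matching_lines(text, "water"))
    print(matching_lines(text, "water", context=1))
```

With `context=0` the result is the bare matching line, which shows why single-line results can feel too short; a small context window is one possible remedy.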


Thematic Searching
Able to search by word groupings.
Example: the group ‘religion’ contains pastor, prayed, prayer, prayers, etc.
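
Searching by word grouping amounts to query expansion: a group name is replaced by an OR over its member words. The sketch below only builds a query string in the `field:(a OR b)` form accepted by Solr's standard query parser. The field name `content` and the function `expand_query` are hypothetical, and the ‘religion’ list holds only the four words named on the slide (the original list is longer).

```python
# Word groups taken from the slides; the 'religion' list is partial ("etc." in the source).
WORD_GROUPS = {
    "religion": ["pastor", "prayed", "prayer", "prayers"],
    "victims/refugees": [
        "damage", "damaged", "damages", "denied",
        "destroyed", "devastating", "refugee", "refugees",
    ],
}


def expand_query(term: str, field: str = "content") -> str:
    """Expand a group name into an OR query; pass other terms through unchanged."""
    words = WORD_GROUPS.get(term.lower(), [term])
    clause = " OR ".join(f'"{word}"' for word in words)
    return f"{field}:({clause})"


if __name__ == "__main__":
    print(expand_query("religion"))
    print(expand_query("flood"))
```

A term that is not a known group name is searched as-is, so basic searching and thematic searching can share one entry point.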

Interesting Results
Some initial query results were only one word.
Solr needs fields for its search results. If none are specified, it tries to retrieve them from the source files.
Converting the original documents to CSV files with specified fields (filename, line content, etc.) helps Solr return the desired search results.
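
The conversion to CSV might look roughly like the following sketch, which writes one row per non-empty line. Only "filename" and "line content" come from the slide; the `id` and `line_number` columns, the column names themselves, and the function `documents_to_csv` are assumptions added for illustration.

```python
import csv
import io

# Assumed column layout; the slide only names "filename, line content, etc."
FIELDS = ["id", "filename", "line_number", "content"]


def documents_to_csv(documents: dict[str, str], out) -> int:
    """Write one CSV row per non-empty line of each document; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    rows = 0
    for filename, text in documents.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if not line.strip():
                continue  # skip blank lines
            writer.writerow({
                "id": f"{filename}:{number}",
                "filename": filename,
                "line_number": number,
                "content": line.strip(),
            })
            rows += 1
    return rows


if __name__ == "__main__":
    buffer = io.StringIO()
    documents_to_csv({"a.txt": "first line\n\nsecond line"}, buffer)
    print(buffer.getvalue())
```

With each line stored as its own record, a search hit can return the filename and the matching line directly instead of leaving Solr to guess fields from the raw source files.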

Lessons Learned
Learned to use Solr.
Learned the major concepts of a search engine: indexing, result filtering, running queries, query expansion, etc.
Project reports suck.

Acknowledgements
Katie Carmichael
Edward A. Fox
Mohamed Magdy Gharib Farag