Information Storage and Retrieval(CS 5604) Collaborative Filtering 4/28/2016 Tianyi Li, Pranav Nakate, Ziqian Song Department of Computer Science Blacksburg,

Presentation transcript:

Information Storage and Retrieval (CS 5604) Collaborative Filtering 4/28/2016 Tianyi Li, Pranav Nakate, Ziqian Song Department of Computer Science Blacksburg, Virginia – Dr. Edward A. Fox Virginia Tech

Agenda
● Role and goal
● User-based recommendation
● Recommendation process
● Algorithm
● Implementation
● Item-based recommendation
● Overview
● Algorithms and Implementation
● Build Process

Project goal
Our project serves the Integrated Digital Event Archiving and Library (IDEAL) project. The IDEAL project provides services for searching, browsing, analyzing, and visualizing over 1 billion tweets and over 65 million webpages. Our goal is to build a recommendation system that recommends tweets and webpages to assist users searching and browsing the IDEAL collection. We take two approaches:
● User-based recommendation
● Item-based recommendation

User-based recommendation vs. Item-based recommendation
● User-based (clicking history): build a user-item matrix that captures the relationships between users and items.
● Item-based (content similarity): build an item-item matrix that captures the relationships between pairs of items.
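As a rough illustration of the user-item matrix built from clicking history, the sketch below counts clicks per (user, item) pair in plain Scala. The `Click` record and the `userItemCounts` helper are hypothetical names introduced here, and treating the click count as the matrix entry is an assumption, not something the slides specify.

```scala
object ClickMatrix {
  // Hypothetical click event: a user clicked an item (a tweet or a webpage).
  final case class Click(user: Int, item: Int)

  // Sparse user-item matrix: user -> (item -> number of clicks).
  // Assumption: the click count stands in for the strength of the relationship.
  def userItemCounts(clicks: Seq[Click]): Map[Int, Map[Int, Int]] =
    clicks.groupBy(_.user).map { case (user, userClicks) =>
      user -> userClicks.groupBy(_.item).map { case (item, xs) => item -> xs.size }
    }
}
```

For example, `ClickMatrix.userItemCounts(Seq(ClickMatrix.Click(1, 10), ClickMatrix.Click(1, 10), ClickMatrix.Click(2, 20)))` gives user 1 a count of 2 for item 10 and user 2 a count of 1 for item 20.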

Data requirement

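User-based recommendation

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Load "user,item,rating" lines; sc is the SparkContext (assumed, e.g. from spark-shell)
val data = sc.textFile("fakeUser.data")
val ratings = data.map(_.split(',') match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})

// Build the recommendation model using ALS
val rank = 10
val numIterations = 10
val model = ALS.train(ratings, rank, numIterations, 0.01) // 0.01 is the regularization parameter lambda
// Top 3 recommended items for every user
val productuser = model.recommendProductsForUsers(3).collect()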

Algorithm: Recommendation strategy
● Neighborhood methods
● Latent factor models
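To make the neighborhood approach concrete, here is a minimal sketch that finds the users most similar to a target user. Cosine similarity over sparse rating vectors is one common choice and is an assumption here, as are the names `Neighborhood`, `cosine`, and `nearest`. Latent factor models instead learn a low-dimensional vector per user and item.

```scala
object Neighborhood {
  // One user's ratings: item id -> rating (missing items are unrated).
  type Ratings = Map[Int, Double]

  // Cosine similarity between two sparse rating vectors.
  def cosine(a: Ratings, b: Ratings): Double = {
    val dot = a.keySet.intersect(b.keySet).toSeq.map(i => a(i) * b(i)).sum
    val normA = math.sqrt(a.values.map(x => x * x).sum)
    val normB = math.sqrt(b.values.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // The k users most similar to `target`, most similar first.
  def nearest(target: Int, all: Map[Int, Ratings], k: Int): Seq[(Int, Double)] =
    all.toSeq
      .collect { case (u, r) if u != target => (u, cosine(all(target), r)) }
      .sortBy { case (_, sim) => -sim }
      .take(k)
}
```

Items rated highly by the returned neighbors but not yet seen by the target user would then be candidates for recommendation.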

Algorithm: Matrix factorization
● ALS (alternating least squares)
● Parallelization
● Implicit data
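The alternating idea behind ALS can be sketched with a toy rank-1 model, where each rating is approximated by the product of one user factor and one item factor. This is a simplified single-machine illustration in plain Scala, not Spark's implementation: the names `Rank1Als`, `fit`, and `mse` are hypothetical, the rank is fixed at 1, and the regularization is a plain L2 penalty weighted by `lambda`.

```scala
object Rank1Als {
  // Observed rating: (user index, item index, value).
  type Obs = (Int, Int, Double)

  // Alternating least squares for the rank-1 model r(u, i) ≈ p(u) * q(i).
  def fit(obs: Seq[Obs], nUsers: Int, nItems: Int,
          lambda: Double, iterations: Int): (Array[Double], Array[Double]) = {
    val p = Array.fill(nUsers)(1.0)
    val q = Array.fill(nItems)(1.0)
    for (_ <- 1 to iterations) {
      // Hold q fixed: each user's factor has a closed-form least-squares solution.
      for (u <- 0 until nUsers) {
        val mine = obs.filter(_._1 == u)
        val num = mine.map { case (_, i, r) => r * q(i) }.sum
        val den = lambda + mine.map { case (_, i, _) => q(i) * q(i) }.sum
        p(u) = if (den == 0.0) 0.0 else num / den
      }
      // Hold p fixed: solve each item's factor the same way.
      for (i <- 0 until nItems) {
        val mine = obs.filter(_._2 == i)
        val num = mine.map { case (u, _, r) => r * p(u) }.sum
        val den = lambda + mine.map { case (u, _, _) => p(u) * p(u) }.sum
        q(i) = if (den == 0.0) 0.0 else num / den
      }
    }
    (p, q)
  }

  // Mean squared error of the model on the observed ratings.
  def mse(obs: Seq[Obs], p: Array[Double], q: Array[Double]): Double =
    obs.map { case (u, i, r) => val e = r - p(u) * q(i); e * e }.sum / obs.size
}
```

Because each user's update depends only on that user's ratings and the fixed item factors (and vice versa), the updates within a half-step are independent of one another, which is what makes the method easy to parallelize.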

Implementation

Item-based recommendations
● Recommendations based on the text of the documents
● Useful for retrieving related documents
● System Pipeline

Implementation details
● Document preprocessing: word lemmatization
● Feature extraction and transformation using the TF-IDF method
● Repartitioning the RDD to increase parallelization
● DIMSUM algorithm by Twitter!
● Results collection and post-processing
● Configuration parameters for the Spark job: similarity threshold (DIMSUM), Spark executor memory, shuffle memory size
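The feature-extraction and similarity steps can be illustrated on a small scale in plain Scala. The sketch below swaps in an exact, brute-force all-pairs cosine similarity in place of DIMSUM, which Spark MLlib exposes as `RowMatrix.columnSimilarities(threshold)` and which samples to approximate the same quantity at scale. The names `ItemSimilarity`, `tfidf`, and `similarPairs` are hypothetical, documents are assumed to be already tokenized and lemmatized, and the weighting (raw term count times ln(N / df)) is one of several TF-IDF variants.

```scala
object ItemSimilarity {
  // Sparse document vector: term -> TF-IDF weight.
  type Vec = Map[String, Double]

  // TF-IDF using raw term counts and idf = ln(N / df) (an assumed variant).
  def tfidf(docs: Seq[Seq[String]]): Seq[Vec] = {
    val n = docs.size.toDouble
    val df = docs.flatMap(_.distinct).groupBy(identity).map { case (t, xs) => t -> xs.size }
    docs.map { doc =>
      doc.groupBy(identity).map { case (t, xs) => t -> xs.size * math.log(n / df(t)) }
    }
  }

  // Cosine similarity between two sparse vectors.
  def cosine(a: Vec, b: Vec): Double = {
    val dot = a.keySet.intersect(b.keySet).toSeq.map(t => a(t) * b(t)).sum
    val normA = math.sqrt(a.values.map(x => x * x).sum)
    val normB = math.sqrt(b.values.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // All document pairs (i < j) whose similarity is at least `threshold`.
  def similarPairs(vecs: Seq[Vec], threshold: Double): Seq[(Int, Int, Double)] =
    for {
      i <- vecs.indices
      j <- vecs.indices if i < j
      sim = cosine(vecs(i), vecs(j))
      if sim >= threshold
    } yield (i, j, sim)
}
```

The brute-force loop compares every pair of documents, so it only suits small collections; the similarity threshold plays the same filtering role as the DIMSUM threshold listed among the configuration parameters.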

Build process
● Code compilation
● JAR file creation
● SBT script
● Spark-submit
● Evaluation: mean squared error (MSE)
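For reference, mean squared error averages the squared differences between predicted and actual ratings. A minimal plain-Scala sketch follows; the `Evaluation.mse` helper is a hypothetical name, and it assumes predictions and actual values are already paired by position.

```scala
object Evaluation {
  // Mean squared error between predicted and actual ratings, paired by position.
  def mse(predicted: Seq[Double], actual: Seq[Double]): Double = {
    require(predicted.nonEmpty && predicted.size == actual.size,
      "need two non-empty sequences of equal length")
    predicted.zip(actual).map { case (p, a) => (p - a) * (p - a) }.sum / predicted.size
  }
}
```

Lower values mean the predicted ratings are closer to the held-out ratings; a value of 0 means every prediction matched exactly.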

Acknowledgement
We want to thank Dr. Fox and our GRAs Mohamed Magdy Farag and Sunshin Lee for guiding and advising us. We are also grateful to the Front-end and Solr teams for communicating with us and providing the data we needed to implement the system. We thank the whole class for discussing the problems and learning about information retrieval together. NSF grant IIS , III: Small: Integrated Digital Event Archiving and Library (IDEAL).

Thank you! Q&A