By Klejdi Muca & Stephen Quinn

Data mining is a method used by companies like IMDb or Netflix to turn raw data into useful information. For example, it helps companies concentrate on the most important behavioural data that they have collected from their users and even potential users. It enables companies such as Blockbuster to mine their video rental history database to recommend rentals to individual customers. The techniques and algorithms that data mining uses do not just change how data is presented; they discover previously unknown relationships in the data.

The Internet Movie Database (IMDb) provides current film and TV programme information freely to the user. IMDb includes plot summaries, actors and production crew and, significantly, offers a rating system that allows users to rate films on a scale of one to ten. “The database aims to capture any and all information associated with movies from any part of the world, starting with the earliest cinema to the very latest releases.” IMDb uses data mining techniques to find relationships in its dataset and structures the data well, allowing the user to navigate the website easily and efficiently.

In 2012 the AIUB (American International University-Bangladesh) started a project that attempted to create a classification scheme for pre-release movie popularity based on inherent attributes, using C4.5 (an algorithm used to generate a decision tree). Their aim was to create a system that would predict how popular a film or TV title would be, based on the relationships found in data gathered from other film and TV titles. The data gathered included production budget, actors, directors, country, language and release date. All of this information would be parsed and inserted into an SQL database, where queries would be run to produce the final data sets. These would then be analysed with WEKA for patterns in the relationships. Examples would be whether more money spent on a film results in a greater financial return, or whether films directed by a certain director are more likely to be popular.
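To make the decision-tree idea concrete, here is a minimal sketch of the gain ratio, the measure C4.5 uses to choose which attribute to split on. It is not the AIUB project's code: the records, the attribute bands and the function names are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy records: (budget band, language, popularity label).
FILMS = [
    ("high", "english", "popular"),
    ("high", "english", "popular"),
    ("high", "other", "popular"),
    ("low", "english", "unpopular"),
    ("low", "other", "unpopular"),
    ("low", "english", "popular"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr_index, label_index=-1):
    """Information gain from splitting on one attribute, divided by split information."""
    labels = [row[label_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes that have many distinct values.
    split_info = entropy([row[attr_index] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


budget_score = gain_ratio(FILMS, 0)
language_score = gain_ratio(FILMS, 1)
best_attribute = "budget" if budget_score > language_score else "language"
```

On this invented data the budget band scores higher than language, so a tree built this way would split on budget first. A full C4.5 implementation would repeat the choice on each subset and also handle numeric attributes and pruning.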

Figure 1

Netflix is an American-based internet streaming service that provides on-demand TV programmes and films to its subscribers. Netflix uses data mining to its advantage by mining the films and TV programmes that each subscriber has watched, as well as the ratings that they gave. It then uses data mining techniques to find patterns in the data and produce recommendations for the subscriber. On October 2nd 2006 the 'Netflix Prize' began. The aim of the competition was for competitors to create a collaborative filtering algorithm that improved Netflix's prediction accuracy by 10%. The winners were the team BellKor's Pragmatic Chaos, who in 2009 achieved an improvement of 10.06%. Why did Netflix do this? Customer satisfaction and retention are key to Netflix, so a better recommendation system is valuable to it.
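As an illustration of what a collaborative filtering algorithm does, the sketch below predicts a subscriber's rating for an unseen title from the ratings of similar subscribers. It is a simple user-based approach with cosine similarity, not Netflix's system or the prize-winning method; the subscriber names, titles and ratings are invented.

```python
import math

# Hypothetical ratings (1-5 stars); subscriber and title names are invented.
RATINGS = {
    "ann": {"Film A": 5, "Film B": 4, "Film C": 1},
    "bob": {"Film A": 4, "Film B": 5, "Film C": 2, "Film D": 5},
    "cat": {"Film A": 1, "Film B": 2, "Film C": 5, "Film D": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the titles that both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, title, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings for the title."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or title not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[title]
        weight_total += weight
    # None signals that nobody comparable has rated the title.
    return weighted_sum / weight_total if weight_total else None


predicted = predict_rating("ann", "Film D")
```

Here "ann" rates films much as "bob" does, so the prediction for "Film D" is pulled towards bob's high rating and away from cat's low one.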

Classification is commonly used for predicting a precise outcome, such as a star rating or whether the user is likely to watch a TV programme or film.

Attribute importance is used to rank the strength of the relationship between an attribute and its target attribute, for example the budget of a film and how popular the film will be. The same can be done with the actors, actresses or directors involved with a film, to estimate how likely the film is to be popular based on those attributes.
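Ranking attributes by the strength of their relationship with a target can be sketched as follows. This uses the absolute Pearson correlation as a simple stand-in for whatever measure a data mining tool would apply; the attribute names and all figures are invented.

```python
import math

# Hypothetical per-film data; all figures are invented.
FILMS = [
    {"budget_musd": 10, "cast_size": 40, "popularity": 5.1},
    {"budget_musd": 50, "cast_size": 35, "popularity": 6.4},
    {"budget_musd": 90, "cast_size": 60, "popularity": 7.0},
    {"budget_musd": 150, "cast_size": 20, "popularity": 7.9},
    {"budget_musd": 200, "cast_size": 55, "popularity": 8.3},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_attributes(rows, target):
    """Sort the non-target attributes by absolute correlation with the target."""
    attributes = [key for key in rows[0] if key != target]
    target_values = [row[target] for row in rows]
    scores = {
        attr: abs(pearson([row[attr] for row in rows], target_values))
        for attr in attributes
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_attributes(FILMS, "popularity")
```

On this invented data the budget ranks well above cast size. Correlation only captures linear relationships between numeric attributes, so categorical attributes such as director would need a different measure.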

Clustering is used to find natural groups within a data set, for example movie genres, films by certain directors, and TV programmes or films that feature a specific actor or actress.

Anomaly detection is used to detect results that do not follow the normal pattern. A good example comes from the Netflix Prize, where the film ‘Napoleon Dynamite’ caused problems for the participants because users' ratings of it varied so widely. Some users rated the film poorly whereas others rated it very highly, making it very hard to predict how popular the film was going to be. Some contestants claimed to be out by eight-tenths of a star on average, but on films such as ‘Napoleon Dynamite’ they were off by an average of 1.2 stars.
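One simple way to spot titles that do not follow the normal pattern is to look at how widely their ratings are spread. The sketch below flags titles whose ratings have a high standard deviation. The ratings are invented (only the title ‘Napoleon Dynamite’ comes from the text) and the threshold of 1.5 stars is an arbitrary assumption.

```python
from statistics import pstdev

# Hypothetical star ratings. The 'Napoleon Dynamite' ratings are invented
# to mimic a love-it-or-hate-it pattern.
RATINGS = {
    "Film A": [4, 4, 5, 4, 3, 4],
    "Film B": [2, 3, 2, 3, 3, 2],
    "Napoleon Dynamite": [1, 5, 1, 5, 5, 1],
}


def polarising_titles(ratings, threshold=1.5):
    """Return titles whose rating spread (population std dev) exceeds the threshold."""
    return [title for title, stars in ratings.items() if pstdev(stars) > threshold]


flagged = polarising_titles(RATINGS)
```

Only the title with sharply divided ratings is flagged here. An average rating alone would hide this, since a film that half the audience loves and half hates can have the same mean as one everybody finds mediocre.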

Text analytics is the process of finding high-quality information or knowledge in a piece of text. This is done through the use of software such as Autonomy, AeroText and Medallia. These pieces of software analyse the text to find patterns and trends through statistical pattern learning. It is often estimated that around 80% of the world's information is stored in unstructured textual format.

We can analyse a film or TV programme's popularity by extracting reviews from websites such as Rotten Tomatoes, IMDb and Twitter. Both Rotten Tomatoes and Twitter provide APIs (application programming interfaces) that allow us to write a program that interacts with the data set and extracts the data that we need. IMDb, however, does not provide an API, meaning we would have to extract the data manually.

On Twitter we can search for a title by using its hashtag or any words that relate to it. For example, for the TV series Breaking Bad a user can type in Breaking Bad or #BreakingBad and see other users' opinions about it from around the world. If the user wants to be more specific and refine the results, they can search for Breaking Bad together with other key words such as good, amazing or terrible, and they will be presented with other people's reviews. Each tweet can be analysed to find key words and phrases that are commonly used, to get an understanding of the trends and patterns.
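The keyword analysis of tweets described above can be sketched with a simple word count. The tweets below are invented; in practice they would come from a search for the hashtag, which is not shown here. The function name is also an assumption.

```python
import re
from collections import Counter

# Hypothetical tweets standing in for the results of a hashtag search.
TWEETS = [
    "#BreakingBad is amazing, the acting is amazing",
    "Just finished #BreakingBad. Amazing finale!",
    "Not sure about #BreakingBad, the pacing was terrible",
]

# Opinion key words taken from the examples in the text.
OPINION_WORDS = {"good", "amazing", "terrible"}


def opinion_counts(tweets, keywords=OPINION_WORDS):
    """Count how often each opinion key word appears across the tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lower-case the tweet and split it into plain words.
        words = re.findall(r"[a-z']+", tweet.lower())
        counts.update(word for word in words if word in keywords)
    return counts


counts = opinion_counts(TWEETS)
```

Counting fixed key words is only a first step: it ignores negation ("not good") and sarcasm, which is where the statistical pattern learning used by text analytics software comes in.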