TED Talks – A Predictive Analysis Using Classification Algorithms


Paulami Ray (paulami2@Illinois.edu), Kumkum Yadav (kumkumy2@Illinois.edu), Garima Garg (garimag2@Illinois.edu)
School of Information Sciences, University of Illinois at Urbana-Champaign

Introduction:
TED talks are a rich source of knowledge and ideas on topics such as technology, entertainment, design, and academic research, presented by distinguished speakers.

Aim:
Predict the number of views of a talk.
Analyze the overall reaction to the talks based on user comments.

Dataset:
The dataset contains details of around 2,550 TED talks from 2006 to 2017.

Data Cleaning and Pre-processing:
Removed special characters such as '$', '/', and '^' from the data.
Converted dates to a consistent ddmmyyyy format.
Divided the data into binary form: 1 for values above the median and -1 for values below the median, giving high and low views.
Removed outliers from the data.
Categorized words into positive, negative, and neutral ratings.

Visualizations:
Fig 1: Top Ten Speakers
Fig 2: Number of Views on Talks per year
Fig 3: Number of Talks per year

Findings:
The number of views and the number of comments were correlated: talks with more views had more comments.
Most of the talks were on technology, and very few were on innovation.
The number of talks increased over the years.
The top ten speakers were mainly authors and motivational speakers.

Analysis:
Accuracy percentage for predictions:
Table 1: Number of views
Table 2: Reactions to talks
Confusion Matrix:
Fig 4: Number of views
Fig 5: Reaction to talks
Precision Recall Curve:
Fig 6: Precision Recall Curve for predicting number of views
Fig 7: Precision Recall Curve for predicting reaction of talks

Conclusions and Future Work:
We implemented and tested five classification models.
Logistic regression performed well in predicting the number of views of the talks.
The random forest algorithm gave the best accuracy for predicting the reaction to the talks.
Using this model, the research can be extended to datasets from media, advertising, and other online platforms to predict user reviews.

References:
[1] https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/
[2] http://dataaspirant.com/2016/09/24/classification-clustering-alogrithms/
[3] https://www.ted.com/topics/programming
[4] https://www.tableau.com/beginners-data-visualization
[5] www.kaggle.com/
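
As a rough illustration of three of the steps listed under Data Cleaning and Pre-processing (special-character removal, ddmmyyyy date formatting, and the median split into high and low views), the sketch below uses only the Python standard library. It is not the poster authors' code. The function names, the assumed ISO input date format, and the choice to label values equal to the median as -1 are all assumptions made for illustration; the poster does not say how ties or input dates were handled.

```python
import re
from datetime import datetime
from statistics import median


def strip_special(text):
    """Remove characters such as '$', '/', '^', keeping letters, digits and whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


def to_ddmmyyyy(iso_date):
    """Reformat a date string as ddmmyyyy.

    Assumption: the input is an ISO date such as '2006-06-27'.
    """
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%d%m%Y")


def label_by_median(values):
    """Return 1 for values above the median and -1 otherwise.

    Assumption: values equal to the median are labelled -1.
    """
    m = median(values)
    return [1 if v > m else -1 for v in values]


if __name__ == "__main__":
    # Hypothetical view counts, not taken from the TED dataset.
    views = [10, 20, 30, 40]
    print(strip_special("a$b/c^d"))
    print(to_ddmmyyyy("2006-06-27"))
    print(label_by_median(views))
```

The resulting 1 / -1 labels are the kind of binary target that classifiers such as logistic regression or random forest can then be trained on.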