TED Talks – A Predictive Analysis Using Classification Algorithms

Paulami Ray (paulami2@Illinois.edu), Kumkum Yadav (kumkumy2@Illinois.edu), Garima Garg (garimag2@Illinois.edu)
School of Information Sciences, University of Illinois at Urbana-Champaign

Introduction: TED talks are a rich source of knowledge and ideas on topics such as technology, entertainment, design, and academic research, presented by distinguished speakers.

Aim:
- Predict the number of views of a talk.
- Analyze the overall reaction to the talks based on user comments.

Dataset: The dataset contains details of around 2550 TED talks from 2006 to 2017.

Data Cleaning and Pre-processing:
- Removed special characters such as '$', '/', and '^' from the data.
- Converted dates to a consistent ddmmyyyy format.
- Converted view counts to binary labels: 1 for counts above the median and -1 for counts below the median, giving high-view and low-view classes.
- Removed outliers from the data.
- Categorized words into positive, negative, and neutral ratings.

Visualizations:
Fig 1: Top Ten Speakers
Fig 2: Number of Views on Talks per year
Fig 3: Number of Talks per year

Findings:
- The number of views and the number of comments were correlated: talks with more views also had more comments.
- Most of the talks were on technology, and very few were on innovation.
- The number of talks increased over the years.
- The top ten speakers were mainly authors and motivational speakers.

Analysis:
Confusion Matrix:
Fig 4: Number of views
Fig 5: Reaction to talks
Precision Recall Curve:
Fig 6: Precision Recall Curve for predicting number of views
Fig 7: Precision Recall Curve for predicting reaction of talks
Accuracy percentage for predictions:
Table 1: Number of views
Table 2: Reactions to talks

Conclusions and Future Work:
- We implemented and tested five classification models.
- Logistic regression performs well in predicting the number of views of the talks.
- The random forest algorithm gives the best accuracy for predicting the reaction to the talks.
- This approach could be extended to datasets from media, advertising, and other online platforms to predict user reviews.
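
The median split described under Data Cleaning and Pre-processing, and the confusion-matrix, precision, and recall figures named in the analysis, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sample view counts, and the sample predictions are all hypothetical, and labeling ties with the median as -1 is an assumption, since the poster does not say how ties were handled.

```python
from statistics import median


def binarize_by_median(views):
    """Label each view count 1 (above the median) or -1 (otherwise).

    Assumption: counts equal to the median get -1; the poster does not
    specify how ties were treated.
    """
    m = median(views)
    return [1 if v > m else -1 for v in views]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical view counts, not taken from the TED dataset.
views = [120, 950, 300, 4100, 780, 60]
labels = binarize_by_median(views)

# Hypothetical classifier output for the same six talks.
predicted = [-1, 1, 1, 1, -1, -1]

tp, fp, fn, tn = confusion_counts(labels, predicted)
precision, recall = precision_recall(tp, fp, fn)
accuracy = (tp + tn) / len(labels)
print(labels, (tp, fp, fn, tn), precision, recall, accuracy)
```

Splitting at the median gives two classes of roughly equal size, so plain accuracy is a reasonable first measure here; precision and recall become more informative when the classes are imbalanced.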