Introduction to Supervised Machine Learning Concepts PRESENTED BY B. Barla Cambazoglu February 21, 2014

Guest Lecturer’s Background 2

Lecture Outline 3  Basic concepts in supervised machine learning  Use case: Sentiment-focused web crawling

Basic Concepts

What is Machine Learning? 5  Wikipedia: “Machine learning is a branch of artificial intelligence, concerning the construction and study of systems that can learn from data.”  Arthur Samuel: “Field of study that gives computers the ability to learn without being explicitly programmed.”  Tom M. Mitchell: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Unsupervised versus Supervised Machine Learning 6  Unsupervised learning › Assumes unlabeled data (the desired output is not known) › Objective is to discover the structure in the data  Supervised learning › Trained on labeled data (the desired output is known) › Objective is to generate an output for previously unseen input data
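
To make the distinction concrete, here is a minimal sketch (assuming scikit-learn is available; the data points are made up for illustration):

    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    X = [[1.0, 2.0], [1.1, 1.9], [8.0, 9.0], [8.2, 9.1]]  # four instances, two features
    y = [0, 0, 1, 1]                                      # desired outputs (supervised case only)

    # Unsupervised: discover structure in the data without labels
    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

    # Supervised: train on labeled data, then generate an output for unseen input
    model = LogisticRegression().fit(X, y)
    print(model.predict([[7.9, 9.2]]))  # -> [1]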

Supervised Machine Learning Applications 7  Common › Spam filtering › Recommendation and ranking › Fraud detection › Stock price prediction  Not so common › Recognize the user of a mobile device based on how they hold and move the phone › Predict whether someone is a psychopath based on their Twitter usage › Identify whales in the ocean based on audio recordings › Predict in advance whether a product launch will be successful

Terminology 8  Instance  Label  Feature  Training set  Test set  Learning model  Accuracy Toy problem: predict the income level of a person based on his/her facial attributes.
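
In code, this terminology maps out roughly as follows (a sketch; the attribute names and values are invented for the toy problem):

    # Each instance is a vector of feature values; its label is the desired output.
    instances = [
        {"hair": "blonde", "gender": "male",   "beard_cm": 5},
        {"hair": "dark",   "gender": "female", "beard_cm": 0},
    ]
    labels = ["high income", "low income"]        # one label per instance

    training_set = list(zip(instances, labels))   # labeled data used to fit the model
    test_set = [{"hair": "bald", "gender": "male", "beard_cm": 0}]  # labels withheld;
    # accuracy is measured by comparing the model's predictions on the test set
    # against the withheld true labels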

Instances 9

Categorical Labels 10

Numeric Labels 11 (example: instances labeled with dollar incomes ranging from $1K to $12K)

Features 12 (example feature vectors, one per instance: Blonde, No, White, No, Male, 5cm; Bald, No, White, Yes, Male, 0cm; White, No, Black, Yes, Male, 3cm; Dark, Yes, White, No, Female, 12cm)

Training Set 13 (the labeled instances used to fit the model: Blonde, No, White, No, Male, 5cm; Bald, No, White, Yes, Male, 0cm; White, No, Black, Yes, Male, 3cm; Dark, Yes, White, No, Female, 12cm)

Test Set 14 (previously unseen instances whose labels are to be predicted: Dark, No, White, No, Female, 14cm; Dark, No, White, Yes, Male, 6cm; Dark, No, Black, No, Male, 6cm; Dark, Yes, White, No, Female, 15cm)

Training and Testing 15 (diagram: the set of training instances is fed to the training step to build a model; the testing step applies the model to a test instance to produce a prediction)

Accuracy 16 (example: four actual labels compared with four predicted labels) Accuracy = # of correct predictions / total # of predictions = 2 / 4 = 50%
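
The computation is simple counting; a minimal sketch (the label values are made up to reproduce the 2/4 example):

    def accuracy(actual, predicted):
        correct = sum(a == p for a, p in zip(actual, predicted))
        return correct / len(actual)

    # Two of the four predictions match the actual labels
    print(accuracy(["rich", "poor", "rich", "poor"],
                   ["rich", "rich", "poor", "poor"]))  # 0.5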

Precision and Recall 17  In certain cases, there are two class labels, and predicting one class correctly is more important than predicting the other.  A good example is top-k ranking in web search.  Performance measures: › Recall: the fraction of actual positive instances that the classifier predicts as positive › Precision: the fraction of predicted positive instances that are actually positive
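
Both measures reduce to counting true positives, false positives, and false negatives; a sketch for the two-class case, where `positive` names the class we care about:

    def precision_recall(actual, predicted, positive):
        tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
        fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
        fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0  # correctness of positive predictions
        recall = tp / (tp + fn) if tp + fn else 0.0     # coverage of actual positives
        return precision, recall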

Some Practical Issues 18  Problem: Missing feature values  Solution: › Training: Use the most frequently observed (or average) feature value in the instance’s class. › Testing: Use the most frequently observed (or average) feature value in the entire training set.  Problem: Class imbalance  Solution: › Oversampling: Duplicate the training instances in the smaller class › Undersampling: Use fewer instances from the bigger class
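
Both fixes are a few lines each; a sketch under the assumption that instances are dicts with a numeric "feature" field and a "label" field (the field names are invented):

    import random
    from collections import Counter
    from statistics import mean

    def impute_train(value, instance_class, train_rows):
        # Missing value at training time: use the class-conditional average
        if value is not None:
            return value
        same_class = [r["feature"] for r in train_rows
                      if r["label"] == instance_class and r["feature"] is not None]
        return mean(same_class)

    def oversample(rows):
        # Duplicate instances of the smaller classes until every class
        # matches the size of the largest one
        counts = Counter(r["label"] for r in rows)
        target = max(counts.values())
        balanced = list(rows)
        for label, n in counts.items():
            members = [r for r in rows if r["label"] == label]
            balanced += random.choices(members, k=target - n)
        return balanced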

Majority Classifier 19  Training: Find the class with the largest number of instances.  Testing: For every test instance, predict that class as the label, independent of the features of the test instance. (diagram: class sizes are counted during training; testing outputs the largest class as the prediction)
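
Despite its simplicity, this classifier is a useful baseline; a sketch:

    from collections import Counter

    class MajorityClassifier:
        def fit(self, X, y):
            # Training: remember the class with the largest number of instances
            self.majority = Counter(y).most_common(1)[0][0]
            return self

        def predict(self, X):
            # Testing: always predict that class, ignoring the features
            return [self.majority for _ in X]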

k-Nearest Neighbor Classifier 20  Training: None! (known as a lazy classifier).  Testing: Find the k instances that are most similar to the test instance and use majority voting to decide on the label. (illustration with k = 3)
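
A sketch with numeric feature vectors and squared Euclidean distance (other similarity functions work equally well):

    from collections import Counter

    def knn_predict(train_X, train_y, x, k=3):
        # No training step: the labeled instances are simply stored
        dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        nearest = sorted(zip(train_X, train_y), key=lambda t: dist(t[0], x))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]  # majority vote among the k neighbors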

Decision Tree Classifier 21  Training: Build a tree where leaves represent labels and branches represent features that lead to those labels.  Testing: Traverse the tree using the feature values of the test instance. (diagram: a small tree branching on feature values such as Black / Not black, ending in Yes / No leaves)
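
In practice the tree is usually learned with an off-the-shelf implementation; a minimal sketch with scikit-learn (the feature encoding is made up):

    from sklearn.tree import DecisionTreeClassifier

    # Categorical features encoded as integers, e.g. hair color and gender
    X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y_train = ["no", "no", "yes", "yes"]

    tree = DecisionTreeClassifier().fit(X_train, y_train)
    print(tree.predict([[1, 0]]))  # traverses the learned branches to a leaf label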

Naïve Bayes Classifier 22  Training: For every feature value v and class c pair, we compute and store in a lookup table the conditional probability P(v | c).  Testing: For each class c, we compute P(c) · Π P(v | c) over the feature values v of the test instance, and predict the class with the highest score. (example: scores of 0.40, 0.65, and 0.78 computed for three candidate classes)
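
A sketch of both steps for categorical features (a small probability floor stands in for proper smoothing):

    from collections import Counter, defaultdict

    def train_nb(instances, labels):
        # Lookup tables: class priors P(c) and conditionals P(v | c)
        class_counts = Counter(labels)
        prior = {c: n / len(labels) for c, n in class_counts.items()}
        counts = defaultdict(Counter)
        for x, c in zip(instances, labels):
            for i, v in enumerate(x):
                counts[c][(i, v)] += 1
        cond = {c: {fv: n / class_counts[c] for fv, n in t.items()}
                for c, t in counts.items()}
        return prior, cond

    def predict_nb(prior, cond, x):
        # Score each class by P(c) * product of P(v | c); pick the highest
        def score(c):
            s = prior[c]
            for i, v in enumerate(x):
                s *= cond[c].get((i, v), 1e-6)  # floor for unseen feature values
            return s
        return max(prior, key=score)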

Other Commonly Used Classifiers 23  Support vector machines  Boosted decision trees  Neural networks

Use Case: Sentiment-Focused Web Crawling G. Vural, B. B. Cambazoglu, and P. Senkul, “Sentiment-focused web crawling”, CIKM’12.

Problem 25  Early discovery of opinionated content on the Web is important.  Use cases › Measuring brand loyalty or product adoption › Politics › Finance  We would like to design a sentiment-focused web crawler that maximizes the amount of sentimental/opinionated content fetched from the Web within a given amount of time.

Web Crawling 26  Subspaces › Downloaded pages › Discovered pages › Undiscovered pages

Sentiment-Focused Web Crawling 27  Challenge: to predict the sentimentality of an “unseen” web page, i.e., without having access to the page content.

Features 28  Assumption: Sentimental pages are more likely to be linked by other sentimental pages.  Idea: Build a learning model using features extracted from › Textual content of referring pages › Anchor text on the hyperlinks › URL of the target page
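
A hypothetical sketch of how such a feature vector might be assembled (the feature names and the opinion-word list are invented for illustration; the paper’s actual features differ):

    def link_features(referring_texts, anchor_texts, target_url, sentiment_score):
        # sentiment_score is assumed to map a text to a numeric score
        return {
            "avg_referring_sentiment": sum(map(sentiment_score, referring_texts))
                                       / len(referring_texts),
            "avg_anchor_sentiment": sum(map(sentiment_score, anchor_texts))
                                    / len(anchor_texts),
            "url_has_opinion_word": any(w in target_url.lower()
                                        for w in ("review", "opinion", "blog")),
        }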

Labels 29  Our data (ClueWeb09-B) lacks ground-truth sentiment scores.  We created a ground truth using the SentiStrength tool. › Assigns a sentiment score (between 0 and 8) to each web page as its label.  A small-scale user study was conducted with three judges to verify the suitability of this ground truth. › 500 random pages sampled from the collection. › Pages were labeled as sentimental or not sentimental.  Observations › 22% of the pages were labeled as sentimental. › High agreement between judges: the overlap is above 85%.

Learner and Performance Metric 30  As the learner, we use the LibSVM software in regression mode.  We rebuild the prediction model at regular intervals throughout the crawling process.  As the main performance metric, we compute the total sentimentality score accumulated after fetching a certain number of pages.
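
A sketch of the learning step using scikit-learn’s SVR, which wraps the same libsvm library (the feature values and scores are made up):

    from sklearn.svm import SVR

    # X: link-based feature vectors of discovered pages; y: sentiment scores
    X_train = [[0.4, 0.2, 1.0], [0.1, 0.0, 0.0], [0.7, 0.5, 1.0]]
    y_train = [5.0, 1.0, 7.0]   # SentiStrength-style scores as training labels

    model = SVR().fit(X_train, y_train)        # epsilon-SVR, LibSVM's regression mode
    scores = model.predict([[0.6, 0.3, 1.0]])  # predicted sentimentality of a
                                               # discovered-but-not-downloaded page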

Evaluated Crawlers 31  Proposed crawlers › based on the average sentiment score of referring page content › based on machine learning  Oracle crawlers › highest sentiment score › highest spam score › highest PageRank  Baseline crawlers › random › indegree-based › breadth first

Performance 32