Prepared by: Mahmoud Rafeek Al-Farra, College of Science & Technology, Dept. of Computer Science & IT, BSc of Information Technology. Data Mining, 2013. www.cst.ps/staff/mfarra


Data Mining, Chapter 2_1: Data Preparation and Preprocessing Case Study

Course Outline
- Introduction
- Data Preparation and Preprocessing
- Association Rules
- Classification Methods
- Evaluation
- Clustering Methods
- Mid Exam
- Knowledge Representation
- Special Case Study: Document Clustering
- Discussion of Case Studies by Students

Consider the following instances
The documents before preprocessing are the following:

Document 1:
- Palestine freedom requires all Muslims.
- All Muslims must pray five times every day.
- Palestinians and Muslims are persecuted by United Nations.

Document 2:
- Freedom for Palestine.
- Palestine is a holy land for all Muslims.
- The legal right of Palestine for Muslims.
- I am proud to be Muslim.

Document 3:
- Support our legal rights to Palestine.
- I am proud to be from Palestine.

After the preprocessing
- After the documents pass through the preprocessing steps, many words are removed (e.g., our, to, am, the, five, and so on).
- Other words are reduced to their stems (e.g., Muslims becomes Muslim, persecuted becomes persecut, and so on).

After the preprocessing
After the preprocessing steps, the three documents are as follows:

Document 1:
- Palestin freedom requir all Muslim.
- All Muslim pray.
- Palestin Muslim persecut unit nation.

Document 2:
- Freedom Palestin.
- Palestin holy land all Muslim.
- Legal right Palestin Muslim.
- Proud Muslim.

Document 3:
- Support legal right Palestin.
- Proud Palestin.
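The two steps above, stopword removal and stemming, can be sketched in a few lines of Python. This is only an illustration: the stopword list and the suffix rules are assumptions chosen by hand so that the output matches the stemmed documents on the slide, and the names `STOPWORDS`, `SUFFIXES`, `stem`, and `preprocess` are hypothetical. The slides do not say which stopword list or stemming algorithm was actually used.

```python
import re

# Assumed stopword list: the slide names only "our, to, am, the, five";
# the remaining entries are guesses based on the words missing from the result.
STOPWORDS = {"our", "to", "am", "the", "five", "i", "is", "a", "for", "of",
             "be", "and", "are", "by", "from", "must", "times", "every", "day"}

# Toy suffix rules tried in order; this is NOT a full stemming algorithm.
SUFFIXES = ("ians", "ed", "es", "s", "e")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Lowercase, tokenize, drop stopwords, then stem the remaining words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [stem(token) for token in tokens if token not in STOPWORDS]


print(preprocess("Palestinians and Muslims are persecuted by United Nations."))
# prints ['palestin', 'muslim', 'persecut', 'unit', 'nation']
```

The sketch lowercases every token, so capitalization differs from the slide. For real data, an established stemmer and a standard stopword list would be a safer choice than hand-written rules.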

Then … representation
One possible way is to represent each document as a vector over the items, which our application then uses:

        item1  item2  item3  item4
Doc1      0      1      1      1
Doc2      1      1      1      1
Doc3      1      1      0      0
Doc4      0      1      1      0
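As a rough sketch of this representation, the three preprocessed documents from the case study can be turned into 0/1 vectors, one position per distinct term. It assumes that the table entries mark presence or absence of an item in a document, and the names `docs`, `build_vocabulary`, and `to_binary_vector` are hypothetical; the slides do not describe how the application builds its vectors.

```python
# Preprocessed documents from the case study, lowercased, one string each.
docs = {
    "Doc1": "palestin freedom requir all muslim all muslim pray "
            "palestin muslim persecut unit nation",
    "Doc2": "freedom palestin palestin holy land all muslim "
            "legal right palestin muslim proud muslim",
    "Doc3": "support legal right palestin proud palestin",
}


def build_vocabulary(documents):
    """Collect every distinct term and sort it to fix the column order."""
    return sorted({term for text in documents.values() for term in text.split()})


def to_binary_vector(text, vocabulary):
    """Return 1 for each vocabulary term present in the text, else 0."""
    present = set(text.split())
    return [1 if term in present else 0 for term in vocabulary]


vocabulary = build_vocabulary(docs)
vectors = {name: to_binary_vector(text, vocabulary) for name, text in docs.items()}

print(vocabulary)
for name, vector in vectors.items():
    print(name, vector)
```

Each row printed here plays the role of one table row, with the vocabulary terms standing in for item1, item2, and so on. Term counts or weights could replace the 0/1 entries, but that goes beyond what the slide shows.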

Thanks