Data Mining 2013
Prepared by: Mahmoud Rafeek Al-Farra
College of Science & Technology, Dept. of Computer Science & IT, BSc of Information Technology

1 Chapter 2_1: Data Preparation and Preprocessing Case Study

2 Course Outline
- Introduction
- Data Preparation and Preprocessing
- Association Rules
- Classification Methods
- Evaluation
- Clustering Methods
- Mid Exam
- Knowledge Representation
- Special case study: Document clustering
- Discussion of case studies by students

3 Consider the following instances
The documents before preprocessing are the following:
Document 1:
- Palestine freedom requires all Muslims.
- All Muslims must pray five times every day.
- Palestinians and Muslims are persecuted by United Nations.
Document 2:
- Freedom for Palestine.
- Palestine is a holy land for all Muslims.
- The legal right of Palestine for Muslims.
- I am proud to be Muslim.
Document 3:
- Support our legal rights to Palestine.
- I am proud to be from Palestine.

4 After the preprocessing
- After the documents pass through the preprocessing steps, many words are removed (e.g. our, to, am, the, five, and so on).
- Other words are stemmed to their roots (e.g. Muslims is stemmed to Muslim, persecuted to persecut, and so on).

5 After the preprocessing
After the preprocessing steps, the three documents are as follows:
Document 1:
- Palestin freedom requir all Muslim.
- All Muslim pray.
- Palestin Muslim persecut unit nation.
Document 2:
- Freedom Palestin.
- Palestin holy land all Muslim.
- Legal right Palestin Muslim.
- Proud Muslim.
Document 3:
- Support legal right Palestin.
- Proud Palestin.
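The word removal and stemming described in these slides can be sketched in a few lines of Python. This is only an illustration: the `STOP_WORDS` set and the suffix rules in `toy_stem` are assumptions chosen to reproduce the example documents, not the course's actual tool, and a real project would more likely use an established stemmer such as Porter's. Tokens are lowercased, which the slides do not do.

```python
import re

# Hypothetical stop-word list, picked so the output matches the slide example.
STOP_WORDS = {
    "our", "to", "am", "the", "five", "i", "be", "is", "a", "for", "of",
    "are", "by", "and", "from", "must", "times", "every", "day",
}

# Toy suffix-stripping rules (an assumption, not the Porter algorithm).
SUFFIXES = ("ians", "es", "ed", "s", "e")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    for line in [
        "Palestine freedom requires all Muslims.",
        "All Muslims must pray five times every day.",
        "Palestinians and Muslims are persecuted by United Nations.",
    ]:
        print(" ".join(preprocess(line)))
```

Run on the three sentences of Document 1, the sketch yields the same stems as the slide, apart from the lowercasing.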

6 Then … representation
One possible way: our application then represents each document as a vector, with one entry per item.

        item1  item2  item3  item4
Doc1      0      1      1      1
Doc2      1      1      1      1
Doc3      1      1      0      0
Doc4      0      1      1      0
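As a rough sketch of this vector representation, the three preprocessed documents can be turned into binary vectors, where each position marks whether a term occurs in the document. The lowercased stem strings, the alphabetical vocabulary order, and the names `docs`, `vocabulary` and `vectors` are assumptions made for this illustration; the slide's four-item table is a generic example and is not reproduced here.

```python
# Preprocessed documents from the previous slide, lowercased and joined
# into one string per document (an assumption made for this sketch).
docs = {
    "Doc1": "palestin freedom requir all muslim all muslim pray "
            "palestin muslim persecut unit nation",
    "Doc2": "freedom palestin palestin holy land all muslim "
            "legal right palestin muslim proud muslim",
    "Doc3": "support legal right palestin proud palestin",
}

# Vocabulary: every distinct term across all documents, in sorted order.
vocabulary = sorted({term for text in docs.values() for term in text.split()})

# Binary vector per document: 1 if the term occurs in it, otherwise 0.
vectors = {
    name: [1 if term in set(text.split()) else 0 for term in vocabulary]
    for name, text in docs.items()
}

if __name__ == "__main__":
    print(vocabulary)
    for name, vector in vectors.items():
        print(name, vector)
```

A binary encoding is the simplest choice and matches the 0/1 entries in the table; term counts or weighted scores are common alternatives when frequency matters.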

7 Thanks
