Prepared by: Mahmoud Rafeek Al-Farra
College of Science & Technology, Dep. of Computer Science & IT, BCs of Information Technology
Data Mining 2013
www.cst.ps/staff/mfarra
Chapter 2_1: Data Preparation and Preprocessing Case Study
Course Outline
- Introduction
- Data Preparation and Preprocessing
- Association Rules
- Classification Methods
- Evaluation
- Clustering Methods
- Mid Exam
- Knowledge Representation
- Special Case Study: Document Clustering
- Discussion of Case Studies by Students
Consider the following instances
The documents before preprocessing are the following:
Document 1: Palestine freedom requires all Muslims. All Muslims must pray five times every day. Palestinians and Muslims are persecuted by United Nations.
Document 2: Freedom for Palestine. Palestine is a holy land for all Muslims. The legal right of Palestine for Muslims. I am proud to be Muslim.
Document 3: Support our legal rights to Palestine. I am proud to be from Palestine.
After the preprocessing
Passing the documents through the preprocessing steps removes many words (e.g. our, to, am, the, five, and so on). Other words are stemmed to their roots (e.g. Muslims is stemmed to Muslim, persecuted to persecut, and so on).
After the preprocessing steps, the three documents are as follows:
Document 1: Palestin freedom requir all Muslim. All Muslim pray. Palestin Muslim persecut unit nation.
Document 2: Freedom Palestin. Palestin holy land all Muslim. Legal right Palestin Muslim. Proud Muslim.
Document 3: Support legal right Palestin. Proud Palestin.
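The two preprocessing steps (stop-word removal, then stemming) can be sketched in a few lines of Python. This is only an illustration: the stop-word list and the suffix rules below are assumptions chosen to fit these three documents, and the toy `stem` function is not the Porter stemmer or whatever stemmer the course actually uses.

```python
import re

# Assumed stop-word list: just the words that disappear in the example above.
STOP_WORDS = {
    "our", "to", "am", "the", "five", "and", "are", "by", "for", "is",
    "a", "of", "i", "be", "from", "must", "times", "every", "day",
}

# Toy suffix rules, tried in this order (an assumption, not a real stemmer).
SUFFIXES = ("ians", "es", "ed", "s", "e")


def stem(word, min_len=3):
    """Strip the first matching suffix, keeping at least min_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


docs = {
    "Document 1": "Palestine freedom requires all Muslims. All Muslims must pray "
                  "five times every day. Palestinians and Muslims are persecuted "
                  "by United Nations.",
    "Document 2": "Freedom for Palestine. Palestine is a holy land for all Muslims. "
                  "The legal right of Palestine for Muslims. I am proud to be Muslim.",
    "Document 3": "Support our legal rights to Palestine. I am proud to be from "
                  "Palestine.",
}

processed = {name: preprocess(text) for name, text in docs.items()}
for name, tokens in processed.items():
    print(name + ":", " ".join(tokens))
```

The sketch lowercases everything and drops punctuation, so its output matches the processed documents above only up to capitalization and sentence breaks. A real application would normally rely on a standard stop-word list and an established stemming algorithm rather than hand-picked rules.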
Then … representation
One possible way: our application then treats each document as a vector.

        item1  item2  item3  item4
Doc1      0      1      1      1
Doc2      1      1      1      1
Doc3      1      1      0      0
Doc4      0      1      1      0
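As a rough sketch of this vector representation, the following Python builds one 0/1 vector per document from the stemmed documents of the case study. It assumes that each item is a distinct stemmed term and that an entry is 1 when the term occurs in the document and 0 otherwise; the names `docs`, `vocabulary`, and `vectors` are illustrative, and the alphabetical ordering of the terms is an arbitrary choice.

```python
# Stemmed documents from the case study, lowercased and split into terms.
docs = {
    "Doc1": "palestin freedom requir all muslim all muslim pray "
            "palestin muslim persecut unit nation".split(),
    "Doc2": "freedom palestin palestin holy land all muslim "
            "legal right palestin muslim proud muslim".split(),
    "Doc3": "support legal right palestin proud palestin".split(),
}

# The items (vector dimensions): every distinct term, in alphabetical order.
vocabulary = sorted({term for tokens in docs.values() for term in tokens})

# One binary vector per document: 1 if the term occurs in it, else 0.
vectors = {}
for name, tokens in docs.items():
    present = set(tokens)
    vectors[name] = [1 if term in present else 0 for term in vocabulary]

print("items:", vocabulary)
for name, vector in vectors.items():
    print(name, vector)
```

Binary presence is only one option, in line with the "one possible way" remark; term counts or weighted values could fill the same vectors without changing the overall structure.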