2/25/13 - Union University 1 ADVENTURES IN DATA MINING Margaret H. Dunham Southern Methodist University Dallas, Texas 75275 This material.

ADVENTURES IN DATA MINING
Margaret H. Dunham
Southern Methodist University
Dallas, Texas 75275
mhd@lyle.smu.edu
ACM Distinguished Speakers Program

This material is based in part upon work supported by the National Science Foundation under Grant No. 9820841 and NIH Grant No. 1R21HG005912-01A1. Some slides used by permission from Dr. Eamonn Keogh, University of California Riverside (eamonn@cs.ucr.edu).

The 2000 ozone hole over the Antarctic, as seen by EP/TOMS: http://jwocky.gsfc.nasa.gov/multi/multi.html#hole

Data Mining Outline
- Introduction
- Techniques
  - Classification
  - Clustering
  - Association Rules
- Examples

Explore some interesting data mining applications.

Introduction
- Data is growing at a phenomenal rate
- Users expect more sophisticated information
- How?

UNCOVER HIDDEN INFORMATION: DATA MINING

But It Isn’t Magic
- You must know what you are looking for
- You must know how to look for it

Suppose you knew that a specific cave had gold: What would you look for? How would you look for it? You might need an expert miner.

CLASSIFICATION
- Assign data to predefined groups or classes.

“If it looks like a duck, walks like a duck, and quacks like a duck, then it’s a duck.”
[Diagram labels: Description, Behavior, Associations, Classification, Clustering, Link Analysis, Profiling, Similarity]
“If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then it’s a terrorist.”

Classification Ex: Grading
[Decision tree on score x]
- x >= 90: A
- x < 90, x >= 80: B
- x < 80, x >= 70: C
- x < 70, x >= 60: D
- x < 60: F
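The grading tree can be sketched as nested tests, one per internal node, where each branch is reached only when the tests above it have failed. This is a minimal illustration assuming the 90/80/70/60 cutoffs shown on the slide; the function name `grade` is hypothetical.

```python
def grade(x: float) -> str:
    """Classify a numeric score into a letter grade by walking the tree top-down."""
    if x >= 90:
        return "A"
    if x >= 80:   # reached only when x < 90
        return "B"
    if x >= 70:   # reached only when x < 80
        return "C"
    if x >= 60:   # reached only when x < 70
        return "D"
    return "F"    # x < 60


if __name__ == "__main__":
    for score in (95, 85, 72, 61, 40):
        print(score, grade(score))
```

Each test only checks the lower bound because the upper bound is already implied by the branch taken at the previous node.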

Grasshoppers vs. Katydids
Given a collection of annotated data (in this case, five instances of katydids and five of grasshoppers), decide what type of insect the unlabeled example is.
(c) Eamonn Keogh, eamonn@cs.ucr.edu

Insect ID   Abdomen Length   Antennae Length   Insect Class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid
11          5.1              7.0               ???????

The classification problem can now be expressed as: given a training database, predict the class label of a previously unseen instance (here, instance 11).
(c) Eamonn Keogh, eamonn@cs.ucr.edu
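The slides do not say which classifier is used on this table, so the following is only one simple possibility: a k-nearest-neighbor sketch that labels the unseen insect by majority vote among the closest training instances under Euclidean distance. The names `TRAINING` and `knn_classify` are hypothetical, and the choice of k-NN is an assumption.

```python
import math
from collections import Counter

# Training database from the table above: (abdomen length, antennae length, class).
TRAINING = [
    (2.7, 5.5, "Grasshopper"),
    (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"),
    (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),
    (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),
    (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),
    (8.1, 4.7, "Katydid"),
]


def knn_classify(abdomen, antennae, k=1):
    """Predict a class label by majority vote among the k nearest training instances."""
    ranked = sorted(
        TRAINING,
        key=lambda row: math.dist((abdomen, antennae), (row[0], row[1])),
    )
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify(5.1, 7.0))       # → Katydid (nearest is instance 7)
    print(knn_classify(5.1, 7.0, k=3))  # → Katydid (instances 7, 5, 1)
```

For instance 11 the three nearest neighbors are instances 7 and 5 (katydids) and 1 (a grasshopper), so both k = 1 and k = 3 predict Katydid.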

[Scatter plot: Antenna Length vs. Abdomen Length, Grasshoppers and Katydids]
(c) Eamonn Keogh, eamonn@cs.ucr.edu

How Stuff Works, “Facial Recognition,” http://computer.howstuffworks.com/facial-recognition1.htm

Facial Recognition
(c) Eamonn Keogh, eamonn@cs.ucr.edu

Handwriting Recognition
[Figure: George Washington Manuscript]
(c) Eamonn Keogh, eamonn@cs.ucr.edu

Rare Event Detection

Dallas Morning News, October 7, 2005

Classification Performance
[Figure: True Positive, True Negative, False Positive, False Negative]
© Prentice Hall
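The four outcomes on this slide can be tallied directly from actual and predicted labels once one class is designated "positive". A minimal sketch, assuming a two-class problem; `confusion_counts`, `accuracy`, and the fraud/ok labels in the usage example are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Tally TP, TN, FP, FN for a two-class problem, given the label treated as positive."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def accuracy(counts):
    """Fraction of all predictions that were correct (TP + TN over the total)."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical labels for five credit card transactions.
    actual = ["fraud", "ok", "ok", "fraud", "ok"]
    predicted = ["fraud", "fraud", "ok", "ok", "ok"]
    counts = confusion_counts(actual, predicted, positive="fraud")
    print(counts)            # {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}
    print(accuracy(counts))  # 0.6
```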

Behavior Based Classification/Prediction
- Credit Card Fraud Detection
- Credit Score
- Home Mortgage Approval

CLUSTERING
- Partition data into previously undefined groups.

http://149.170.199.144/multivar/ca.htm

What is Similarity?
(c) Eamonn Keogh, eamonn@cs.ucr.edu

Two Types of Clustering
- Hierarchical
- Partitional

(c) Eamonn Keogh, eamonn@cs.ucr.edu
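As one illustration of partitional clustering, here is a sketch of Lloyd's k-means algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The slide does not name a specific algorithm; k-means is an assumed, commonly used example, and `kmeans` plus the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return assignment, centroids


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
    labels, centers = kmeans(pts, k=2)
    print(labels, centers)
```

Unlike hierarchical clustering, k requires choosing the number of groups up front, and the result can depend on the initial centroids (fixed here by `seed`).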

Hierarchical Clustering Example: Iris Data Set
[Figure: clusters labeled Setosa, Versicolor, Virginica]
The data originally appeared in Fisher, R. A. (1936). “The Use of Multiple Measurements in Taxonomic Problems,” Annals of Eugenics 7, 179-188.
Hierarchical Clustering Explorer Version 3.0, Human-Computer Interaction Lab, University of Maryland, http://www.cs.umd.edu/hcil/multi-cluster
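Hierarchical (agglomerative) clustering starts with every point in its own cluster and repeatedly merges the two closest clusters; the sequence of merges is what a dendrogram draws. A minimal sketch using single-link distance (closest pair of members), which is an assumed choice since the slide does not name a linkage; `single_link` and the toy points are hypothetical, and the Iris data is not reproduced here.

```python
import math
from itertools import combinations


def _single_link_distance(points, a, b):
    """Distance between clusters a and b: the closest pair of their members."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def single_link(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    Each cluster is a frozenset of point indices. Reading the history in order
    traces the dendrogram from the leaves up to the root.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: _single_link_distance(points, *pair),
        )
        history.append((a, b, _single_link_distance(points, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


if __name__ == "__main__":
    for a, b, d in single_link([(0, 0), (0, 1), (5, 5), (5, 6.5)]):
        print(sorted(a), sorted(b), round(d, 3))
```

This brute-force version recomputes all pairwise distances at each step, which is fine for a handful of points but not for large data sets.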

ASSOCIATION RULES/LINK ANALYSIS
- Find relationships between data

ASSOCIATION RULES EXAMPLES
- People who buy diapers also buy beer
- If gene A is highly expressed in this disease, then gene B is also expressed
- Relationships between people
- Book Stores
- Department Stores
- Advertising
- Product Placement
- http://www.amazon.com/Data-Mining-Introductory-Advanced-Topics/dp/0130888923/ref=sr_1_1?ie=UTF8&s=books&qid=1235564485&sr=1-1
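A rule such as "people who buy diapers also buy beer" is commonly scored by its support (how often the items appear together) and its confidence (how often the consequent appears when the antecedent does). The slide does not define these measures; they are the standard ones for association rules, and the shopping baskets below are invented purely for illustration. Function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A ∪ B) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


if __name__ == "__main__":
    # Hypothetical market baskets, not real data.
    baskets = [
        {"diapers", "beer", "milk"},
        {"diapers", "beer"},
        {"diapers", "bread"},
        {"beer", "chips"},
        {"milk", "bread"},
    ]
    print(support(baskets, {"diapers", "beer"}))       # 0.4
    print(confidence(baskets, {"diapers"}, {"beer"}))  # about 0.667
```

Algorithms such as Apriori use a minimum support threshold to prune candidate itemsets before computing confidence for the surviving rules.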

Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall, 2003.
DILBERT reprinted by permission of United Feature Syndicate, Inc.

Data Mining Outline
- Introduction
- Techniques
- Examples
  - Vision Mining
  - Law Enforcement (Cheating, Plagiarism, Fraud, Criminal Behavior, …)
  - Bioinformatics

Vision Mining
- License Plate Recognition
  - Red Light Cameras
  - Toll Booths
  - http://www.licenseplaterecognition.com/
- Computer Vision
  - http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/vid/

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

No/Little Cheating

Rampant Cheating

Jialun Qin, Jennifer J. Xu, Daning Hu, Marc Sageman, and Hsinchun Chen, “Analyzing Terrorist Networks: A Case Study of the Global Salafi Jihad Network,” Lecture Notes in Computer Science, vol. 3495, Springer-Verlag GmbH, 2005, p. 287.

ArnetMiner
- http://arnetminer.org/

DNA
- Basic building blocks of organisms
- Located in nucleus of cells
- Composed of 4 nucleotides
- Two strands bound together

http://www.visionlearning.com/library/module_viewer.php?mid=63

Central Dogma: DNA -> RNA -> Protein
[Diagram: DNA -(transcription)-> RNA -(translation)-> Protein]
DNA: CCTGAGCCAACTATTGATGAA
RNA: CCUGAGCCAACUAUUGAUGAA
Protein: chain of amino acids
Source: www.bioalgorithms.info; chapter 6, Gene Prediction
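The two steps on this slide can be sketched in a few lines: transcription rewrites the DNA coding strand with U in place of T, and translation reads the RNA three bases (one codon) at a time and maps each codon to an amino acid. This sketch assumes the standard genetic code and a reading frame starting at the first base; `transcribe`, `translate`, and `CODON_TABLE` are hypothetical names.

```python
BASES = "UCAG"
# Standard genetic code with codons enumerated in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Return the mRNA matching the DNA coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate mRNA codon by codon into one-letter amino acid codes, stopping at a stop codon."""
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("CCTGAGCCAACTATTGATGAA")
    print(rna)             # CCUGAGCCAACUAUUGAUGAA
    print(translate(rna))  # PEPTIDE
```

Read in this frame, the slide's 21-base sequence gives the seven codons CCU GAG CCA ACU AUU GAU GAA, whose one-letter amino acid codes spell PEPTIDE.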

Human Genome
- Scientists originally thought there would be about 100,000 genes
- There appear to be only about 20,000
- WHY?
- The human genome is almost identical to that of chimps. What makes the difference?
- The answers appear to lie in the noncoding regions of the DNA (formerly thought to be junk)

RNAi – Nobel Prize in Medicine 2006
[Diagram: double-stranded RNA -> short interfering RNA (siRNA, ~20-25 nt) -> RNA-induced silencing complex (RISC) -> binds to mRNA -> cuts RNA]
siRNA may be artificially added to the cell!
Image source: http://nobelprize.org/nobel_prizes/medicine/laureates/2006/adv.html, Advanced Information, Image 3

miRNA
- Short (20-25 nt) sequence of noncoding RNA
- Known since 1993, but significance not widely appreciated until 2001
- Impact/prevent translation of mRNA
- Generally reduce protein levels without impacting mRNA levels (animal cells)
- Functions
  - Cause some cancers
  - Guide embryo development
  - Regulate cell differentiation
  - Associated with HIV
  - …

TCGR – Mature miRNA (Window=5; Pattern=3)
[Figure panels: All Mature, Mus musculus, Homo sapiens, C. elegans]

TCGRs for Xue Training Data
[Figure panels: POSITIVE, NEGATIVE]
C. Xue, F. Li, T. He, G. Liu, Y. Li, and X. Zhang, “Classification of Real and Pseudo MicroRNA Precursors Using Local Structure-Sequence Features and Support Vector Machine,” BMC Bioinformatics, vol. 6, no. 310.

Affymetrix GeneChip® Array
http://www.affymetrix.com/corporate/outreach/lesson_plan/educator_resources.affx

BIG BROTHER?
- Total Information Awareness
  - http://en.wikipedia.org/wiki/Information_Awareness_Office
- Terror Watch List
  - http://www.businessweek.com/technology/content/may2005/tc20050511_8047_tc_210.htm
  - http://www.theregister.co.uk/2004/08/19/senator_on_terror_watch/
  - http://blog.wired.com/27bstroke6/2008/02/us-terror-watch.html
- CAPPS
  - http://en.wikipedia.org/wiki/CAPPS

http://ieeexplore.ieee.org/iel5/6/32236/01502526.pdf?tp=&arnumber=1502526&isnumber=32236

My DM Toolbelt
- C, C++
- Perl, Ruby
- Weka
- R, SAS
- Excel, XLMiner
- vi, Word, …
- grep, sed, …
