ADVENTURES IN DATA MINING. Margaret H. Dunham, Southern Methodist University, Dallas, Texas 75275. Union University, 2/25/13.

ADVENTURES IN DATA MINING. Margaret H. Dunham, Southern Methodist University, Dallas, Texas. This material is based in part upon work supported by the National Science Foundation under Grant No and NIH Grant No.1R21HG A1. Some slides used by permission from Dr. Eamonn Keogh, University of California Riverside. ACM Distinguished Speakers Program.

The 2000 ozone hole over the Antarctic, as seen by EP/TOMS.

Data Mining Outline
- Introduction
- Techniques
  - Classification
  - Clustering
  - Association Rules
- Examples
Explore some interesting data mining applications.

Introduction
- Data is growing at a phenomenal rate.
- Users expect more sophisticated information.
- How?
UNCOVER HIDDEN INFORMATION: DATA MINING

But it isn’t magic
- You must know what you are looking for.
- You must know how to look for it.
Suppose you knew that a specific cave had gold: What would you look for? How would you look for it? You might need an expert miner.

CLASSIFICATION
- Assign data to predefined groups or classes.

“If it looks like a duck, walks like a duck, and quacks like a duck, then it’s a duck.” Description, Behavior, and Associations correspond to Classification (Profiling), Clustering (Similarity), and Link Analysis. “If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then it’s a terrorist.”

Classification Ex: Grading (decision tree on score x)
- If x >= 90, grade A
- Else if x >= 80, grade B
- Else if x >= 70, grade C
- Else if x >= 60, grade D
- Else (x < 60), grade F
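The grading tree can be written as a chain of threshold tests, one per internal node. A minimal sketch, assuming the slide's cut-offs of 90/80/70/60 on a numeric score; the function name `grade` is illustrative, not from the slides.

```python
def grade(x: float) -> str:
    """Classify a numeric score into a letter grade using the grading decision tree.

    Each `if` is one internal node of the tree; the first test that succeeds
    selects the leaf (the class label).
    """
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:
        return "D"
    return "F"


if __name__ == "__main__":
    for score in (95, 85, 72, 61, 40):
        print(score, grade(score))
```

Because the tests are ordered from the highest threshold down, each branch only needs a single comparison.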

Grasshoppers vs. Katydids: Given a collection of annotated data (in this case, five instances of katydids and five of grasshoppers), decide what type of insect the unlabeled example is. (c) Eamonn Keogh

Training database columns: Insect ID, Abdomen Length, Antennae Length, Insect Class. Class labels of the ten training instances, in order: Grasshopper, Katydid, Grasshopper, Grasshopper, Katydid, Grasshopper, Katydid, Grasshopper, Katydid, Katydid. Previously unseen instance: class ???????.
The classification problem can now be expressed as: Given a training database, predict the class label of a previously unseen instance. (c) Eamonn Keogh

Figure: grasshoppers and katydids plotted by abdomen length and antenna length. (c) Eamonn Keogh
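One simple way to label the unseen insect is to give it the class of the closest training example in the (abdomen length, antenna length) plane. The sketch below uses 1-nearest-neighbor with Euclidean distance; the slides do not say which classifier is intended, and the numbers in `TRAINING` are hypothetical placeholders chosen only to match the slide's class order, not the original measurements.

```python
import math

# Hypothetical training data: (abdomen length, antenna length, class).
# Illustrative values only; the class order matches the slide's table.
TRAINING = [
    (3.0, 2.0, "Grasshopper"),
    (7.0, 8.0, "Katydid"),
    (2.0, 3.0, "Grasshopper"),
    (1.0, 2.0, "Grasshopper"),
    (6.0, 7.0, "Katydid"),
    (3.0, 1.0, "Grasshopper"),
    (5.0, 8.0, "Katydid"),
    (2.0, 1.0, "Grasshopper"),
    (8.0, 7.0, "Katydid"),
    (7.0, 6.0, "Katydid"),
]


def nearest_neighbor(abdomen: float, antenna: float) -> str:
    """Return the class of the training instance closest to the query point."""
    closest = min(
        TRAINING,
        key=lambda row: math.dist((abdomen, antenna), (row[0], row[1])),
    )
    return closest[2]


if __name__ == "__main__":
    print(nearest_neighbor(6.5, 7.5))
```

Nearest neighbor needs no training step at all: the training database itself is the model, which is why the problem can be stated purely as "predict the label of an unseen instance".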

How Stuff Works, “Facial Recognition,” fworks.com/facial-recognition1.htm

Facial Recognition (c) Eamonn Keogh

Handwriting Recognition: George Washington Manuscript. (c) Eamonn Keogh

Rare Event Detection


Dallas Morning News, October 7, 2005

Classification Performance: True Positive, True Negative, False Positive, False Negative. (© Prentice Hall)
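The four outcomes can be counted by comparing predicted and actual labels for a two-class problem. A minimal sketch, assuming one class is designated "positive"; the label values such as `"fraud"` are hypothetical, and accuracy, precision, and recall are common summaries derived from the counts rather than measures named on the slide.

```python
def confusion_counts(actual, predicted, positive):
    """Count (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


def summarize(tp, tn, fp, fn):
    """Derive accuracy, precision, and recall, guarding against division by zero."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    actual = ["fraud", "ok", "ok", "fraud", "ok"]
    predicted = ["fraud", "fraud", "ok", "ok", "ok"]
    counts = confusion_counts(actual, predicted, positive="fraud")
    print(counts, summarize(*counts))
```

For rare events such as fraud, accuracy alone can look good even when most positives are missed, which is why the false negatives are tracked separately.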

Behavior Based Classification/Prediction
- Credit Card Fraud Detection
- Credit Score
- Home Mortgage Approval

CLUSTERING
- Partition data into previously undefined groups.


What is Similarity? (c) Eamonn Keogh
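Clustering needs a way to say how alike two items are. A minimal sketch of two common distance measures for numeric feature vectors, plus one hypothetical way (`similarity = 1 / (1 + distance)`) of turning a distance into a similarity score; the slide does not commit to any particular measure.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences ("city block" distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def similarity(x, y):
    """Map Euclidean distance into (0, 1]: identical vectors score 1."""
    return 1.0 / (1.0 + euclidean(x, y))


if __name__ == "__main__":
    print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))
```

Different measures can rank the same pairs differently, so the choice of distance is part of the clustering problem, not a detail.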

Two Types of Clustering: Hierarchical and Partitional. (c) Eamonn Keogh
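A partitional algorithm produces one flat split of the data into k groups. The sketch below uses k-means (Lloyd's algorithm) as a representative partitional method; the slides do not name a specific algorithm, and the random initialization and `seed` parameter are assumptions of this sketch.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partition `points` (tuples of floats) into k clusters with Lloyd's algorithm.

    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, k=2))
```

The result depends on the starting centroids, so in practice k-means is often run several times with different seeds.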

Hierarchical Clustering Example: Iris Data Set (Setosa, Versicolor, Virginica). The data originally appeared in Fisher, R. A. (1936). “The Use of Multiple Measurements in Taxonomic Problems,” Annals of Eugenics 7. Hierarchical Clustering Explorer Version 3.0, Human-Computer Interaction Lab, University of Maryland.
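A hierarchical method builds a tree of nested clusters instead of a single split. The sketch below shows agglomerative clustering with single linkage, one common choice; the linkage used for the Iris example is not stated, and `single_link` is a hypothetical helper that returns the merge history a dendrogram would draw.

```python
import math
from itertools import combinations


def single_link(points):
    """Agglomerative clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest. Returns the merges as
    (members_a, members_b, distance) triples, in merge order.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Single-link distance between clusters = distance of their closest pair.
        (a, b), d = min(
            (
                ((x, y), min(math.dist(points[i], points[j]) for i in x for j in y))
                for x, y in combinations(clusters, 2)
            ),
            key=lambda item: item[1],
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b), d))
    return merges


if __name__ == "__main__":
    for step in single_link([(0, 0), (0, 1), (5, 5), (5, 7)]):
        print(step)
```

Cutting the merge history at a chosen distance yields a flat clustering, which is how a dendrogram relates back to partitional results.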

ASSOCIATION RULES / LINK ANALYSIS
- Find relationships between data.

ASSOCIATION RULES EXAMPLES
- People who buy diapers also buy beer.
- If gene A is highly expressed in this disease, then gene B is also expressed.
- Relationships between people
- Book stores
- Department stores
- Advertising
- Product placement
- http:// Topics/dp/ /ref=sr_1_1?ie=UTF8&s=books&qid= &sr=1-1
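A rule such as "diapers → beer" is usually scored by support (how often the items appear together) and confidence (how often the consequent appears given the antecedent). A minimal sketch of those two standard measures; the market-basket data below is hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) = support(A and C) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


if __name__ == "__main__":
    # Hypothetical shopping baskets.
    baskets = [
        {"diapers", "beer", "milk"},
        {"diapers", "beer"},
        {"diapers", "bread"},
        {"beer", "chips"},
        {"milk", "bread"},
    ]
    print(support(baskets, {"diapers", "beer"}))
    print(confidence(baskets, {"diapers"}, {"beer"}))
```

Algorithms such as Apriori search for all rules whose support and confidence exceed user-chosen thresholds.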

Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Prentice Hall. DILBERT reprinted by permission of United Feature Syndicate, Inc.

Data Mining Outline
- Introduction
- Techniques
- Examples
  - Vision Mining
  - Law Enforcement (Cheating, Plagiarism, Fraud, Criminal Behavior, …)
  - Bioinformatics

Vision Mining
- License Plate Recognition
  - Red Light Cameras
  - Toll Booths
- Computer Vision
  - ects/CS/vision/shape/vid/

Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

No/Little Cheating. Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

Rampant Cheating. Joshua Benton and Holly K. Hacker, “At Charters, Cheating’s off the Charts,” Dallas Morning News, June 4, 2007.

Jialun Qin, Jennifer J. Xu, Daning Hu, Marc Sageman, and Hsinchun Chen, “Analyzing Terrorist Networks: A Case Study of the Global Salafi Jihad Network,” Lecture Notes in Computer Science, Springer-Verlag GmbH, Volume 3495, 2005, p. 287.

Arnet Miner
- http://arnetminer.org/

DNA
- Basic building blocks of organisms
- Located in the nucleus of cells
- Composed of 4 nucleotides
- Two strands bound together
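The two strands are bound by base pairing (A with T, C with G) and run in opposite directions, so one strand fully determines the other. A minimal sketch; the helper name `partner_strand` is illustrative.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the paired strand, read 5'->3' (the reverse complement)."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    print(partner_strand("ATGC"))  # -> GCAT
```

Taking the partner of the partner returns the original strand, which is a quick sanity check on the pairing table.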

Central Dogma: DNA -> RNA -> Protein
- Transcription: DNA CCTGAGCCAACTATTGATGAA -> RNA CCUGAGCCAACUAUUGAUGAA
- Translation: RNA -> amino acids (protein)
(chapter 6; Gene Prediction)
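Both steps of the slide's example can be reproduced in a few lines: transcription replaces T with U (the slide shows the coding strand), and translation reads the RNA three bases (one codon) at a time using the standard genetic code. A minimal sketch; like the slide's example, it starts reading at the first base rather than searching for an AUG start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one amino-acid letter per codon, codons ordered
# UUU, UUC, UUA, UUG, UCU, ... GGG ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_dna: str) -> str:
    """mRNA matches the coding strand, with uracil (U) in place of thymine (T)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codons from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("CCTGAGCCAACTATTGATGAA")
    print(rna)             # -> CCUGAGCCAACUAUUGAUGAA
    print(translate(rna))  # -> PEPTIDE
```

The seven codons CCU GAG CCA ACU AUU GAU GAA encode Pro, Glu, Pro, Thr, Ile, Asp, Glu, whose one-letter codes spell PEPTIDE.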

Human Genome
- Scientists originally thought there would be about 100,000 genes.
- There appear to be about 20,000. Why?
- The human genome is almost identical to that of chimps. What makes the difference?
- Answers appear to lie in the noncoding regions of the DNA (formerly thought to be junk).

RNAi: Nobel Prize in Medicine
Double-stranded RNA -> short interfering RNA (siRNA, ~20-25 nt) -> RNA-induced silencing complex (RISC) binds to mRNA -> cuts the RNA. siRNA may be artificially added to a cell! Image source: Advanced Information, Image 3

miRNA
- Short (20-25 nt) sequence of noncoding RNA
- Known since 1993, but its significance was not widely appreciated until 2001
- Impacts or prevents translation of mRNA
- Generally reduces protein levels without impacting mRNA levels (animal cells)
- Functions
  - Implicated in some cancers
  - Guides embryo development
  - Regulates cell differentiation
  - Associated with HIV
  - …

TCGR (Temporal Chaos Game Representation): Mature miRNA (Window=5; Pattern=3). Figure panels: All Mature, Mus musculus, Homo sapiens, C. elegans. Pattern labels in the figure: ACG, CGC, GCG, UCG.
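The slide gives only the two TCGR parameters (Window=5, Pattern=3). One reading, assumed here, is a matrix with one row per window position ("time") and one column per length-3 pattern, where each cell counts how often that pattern occurs inside that window, summed over a set of sequences aligned at their first base. The function `window_pattern_counts` and its `step` parameter are hypothetical; this is a sketch of the general idea, not the exact TCGR construction.

```python
from itertools import product


def window_pattern_counts(seqs, window=5, pattern=3, step=1, alphabet="ACGU"):
    """Count length-`pattern` substrings inside each sliding window.

    Returns (patterns, rows): rows[t][j] is how many times patterns[j] occurs
    entirely inside window t, summed over all sequences in `seqs`.
    Assumes the sequences are aligned at position 0.
    """
    patterns = ["".join(p) for p in product(alphabet, repeat=pattern)]
    column = {p: j for j, p in enumerate(patterns)}
    longest = max(len(s) for s in seqs)
    rows = []
    for start in range(0, longest - window + 1, step):
        row = [0] * len(patterns)
        for s in seqs:
            chunk = s[start:start + window].upper()
            for i in range(len(chunk) - pattern + 1):
                j = column.get(chunk[i:i + pattern])
                if j is not None:  # skip patterns with letters outside the alphabet
                    row[j] += 1
        rows.append(row)
    return patterns, rows


if __name__ == "__main__":
    pats, matrix = window_pattern_counts(["UGAGGUAGUAGGUUGUAUAGUU"])
    print(len(pats), len(matrix))
```

With a 4-letter alphabet and Pattern=3 there are 64 columns; each row can be viewed as an image strip, and stacking rows shows how pattern usage changes along the sequence.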

TCGRs for Xue Training Data (panels: POSITIVE, NEGATIVE). C. Xue, F. Li, T. He, G. Liu, Y. Li, and X. Zhang, “Classification of Real and Pseudo MicroRNA Precursors Using Local Structure-Sequence Features and Support Vector Machine,” BMC Bioinformatics, vol. 6, no. 310.
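Xue et al. describe precursors with local structure-sequence "triplet" features that an SVM then classifies. A minimal sketch of our reading of those features: the input is a sequence plus its dot-bracket secondary structure (for example from an RNA folding tool), "(" and ")" are both treated as paired, and for each interior base the middle nucleotide is combined with the paired/unpaired status of the three positions centered on it, giving 4 × 8 = 32 normalized frequencies. The paper's exact preprocessing may differ, and the SVM training step is not shown.

```python
from itertools import product


def triplet_features(seq, structure):
    """Return 32 normalized triplet frequencies for one RNA hairpin.

    Each feature key is a middle nucleotide followed by the paired ('(')
    or unpaired ('.') status of the three positions centered on it.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper()
    status = structure.replace(")", "(")  # both bracket types mean "paired"
    keys = [n + "".join(t) for n in "ACGU" for t in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(1, len(seq) - 1):
        key = seq[i] + status[i - 1:i + 2]
        if key in counts:
            counts[key] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in keys]


if __name__ == "__main__":
    print(triplet_features("GGAUCC", "((..))"))
```

Normalizing by the number of triplets makes hairpins of different lengths comparable as SVM inputs.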

Affymetrix GeneChip® Array

BIG BROTHER?
- Total Information Awareness
  - ce
- Terror Watch List
  - 005/tc _8047_tc_210.htm
  - rror_watch/
  - watch.html
- CAPPS



My DM Toolbelt
- C, C++
- Perl, Ruby
- Weka
- R, SAS
- Excel, XLMiner
- Vi, Word, …
- Grep, sed, …
