Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
Prof. George Kollios. Office: MCS 288. Office Hours: Monday 2:00pm-3:30pm, Thursday 11:00am-12:30pm. Mailing List: cs591g1
History of Database Technology. 1960s: Data collection, database creation, IMS and network DBMS; 1970s: Relational data model, relational DBMS implementation; 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.); 1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases.
Structure of an RDBMS. A DBMS is an OS for data! A typical RDBMS has a layered architecture. This is one of several possible architectures; each system has its own variations. Layers, in the order listed on the slide: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Modern database systems extend these layers.
Index Methods for RDBMS. Hashing methods: linear hashing, extendible hashing. B-tree family: B+-trees and variations. Both of them are one-dimensional.
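To make "one-dimensional" concrete, here is a minimal sketch of the two index families over a single key. It is not linear hashing, extendible hashing, or a real B+-tree: `SortedIndex` (a sorted list searched with `bisect`) and `HashIndex` (a plain dictionary) are hypothetical stand-ins, and the records are made-up data.

```python
import bisect


class SortedIndex:
    """Toy stand-in for the sorted leaf level of a B+-tree (one key only)."""

    def __init__(self, pairs):
        # pairs: iterable of (key, record_id); kept sorted by key.
        self._entries = sorted(pairs)
        self._keys = [key for key, _ in self._entries]

    def range(self, lo, hi):
        """Return record ids whose key lies in the closed interval [lo, hi]."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [rid for _, rid in self._entries[start:end]]


class HashIndex:
    """Toy stand-in for a hash index: exact-match lookups on one key."""

    def __init__(self, pairs):
        self._buckets = {}
        for key, rid in pairs:
            self._buckets.setdefault(key, []).append(rid)

    def lookup(self, key):
        return self._buckets.get(key, [])


# Made-up (key, record_id) pairs.
records = [(30, "r1"), (10, "r2"), (20, "r3"), (20, "r4"), (40, "r5")]
sorted_index = SortedIndex(records)
hash_index = HashIndex(records)

in_range = sorted_index.range(15, 30)   # range query on the single key
exact = hash_index.lookup(20)           # exact-match query on the single key
```

Both structures order or partition records by one key only, which is why points, regions, and other multi-dimensional data need different access methods.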
Overview of the course. Spatial Database Systems: GIS, CAD/CAM; EOSDIS project (NASA); manage points, lines and regions. Temporal Database Systems: billing, medical records. Spatio-temporal Databases: moving objects, changing regions, etc.
Overview of the course. Multimedia and medical databases: a multimedia system can store and retrieve objects/documents with text, voice, images, video clips, etc. Time series databases: stock market, ECG, trajectories, etc.
Multimedia databases. Applications: digital libraries, entertainment, office automation. Medical imaging: digitized X-rays and MRI images (2- and 3-dimensional). Query by content (or query by example, QBE): should be efficient and ‘complete’ (no false dismissals).
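One common way to get both efficiency and completeness is a filter-and-refine search: filter with a cheap distance that never exceeds the true distance, then check the survivors with the true distance. The sketch below assumes objects are plain numeric vectors and uses the first `k` coordinates as the cheap features; the names `feature_distance` and `range_query` and the data are illustrative assumptions, not taken from the slides.

```python
import math


def full_distance(a, b):
    """True (expensive) Euclidean distance over all coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_distance(a, b, k=2):
    """Cheap distance on the first k coordinates only.

    It sums a subset of the non-negative terms used by full_distance,
    so it never exceeds the true distance (it is a lower bound).
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:k], b[:k])))


def range_query(objects, query, eps, k=2):
    """Return all objects within eps of query, with no false dismissals."""
    # Filter step: anything farther than eps in feature space is also
    # farther than eps in the full space, so discarding it is safe.
    candidates = [o for o in objects if feature_distance(o, query, k) <= eps]
    # Refine step: remove false alarms using the true distance.
    return [o for o in candidates if full_distance(o, query) <= eps]


# Made-up 4-dimensional objects.
objects = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 3, 0), (5, 5, 5, 5)]
query = (0, 0, 0, 0)
eps = 1.5

result = range_query(objects, query, eps)
brute_force = [o for o in objects if full_distance(o, query) <= eps]
```

Here `(0, 0, 3, 0)` passes the filter (a false alarm) and is removed in the refine step, while nothing that truly qualifies is dropped, because the feature distance is a lower bound on the true distance.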
What is Data Mining? Data mining (knowledge discovery in databases): the efficient discovery of previously unknown, valid, potentially useful and understandable information or patterns from data in large databases. Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, etc.
DM Applications. Database analysis and decision support. Market analysis: target marketing, market basket analysis, market segmentation. Fraud detection and management. Biology and medicine. Text mining (newsgroups, email, documents) and Web analysis.
Data Mining: Confluence of Multiple Disciplines. (Diagram: data mining at the center, drawing on database technology, statistics, machine learning, visualization, information science, and other disciplines.)
Overview of terms. Data: a set of facts (items) D, stored in a database. Pattern: an expression E in a language L that describes a subset of facts. Attribute: a field in an item i in D. Interestingness: a function I_{D,L} that maps an expression to a measure space M.
The Data Mining Task. For a given dataset D, language of facts L, interestingness function I_{D,L} and threshold c, efficiently find the expressions E such that I_{D,L}(E) > c.
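The task definition above can be sketched directly. Everything concrete here is an assumption chosen for illustration: facts are dictionaries, the language L is the set of single `(attribute, value)` tests, and the interestingness measure is the fraction of facts an expression describes. This naive version enumerates all of L and makes no attempt at the "efficiently" part.

```python
def describes(expression, fact):
    """An expression (attribute, value) describes a fact that has that value."""
    attribute, value = expression
    return fact.get(attribute) == value


def interestingness(D, expression):
    """Assumed measure: fraction of facts in D described by the expression."""
    return sum(describes(expression, fact) for fact in D) / len(D)


def mine(D, L, c):
    """Return every expression E in L with interestingness(D, E) > c."""
    return [E for E in L if interestingness(D, E) > c]


# Made-up dataset D: each fact is a dictionary of attributes.
D = [
    {"age": "20s", "buys": "PC"},
    {"age": "20s", "buys": "PC"},
    {"age": "30s", "buys": "PC"},
    {"age": "30s", "buys": "TV"},
]

# Assumed language L: every (attribute, value) pair that occurs in D.
L = sorted({(attr, value) for fact in D for attr, value in fact.items()})

patterns = mine(D, L, c=0.4)
```

With threshold 0.4, the expression `("buys", "TV")` describes only one fact in four and is filtered out; the other three expressions pass.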
How Data Mining is used: (1) identify the problem; (2) use data mining techniques to transform the data into information; (3) act on the information; (4) measure the results.
DM Functionalities. Concept description: generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions. Association (correlation and causality): multi-dimensional vs. single-dimensional association, e.g., age(X, “20..29”) ^ income(X, “20..29K”) => buys(X, “PC”) [support = 2%, confidence = 60%] and contains(T, “computer”) => contains(T, “software”) [1%, 75%].
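The support and confidence figures attached to such rules can be computed as follows. This is a minimal sketch using the usual definitions (support: fraction of transactions containing both sides of the rule; confidence: fraction of transactions containing the left side that also contain the right side). The transactions are made-up data and `rule_stats` is a hypothetical helper name.

```python
def rule_stats(transactions, lhs, rhs):
    """Return (support, confidence) of the rule lhs => rhs.

    transactions: list of sets of items; lhs and rhs: sets of items.
    """
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / len(transactions)
    # If no transaction contains lhs, report confidence 0.0 by convention.
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Made-up transactions T, each a set of contained items.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"software"},
]

# contains(T, "computer") => contains(T, "software")
support, confidence = rule_stats(transactions, {"computer"}, {"software"})
```

In this toy data, two of the five transactions contain both items (support 40%), and two of the three transactions with a computer also contain software (confidence about 67%).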
DM Functionalities. Cluster analysis: the class label is unknown; group data to form new classes, e.g., cluster houses to find distribution patterns. Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
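The clustering principle can be checked numerically for a given grouping. The sketch below is not a clustering algorithm; it only scores a candidate grouping. It assumes similarity is measured inversely by Euclidean distance, so high intra-class similarity means small distances within a cluster. The points and labels are made-up data, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def mean_intra_inter(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Assumes at least one pair of points shares a label and at least one
    pair does not; otherwise one of the averages would be undefined.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        (intra if lp == lq else inter).append(math.dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Made-up house locations: two tight groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

good_intra, good_inter = mean_intra_inter(points, [0, 0, 1, 1])
bad_intra, bad_inter = mean_intra_inter(points, [0, 1, 0, 1])
```

The first grouping keeps nearby houses together, so its intra-cluster distance is small and its inter-cluster distance is large; the second grouping mixes the two neighborhoods and scores the other way around.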
DM Functionalities. Classification and prediction: finding models (functions) that describe and distinguish classes or concepts for future prediction, e.g., classify countries based on climate, or classify cars based on gas mileage. Presentation: decision tree, classification rule, neural network. Prediction: predict some unknown or missing numerical values.
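As a small illustration of the gas-mileage example, the sketch below fits a one-level decision tree (a decision stump): a single threshold on mileage with a class label on each side. The mileage values, the class labels "low" and "high", and the function names are all assumptions made up for the example; real decision-tree learners use more refined split criteria than a raw error count.

```python
from collections import Counter


def fit_stump(values, labels):
    """Find the threshold with the fewest training errors.

    Returns (threshold, label_if_value_le_threshold, label_if_greater).
    Assumes at least two distinct values.
    """
    best = None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighboring values
        left = Counter(l for v, l in zip(values, labels) if v <= threshold)
        right = Counter(l for v, l in zip(values, labels) if v > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = len(values) - left[left_label] - right[right_label]
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict(stump, value):
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Made-up training data: miles per gallon and an assumed mileage class.
mpg = [12, 15, 18, 30, 35, 40]
mileage_class = ["low", "low", "low", "high", "high", "high"]

stump = fit_stump(mpg, mileage_class)
prediction_20 = predict(stump, 20)
prediction_33 = predict(stump, 33)
```

The learned model is the rule "mileage above 24 means high, otherwise low", which is the kind of classification rule the slide lists as one way to present a model.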