www.monash.edu.au Advanced Topics in Data Mining and Research Directions CSE5610 Intelligent Software Systems Semester 1, 2006
Outline
  Mining Different Data Types
    – Spatial, Temporal, Time Series, Data Streams, Multimedia, XML, Web, Text, etc.
  Distributed Data Mining (DDM)
  Mobile & Ubiquitous Data Mining (UDM)
  Data Mining E-Services
  Anytime, Anywhere Data Mining E-Services
Generations of Data Mining
  Four Generations of Data Mining Systems – Robert Grossman
  First Generation – Stand-alone, centralised, single algorithm
  Second Generation – Integration with databases, support for high dimensionality and complex data types
  Third Generation – Distribution and heterogeneity
  Fourth Generation – Support for mining embedded, mobile and ubiquitous data sources
Distributed Data Mining
  Inherently distributed data
  Multinational corporations (MNCs) + global markets => physical/geographical separation of users from the data sources
  The traditional data mining model, which co-locates users, data and computational resources, is inadequate
Distributed Data Mining (DDM)
  The inherent distribution of data and other resources, a result of organisations themselves being distributed.
  The large volumes of data, the transfer of which incurs exorbitant communication costs.
  The need to mine heterogeneous data, the integration of which is both non-trivial and expensive.
  The performance and scalability bottlenecks of data mining.
Distributed Data Mining (DDM)
  DDM = Data Mining (DM) + Knowledge Integration (KI)
  DM – Performing traditional knowledge discovery at each distributed data site.
  KI – Merging the results generated at the individual sites into a cohesive, unified body of knowledge.
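The DM + KI split can be illustrated with a minimal sketch. It assumes each site mines item frequencies locally and that integration simply sums the per-site counts; the function names (`mine_site`, `integrate`), the toy transactions and the support threshold are hypothetical and not taken from the lecture. Real knowledge integration is usually more involved than adding counters.

```python
from collections import Counter

def mine_site(transactions):
    """DM step: count item occurrences in one site's local transactions."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # count each item at most once per transaction
    return counts, len(transactions)

def integrate(site_results, min_support):
    """KI step: merge per-site counts and keep the globally frequent items."""
    total = Counter()
    n = 0
    for counts, size in site_results:
        total.update(counts)
        n += size
    return {item: c / n for item, c in total.items() if c / n >= min_support}

# Two hypothetical sites; only the local summaries travel, not the raw data.
site_a = [["milk", "bread"], ["milk"], ["bread", "eggs"]]
site_b = [["milk", "eggs"], ["milk", "bread"]]

results = [mine_site(site_a), mine_site(site_b)]
frequent = integrate(results, min_support=0.5)
print(frequent)
```

Shipping the per-site counters rather than the transactions is what keeps communication costs low in this sketch.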
Parallel Data Mining (PDM)
  Principal distinction between DDM & parallel DM
    – Parallel mining involves parallel processors, with or without shared memory
  Parallel data mining also includes the development of parallel versions of traditional data mining techniques.
  DDM and PDM can be integrated – e.g. DecisionCentre
DDM – Algorithms & Architectures
  Research in distributed data mining can be divided into two broad categories [Fu01]:
  Data Mining Algorithms
    – Focus on efficient techniques for knowledge integration
  Distributed Data Mining Architectures
    – Focus on the development of distributed data mining architectures
    – Emphasise the processes and technologies that support the construction of software systems to perform distributed data mining
Taxonomy of DDM Architectures
Classification – DDM Systems
  DDM Architectural Models | DDM Systems
  Client-server | DecisionCentre [CDG99], IntelliMiner [PaS99, PaS01], InterAct [PaD02]
  Agents (Mobile Agent, Stationary Agent) | JAM [SPT97], Infosleuth [UMG98, MUU99], BODHI [KPH99], Papyrus [Ram98], PADMA [KHS97a, KHS97b]
Ubiquitous Data Mining (UDM)
  Mining data in a resource-constrained environment to support the time-critical information needs of mobile users
  Typical Characteristics
    – Mobile User – frequent disconnections
    – Handheld Device
      > Resource constraints – memory, battery, processor, screen real estate
    – Time critical
    – Real-time & On-line
    – Data Streams
  Example Scenarios
  Many Challenges
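To make the resource constraints concrete, here is a minimal sketch of one-pass clustering over a numeric data stream with a hard cap on memory. It is an illustrative stand-in, not any algorithm named in the lecture: the function name `stream_cluster`, the distance threshold and the cluster cap are all assumptions. Each reading is seen once, and only a mean and a count are stored per cluster.

```python
def stream_cluster(stream, threshold, max_clusters):
    """One-pass clustering of numeric readings under a fixed memory budget."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    clusters = []  # each cluster is [mean, count]; at most max_clusters entries
    for x in stream:
        nearest = min(clusters, key=lambda c: abs(c[0] - x), default=None)
        too_far = nearest is None or abs(nearest[0] - x) > threshold
        if nearest is None or (too_far and len(clusters) < max_clusters):
            # Still within the memory budget: open a new cluster.
            clusters.append([float(x), 1])
        else:
            # Close enough, or budget exhausted: fold into the nearest cluster.
            nearest[1] += 1
            nearest[0] += (x - nearest[0]) / nearest[1]  # incremental mean
    return [(mean, count) for mean, count in clusters]

# Hypothetical sensor readings arriving one at a time on a handheld device.
readings = [1.0, 1.2, 0.8, 10.0, 10.4, 5.0]
summary = stream_cluster(readings, threshold=1.0, max_clusters=2)
print(summary)
```

Once the cap is reached, accuracy is traded for bounded memory: an outlying reading is absorbed by its nearest cluster instead of allocating new state.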
Current Research
  Kargupta’s Group
    – MobiMine
  @CSSE, Monash Univ.
    – AgentUDM
    – Adaptive, cost-efficient & light-weight data mining techniques for data streams
      > Mohamed Medhat
      > LWC, LWF & LWClass
      > Watch this space!!!
Data Mining E-Services
  “…data analysis and mining functions themselves will be offered as business intelligence e-services that accept operational data from clients and return models or rules” (Umesh Dayal, 2001)
  Why?
    – Knowledge is a key resource
    – Cost of data mining infrastructure
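The "accept operational data, return models or rules" contract in the quote can be sketched as a single request/response function. This is a hypothetical interface, not one described in the lecture: the name `mining_service`, the JSON field names and the toy records are all assumptions, and the "mining" is a deliberately simple majority-class rule per attribute value.

```python
import json
from collections import Counter, defaultdict

def mining_service(request_json):
    """Accept client records as JSON and return simple if-then rules as JSON."""
    request = json.loads(request_json)
    attribute, target = request["attribute"], request["target"]

    # Tally target outcomes for each value of the chosen attribute.
    by_value = defaultdict(Counter)
    for record in request["records"]:
        by_value[record[attribute]][record[target]] += 1

    rules = []
    for value, outcomes in sorted(by_value.items()):
        label, hits = outcomes.most_common(1)[0]
        rules.append({
            "if": {attribute: value},
            "then": {target: label},
            "confidence": hits / sum(outcomes.values()),
        })
    return json.dumps({"rules": rules})

# A hypothetical client submits operational data and gets rules back.
request = json.dumps({
    "attribute": "region",
    "target": "churned",
    "records": [
        {"region": "north", "churned": "yes"},
        {"region": "north", "churned": "yes"},
        {"region": "north", "churned": "no"},
        {"region": "south", "churned": "no"},
    ],
})
response = json.loads(mining_service(request))
print(response["rules"])
```

The client needs no mining infrastructure of its own, which is the cost argument made on the slide.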
Data Mining E-Services
  Current Commercial Landscape
    – Several ASPs: DigiMine, Information Discovery, WhiteCross Systems, ListAnalyst.com, etc.
    – Mode of Operation
  Hybrid Model & Data Mining ASPs
    – Optimise Response Time
      > Leads to improved throughput
    – QoS Estimation
    – Location Preferences of Clients
Anytime, Anywhere Data Mining E-Services
My Thoughts
  Data is a commodity, analysis is a service
  Access anytime, anywhere
  By anyone…
    – From large corporations to small businesses to individuals
  From home buyers to mobile salespersons to grocery shoppers…
My Thoughts
  A preliminary model for delivery
    – Datacentric Grids