AI Methods in Data Warehousing: A System Architectural View
Walter Kriha

Business Driver: Customer Relationship Management (CRM)
- Learn more about your customers
- Provide personalized offerings (cheaper, targeted)
- Make better use of in-house information (e.g. financial research)
- Somehow use all the data collected
The web is accelerating these problems (terabytes of clickstream data) but also provides new solutions: web mining and the Web-House.

CRM: Simulate Advisor Functions
Client-oriented:
- Know interests and hobbies
- Know personal situation
- Know situation in life
- Know plans and hopes
Bank-oriented:
- Know where to find information and which applications to use
- Know how to translate, summarize and prepare information for the customer
- Know whom to ask if in trouble
Plus: new ideas from automatic knowledge discovery etc. that even a real advisor can’t provide!

Overview
- Requirements coming from a dynamic, personalized portal page
- Data collection and DW import
- AI methods used to solve the requirements
- How to flow the results back into the portal

A Portal: A Self-Adapting System
- Collect information for and about customers
- Learn from it
- Adapt to the individual customer by using the “lessons learned”
The problem: a portal does not have the time to learn. Learning needs to happen off-line in a warehouse!

DW Integration: Sources
[Diagram: a closed-loop data integration platform feeds the data warehouse and data marts, which serve e-business analytics. Sources: SAP, IBM, PeopleSoft, web servers, application servers, web logs, transaction server, supplier extranet, content server, ad server, RDBMS, and demographics/external sources.]

DW Integration: Structure
[Diagram, split into an off-line and an on-line part. Inputs: navigation, transactions, messages. Components: log framework, operational DB, data warehouse, web statistics, mining tools, rule engine, integration of external data and applications. Output: personalized information and offerings.]

What information do we have?
- The pages the customer selected (order, topics etc.)
- Customer interests from homepage self-configuration
- Customer transactions
- Customer messages (forum, advisor)
- Internal financial information
The data collection and import process needs to preserve the links between the different information channels (e.g. the order of customer activity).
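To illustrate the requirement that links between channels survive the import, here is a minimal Python sketch that merges per-channel event streams into one chronological timeline per customer. The `Event` record, its fields and the channel labels are hypothetical; the sketch assumes every channel already arrives sorted by timestamp, which `heapq.merge` requires.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True, order=True)
class Event:
    # order=True compares fields in declaration order, so events sort by timestamp first.
    timestamp: float
    channel: str      # hypothetical labels, e.g. "navigation", "transaction", "message"
    customer_id: str
    detail: str


def merge_channels(*channels: Iterable[Event]) -> dict[str, list[Event]]:
    """Merge per-channel streams (each already sorted by timestamp) into one
    chronological timeline per customer, preserving cross-channel order."""
    timelines: dict[str, list[Event]] = defaultdict(list)
    for event in heapq.merge(*channels):
        timelines[event.customer_id].append(event)
    return dict(timelines)


nav = [Event(1.0, "navigation", "c1", "/research/asia"),
       Event(5.0, "navigation", "c1", "/quotes/ubs")]
tx = [Event(3.0, "transaction", "c1", "buy UBS")]
msg = [Event(4.0, "message", "c1", "forum post")]
timeline = merge_channels(nav, tx, msg)
print([e.detail for e in timeline["c1"]])
# ['/research/asia', 'buy UBS', 'forum post', '/quotes/ubs']
```

Keeping the merged timeline, rather than three separate tables, is what later lets the analysis see that a forum post followed a transaction.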

[Mock-up of a personalized homepage:]
Welcome Mrs. Rich, we would like to point you to our new Instrument X that fits nicely into your current investment strategy.
News: IBM invests in company Y
Research: Asian equity update
Charts: Sony
Quotes: UBS 500, ARBA 200
Links: myweather.com, UBS glossary etc.
Common: customize, filter, contact etc.
Messages: 3 new; from foo: “hi Mrs. Rich”
Portfolio: Siemens, Swisskom, Esso
Common: banner
Forum: art banking, 12 new
E-Banking: balance =
[Annotations: each page area reveals something about the customer: interest in our services (homepage configuration), forum activity, transactions, interest in shares etc., message activity, special interests (filters selected).]

What do we want to know?
- Does a customer know how to work the system (site usability)?
- Does a customer voice dissatisfaction with the company (customer retention)?
- If new financial information enters the system, which customers might be interested in it (content extraction, customer notification)?
Which AI techniques might answer those questions?

What do we want to provide?
- A personalized homepage that adapts itself to the customer’s interests (from self-customization to automatic integration)
- An early warning system for disgruntled customers or customers who have difficulties working the site
- An ontology for financial information
- An integrated view of the company and its services and information (“electronic advisor”)
See: “Finance with a personal touch”, Communications of the ACM, Vol. 43, No. 8, August 2000.

[Mock-up of the same homepage, now dynamic, personalized and INTEGRATED around Instrument X:]
Welcome Mrs. Rich, we would like to point you to our new Instrument X that fits nicely into your current investment strategy.
News: IBM invests in company X; X now listed on NASDAQ
Research: X future prospects; Asian equity update
Charts: X
Quotes: UBS 500, X 100
Links: X homepage, myweather.com
Common: customize, filter, contact etc.
Messages: 3 new; from advisor: about X inv.
Portfolio: Siemens; add X?
Common: banner about X
Forum: X is discussed here
[Annotations: connect communities and site content; a personal “touch”.]

Data Mining
- The automatic extraction of hidden predictive information from large databases
- An AI technique: automated knowledge discovery, prediction and forensic analysis through machine learning
Web Mining
- Adds text mining, ontologies and technologies such as XML to the above

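Data Mining Methods
[Diagram: taxonomy of data mining methods. Data retained: k-nearest neighbour, case-based reasoning (CBR). Data distilled: logical (rules, decision trees), cross-tabulation (cross tab, agents, belief nets), equational (statistics, neural nets). Annotations on the branches: non-numeric data; non-symbolic results; induction, GA, CART etc.; Kohonen etc.; smooth surfaces; extensive training.]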

Data Preparation
- Capture complete session data for a specific user
- Store meta-information from the content together with the behavioral data
- Create different data structures for different analytics (e.g. Polygenesis)
- Use a special log framework!
- Make sure meta-data are available for the content (e.g. for dynamically generated page content)
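A minimal Python sketch of what a log framework record and a sessionization step could look like. The `LogRecord` fields, the `sessionize` function and the 30-minute inactivity timeout are illustrative assumptions, not part of the original architecture; the point is that the content meta-data is stored together with the click at logging time, when it is still known.

```python
from dataclasses import dataclass, field

SESSION_TIMEOUT = 30 * 60  # assumption: 30 minutes of inactivity ends a session


@dataclass
class LogRecord:
    user_id: str
    timestamp: float          # seconds since epoch
    page_id: str
    # Content meta-data captured at serving time, so dynamically generated
    # pages remain analysable later (e.g. {"topic": "equity", "region": "asia"}).
    page_meta: dict[str, str] = field(default_factory=dict)


def sessionize(records: list[LogRecord],
               timeout: float = SESSION_TIMEOUT) -> list[list[LogRecord]]:
    """Split each user's records into sessions: a new session starts whenever
    the gap to that user's previous record exceeds `timeout`."""
    sessions: list[list[LogRecord]] = []
    current: dict[str, list[LogRecord]] = {}   # user -> currently open session
    for rec in sorted(records, key=lambda r: (r.user_id, r.timestamp)):
        open_session = current.get(rec.user_id)
        if open_session and rec.timestamp - open_session[-1].timestamp <= timeout:
            open_session.append(rec)
        else:
            open_session = [rec]
            current[rec.user_id] = open_session
            sessions.append(open_session)
    return sessions


recs = [LogRecord("a", 0, "p1", {"topic": "equity"}), LogRecord("a", 100, "p2"),
        LogRecord("a", 4000, "p3"), LogRecord("b", 50, "p1")]
print([[r.page_id for r in s] for s in sessionize(recs)])
# [['p1', 'p2'], ['p3'], ['p1']]
```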

Data Analysis
- Cluster analysis
- Classification
- Pattern detection
- Association rules
- Content mining (e.g. segmentation of topics)
- Usage mining (e.g. segmentation of customers)
Problem: how to express similarity and distance
- Linguistic analysis, statistics (k-nearest neighbours)
- Machine learning (neural nets, decision trees)
Problem: how to create a user profile, e.g. from navigation data
- Collaborative filtering: derive content similarities from behavioral similarities
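Collaborative filtering as described above can be sketched as item-based similarity: two pages count as similar when the same users view them, so content similarity is derived purely from behavior. This is a minimal Python sketch assuming binary view data and cosine similarity; `page_similarities` and `recommend` are hypothetical names, and a real system would use sparse matrices instead of a pairwise loop.

```python
import math
from collections import defaultdict


def page_similarities(visits: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """visits maps user -> set of pages viewed. Each page becomes a binary vector
    over users; similarity is the cosine between two such vectors."""
    viewers: dict[str, set[str]] = defaultdict(set)
    for user, pages in visits.items():
        for page in pages:
            viewers[page].add(user)
    sims: dict[tuple[str, str], float] = {}
    pages = sorted(viewers)
    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            overlap = len(viewers[a] & viewers[b])
            if overlap:
                sim = overlap / math.sqrt(len(viewers[a]) * len(viewers[b]))
                sims[(a, b)] = sims[(b, a)] = sim
    return sims


def recommend(user: str, visits: dict[str, set[str]], k: int = 3) -> list[str]:
    """Score pages the user has not seen by summed similarity to pages they have seen."""
    sims = page_similarities(visits)
    seen = visits[user]
    scores: dict[str, float] = defaultdict(float)
    for (a, b), sim in sims.items():
        if a in seen and b not in seen:
            scores[b] += sim
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


visits = {"u1": {"asia", "sony"}, "u2": {"asia", "sony", "ubs"}, "u3": {"ubs", "glossary"}}
print(recommend("u1", visits))  # ['ubs']
```

No page content is inspected here; “asia” and “sony” end up maximally similar only because the same two users viewed both.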

Example: Find Session Topics Automatically (combined content and behavioral analysis)
- Use statistical cluster mining to extract page views that co-occur during sessions (visit coherence assumption)
- Use a concept learning algorithm that matches the clusters (of page views) with the meta-information of the pages to extract common attributes
- Those common attributes form a “concept”
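The session-topic idea can be sketched in Python under simplifying assumptions: frequent page pairs (co-occurring in at least `min_support` sessions) stand in for the statistical cluster mining, connected components of those pairs form the clusters, and the “concept” is simply the intersection of the pages’ meta-data attributes. All function names and the attribute format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions: list[set[str]], min_support: int) -> set[frozenset[str]]:
    """Page pairs that co-occur in at least `min_support` sessions
    (visit coherence assumption: pages viewed together belong together)."""
    counts = Counter(frozenset(pair) for s in sessions
                     for pair in combinations(sorted(s), 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def clusters(pairs: set[frozenset[str]]) -> list[set[str]]:
    """Group pages into connected components of the co-occurrence graph."""
    groups: list[set[str]] = []
    for pair in pairs:
        touching = [g for g in groups if g & pair]
        merged = set(pair).union(*touching)
        groups = [g for g in groups if not (g & pair)] + [merged]
    return groups


def concept(cluster: set[str], meta: dict[str, set[str]]) -> set[str]:
    """The concept: meta-data attributes shared by every page in the cluster."""
    return set.intersection(*(meta[p] for p in cluster))


sessions = [{"asia_eq", "sony_chart", "glossary"}, {"asia_eq", "sony_chart"},
            {"ubs_quote", "glossary"}]
meta = {"asia_eq": {"equity", "asia", "research"},
        "sony_chart": {"equity", "asia", "chart"}}
found = clusters(frequent_pairs(sessions, min_support=2))
print(found == [{"asia_eq", "sony_chart"}], sorted(concept(found[0], meta)))
# True ['asia', 'equity']
```

The resulting attribute set (here Asian equities) is what would be written into the user profile as a session topic.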

Learning Concepts
[Diagram: the session flows of user A and user B, together with the pages’ meta-information, feed a conceptual learning algorithm; the resulting concept is stored in the user profile.]

The Text-Warehouse: Information Extraction
Serving personalized information requires fine-grained extraction of interesting facts from text bodies in various formats.
[Diagram: financial research documents (PDF, HTML, DOC, XML) pass through an IE tool into an automatically built database, which is matched against a user profile with interests. Motto: facts, not stories!]

Methods for Information Extraction
Natural language processing:
- Analyze syntax to derive semantics
- Context changes break the algorithm
Wrapper induction:
- Use contextual features (e.g. HTML tags) to infer semantics
- Very brittle in case of source changes
Both methods use extraction patterns that were acquired through machine learning from training documents.
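Wrapper induction can be illustrated with the simplest wrapper class, a left-right (LR) wrapper: from a few labelled training pages it learns the delimiter that always precedes the value and the one that always follows it. This Python sketch is a simplification under stated assumptions (each labelled value is located by its first occurrence in its page, and one delimiter pair serves every value); it also shows why wrappers are brittle, since any change to the surrounding markup breaks the learned delimiters.

```python
import os


def common_suffix(strings: list[str]) -> str:
    """Longest string that every element of `strings` ends with."""
    shortest = min(strings, key=len)
    for n in range(len(shortest), 0, -1):
        candidate = shortest[-n:]
        if all(s.endswith(candidate) for s in strings):
            return candidate
    return ""


def learn_lr_wrapper(examples: list[tuple[str, str]]) -> tuple[str, str]:
    """examples: (page, labelled value) pairs. The left delimiter is the longest
    common suffix of the text before each value, the right delimiter the longest
    common prefix of the text after it. Assumes the first occurrence of each
    value in its page is the labelled one."""
    befores, afters = [], []
    for page, value in examples:
        start = page.index(value)
        befores.append(page[:start])
        afters.append(page[start + len(value):])
    # os.path.commonprefix compares character by character on arbitrary strings.
    return common_suffix(befores), os.path.commonprefix(afters)


def extract(page: str, wrapper: tuple[str, str]) -> list[str]:
    """Apply the learned delimiters repeatedly to pull every value out of a page."""
    left, right = wrapper
    if not left or not right:
        raise ValueError("training pages share no delimiters")
    values, pos = [], 0
    while (start := page.find(left, pos)) != -1:
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end])
        pos = end + len(right)
    return values


train = [("<p>UBS <b>BUY</b></p>", "BUY"), ("<p>Sony <b>HOLD</b></p>", "HOLD")]
wrapper = learn_lr_wrapper(train)
print(wrapper)  # (' <b>', '</b></p>')
print(extract("<p>Siemens <b>SELL</b></p><p>Esso <b>BUY</b></p>", wrapper))  # ['SELL', 'BUY']
```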

More Textual Methods
- Thematic index: generate the reference taxonomy from training documents (linguistic and statistical analysis)
- Clustering: group similar documents with respect to a feature vector and a similarity measure (SOM and other clustering technologies)
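Document clustering with a feature vector and a similarity measure can be sketched as follows. Instead of a SOM, this Python example swaps in a much simpler single-pass “leader” clustering over bag-of-words vectors with cosine similarity; the whitespace tokenization and the 0.5 threshold are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words feature vector: lower-cased term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def leader_clustering(docs: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Single pass: each document joins the first cluster whose leader (first member)
    is at least `threshold` similar to it, otherwise it starts a new cluster."""
    vectors = [vectorize(d) for d in docs]
    clusters: list[list[int]] = []
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if cosine(v, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = ["asian equity update sony", "sony asian equity outlook",
        "swiss franc interest rates", "interest rates outlook swiss"]
print(leader_clustering(docs))  # [[0, 1], [2, 3]]
```

Unlike a SOM, the result depends on document order and produces no topological map; it only shows the feature-vector-plus-similarity principle.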

Automatic Text Classification
- Rule-based: experts formulate rules and vertical vocabularies (Verity, Intelligent Classifier)
- Example-based: a machine learning approach based on training documents and iterative improvement (e.g. Autonomy, using Bayesian networks)
Fully automated text classification is not feasible today: “cyborg” classification (machine-assisted human classification) is needed, as is more tagged data.
Case: building a directory for an enterprise portal
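The example-based approach can be sketched with a multinomial naive Bayes classifier, a simpler relative of the Bayesian networks mentioned above (this is not Autonomy’s method). The `min_confidence` cut-off is a hypothetical way to model cyborg classification: documents the classifier is unsure about return `None` and are routed to a human editor.

```python
import math
from collections import Counter, defaultdict
from typing import Optional


class NaiveBayesClassifier:
    """Multinomial naive Bayes with Laplace smoothing, trained on labelled documents."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, doc: str) -> dict[str, float]:
        """Unnormalised log posterior per class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in doc.lower().split():
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, doc: str, min_confidence: float = 0.8) -> Optional[str]:
        """Best label, or None (route to a human) if its posterior is below min_confidence."""
        scores = self.log_scores(doc)
        best = max(scores, key=scores.get)
        # Normalise relative to the best score to avoid underflow.
        posterior = 1.0 / sum(math.exp(s - scores[best]) for s in scores.values())
        return best if posterior >= min_confidence else None


clf = NaiveBayesClassifier().fit(
    ["equity shares dividend stock", "stock equity earnings shares",
     "bond yield coupon maturity", "bond coupon rates yield"],
    ["equity", "equity", "fixed_income", "fixed_income"])
print(clf.classify("shares stock earnings"), clf.classify("stock bond"))
# equity None
```

The second document mixes both vocabularies evenly, so it goes to a human instead of being filed automatically.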

The Meta-data/Ontology Problem
“The key limiting factor at present is the difficulty of building and maintaining ontologies for web use.” (J. Hendler, Is there an Intelligent Agent in your future?)
This is also true for all kinds of information integration, e.g. financial research.

The Solution: Semantic Web?
[Diagram: layered stack of XML syntax, XML Schemas/RDF, ontologies/vocabularies, and logic, rules etc.]
- Humans define meta-data and use them
- Software builds and extracts new ontologies (e.g. Ontobroker)
- Agents and tools use meta-data to construct new information
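Agents using meta-data to construct new information can be illustrated with a tiny forward-chaining reasoner over RDF-style triples. The predicate names `rdf:type` and `rdfs:subClassOf` come from the RDF/RDFS vocabularies; the `infer` function, the `ex:` resources and the restriction to these two rules are illustrative assumptions, not a full RDFS implementation.

```python
Triple = tuple[str, str, str]

SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply two RDFS-style rules until no new triple is derived:
    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type A), (A subClassOf B)       => (x type B)"""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in facts if p == TYPE
                for a2, b in sub if a == a2}
        if new <= facts:      # fixpoint reached
            return facts
        facts |= new


ontology = {("ex:InstrumentX", TYPE, "ex:Bond"),
            ("ex:Bond", SUBCLASS, "ex:FixedIncome"),
            ("ex:FixedIncome", SUBCLASS, "ex:Instrument")}
derived = infer(ontology) - ontology
print(sorted(derived))
```

An agent that only knows “Instrument X is a bond” can thereby answer a query for all fixed-income instruments without that fact ever having been entered by hand.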

AI on Topic Maps?
[Diagram: topics, associations and occurrences.]
See: James D. Mason, Ferrets and Topic Maps: Knowledge Engineering for an Analytical Engine.
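The three topic map constructs on the slide can be sketched as plain Python data structures. The class names, the role-to-topic dictionary and the `related` query are hypothetical simplifications for illustration, not the ISO topic map model or any particular engine’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    occurrences: list[str] = field(default_factory=list)  # resources about this topic


@dataclass
class Association:
    kind: str                 # hypothetical association type, e.g. "covers"
    members: dict[str, str]   # role -> topic name


@dataclass
class TopicMap:
    topics: dict[str, Topic] = field(default_factory=dict)
    associations: list[Association] = field(default_factory=list)

    def add_topic(self, name: str, *occurrences: str) -> Topic:
        """Create the topic if needed and attach any occurrences to it."""
        topic = self.topics.setdefault(name, Topic(name))
        topic.occurrences.extend(occurrences)
        return topic

    def associate(self, kind: str, **members: str) -> None:
        """Link topics by role, creating any topic that does not exist yet."""
        for name in members.values():
            self.add_topic(name)
        self.associations.append(Association(kind, dict(members)))

    def related(self, name: str) -> set[str]:
        """Topics that share at least one association with `name`."""
        return {other for a in self.associations if name in a.members.values()
                for other in a.members.values() if other != name}


tm = TopicMap()
tm.associate("covers", report="X future prospects", company="X")
tm.associate("listed_on", company="X", exchange="NASDAQ")
tm.add_topic("X", "X homepage")
print(sorted(tm.related("X")))  # ['NASDAQ', 'X future prospects']
```

A query such as `related` is the kind of navigation an “analytical engine” could run over research content that has been mapped to topics.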

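Financial Research Integration
[Diagram: research from departments A and B, written in an XML editor, flows into a warehouse and from there through distribution to users and result DBs. Supporting elements: meta-data, topic maps, an internal information model, wrapper induction that discovers facts (e.g. recommendations), schema translation and semantic consistency checks.]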

Deployment
[Diagram: off-line, mining tools on the warehouse produce rules; on-line, a rule engine applies these rules together with the operational DB (profiles, meta-data) to deliver personalized information and offerings.]
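The deployment split can be sketched in Python: rules are produced off-line by the mining tools, and the on-line rule engine only evaluates them against a profile from the operational DB, so no learning happens at request time. The `Rule` structure, profile keys, offerings and priorities are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Profile = dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Profile], bool]  # produced off-line by the mining tools
    offering: str                         # content to show on the personalized page
    priority: int = 0


def personalize(profile: Profile, rules: list[Rule], limit: int = 3) -> list[str]:
    """On-line step: evaluate pre-computed rules against one profile and return
    the offerings of the highest-priority rules that fire."""
    fired = [r for r in rules if r.condition(profile)]
    fired.sort(key=lambda r: -r.priority)  # stable: ties keep rule order
    return [r.offering for r in fired[:limit]]


rules = [
    Rule("asia-interest", lambda p: "asia" in p.get("interests", ()),
         "Research: Asian equity update", priority=2),
    Rule("bond-holder", lambda p: bool(p.get("holds_bonds", False)),
         "New Instrument X", priority=5),
    Rule("forum-active", lambda p: p.get("forum_posts", 0) > 10,
         "Forum: art banking", priority=1),
]
profile = {"interests": ["asia", "art"], "holds_bonds": True, "forum_posts": 3}
print(personalize(profile, rules))
# ['New Instrument X', 'Research: Asian equity update']
```

Keeping the on-line side this simple is what lets the portal respond quickly while the expensive learning runs in the warehouse.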

The Main Problems for the “Web-House”
- The portal architecture must be designed to collect the proper information and to use the results from the web-house easily
- Portal content is at the same time a customer offer and a customer measuring tool
- Few people understand both the portal system aspect and the warehouse analytical aspect

Resources
- Katherine C. Adams, Extracting Knowledge (e/010507/feat.shmtl)
- Dan Sullivan, Beyond The Numbers (/feat2.shtml)
- Communications of the ACM, Vol. 43, No. 8, August 2000
- Information Discovery, A Characterization of Data Mining Technologies and Process (tech.htm)
- Dan R. Greening, Data Mining on the Web (ves/2000/01/greening.html)

Data Mining Tools (examples)
- IBM Intelligent Miner
- SPSS Clementine
- SAS
- Netica (belief nets)