Published by Ronan Tidball
Modified over 2 years ago
© 2002 Megaputer intelligence, Inc. Mining data with PolyAnalyst Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. Outline Data Mining in BI chain PolyAnalyst overview Learning algorithms Additional features Future developments
© 2002 Megaputer intelligence, Inc. Data Mining in BI chain Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. DM in Decision Making. Consider a fragment of the BI chain: Data → Knowledge → Decision → Action. Data is what we can capture and store. Knowledge is what provides for informed decisions. Problem: How to get from Data to Knowledge? Solution: Data Mining (Machine Learning).
© 2002 Megaputer intelligence, Inc. Data Mining "Data Mining is the process of identifying valid, novel, potentially useful, and ultimately comprehensible knowledge from databases that is used to make crucial business decisions." -- G. Piatetsky-Shapiro, KDNuggets editor. Valid. Novel. Actionable. Comprehensible.
© 2002 Megaputer intelligence, Inc. Data Mining vs. OLAP. OLAP: Helps prove or reject your hypotheses by dissecting data along different dimensions. But you have to guess the answer first! Data Mining: Automatically develops and tests numerous hypotheses by learning from historical data. Analyzes raw data.
© 2002 Megaputer intelligence, Inc. Business Intelligence Chain. Consider direct marketing automation: analyze data, integrate applications.
© 2002 Megaputer intelligence, Inc. Data Mining Tasks: Predicting, Classifying, Clustering, Segmenting, Explaining, Associating, Visualizing, Link Analysis, Text Mining
© 2002 Megaputer intelligence, Inc. Fields of application
© 2002 Megaputer intelligence, Inc. What makes DM hard? Unfamiliar concept and lack of experience. Results require interpretation by an analyst. Poor integration in existing applications. Difficulty processing very large databases. Necessity to learn a new application. High cost.
© 2002 Megaputer intelligence, Inc. Megaputer response
Challenge: Unfamiliar concept and lack of experience. Response: Collaborative Appliance Program, which combines Megaputer analysts' expertise in data mining with the customer's knowledge of the business project.
Challenge: Results require interpretation by an analyst. Response: Simple reporting and batch processing capabilities.
Challenge: Poor integration in existing applications. Response: Easy scoring of external data with a few mouse clicks.
Challenge: Difficulty processing very large databases. Response: In-Place Data Mining.
Challenge: Necessity to learn a new application. Response: An SDK of easy-to-integrate PolyAnalyst COM components.
Challenge: High cost. Response: Flexible licensing mechanism.
© 2002 Megaputer intelligence, Inc. PolyAnalyst overview Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. What is PolyAnalyst? Multi-strategy data mining suite. The largest selection of ML algorithms for diverse business tasks. Structured data and text processing tools. Ease of use: friendly data manipulation and visualization. Deep integration: applying models to external DB through the OLE DB protocol; exporting models to XML; COM components. Best Price/Performance ratio.
© 2002 Megaputer intelligence, Inc. Key differentiators of PolyAnalyst
Integrated analysis of structured (numeric and categorical) and unstructured (text) data
Visual analytical interface that is easy to learn and operate
The largest selection of powerful machine learning algorithms
Mouse-driven application of predictive models to data in any external system through a standard OLE DB link
Simple integration with external applications: SDK of COM components
In-Place Data Mining capabilities for processing huge databases
Step-by-step tutorials based on real-world case studies
Rich data manipulation and visualization tools
Reusable analytical scripts for batch-process data mining
The best Price/Performance ratio
© 2002 Megaputer intelligence, Inc. PolyAnalyst customer base: 300+ installations. Sample customers: Boeing (USA), 3M (USA), Chase Manhattan Bank (USA), McKinsey & Company (USA), Siemens (Germany), Lockheed Martin (USA), Allstate Insurance (USA), ICICI Bank (India), Mars (USA), Taco Bell (USA), DuPont (USA), Asea Skandia (Sweden), France Telecom (France), Cambridge Technology Partners (USA), Carlson Marketing (USA), Central Bank (Russia), US Navy (USA), KPN Research (Netherlands), Alka Insurance (Denmark), National Cancer Institute (USA)
© 2002 Megaputer intelligence, Inc. PolyAnalyst workplace Project navigation tree Control buttons Data and Results pane Objects and Collections represented by icons Exploration engine report fragment PolyAnalyst log journal
© 2002 Megaputer intelligence, Inc. PolyAnalyst provides: access to data held in a database or data warehouse (numerical, categorical, yes/no, date); data manipulation and visualization; 14 machine learning algorithms; convenient reporting and output of results; integration with external applications
© 2002 Megaputer intelligence, Inc. PolyAnalyst machine learning algorithms Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. "Probably one of the most impressive characteristics of PolyAnalyst is the sheer number of data mining tasks it can tackle." -- Mario Apicella, Technology Analyst, InfoWorld Test Center, July 3, 2000
© 2002 Megaputer intelligence, Inc. Learning algorithms
Find Laws (SKAT algorithm)
Cluster (Localization of anomalies)
Find Dependencies (n-dimensional distributions)
Classify (Fuzzy logic modeling)
Decision Tree (Information Gain criterion)
PolyNet Predictor (GMDH-Neural Net hybrid)
Market Basket Analysis (Association rules)
Memory Based Reasoning (k-NN + GA)
Linear Regression (Stepwise and rule-enriched)
Discriminate (Unsupervised classification)
Summary Statistics (Data summarization)
Link Analysis (Visual correlation analysis)
Text Mining (Semantic text analysis)
© 2002 Megaputer intelligence, Inc. Cluster (FC) Identifies clusters of similar records Selects best variables for clustering Suggests the number of clusters Separates clusters of records in new data sets for further investigation - preprocessing for other algorithms
© 2002 Megaputer intelligence, Inc. Cluster (continued) Groups of similar records
© 2002 Megaputer intelligence, Inc. Cluster (continued) Based on analyzing distributions in hypercubes of all variables rather than on measuring distances between points. Hence, independent of rescaling of the variable axes. Finds only clusters actually present in the data, against a background of uniformly distributed cases.
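The idea of looking at how records fill hypercubes, rather than at distances between points, can be illustrated with a small sketch. This is not the PolyAnalyst Cluster algorithm: the function name `dense_cells`, the equal-width grid, and the `factor` threshold are all assumptions made for illustration. Each variable is cut into the same number of bins over its own observed range, so multiplying a variable by a constant does not change which cell a record falls into.

```python
from collections import Counter

def dense_cells(points, bins, factor):
    """Return {cell: count} for grid cells holding at least `factor` times
    the count expected if the records were spread uniformly (hypothetical helper)."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell(p):
        index = []
        for d in range(dims):
            span = (highs[d] - lows[d]) or 1.0   # guard against a constant variable
            position = (p[d] - lows[d]) / span   # 0..1, unaffected by rescaling the axis
            index.append(min(int(position * bins), bins - 1))
        return tuple(index)

    counts = Counter(cell(p) for p in points)
    expected = len(points) / bins ** dims        # count per cell under a uniform spread
    return {c: n for c, n in counts.items() if n >= factor * expected}

points = [(0.0, 0.0), (1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (1.8, 1.1),
          (5.0, 9.0), (9.0, 4.0), (10.0, 10.0)]
rescaled = [(x * 1024, y) for x, y in points]    # stretch the first axis only

print(dense_cells(points, bins=2, factor=2))
print(dense_cells(rescaled, bins=2, factor=2))
```

Both calls report the same dense cell, which is the rescaling independence the slide refers to; a distance-based method would see very different neighborhoods after the first axis is stretched.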
© 2002 Megaputer intelligence, Inc. Classify (CL) Fuzzy-logic based classification. The membership function is modeled by either Find Laws, PolyNet Predictor, or LR. Provides record scoring, with Lift and Gain charts used for visualization. Assigns records to one of two classes and reports the classification rule used.
© 2002 Megaputer intelligence, Inc. Classify (continued) Lift chart (mass mailing vs. targeted mailing; vertical axis: % of maximal possible response): illustrates the increase in the response to a campaign based on the discovered model instead of random mailing. Gain chart (mass mailing vs. targeted mailing; vertical axis: profit ($)): helps optimize the profit obtained in a direct marketing campaign.
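The two curves described on this slide can be computed from any list of model scores and known responses. The sketch below is a generic illustration, not PolyAnalyst output: the function names, the sample scores, and the revenue and cost figures are assumed for the example. Records are ranked by score, and the curves show what happens as the mailing is extended from the best-scored record downward.

```python
def _rank(scores, responded):
    """Responses ordered from the highest-scored record to the lowest."""
    pairs = sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)
    return [r for _, r in pairs]

def cumulative_response(scores, responded):
    """Share of all responders reached after mailing the top k records, k = 1..n."""
    ranked = _rank(scores, responded)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no responders in the data")
    reached, curve = 0, []
    for r in ranked:
        reached += r
        curve.append(reached / total)
    return curve

def profit_curve(scores, responded, revenue_per_response, cost_per_mail):
    """Cumulative profit after mailing the top k records, k = 1..n."""
    profit, curve = 0.0, []
    for r in _rank(scores, responded):
        profit += r * revenue_per_response - cost_per_mail
        curve.append(profit)
    return curve

# Hypothetical scored records: 1 = responded, 0 = did not respond.
scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
responded = [1, 1, 0, 1, 0, 0]

response = cumulative_response(scores, responded)
profits = profit_curve(scores, responded, revenue_per_response=10.0, cost_per_mail=2.0)
best_k = profits.index(max(profits)) + 1
print(response)
print(profits, "mail the top", best_k)
```

With random (mass) mailing the response curve would rise roughly along the diagonal; the gap between that diagonal and the ranked curve is the lift, and the peak of the profit curve suggests where to stop mailing.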
© 2002 Megaputer intelligence, Inc. Decision Tree (DT) Intuitively classifies cases to selected categories Based on Information Gain splitting criteria The fastest algorithm in PolyAnalyst Scales linearly with increasing number of records
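For readers unfamiliar with the Information Gain criterion, here is a minimal sketch of how a candidate split is scored. It shows the textbook definition (entropy before the split minus the weighted entropy after it), not PolyAnalyst's implementation; the attribute names and records are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the records on a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical records: "contract" separates the classes perfectly, "region" not at all.
rows = [
    {"region": "north", "contract": "yes"},
    {"region": "north", "contract": "no"},
    {"region": "south", "contract": "yes"},
    {"region": "south", "contract": "no"},
]
labels = ["buy", "skip", "buy", "skip"]

for attribute in ("region", "contract"):
    print(attribute, information_gain(rows, labels, attribute))
```

A tree builder would split on the attribute with the highest gain and then repeat the same scoring inside each branch.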
© 2002 Megaputer intelligence, Inc. Decision Tree (continued) Classification tree Node characteristics
© 2002 Megaputer intelligence, Inc. Decision Forest (DF) The most efficient classification algorithm for tasks with multiple target categories Transforms the task of categorizing data records to N classes into the problem of solving N tasks of categorizing records to two classes Develops the best collection of N classification trees, with leaves containing probabilities of classifying records in the corresponding classes Scales linearly with increasing number of records
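The decomposition of an N-class task into N two-class tasks can be sketched as follows. This is a toy illustration under stated assumptions, not the Decision Forest algorithm itself: each "tree" here is a single split on one numeric feature (a stump), and the names `fit_stump`, `fit_one_vs_rest`, and `predict` are hypothetical. As on the slide, the leaves hold the probability of belonging to the corresponding class.

```python
def fit_stump(xs, ys):
    """Fit a one-split tree on a single numeric feature; ys are 0/1.
    Returns (threshold, p_left, p_right), where each leaf holds P(y = 1)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        # Squared error of predicting the leaf probability for every record.
        err = sum((y - p_left) ** 2 for y in left) + sum((y - p_right) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, p_left, p_right)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def fit_one_vs_rest(xs, labels):
    """One binary task per class: 'this class' (1) versus 'all others' (0)."""
    return {c: fit_stump(xs, [1 if l == c else 0 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the class whose binary model gives the highest leaf probability."""
    def prob(model):
        threshold, p_left, p_right = model
        return p_left if x <= threshold else p_right
    return max(models, key=lambda c: prob(models[c]))

# Hypothetical single-feature data with three target categories.
xs = [1, 2, 3, 10, 11, 12, 20, 21, 22]
labels = ["low"] * 3 + ["mid"] * 3 + ["high"] * 3
models = fit_one_vs_rest(xs, labels)
print([predict(models, x) for x in (2, 11, 21)])
```

Real trees with more than one split would separate the middle class cleanly; the stump only manages a 0.5 probability for it, yet comparing probabilities across the N models still picks the right class here.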
© 2002 Megaputer intelligence, Inc. Link Analysis (LK) Reveals pairs of correlated objects Used in Fraud Detection, Text Analysis and other correlation analysis tasks
© 2002 Megaputer intelligence, Inc. Text Analysis (TA) Extracts key concepts from natural language notes. Tags individual records with the main encountered concepts. Recognizes synonyms and other semantic relations. Can perform user-focused or unsupervised analysis. Integrates the analysis of text with the power of other machine learning algorithms of PolyAnalyst. Facilitates categorization of textual documents.
© 2002 Megaputer intelligence, Inc. Text Analysis (continued)
© 2002 Megaputer intelligence, Inc. Basket Analysis (BA) Is used in Retailing, Fraud Detection and Medicine. Identifies groups of products that sell well together in transactional data. Finds directed association rules for each of these groups. Groups baskets containing similar sets of products. Rules are characterized by Support, Confidence, and Improvement. Based on new mathematics: works 10 to 50 times faster than traditional algorithms.
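The three measures named on this slide can be computed by direct counting. The sketch below uses the common textbook definitions and assumes that "Improvement" means what is usually called lift (confidence divided by the baseline frequency of the right-hand side); the slides do not define it. It is a naive counter, not the faster algorithm the slide refers to, and the baskets are invented for the example.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence, and improvement (lift) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    n_ante = sum(1 for b in baskets if antecedent <= b)
    n_cons = sum(1 for b in baskets if consequent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = n_both / n                                   # share of baskets with both sides
    confidence = n_both / n_ante if n_ante else 0.0        # P(consequent | antecedent)
    improvement = confidence / (n_cons / n) if n_cons else 0.0  # confidence vs. baseline
    return support, confidence, improvement

# Hypothetical baskets of product groups.
baskets = [
    {"cable", "conduit", "fuses"},
    {"cable", "conduit"},
    {"cable", "relays"},
    {"fuses", "relays"},
]

support, confidence, improvement = rule_metrics(baskets, {"cable"}, {"conduit"})
print(support, confidence, improvement)
```

An improvement above 1 means the antecedent makes the consequent more likely than it is in a random basket. The rule is directed: swapping the two sides generally changes the confidence.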
© 2002 Megaputer intelligence, Inc. Basket Analysis (continued) Groups of products sold together well Directed Association Rules
© 2002 Megaputer intelligence, Inc. Basket Analysis (continued) Works with both transactional and flat data formats. Easily finds many-to-one rules. "I would like to continue working together with Megaputer on other CTP customers' projects (mainly Swedish and Danish banks)." -- Olof Goransson, Senior Data Consultant, CTP Skandinavien AB
© 2002 Megaputer intelligence, Inc. Find Laws (FL) Models relationships hidden in data. Presents discovered knowledge explicitly. Searches the space of all possible hypotheses. "The unique Find Laws algorithm along with an easy to use interface made PolyAnalyst the only choice for our environment." -- James Farkas, Senior Navigation Engineer, The Boeing Company
© 2002 Megaputer intelligence, Inc. Find Laws (continued) FL is based on Megaputer's unique Symbolic Knowledge Acquisition Technology (SKAT). A good introduction to SKAT: PCAI magazine, January 99, p
© 2002 Megaputer intelligence, Inc. Find Dependencies (FD) Determines most influential variables Detects multi-dimensional dependencies Predicts target variable in a table format Used as preprocessing for FL
© 2002 Megaputer intelligence, Inc. Find Dependencies (continued) Predicted Sales per Employee
© 2002 Megaputer intelligence, Inc. PolyNet Predictor (PN) Predicts values of continuous attributes Hybrid GMDH-Neural Network method Works well with large amounts of data The best architecture network is built automatically
© 2002 Megaputer intelligence, Inc. Memory Based Reasoning (MB) Performs classification to multiple categories Based on identifying similar cases in the previous history Uses Genetic Algorithms to find the most suitable metric for the problem
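The role of the metric in case-based classification can be seen in a short k-NN sketch. This is a generic illustration, not the Memory Based Reasoning module: the feature weights below are fixed by hand and simply stand in for the kind of metric a genetic algorithm would search for; the GA search itself is not shown, and the data and names are assumptions.

```python
import math
from collections import Counter

def knn_classify(train, query, k, weights):
    """Majority label among the k training cases nearest to `query`
    under a weighted Euclidean metric. `train` is a list of (features, label)."""
    def distance(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    nearest = sorted(train, key=lambda case: distance(case[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical history of cases with two numeric features each.
train = [
    ((1.0, 10.0), "A"), ((1.2, 12.0), "A"), ((1.1, 48.0), "A"),
    ((3.0, 50.0), "B"), ((3.2, 52.0), "B"), ((2.9, 11.0), "B"),
]
query = (1.3, 49.0)

print(knn_classify(train, query, k=3, weights=(1.0, 0.0)))  # metric uses feature 1 only
print(knn_classify(train, query, k=3, weights=(0.0, 1.0)))  # metric uses feature 2 only
```

The same query is assigned to different categories depending on the weights, which is why searching for a suitable metric matters for this kind of method.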
© 2002 Megaputer intelligence, Inc. Discriminate (DS) Determines what features of a selected data set distinguish it from the rest of the data. Requires no target variable. Can be powered by Find Laws, PolyNet Predictor, or Linear Regression.
© 2002 Megaputer intelligence, Inc. Linear Regression (LR) Incorporates categorical and yes/no variables in the analysis correctly Stepwise Linear Regression: only influential variables included Can be used as a preprocessing and benchmarking module
© 2002 Megaputer intelligence, Inc. PolyAnalyst features in more detail Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. Data Analysis Project Workflow Access data Understand, clean and transform data Run machine learning analysis Visualize, report and share results Integrate results in existing business process
© 2002 Megaputer intelligence, Inc. Data Access. ODBC-compliant databases: Oracle, DB2, Informix, Sybase, MS SQL Server, etc. Dedicated access: IBM Visual Warehouse, Oracle Express. OLE DB (can do In-Place Data Mining). CSV or DBF files. Data can be appended to the project when necessary.
© 2002 Megaputer intelligence, Inc. Data cleansing and manipulation SQL querying through OLE DB Records selection according to multiple criteria Union, intersection, or complement of data sets Categorical values aggregation Visual Drill-through Exceptional records filtering Split into n-tile percentage intervals Random sampling
© 2002 Megaputer intelligence, Inc. Visualization: Histograms. Line and scatter plots with zoom and drill-through capabilities. Snake charts. Interactive 3D charts. Interactive Rule-graphs with sliders for visualizing multi-variable relations. Frequency charts for categorical, integer, or yes/no variables. Lift and Gain charts for marketing applications.
© 2002 Megaputer intelligence, Inc. Histograms and Frequencies Histogram displays distribution of numerical variables Frequencies chart displays distribution of categorical and yes/no variables
© 2002 Megaputer intelligence, Inc. 2D charts and Rule-graphs. Sliders help visualize effects of other variables in more than two-dimensional models. The Find Laws model (red line) for a product market share dependence on the price predicts a dramatic change in the formula when the product goes on promotion.
© 2002 Megaputer intelligence, Inc. Snake-charts Quickly compare qualitatively several datasets on all their attributes Low High Compared data sets All variables
© 2002 Megaputer intelligence, Inc. Interactive 3D charts You can use mouse to rotate the 3D-cube
© 2002 Megaputer intelligence, Inc. PolyAnalyst integration features Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. Integration objectives Use models to simply score data in various external databases Deliver models to external applications in the format they understand - XML Be able to analyze very large databases in their entirety Integrate dedicated machine learning components in existing decision support systems
© 2002 Megaputer intelligence, Inc. Applying models externally PolyAnalyst can readily apply predictive models directly to data in any external source through a standard OLE DB protocol PolyAnalyst can export models to XML (PMML) format for their incorporation in external decision support applications
© 2002 Megaputer intelligence, Inc. Analyzing large databases: In-Place Data Mining vs. Traditional Data Mining
© 2002 Megaputer intelligence, Inc. PolyAnalyst COM. A kit of COM-based Data Mining components. See DMReview magazine, January 2000, p. 42 and PCAI magazine, March 99, p. 16. Benefits: Develop new applications quickly and effortlessly. Incorporate third-party components. Choose the best components from different vendors. Extend functionality by adding new components. Cross-platform applications. Integration with the simplest tools (Visual Basic).
© 2002 Megaputer intelligence, Inc. PolyAnalyst COM (continued) Offers individual machine learning engines Integration with external applications Users see only the familiar interface enhanced by a few new buttons The main program instructs PolyAnalyst on how to access the stored data Hard analytical work is performed by integrated PolyAnalyst machine learning components behind the scenes
© 2002 Megaputer intelligence, Inc. PolyAnalyst platforms. Standalone system: PolyAnalyst (Windows 9x/NT/2000/XP); PolyAnalyst Pro (Windows NT/2000P/XP Pro); PolyAnalyst XL (add-ins for MS Excel). Client/Server system: PolyAnalyst Knowledge Server (Windows NT); Client (Windows 9x/NT/2000 or OS/2).
© 2002 Megaputer intelligence, Inc. Customer quotes Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. PolyAnalyst supports medical projects at 3M. "Analytical engines do an excellent job of finding relations amongst many fields without overfitting." -- Timothy Nagle, Consulting Scientist, 3M Corporation, St. Paul, MN, USA
© 2002 Megaputer intelligence, Inc. PolyAnalyst helps improve the flight control system at Boeing. "PolyAnalyst provides quick and easy access for inexperienced users to powerful modeling tools." -- James Farkas, Senior Navigation Engineer, The Boeing Company, Kent, WA, USA
© 2002 Megaputer intelligence, Inc. PolyAnalyst facilitates marketing research at Indiana University. "PolyAnalyst provides a unique and powerful set of tools for data mining applications, including promotion response analysis, customer segmentation and profiling, and cross-selling analysis." -- Raymond Burke, E.W. Kelley Professor of BA, Kelley Business School, Indiana University, Bloomington, IN, USA
© 2002 Megaputer intelligence, Inc. PolyAnalyst helps medical research at the University of Wisconsin-Madison. "PolyAnalyst suite enabled our researchers to search their data for rules and structure while providing a symbolic knowledge of the structure, the detail they needed." -- Prof. Roger L. Brown, Director of RDSU, University of Wisconsin, Madison, WI, USA
© 2002 Megaputer intelligence, Inc. PolyAnalyst provides efficient machine learning algorithms. "PolyAnalyst focuses more effectively on data discovery than its competition." -- Mario Apicella, Technology Analyst, InfoWorld Test Center
© 2002 Megaputer intelligence, Inc. PolyAnalyst future developments Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. Future developments. Further support for OLE DB for DM: nested tables. New machine learning algorithms: time series analysis, Kohonen maps. Enhanced data import and manipulation. Visual development of workflow scripts. New push-button vertical applications.
© 2002 Megaputer intelligence, Inc. PolyAnalyst -- WebAnalyst. PolyAnalyst supports visual project development when used on top of WebAnalyst, a new Megaputer web-enabled enterprise server.
© 2002 Megaputer intelligence, Inc. PolyAnalyst evaluation. Download a FREE evaluation copy of PolyAnalyst and enjoy using it hands-on, following the provided step-by-step lessons or exploring your own data.
© 2002 Megaputer intelligence, Inc. Any Questions? Call Megaputer at (812) or write 120 W Seventh Street, Suite 310 Bloomington, IN USA Your Knowledge Partner TM
© 2002 Megaputer intelligence, Inc. Case 1: Asea Skandia (Sweden)
© 2002 Megaputer intelligence, Inc. Asea Skandia. Established 1907. Largest Swedish distributor of electrical equipment. About 1,400 employees and a turnover of SEK 5.1 billion. About ten thousand product names. Not good at CRM and DB marketing yet. Had only transactional data in a database.
© 2002 Megaputer intelligence, Inc. Groups of products offered
Home Appliances: 90 Cookers, cooker fans, microwave ovens; 91 Fridges/Chillers/Freezers; 92 Washing machines, dishwashers, dryers; 93 Sauna unit, fans; 94 Small appliances
Lighting: 17 IR, RF and Bus control systems; 19 Light reg., timers, plugs, CCE-con., car heaters; 70 Interior light fittings; 72 Industrial light fittings; 73 Emergency luminaires; 74 Spotlights and downlights, lighting tracks; 75 Decorative interior light fittings; 77 Exterior light fittings; 79 Accessories and spare parts; 80 Fluorescent lamps and other discharge lamps; 81 Incandescent filament and halogen lamps; 82 Special lamps
Ventilation and sheet metal: 15 Fastening and fixings, protective equipment; 16 Tools, implements, protective equipment & clothing; 66 Ventilation; 67 Sheet Metal for Buildings
Telecommunications: 48 Low current cable; 49 Data and optical fiber cable; 50 Network material; 51 Local data networks; 52 Power Supply; 53 Signalling equipment; 55 Distress signal systems; 57 Telephony; 58 Internal communication systems; 60 Aerial equipment; 62 Sound and time distribution systems; 63 Safety and Security Systems; 64 Service Alarm Systems
Electrical Equipment: 1 Power and control cables; 2 Electrical installation, wiring and flexible cable; 6 Material kits, cable protection, lightning equipment; 7 Terminations, joints, cabinets and electrical tape; 8 Contact crimping; 9 Electric meters; 11 Cable ladders, trays, trunking, cable trolleys; 14 Conduit, boxes, glands, fire protection; 15 Fastening and fixings, protective equipment; 16 Tools, implements, protective equipment & clothing; 18 Switch systems; 20 Fuses with accessories; 21 Miniature circuit breaker systems; 22 Distribution board systems IP20-IP43; 23 Distribution board systems IP43-IP65; 25 Equipment boxes, equipment cabinets; 26 Distribution board accessories; 28 Switchgear components, capacitors, busbar trunking; 29 Connection terminals and marking materials; 31 Motor, safety, load and MCCB breakers; 32 Contactors and starters; 35 Motors; 37 Push switches; 38 Sensors, monitors and regulators; 40 Relays, time relays; 42 Metering instruments; 43 Spare parts for consumer goods; 45 Programmable control system; 85 Radiators and thermostats; 87 Fan heaters; 88 Water heaters and electric boilers; 89 Heating cable
© 2002 Megaputer intelligence, Inc. (continued) Predicting cross-sell opportunities was possible. Closer cooperation with the client was necessary. Megaputer teamed with Cambridge Technology Partners (Sweden). Data was disguised prior to the analysis.
Asea Skandia: Identified new opportunity. Hired a consultant. Helped aggregate products in groups. Incorporated results in marketing activities.
CTP: Identified most suitable solution provider. Worked with the client. Collected available data. Aggregated data in product categories. Presented Megaputer results to the client.
Megaputer: Determined business potential of the data. Developed data exploration strategy. Carried out Market Basket Analysis. Provided actionable results to CTP.
© 2002 Megaputer intelligence, Inc. PolyAnalyst MBA. Works 10 to 50 times faster than traditional algorithms. Easily finds many-to-one rules. "I would like to continue working together with Megaputer on other CTP customers' projects (mainly Swedish and Danish banks)." -- Olof Goransson, Senior Data Consultant, CTP Skandinavien AB