PROPOSAL AND VALIDATION OF A FEASIBILITY MODEL FOR INFORMATION MINING PROJECTS
Pablo Pytel, Paola Britos & Ramón García-Martínez



AGENDA
- Problem Description
- Proposed Solution
- Validation
  o Proof of Concept
  o Comparison with real projects
- Conclusions

Problem Description:
- Information Mining Projects
- Software Engineering:
  o Methods
  o Techniques
  o Tools
- Methodologies:
  o CRISP-DM
  o P3TQ
  o SEMMA
- 85% [2000] and 60% [2005] of projects failed to achieve their goals
- The main problems (and associated risks) are not identified in the initial stages
- Feasibility Model

Feasibility Model for Information Mining Projects:
- 13 characteristics to be evaluated:
  o Categories: Data, Business Problem, Project, Project Team
  o Dimensions: Plausibility, Adequacy, Success
- Procedure:
  1. Determining the value of each project feature
  2. Converting feature values into fuzzy intervals
  3. Calculating the value of each dimension
  4. Calculating the overall project feasibility
  5. Interpreting the results

Validation – Proof of Concept:
  Project Objective: Detecting evidence of causality between general satisfaction and internet.
  o Step 1: Determining the value of each project feature
  o Step 2: Converting feature values into fuzzy intervals (Conversion Table)

  Category          ID   Value     Fuzzy Interval
  Data              P1   All       (7.8; 8.8; 10; 10)
  Data              P2   Regular   (3.4; 4.4; 5.6; 6.6)
  Data              A1   All       (7.8; 8.8; 10; 10)
  Data              A2   Much      (5.6; 6.6; 7.8; 8.8)
  Data              A3   Regular   (3.4; 4.4; 5.6; 6.6)
  Data              E1   Little    (1.2; 2.2; 3.4; 4.4)
  Business Problem  P3   All       (7.8; 8.8; 10; 10)
  Business Problem  A4   Much      (5.6; 6.6; 7.8; 8.8)
  Business Problem  A5   Regular   (3.4; 4.4; 5.6; 6.6)
  Project           E2   Much      (5.6; 6.6; 7.8; 8.8)
  Project           E3   Regular   (3.4; 4.4; 5.6; 6.6)
  Project Team      P4   All       (7.8; 8.8; 10; 10)
  Project Team      E4   Much      (5.6; 6.6; 7.8; 8.8)
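Step 2 amounts to a table lookup, which can be sketched in a few lines of Python. This is only an illustration: the names `CONVERSION_TABLE`, `FEATURE_VALUES` and `to_fuzzy_intervals` are hypothetical, and the table holds only the four linguistic values that appear on this slide, so any other value would raise a `KeyError`.

```python
# Linguistic value -> fuzzy interval, copied from the four values shown on the slide.
# Any further linguistic values of the full conversion table are not reproduced here.
CONVERSION_TABLE = {
    "All": (7.8, 8.8, 10.0, 10.0),
    "Much": (5.6, 6.6, 7.8, 8.8),
    "Regular": (3.4, 4.4, 5.6, 6.6),
    "Little": (1.2, 2.2, 3.4, 4.4),
}

# Step 1 output for the proof-of-concept project (feature ID -> linguistic value).
FEATURE_VALUES = {
    "P1": "All", "P2": "Regular", "A1": "All", "A2": "Much",
    "A3": "Regular", "E1": "Little",             # Data
    "P3": "All", "A4": "Much", "A5": "Regular",  # Business Problem
    "E2": "Much", "E3": "Regular",               # Project
    "P4": "All", "E4": "Much",                   # Project Team
}


def to_fuzzy_intervals(feature_values, table=CONVERSION_TABLE):
    """Step 2: replace each linguistic value by its fuzzy interval."""
    return {fid: table[value] for fid, value in feature_values.items()}


fuzzy = to_fuzzy_intervals(FEATURE_VALUES)
for fid, interval in fuzzy.items():
    print(fid, FEATURE_VALUES[fid], interval)
```

The resulting dictionary has one interval per feature and reproduces the Fuzzy Interval column of the slide.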

Validation – Proof of Concept: (2)
  o Step 3: Calculating the value of each dimension
  o Step 4: Calculating the overall project feasibility
  o Step 5: Interpreting the results

  Dimension                     Value
  Plausibility                  7.60
  Adequacy                      6.27
  Success                       5.25
  Overall Project Feasibility   6.47

  Feasible: Accepted (in the limit)

Validation – Comparison with real projects:
1) Apply the model to 25 real projects:
   o 22 projects finished successfully
   o 3 projects cancelled before completion
2) Request experts to appraise each project.
3) Compare the model's results with the project appraisals provided by the experts:
   - Statistical Analysis
   - Wilcoxon signed-rank test

Validation – Comparison with real projects: (2)
  Statistical Analysis [figure panels: Plausibility, Adequacy]

Validation – Comparison with real projects: (3)
  Statistical Analysis [figure panels: Success, Overall Project Feasibility]

Validation – Comparison with real projects: (4)
  Statistical Analysis [figure panels: Plausibility, Adequacy, Success, Overall Project Feasibility]
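The conclusions of the talk summarise this statistical analysis in terms of average values and standard deviations. A minimal sketch of such a comparison for one dimension follows. The function name `compare_appraisals` and the three-project data set are invented for illustration and are not the 25 projects of the study.

```python
from statistics import mean, stdev


def compare_appraisals(model_values, expert_values):
    """Summarise model and expert scores for one dimension."""
    return {
        "model_mean": mean(model_values),
        "expert_mean": mean(expert_values),
        "model_stdev": stdev(model_values),    # sample standard deviation
        "expert_stdev": stdev(expert_values),
        # A negative value means the model scores lower (more conservative) on average.
        "mean_difference": mean(m - e for m, e in zip(model_values, expert_values)),
    }


# Hypothetical scores for three projects (not taken from the presentation).
summary = compare_appraisals([6.0, 7.0, 8.0], [7.0, 8.0, 9.0])
print(summary)
```

In this made-up example the spread of both series is identical while the model average is one point lower, which is the kind of pattern that would be described as "more conservative".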

Validation – Comparison with real projects: (5)
  Wilcoxon signed-rank test:
  Hypotheses:
    H0: there are no meaningful differences between the researchers and the model values (i.e. they are equivalent).
    H1: the researchers and the model values are not equivalent.

  - level of significance = 0.01
  - quantity of non-zero pairs = 25
  - critical value = 68

  Dimension             Sum Ranks + (W+)   Sum Ranks − (W−)   Check Critical Value
  Plausibility                                                97 > 68  => H0 accepted
  Adequacy                                                    98 > 68  => H0 accepted
  Success                                                     150 > 68 => H0 accepted
  Overall Feasibility                                         144 > 68 => H0 accepted
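The mechanics of the check can be sketched in plain Python under the usual textbook form of the Wilcoxon signed-rank test: zero differences are dropped, tied absolute differences share their average rank, and the smaller rank sum is compared with the critical value. The function names and the five-pair sample are hypothetical. Only the critical value 68 and the statistics 97, 98, 150 and 144 come from the slide.

```python
def signed_rank_sums(model_values, expert_values):
    """Return (w_plus, w_minus, n) for paired samples; zero differences are dropped."""
    diffs = [m - e for m, e in zip(model_values, expert_values) if m != e]
    ordered = sorted(diffs, key=abs)  # rank by absolute difference
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus, len(ordered)


def h0_retained(statistic, critical_value):
    """H0 is kept when the test statistic exceeds the critical value."""
    return statistic > critical_value


# Hypothetical paired scores (not the study data); the last pair is a zero difference.
w_plus, w_minus, n = signed_rank_sums([7.5, 6.0, 5.0, 8.0, 4.5],
                                      [7.0, 6.5, 6.5, 6.0, 4.5])
print(w_plus, w_minus, n)

# Checks reported on the slide, against the critical value 68.
slide_checks = [h0_retained(s, 68) for s in (97, 98, 150, 144)]
print(slide_checks)
```

A quick sanity check for any implementation is that the two rank sums always add up to n(n+1)/2, where n is the number of non-zero pairs.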

Conclusions:
- A model is proposed to determine, at an early stage, whether a data mining project is feasible.
- From the application of the model to real projects:
  o Statistical Analysis:
    - the model tends to be more conservative than the experts
    - the standard deviation range and average values are almost the same
  o Wilcoxon signed-rank test: the proposed model is equivalent to the appraisal performed by the experts.

THANK YOU FOR YOUR ATTENTION