
Managing Software Projects: Analysis and Evaluation of Data
Topics: Reliable, Accurate, and Valid Data; Distribution of Data; Centrality and Dispersion; Data Smoothing: Moving Averages; Data Correlation; Normalization of Data
(Source: Tsui, F. Managing Software Projects. Jones and Bartlett, 2004)

Reliable, Accurate, and Valid Data

Definitions
– Reliable data: Data that are collected and tabulated according to the defined rules of the measurement and metric.
– Accurate data: Data that are collected and tabulated according to the defined level of precision of the measurement and metric.
– Valid data: Data that are collected, tabulated, and applied according to the defined intention of applying the measurement.

Distribution of Data

Definition
Data distribution: A description of a collection of data that shows the spread of the values and the frequency of occurrence of the values of the data.

Example #1: Skew of the Distribution
The number of problems detected at each of five severity levels (more on the next slide):
– Severity level 1: 23
– Severity level 2: 46
– Severity level 3: 79
– Severity level 4: 95
– Severity level 5: 110

Example #1 (continued)
[Bar chart: Number of Problems Found (0 to 120) versus Severity Level]
The number of problems is skewed toward the higher-numbered severity levels.

Example #2: Range of Data Values
The number of severity level 1 problems by functional area; the range is from 0 to 8:
– Functional area 1: 2
– Functional area 2: 7
– Functional area 3: 3
– Functional area 4: 8
– Functional area 5: 0
– Functional area 6: 1
– Functional area 7: 8

Example #3: Data Trends
The total number of problems found in a specific functional area across the test period, in weeks:
– Week 1: 20
– Week 2: 23
– Week 3: 45
– Week 4: 67
– Week 5: 35
– Week 6: 15
– Week 7: 10

Centrality and Dispersion

Definition
Centrality analysis: An analysis of a data set to find the typical value of that data set.
Approaches to characterizing centrality and dispersion:
– Average value
– Median value
– Mode value
– Variance and standard deviation
– Control chart

Average, Median, and Mode
– Average value (or mean): One type of centrality analysis that estimates the typical (or middle) value of a data set by summing all the observed data values and dividing the sum by the number of data points. This is the most common of the centrality analysis methods.
– Median: A value used in centrality analysis to estimate the typical (or middle) value of a data set. After the data values are sorted, the median is the data value that splits the data set into upper and lower halves. If there is an even number of values, the values of the middle two observations are averaged to obtain the median.
– Mode: The most frequently occurring value in a data set. If the data set contains floating-point values, the mode may be taken from the highest frequency of values occurring between two consecutive integers (inclusive).

Example
Data set = {2, 7, 3, 8, 0, 1, 8}
Average: x_avg = (2 + 7 + 3 + 8 + 0 + 1 + 8) / 7 = 29 / 7 ≈ 4.1
Median: the sorted values are 0, 1, 2, 3, 7, 8, 8, so the median is 3
Mode: 8 (the value that occurs most often)
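As a quick check, here is a minimal Python sketch (standard library only) that computes the three centrality measures for the example data set:

```python
import statistics

data = [2, 7, 3, 8, 0, 1, 8]

mean = statistics.mean(data)      # (2 + 7 + 3 + 8 + 0 + 1 + 8) / 7 = 29 / 7
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(f"mean   = {mean:.1f}")  # 4.1
print(f"median = {median}")    # 3
print(f"mode   = {mode}")      # 8
```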

Variance and Standard Deviation
Variance: The average of the squared deviations from the average value:
s^2 = SUM[ (x_i – x_avg)^2 ] / (n – 1)
Standard deviation: The square root of the variance; a metric used to define and measure the dispersion of data from the average value in a data set. It is numerically defined as follows:
s = SQRT[ SUM[ (x_i – x_avg)^2 ] / (n – 1) ]
where
SQRT = square root function
SUM = sum function
x_i = ith observation
x_avg = average of all x_i
n = total number of observations

Standard Deviation: Example
Data set = {2, 7, 3, 8, 0, 1, 8}
x_avg = (2 + 7 + 3 + 8 + 0 + 1 + 8) / 7 ≈ 4.1
SUM[ (x_i – x_avg)^2 ] ≈ 70.86
Variance: s^2 = 70.86 / 6 ≈ 11.81
Standard deviation: s = SQRT(11.81) ≈ 3.44
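A minimal Python sketch of the same calculation, using the sample (n – 1) form defined on the previous slide:

```python
import math

data = [2, 7, 3, 8, 0, 1, 8]
n = len(data)

x_avg = sum(data) / n                            # ~4.1
ss = sum((x - x_avg) ** 2 for x in data)         # sum of squared deviations, ~70.86
variance = ss / (n - 1)                          # sample variance, ~11.81
std_dev = math.sqrt(variance)                    # sample standard deviation, ~3.44

print(f"variance = {variance:.2f}, std dev = {std_dev:.2f}")
```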

Control Chart
Control chart: A chart used to assess and control the variability of some process or product characteristic.
It usually involves establishing lower and upper limits (the control limits) on data variation from the data set's average value.
If an observed data value falls outside the control limits, it triggers an evaluation of the characteristic.

Control Chart (continued)
[Control chart for the example data set: upper control limit at 7.54 problems, average at 4.1 problems, lower control limit at 0.66 problems, i.e., the average plus or minus one standard deviation]
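As a sketch, the control limits in the chart above can be reproduced from the example data set as the average plus or minus one standard deviation. How many standard deviations to allow is a policy choice (three-sigma limits are also common), and the small differences from the slide's 7.54 and 0.66 come from the slide rounding intermediate values:

```python
import math

data = [2, 7, 3, 8, 0, 1, 8]
n = len(data)
x_avg = sum(data) / n
s = math.sqrt(sum((x - x_avg) ** 2 for x in data) / (n - 1))

upper_limit = x_avg + s  # ~7.6 problems (slide shows 7.54 using rounded intermediates)
lower_limit = x_avg - s  # ~0.7 problems (slide shows 0.66 using rounded intermediates)

# Flag any observation that falls outside the control limits.
out_of_control = [x for x in data if x < lower_limit or x > upper_limit]
print(f"limits: [{lower_limit:.2f}, {upper_limit:.2f}], out of control: {out_of_control}")
```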

Data Smoothing: Moving Averages

Definitions
– Moving average: A technique for expressing data by computing the average of a fixed grouping of data values (e.g., data for a fixed period); it is often used to suppress the effect of one extreme data point.
– Data smoothing: A technique used to decrease the effect of individual, extreme variability in data values.

Example
A table listing, for each test week, the number of problems found together with the 2-week and 3-week moving averages.
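Since the table's data values are not reproduced here, the Python sketch below illustrates the computation using the weekly problem counts from Example #3 as stand-in data:

```python
def moving_average(values, window):
    """Average each consecutive group of `window` values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Weekly problem counts from Example #3.
problems = [20, 23, 45, 67, 35, 15, 10]

print(moving_average(problems, 2))  # [21.5, 34.0, 56.0, 51.0, 25.0, 12.5]
print(moving_average(problems, 3))  # [~29.3, 45.0, 49.0, 39.0, 20.0]
```

Note how the week-4 spike of 67 problems is damped to 51.0 in the 2-week average and to 49.0 in the 3-week average, which is the smoothing effect described above.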

Data Correlation

Definition
Data correlation: A technique that analyzes the degree of relationship between sets of data.
One sought-after relationship in software is that between some attribute prior to product release and the same attribute after product release.
One popular way to examine data correlation is to analyze whether a linear relationship exists:
– Two sets of data are paired together and plotted.
– The resulting graph is reviewed to detect any relationship between the data sets.

Linear Regression
Linear regression: A technique that estimates the relationship between two sets of data by fitting a straight line to the two sets of data values. This is a more formal method of doing data correlation.
Linear regression uses the equation of a line, y = mx + b, where m is the slope and b is the y-intercept.
To calculate the slope:
m = SUM[ (x_i – x_avg) × (y_i – y_avg) ] / SUM[ (x_i – x_avg)^2 ]
To calculate the y-intercept:
b = y_avg – (m × x_avg)

Example: Pre-release and Post-release Problems
SW Product   #Pre-release   #Post-release
A                 10              24
B                  5              13
C                 35              71
D                 75             155
E                 15              34
F                 22              50
G                  7              16
H                 54             112

Example (continued)
x_avg = 27.9
y_avg = 59.4
m ≈ 2.0 (slope)
b ≈ 3.6 (y-intercept)
y = 2x + 3.6
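A minimal Python sketch of the slope and intercept formulas applied to the pre-release/post-release data from the example; the small differences from the slide's figures come from the slide rounding intermediate values, as noted in the comments:

```python
# Pre-release (x) and post-release (y) problem counts for products A-H.
x = [10, 5, 35, 75, 15, 22, 7, 54]
y = [24, 13, 71, 155, 34, 50, 16, 112]

n = len(x)
x_avg = sum(x) / n  # ~27.9
y_avg = sum(y) / n  # ~59.4

# Slope: SUM[(x_i - x_avg)(y_i - y_avg)] / SUM[(x_i - x_avg)^2]
m = (sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
     / sum((xi - x_avg) ** 2 for xi in x))  # ~2.02; slide rounds to 2.0

# Intercept: y_avg - m * x_avg
b = y_avg - m * x_avg  # ~3.13; slide gets 3.6 by using the rounded values 2.0 and 27.9

print(f"y = {m:.2f}x + {b:.2f}")
```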

Example (continued)
[Scatter plot: Number of Post-release Problems Found (y-axis) versus Number of Pre-release Problems Found (x-axis), with the fitted line y = 2x + 3.6]

Normalization of Data

Definition
Normalizing data: A technique used to bring data characterizations to some common or standard level so that comparisons become more meaningful.
This is needed because a pure comparison of raw data sometimes does not provide an accurate comparison.
The number of source lines of code is the most common means of normalizing data; function points may also be used.
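As an illustration only (the product names and figures below are hypothetical, not from the source), defect counts can be normalized to defects per thousand source lines of code (KSLOC) before comparison:

```python
# Hypothetical products: (defects found, source lines of code).
products = {
    "Product X": (120, 60_000),
    "Product Y": (80, 20_000),
}

for name, (defects, sloc) in products.items():
    defects_per_ksloc = defects / (sloc / 1000)  # normalize by KSLOC
    print(f"{name}: {defects_per_ksloc:.1f} defects/KSLOC")
```

On raw counts Product X looks worse (120 versus 80 defects), but after normalization it has the lower defect density (2.0 versus 4.0 defects/KSLOC), which is exactly the kind of distorted comparison normalization is meant to avoid.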

Summary
– Reliable, Accurate, and Valid Data
– Distribution of Data
– Centrality and Dispersion
– Data Smoothing: Moving Averages
– Data Correlation
– Normalization of Data