Managing Software Projects
Analysis and Evaluation of Data
– Reliable, Accurate, and Valid Data
– Distribution of Data
– Centrality and Dispersion
– Data Smoothing: Moving Averages
– Data Correlation
– Normalization of Data
(Source: Tsui, F. Managing Software Projects. Jones and Bartlett, 2004)

Reliable, Accurate, and Valid Data

Definitions
Reliable data: Data that are collected and tabulated according to the defined rules of measurement and metric.
Accurate data: Data that are collected and tabulated according to the defined level of precision of measurement and metric.
Valid data: Data that are collected, tabulated, and applied according to the defined intention of applying the measurement.

Distribution of Data

Definition
Data distribution: A description of a collection of data that shows the spread of the values and the frequency of occurrences of the values of the data.

Example #1: Skew of the Distribution
The number of problems detected at each of five severity levels (more on next slide):
– Severity level 1: 23
– Severity level 2: 46
– Severity level 3: 79
– Severity level 4: 95
– Severity level 5: 110

Example #1 (continued)
[Chart: number of problems found (y-axis, 0 to 120) plotted against severity level (x-axis, 1 to 5), one point per severity level]
The number of problems is skewed towards the higher-numbered severity levels.
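The skew in Example #1 can also be read from the share of problems at each severity level. The following is a minimal sketch using the counts from the slide; the variable names and the text-bar scale are assumptions made for illustration.

```python
# Counts of problems per severity level, taken from Example #1.
problems_by_severity = {1: 23, 2: 46, 3: 79, 4: 95, 5: 110}

total = sum(problems_by_severity.values())

# Relative frequency: the share of all problems found at each level.
shares = {level: count / total for level, count in problems_by_severity.items()}

for level, count in problems_by_severity.items():
    bar = "+" * (count // 5)  # one "+" per 5 problems; the scale is arbitrary
    print(f"Severity {level}: {count:3d} ({shares[level]:5.1%}) {bar}")
```

The bars grow with the severity number, which is the skew described above.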

Example #2: Range of Data Values
The number of severity level 1 problems by functional area:
– Functional area 1: 2
– Functional area 2: 7
– Functional area 3: 3
– Functional area 4: 8
– Functional area 5: 0
– Functional area 6: 1
– Functional area 7: 8
The range is from 0 to 8.
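The range in Example #2 is simply the smallest and largest observed values. A short sketch, with variable names chosen for illustration:

```python
# Severity level 1 problems per functional area, taken from Example #2.
severity1_by_area = {1: 2, 2: 7, 3: 3, 4: 8, 5: 0, 6: 1, 7: 8}

counts = list(severity1_by_area.values())
lowest = min(counts)
highest = max(counts)

print(f"Range: {lowest} to {highest} (width {highest - lowest})")
```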

Example #3: Data Trends
The total number of problems found in a specific functional area across the test time period in weeks:
– Week 1: 20
– Week 2: 23
– Week 3: 45
– Week 4: 67
– Week 5: 35
– Week 6: 15
– Week 7: 10

Centrality and Dispersion

Definition
Centrality analysis: An analysis of a data set to find the typical value of that data set.
Approaches
– Average value
– Median value
– Mode value
– Variance and standard deviation
– Control chart

Average, Median, and Mode
Average value (or mean): One type of centrality analysis that estimates the typical (or middle) value of a data set by summing all the observed data values and dividing the sum by the number of data points.
– This is the most common of the centrality analysis methods
Median: A value used in centrality analysis to estimate the typical (or middle) value of a data set. After the data values are sorted, the median is the data value that splits the data set into upper and lower halves.
– If there is an even number of values, the middle two observations are averaged to obtain the median
Mode: The most frequently occurring value in a data set.
– If the data set contains floating point values, group the values into intervals between two consecutive integers (inclusive) and use the interval with the highest frequency

Example
Data Set = {2, 7, 3, 8, 0, 1, 8}
Average: $x_{avg} = (2 + 7 + 3 + 8 + 0 + 1 + 8) / 7 = 4.1$
Median: 3 (the middle value of the sorted data 0, 1, 2, 3, 7, 8, 8)
Mode: 8
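The three values in this example can be checked with Python's standard `statistics` module. This is a small sketch for verification only, not part of the original slides.

```python
import statistics

data = [2, 7, 3, 8, 0, 1, 8]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(f"sorted data = {sorted(data)}")
print(f"mean   = {mean:.1f}")
print(f"median = {median}")
print(f"mode   = {mode}")
```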

Variance and Standard Deviation
Variance: The average of the squared deviations from the average value.

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - x_{avg})^2}{n - 1} $$

Standard deviation: The square root of the variance. A metric used to define and measure the dispersion of data from the average value in a data set. It is numerically defined as follows:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - x_{avg})^2}{n - 1}} $$

where
$x_i$ = ith observation
$x_{avg}$ = average of all $x_i$
$n$ = total number of observations

Standard Deviation: Example
Data Set = {2, 7, 3, 8, 0, 1, 8}
$x_{avg} = (2 + 7 + 3 + 8 + 0 + 1 + 8) / 7 = 4.1$
$\sum (x_i - x_{avg})^2 = 4.41 + 8.41 + 1.21 + 15.21 + 16.81 + 9.61 + 15.21 = 70.87$
$\sum (x_i - x_{avg})^2 / (n - 1) = 70.87 / 6 = 11.81$
Standard deviation: $s = \sqrt{11.81} = 3.44$
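The same calculation can be sketched in Python. The manual version mirrors the formula with the n – 1 divisor; `statistics.variance` and `statistics.stdev` use the same divisor and serve as a cross-check. The slide rounds the average to 4.1 before squaring, so its intermediate sum (70.87) differs slightly from the unrounded one, but the results agree to two decimal places.

```python
import math
import statistics

data = [2, 7, 3, 8, 0, 1, 8]
n = len(data)
x_avg = sum(data) / n

# Sum of squared deviations from the (unrounded) average.
squared_deviations = sum((x - x_avg) ** 2 for x in data)

variance = squared_deviations / (n - 1)  # divisor n - 1, as on the slide
std_dev = math.sqrt(variance)

print(f"variance = {variance:.2f}")
print(f"std dev  = {std_dev:.2f}")

# Cross-check against the standard library (also uses n - 1).
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```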

Control Chart
Control chart: A chart used to assess and control the variability of some process or product characteristic.
It usually involves establishing lower and upper limits (the control limits) of data variations from the data set's average value.
If an observed data value falls outside the control limits, it triggers an evaluation of the characteristic.

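Control Chart (continued)
[Chart: data points plotted against three horizontal lines: upper control limit at 7.54 problems, average at 4.1 problems, lower control limit at 0.66 problems]
The limits shown equal the average plus and minus one standard deviation (4.1 + 3.44 = 7.54 and 4.1 – 3.44 = 0.66).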
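A rough sketch of how such limits could be computed and applied. The limit values on the slide (7.54 and 0.66) match the average plus and minus one standard deviation, so the multiplier `k = 1.0` below is inferred from those numbers rather than stated in the source; many control charts use wider limits such as three standard deviations. The function names are hypothetical.

```python
import statistics

def control_limits(values, k=1.0):
    """Return (lower, center, upper) as mean -/+ k sample standard deviations."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    return center - k * spread, center, center + k * spread

def outside_limits(values, k=1.0):
    """Return the observations that fall outside the control limits."""
    lower, _, upper = control_limits(values, k)
    return [v for v in values if v < lower or v > upper]

data = [2, 7, 3, 8, 0, 1, 8]
lower, center, upper = control_limits(data)

print(f"lower = {lower:.2f}, center = {center:.2f}, upper = {upper:.2f}")
print("values to evaluate:", outside_limits(data))
```

Because this sketch keeps the unrounded average and standard deviation, its limits differ from the slide's in the second decimal place.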

Data Smoothing: Moving Averages

Definitions
Moving average: A technique for expressing data by computing the average of a fixed grouping (e.g., data for a fixed period) of data values; it is often used to suppress the effects of one extreme data point.
Data smoothing: A technique used to decrease the effects of individual, extreme variability in data values.

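Example

| Test week | Problems found | 2-week moving avg | 3-week moving avg |
|-----------|----------------|-------------------|-------------------|
| 1         | 20             | -                 | -                 |
| 2         | 33             | 26.5              | -                 |
| 3         | 45             | 39                | 32.7              |
| 4         | 67             | 56                | 48.3              |
| 5         | 35             | 51                | 49                |
| 6         | 15             | 25                | 39                |
| 7         | 20             | 17.5              | 23.3              |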
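The moving averages in this example can be reproduced with a short function. This is a minimal sketch; the function name and the use of `None` for weeks without a full window are choices made here for illustration.

```python
def moving_average(values, window):
    """Average of the current value and the (window - 1) values before it.

    Positions that do not yet have a full window are reported as None.
    """
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result

problems_found = [20, 33, 45, 67, 35, 15, 20]  # weeks 1 to 7

two_week = moving_average(problems_found, 2)
three_week = moving_average(problems_found, 3)

for week, (count, avg2, avg3) in enumerate(zip(problems_found, two_week, three_week), start=1):
    fmt = lambda v: "-" if v is None else f"{v:.1f}"
    print(f"week {week}: found={count:2d}  2-week={fmt(avg2):>5}  3-week={fmt(avg3):>5}")
```

The week 4 peak of 67 is damped to 56 by the 2-week average and to 48.3 by the 3-week average, which is the smoothing effect described in the definitions.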

Data Correlation

Definition
Data correlation: A technique that analyzes the degree of relationship between sets of data.
One sought-after relationship in software is that between some attribute prior to product release and the same attribute after product release.
One popular way to examine data correlation is to analyze whether a linear relationship exists.
– Two sets of data are paired together and plotted
– The resulting graph is reviewed to detect any relationship between the data sets

Linear Regression
Linear regression: A technique that estimates the relationship between two sets of data by fitting a straight line to the two sets of data values.
This is a more formal method of doing data correlation.
Linear regression uses the equation of a line, $y = mx + b$, where $m$ is the slope and $b$ is the y-intercept value.
To calculate the slope, use the following:

$$ m = \frac{\sum (x_i - x_{avg})(y_i - y_{avg})}{\sum (x_i - x_{avg})^2} $$

To calculate the y-intercept, use the following:

$$ b = y_{avg} - m \, x_{avg} $$

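Example
Pre-release and Post-release Problems

| SW Product | # Pre-release | # Post-release |
|------------|---------------|----------------|
| A          | 10            | 24             |
| B          | 5             | 13             |
| C          | 35            | 71             |
| D          | 75            | 155            |
| E          | 15            | 34             |
| F          | 22            | 50             |
| G          | 7             | 16             |
| H          | 54            | 112            |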

Example (continued)
$x_{avg} = 27.9$
$y_{avg} = 59.4$
$m = 2.0$ (slope, approx.)
$b = 3.6$ (y-intercept, approx., computed from the rounded values: 59.4 – 2.0 × 27.9)
$y = 2x + 3.6$

Example (continued)
[Chart: number of post-release problems found (y-axis, 0 to 200) plotted against number of pre-release problems found (x-axis, 0 to 80), one point per product]
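The slope and intercept formulas can be applied to the example data as follows. This is a sketch only; `fit_line` is a hypothetical helper name, and the code keeps full precision rather than rounding intermediate values.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
    sxx = sum((x - x_avg) ** 2 for x in xs)
    m = sxy / sxx
    b = y_avg - m * x_avg
    return m, b

# Products A to H from the example table.
pre_release = [10, 5, 35, 75, 15, 22, 7, 54]
post_release = [24, 13, 71, 155, 34, 50, 16, 112]

m, b = fit_line(pre_release, post_release)
print(f"y = {m:.2f}x + {b:.2f}")
```

With unrounded intermediate values the intercept comes out near 3.1 rather than 3.6. The slide's 3.6 follows from using the rounded slope and averages (59.4 – 2.0 × 27.9), and both lines are close over the plotted range.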

Normalization of Data

Definition
Normalizing data: A technique used to bring data characterizations to some common or standard level so that comparisons become more meaningful.
This is needed because a pure comparison of raw data sometimes does not provide an accurate comparison.
The number of source lines of code is the most common means of normalizing data.
– Function points may also be used
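As an illustration of normalizing by source lines of code, the sketch below compares two products by problems per thousand lines of code (KLOC). The product names and figures are hypothetical and are not taken from the source.

```python
def problems_per_kloc(problems, source_lines):
    """Number of problems per 1,000 source lines of code."""
    return problems / (source_lines / 1000)

# Hypothetical products: (problems found, source lines of code).
products = {
    "Product X": (120, 80_000),
    "Product Y": (45, 15_000),
}

for name, (problems, sloc) in products.items():
    density = problems_per_kloc(problems, sloc)
    print(f"{name}: {problems} problems, {sloc} SLOC, {density:.1f} per KLOC")
```

In raw counts the hypothetical Product X looks worse (120 versus 45 problems), but after normalization Product Y has twice the problem density (3.0 versus 1.5 per KLOC).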

Summary
– Reliable, Accurate, and Valid Data
– Distribution of Data
– Centrality and Dispersion
– Data Smoothing: Moving Averages
– Data Correlation
– Normalization of Data
