MEGN 537 – Probabilistic Biomechanics Ch.3 – Quantifying Uncertainty Anthony J Petrella, PhD
Big Picture
  - Why study traditional probability (Ch. 2)? Unions and intersections let us conceptualize system-level variability arising from multiple sources
  - Chapter 3 deals with uncertainty in variables
  - Chapter 2 presupposes failure probabilities, but these are often not absolute or fixed
  - This chapter allows us to characterize variability in parameters
Random Variables
  Continuous (variable) data
  - Data measured on an infinitely divisible scale or continuum
  - No gaps between possible values
  - Examples: height, weight; blood glycogen level; joint contact force; kinematic measures; response time (milliseconds)
  Discrete (attribute) data
  - Discrete data measure attributes, qualitative conditions, or counts
  - Gaps between possible values
  - Examples: surgery or placebo; fusion or dynamic; number of surgeries per week or per year; pain/satisfaction surveys; number of patients
Population vs. Sample
  - Population: the total set of all process results
  - Sample: a subset of a population

  Measure                 Population   Sample
  Number of data points   N            n
  Mean                    μ            x̄
  Standard deviation      σ            s
Frequency Histograms
  - Histograms visually represent the centering, variability, and shape of data
  - Histograms are a graphical tool used to depict the frequency of numerical data by categories (classes or bins)
  - Properties: every data point falls into a class or bin, and the bins do not overlap
Descriptors of Uncertainty
  - Used to characterize measured data and distributions
  - Common descriptors: mean, standard deviation, coefficient of variation, skewness
Measures of Location (Central Tendency)
  - Mean: also known as the average; the sum of all values divided by the number of values
  - Grand average: the overall average, or average of the averages
  - Median: the midpoint of the data. Arrange the data from lowest to highest; the median is the middle data point, so 50% of the data points fall below it and the other 50% fall above it
  - Mode: the most frequent data point, i.e. the value occurring most often
Example - Measures of Location
  Given the following data on salaries: $50k, $30k, $170k, $45k, $30k, $55k, $40k
  - Mean = (50 + 30 + 170 + 45 + 30 + 55 + 40)/7 = 420/7 = $60k
  - Median (midpoint): 30, 30, 40, 45, 50, 55, 170, so the median is $45k
  - Mode (most frequent): $30k, which occurs twice
  [Figure: histogram of the salary data (# data pts vs. value) with the mean, median, and mode marked]
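The salary example can be checked with a few lines of Python. This is a minimal sketch using the standard-library `statistics` module; the variable name `salaries_k` is just an illustrative choice.

```python
import statistics

# Salary data from the example, in thousands of dollars
salaries_k = [50, 30, 170, 45, 30, 55, 40]

mean_k = statistics.mean(salaries_k)      # sum of values / number of values
median_k = statistics.median(salaries_k)  # middle value of the sorted data
mode_k = statistics.mode(salaries_k)      # most frequently occurring value

print("sorted:", sorted(salaries_k))
print("mean:", mean_k, "median:", median_k, "mode:", mode_k)
```

Note how the single $170k value pulls the mean well above the median, which is one reason to look at more than one measure of location.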
Measures of Dispersion (Variation or Spread)
  - Range: total width of a distribution. Range = Maximum Value - Minimum Value
  - Variance (V): measure of the spread in the data about the mean; the second central moment
  - Standard deviation (S): the most common measure of dispersion; the square root of the variance
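As a rough illustration of these measures, the sketch below computes them for the salary data from the earlier example. Whether to divide by N (population) or n - 1 (sample) is not stated on the slide, so both versions are shown; treat the choice as an assumption that depends on the data at hand.

```python
import statistics

# Salary data from the earlier example, in thousands of dollars
salaries_k = [50, 30, 170, 45, 30, 55, 40]

# Range = maximum value - minimum value
data_range = max(salaries_k) - min(salaries_k)

# Population forms divide by N; sample forms divide by n - 1
pop_var = statistics.pvariance(salaries_k)
sample_var = statistics.variance(salaries_k)
pop_std = statistics.pstdev(salaries_k)
sample_std = statistics.stdev(salaries_k)

print("range:", data_range)
print("variance (population, sample):", pop_var, sample_var)
print("std dev  (population, sample):", pop_std, sample_std)
```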
Standard Deviation
  - Standard deviation is a measure of variation that tells us about consistency around the mean
  - High spread: high standard deviation, poor performance
  - Low spread: low standard deviation, consistent performance
  [Figure: two distributions of X, one with high spread and one with low spread]
Other Descriptors
  - Coefficient of variation (COV): relative indicator of uncertainty in a variable; the ratio of the standard deviation to the mean
  - Skewness: measure of the asymmetry of the data about the mean, emphasizing the shape of the distribution; the third central moment
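Written out, these two descriptors might look as follows. This is a sketch in population notation; the symbols μ_X and σ_X for the mean and standard deviation, and θ_X for the non-dimensional skewness coefficient, are notation assumed here for illustration.

```latex
\mathrm{COV}(X) = \frac{\sigma_X}{\mu_X},
\qquad
\text{skewness} = E\!\left[(X - \mu_X)^3\right],
\qquad
\theta_X = \frac{E\!\left[(X - \mu_X)^3\right]}{\sigma_X^{3}}
```

Dividing the third central moment by σ_X³ removes the units, in the same way that dividing σ_X by μ_X does for the COV.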
Skewness
  - Skewness coefficient (θx): non-dimensional measure
  - θx = 0: symmetric
  - θx > 0 (+ skewness): most values below the mean
  - θx < 0 (- skewness): most values above the mean
  [Figure: f(x) vs. x for positive and negative skewness, with the mode, median, and mean marked]
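To make the sign convention concrete, here is a minimal sketch that computes the COV and the skewness coefficient for the salary data from the earlier example. It assumes population (divide-by-N) moments; the function names are hypothetical helpers, not part of any course tool.

```python
import math

def central_moment(data, order):
    """Population central moment: mean of (x - mean)**order."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** order for x in data) / len(data)

def coefficient_of_variation(data):
    """Ratio of the standard deviation to the mean."""
    mu = sum(data) / len(data)
    return math.sqrt(central_moment(data, 2)) / mu

def skewness_coefficient(data):
    """Third central moment divided by the standard deviation cubed."""
    sigma = math.sqrt(central_moment(data, 2))
    return central_moment(data, 3) / sigma ** 3

salaries_k = [50, 30, 170, 45, 30, 55, 40]
cov = coefficient_of_variation(salaries_k)
theta = skewness_coefficient(salaries_k)
print("COV:", cov)
print("skewness coefficient:", theta)
```

The coefficient comes out positive for this data set, consistent with most salaries lying below the mean and one large value stretching the right tail.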
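Probability Density Function
  - The PDF is the familiar histogram shape or bell curve
  - fX(x) describes how likely x is to fall in a specific bin: the area under fX(x) over a bin is the probability that x lies in that bin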
Cumulative Distribution Function
  - The CDF ranges from 0 to 1
  - The CDF is the integral of the PDF f(x)
  [Figure: PDF and CDF for μ = 0, σ = 1]
Cumulative Distribution Function
  - The CDF gives the probability that a continuous random variable has a value less than or equal to a specific value
  - Relationship between PDF and CDF: $F(x) = \int_{-\infty}^{x} f(t)\,dt$, or equivalently $f(x) = dF(x)/dx$
  - The CDF has the following values: F(x → -∞) = 0, F(x = median) = 0.5, F(x → +∞) = 1
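The relationship between the PDF and the CDF can be checked numerically. The sketch below assumes a standard normal variable (mean 0, standard deviation 1) and uses a simple trapezoidal rule with a lower limit of -8 standing in for -∞; both the function names and those numerical settings are illustrative assumptions.

```python
import math
from statistics import NormalDist

def standard_normal_pdf(x: float) -> float:
    """PDF of a normal distribution with mean 0 and standard deviation 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_by_integration(x: float, lower: float = -8.0, steps: int = 4000) -> float:
    """Trapezoidal estimate of the integral of the PDF from `lower` to x."""
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (standard_normal_pdf(lower) + standard_normal_pdf(x))
    for i in range(1, steps):
        total += standard_normal_pdf(lower + i * h)
    return total * h

# Compare against the closed-form CDF from the standard library
for x in (-8.0, 0.0, 1.0, 8.0):
    print(x, cdf_by_integration(x), NormalDist().cdf(x))
```

For this symmetric distribution the median is 0, so the integrated value at x = 0 should come out very close to 0.5, and the values at the far left and far right should approach 0 and 1.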
Creating Histograms or PDFs
  - Arrange the data in increasing order
  - Create evenly spaced bins and count how many data points occur in each bin
  - Note: the number of bins can affect the appearance of the histogram
  - Rule of thumb: k = 1 + 3.3 log10(n), where k = number of bins and n = number of data points
  - Plot the number of observations versus the variable
  - Note: for PDFs, plot frequency = (number of observations) / n
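The binning steps above can be sketched in plain Python. The rule of thumb gives a non-integer k in general, so this sketch rounds it to the nearest integer, which is an assumption rather than something the slide specifies. The function names are hypothetical, and the input is assumed to be a non-empty list of numbers.

```python
import math

def rule_of_thumb_bins(n: int) -> int:
    """k = 1 + 3.3 log10(n), rounded to an integer (rounding is an assumption)."""
    return max(1, round(1 + 3.3 * math.log10(n)))

def histogram(data, k=None):
    """Return (bin edges, counts, frequencies) for evenly spaced bins."""
    values = sorted(data)                      # arrange in increasing order
    n = len(values)
    if k is None:
        k = rule_of_thumb_bins(n)
    lo, hi = values[0], values[-1]
    width = (hi - lo) / k or 1.0               # fall back if all values are equal
    counts = [0] * k
    for x in values:
        idx = min(int((x - lo) / width), k - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(k + 1)]
    freqs = [c / n for c in counts]            # frequency = count / n, for a PDF plot
    return edges, counts, freqs

edges, counts, freqs = histogram([50, 30, 170, 45, 30, 55, 40])
print("edges:", edges)
print("counts:", counts)
print("frequencies:", freqs)
```

Passing a different `k` shows how strongly the number of bins changes the apparent shape, especially for small data sets like this one.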
Creating CDFs
  - Arrange the data in increasing order
  - For each data point: create an index i = 1, 2, 3, ..., n and compute F(x) = i / (n + 1)
  - Plot F(x) as a function of the variable x
  - Demo in Excel (come back to this)
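The CDF recipe above translates almost directly into code. This is a minimal sketch; the function name `empirical_cdf` is an illustrative choice, and the salary data from the earlier example is reused as input.

```python
def empirical_cdf(data):
    """Return (x, F(x)) pairs with F(x) = i / (n + 1) for the sorted data."""
    values = sorted(data)              # arrange in increasing order
    n = len(values)
    return [(x, i / (n + 1)) for i, x in enumerate(values, start=1)]

points = empirical_cdf([50, 30, 170, 45, 30, 55, 40])
for x, f in points:
    print(x, f)
```

Using n + 1 in the denominator keeps every plotted value strictly between 0 and 1, so neither the smallest nor the largest observation is assigned a probability of exactly 0 or 1.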
Multiple Random Variables
  - Joint distributions, or joint PDFs, can be defined for multiple variables
  - A joint PDF plotted in 3D shows how the variables depend on each other
Covariance and Correlation
  - Covariance indicates the degree of linear relationship between two random variables: Cov(X,Y) = E(XY) - E(X)*E(Y), where E( ) is the expected value
  - Covariance is the second moment about the respective means
  - Covariance = 0 for statistically independent variables
  - Correlation coefficient (non-dimensional) represents the degree of linear dependence between two random variables: ρX,Y = Cov(X,Y)/(σX * σY)
  - The correlation coefficient can range from -1 to +1 (Haldar p. 53)
    ρ = 0: no correlation
    ρ = +1: perfectly correlated / proportionate
    ρ = -1: perfectly correlated / inversely proportionate
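The two formulas above can be tried on small data sets. This sketch treats the expected value as a simple average over paired observations (population moments), which is an assumption; the helper names and the toy data are hypothetical.

```python
import math

def expected(values):
    """Expected value approximated as the plain average of the observations."""
    return sum(values) / len(values)

def covariance(xs, ys):
    """Cov(X,Y) = E(XY) - E(X)*E(Y)."""
    exy = expected([x * y for x, y in zip(xs, ys)])
    return exy - expected(xs) * expected(ys)

def correlation(xs, ys):
    """rho = Cov(X,Y) / (sigma_X * sigma_Y), with sigma^2 = Cov(X,X)."""
    sigma_x = math.sqrt(covariance(xs, xs))
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)

xs = [1, 2, 3, 4, 5]
proportional = [2 * x for x in xs]   # y increases in proportion to x
inverse = [-x for x in xs]           # y decreases in proportion to x

print("Cov:", covariance(xs, proportional))
print("rho (proportional):", correlation(xs, proportional))
print("rho (inverse):", correlation(xs, inverse))
```

The two toy cases land at the ends of the -1 to +1 range; real measurements would normally fall somewhere in between.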
Project Demo Assume the following parameters are normally distributed with a coefficient of variation of 0.05: |rquad.knee|, rtubercle.x, rtubercle.y, rham.knee.y.
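Random trials for one such parameter could be generated along these lines. This is only a sketch: the nominal value of 40.0 is a placeholder and not a value from the project, and the function name, seed, and trial count are likewise assumptions. The standard deviation is taken as COV times the magnitude of the mean.

```python
import random
import statistics

def normal_trials(mean: float, cov: float, n: int, seed: int = 0):
    """Draw n normally distributed trials with standard deviation = cov * |mean|."""
    rng = random.Random(seed)          # fixed seed so the trials are repeatable
    sigma = cov * abs(mean)
    return [rng.gauss(mean, sigma) for _ in range(n)]

NOMINAL = 40.0                         # hypothetical placeholder, not project data
trials = normal_trials(NOMINAL, cov=0.05, n=10_000)

sample_mean = statistics.fmean(trials)
sample_cov = statistics.stdev(trials) / abs(sample_mean)
print("sample mean:", sample_mean)
print("sample COV:", sample_cov)
```

With enough trials the sample mean and sample COV should settle close to the values that were requested, which is a quick sanity check before building a histogram or CDF from the trials.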
Demos…
  1. NESSUS
  2. Excel for random trials, PDF, CDF
  3. Matlab demo with trials from #2
  4. Reverse engineer Excel and repeat for rtx