Doc.RNDr.Iveta Bedáňová, Ph.D.

Biostatistics

Biostatistics = statistics applied to biological problems. Every individual is unique, so the data obtained can be highly variable (e.g. because of genetic variability) and need specific (i.e. statistical) methods for their evaluation. Statistical methods take the great variability of biological data into account, evaluate the data, and support valid inferences about the biological objects studied. Use: in research, to design experiments and to evaluate their results.

Types of Biological Data (Variables)
- Data on Nominal Scale (Categorical Data): classified by some quality, for example with 2 possibilities, present or not present (disease, anomaly, death, vaccination ...)
- Data on Ordinal Scale (Rank Data): an arrangement of measurements based on a subjective scale (classification by grades, points in competitions)
- Data on Numerical Scale: exact numeric values obtained by objective measurement with a device (body temperature, weight, length, volume etc.)

Formal viewpoint:
- Continuous Data: variables that can take any conceivable value within an observed range (height, length, weight, temperature)
- Discrete (discontinuous) Data: variables that can take only certain values, typically integers (number of animals, patients, eggs, cells etc.)
Numerical- and ordinal-scale data may be continuous or discrete. Nominal-scale data are discrete by their nature.

Statistical Sets (groups of individuals – animals, plants, cells, etc.)
Population (Universe), size N („endless“ number of members)
- „all items“ that could show the studied variable
- is often very large (e.g. cattle in Europe, dogs in CR)
Sample (Subset), size n (number of members)
- a definite number of individuals from the population (so it is inaccurate in comparison with the whole population)
- should be a „representative“ subset of the population (to reach the most valid conclusions about the population):
• random sample (no subjective choice)
• appropriate size of the sample
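Drawing a random sample "with no subjective choice" can be sketched as follows. The population of 1000 animal ID numbers, the sample size of 20, and the seed are all assumptions made for illustration only.

```python
import random

# Hypothetical population: ID numbers of 1000 animals (assumed for illustration).
population = list(range(1, 1001))

# A fixed seed makes the example reproducible; omit it for a fresh draw each run.
rng = random.Random(42)

# Simple random sample of n = 20 individuals, drawn without replacement,
# so every individual has the same chance of being selected.
sample = rng.sample(population, k=20)

print("n =", len(sample))
print("sampled IDs:", sorted(sample))
```

Because the draw is made by a random number generator, the investigator has no influence on which individuals enter the sample.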

Random Variable - Frequency Distribution (Discrete Data: Bar Graph)
Discrete data, number of puppies in a litter: 2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8
[Bar graph: x axis = number of pups, y axis = frequency]
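The frequency distribution behind such a bar graph can be tabulated directly from the litter sizes listed on the slide. This is a minimal sketch using only the Python standard library; the text "bars" are just an illustrative substitute for the graph.

```python
from collections import Counter

# Litter sizes from the slide (number of puppies per litter).
litter_sizes = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# Absolute frequency of each distinct litter size.
frequency = Counter(litter_sizes)

# Text bar graph: one '#' per litter with that number of pups.
for size in sorted(frequency):
    print(f"{size} pups: {'#' * frequency[size]} ({frequency[size]})")
```

The tallest bar belongs to litters of 5 pups, which occur three times.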

Frequency Distribution – Continuous Data: (Histogram)
We create classes = intervals of equal width covering the data.
- Midpoint of the class: all data in the interval get the same value = the midpoint of the class
- Frequency of the class: number of items (individuals) in the interval
[Histogram with frequency polygon (specific for 1 sample): x axis = weight, y axis = frequency]
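Grouping continuous data into classes can be sketched as below. The weights, the class width of 2 g, and the lower bound of 18 g are hypothetical values chosen for illustration; they do not come from the slides.

```python
# Hypothetical body weights in grams (illustrative values only).
weights = [18.2, 19.5, 20.1, 20.8, 21.3, 21.9, 22.4, 22.6, 23.7, 25.0]

class_width = 2.0   # assumed width of each class
lower_bound = 18.0  # assumed lower limit of the first class
n_classes = 4       # classes: [18, 20), [20, 22), [22, 24), [24, 26]

# Count how many values fall into each class.
frequencies = [0] * n_classes
for w in weights:
    # Index of the class containing w; a value on the top edge goes to the last class.
    index = min(int((w - lower_bound) // class_width), n_classes - 1)
    frequencies[index] += 1

# Every value in a class is represented by the class midpoint.
midpoints = [lower_bound + class_width * (i + 0.5) for i in range(n_classes)]

for mid, freq in zip(midpoints, frequencies):
    print(f"class midpoint {mid:.1f} g: frequency {freq}")
```

Plotting the frequencies as bars over the classes gives the histogram; joining the points (midpoint, frequency) gives the frequency polygon.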

Theoretical curve (population)
Frequency (Probability) Distribution
[Figure: empirical curves (samples) and theoretical curve (population); x axis = weight, y axis = P(x), probability (proportion of cases)]
Empirical curves for different samples from one population lie along a single continuous theoretical curve, which describes the probability distribution of the variable in the population.

Shapes of Probability Distributions
a) Normal (Gaussian): symmetric, bell-shaped
b) Nonnormal („unknown“): asymmetric, extreme, irregular

Quantiles, Proportions of Distribution
For every distribution, we can define measures (quantiles) that divide a group of ordered data into 2 parts (portions):
- values that are smaller than the quantile
- values that are bigger than the quantile
50% quantile x0.5 (Median): divides the group into 2 halves (50% of values below x0.5, 50% above)
Quartiles (4 equal parts), Deciles (10 parts), Percentiles (100 parts)
Quantiles are used as critical values in statistical hypothesis testing.
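A small sketch of how quantiles split ordered data, using Python's `statistics` module. The weights are hypothetical. Note that `statistics.quantiles` interpolates between neighbouring values, and other software may use slightly different interpolation rules, so quartile values can differ a little between programs.

```python
import statistics

# Hypothetical body weights in grams (illustrative values only), in ascending order.
data = sorted([18.2, 19.5, 20.1, 20.8, 21.3, 21.9, 22.4, 22.6, 23.7, 25.0])

# 50% quantile (median): half of the values lie below it, half above.
median = statistics.median(data)
below = sum(1 for x in data if x < median)
above = sum(1 for x in data if x > median)

# Quartiles: three cut points dividing the ordered data into 4 equal parts.
quartiles = statistics.quantiles(data, n=4)

print("median:", median, "| values below:", below, "| values above:", above)
print("quartiles:", quartiles)
```

The middle quartile coincides with the median, which is why the median is also called the 50% quantile.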

Descriptive Characteristics of Statistical Sets

Parameters – describe characteristic features of populations
(exact, but we cannot calculate them for the endless number of individuals in the population; we can only estimate them by means of sample data)
- represented by Greek letters (e.g. μ, σ)
Statistics – describe characteristic features of samples
(we calculate them from the sample data, and they serve as estimates of the exact population parameters)
- represented by Latin letters (e.g. x̄, s)

Descriptive Characteristics
A) Measures of Central Tendency: describe the middle of the range of values in a sample or population
B) Measures of Dispersion and Variability: describe the dispersion of values around the middle in a sample or population

Measures of Central Tendency
(describe where the majority of measurements occur)
1) The Arithmetic Mean (Average, AVG): μ (population), x̄ (sample)
μ = Σxi / N, x̄ = Σxi / n
Properties:
- is affected by extreme values → it should be used only for homogeneous, regular (Gaussian) distributions, to describe the middle of the population correctly
- has the same units of measurement as the individual observations
- the sum of all deviations from the mean is always 0

2) The Median: (population), (sample)
= the middle value in an ordered set of data (there are just as many values bigger than the median as there are smaller)
- if the sample size (n) is odd → there is only 1 middle value in the ordered sample data, and it is the median
- if n is even → there are two middle values, and the median is the midpoint (mean) between them
Rank of the median: (n + 1) / 2

The Median - Properties:
- is not affected by extreme values
- is the 50% quantile (divides the distribution curve into 2 halves, 50% and 50%)
- may be used in irregular (asymmetric) distributions, where it characterizes the middle of the set better than the average

Example: Body weights in two varieties of laboratory mice:
[Table: individual body weights xi (g) for Variety A and Variety B. Only fragments survive in this transcript: a value of 45 and n = 10.]
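The effect of an extreme value on the mean and on the median can be illustrated with a short sketch. The weights below are hypothetical and are not the values from the example above; the two samples differ only in their largest observation.

```python
import statistics

# Hypothetical body weights in grams (illustrative values only).
regular = [20, 21, 22, 23, 24]        # homogeneous sample
with_extreme = [20, 21, 22, 23, 40]   # same sample, but one extreme individual

for name, values in (("regular", regular), ("with extreme", with_extreme)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{name}: mean = {mean:.1f} g, median = {median} g")
```

The single extreme value pulls the mean upward, while the median stays at the same middle value in both samples.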

3) The Mode: (population), (sample)
= the most frequently occurring measurement in a data set (top of the distribution curve)
Properties:
- is not affected by extremes
- is not a very exact measure of the middle of the set (not often used for biological and medical data)
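As a brief illustration, the mode of the litter-size data shown earlier in the presentation can be found with `statistics.multimode`, which returns every value that shares the highest frequency (a data set can have more than one mode).

```python
import statistics

# Litter sizes from the earlier slide (number of puppies per litter).
litter_sizes = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# multimode returns a list, because several values may tie for the highest frequency.
modes = statistics.multimode(litter_sizes)

print("mode(s):", modes)
```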

B) Measures of Variability
- describe dispersion (scattering) of measurements around the center of a distribution
1) The Range: R = xmax – xmin
- depends on the 2 extreme values of the data
- a relatively imprecise measure of variability: it does not take into account any measurements between the highest and lowest values
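Why the range is imprecise can be shown with two hypothetical data sets that share the same extremes but are scattered differently in between. The values and the helper name `value_range` are assumptions made for this illustration.

```python
import statistics

def value_range(values):
    """Range R = xmax - xmin."""
    return max(values) - min(values)

# Two hypothetical data sets with identical minimum and maximum.
clustered = [1, 5, 5, 5, 9]   # most values sit at the center
spread = [1, 2, 5, 8, 9]      # values are scattered toward the extremes

range_clustered = value_range(clustered)
range_spread = value_range(spread)

print("range (clustered):", range_clustered)
print("range (spread):   ", range_spread)

# A measure that uses every observation distinguishes the two sets.
print("SD (clustered):", round(statistics.pstdev(clustered), 2))
print("SD (spread):   ", round(statistics.pstdev(spread), 2))
```

Both ranges are equal, yet the standard deviations differ, because the range ignores everything between the lowest and highest value.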

Variability expressed in terms of deviations from the mean:
As the sum of all deviations from the mean is always equal to 0 → their sum would be useless as a measure of variability. The way to eliminate the signs of the deviations from the mean is to square them. Then we can define the sum of squares: SS = Σ(xi – x̄)²
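The reasoning above can be checked numerically. The data are hypothetical, chosen so that the mean is a whole number and the arithmetic is easy to follow by hand.

```python
# Hypothetical sample (illustrative values only); the mean is exactly 5.
data = [2, 4, 4, 5, 7, 8]

mean = sum(data) / len(data)

# Deviations from the mean: negative and positive values cancel out.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the squared deviations no longer cancel.
sum_of_squares = sum(d ** 2 for d in deviations)

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("sum of squares (SS):", sum_of_squares)
```

Here the deviations are -3, -1, -1, 0, 2 and 3, which add up to 0, while their squares add up to 24.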

2) The Variance: σ² (population), s² (sample)
= the mean sum of squares about the mean
Population variance: σ² = Σ(xi – μ)² / N
„Estimated variance“ (sample): s² = Σ(xi – x̄)² / (n – 1)
The variance is expressed in the squared units of the original measurements.

Degrees of freedom (DF): DF = n – 1
(n reduced by the number of known statistics in the sample)
DF reflects the sampling error in comparison with the population: the sample variance s² estimates the population variance σ².
When n is big (small error) → the calculated s² differs only a little from the exact σ².
When n is small (big error) → the calculated s² can differ greatly from the exact σ².
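The role of the n - 1 divisor can be sketched with Python's `statistics` module, where `pvariance` divides the sum of squares by the number of values and `variance` divides it by n - 1. The data are hypothetical.

```python
import statistics

# Hypothetical sample (illustrative values only); sum of squares about the mean = 24.
data = [2, 4, 4, 5, 7, 8]
n = len(data)

# Treating the data as a whole population: divide the sum of squares by N.
population_variance = statistics.pvariance(data)

# Treating the data as a sample: divide the sum of squares by DF = n - 1.
sample_variance = statistics.variance(data)

print("n =", n, "| DF =", n - 1)
print("sum of squares / N       =", population_variance)
print("sum of squares / (n - 1) =", sample_variance)

# The two results differ by the factor n / (n - 1), which approaches 1 as n grows.
print("ratio:", sample_variance / population_variance, "| n / (n - 1):", n / (n - 1))
```

With only 6 observations the two results differ by 20%; with 100 observations the factor n / (n - 1) would be about 1.01.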

3) The Standard Deviation (SD): σ (population), s (sample)
= square root of the variance: σ = √σ², s = √s² (it has the same units as the original measurements)
4) The Coefficient of Variability („Relative standard deviation“): V
- a relative measure, not dependent on the units of measurement
V = σ / μ (population); „Estimated V“ = s / x̄ (sample), often expressed in %
Used to compare variability in data sets whose values differ in magnitude (e.g. weight in mice and cows).
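The mice-and-cows comparison can be sketched as follows. All weights are hypothetical, and the helper name `coefficient_of_variation` is an assumption made for this illustration; it computes the estimated V = s / x̄ as a percentage.

```python
import statistics

def coefficient_of_variation(values):
    """Estimated V = s / mean, expressed in percent (s uses the n - 1 divisor)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical body weights (illustrative values only).
mice_g = [20, 21, 22, 23, 24]         # grams
cows_kg = [500, 550, 600, 650, 700]   # kilograms

sd_mice = statistics.stdev(mice_g)
sd_cows = statistics.stdev(cows_kg)
cv_mice = coefficient_of_variation(mice_g)
cv_cows = coefficient_of_variation(cows_kg)

print(f"mice: SD = {sd_mice:.2f} g,  V = {cv_mice:.1f} %")
print(f"cows: SD = {sd_cows:.2f} kg, V = {cv_cows:.1f} %")
```

The standard deviations carry different units and cannot be compared directly, whereas the two V values are unit-free percentages and can be set side by side.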
