Ondrej Ploc Part 2 The main methods of mathematical statistics, Probability distribution.

Outline
2.1 Assignment of the theoretical distribution to empirical distribution
2.2 Comparison of empirical and theoretical parameters, estimation of theoretical parameters, testing of parametric hypotheses
2.3 Measurement of statistical dependences: some fundamentals of regression and correlation analysis

2.1 Assignment of the theoretical distribution to empirical distribution

Goals
Probabilistic investigation of the selective statistical set (the sample): choice of an acceptable theoretical distribution
Probabilistic picture of the selective statistical set: testing of non-parametric hypotheses

Acquired concepts and knowledge
Theoretical distributions, a partial survey in alphabetical order: Bernoulli, Beta, Binomial, Chi-square, Discrete Uniform, Erlang, Exponential, F, Gamma, Geometric, Lognormal, Negative binomial, Normal, Poisson, Student's t, Triangular, Uniform, Weibull
Testing of non-parametric hypotheses
Test of the null (zero) hypothesis H0
Accepting or rejecting the null hypothesis H0
Level of statistical significance α, e.g. α = 0.05

Assigned example Hypothesis: Empirical distribution can be substituted by the normal distribution

Assigned example (2) The results of 50 completed tests

Assigning a theoretical distribution to an empirical distribution amounts to testing a non-parametric hypothesis. A theoretical distribution is preferable because its simple mathematical apparatus reveals information that is not accessible in any other way.

2.1.1 Interval division of frequencies
It is recommended to construct 5 to 20 equidistant intervals covering the range of the values of the statistical sign (variable).
Sturges' rule (empirical): k = 1 + 3.3 log10(n), where k is the number of intervals and n is the extent (size) of the selective statistical set.
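Sturges' rule can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function names, the choice to round to the nearest integer, and the score range 0 to 100 are assumptions made for the example.

```python
from math import log10


def sturges_intervals(n):
    """Number of intervals k = 1 + 3.3 * log10(n), rounded to the nearest integer.

    Rounding is an assumption; the rule itself only gives a rough guide.
    """
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return round(1 + 3.3 * log10(n))


def interval_edges(lowest, highest, k):
    """Edges of k equidistant intervals covering [lowest, highest]."""
    width = (highest - lowest) / k
    return [lowest + i * width for i in range(k + 1)]


k = sturges_intervals(50)          # 7 for n = 50
edges = interval_edges(0.0, 100.0, k)  # hypothetical range of test scores
print(k, edges)
```

For the 50 results of the assigned example the rule suggests 7 intervals, which lies inside the recommended range of 5 to 20.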

2.1.2 Theoretical distribution
A theoretical distribution is a fundamental concept of probability theory: it is the rule that assigns a probability to every value of a random variable.
A random variable is a variable whose value is uniquely determined by the result of a random attempt.
A random attempt (random experiment) is a realization of activities or processes whose result cannot be anticipated with certainty.
Probability = number of favourable results of the random attempt / number of all results of the random attempt (e.g. shooting at a target).
Random variables can be discrete or continuous.

Random variable
Probabilities can be assigned to the values of a random variable: the probabilities with which those values occur in the course of a random attempt.

Distribution function F
The (cumulative) distribution function gives the probability that a random variable RV takes a value smaller than or equal to a chosen value x_i (or x): F(x) = P(X ≤ x). This cumulative probability is expressed as a summation (discrete case) or an integral (continuous case) of the partial probabilities.
The probability that X lies in the semi-closed interval (a, b], where a < b, is therefore P(a < X ≤ b) = F(b) − F(a).
Properties: F is non-decreasing, F(x) → 0 as x → −∞, and F(x) → 1 as x → +∞.
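The summation form of the distribution function and the interval rule P(a < X ≤ b) = F(b) − F(a) can be checked on a small discrete case. The sketch below assumes a fair six-sided die as the random variable; the die and the function names are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Hypothetical discrete random variable: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6


def cdf(x):
    """F(x) = P(X <= x): sum of the partial probabilities up to x."""
    return sum((p for v, p in zip(values, probs) if v <= x), Fraction(0))


def prob_interval(a, b):
    """P(a < X <= b) = F(b) - F(a) for the semi-closed interval (a, b]."""
    return cdf(b) - cdf(a)


print(cdf(3))               # probability of rolling at most 3
print(prob_interval(2, 5))  # probability of rolling 3, 4 or 5
```

Exact fractions are used so that the identities hold without rounding error.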

Parameters of theoretical distributions
The theoretical general, central and standardized moments O_j, C_j and N_j
Discrete: P_i denotes the probabilities (the probability function) and x_i the values of the random variable RV
Continuous: the probability density is a function of x, the value of the random variable RV

Parameters of theoretical distributions
The names and symbols "mean value (expected value) E" and "dispersion (variance) D" are also often used. The expected value E is a location parameter that measures the level of the random variable RV. The dispersion D is a variability parameter that measures the spread of the values of the random variable. The expected value E equals the theoretical general moment of the 1st order O1; the dispersion D equals the theoretical central moment of the 2nd order C2.
The theoretical general moment of the 1st order O1 is the location parameter, the theoretical central moment of the 2nd order C2 is the variability parameter, the theoretical standardized moment of the 3rd order N3 is the skewness parameter, and the theoretical standardized moment of the 4th order N4 is the kurtosis parameter.
The law of large numbers describes the relation between empirical and theoretical parameters. Provided certain conditions are met, the empirical distribution and its empirical parameters can be expected to approximate the theoretical distribution and its theoretical parameters, and the approximation improves as the extent of the selective statistical set (the number of realized random attempts) grows. This approach of the empirical parameters to the theoretical ones is not mathematical convergence in the ordinary sense but convergence in probability.
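The moment formulas themselves are not preserved in this transcript. The sketch below therefore assumes the standard definitions for a discrete distribution: O_j = Σ x_i^j P_i, C_j = Σ (x_i − O_1)^j P_i and N_j = C_j / C_2^(j/2). The fair die used as data and all function names are illustrative.

```python
from fractions import Fraction

# Hypothetical discrete distribution: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6


def general_moment(j):
    """O_j: probability-weighted sum of x_i ** j (assumed standard definition)."""
    return sum(x ** j * p for x, p in zip(values, probs))


def central_moment(j):
    """C_j: probability-weighted sum of (x_i - O_1) ** j."""
    mean = general_moment(1)
    return sum((x - mean) ** j * p for x, p in zip(values, probs))


def standardized_moment(j):
    """N_j = C_j / C_2 ** (j / 2), returned as a float."""
    return float(central_moment(j)) / float(central_moment(2)) ** (j / 2)


E = general_moment(1)   # expected value, location parameter
D = central_moment(2)   # dispersion (variance), variability parameter
print(E, D, standardized_moment(3), standardized_moment(4))
```

For the die, E = 7/2 and D = 35/12, and the skewness N3 vanishes because the distribution is symmetric about its mean.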

Binomial distribution (discrete distribution)
Characteristic of the random phenomenon: n independent random attempts are carried out; the probability of the monitored random phenomenon is the same in all attempts and equals p. We seek the probability that the phenomenon occurs 0, 1, …, n times. By this definition, the values x_0, x_1, …, x_n of the random variable are the numbers 0, 1, …, n.
Theoretical distribution (probability function): for the described random phenomenon, the probability function is the rule that assigns the probabilities P_i, i = 0, 1, …, n, to the values x_i of the random variable: P_i = C(n, i) p^i (1 − p)^(n − i), where C(n, i) is the binomial coefficient.
Distribution function: F(x_i) = P_0 + P_1 + … + P_i.

Binomial distribution (discrete distribution)
The significance of the binomial distribution: a typical example of independent random attempts is the random selection of elements from a set when each selected element is returned before the next draw, the so-called selection with return (sampling with replacement). It can be shown that when the extent of the selective set is small compared with the extent of the basic set, the difference between selection with return and selection without return is insignificant. The binomial distribution can therefore serve as a suitable criterion of whether the selective statistical set was created by random selection.
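The claim that selection with and without return barely differ for a small selective set can be illustrated numerically. The sketch compares the binomial probabilities with the hypergeometric probabilities, which describe selection without return; the hypergeometric model is not named in the slides and is brought in here as an assumption, as are the set sizes.

```python
from math import comb


def binomial_pmf(i, n, p):
    """P_i = C(n, i) * p**i * (1 - p)**(n - i): selection with return."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)


def hypergeometric_pmf(i, n, marked, total):
    """Probability of i marked elements among n drawn without return."""
    return comb(marked, i) * comb(total - marked, n - i) / comb(total, n)


# Hypothetical basic set: 10 000 elements, 3 000 of them marked; sample of 10.
TOTAL, MARKED, SAMPLE = 10_000, 3_000, 10
p = MARKED / TOTAL

max_diff = max(
    abs(binomial_pmf(i, SAMPLE, p) - hypergeometric_pmf(i, SAMPLE, MARKED, TOTAL))
    for i in range(SAMPLE + 1)
)
print(f"largest difference between the two models: {max_diff:.6f}")
```

Increasing SAMPLE towards TOTAL makes the difference grow, which is the situation in which the binomial model stops being a good stand-in.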

Normal distribution (continuous distribution)

Normal distribution (continuous distribution)
Standardized normal distribution: N(0, 1). Its distribution function F(u) is the Laplace function.
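The density and distribution function of the normal distribution are not preserved in this transcript. A minimal sketch, assuming the standard density of N(μ, σ) and the standardization u = (x − μ) / σ, evaluates the distribution function through the error function from the Python standard library. The parameter values μ = 100 and σ = 15 are hypothetical.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma), with sigma the standard deviation."""
    u = (x - mu) / sigma
    return exp(-0.5 * u * u) / (sigma * sqrt(2.0 * pi))


def standard_normal_cdf(u):
    """Distribution function F(u) of the standardized normal distribution N(0, 1)."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """F(x) for N(mu, sigma), obtained by standardizing x first."""
    return standard_normal_cdf((x - mu) / sigma)


mu, sigma = 100.0, 15.0  # hypothetical parameters
within_one_sigma = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
print(f"P(mu - sigma < X <= mu + sigma) = {within_one_sigma:.4f}")
```

Standardizing first means that a single table, or a single function, for N(0, 1) serves every normal distribution.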

Alternative distribution (discrete distribution)
A special case of the binomial distribution for n = 1. The alternative (Bernoulli) distribution A(p) is a discrete theoretical distribution with one theoretical parameter p of a zero-one random variable RV (the random variable takes the values x_i = i = 0, 1).

Poisson distribution (discrete distribution)

Geometric distribution (discrete distribution)

Lognormal distribution (continuous distribution)
The lognormal distribution LN(μ, σ) is a continuous theoretical distribution of a random variable RV that is an increasing function of a random variable Y, of the form x = e^y, where Y has the normal distribution N(μ, σ). The lognormal distribution has two theoretical parameters, μ and σ.
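Because x = e^y is increasing, the distribution function of the lognormal variable follows directly from that of Y: P(X ≤ x) = P(Y ≤ ln x). The sketch below assumes σ denotes the standard deviation of Y; the parameter values are hypothetical.

```python
from math import erf, exp, log, sqrt


def standard_normal_cdf(u):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))


def lognormal_cdf(x, mu, sigma):
    """P(X <= x) for X = exp(Y), Y ~ N(mu, sigma); zero for non-positive x."""
    if x <= 0.0:
        return 0.0
    return standard_normal_cdf((log(x) - mu) / sigma)


mu, sigma = 1.0, 0.5  # hypothetical parameters of Y
median = exp(mu)      # half of the probability lies below exp(mu)
print(lognormal_cdf(median, mu, sigma))
```

The median of X is e^μ, since ln x = μ is the median of Y.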

Apparatus of non-parametric testing
The null (zero) hypothesis H0 supposes that the empirical distribution can be substituted by the intended theoretical distribution.
The alternative hypothesis Ha supposes that this presumption is not correct.
The essence of testing non-parametric hypotheses is a comparison between the theoretical and empirical absolute frequencies.

Apparatus of non-parametric testing
A special group of theoretical distributions was developed for the verification of non-parametric and parametric hypotheses. These distributions are not intended to replace empirical distributions; they serve as statistical criteria. The normal distribution is the only exception: in its standardized form it can act as a statistical criterion, and in its non-standardized form it can substitute empirical distributions. The most frequently used statistical criteria are the standardized normal distribution (u-test), Student's distribution (t-test), Pearson's χ² distribution (χ²-test) and the Fisher-Snedecor distribution (F-test).

Apparatus of non-parametric testing
A suitable statistical criterion must be selected to verify the hypotheses H0 and Ha. The χ²-test is used most frequently for the verification of a non-parametric hypothesis. When its application requires an interval division of frequencies, every partial interval must have an absolute frequency of at least 5. If this condition is not fulfilled, neighbouring partial intervals must be merged, and the same merging must be applied throughout the interval division of frequencies.
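The merging of partial intervals can be sketched as follows. This is one possible procedure, not the one prescribed by the slides: it merges from left to right, checks the empirical frequency against the minimum (many textbooks check the theoretical frequency instead), and folds any undersized remainder into the last interval. The frequencies are invented for illustration.

```python
def merge_small_intervals(empirical, theoretical, minimum=5):
    """Merge neighbouring intervals until each empirical frequency reaches `minimum`.

    The same merging is applied to the theoretical frequencies so that the two
    lists stay aligned interval by interval.
    """
    merged_emp, merged_theo = [], []
    acc_emp, acc_theo = 0, 0.0
    for emp, theo in zip(empirical, theoretical):
        acc_emp += emp
        acc_theo += theo
        if acc_emp >= minimum:
            merged_emp.append(acc_emp)
            merged_theo.append(acc_theo)
            acc_emp, acc_theo = 0, 0.0
    if acc_emp or acc_theo:  # undersized remainder at the right-hand end
        if merged_emp:
            merged_emp[-1] += acc_emp
            merged_theo[-1] += acc_theo
        else:
            merged_emp.append(acc_emp)
            merged_theo.append(acc_theo)
    return merged_emp, merged_theo


# Hypothetical frequencies for 50 results in 8 intervals.
empirical = [1, 3, 8, 14, 12, 7, 3, 2]
theoretical = [1.2, 3.9, 8.6, 12.4, 11.8, 7.5, 3.2, 1.4]
merged_emp, merged_theo = merge_small_intervals(empirical, theoretical)
print(merged_emp)
print(merged_theo)
```

Merging reduces the number of intervals k, which in turn lowers the number of degrees of freedom used later in the test.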

Apparatus of non-parametric testing
After the statistical criterion has been selected (e.g. the χ²-test), its experimental value (e.g. χ²-exp.) and its critical theoretical value (e.g. χ²-theor.) must be determined. The critical theoretical value defines the so-called critical domain W of the criterion. If the experimental value of the selected criterion is an element of the critical domain W, the alternative hypothesis Ha must be accepted, i.e. the empirical distribution cannot be substituted by the intended theoretical distribution. In the opposite case (the experimental value is not an element of the critical domain W), the null hypothesis H0 can be accepted, i.e. the empirical distribution can be substituted by the intended theoretical distribution.
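The decision rule can be sketched in code. The formula for the experimental value is not preserved in this transcript, so the sketch assumes Pearson's usual statistic, the sum over intervals of (empirical − theoretical)² / theoretical, and a critical domain W consisting of all values above the critical theoretical value. The frequencies are invented; the critical value 5.991 is the tabulated χ² value for 2 degrees of freedom at α = 0.05.

```python
def chi_square_experimental(empirical, theoretical):
    """Pearson's statistic: sum of (empirical - theoretical)**2 / theoretical."""
    return sum((e - t) ** 2 / t for e, t in zip(empirical, theoretical))


def in_critical_domain(chi2_exp, chi2_theor):
    """True when the experimental value falls into W = (chi2_theor, infinity)."""
    return chi2_exp > chi2_theor


# Hypothetical absolute frequencies in 5 intervals (50 results in total).
empirical = [12, 14, 12, 7, 5]
theoretical = [13.7, 12.4, 11.8, 7.5, 4.6]

chi2_exp = chi_square_experimental(empirical, theoretical)
chi2_theor = 5.991  # table value for 5 - 2 - 1 = 2 degrees of freedom, alpha = 0.05

if in_critical_domain(chi2_exp, chi2_theor):
    print(f"chi2-exp = {chi2_exp:.3f}: accept Ha, the substitution is not justified")
else:
    print(f"chi2-exp = {chi2_exp:.3f}: H0 can be accepted")
```

With these invented frequencies the experimental value stays well below the critical value, so H0 is not rejected.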

Significance level
Choosing the significance level α is an essential element of testing non-parametric and parametric hypotheses. The significance level gives the probability of erroneously rejecting the tested hypothesis (i.e. the probability of a type I error). The most frequently used significance levels are α = 0.05 and α = 0.01. For example, for a positive test of normality at the significance level 0.05 (the hypothesis H0 that the empirical distribution can be substituted by the normal distribution is accepted and Ha is rejected), the level can be read as follows: if the selective statistical set SSS were drawn 100 times from a basic statistical set BSS for which H0 holds, the test would be expected to accept H0 in about 95 cases.
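The meaning of α as the probability of a type I error can be checked by simulation. For brevity the sketch uses a two-sided u-test of the mean with known σ rather than the χ² test of normality, which is a substitution made for this example only. Samples are drawn from N(0, 1), so H0 is true by construction; the seed, the number of trials and the sample size are arbitrary choices.

```python
import random
from math import sqrt


def u_statistic(sample, mu0, sigma):
    """Standardized distance of the sample mean from the hypothesized mean mu0."""
    n = len(sample)
    mean = sum(sample) / n
    return (mean - mu0) / (sigma / sqrt(n))


def type_one_error_rate(trials=2000, n=50, critical_u=1.96, seed=0):
    """Share of samples for which a true H0 is rejected at alpha = 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(u_statistic(sample, 0.0, 1.0)) > critical_u:
            rejections += 1
    return rejections / trials


print(f"observed rejection rate: {type_one_error_rate():.3f}")
```

The observed rate should lie close to 0.05, meaning that roughly 95 of every 100 samples lead to acceptance of the true H0.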

2.1.5. Illustration of non-parametric testing
Hypothesis: the empirical distribution can be substituted by the normal distribution

2.1.5. Illustration of non-parametric testing
The χ²-test is applied. In its application, k denotes the number of intervals of the interval division of frequencies and r the number of theoretical parameters of the normal distribution (i.e. r = 2). The expression ν = k − r − 1 gives the number of degrees of freedom, which together with the selected significance level determines the critical theoretical value χ²-theor. = χ²(ν) = χ²(k − r − 1) from statistical tables. The selected significance level is α = 0.05.
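The lookup of the critical theoretical value can be sketched with a short excerpt of a χ² table for α = 0.05. The dictionary below holds the commonly tabulated upper 5 % points for 1 to 10 degrees of freedom; the function names and the example of k = 7 intervals are assumptions made for illustration.

```python
# Upper 5 % critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRITICAL_005 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def degrees_of_freedom(k, r=2):
    """nu = k - r - 1 for k intervals and r estimated theoretical parameters."""
    nu = k - r - 1
    if nu < 1:
        raise ValueError("too few intervals for the number of parameters")
    return nu


def critical_value(k, r=2):
    """Critical theoretical value chi2(k - r - 1) at alpha = 0.05."""
    return CHI2_CRITICAL_005[degrees_of_freedom(k, r)]


k = 7  # hypothetical number of intervals after any merging
print(degrees_of_freedom(k), critical_value(k))
```

With k = 7 intervals and the two parameters of the normal distribution, ν = 4 and the critical value is 9.488.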
