
Maximum Likelihood Estimates and the EM Algorithms II
Henry Horng-Shing Lu
Institute of Statistics, National Chiao Tung University

Part 1 Computation Tools

Include Functions in R
source("file path")

Example. In MME.R:
  MME = function(y1, y2, y3, y4) {
    n = y1+y2+y3+y4;
    phi1 = 4.0*(y1/n-0.5);
    phi2 = 1-4*y2/n;
    phi3 = 1-4*y3/n;
    phi4 = 4.0*y4/n;
    phi = (phi1+phi2+phi3+phi4)/4.0;
    print("By MME method")
    return(phi);    # print(phi); #
  }

In R:
  > source("C:/MME.R")
  > MME(125, 18, 20, 24)
  [1] "By MME method"
  [1]

Part 2 Motivation Examples

Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b.
  A, B: dominant.
  a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
[Figure: formation of the four gamete types from the AaBb parent]

Example 1 in Genetics (2)
Probabilities for genotypes in gametes:

              No recombination    Recombination
  Male        1-r                 r
  Female      1-r'                r'

              AB         ab         aB      Ab
  Male        (1-r)/2    (1-r)/2    r/2     r/2
  Female      (1-r')/2   (1-r')/2   r'/2    r'/2

[Figure: parental and recombinant gamete formation]

Example 1 in Genetics (3)
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.

Example 1 in Genetics (4)

                                        MALE
  FEMALE         AB: (1-r)/2          ab: (1-r)/2          aB: r/2           Ab: r/2
  AB: (1-r')/2   AABB (1-r)(1-r')/4   AaBb (1-r)(1-r')/4   AaBB r(1-r')/4    AABb r(1-r')/4
  ab: (1-r')/2   AaBb (1-r)(1-r')/4   aabb (1-r)(1-r')/4   aaBb r(1-r')/4    Aabb r(1-r')/4
  aB: r'/2       AaBB (1-r)r'/4       aaBb (1-r)r'/4       aaBB rr'/4        AaBb rr'/4
  Ab: r'/2       AABb (1-r)r'/4       Aabb (1-r)r'/4       AaBb rr'/4        AAbb rr'/4

Example 1 in Genetics (5)
Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
  A*: the dominant phenotype from (Aa, AA, aA).
  a*: the recessive phenotype from aa.
  B*: the dominant phenotype from (Bb, BB, bB).
  b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.

Example 1 in Genetics (6)
Let phi = (1-r)(1-r'). Then the four phenotype probabilities are
  P(A*B*) = (2+phi)/4,
  P(A*b*) = (1-phi)/4,
  P(a*B*) = (1-phi)/4,
  P(a*b*) = phi/4.

Example 1 in Genetics (7)
Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:
  (y1, y2, y3, y4) ~ Multinomial(n; (2+phi)/4, (1-phi)/4, (1-phi)/4, phi/4).
We know that 0 <= r <= 1/2 and 0 <= r' <= 1/2, so
  phi = (1-r)(1-r') lies in [1/4, 1].

Example 1 in Genetics (8)
Suppose that we observe the data
  (y1, y2, y3, y4) = (125, 18, 20, 24),
which is a random sample from
  Multinomial(n; (2+phi)/4, (1-phi)/4, (1-phi)/4, phi/4), n = 187.
Then the probability mass function is
  f(y | phi) = n!/(y1! y2! y3! y4!) * ((2+phi)/4)^y1 * ((1-phi)/4)^(y2+y3) * (phi/4)^y4.

Maximum Likelihood Estimate (MLE)
Likelihood: the joint probability of the observed data, viewed as a function of the parameter.
Maximize the likelihood: solve the score equations, which set the first derivatives of the log-likelihood to zero.
Under regularity conditions, the MLE is consistent, asymptotically efficient, and asymptotically normal!

MLE for Example 1 (1)
Likelihood:
  L(phi) proportional to ((2+phi)/4)^y1 * ((1-phi)/4)^(y2+y3) * (phi/4)^y4.
Score equation:
  y1/(2+phi) - (y2+y3)/(1-phi) + y4/phi = 0,
which reduces to the quadratic
  n*phi^2 - (y1 - 2*y2 - 2*y3 - y4)*phi - 2*y4 = 0,  n = y1+y2+y3+y4.
MLE: the root of this quadratic that lies in [1/4, 1].

MLE for Example 1 (2)
[Figure: the score function plotted against phi; the labels A, B, and C mark the candidate roots of the score equation]

MLE for Example 1 (3)
Checking: verify that the root found in [1/4, 1] is a maximum, and compare the MLE with the method-of-moments estimate. Are they close?

Part 3 Numerical Solutions for the Score Equations of MLEs

A Banach Space
A Banach space B is a normed vector space over the field F such that every Cauchy sequence of B converges in B (i.e., B is complete).

Lipschitz Continuous
Let Omega be a closed subset of B and g a mapping from Omega into Omega.
1. g is Lipschitz continuous on Omega with constant alpha if
     ||g(x) - g(y)|| <= alpha * ||x - y||  for all x, y in Omega.
2. g is a contraction mapping on Omega if g is Lipschitz continuous and alpha < 1.

Fixed Point Theorem (1)
If g is a contraction mapping on Omega (i.e., g is Lipschitz continuous with constant alpha < 1), then:
1. g has a unique fixed point x* in Omega such that x* = g(x*).
2. For any initial value x0 in Omega, the iteration x_{n+1} = g(x_n) converges to x*.
3. ||x_n - x*|| <= (alpha^n/(1-alpha)) * ||x1 - x0||.


Applications for MLE (1)
Numerical solution of the score equation f1(phi) = 0: define
  g(phi) = phi + alpha * f1(phi)
so that g is a contraction mapping, choose an initial value phi0, and iterate phi_{n+1} = g(phi_n).
Then phi_n converges to the fixed point phi* of g, which satisfies f1(phi*) = 0.

Applications for MLE (2)
How do we choose alpha so that g(phi) = phi + alpha * f1(phi) is a contraction mapping?
What is the optimal alpha?

Parallel Chord Method (1)
The parallel chord method is also called simple iteration. It fixes the slope at the initial value and iterates
  phi_{n+1} = phi_n + alpha * f1(phi_n),  alpha = -1/f2(phi0).

Parallel Chord Method (2)
[Figure: successive chords with common slope f2(phi0) intersecting the horizontal axis]

Plot Parallel Chord Method by R (1)
### Simple iteration ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
# First and second derivatives of log likelihood #
f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
x = c(10:80)*0.01
y = f1(x)
plot(x, y, type = 'l', main = "Parallel chord method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)

Plot Parallel Chord Method by R (2)
phi0 = 0.25    # Given the initial value 0.25 #
segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)
segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)
# Use the tangent line to find the intercept b0 #
b0 = f1(phi0)-f2(phi0)*phi0
curve(f2(phi0)*x+b0, add = T, col = "red")
phi1 = -b0/f2(phi0)    # Find the closer phi #
segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)
segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)
# Use the parallel line to find the intercept b1 #
b1 = f1(phi1)-f2(phi0)*phi1
curve(f2(phi0)*x+b1, add = T, col = "red")

Define Functions for Example 1 in R
We will define some functions and variables for finding the MLE in Example 1 by R.
# First, second and third derivatives of log likelihood #
f1 = function(y1, y2, y3, y4, phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 = function(y1, y2, y3, y4, phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
f3 = function(y1, y2, y3, y4, phi) {2*y1*(2+phi)^(-3)-2*(y2+y3)*(1-phi)^(-3)+2*y4*(phi)^(-3)}
# Expected second derivative of log likelihood (negative Fisher information) #
I = function(y1, y2, y3, y4, phi) {(-1)*(y1+y2+y3+y4)*(1/(4*(2+phi))+1/(2*(1-phi))+1/(4*phi))}
y1 = 125; y2 = 18; y3 = 20; y4 = 24; initial = 0.9

Parallel Chord Method by R (1)
> fix(SimpleIteration)
function(y1, y2, y3, y4, initial){
  phi = NULL;
  i = 0;
  alpha = -1.0/f2(y1, y2, y3, y4, initial);
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-5){
    i = i+1;
    phi1 = phi2;
    phi2 = alpha*f1(y1, y2, y3, y4, phi1)+phi1;
    phi[i] = phi2;
  }
  print("By parallel chord method (simple iteration)");
  return(list(phi = phi2, iteration = phi));
}

Parallel Chord Method by R (2)
> SimpleIteration(y1, y2, y3, y4, initial)

Parallel Chord Method by C/C++

Newton-Raphson Method (1)
Solve the score equation f1(phi) = 0 by iterating
  phi_{n+1} = phi_n - f1(phi_n)/f2(phi_n).

Newton-Raphson Method (2)
[Figure: successive tangent lines intersecting the horizontal axis]

Plot Newton-Raphson Method by R (1)
### Newton-Raphson Method ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
# First and second derivatives of log likelihood #
f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
x = c(10:80)*0.01
y = f1(x)
plot(x, y, type = 'l', main = "Newton-Raphson method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)

Plot Newton-Raphson Method by R (2)
# Given the initial value 0.25 #
phi0 = 0.25
segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)
segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)
# Use the tangent line to find the intercept b0 #
b0 = f1(phi0)-f2(phi0)*phi0
curve(f2(phi0)*x+b0, add = T, col = "purple", lwd = 2)
# Find the closer phi #
phi1 = -b0/f2(phi0)
segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)
segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)

Plot Newton-Raphson Method by R (3)
# Use the parallel line to find the intercept b1 #
b1 = f1(phi1)-f2(phi0)*phi1
curve(f2(phi0)*x+b1, add = T, col = "red")
curve(f2(phi1)*x-f2(phi1)*phi1+f1(phi1), add = T, col = "blue", lwd = 2)
legend(0.45, 250, c("Newton-Raphson", "Parallel chord method"),
       col = c("blue", "red"), lty = c(1, 1))

Newton-Raphson Method by R (1)
> fix(Newton)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1);
    phi[i] = phi2;
  }
  print("By Newton-Raphson method");
  return(list(phi = phi2, iteration = phi));
}

Newton-Raphson Method by R (2)
> Newton(125, 18, 20, 24, 0.9)
[1] "By Newton-Raphson method"
$phi
[1]

$iteration
[1]

Newton-Raphson Method by C/C++

Halley’s Method
The Newton-Raphson iteration function is
  g(phi) = phi - f1(phi)/f2(phi).
It is possible to speed up convergence by using more expansion terms than the Newton-Raphson method does when the objective function is very smooth, as in the method by Edmond Halley (1656–1742):
  g(phi) = phi - (f1(phi)/f2(phi)) / (1 - f1(phi)*f3(phi)/(2*f2(phi)^2)).

Halley’s Method by R (1)
> fix(Halley)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1)*
           1.0/(1.0-f1(y1, y2, y3, y4, phi1)*f3(y1, y2, y3, y4, phi1)/
                (f2(y1, y2, y3, y4, phi1)*f2(y1, y2, y3, y4, phi1)*2.0));
    phi[i] = phi2;
  }
  print("By Halley method");
  return(list(phi = phi2, iteration = phi));
}

Halley’s Method by R (2)
> Halley(125, 18, 20, 24, 0.9)
[1] "By Halley method"
$phi
[1]

$iteration
[1]

Halley’s Method by C/C++

Bisection Method (1)
Assume that f is continuous on [a, b] and that there exists a number r in [a, b] such that f(r) = 0. If f(a) and f(b) have opposite signs, and {c_n} represents the sequence of midpoints generated by the bisection process, then
  |r - c_n| <= (b - a)/2^(n+1),  n = 0, 1, 2, ...,
and the sequence {c_n} converges to r; that is, lim c_n = r.

Bisection Method (2)
[Figure: repeatedly halving a bracketing interval around the root]

Plot the Bisection Method by R
### Bisection method ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
f <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
x = c(1:100)*0.01
y = f(x)
plot(x, y, type = 'l', main = "Bisection method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)
abline(v = 0.5, col = "red")
abline(v = 0.75, col = "red")
text(0.49, 2200, labels = "1")
text(0.74, 2200, labels = "2")

Bisection Method by R (1)
> fix(Bisection)
function(y1, y2, y3, y4, A, B){    # A, B: boundary of the parameter space #
  Delta = 1.0E-6;    # Tolerance for width of interval #
  Satisfied = 0;     # Condition for loop termination #
  phi = NULL;
  YA = f1(y1, y2, y3, y4, A);    # Compute function values #
  YB = f1(y1, y2, y3, y4, B);
  # Calculation of the maximum number of iterations #
  Max = as.integer(1+floor((log(B-A)-log(Delta))/log(2)));
  # Check to see if the bisection method applies #
  if(((YA >= 0) & (YB >= 0)) || ((YA < 0) & (YB < 0))){
    print("The values of the function at the boundary points do not differ in sign.");

Bisection Method by R (2)
    print("Therefore, this method is not appropriate here.");
    return(NULL);    # Exit function #
  }
  for(K in 1:Max){
    if(Satisfied == 1) break;
    C = (A+B)/2;                    # Midpoint of interval #
    YC = f1(y1, y2, y3, y4, C);     # Function value at midpoint #
    if(K < 100) phi[K] = C;
    if(YC == 0){
      A = C;    # Exact root is found #
      B = C;
    }
    else{
      if((YB*YC) >= 0 ){

Bisection Method by R (3)
        B = C;    # Squeeze from the right #
        YB = YC;
      }
      else{
        A = C;    # Squeeze from the left #
        YA = YC;
      }
    }
    if((B-A) < Delta) Satisfied = 1;    # Check for early convergence #
  }    # End of 'for' loop #
  print("By Bisection Method");
  return(list(phi = C, iteration = phi));
}

Bisection Method by R (4)
> Bisection(125, 18, 20, 24, 0.25, 1)
[1] "By Bisection Method"
$phi
[1]

$iteration
[1]
[8]
[15]

Bisection Method by C/C++ (1)

Bisection Method by C/C++ (2)

Secant Method
Replace the derivative f2 in the Newton-Raphson step by the slope of the secant line through the two most recent iterates:
  phi_{n+1} = phi_n - f1(phi_n)*(phi_n - phi_{n-1})/(f1(phi_n) - f1(phi_{n-1})).

Secant Method by R (1)
> fix(Secant)
function(y1, y2, y3, y4, initial1, initial2){
  phi = NULL;
  i = 0;
  phi0 = initial1;    # older iterate #
  phi1 = initial2;    # newer iterate #
  phi2 = phi1-f1(y1, y2, y3, y4, phi1)*(phi1-phi0)/
              (f1(y1, y2, y3, y4, phi1)-f1(y1, y2, y3, y4, phi0));
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi0 = phi1;
    phi1 = phi2;

Secant Method by R (2)
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)*(phi1-phi0)/
                (f1(y1, y2, y3, y4, phi1)-f1(y1, y2, y3, y4, phi0));
    phi[i] = phi2;
  }
  print("By Secant method");
  return(list(phi = phi2, iteration = phi));
}

Secant Method by R (3)
> Secant(125, 18, 20, 24, 0.9, 0.05)
[1] "By Secant method"
$phi
[1]

$iteration
[1]

Secant Method by C/C++

Secant-Bracket Method
The secant-bracket method is also called the regula falsi (false position) method.
[Figure: the secant line through (A, f(A)) and (B, f(B)) crosses the axis at C; S marks the root]

Secant-Bracket Method by R (1)
> fix(RegularFalsi)
function(y1, y2, y3, y4, A, B){
  phi = NULL;
  i = 0;
  Delta = 1.0E-6;    # Tolerance #
  Satisfied = 1;     # Condition for loop termination #
  # Endpoints of the interval [A, B] #
  YA = f1(y1, y2, y3, y4, A);    # Compute function values #
  YB = f1(y1, y2, y3, y4, B);
  # Check to see if the method applies #
  if(((YA >= 0) & (YB >= 0)) || ((YA < 0) & (YB < 0))){
    print("The values of the function at the boundary points do not differ in sign");
    print("Therefore, this method is not appropriate here");
    return(NULL);    # Exit function #
  }

Secant-Bracket Method by R (2)
  while(Satisfied){
    i = i+1;
    # Intersection of the secant line with the horizontal axis #
    C = (B*f1(y1, y2, y3, y4, A)-A*f1(y1, y2, y3, y4, B))/
        (f1(y1, y2, y3, y4, A)-f1(y1, y2, y3, y4, B));
    YC = f1(y1, y2, y3, y4, C);    # Function value at the new point #
    phi[i] = C;
    if(YC == 0){
      A = C;    # Exact root is found #
      B = C;
    }else{
      if((YB*YC) >= 0 ){
        B = C;    # Squeeze from the right #
        YB = YC;

Secant-Bracket Method by R (3)
      }else{
        A = C;    # Squeeze from the left #
        YA = YC;
      }
    }
    if(abs(YC) < Delta) Satisfied = 0;    # Check for convergence #
  }    # End of 'while' loop #
  print("By Regular Falsi Method");
  return(list(phi = C, iteration = phi));
}

Secant-Bracket Method by R (4)
> RegularFalsi(y1, y2, y3, y4, 0.9, 0.05)
[1] "By Regular Falsi Method"
$phi
[1]

$iteration
[1]
[8]
[15]
[22]
[29]
[36]

Secant-Bracket Method by C/C++ (1)

Secant-Bracket Method by C/C++ (2)

Fisher Scoring Method
The Fisher scoring method replaces the observed second derivative f2(phi) in the Newton-Raphson update by its expectation:
  phi_{n+1} = phi_n - f1(phi_n)/E[f2(phi_n)],
where -E[f2(phi)] is the Fisher information matrix when the parameter may be multivariate.

Fisher Scoring Method by R (1)
> fix(Fisher)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)/I(y1, y2, y3, y4, phi1);
    phi[i] = phi2;
  }
  print("By Fisher method");
  return(list(phi = phi2, iteration = phi));
}

Fisher Scoring Method by R (2)
> Fisher(125, 18, 20, 24, 0.9)
[1] "By Fisher method"
$phi
[1]

$iteration
[1]

Fisher Scoring Method by C/C++

Order of Convergence
The order of convergence is p if
  lim_{n -> inf} |e_{n+1}|/|e_n|^p = c,  0 < c < inf,
where e_n = phi_n - phi* is the error at step n.
Note: log|e_{n+1}| is approximately log(c) + p*log|e_n| as n -> inf.
Hence, we can use regression to estimate p.

Theorem for Newton-Raphson Method
If g is a contraction mapping with constant alpha < 1, then the fixed-point iteration converges with order at least 1 and |e_{n+1}| <= alpha*|e_n|.
If f'' exists and f has a simple zero at phi*, then there is a neighborhood of phi* on which the Newton-Raphson iteration function g(phi) = phi - f(phi)/f'(phi) is a contraction mapping, and the order of convergence is 2.

Find Convergence Order by R (1)
> # Convergence order #
> # Newton can be replaced by any of the other methods #
> R = Newton(y1, y2, y3, y4, initial)
[1] "By Newton-Raphson method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

Find Convergence Order by R (2)
> R = Fisher(y1, y2, y3, y4, initial)
[1] "By Fisher method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

> R = Bisection(y1, y2, y3, y4, 0.25, 1)
[1] "By Bisection Method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

> R = SimpleIteration(y1, y2, y3, y4, initial)
[1] "By parallel chord method (simple iteration)"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

> R = Secant(y1, y2, y3, y4, initial, 0.05)
[1] "By Secant method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

Find Convergence Order by R (3)
> R = RegularFalsi(y1, y2, y3, y4, initial, 0.05)
[1] "By Regular Falsi Method"
> temp = log(abs(R$iteration-R$phi))
> y = temp[2:(length(temp)-1)]
> x = temp[1:(length(temp)-2)]
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x

Find Convergence Order by C/C++

Exercises
Write your own programs for the examples presented in this talk.
Write programs for the examples mentioned at the web page on maximum likelihood.
Write programs for the other examples that you know.

More Exercises (1)
Example 3 in genetics: The observed data are counts whose cell probabilities involve three unknown parameters that fall in a constrained parameter space. Find the MLEs for the three parameters.

More Exercises (2)
Example 4 in positron emission tomography (PET): The observed data are detector counts. Some quantities in the model are known constants, and the remaining quantities are the unknown parameters. Find the MLEs for the unknown parameters.

More Exercises (3)
Example 5 in the normal mixture: The observed data are random samples from a probability density function that is a mixture of normal densities. Find the MLEs for the mixing proportions, the means, and the variances.
